diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,50 @@ -[2024-06-06 12:54:36,757][00159] Saving configuration to /content/train_dir/default_experiment/config.json... -[2024-06-06 12:54:36,761][00159] Rollout worker 0 uses device cpu -[2024-06-06 12:54:36,762][00159] Rollout worker 1 uses device cpu -[2024-06-06 12:54:36,765][00159] Rollout worker 2 uses device cpu -[2024-06-06 12:54:36,768][00159] Rollout worker 3 uses device cpu -[2024-06-06 12:54:36,769][00159] Rollout worker 4 uses device cpu -[2024-06-06 12:54:36,771][00159] Rollout worker 5 uses device cpu -[2024-06-06 12:54:36,772][00159] Rollout worker 6 uses device cpu -[2024-06-06 12:54:36,774][00159] Rollout worker 7 uses device cpu -[2024-06-06 12:54:36,975][00159] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 12:54:36,977][00159] InferenceWorker_p0-w0: min num requests: 2 -[2024-06-06 12:54:37,028][00159] Starting all processes... -[2024-06-06 12:54:37,033][00159] Starting process learner_proc0 -[2024-06-06 12:54:38,650][00159] Starting all processes... 
-[2024-06-06 12:54:38,662][00159] Starting process inference_proc0-0 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc0 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc1 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc2 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc3 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc4 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc5 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc6 -[2024-06-06 12:54:38,663][00159] Starting process rollout_proc7 -[2024-06-06 12:54:54,777][03285] Worker 4 uses CPU cores [0] -[2024-06-06 12:54:55,344][03267] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 12:54:55,346][03267] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2024-06-06 12:54:55,418][03267] Num visible devices: 1 -[2024-06-06 12:54:55,452][03267] Starting seed is not provided -[2024-06-06 12:54:55,454][03267] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 12:54:55,455][03267] Initializing actor-critic model on device cuda:0 -[2024-06-06 12:54:55,456][03267] RunningMeanStd input shape: (3, 72, 128) -[2024-06-06 12:54:55,459][03267] RunningMeanStd input shape: (1,) -[2024-06-06 12:54:55,551][03267] ConvEncoder: input_channels=3 -[2024-06-06 12:54:55,606][03282] Worker 1 uses CPU cores [1] -[2024-06-06 12:54:55,768][03281] Worker 0 uses CPU cores [0] -[2024-06-06 12:54:55,853][03280] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 12:54:55,857][03280] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2024-06-06 12:54:55,939][03280] Num visible devices: 1 -[2024-06-06 12:54:56,016][03287] Worker 5 uses CPU cores [1] -[2024-06-06 12:54:56,075][03288] Worker 7 uses CPU cores [1] -[2024-06-06 12:54:56,092][03286] Worker 6 uses CPU cores [0] -[2024-06-06 12:54:56,116][03283] Worker 2 uses CPU 
cores [0] -[2024-06-06 12:54:56,151][03284] Worker 3 uses CPU cores [1] -[2024-06-06 12:54:56,174][03267] Conv encoder output size: 512 -[2024-06-06 12:54:56,174][03267] Policy head output size: 512 -[2024-06-06 12:54:56,231][03267] Created Actor Critic model with architecture: -[2024-06-06 12:54:56,231][03267] ActorCriticSharedWeights( +[2024-06-06 14:15:06,308][01062] Saving configuration to /content/train_dir/default_experiment/config.json... +[2024-06-06 14:15:06,314][01062] Rollout worker 0 uses device cpu +[2024-06-06 14:15:06,317][01062] Rollout worker 1 uses device cpu +[2024-06-06 14:15:06,320][01062] Rollout worker 2 uses device cpu +[2024-06-06 14:15:06,321][01062] Rollout worker 3 uses device cpu +[2024-06-06 14:15:06,323][01062] Rollout worker 4 uses device cpu +[2024-06-06 14:15:06,324][01062] Rollout worker 5 uses device cpu +[2024-06-06 14:15:06,326][01062] Rollout worker 6 uses device cpu +[2024-06-06 14:15:06,329][01062] Rollout worker 7 uses device cpu +[2024-06-06 14:15:06,587][01062] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-06-06 14:15:06,589][01062] InferenceWorker_p0-w0: min num requests: 2 +[2024-06-06 14:15:06,624][01062] Starting all processes... +[2024-06-06 14:15:06,625][01062] Starting process learner_proc0 +[2024-06-06 14:15:08,172][01062] Starting all processes... 
+[2024-06-06 14:15:08,184][01062] Starting process inference_proc0-0 +[2024-06-06 14:15:08,184][01062] Starting process rollout_proc0 +[2024-06-06 14:15:08,189][01062] Starting process rollout_proc1 +[2024-06-06 14:15:08,189][01062] Starting process rollout_proc2 +[2024-06-06 14:15:08,189][01062] Starting process rollout_proc3 +[2024-06-06 14:15:08,189][01062] Starting process rollout_proc4 +[2024-06-06 14:15:08,189][01062] Starting process rollout_proc5 +[2024-06-06 14:15:08,189][01062] Starting process rollout_proc6 +[2024-06-06 14:15:08,190][01062] Starting process rollout_proc7 +[2024-06-06 14:15:23,103][03191] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-06-06 14:15:23,110][03191] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-06-06 14:15:23,194][03191] Num visible devices: 1 +[2024-06-06 14:15:23,239][03191] Starting seed is not provided +[2024-06-06 14:15:23,240][03191] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-06-06 14:15:23,241][03191] Initializing actor-critic model on device cuda:0 +[2024-06-06 14:15:23,242][03191] RunningMeanStd input shape: (3, 72, 128) +[2024-06-06 14:15:23,245][03191] RunningMeanStd input shape: (1,) +[2024-06-06 14:15:23,326][03191] ConvEncoder: input_channels=3 +[2024-06-06 14:15:23,449][03204] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-06-06 14:15:23,451][03204] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-06-06 14:15:23,559][03204] Num visible devices: 1 +[2024-06-06 14:15:23,767][03208] Worker 3 uses CPU cores [1] +[2024-06-06 14:15:23,794][03205] Worker 0 uses CPU cores [0] +[2024-06-06 14:15:23,831][03211] Worker 6 uses CPU cores [0] +[2024-06-06 14:15:23,842][03209] Worker 5 uses CPU cores [1] +[2024-06-06 14:15:23,966][03206] Worker 1 uses CPU cores [1] +[2024-06-06 14:15:23,973][03207] Worker 2 uses CPU cores [0] +[2024-06-06 14:15:24,029][03212] Worker 7 uses CPU 
cores [1] +[2024-06-06 14:15:24,050][03210] Worker 4 uses CPU cores [0] +[2024-06-06 14:15:24,083][03191] Conv encoder output size: 512 +[2024-06-06 14:15:24,084][03191] Policy head output size: 512 +[2024-06-06 14:15:24,146][03191] Created Actor Critic model with architecture: +[2024-06-06 14:15:24,146][03191] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( @@ -85,2507 +85,1178 @@ (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) -[2024-06-06 12:54:56,615][03267] Using optimizer -[2024-06-06 12:54:56,962][00159] Heartbeat connected on Batcher_0 -[2024-06-06 12:54:56,975][00159] Heartbeat connected on InferenceWorker_p0-w0 -[2024-06-06 12:54:56,985][00159] Heartbeat connected on RolloutWorker_w0 -[2024-06-06 12:54:56,999][00159] Heartbeat connected on RolloutWorker_w1 -[2024-06-06 12:54:57,005][00159] Heartbeat connected on RolloutWorker_w2 -[2024-06-06 12:54:57,010][00159] Heartbeat connected on RolloutWorker_w3 -[2024-06-06 12:54:57,013][00159] Heartbeat connected on RolloutWorker_w4 -[2024-06-06 12:54:57,018][00159] Heartbeat connected on RolloutWorker_w5 -[2024-06-06 12:54:57,023][00159] Heartbeat connected on RolloutWorker_w6 -[2024-06-06 12:54:57,029][00159] Heartbeat connected on RolloutWorker_w7 -[2024-06-06 12:54:57,759][03267] No checkpoints found -[2024-06-06 12:54:57,759][03267] Did not load from checkpoint, starting from scratch! -[2024-06-06 12:54:57,759][03267] Initialized policy 0 weights for model version 0 -[2024-06-06 12:54:57,763][03267] LearnerWorker_p0 finished initialization! 
-[2024-06-06 12:54:57,763][03267] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 12:54:57,770][00159] Heartbeat connected on LearnerWorker_p0 -[2024-06-06 12:54:58,042][03280] RunningMeanStd input shape: (3, 72, 128) -[2024-06-06 12:54:58,044][03280] RunningMeanStd input shape: (1,) -[2024-06-06 12:54:58,059][03280] ConvEncoder: input_channels=3 -[2024-06-06 12:54:58,176][03280] Conv encoder output size: 512 -[2024-06-06 12:54:58,176][03280] Policy head output size: 512 -[2024-06-06 12:54:58,237][00159] Inference worker 0-0 is ready! -[2024-06-06 12:54:58,238][00159] All inference workers are ready! Signal rollout workers to start! -[2024-06-06 12:54:58,574][03286] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:58,601][03288] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:58,656][03283] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:58,692][03285] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:58,704][03284] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:58,731][03281] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:58,757][03287] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:58,766][03282] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 12:54:59,836][03282] Decorrelating experience for 0 frames... -[2024-06-06 12:54:59,836][03288] Decorrelating experience for 0 frames... -[2024-06-06 12:55:00,090][03286] Decorrelating experience for 0 frames... -[2024-06-06 12:55:00,130][03283] Decorrelating experience for 0 frames... -[2024-06-06 12:55:00,173][03281] Decorrelating experience for 0 frames... -[2024-06-06 12:55:01,163][00159] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-06-06 12:55:01,632][03287] Decorrelating experience for 0 frames... -[2024-06-06 12:55:01,698][03288] Decorrelating experience for 32 frames... -[2024-06-06 12:55:01,708][03282] Decorrelating experience for 32 frames... -[2024-06-06 12:55:01,928][03286] Decorrelating experience for 32 frames... -[2024-06-06 12:55:02,128][03285] Decorrelating experience for 0 frames... -[2024-06-06 12:55:02,212][03283] Decorrelating experience for 32 frames... -[2024-06-06 12:55:02,249][03284] Decorrelating experience for 0 frames... -[2024-06-06 12:55:04,056][03281] Decorrelating experience for 32 frames... -[2024-06-06 12:55:04,649][03285] Decorrelating experience for 32 frames... -[2024-06-06 12:55:04,745][03284] Decorrelating experience for 32 frames... -[2024-06-06 12:55:04,786][03287] Decorrelating experience for 32 frames... -[2024-06-06 12:55:05,346][03288] Decorrelating experience for 64 frames... -[2024-06-06 12:55:05,820][03283] Decorrelating experience for 64 frames... -[2024-06-06 12:55:06,166][00159] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-06-06 12:55:08,116][03286] Decorrelating experience for 64 frames... -[2024-06-06 12:55:08,549][03282] Decorrelating experience for 64 frames... -[2024-06-06 12:55:09,005][03281] Decorrelating experience for 64 frames... -[2024-06-06 12:55:09,384][03288] Decorrelating experience for 96 frames... -[2024-06-06 12:55:09,514][03285] Decorrelating experience for 64 frames... -[2024-06-06 12:55:09,758][03287] Decorrelating experience for 64 frames... -[2024-06-06 12:55:09,769][03284] Decorrelating experience for 64 frames... -[2024-06-06 12:55:09,855][03283] Decorrelating experience for 96 frames... -[2024-06-06 12:55:11,164][00159] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-06-06 12:55:12,377][03286] Decorrelating experience for 96 frames... -[2024-06-06 12:55:12,400][03281] Decorrelating experience for 96 frames... -[2024-06-06 12:55:12,554][03282] Decorrelating experience for 96 frames... -[2024-06-06 12:55:12,657][03287] Decorrelating experience for 96 frames... -[2024-06-06 12:55:12,972][03285] Decorrelating experience for 96 frames... -[2024-06-06 12:55:15,544][03284] Decorrelating experience for 96 frames... -[2024-06-06 12:55:16,165][00159] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 24.5. Samples: 368. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-06-06 12:55:16,167][00159] Avg episode reward: [(0, '1.631')] -[2024-06-06 12:55:17,542][03267] Signal inference workers to stop experience collection... -[2024-06-06 12:55:17,559][03280] InferenceWorker_p0-w0: stopping experience collection -[2024-06-06 12:55:19,721][03267] Signal inference workers to resume experience collection... -[2024-06-06 12:55:19,723][03280] InferenceWorker_p0-w0: resuming experience collection -[2024-06-06 12:55:21,163][00159] Fps is (10 sec: 819.3, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 8192. Throughput: 0: 119.1. Samples: 2382. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) -[2024-06-06 12:55:21,167][00159] Avg episode reward: [(0, '2.441')] -[2024-06-06 12:55:26,163][00159] Fps is (10 sec: 1638.8, 60 sec: 655.4, 300 sec: 655.4). Total num frames: 16384. Throughput: 0: 186.3. Samples: 4658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 12:55:26,167][00159] Avg episode reward: [(0, '3.222')] -[2024-06-06 12:55:31,163][00159] Fps is (10 sec: 2048.0, 60 sec: 955.7, 300 sec: 955.7). Total num frames: 28672. Throughput: 0: 203.9. Samples: 6118. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 12:55:31,169][00159] Avg episode reward: [(0, '3.594')] -[2024-06-06 12:55:35,226][03280] Updated weights for policy 0, policy_version 10 (0.0330) -[2024-06-06 12:55:36,163][00159] Fps is (10 sec: 2457.6, 60 sec: 1170.3, 300 sec: 1170.3). Total num frames: 40960. Throughput: 0: 279.8. Samples: 9794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 12:55:36,169][00159] Avg episode reward: [(0, '3.949')] -[2024-06-06 12:55:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 1331.2, 300 sec: 1331.2). Total num frames: 53248. Throughput: 0: 340.8. Samples: 13630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 12:55:41,165][00159] Avg episode reward: [(0, '4.225')] -[2024-06-06 12:55:46,163][00159] Fps is (10 sec: 2457.6, 60 sec: 1456.4, 300 sec: 1456.4). Total num frames: 65536. Throughput: 0: 337.2. Samples: 15176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 12:55:46,172][00159] Avg episode reward: [(0, '4.373')] -[2024-06-06 12:55:50,953][03280] Updated weights for policy 0, policy_version 20 (0.0021) -[2024-06-06 12:55:51,163][00159] Fps is (10 sec: 2867.1, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 81920. Throughput: 0: 436.3. Samples: 19632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:55:51,166][00159] Avg episode reward: [(0, '4.503')] -[2024-06-06 12:55:56,163][00159] Fps is (10 sec: 3276.8, 60 sec: 1787.3, 300 sec: 1787.3). Total num frames: 98304. Throughput: 0: 549.3. Samples: 24720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:55:56,165][00159] Avg episode reward: [(0, '4.521')] -[2024-06-06 12:56:01,163][00159] Fps is (10 sec: 2457.7, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 106496. Throughput: 0: 578.6. Samples: 26402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:56:01,166][00159] Avg episode reward: [(0, '4.462')] -[2024-06-06 12:56:01,186][03267] Saving new best policy, reward=4.462! 
-[2024-06-06 12:56:06,163][00159] Fps is (10 sec: 1638.4, 60 sec: 1911.6, 300 sec: 1764.4). Total num frames: 114688. Throughput: 0: 592.5. Samples: 29044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:56:06,165][00159] Avg episode reward: [(0, '4.336')] -[2024-06-06 12:56:07,582][03280] Updated weights for policy 0, policy_version 30 (0.0020) -[2024-06-06 12:56:11,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2184.6, 300 sec: 1872.5). Total num frames: 131072. Throughput: 0: 629.0. Samples: 32964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:56:11,166][00159] Avg episode reward: [(0, '4.233')] -[2024-06-06 12:56:16,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2389.4, 300 sec: 1911.5). Total num frames: 143360. Throughput: 0: 643.6. Samples: 35080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:56:16,165][00159] Avg episode reward: [(0, '4.240')] -[2024-06-06 12:56:21,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2321.1, 300 sec: 1843.2). Total num frames: 147456. Throughput: 0: 623.3. Samples: 37844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:56:21,168][00159] Avg episode reward: [(0, '4.243')] -[2024-06-06 12:56:24,981][03280] Updated weights for policy 0, policy_version 40 (0.0024) -[2024-06-06 12:56:26,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2457.6, 300 sec: 1927.5). Total num frames: 163840. Throughput: 0: 622.7. Samples: 41652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:56:26,169][00159] Avg episode reward: [(0, '4.431')] -[2024-06-06 12:56:31,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2525.9, 300 sec: 2002.5). Total num frames: 180224. Throughput: 0: 643.9. Samples: 44150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:56:31,170][00159] Avg episode reward: [(0, '4.426')] -[2024-06-06 12:56:31,180][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000044_180224.pth... 
-[2024-06-06 12:56:36,164][00159] Fps is (10 sec: 2457.3, 60 sec: 2457.6, 300 sec: 1983.3). Total num frames: 188416. Throughput: 0: 603.6. Samples: 46796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:56:36,167][00159] Avg episode reward: [(0, '4.458')] -[2024-06-06 12:56:41,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2457.6, 300 sec: 2007.0). Total num frames: 200704. Throughput: 0: 570.2. Samples: 50380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:56:41,165][00159] Avg episode reward: [(0, '4.601')] -[2024-06-06 12:56:41,177][03267] Saving new best policy, reward=4.601! -[2024-06-06 12:56:41,983][03280] Updated weights for policy 0, policy_version 50 (0.0044) -[2024-06-06 12:56:46,163][00159] Fps is (10 sec: 2457.9, 60 sec: 2457.6, 300 sec: 2028.5). Total num frames: 212992. Throughput: 0: 586.3. Samples: 52784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:56:46,166][00159] Avg episode reward: [(0, '4.548')] -[2024-06-06 12:56:51,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2389.4, 300 sec: 2048.0). Total num frames: 225280. Throughput: 0: 613.0. Samples: 56630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:56:51,167][00159] Avg episode reward: [(0, '4.600')] -[2024-06-06 12:56:56,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2252.8, 300 sec: 2030.2). Total num frames: 233472. Throughput: 0: 575.6. Samples: 58868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:56:56,170][00159] Avg episode reward: [(0, '4.639')] -[2024-06-06 12:56:56,176][03267] Saving new best policy, reward=4.639! -[2024-06-06 12:56:59,749][03280] Updated weights for policy 0, policy_version 60 (0.0015) -[2024-06-06 12:57:01,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2321.1, 300 sec: 2048.0). Total num frames: 245760. Throughput: 0: 569.2. Samples: 60692. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:57:01,172][00159] Avg episode reward: [(0, '4.406')] -[2024-06-06 12:57:06,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2321.1, 300 sec: 2031.6). Total num frames: 253952. Throughput: 0: 575.3. Samples: 63732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:57:06,165][00159] Avg episode reward: [(0, '4.449')] -[2024-06-06 12:57:11,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2252.8, 300 sec: 2048.0). Total num frames: 266240. Throughput: 0: 565.9. Samples: 67116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:57:11,166][00159] Avg episode reward: [(0, '4.320')] -[2024-06-06 12:57:16,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2252.8, 300 sec: 2063.2). Total num frames: 278528. Throughput: 0: 544.1. Samples: 68636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-06-06 12:57:16,165][00159] Avg episode reward: [(0, '4.431')] -[2024-06-06 12:57:18,187][03280] Updated weights for policy 0, policy_version 70 (0.0035) -[2024-06-06 12:57:21,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2389.3, 300 sec: 2077.3). Total num frames: 290816. Throughput: 0: 579.8. Samples: 72888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:57:21,169][00159] Avg episode reward: [(0, '4.576')] -[2024-06-06 12:57:26,166][00159] Fps is (10 sec: 2456.9, 60 sec: 2320.9, 300 sec: 2090.3). Total num frames: 303104. Throughput: 0: 580.2. Samples: 76492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:57:26,168][00159] Avg episode reward: [(0, '4.642')] -[2024-06-06 12:57:26,176][03267] Saving new best policy, reward=4.642! -[2024-06-06 12:57:31,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2116.3, 300 sec: 2048.0). Total num frames: 307200. Throughput: 0: 549.2. Samples: 77500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 12:57:31,167][00159] Avg episode reward: [(0, '4.497')] -[2024-06-06 12:57:36,163][00159] Fps is (10 sec: 1638.9, 60 sec: 2184.6, 300 sec: 2061.2). 
Total num frames: 319488. Throughput: 0: 514.2. Samples: 79770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 12:57:36,165][00159] Avg episode reward: [(0, '4.438')] -[2024-06-06 12:57:38,993][03280] Updated weights for policy 0, policy_version 80 (0.0022) -[2024-06-06 12:57:41,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2116.3, 300 sec: 2048.0). Total num frames: 327680. Throughput: 0: 540.0. Samples: 83170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:57:41,176][00159] Avg episode reward: [(0, '4.247')] -[2024-06-06 12:57:46,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2048.0, 300 sec: 2035.6). Total num frames: 335872. Throughput: 0: 525.4. Samples: 84334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 12:57:46,170][00159] Avg episode reward: [(0, '4.141')] -[2024-06-06 12:57:51,163][00159] Fps is (10 sec: 1638.4, 60 sec: 1979.7, 300 sec: 2023.9). Total num frames: 344064. Throughput: 0: 508.8. Samples: 86630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:57:51,174][00159] Avg episode reward: [(0, '4.154')] -[2024-06-06 12:57:56,163][00159] Fps is (10 sec: 2457.7, 60 sec: 2116.3, 300 sec: 2059.7). Total num frames: 360448. Throughput: 0: 507.0. Samples: 89932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:57:56,165][00159] Avg episode reward: [(0, '4.162')] -[2024-06-06 12:57:58,964][03280] Updated weights for policy 0, policy_version 90 (0.0034) -[2024-06-06 12:58:01,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2116.3, 300 sec: 2070.8). Total num frames: 372736. Throughput: 0: 522.5. Samples: 92150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:58:01,171][00159] Avg episode reward: [(0, '4.346')] -[2024-06-06 12:58:06,165][00159] Fps is (10 sec: 2457.1, 60 sec: 2184.5, 300 sec: 2081.2). Total num frames: 385024. Throughput: 0: 523.6. Samples: 96452. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:58:06,167][00159] Avg episode reward: [(0, '4.427')] -[2024-06-06 12:58:11,169][00159] Fps is (10 sec: 2046.7, 60 sec: 2116.0, 300 sec: 2069.5). Total num frames: 393216. Throughput: 0: 505.3. Samples: 99232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:58:11,172][00159] Avg episode reward: [(0, '4.627')] -[2024-06-06 12:58:15,322][03280] Updated weights for policy 0, policy_version 100 (0.0058) -[2024-06-06 12:58:16,163][00159] Fps is (10 sec: 2458.1, 60 sec: 2184.5, 300 sec: 2100.5). Total num frames: 409600. Throughput: 0: 529.7. Samples: 101336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:58:16,165][00159] Avg episode reward: [(0, '4.607')] -[2024-06-06 12:58:21,163][00159] Fps is (10 sec: 2869.0, 60 sec: 2184.5, 300 sec: 2109.4). Total num frames: 421888. Throughput: 0: 573.2. Samples: 105562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:58:21,166][00159] Avg episode reward: [(0, '4.659')] -[2024-06-06 12:58:21,177][03267] Saving new best policy, reward=4.659! -[2024-06-06 12:58:26,168][00159] Fps is (10 sec: 1637.6, 60 sec: 2047.9, 300 sec: 2077.9). Total num frames: 425984. Throughput: 0: 531.9. Samples: 107108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:58:26,171][00159] Avg episode reward: [(0, '4.647')] -[2024-06-06 12:58:31,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2184.5, 300 sec: 2087.0). Total num frames: 438272. Throughput: 0: 537.6. Samples: 108524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:58:31,166][00159] Avg episode reward: [(0, '4.576')] -[2024-06-06 12:58:31,177][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000107_438272.pth... -[2024-06-06 12:58:35,285][03280] Updated weights for policy 0, policy_version 110 (0.0035) -[2024-06-06 12:58:36,163][00159] Fps is (10 sec: 2458.8, 60 sec: 2184.5, 300 sec: 2095.6). Total num frames: 450560. Throughput: 0: 571.1. 
Samples: 112330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:58:36,165][00159] Avg episode reward: [(0, '4.530')] -[2024-06-06 12:58:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2252.8, 300 sec: 2103.9). Total num frames: 462848. Throughput: 0: 592.2. Samples: 116582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 12:58:41,166][00159] Avg episode reward: [(0, '4.454')] -[2024-06-06 12:58:46,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2321.1, 300 sec: 2111.7). Total num frames: 475136. Throughput: 0: 577.5. Samples: 118136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:58:46,165][00159] Avg episode reward: [(0, '4.382')] -[2024-06-06 12:58:50,499][03280] Updated weights for policy 0, policy_version 120 (0.0042) -[2024-06-06 12:58:51,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2457.6, 300 sec: 2137.0). Total num frames: 491520. Throughput: 0: 577.2. Samples: 122426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:58:51,166][00159] Avg episode reward: [(0, '4.568')] -[2024-06-06 12:58:56,163][00159] Fps is (10 sec: 3276.7, 60 sec: 2457.6, 300 sec: 2161.3). Total num frames: 507904. Throughput: 0: 628.4. Samples: 127508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:58:56,166][00159] Avg episode reward: [(0, '4.803')] -[2024-06-06 12:58:56,172][03267] Saving new best policy, reward=4.803! -[2024-06-06 12:59:01,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2457.6, 300 sec: 2167.5). Total num frames: 520192. Throughput: 0: 620.9. Samples: 129276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:59:01,167][00159] Avg episode reward: [(0, '4.857')] -[2024-06-06 12:59:01,183][03267] Saving new best policy, reward=4.857! -[2024-06-06 12:59:05,788][03280] Updated weights for policy 0, policy_version 130 (0.0026) -[2024-06-06 12:59:06,163][00159] Fps is (10 sec: 2457.7, 60 sec: 2457.7, 300 sec: 2173.4). Total num frames: 532480. Throughput: 0: 597.7. Samples: 132458. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:59:06,165][00159] Avg episode reward: [(0, '4.948')] -[2024-06-06 12:59:06,172][03267] Saving new best policy, reward=4.948! -[2024-06-06 12:59:11,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2594.4, 300 sec: 2195.5). Total num frames: 548864. Throughput: 0: 673.7. Samples: 137422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 12:59:11,166][00159] Avg episode reward: [(0, '4.860')] -[2024-06-06 12:59:16,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2200.6). Total num frames: 561152. Throughput: 0: 697.6. Samples: 139918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:59:16,166][00159] Avg episode reward: [(0, '4.732')] -[2024-06-06 12:59:20,828][03280] Updated weights for policy 0, policy_version 140 (0.0023) -[2024-06-06 12:59:21,164][00159] Fps is (10 sec: 2457.3, 60 sec: 2525.8, 300 sec: 2205.5). Total num frames: 573440. Throughput: 0: 685.1. Samples: 143160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 12:59:21,169][00159] Avg episode reward: [(0, '4.764')] -[2024-06-06 12:59:26,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2730.9, 300 sec: 2225.8). Total num frames: 589824. Throughput: 0: 686.4. Samples: 147472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 12:59:26,169][00159] Avg episode reward: [(0, '4.853')] -[2024-06-06 12:59:31,163][00159] Fps is (10 sec: 2867.6, 60 sec: 2730.7, 300 sec: 2230.0). Total num frames: 602112. Throughput: 0: 708.0. Samples: 149998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 12:59:31,172][00159] Avg episode reward: [(0, '4.867')] -[2024-06-06 12:59:35,161][03280] Updated weights for policy 0, policy_version 150 (0.0033) -[2024-06-06 12:59:36,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2234.2). Total num frames: 614400. Throughput: 0: 689.6. Samples: 153458. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 12:59:36,171][00159] Avg episode reward: [(0, '4.983')]
-[2024-06-06 12:59:36,173][03267] Saving new best policy, reward=4.983!
-[2024-06-06 12:59:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2238.2). Total num frames: 626688. Throughput: 0: 645.5. Samples: 156556. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-06-06 12:59:41,171][00159] Avg episode reward: [(0, '5.201')]
-[2024-06-06 12:59:41,183][03267] Saving new best policy, reward=5.201!
-[2024-06-06 12:59:46,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2256.4). Total num frames: 643072. Throughput: 0: 663.2. Samples: 159120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 12:59:46,171][00159] Avg episode reward: [(0, '5.199')]
-[2024-06-06 12:59:49,440][03280] Updated weights for policy 0, policy_version 160 (0.0023)
-[2024-06-06 12:59:51,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2274.0). Total num frames: 659456. Throughput: 0: 706.9. Samples: 164268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 12:59:51,166][00159] Avg episode reward: [(0, '4.979')]
-[2024-06-06 12:59:56,164][00159] Fps is (10 sec: 2457.3, 60 sec: 2662.3, 300 sec: 2263.2). Total num frames: 667648. Throughput: 0: 670.6. Samples: 167598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 12:59:56,173][00159] Avg episode reward: [(0, '4.648')]
-[2024-06-06 13:00:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2318.8). Total num frames: 684032. Throughput: 0: 652.0. Samples: 169260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:00:01,170][00159] Avg episode reward: [(0, '4.594')]
-[2024-06-06 13:00:04,635][03280] Updated weights for policy 0, policy_version 170 (0.0025)
-[2024-06-06 13:00:06,163][00159] Fps is (10 sec: 3277.3, 60 sec: 2798.9, 300 sec: 2374.3). Total num frames: 700416. Throughput: 0: 692.7. Samples: 174332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:00:06,170][00159] Avg episode reward: [(0, '4.749')]
-[2024-06-06 13:00:11,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2416.0). Total num frames: 712704. Throughput: 0: 691.4. Samples: 178584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:00:11,167][00159] Avg episode reward: [(0, '4.979')]
-[2024-06-06 13:00:16,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2415.9). Total num frames: 720896. Throughput: 0: 667.4. Samples: 180030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:00:16,165][00159] Avg episode reward: [(0, '4.935')]
-[2024-06-06 13:00:20,520][03280] Updated weights for policy 0, policy_version 180 (0.0029)
-[2024-06-06 13:00:21,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2443.7). Total num frames: 737280. Throughput: 0: 683.1. Samples: 184198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:00:21,172][00159] Avg episode reward: [(0, '4.819')]
-[2024-06-06 13:00:26,163][00159] Fps is (10 sec: 3276.7, 60 sec: 2730.7, 300 sec: 2457.6). Total num frames: 753664. Throughput: 0: 721.4. Samples: 189020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:00:26,166][00159] Avg episode reward: [(0, '4.698')]
-[2024-06-06 13:00:31,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2457.6). Total num frames: 765952. Throughput: 0: 705.2. Samples: 190856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:00:31,165][00159] Avg episode reward: [(0, '4.659')]
-[2024-06-06 13:00:31,188][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000187_765952.pth...
-[2024-06-06 13:00:31,338][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000044_180224.pth
-[2024-06-06 13:00:35,501][03280] Updated weights for policy 0, policy_version 190 (0.0017)
-[2024-06-06 13:00:36,163][00159] Fps is (10 sec: 2457.7, 60 sec: 2730.7, 300 sec: 2457.6). Total num frames: 778240. Throughput: 0: 661.6. Samples: 194040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:00:36,170][00159] Avg episode reward: [(0, '4.872')]
-[2024-06-06 13:00:41,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2471.5). Total num frames: 794624. Throughput: 0: 698.1. Samples: 199012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:00:41,170][00159] Avg episode reward: [(0, '5.067')]
-[2024-06-06 13:00:46,164][00159] Fps is (10 sec: 3276.4, 60 sec: 2798.9, 300 sec: 2471.5). Total num frames: 811008. Throughput: 0: 717.8. Samples: 201560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:00:46,170][00159] Avg episode reward: [(0, '5.205')]
-[2024-06-06 13:00:46,175][03267] Saving new best policy, reward=5.205!
-[2024-06-06 13:00:50,057][03280] Updated weights for policy 0, policy_version 200 (0.0023)
-[2024-06-06 13:00:51,166][00159] Fps is (10 sec: 2456.8, 60 sec: 2662.3, 300 sec: 2443.7). Total num frames: 819200. Throughput: 0: 677.2. Samples: 204810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:00:51,169][00159] Avg episode reward: [(0, '5.100')]
-[2024-06-06 13:00:56,163][00159] Fps is (10 sec: 2457.9, 60 sec: 2799.0, 300 sec: 2471.5). Total num frames: 835584. Throughput: 0: 678.3. Samples: 209108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:00:56,165][00159] Avg episode reward: [(0, '5.017')]
-[2024-06-06 13:01:01,165][00159] Fps is (10 sec: 3277.2, 60 sec: 2798.8, 300 sec: 2499.2). Total num frames: 851968. Throughput: 0: 702.8. Samples: 211656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:01:01,168][00159] Avg episode reward: [(0, '5.309')]
-[2024-06-06 13:01:01,185][03267] Saving new best policy, reward=5.309!
-[2024-06-06 13:01:03,653][03280] Updated weights for policy 0, policy_version 210 (0.0026)
-[2024-06-06 13:01:06,163][00159] Fps is (10 sec: 2867.1, 60 sec: 2730.7, 300 sec: 2485.4). Total num frames: 864256. Throughput: 0: 702.4. Samples: 215806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:01:06,171][00159] Avg episode reward: [(0, '5.162')]
-[2024-06-06 13:01:11,163][00159] Fps is (10 sec: 2458.1, 60 sec: 2730.7, 300 sec: 2485.4). Total num frames: 876544. Throughput: 0: 666.7. Samples: 219020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:01:11,169][00159] Avg episode reward: [(0, '5.173')]
-[2024-06-06 13:01:16,163][00159] Fps is (10 sec: 2867.3, 60 sec: 2867.2, 300 sec: 2527.0). Total num frames: 892928. Throughput: 0: 682.8. Samples: 221584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:01:16,165][00159] Avg episode reward: [(0, '5.007')]
-[2024-06-06 13:01:18,126][03280] Updated weights for policy 0, policy_version 220 (0.0028)
-[2024-06-06 13:01:21,168][00159] Fps is (10 sec: 3275.1, 60 sec: 2866.9, 300 sec: 2527.0). Total num frames: 909312. Throughput: 0: 725.8. Samples: 226706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:01:21,173][00159] Avg episode reward: [(0, '5.087')]
-[2024-06-06 13:01:26,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2499.3). Total num frames: 917504. Throughput: 0: 687.6. Samples: 229954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:01:26,170][00159] Avg episode reward: [(0, '5.313')]
-[2024-06-06 13:01:26,174][03267] Saving new best policy, reward=5.313!
-[2024-06-06 13:01:31,163][00159] Fps is (10 sec: 2458.9, 60 sec: 2798.9, 300 sec: 2527.0). Total num frames: 933888. Throughput: 0: 669.2. Samples: 231672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:01:31,169][00159] Avg episode reward: [(0, '5.624')]
-[2024-06-06 13:01:31,180][03267] Saving new best policy, reward=5.624!
-[2024-06-06 13:01:33,577][03280] Updated weights for policy 0, policy_version 230 (0.0042)
-[2024-06-06 13:01:36,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2540.9). Total num frames: 950272. Throughput: 0: 707.7. Samples: 236656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:01:36,170][00159] Avg episode reward: [(0, '5.774')]
-[2024-06-06 13:01:36,176][03267] Saving new best policy, reward=5.774!
-[2024-06-06 13:01:41,166][00159] Fps is (10 sec: 2866.2, 60 sec: 2798.8, 300 sec: 2540.9). Total num frames: 962560. Throughput: 0: 705.0. Samples: 240836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:01:41,169][00159] Avg episode reward: [(0, '5.628')]
-[2024-06-06 13:01:46,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.5, 300 sec: 2527.0). Total num frames: 970752. Throughput: 0: 682.2. Samples: 242352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:01:46,165][00159] Avg episode reward: [(0, '5.484')]
-[2024-06-06 13:01:48,814][03280] Updated weights for policy 0, policy_version 240 (0.0022)
-[2024-06-06 13:01:51,163][00159] Fps is (10 sec: 2458.4, 60 sec: 2799.1, 300 sec: 2554.8). Total num frames: 987136. Throughput: 0: 683.2. Samples: 246548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:01:51,166][00159] Avg episode reward: [(0, '5.560')]
-[2024-06-06 13:01:56,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2568.7). Total num frames: 1003520. Throughput: 0: 725.7. Samples: 251678. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:01:56,166][00159] Avg episode reward: [(0, '5.800')]
-[2024-06-06 13:01:56,195][03267] Saving new best policy, reward=5.800!
-[2024-06-06 13:02:01,166][00159] Fps is (10 sec: 2866.2, 60 sec: 2730.6, 300 sec: 2582.5). Total num frames: 1015808. Throughput: 0: 708.5. Samples: 253468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:02:01,174][00159] Avg episode reward: [(0, '6.108')]
-[2024-06-06 13:02:01,186][03267] Saving new best policy, reward=6.108!
-[2024-06-06 13:02:05,037][03280] Updated weights for policy 0, policy_version 250 (0.0036)
-[2024-06-06 13:02:06,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2568.7). Total num frames: 1024000. Throughput: 0: 654.9. Samples: 256174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:02:06,165][00159] Avg episode reward: [(0, '6.281')]
-[2024-06-06 13:02:06,169][03267] Saving new best policy, reward=6.281!
-[2024-06-06 13:02:11,163][00159] Fps is (10 sec: 2048.7, 60 sec: 2662.4, 300 sec: 2568.7). Total num frames: 1036288. Throughput: 0: 646.4. Samples: 259044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:02:11,166][00159] Avg episode reward: [(0, '6.440')]
-[2024-06-06 13:02:11,178][03267] Saving new best policy, reward=6.440!
-[2024-06-06 13:02:16,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2568.7). Total num frames: 1048576. Throughput: 0: 655.5. Samples: 261170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:02:16,167][00159] Avg episode reward: [(0, '6.484')]
-[2024-06-06 13:02:16,174][03267] Saving new best policy, reward=6.484!
-[2024-06-06 13:02:21,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2526.1, 300 sec: 2568.7). Total num frames: 1060864. Throughput: 0: 633.0. Samples: 265142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:02:21,165][00159] Avg episode reward: [(0, '6.510')]
-[2024-06-06 13:02:21,184][03267] Saving new best policy, reward=6.510!
-[2024-06-06 13:02:22,225][03280] Updated weights for policy 0, policy_version 260 (0.0058)
-[2024-06-06 13:02:26,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2610.3). Total num frames: 1077248. Throughput: 0: 631.6. Samples: 269256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:02:26,171][00159] Avg episode reward: [(0, '6.128')]
-[2024-06-06 13:02:31,163][00159] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2638.1). Total num frames: 1097728. Throughput: 0: 659.1. Samples: 272012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:02:31,170][00159] Avg episode reward: [(0, '5.809')]
-[2024-06-06 13:02:31,181][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
-[2024-06-06 13:02:31,309][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000107_438272.pth
-[2024-06-06 13:02:33,394][03280] Updated weights for policy 0, policy_version 270 (0.0038)
-[2024-06-06 13:02:36,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2652.0). Total num frames: 1110016. Throughput: 0: 687.4. Samples: 277482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:02:36,165][00159] Avg episode reward: [(0, '5.927')]
-[2024-06-06 13:02:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.6, 300 sec: 2665.9). Total num frames: 1122304. Throughput: 0: 652.7. Samples: 281048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:02:41,165][00159] Avg episode reward: [(0, '6.138')]
-[2024-06-06 13:02:46,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2693.6). Total num frames: 1138688. Throughput: 0: 665.6. Samples: 283418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:02:46,166][00159] Avg episode reward: [(0, '6.177')]
-[2024-06-06 13:02:47,430][03280] Updated weights for policy 0, policy_version 280 (0.0018)
-[2024-06-06 13:02:51,163][00159] Fps is (10 sec: 3276.7, 60 sec: 2798.9, 300 sec: 2693.6). Total num frames: 1155072. Throughput: 0: 711.1. Samples: 288174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:02:51,171][00159] Avg episode reward: [(0, '6.408')]
-[2024-06-06 13:02:56,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2679.8). Total num frames: 1163264. Throughput: 0: 728.4. Samples: 291824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:02:56,168][00159] Avg episode reward: [(0, '6.507')]
-[2024-06-06 13:03:01,166][00159] Fps is (10 sec: 1637.9, 60 sec: 2594.1, 300 sec: 2665.9). Total num frames: 1171456. Throughput: 0: 693.7. Samples: 292388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:03:01,168][00159] Avg episode reward: [(0, '6.580')]
-[2024-06-06 13:03:01,186][03267] Saving new best policy, reward=6.580!
-[2024-06-06 13:03:06,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2679.8). Total num frames: 1183744. Throughput: 0: 669.8. Samples: 295284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:03:06,170][00159] Avg episode reward: [(0, '6.953')]
-[2024-06-06 13:03:06,176][03267] Saving new best policy, reward=6.953!
-[2024-06-06 13:03:08,102][03280] Updated weights for policy 0, policy_version 290 (0.0034)
-[2024-06-06 13:03:11,163][00159] Fps is (10 sec: 2048.7, 60 sec: 2594.1, 300 sec: 2652.0). Total num frames: 1191936. Throughput: 0: 651.8. Samples: 298586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:03:11,170][00159] Avg episode reward: [(0, '7.150')]
-[2024-06-06 13:03:11,181][03267] Saving new best policy, reward=7.150!
-[2024-06-06 13:03:16,165][00159] Fps is (10 sec: 1228.5, 60 sec: 2457.5, 300 sec: 2624.2). Total num frames: 1196032. Throughput: 0: 615.3. Samples: 299700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:03:16,171][00159] Avg episode reward: [(0, '7.039')]
-[2024-06-06 13:03:21,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2665.9). Total num frames: 1212416. Throughput: 0: 557.3. Samples: 302560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:03:21,170][00159] Avg episode reward: [(0, '7.436')]
-[2024-06-06 13:03:21,180][03267] Saving new best policy, reward=7.436!
-[2024-06-06 13:03:26,163][00159] Fps is (10 sec: 2867.8, 60 sec: 2457.6, 300 sec: 2665.9). Total num frames: 1224704. Throughput: 0: 571.5. Samples: 306766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:03:26,169][00159] Avg episode reward: [(0, '7.624')]
-[2024-06-06 13:03:26,178][03267] Saving new best policy, reward=7.624!
-[2024-06-06 13:03:26,701][03280] Updated weights for policy 0, policy_version 300 (0.0027)
-[2024-06-06 13:03:31,165][00159] Fps is (10 sec: 2047.5, 60 sec: 2252.7, 300 sec: 2652.0). Total num frames: 1232896. Throughput: 0: 556.6. Samples: 308464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:03:31,171][00159] Avg episode reward: [(0, '7.556')]
-[2024-06-06 13:03:36,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2184.5, 300 sec: 2638.1). Total num frames: 1241088. Throughput: 0: 488.8. Samples: 310168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:03:36,168][00159] Avg episode reward: [(0, '7.882')]
-[2024-06-06 13:03:36,176][03267] Saving new best policy, reward=7.882!
-[2024-06-06 13:03:41,163][00159] Fps is (10 sec: 1638.8, 60 sec: 2116.3, 300 sec: 2624.2). Total num frames: 1249280. Throughput: 0: 465.7. Samples: 312782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:03:41,168][00159] Avg episode reward: [(0, '8.001')]
-[2024-06-06 13:03:41,179][03267] Saving new best policy, reward=8.001!
-[2024-06-06 13:03:46,163][00159] Fps is (10 sec: 1638.4, 60 sec: 1979.7, 300 sec: 2596.4). Total num frames: 1257472. Throughput: 0: 484.9. Samples: 314206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:03:46,169][00159] Avg episode reward: [(0, '8.392')]
-[2024-06-06 13:03:46,174][03267] Saving new best policy, reward=8.392!
-[2024-06-06 13:03:51,163][00159] Fps is (10 sec: 2047.9, 60 sec: 1911.5, 300 sec: 2582.6). Total num frames: 1269760. Throughput: 0: 482.2. Samples: 316982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:03:51,168][00159] Avg episode reward: [(0, '8.622')]
-[2024-06-06 13:03:51,178][03280] Updated weights for policy 0, policy_version 310 (0.0057)
-[2024-06-06 13:03:51,179][03267] Saving new best policy, reward=8.622!
-[2024-06-06 13:03:56,163][00159] Fps is (10 sec: 2048.0, 60 sec: 1911.5, 300 sec: 2568.7). Total num frames: 1277952. Throughput: 0: 482.2. Samples: 320284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:03:56,168][00159] Avg episode reward: [(0, '9.328')]
-[2024-06-06 13:03:56,270][03267] Saving new best policy, reward=9.328!
-[2024-06-06 13:04:01,163][00159] Fps is (10 sec: 2048.1, 60 sec: 1979.8, 300 sec: 2568.7). Total num frames: 1290240. Throughput: 0: 502.6. Samples: 322314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:04:01,166][00159] Avg episode reward: [(0, '9.604')]
-[2024-06-06 13:04:01,176][03267] Saving new best policy, reward=9.604!
-[2024-06-06 13:04:06,163][00159] Fps is (10 sec: 2457.6, 60 sec: 1979.7, 300 sec: 2554.8). Total num frames: 1302528. Throughput: 0: 519.7. Samples: 325948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:04:06,166][00159] Avg episode reward: [(0, '9.371')]
-[2024-06-06 13:04:09,827][03280] Updated weights for policy 0, policy_version 320 (0.0030)
-[2024-06-06 13:04:11,163][00159] Fps is (10 sec: 2048.0, 60 sec: 1979.7, 300 sec: 2540.9). Total num frames: 1310720. Throughput: 0: 479.1. Samples: 328326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:04:11,168][00159] Avg episode reward: [(0, '9.875')]
-[2024-06-06 13:04:11,190][03267] Saving new best policy, reward=9.875!
-[2024-06-06 13:04:16,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2048.1, 300 sec: 2527.0). Total num frames: 1318912. Throughput: 0: 453.4. Samples: 328866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:04:16,171][00159] Avg episode reward: [(0, '10.262')]
-[2024-06-06 13:04:16,180][03267] Saving new best policy, reward=10.262!
-[2024-06-06 13:04:21,163][00159] Fps is (10 sec: 1638.4, 60 sec: 1911.5, 300 sec: 2499.3). Total num frames: 1327104. Throughput: 0: 493.6. Samples: 332382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:04:21,170][00159] Avg episode reward: [(0, '9.875')]
-[2024-06-06 13:04:26,164][00159] Fps is (10 sec: 2047.8, 60 sec: 1911.4, 300 sec: 2499.2). Total num frames: 1339392. Throughput: 0: 490.7. Samples: 334866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
-[2024-06-06 13:04:26,168][00159] Avg episode reward: [(0, '9.995')]
-[2024-06-06 13:04:31,163][00159] Fps is (10 sec: 2048.0, 60 sec: 1911.5, 300 sec: 2485.4). Total num frames: 1347584. Throughput: 0: 491.2. Samples: 336312. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-06-06 13:04:31,165][00159] Avg episode reward: [(0, '10.575')]
-[2024-06-06 13:04:31,176][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000329_1347584.pth...
-[2024-06-06 13:04:31,311][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000187_765952.pth
-[2024-06-06 13:04:31,331][03267] Saving new best policy, reward=10.575!
-[2024-06-06 13:04:31,647][03280] Updated weights for policy 0, policy_version 330 (0.0027)
-[2024-06-06 13:04:36,163][00159] Fps is (10 sec: 2048.2, 60 sec: 1979.7, 300 sec: 2485.4). Total num frames: 1359872. Throughput: 0: 504.9. Samples: 339702. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
-[2024-06-06 13:04:36,167][00159] Avg episode reward: [(0, '10.546')]
-[2024-06-06 13:04:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2048.0, 300 sec: 2471.5). Total num frames: 1372160. Throughput: 0: 526.5. Samples: 343976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:04:41,167][00159] Avg episode reward: [(0, '10.597')]
-[2024-06-06 13:04:41,190][03267] Saving new best policy, reward=10.597!
-[2024-06-06 13:04:46,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2116.3, 300 sec: 2457.6). Total num frames: 1384448. Throughput: 0: 518.7. Samples: 345654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:04:46,169][00159] Avg episode reward: [(0, '10.712')]
-[2024-06-06 13:04:46,177][03267] Saving new best policy, reward=10.712!
-[2024-06-06 13:04:49,960][03280] Updated weights for policy 0, policy_version 340 (0.0023)
-[2024-06-06 13:04:51,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2048.0, 300 sec: 2457.6). Total num frames: 1392640. Throughput: 0: 500.5. Samples: 348472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:04:51,165][00159] Avg episode reward: [(0, '10.867')]
-[2024-06-06 13:04:51,179][03267] Saving new best policy, reward=10.867!
-[2024-06-06 13:04:56,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2116.3, 300 sec: 2443.7). Total num frames: 1404928. Throughput: 0: 517.1. Samples: 351594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:04:56,165][00159] Avg episode reward: [(0, '10.668')]
-[2024-06-06 13:05:01,170][00159] Fps is (10 sec: 2865.1, 60 sec: 2184.3, 300 sec: 2443.7). Total num frames: 1421312. Throughput: 0: 557.6. Samples: 353960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:05:01,173][00159] Avg episode reward: [(0, '11.072')]
-[2024-06-06 13:05:01,187][03267] Saving new best policy, reward=11.072!
-[2024-06-06 13:05:06,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2116.3, 300 sec: 2429.8). Total num frames: 1429504. Throughput: 0: 555.2. Samples: 357368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:05:06,169][00159] Avg episode reward: [(0, '11.663')]
-[2024-06-06 13:05:06,171][03267] Saving new best policy, reward=11.663!
-[2024-06-06 13:05:07,369][03280] Updated weights for policy 0, policy_version 350 (0.0040)
-[2024-06-06 13:05:11,165][00159] Fps is (10 sec: 2458.8, 60 sec: 2252.7, 300 sec: 2457.6). Total num frames: 1445888. Throughput: 0: 587.9. Samples: 361322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:05:11,168][00159] Avg episode reward: [(0, '12.061')]
-[2024-06-06 13:05:11,179][03267] Saving new best policy, reward=12.061!
-[2024-06-06 13:05:16,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2321.1, 300 sec: 2443.7). Total num frames: 1458176. Throughput: 0: 602.9. Samples: 363444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:05:16,165][00159] Avg episode reward: [(0, '11.393')]
-[2024-06-06 13:05:21,164][00159] Fps is (10 sec: 2048.3, 60 sec: 2321.0, 300 sec: 2415.9). Total num frames: 1466368. Throughput: 0: 602.6. Samples: 366820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:05:21,172][00159] Avg episode reward: [(0, '11.221')]
-[2024-06-06 13:05:25,519][03280] Updated weights for policy 0, policy_version 360 (0.0034)
-[2024-06-06 13:05:26,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2252.8, 300 sec: 2402.1). Total num frames: 1474560. Throughput: 0: 554.5. Samples: 368928. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
-[2024-06-06 13:05:26,170][00159] Avg episode reward: [(0, '10.681')]
-[2024-06-06 13:05:31,163][00159] Fps is (10 sec: 2457.9, 60 sec: 2389.3, 300 sec: 2415.9). Total num frames: 1490944. Throughput: 0: 561.0. Samples: 370900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:05:31,166][00159] Avg episode reward: [(0, '10.672')]
-[2024-06-06 13:05:36,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2415.9). Total num frames: 1507328. Throughput: 0: 607.1. Samples: 375790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:05:36,168][00159] Avg episode reward: [(0, '10.504')]
-[2024-06-06 13:05:40,029][03280] Updated weights for policy 0, policy_version 370 (0.0027)
-[2024-06-06 13:05:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2389.3, 300 sec: 2388.2). Total num frames: 1515520. Throughput: 0: 616.1. Samples: 379320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:05:41,166][00159] Avg episode reward: [(0, '10.329')]
-[2024-06-06 13:05:46,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2389.3, 300 sec: 2402.1). Total num frames: 1527808. Throughput: 0: 596.9. Samples: 380818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:05:46,166][00159] Avg episode reward: [(0, '10.893')]
-[2024-06-06 13:05:51,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2402.1). Total num frames: 1544192. Throughput: 0: 629.6. Samples: 385702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:05:51,168][00159] Avg episode reward: [(0, '10.824')]
-[2024-06-06 13:05:53,682][03280] Updated weights for policy 0, policy_version 380 (0.0015)
-[2024-06-06 13:05:56,163][00159] Fps is (10 sec: 3276.7, 60 sec: 2594.1, 300 sec: 2402.1). Total num frames: 1560576. Throughput: 0: 647.4. Samples: 390452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:05:56,167][00159] Avg episode reward: [(0, '11.064')]
-[2024-06-06 13:06:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2457.9, 300 sec: 2388.2). Total num frames: 1568768. Throughput: 0: 633.5. Samples: 391952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:06:01,173][00159] Avg episode reward: [(0, '10.839')]
-[2024-06-06 13:06:06,163][00159] Fps is (10 sec: 2457.7, 60 sec: 2594.1, 300 sec: 2402.1). Total num frames: 1585152. Throughput: 0: 629.6. Samples: 395150. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-06-06 13:06:06,166][00159] Avg episode reward: [(0, '10.967')]
-[2024-06-06 13:06:09,751][03280] Updated weights for policy 0, policy_version 390 (0.0037)
-[2024-06-06 13:06:11,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2594.2, 300 sec: 2402.1). Total num frames: 1601536. Throughput: 0: 695.9. Samples: 400244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:06:11,170][00159] Avg episode reward: [(0, '10.182')]
-[2024-06-06 13:06:16,163][00159] Fps is (10 sec: 2457.5, 60 sec: 2525.9, 300 sec: 2374.3). Total num frames: 1609728. Throughput: 0: 697.5. Samples: 402288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:06:16,168][00159] Avg episode reward: [(0, '10.012')]
-[2024-06-06 13:06:21,163][00159] Fps is (10 sec: 1638.4, 60 sec: 2525.9, 300 sec: 2374.3). Total num frames: 1617920. Throughput: 0: 642.0. Samples: 404678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:06:21,169][00159] Avg episode reward: [(0, '9.667')]
-[2024-06-06 13:06:26,163][00159] Fps is (10 sec: 2048.1, 60 sec: 2594.1, 300 sec: 2360.4). Total num frames: 1630208. Throughput: 0: 647.4. Samples: 408454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:06:26,169][00159] Avg episode reward: [(0, '9.739')]
-[2024-06-06 13:06:29,465][03280] Updated weights for policy 0, policy_version 400 (0.0045)
-[2024-06-06 13:06:31,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2457.6, 300 sec: 2332.6). Total num frames: 1638400. Throughput: 0: 642.0. Samples: 409710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:06:31,165][00159] Avg episode reward: [(0, '9.847')]
-[2024-06-06 13:06:31,183][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth...
-[2024-06-06 13:06:31,412][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth
-[2024-06-06 13:06:36,164][00159] Fps is (10 sec: 1638.1, 60 sec: 2321.0, 300 sec: 2318.8). Total num frames: 1646592. Throughput: 0: 587.7. Samples: 412150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:06:36,167][00159] Avg episode reward: [(0, '10.164')]
-[2024-06-06 13:06:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2346.5). Total num frames: 1662976. Throughput: 0: 559.9. Samples: 415646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:06:41,166][00159] Avg episode reward: [(0, '10.655')]
-[2024-06-06 13:06:45,127][03280] Updated weights for policy 0, policy_version 410 (0.0030)
-[2024-06-06 13:06:46,163][00159] Fps is (10 sec: 3687.0, 60 sec: 2594.1, 300 sec: 2360.4). Total num frames: 1683456. Throughput: 0: 587.9. Samples: 418408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:06:46,165][00159] Avg episode reward: [(0, '11.305')]
-[2024-06-06 13:06:51,163][00159] Fps is (10 sec: 3686.4, 60 sec: 2594.1, 300 sec: 2360.4). Total num frames: 1699840. Throughput: 0: 656.8. Samples: 424704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:06:51,167][00159] Avg episode reward: [(0, '11.382')]
-[2024-06-06 13:06:56,165][00159] Fps is (10 sec: 2866.5, 60 sec: 2525.8, 300 sec: 2360.4). Total num frames: 1712128. Throughput: 0: 632.9. Samples: 428724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:06:56,168][00159] Avg episode reward: [(0, '11.037')]
-[2024-06-06 13:06:57,811][03280] Updated weights for policy 0, policy_version 420 (0.0019)
-[2024-06-06 13:07:01,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2402.1). Total num frames: 1732608. Throughput: 0: 641.5. Samples: 431156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:07:01,166][00159] Avg episode reward: [(0, '11.399')]
-[2024-06-06 13:07:06,163][00159] Fps is (10 sec: 4096.9, 60 sec: 2798.9, 300 sec: 2429.8). Total num frames: 1753088. Throughput: 0: 721.6. Samples: 437148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:07:06,165][00159] Avg episode reward: [(0, '10.888')]
-[2024-06-06 13:07:08,249][03280] Updated weights for policy 0, policy_version 430 (0.0018)
-[2024-06-06 13:07:11,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2429.8). Total num frames: 1765376. Throughput: 0: 750.1. Samples: 442208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:07:11,165][00159] Avg episode reward: [(0, '10.881')]
-[2024-06-06 13:07:16,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2443.7). Total num frames: 1781760. Throughput: 0: 762.8. Samples: 444036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:07:16,166][00159] Avg episode reward: [(0, '11.304')]
-[2024-06-06 13:07:20,650][03280] Updated weights for policy 0, policy_version 440 (0.0021)
-[2024-06-06 13:07:21,163][00159] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2457.6). Total num frames: 1802240. Throughput: 0: 834.0. Samples: 449680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:07:21,174][00159] Avg episode reward: [(0, '11.809')]
-[2024-06-06 13:07:26,164][00159] Fps is (10 sec: 4095.5, 60 sec: 3208.5, 300 sec: 2457.6). Total num frames: 1822720. Throughput: 0: 894.6. Samples: 455904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:07:26,169][00159] Avg episode reward: [(0, '13.023')]
-[2024-06-06 13:07:26,173][03267] Saving new best policy, reward=13.023!
-[2024-06-06 13:07:31,163][00159] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 2457.6). Total num frames: 1835008. Throughput: 0: 874.7. Samples: 457768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:07:31,170][00159] Avg episode reward: [(0, '14.047')]
-[2024-06-06 13:07:31,186][03267] Saving new best policy, reward=14.047!
-[2024-06-06 13:07:33,528][03280] Updated weights for policy 0, policy_version 450 (0.0043)
-[2024-06-06 13:07:36,163][00159] Fps is (10 sec: 2867.6, 60 sec: 3413.4, 300 sec: 2471.5). Total num frames: 1851392. Throughput: 0: 831.8. Samples: 462134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:07:36,169][00159] Avg episode reward: [(0, '13.858')]
-[2024-06-06 13:07:41,163][00159] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 2485.4). Total num frames: 1871872. Throughput: 0: 879.4. Samples: 468294. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
-[2024-06-06 13:07:41,169][00159] Avg episode reward: [(0, '13.255')]
-[2024-06-06 13:07:43,538][03280] Updated weights for policy 0, policy_version 460 (0.0015)
-[2024-06-06 13:07:46,163][00159] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2485.4). Total num frames: 1888256. Throughput: 0: 891.6. Samples: 471278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:07:46,165][00159] Avg episode reward: [(0, '12.997')]
-[2024-06-06 13:07:51,163][00159] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2499.3). Total num frames: 1900544. Throughput: 0: 845.6. Samples: 475202. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:07:51,171][00159] Avg episode reward: [(0, '13.045')]
-[2024-06-06 13:07:56,079][03280] Updated weights for policy 0, policy_version 470 (0.0020)
-[2024-06-06 13:07:56,163][00159] Fps is (10 sec: 3686.3, 60 sec: 3550.0, 300 sec: 2554.8). Total num frames: 1925120. Throughput: 0: 860.5. Samples: 480932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:07:56,170][00159] Avg episode reward: [(0, '13.546')]
-[2024-06-06 13:08:01,165][00159] Fps is (10 sec: 4504.6, 60 sec: 3549.7, 300 sec: 2582.5). Total num frames: 1945600. Throughput: 0: 891.4. Samples: 484150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:08:01,169][00159] Avg episode reward: [(0, '13.882')]
-[2024-06-06 13:08:06,163][00159] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2596.4). Total num frames: 1957888. Throughput: 0: 873.5. Samples: 488986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:08:06,168][00159] Avg episode reward: [(0, '14.603')]
-[2024-06-06 13:08:06,171][03267] Saving new best policy, reward=14.603!
-[2024-06-06 13:08:08,965][03280] Updated weights for policy 0, policy_version 480 (0.0034)
-[2024-06-06 13:08:11,163][00159] Fps is (10 sec: 2867.8, 60 sec: 3481.6, 300 sec: 2638.1). Total num frames: 1974272. Throughput: 0: 835.3. Samples: 493492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:08:11,166][00159] Avg episode reward: [(0, '15.196')]
-[2024-06-06 13:08:11,174][03267] Saving new best policy, reward=15.196!
-[2024-06-06 13:08:16,163][00159] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 2652.0). Total num frames: 1994752. Throughput: 0: 861.2. Samples: 496520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:08:16,165][00159] Avg episode reward: [(0, '15.516')]
-[2024-06-06 13:08:16,169][03267] Saving new best policy, reward=15.516!
-[2024-06-06 13:08:18,897][03280] Updated weights for policy 0, policy_version 490 (0.0023)
-[2024-06-06 13:08:21,163][00159] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 2665.9). Total num frames: 2011136. Throughput: 0: 896.5. Samples: 502478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:08:21,169][00159] Avg episode reward: [(0, '14.578')]
-[2024-06-06 13:08:26,163][00159] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2679.8). Total num frames: 2023424. Throughput: 0: 835.9. Samples: 505908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:08:26,170][00159] Avg episode reward: [(0, '13.930')]
-[2024-06-06 13:08:31,163][00159] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 2707.5). Total num frames: 2039808. Throughput: 0: 823.4. Samples: 508330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:08:31,166][00159] Avg episode reward: [(0, '14.923')]
-[2024-06-06 13:08:31,177][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000498_2039808.pth...
-[2024-06-06 13:08:31,309][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000329_1347584.pth
-[2024-06-06 13:08:32,897][03280] Updated weights for policy 0, policy_version 500 (0.0048)
-[2024-06-06 13:08:36,163][00159] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2735.3). Total num frames: 2056192. Throughput: 0: 855.0. Samples: 513676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:08:36,169][00159] Avg episode reward: [(0, '15.707')]
-[2024-06-06 13:08:36,174][03267] Saving new best policy, reward=15.707!
-[2024-06-06 13:08:41,163][00159] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2749.2). Total num frames: 2068480. Throughput: 0: 815.8. Samples: 517642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:08:41,165][00159] Avg episode reward: [(0, '16.863')]
-[2024-06-06 13:08:41,181][03267] Saving new best policy, reward=16.863!
-[2024-06-06 13:08:46,163][00159] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2749.2). Total num frames: 2080768. Throughput: 0: 780.1. Samples: 519254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:08:46,169][00159] Avg episode reward: [(0, '16.868')]
-[2024-06-06 13:08:46,172][03267] Saving new best policy, reward=16.868!
-[2024-06-06 13:08:48,131][03280] Updated weights for policy 0, policy_version 510 (0.0036)
-[2024-06-06 13:08:51,163][00159] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2776.9). Total num frames: 2097152. Throughput: 0: 774.9. Samples: 523858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:08:51,175][00159] Avg episode reward: [(0, '17.348')]
-[2024-06-06 13:08:51,195][03267] Saving new best policy, reward=17.348!
-[2024-06-06 13:08:56,168][00159] Fps is (10 sec: 3275.0, 60 sec: 3140.0, 300 sec: 2790.8). Total num frames: 2113536. Throughput: 0: 788.5. Samples: 528978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:08:56,171][00159] Avg episode reward: [(0, '17.300')]
-[2024-06-06 13:09:01,163][00159] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2790.8). Total num frames: 2125824. Throughput: 0: 757.6. Samples: 530614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:09:01,167][00159] Avg episode reward: [(0, '17.760')]
-[2024-06-06 13:09:01,182][03267] Saving new best policy, reward=17.760!
-[2024-06-06 13:09:02,561][03280] Updated weights for policy 0, policy_version 520 (0.0028)
-[2024-06-06 13:09:06,163][00159] Fps is (10 sec: 2458.9, 60 sec: 3003.7, 300 sec: 2804.7). Total num frames: 2138112. Throughput: 0: 710.6. Samples: 534456. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:09:06,171][00159] Avg episode reward: [(0, '17.481')]
-[2024-06-06 13:09:11,163][00159] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2846.4). Total num frames: 2158592. Throughput: 0: 753.2. Samples: 539802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:09:11,172][00159] Avg episode reward: [(0, '16.249')]
-[2024-06-06 13:09:15,579][03280] Updated weights for policy 0, policy_version 530 (0.0025)
-[2024-06-06 13:09:16,164][00159] Fps is (10 sec: 3276.4, 60 sec: 2935.4, 300 sec: 2860.2). Total num frames: 2170880. Throughput: 0: 751.8. Samples: 542162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:09:16,170][00159] Avg episode reward: [(0, '17.256')]
-[2024-06-06 13:09:21,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 2179072. Throughput: 0: 702.2. Samples: 545276.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:09:21,165][00159] Avg episode reward: [(0, '16.305')] -[2024-06-06 13:09:26,163][00159] Fps is (10 sec: 2867.5, 60 sec: 2935.5, 300 sec: 2888.0). Total num frames: 2199552. Throughput: 0: 715.5. Samples: 549840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:09:26,165][00159] Avg episode reward: [(0, '16.862')] -[2024-06-06 13:09:29,586][03280] Updated weights for policy 0, policy_version 540 (0.0041) -[2024-06-06 13:09:31,163][00159] Fps is (10 sec: 3686.3, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 2215936. Throughput: 0: 736.7. Samples: 552404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:09:31,166][00159] Avg episode reward: [(0, '17.127')] -[2024-06-06 13:09:36,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 2224128. Throughput: 0: 722.3. Samples: 556362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:09:36,168][00159] Avg episode reward: [(0, '17.809')] -[2024-06-06 13:09:36,172][03267] Saving new best policy, reward=17.809! -[2024-06-06 13:09:41,163][00159] Fps is (10 sec: 2048.1, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 2236416. Throughput: 0: 675.1. Samples: 559356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:09:41,165][00159] Avg episode reward: [(0, '17.883')] -[2024-06-06 13:09:41,180][03267] Saving new best policy, reward=17.883! -[2024-06-06 13:09:45,699][03280] Updated weights for policy 0, policy_version 550 (0.0029) -[2024-06-06 13:09:46,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 2252800. Throughput: 0: 691.9. Samples: 561748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:09:46,166][00159] Avg episode reward: [(0, '19.579')] -[2024-06-06 13:09:46,172][03267] Saving new best policy, reward=19.579! -[2024-06-06 13:09:51,163][00159] Fps is (10 sec: 2867.1, 60 sec: 2798.9, 300 sec: 2915.8). 
Total num frames: 2265088. Throughput: 0: 714.6. Samples: 566612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:09:51,166][00159] Avg episode reward: [(0, '19.339')] -[2024-06-06 13:09:56,166][00159] Fps is (10 sec: 2456.9, 60 sec: 2730.8, 300 sec: 2902.0). Total num frames: 2277376. Throughput: 0: 663.8. Samples: 569676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:09:56,172][00159] Avg episode reward: [(0, '19.697')] -[2024-06-06 13:09:56,178][03267] Saving new best policy, reward=19.697! -[2024-06-06 13:10:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2915.8). Total num frames: 2289664. Throughput: 0: 649.1. Samples: 571370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:10:01,170][00159] Avg episode reward: [(0, '19.878')] -[2024-06-06 13:10:01,182][03267] Saving new best policy, reward=19.878! -[2024-06-06 13:10:01,675][03280] Updated weights for policy 0, policy_version 560 (0.0017) -[2024-06-06 13:10:06,163][00159] Fps is (10 sec: 2868.0, 60 sec: 2798.9, 300 sec: 2915.8). Total num frames: 2306048. Throughput: 0: 687.3. Samples: 576206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:10:06,171][00159] Avg episode reward: [(0, '19.245')] -[2024-06-06 13:10:11,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2915.8). Total num frames: 2318336. Throughput: 0: 678.7. Samples: 580380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:10:11,170][00159] Avg episode reward: [(0, '18.826')] -[2024-06-06 13:10:16,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2594.2, 300 sec: 2915.8). Total num frames: 2326528. Throughput: 0: 655.0. Samples: 581880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:10:16,165][00159] Avg episode reward: [(0, '17.512')] -[2024-06-06 13:10:17,395][03280] Updated weights for policy 0, policy_version 570 (0.0020) -[2024-06-06 13:10:21,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2957.5). 
Total num frames: 2347008. Throughput: 0: 657.4. Samples: 585946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:10:21,166][00159] Avg episode reward: [(0, '17.182')] -[2024-06-06 13:10:26,163][00159] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2957.5). Total num frames: 2363392. Throughput: 0: 702.7. Samples: 590976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:10:26,168][00159] Avg episode reward: [(0, '17.373')] -[2024-06-06 13:10:31,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2929.7). Total num frames: 2371584. Throughput: 0: 689.7. Samples: 592784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:10:31,171][00159] Avg episode reward: [(0, '18.332')] -[2024-06-06 13:10:31,182][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000579_2371584.pth... -[2024-06-06 13:10:31,402][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth -[2024-06-06 13:10:31,879][03280] Updated weights for policy 0, policy_version 580 (0.0027) -[2024-06-06 13:10:36,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2943.6). Total num frames: 2383872. Throughput: 0: 650.5. Samples: 595884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:10:36,170][00159] Avg episode reward: [(0, '18.554')] -[2024-06-06 13:10:41,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2971.3). Total num frames: 2404352. Throughput: 0: 693.4. Samples: 600876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:10:41,171][00159] Avg episode reward: [(0, '18.797')] -[2024-06-06 13:10:45,205][03280] Updated weights for policy 0, policy_version 590 (0.0018) -[2024-06-06 13:10:46,165][00159] Fps is (10 sec: 3276.1, 60 sec: 2730.6, 300 sec: 2957.4). Total num frames: 2416640. Throughput: 0: 712.6. Samples: 603440. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:10:46,167][00159] Avg episode reward: [(0, '19.660')] -[2024-06-06 13:10:51,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2929.7). Total num frames: 2424832. Throughput: 0: 679.3. Samples: 606774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:10:51,166][00159] Avg episode reward: [(0, '20.334')] -[2024-06-06 13:10:51,181][03267] Saving new best policy, reward=20.334! -[2024-06-06 13:10:56,163][00159] Fps is (10 sec: 1638.7, 60 sec: 2594.2, 300 sec: 2929.7). Total num frames: 2433024. Throughput: 0: 643.6. Samples: 609344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:10:56,172][00159] Avg episode reward: [(0, '20.302')] -[2024-06-06 13:11:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2929.7). Total num frames: 2449408. Throughput: 0: 643.9. Samples: 610856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:11:01,168][00159] Avg episode reward: [(0, '21.878')] -[2024-06-06 13:11:01,183][03267] Saving new best policy, reward=21.878! -[2024-06-06 13:11:03,834][03280] Updated weights for policy 0, policy_version 600 (0.0055) -[2024-06-06 13:11:06,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2915.8). Total num frames: 2461696. Throughput: 0: 650.0. Samples: 615198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:11:06,168][00159] Avg episode reward: [(0, '22.483')] -[2024-06-06 13:11:06,172][03267] Saving new best policy, reward=22.483! -[2024-06-06 13:11:11,164][00159] Fps is (10 sec: 2047.8, 60 sec: 2525.8, 300 sec: 2915.8). Total num frames: 2469888. Throughput: 0: 603.9. Samples: 618152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:11:11,171][00159] Avg episode reward: [(0, '22.012')] -[2024-06-06 13:11:16,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2943.6). Total num frames: 2486272. Throughput: 0: 610.0. Samples: 620236. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:11:16,169][00159] Avg episode reward: [(0, '22.186')] -[2024-06-06 13:11:19,203][03280] Updated weights for policy 0, policy_version 610 (0.0034) -[2024-06-06 13:11:21,163][00159] Fps is (10 sec: 3277.1, 60 sec: 2594.1, 300 sec: 2957.5). Total num frames: 2502656. Throughput: 0: 651.1. Samples: 625182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:11:21,169][00159] Avg episode reward: [(0, '22.617')] -[2024-06-06 13:11:21,181][03267] Saving new best policy, reward=22.617! -[2024-06-06 13:11:26,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2971.3). Total num frames: 2514944. Throughput: 0: 622.7. Samples: 628896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:11:26,167][00159] Avg episode reward: [(0, '22.308')] -[2024-06-06 13:11:31,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2985.2). Total num frames: 2527232. Throughput: 0: 598.6. Samples: 630378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:11:31,170][00159] Avg episode reward: [(0, '22.327')] -[2024-06-06 13:11:34,732][03280] Updated weights for policy 0, policy_version 620 (0.0039) -[2024-06-06 13:11:36,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2985.2). Total num frames: 2543616. Throughput: 0: 630.4. Samples: 635142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:11:36,172][00159] Avg episode reward: [(0, '20.477')] -[2024-06-06 13:11:41,164][00159] Fps is (10 sec: 2866.8, 60 sec: 2525.8, 300 sec: 2957.4). Total num frames: 2555904. Throughput: 0: 673.5. Samples: 639652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:11:41,171][00159] Avg episode reward: [(0, '19.769')] -[2024-06-06 13:11:46,165][00159] Fps is (10 sec: 2047.6, 60 sec: 2457.6, 300 sec: 2929.7). Total num frames: 2564096. Throughput: 0: 673.2. Samples: 641150. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:11:46,170][00159] Avg episode reward: [(0, '19.121')] -[2024-06-06 13:11:50,321][03280] Updated weights for policy 0, policy_version 630 (0.0018) -[2024-06-06 13:11:51,163][00159] Fps is (10 sec: 2458.0, 60 sec: 2594.1, 300 sec: 2943.6). Total num frames: 2580480. Throughput: 0: 655.8. Samples: 644710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:11:51,166][00159] Avg episode reward: [(0, '19.527')] -[2024-06-06 13:11:56,163][00159] Fps is (10 sec: 3277.5, 60 sec: 2730.7, 300 sec: 2929.7). Total num frames: 2596864. Throughput: 0: 701.3. Samples: 649710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:11:56,172][00159] Avg episode reward: [(0, '19.800')] -[2024-06-06 13:12:01,167][00159] Fps is (10 sec: 2865.9, 60 sec: 2662.2, 300 sec: 2901.9). Total num frames: 2609152. Throughput: 0: 705.1. Samples: 651970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:12:01,170][00159] Avg episode reward: [(0, '20.145')] -[2024-06-06 13:12:05,734][03280] Updated weights for policy 0, policy_version 640 (0.0024) -[2024-06-06 13:12:06,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2901.9). Total num frames: 2621440. Throughput: 0: 663.0. Samples: 655018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:12:06,168][00159] Avg episode reward: [(0, '20.694')] -[2024-06-06 13:12:11,163][00159] Fps is (10 sec: 2868.5, 60 sec: 2799.0, 300 sec: 2901.9). Total num frames: 2637824. Throughput: 0: 688.3. Samples: 659870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:12:11,171][00159] Avg episode reward: [(0, '21.916')] -[2024-06-06 13:12:16,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 2654208. Throughput: 0: 710.7. Samples: 662360. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:12:16,167][00159] Avg episode reward: [(0, '23.499')] -[2024-06-06 13:12:16,170][03267] Saving new best policy, reward=23.499! -[2024-06-06 13:12:19,187][03280] Updated weights for policy 0, policy_version 650 (0.0033) -[2024-06-06 13:12:21,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2860.3). Total num frames: 2666496. Throughput: 0: 688.3. Samples: 666116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:12:21,166][00159] Avg episode reward: [(0, '24.102')] -[2024-06-06 13:12:21,183][03267] Saving new best policy, reward=24.102! -[2024-06-06 13:12:26,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2860.3). Total num frames: 2678784. Throughput: 0: 676.5. Samples: 670094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:12:26,170][00159] Avg episode reward: [(0, '25.100')] -[2024-06-06 13:12:26,176][03267] Saving new best policy, reward=25.100! -[2024-06-06 13:12:31,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2860.3). Total num frames: 2695168. Throughput: 0: 701.1. Samples: 672696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:12:31,173][00159] Avg episode reward: [(0, '24.530')] -[2024-06-06 13:12:31,255][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000659_2699264.pth... -[2024-06-06 13:12:31,388][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000498_2039808.pth -[2024-06-06 13:12:32,317][03280] Updated weights for policy 0, policy_version 660 (0.0029) -[2024-06-06 13:12:36,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 2711552. Throughput: 0: 734.8. Samples: 677778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:12:36,166][00159] Avg episode reward: [(0, '24.968')] -[2024-06-06 13:12:41,167][00159] Fps is (10 sec: 2866.0, 60 sec: 2798.8, 300 sec: 2832.4). Total num frames: 2723840. 
Throughput: 0: 693.7. Samples: 680928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:12:41,169][00159] Avg episode reward: [(0, '24.889')] -[2024-06-06 13:12:46,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2935.6, 300 sec: 2846.4). Total num frames: 2740224. Throughput: 0: 690.4. Samples: 683034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2024-06-06 13:12:46,170][00159] Avg episode reward: [(0, '24.199')] -[2024-06-06 13:12:47,177][03280] Updated weights for policy 0, policy_version 670 (0.0023) -[2024-06-06 13:12:51,163][00159] Fps is (10 sec: 3278.2, 60 sec: 2935.5, 300 sec: 2818.6). Total num frames: 2756608. Throughput: 0: 738.3. Samples: 688240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:12:51,166][00159] Avg episode reward: [(0, '22.940')] -[2024-06-06 13:12:56,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2790.9). Total num frames: 2768896. Throughput: 0: 725.4. Samples: 692512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:12:56,165][00159] Avg episode reward: [(0, '22.345')] -[2024-06-06 13:13:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2867.4, 300 sec: 2790.8). Total num frames: 2781184. Throughput: 0: 706.4. Samples: 694146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:13:01,165][00159] Avg episode reward: [(0, '21.314')] -[2024-06-06 13:13:02,119][03280] Updated weights for policy 0, policy_version 680 (0.0029) -[2024-06-06 13:13:06,163][00159] Fps is (10 sec: 2867.1, 60 sec: 2935.5, 300 sec: 2790.8). Total num frames: 2797568. Throughput: 0: 727.9. Samples: 698870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:13:06,166][00159] Avg episode reward: [(0, '20.619')] -[2024-06-06 13:13:11,172][00159] Fps is (10 sec: 3273.9, 60 sec: 2935.0, 300 sec: 2776.9). Total num frames: 2813952. Throughput: 0: 753.0. Samples: 703986. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:13:11,176][00159] Avg episode reward: [(0, '19.714')] -[2024-06-06 13:13:16,163][00159] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 2822144. Throughput: 0: 729.1. Samples: 705504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:13:16,165][00159] Avg episode reward: [(0, '19.375')] -[2024-06-06 13:13:16,606][03280] Updated weights for policy 0, policy_version 690 (0.0028) -[2024-06-06 13:13:21,163][00159] Fps is (10 sec: 2459.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 2838528. Throughput: 0: 693.9. Samples: 709002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:13:21,168][00159] Avg episode reward: [(0, '20.181')] -[2024-06-06 13:13:26,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2763.1). Total num frames: 2854912. Throughput: 0: 737.4. Samples: 714106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:13:26,165][00159] Avg episode reward: [(0, '20.871')] -[2024-06-06 13:13:29,870][03280] Updated weights for policy 0, policy_version 700 (0.0034) -[2024-06-06 13:13:31,164][00159] Fps is (10 sec: 2867.0, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 2867200. Throughput: 0: 744.8. Samples: 716552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:13:31,170][00159] Avg episode reward: [(0, '21.117')] -[2024-06-06 13:13:36,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 2879488. Throughput: 0: 698.2. Samples: 719660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:13:36,170][00159] Avg episode reward: [(0, '21.636')] -[2024-06-06 13:13:41,163][00159] Fps is (10 sec: 2867.3, 60 sec: 2867.4, 300 sec: 2763.1). Total num frames: 2895872. Throughput: 0: 703.7. Samples: 724178. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:13:41,166][00159] Avg episode reward: [(0, '23.355')] -[2024-06-06 13:13:44,429][03280] Updated weights for policy 0, policy_version 710 (0.0035) -[2024-06-06 13:13:46,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 2912256. Throughput: 0: 723.1. Samples: 726686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:13:46,170][00159] Avg episode reward: [(0, '24.400')] -[2024-06-06 13:13:51,167][00159] Fps is (10 sec: 2866.1, 60 sec: 2798.8, 300 sec: 2749.2). Total num frames: 2924544. Throughput: 0: 707.4. Samples: 730704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:13:51,169][00159] Avg episode reward: [(0, '26.740')] -[2024-06-06 13:13:51,189][03267] Saving new best policy, reward=26.740! -[2024-06-06 13:13:56,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 2936832. Throughput: 0: 668.2. Samples: 734050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:13:56,171][00159] Avg episode reward: [(0, '26.659')] -[2024-06-06 13:13:59,811][03280] Updated weights for policy 0, policy_version 720 (0.0016) -[2024-06-06 13:14:01,163][00159] Fps is (10 sec: 2868.3, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 2953216. Throughput: 0: 690.5. Samples: 736578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:14:01,166][00159] Avg episode reward: [(0, '27.289')] -[2024-06-06 13:14:01,181][03267] Saving new best policy, reward=27.289! -[2024-06-06 13:14:06,163][00159] Fps is (10 sec: 2867.1, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 2965504. Throughput: 0: 723.0. Samples: 741538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:14:06,166][00159] Avg episode reward: [(0, '26.836')] -[2024-06-06 13:14:11,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2731.1, 300 sec: 2735.3). Total num frames: 2977792. Throughput: 0: 677.6. Samples: 744596. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:14:11,169][00159] Avg episode reward: [(0, '26.594')] -[2024-06-06 13:14:15,014][03280] Updated weights for policy 0, policy_version 730 (0.0026) -[2024-06-06 13:14:16,163][00159] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 2990080. Throughput: 0: 664.8. Samples: 746468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:14:16,165][00159] Avg episode reward: [(0, '26.302')] -[2024-06-06 13:14:21,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 3006464. Throughput: 0: 706.3. Samples: 751444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:14:21,171][00159] Avg episode reward: [(0, '26.516')] -[2024-06-06 13:14:26,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 3018752. Throughput: 0: 697.5. Samples: 755566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:14:26,167][00159] Avg episode reward: [(0, '25.631')] -[2024-06-06 13:14:30,702][03280] Updated weights for policy 0, policy_version 740 (0.0025) -[2024-06-06 13:14:31,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 3031040. Throughput: 0: 676.0. Samples: 757108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:14:31,168][00159] Avg episode reward: [(0, '25.844')] -[2024-06-06 13:14:31,180][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000740_3031040.pth... -[2024-06-06 13:14:31,316][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000579_2371584.pth -[2024-06-06 13:14:36,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3047424. Throughput: 0: 682.0. Samples: 761392. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:14:36,169][00159] Avg episode reward: [(0, '25.507')] -[2024-06-06 13:14:41,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3063808. Throughput: 0: 717.9. Samples: 766356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:14:41,165][00159] Avg episode reward: [(0, '25.613')] -[2024-06-06 13:14:44,641][03280] Updated weights for policy 0, policy_version 750 (0.0032) -[2024-06-06 13:14:46,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2735.3). Total num frames: 3072000. Throughput: 0: 695.5. Samples: 767876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:14:46,172][00159] Avg episode reward: [(0, '25.257')] -[2024-06-06 13:14:51,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2749.2). Total num frames: 3088384. Throughput: 0: 661.0. Samples: 771284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:14:51,170][00159] Avg episode reward: [(0, '26.196')] -[2024-06-06 13:14:56,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 3104768. Throughput: 0: 703.1. Samples: 776236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:14:56,173][00159] Avg episode reward: [(0, '25.943')] -[2024-06-06 13:14:58,410][03280] Updated weights for policy 0, policy_version 760 (0.0017) -[2024-06-06 13:15:01,164][00159] Fps is (10 sec: 2866.8, 60 sec: 2730.6, 300 sec: 2749.2). Total num frames: 3117056. Throughput: 0: 717.0. Samples: 778736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:15:01,169][00159] Avg episode reward: [(0, '25.952')] -[2024-06-06 13:15:06,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2735.3). Total num frames: 3125248. Throughput: 0: 675.5. Samples: 781842. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:15:06,175][00159] Avg episode reward: [(0, '24.967')] -[2024-06-06 13:15:11,163][00159] Fps is (10 sec: 2457.9, 60 sec: 2730.7, 300 sec: 2763.1). Total num frames: 3141632. Throughput: 0: 681.9. Samples: 786250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:15:11,168][00159] Avg episode reward: [(0, '24.551')] -[2024-06-06 13:15:13,665][03280] Updated weights for policy 0, policy_version 770 (0.0028) -[2024-06-06 13:15:16,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3158016. Throughput: 0: 702.6. Samples: 788724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:15:16,168][00159] Avg episode reward: [(0, '23.552')] -[2024-06-06 13:15:21,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 3166208. Throughput: 0: 681.7. Samples: 792070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:15:21,168][00159] Avg episode reward: [(0, '23.415')] -[2024-06-06 13:15:26,165][00159] Fps is (10 sec: 1638.0, 60 sec: 2594.0, 300 sec: 2721.4). Total num frames: 3174400. Throughput: 0: 624.9. Samples: 794476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:15:26,168][00159] Avg episode reward: [(0, '22.536')] -[2024-06-06 13:15:31,163][00159] Fps is (10 sec: 2457.5, 60 sec: 2662.4, 300 sec: 2735.3). Total num frames: 3190784. Throughput: 0: 627.2. Samples: 796100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:15:31,170][00159] Avg episode reward: [(0, '20.950')] -[2024-06-06 13:15:32,040][03280] Updated weights for policy 0, policy_version 780 (0.0053) -[2024-06-06 13:15:36,163][00159] Fps is (10 sec: 3277.5, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 3207168. Throughput: 0: 663.7. Samples: 801152. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:15:36,165][00159] Avg episode reward: [(0, '21.177')] -[2024-06-06 13:15:41,163][00159] Fps is (10 sec: 2867.3, 60 sec: 2594.1, 300 sec: 2721.4). Total num frames: 3219456. Throughput: 0: 648.3. Samples: 805408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:15:41,165][00159] Avg episode reward: [(0, '21.354')] -[2024-06-06 13:15:46,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2735.3). Total num frames: 3231744. Throughput: 0: 625.0. Samples: 806862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:15:46,169][00159] Avg episode reward: [(0, '19.533')] -[2024-06-06 13:15:47,281][03280] Updated weights for policy 0, policy_version 790 (0.0035) -[2024-06-06 13:15:51,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 3248128. Throughput: 0: 651.0. Samples: 811136. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:15:51,166][00159] Avg episode reward: [(0, '19.300')] -[2024-06-06 13:15:56,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 3264512. Throughput: 0: 663.7. Samples: 816116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:15:56,166][00159] Avg episode reward: [(0, '19.549')] -[2024-06-06 13:16:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2594.2, 300 sec: 2749.2). Total num frames: 3272704. Throughput: 0: 646.8. Samples: 817832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:16:01,166][00159] Avg episode reward: [(0, '20.000')] -[2024-06-06 13:16:01,831][03280] Updated weights for policy 0, policy_version 800 (0.0024) -[2024-06-06 13:16:06,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 3284992. Throughput: 0: 642.7. Samples: 820992. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:16:06,166][00159] Avg episode reward: [(0, '20.408')] -[2024-06-06 13:16:11,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 3301376. Throughput: 0: 701.6. Samples: 826048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:16:11,166][00159] Avg episode reward: [(0, '21.719')] -[2024-06-06 13:16:15,467][03280] Updated weights for policy 0, policy_version 810 (0.0034) -[2024-06-06 13:16:16,164][00159] Fps is (10 sec: 3276.3, 60 sec: 2662.3, 300 sec: 2763.1). Total num frames: 3317760. Throughput: 0: 721.5. Samples: 828570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:16:16,169][00159] Avg episode reward: [(0, '22.912')] -[2024-06-06 13:16:21,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 3325952. Throughput: 0: 681.2. Samples: 831808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:16:21,166][00159] Avg episode reward: [(0, '23.657')] -[2024-06-06 13:16:26,163][00159] Fps is (10 sec: 2457.9, 60 sec: 2799.0, 300 sec: 2763.1). Total num frames: 3342336. Throughput: 0: 678.8. Samples: 835952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:16:26,165][00159] Avg episode reward: [(0, '24.381')] -[2024-06-06 13:16:30,363][03280] Updated weights for policy 0, policy_version 820 (0.0030) -[2024-06-06 13:16:31,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 3358720. Throughput: 0: 703.0. Samples: 838496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:16:31,169][00159] Avg episode reward: [(0, '25.621')] -[2024-06-06 13:16:31,183][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth... 
-[2024-06-06 13:16:31,353][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000659_2699264.pth
-[2024-06-06 13:16:36,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2763.1). Total num frames: 3371008. Throughput: 0: 704.3. Samples: 842828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:16:36,165][00159] Avg episode reward: [(0, '25.687')]
-[2024-06-06 13:16:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2777.0). Total num frames: 3383296. Throughput: 0: 663.2. Samples: 845962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:16:41,170][00159] Avg episode reward: [(0, '25.437')]
-[2024-06-06 13:16:45,685][03280] Updated weights for policy 0, policy_version 830 (0.0018)
-[2024-06-06 13:16:46,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2776.9). Total num frames: 3399680. Throughput: 0: 680.6. Samples: 848460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:16:46,165][00159] Avg episode reward: [(0, '23.744')]
-[2024-06-06 13:16:51,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2776.9). Total num frames: 3416064. Throughput: 0: 723.3. Samples: 853542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:16:51,165][00159] Avg episode reward: [(0, '22.794')]
-[2024-06-06 13:16:56,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 3424256. Throughput: 0: 685.1. Samples: 856878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:16:56,170][00159] Avg episode reward: [(0, '23.741')]
-[2024-06-06 13:17:00,855][03280] Updated weights for policy 0, policy_version 840 (0.0048)
-[2024-06-06 13:17:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2776.9). Total num frames: 3440640. Throughput: 0: 663.8. Samples: 858442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:17:01,165][00159] Avg episode reward: [(0, '23.624')]
-[2024-06-06 13:17:06,163][00159] Fps is (10 sec: 3276.7, 60 sec: 2867.2, 300 sec: 2776.9). Total num frames: 3457024. Throughput: 0: 704.7. Samples: 863520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:17:06,166][00159] Avg episode reward: [(0, '24.765')]
-[2024-06-06 13:17:11,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 3469312. Throughput: 0: 712.2. Samples: 868002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:17:11,166][00159] Avg episode reward: [(0, '25.034')]
-[2024-06-06 13:17:15,965][03280] Updated weights for policy 0, policy_version 850 (0.0040)
-[2024-06-06 13:17:16,165][00159] Fps is (10 sec: 2457.1, 60 sec: 2730.6, 300 sec: 2763.0). Total num frames: 3481600. Throughput: 0: 690.8. Samples: 869582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:17:16,171][00159] Avg episode reward: [(0, '25.447')]
-[2024-06-06 13:17:21,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2776.9). Total num frames: 3497984. Throughput: 0: 685.8. Samples: 873688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:17:21,171][00159] Avg episode reward: [(0, '27.004')]
-[2024-06-06 13:17:26,163][00159] Fps is (10 sec: 3277.5, 60 sec: 2867.2, 300 sec: 2776.9). Total num frames: 3514368. Throughput: 0: 727.4. Samples: 878694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:17:26,165][00159] Avg episode reward: [(0, '28.146')]
-[2024-06-06 13:17:26,169][03267] Saving new best policy, reward=28.146!
-[2024-06-06 13:17:29,234][03280] Updated weights for policy 0, policy_version 860 (0.0033)
-[2024-06-06 13:17:31,165][00159] Fps is (10 sec: 2457.2, 60 sec: 2730.6, 300 sec: 2749.2). Total num frames: 3522560. Throughput: 0: 713.9. Samples: 880586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:17:31,169][00159] Avg episode reward: [(0, '28.540')]
-[2024-06-06 13:17:31,273][03267] Saving new best policy, reward=28.540!
-[2024-06-06 13:17:36,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 3534848. Throughput: 0: 668.5. Samples: 883626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:17:36,174][00159] Avg episode reward: [(0, '27.493')]
-[2024-06-06 13:17:41,165][00159] Fps is (10 sec: 3276.6, 60 sec: 2867.1, 300 sec: 2763.0). Total num frames: 3555328. Throughput: 0: 706.8. Samples: 888684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:17:41,168][00159] Avg episode reward: [(0, '26.839')]
-[2024-06-06 13:17:43,651][03280] Updated weights for policy 0, policy_version 870 (0.0017)
-[2024-06-06 13:17:46,165][00159] Fps is (10 sec: 3276.3, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3567616. Throughput: 0: 729.6. Samples: 891276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:17:46,169][00159] Avg episode reward: [(0, '25.433')]
-[2024-06-06 13:17:51,163][00159] Fps is (10 sec: 2458.2, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 3579904. Throughput: 0: 692.8. Samples: 894696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:17:51,168][00159] Avg episode reward: [(0, '25.292')]
-[2024-06-06 13:17:56,163][00159] Fps is (10 sec: 2458.0, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3592192. Throughput: 0: 683.1. Samples: 898740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:17:56,172][00159] Avg episode reward: [(0, '23.054')]
-[2024-06-06 13:17:58,735][03280] Updated weights for policy 0, policy_version 880 (0.0037)
-[2024-06-06 13:18:01,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 3612672. Throughput: 0: 704.7. Samples: 901292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:18:01,173][00159] Avg episode reward: [(0, '22.928')]
-[2024-06-06 13:18:06,163][00159] Fps is (10 sec: 2867.1, 60 sec: 2730.7, 300 sec: 2735.4). Total num frames: 3620864. Throughput: 0: 712.8. Samples: 905764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:18:06,166][00159] Avg episode reward: [(0, '23.251')]
-[2024-06-06 13:18:11,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 3633152. Throughput: 0: 672.6. Samples: 908962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:18:11,166][00159] Avg episode reward: [(0, '23.428')]
-[2024-06-06 13:18:13,829][03280] Updated weights for policy 0, policy_version 890 (0.0032)
-[2024-06-06 13:18:16,163][00159] Fps is (10 sec: 2867.3, 60 sec: 2799.0, 300 sec: 2749.2). Total num frames: 3649536. Throughput: 0: 687.5. Samples: 911524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:18:16,165][00159] Avg episode reward: [(0, '23.869')]
-[2024-06-06 13:18:21,166][00159] Fps is (10 sec: 3275.7, 60 sec: 2798.8, 300 sec: 2749.1). Total num frames: 3665920. Throughput: 0: 733.9. Samples: 916656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:18:21,169][00159] Avg episode reward: [(0, '24.209')]
-[2024-06-06 13:18:26,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 3678208. Throughput: 0: 696.2. Samples: 920012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:18:26,165][00159] Avg episode reward: [(0, '24.952')]
-[2024-06-06 13:18:28,775][03280] Updated weights for policy 0, policy_version 900 (0.0036)
-[2024-06-06 13:18:31,164][00159] Fps is (10 sec: 2867.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 3694592. Throughput: 0: 675.1. Samples: 921656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:18:31,171][00159] Avg episode reward: [(0, '25.548')]
-[2024-06-06 13:18:31,183][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000902_3694592.pth...
-[2024-06-06 13:18:31,312][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000740_3031040.pth
-[2024-06-06 13:18:36,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2763.1). Total num frames: 3710976. Throughput: 0: 711.4. Samples: 926708. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
-[2024-06-06 13:18:36,171][00159] Avg episode reward: [(0, '27.440')]
-[2024-06-06 13:18:41,166][00159] Fps is (10 sec: 2866.6, 60 sec: 2798.9, 300 sec: 2749.1). Total num frames: 3723264. Throughput: 0: 719.5. Samples: 931120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:18:41,169][00159] Avg episode reward: [(0, '27.356')]
-[2024-06-06 13:18:43,035][03280] Updated weights for policy 0, policy_version 910 (0.0028)
-[2024-06-06 13:18:46,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 3731456. Throughput: 0: 696.3. Samples: 932624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:18:46,166][00159] Avg episode reward: [(0, '27.612')]
-[2024-06-06 13:18:51,163][00159] Fps is (10 sec: 2458.4, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3747840. Throughput: 0: 688.1. Samples: 936728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:18:51,168][00159] Avg episode reward: [(0, '27.287')]
-[2024-06-06 13:18:56,163][00159] Fps is (10 sec: 3276.7, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 3764224. Throughput: 0: 728.6. Samples: 941748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:18:56,168][00159] Avg episode reward: [(0, '25.818')]
-[2024-06-06 13:18:56,663][03280] Updated weights for policy 0, policy_version 920 (0.0015)
-[2024-06-06 13:19:01,163][00159] Fps is (10 sec: 2867.1, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 3776512. Throughput: 0: 712.8. Samples: 943600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:19:01,166][00159] Avg episode reward: [(0, '26.124')]
-[2024-06-06 13:19:06,163][00159] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3788800. Throughput: 0: 668.8. Samples: 946748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:19:06,166][00159] Avg episode reward: [(0, '27.144')]
-[2024-06-06 13:19:11,163][00159] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 3805184. Throughput: 0: 708.8. Samples: 951906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:19:11,171][00159] Avg episode reward: [(0, '26.373')]
-[2024-06-06 13:19:11,366][03280] Updated weights for policy 0, policy_version 930 (0.0049)
-[2024-06-06 13:19:16,166][00159] Fps is (10 sec: 3275.9, 60 sec: 2867.1, 300 sec: 2763.0). Total num frames: 3821568. Throughput: 0: 730.0. Samples: 954506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:19:16,169][00159] Avg episode reward: [(0, '26.356')]
-[2024-06-06 13:19:21,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2749.2). Total num frames: 3829760. Throughput: 0: 693.4. Samples: 957912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:19:21,165][00159] Avg episode reward: [(0, '26.576')]
-[2024-06-06 13:19:26,163][00159] Fps is (10 sec: 2458.3, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 3846144. Throughput: 0: 686.9. Samples: 962028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:19:26,173][00159] Avg episode reward: [(0, '27.595')]
-[2024-06-06 13:19:26,638][03280] Updated weights for policy 0, policy_version 940 (0.0032)
-[2024-06-06 13:19:31,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2799.0, 300 sec: 2763.1). Total num frames: 3862528. Throughput: 0: 710.4. Samples: 964590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:19:31,165][00159] Avg episode reward: [(0, '26.910')]
-[2024-06-06 13:19:36,171][00159] Fps is (10 sec: 2864.8, 60 sec: 2730.3, 300 sec: 2749.1). Total num frames: 3874816. Throughput: 0: 716.2. Samples: 968962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:19:36,174][00159] Avg episode reward: [(0, '27.747')]
-[2024-06-06 13:19:41,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2763.1). Total num frames: 3887104. Throughput: 0: 673.3. Samples: 972046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:19:41,175][00159] Avg episode reward: [(0, '27.667')]
-[2024-06-06 13:19:41,700][03280] Updated weights for policy 0, policy_version 950 (0.0036)
-[2024-06-06 13:19:46,163][00159] Fps is (10 sec: 2459.6, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 3899392. Throughput: 0: 684.5. Samples: 974404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:19:46,170][00159] Avg episode reward: [(0, '27.250')]
-[2024-06-06 13:19:51,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 3911680. Throughput: 0: 682.4. Samples: 977456. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:19:51,167][00159] Avg episode reward: [(0, '26.046')]
-[2024-06-06 13:19:56,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2594.1, 300 sec: 2721.4). Total num frames: 3919872. Throughput: 0: 636.0. Samples: 980526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:19:56,166][00159] Avg episode reward: [(0, '24.773')]
-[2024-06-06 13:20:00,032][03280] Updated weights for policy 0, policy_version 960 (0.0051)
-[2024-06-06 13:20:01,163][00159] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 3936256. Throughput: 0: 613.5. Samples: 982110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:20:01,165][00159] Avg episode reward: [(0, '23.114')]
-[2024-06-06 13:20:06,163][00159] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 3952640. Throughput: 0: 649.8. Samples: 987154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:20:06,173][00159] Avg episode reward: [(0, '23.766')]
-[2024-06-06 13:20:11,165][00159] Fps is (10 sec: 2866.6, 60 sec: 2662.3, 300 sec: 2735.3). Total num frames: 3964928. Throughput: 0: 658.8. Samples: 991676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:20:11,170][00159] Avg episode reward: [(0, '23.351')]
-[2024-06-06 13:20:14,547][03280] Updated weights for policy 0, policy_version 970 (0.0027)
-[2024-06-06 13:20:16,163][00159] Fps is (10 sec: 2048.0, 60 sec: 2526.0, 300 sec: 2735.3). Total num frames: 3973120. Throughput: 0: 634.7. Samples: 993152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:20:16,172][00159] Avg episode reward: [(0, '22.772')]
-[2024-06-06 13:20:21,163][00159] Fps is (10 sec: 2458.1, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 3989504. Throughput: 0: 628.1. Samples: 997222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:20:21,165][00159] Avg episode reward: [(0, '23.209')]
-[2024-06-06 13:20:24,919][03267] Stopping Batcher_0...
-[2024-06-06 13:20:24,920][03267] Loop batcher_evt_loop terminating...
-[2024-06-06 13:20:24,920][00159] Component Batcher_0 stopped!
-[2024-06-06 13:20:24,932][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-06-06 13:20:25,007][03280] Weights refcount: 2 0
-[2024-06-06 13:20:25,015][00159] Component InferenceWorker_p0-w0 stopped!
-[2024-06-06 13:20:25,015][03280] Stopping InferenceWorker_p0-w0...
-[2024-06-06 13:20:25,025][03280] Loop inference_proc0-0_evt_loop terminating...
-[2024-06-06 13:20:25,050][03285] Stopping RolloutWorker_w4...
-[2024-06-06 13:20:25,051][03285] Loop rollout_proc4_evt_loop terminating...
-[2024-06-06 13:20:25,050][00159] Component RolloutWorker_w4 stopped!
-[2024-06-06 13:20:25,058][00159] Component RolloutWorker_w5 stopped!
-[2024-06-06 13:20:25,065][03287] Stopping RolloutWorker_w5...
-[2024-06-06 13:20:25,065][03287] Loop rollout_proc5_evt_loop terminating...
-[2024-06-06 13:20:25,072][03286] Stopping RolloutWorker_w6...
-[2024-06-06 13:20:25,075][03286] Loop rollout_proc6_evt_loop terminating...
-[2024-06-06 13:20:25,073][00159] Component RolloutWorker_w6 stopped!
-[2024-06-06 13:20:25,091][00159] Component RolloutWorker_w7 stopped!
-[2024-06-06 13:20:25,098][03283] Stopping RolloutWorker_w2...
-[2024-06-06 13:20:25,099][03283] Loop rollout_proc2_evt_loop terminating...
-[2024-06-06 13:20:25,098][00159] Component RolloutWorker_w2 stopped!
-[2024-06-06 13:20:25,107][03281] Stopping RolloutWorker_w0...
-[2024-06-06 13:20:25,108][03281] Loop rollout_proc0_evt_loop terminating...
-[2024-06-06 13:20:25,109][03282] Stopping RolloutWorker_w1...
-[2024-06-06 13:20:25,109][03282] Loop rollout_proc1_evt_loop terminating...
-[2024-06-06 13:20:25,108][00159] Component RolloutWorker_w0 stopped!
-[2024-06-06 13:20:25,114][00159] Component RolloutWorker_w1 stopped!
-[2024-06-06 13:20:25,119][03288] Stopping RolloutWorker_w7...
-[2024-06-06 13:20:25,131][03288] Loop rollout_proc7_evt_loop terminating...
-[2024-06-06 13:20:25,135][00159] Component RolloutWorker_w3 stopped!
-[2024-06-06 13:20:25,137][03284] Stopping RolloutWorker_w3...
-[2024-06-06 13:20:25,143][03267] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth
-[2024-06-06 13:20:25,144][03284] Loop rollout_proc3_evt_loop terminating...
-[2024-06-06 13:20:25,169][03267] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-06-06 13:20:25,382][03267] Stopping LearnerWorker_p0...
-[2024-06-06 13:20:25,382][03267] Loop learner_proc0_evt_loop terminating...
-[2024-06-06 13:20:25,383][00159] Component LearnerWorker_p0 stopped!
-[2024-06-06 13:20:25,386][00159] Waiting for process learner_proc0 to stop...
-[2024-06-06 13:20:27,520][00159] Waiting for process inference_proc0-0 to join...
-[2024-06-06 13:20:27,527][00159] Waiting for process rollout_proc0 to join...
-[2024-06-06 13:20:30,488][00159] Waiting for process rollout_proc1 to join...
-[2024-06-06 13:20:30,874][00159] Waiting for process rollout_proc2 to join...
-[2024-06-06 13:20:30,880][00159] Waiting for process rollout_proc3 to join...
-[2024-06-06 13:20:30,887][00159] Waiting for process rollout_proc4 to join...
-[2024-06-06 13:20:30,890][00159] Waiting for process rollout_proc5 to join...
-[2024-06-06 13:20:30,895][00159] Waiting for process rollout_proc6 to join...
-[2024-06-06 13:20:30,900][00159] Waiting for process rollout_proc7 to join...
-[2024-06-06 13:20:30,904][00159] Batcher 0 profile tree view:
-batching: 28.6482, releasing_batches: 0.0428
-[2024-06-06 13:20:30,908][00159] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0052
- wait_policy_total: 537.9611
-update_model: 14.5551
- weight_update: 0.0020
-one_step: 0.0044
- handle_policy_step: 905.5214
- deserialize: 24.6233, stack: 5.4127, obs_to_device_normalize: 184.4027, forward: 492.5886, send_messages: 44.2074
- prepare_outputs: 110.5839
- to_cpu: 57.7050
-[2024-06-06 13:20:30,910][00159] Learner 0 profile tree view:
-misc: 0.0057, prepare_batch: 14.9343
-train: 78.1430
- epoch_init: 0.0063, minibatch_init: 0.0123, losses_postprocess: 0.7220, kl_divergence: 0.8606, after_optimizer: 34.9406
- calculate_losses: 28.3048
- losses_init: 0.0124, forward_head: 1.6688, bptt_initial: 17.7691, tail: 1.4542, advantages_returns: 0.3196, losses: 3.9017
- bptt: 2.6576
- bptt_forward_core: 2.5458
- update: 12.4541
- clip: 1.1720
-[2024-06-06 13:20:30,911][00159] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.4295, enqueue_policy_requests: 167.0857, env_step: 1167.1406, overhead: 25.7791, complete_rollouts: 9.5851
-save_policy_outputs: 32.6104
- split_output_tensors: 13.1462
-[2024-06-06 13:20:30,913][00159] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.5461, enqueue_policy_requests: 171.5998, env_step: 1167.8541, overhead: 24.8985, complete_rollouts: 9.0936
-save_policy_outputs: 30.4404
- split_output_tensors: 12.0340
-[2024-06-06 13:20:30,914][00159] Loop Runner_EvtLoop terminating...
-[2024-06-06 13:20:30,916][00159] Runner profile tree view:
-main_loop: 1553.8885
-[2024-06-06 13:20:30,917][00159] Collected {0: 4005888}, FPS: 2578.0
-[2024-06-06 13:20:32,266][00159] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-06-06 13:20:32,268][00159] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-06-06 13:20:32,271][00159] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-06-06 13:20:32,273][00159] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-06-06 13:20:32,274][00159] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-06-06 13:20:32,276][00159] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-06-06 13:20:32,278][00159] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2024-06-06 13:20:32,279][00159] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-06-06 13:20:32,280][00159] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2024-06-06 13:20:32,281][00159] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2024-06-06 13:20:32,282][00159] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-06-06 13:20:32,283][00159] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-06-06 13:20:32,284][00159] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-06-06 13:20:32,285][00159] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-06-06 13:20:32,286][00159] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-06-06 13:20:32,326][00159] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-06-06 13:20:32,331][00159] RunningMeanStd input shape: (3, 72, 128)
-[2024-06-06 13:20:32,333][00159] RunningMeanStd input shape: (1,)
-[2024-06-06 13:20:32,350][00159] ConvEncoder: input_channels=3
-[2024-06-06 13:20:32,462][00159] Conv encoder output size: 512
-[2024-06-06 13:20:32,464][00159] Policy head output size: 512
-[2024-06-06 13:20:32,760][00159] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-06-06 13:20:33,619][00159] Num frames 100...
-[2024-06-06 13:20:33,765][00159] Num frames 200...
-[2024-06-06 13:20:33,922][00159] Num frames 300...
-[2024-06-06 13:20:34,074][00159] Num frames 400...
-[2024-06-06 13:20:34,223][00159] Num frames 500...
-[2024-06-06 13:20:34,379][00159] Num frames 600...
-[2024-06-06 13:20:34,445][00159] Avg episode rewards: #0: 13.050, true rewards: #0: 6.050
-[2024-06-06 13:20:34,448][00159] Avg episode reward: 13.050, avg true_objective: 6.050
-[2024-06-06 13:20:34,589][00159] Num frames 700...
-[2024-06-06 13:20:34,741][00159] Num frames 800...
-[2024-06-06 13:20:34,885][00159] Num frames 900...
-[2024-06-06 13:20:35,031][00159] Num frames 1000...
-[2024-06-06 13:20:35,175][00159] Num frames 1100...
-[2024-06-06 13:20:35,335][00159] Num frames 1200...
-[2024-06-06 13:20:35,480][00159] Num frames 1300...
-[2024-06-06 13:20:35,627][00159] Num frames 1400...
-[2024-06-06 13:20:35,771][00159] Num frames 1500...
-[2024-06-06 13:20:35,940][00159] Avg episode rewards: #0: 20.365, true rewards: #0: 7.865
-[2024-06-06 13:20:35,942][00159] Avg episode reward: 20.365, avg true_objective: 7.865
-[2024-06-06 13:20:35,986][00159] Num frames 1600...
-[2024-06-06 13:20:36,137][00159] Num frames 1700...
-[2024-06-06 13:20:36,293][00159] Num frames 1800...
-[2024-06-06 13:20:36,443][00159] Num frames 1900...
-[2024-06-06 13:20:36,586][00159] Num frames 2000...
-[2024-06-06 13:20:36,738][00159] Num frames 2100...
-[2024-06-06 13:20:36,885][00159] Num frames 2200...
-[2024-06-06 13:20:37,031][00159] Num frames 2300...
-[2024-06-06 13:20:37,177][00159] Num frames 2400...
-[2024-06-06 13:20:37,333][00159] Num frames 2500...
-[2024-06-06 13:20:37,441][00159] Avg episode rewards: #0: 20.443, true rewards: #0: 8.443
-[2024-06-06 13:20:37,443][00159] Avg episode reward: 20.443, avg true_objective: 8.443
-[2024-06-06 13:20:37,540][00159] Num frames 2600...
-[2024-06-06 13:20:37,696][00159] Num frames 2700...
-[2024-06-06 13:20:37,848][00159] Num frames 2800...
-[2024-06-06 13:20:37,995][00159] Num frames 2900...
-[2024-06-06 13:20:38,141][00159] Num frames 3000...
-[2024-06-06 13:20:38,290][00159] Num frames 3100...
-[2024-06-06 13:20:38,448][00159] Num frames 3200...
-[2024-06-06 13:20:38,601][00159] Num frames 3300...
-[2024-06-06 13:20:38,757][00159] Num frames 3400...
-[2024-06-06 13:20:38,910][00159] Num frames 3500...
-[2024-06-06 13:20:39,059][00159] Num frames 3600...
-[2024-06-06 13:20:39,209][00159] Num frames 3700...
-[2024-06-06 13:20:39,362][00159] Num frames 3800...
-[2024-06-06 13:20:39,529][00159] Num frames 3900...
-[2024-06-06 13:20:39,680][00159] Num frames 4000...
-[2024-06-06 13:20:39,827][00159] Num frames 4100...
-[2024-06-06 13:20:39,986][00159] Num frames 4200...
-[2024-06-06 13:20:40,141][00159] Num frames 4300...
-[2024-06-06 13:20:40,328][00159] Avg episode rewards: #0: 27.722, true rewards: #0: 10.972
-[2024-06-06 13:20:40,331][00159] Avg episode reward: 27.722, avg true_objective: 10.972
-[2024-06-06 13:20:40,349][00159] Num frames 4400...
-[2024-06-06 13:20:40,500][00159] Num frames 4500...
-[2024-06-06 13:20:40,647][00159] Num frames 4600...
-[2024-06-06 13:20:40,795][00159] Num frames 4700...
-[2024-06-06 13:20:40,942][00159] Num frames 4800...
-[2024-06-06 13:20:41,137][00159] Num frames 4900...
-[2024-06-06 13:20:41,357][00159] Num frames 5000...
-[2024-06-06 13:20:41,569][00159] Num frames 5100...
-[2024-06-06 13:20:41,777][00159] Num frames 5200...
-[2024-06-06 13:20:42,002][00159] Num frames 5300...
-[2024-06-06 13:20:42,210][00159] Num frames 5400...
-[2024-06-06 13:20:42,421][00159] Num frames 5500...
-[2024-06-06 13:20:42,657][00159] Num frames 5600...
-[2024-06-06 13:20:42,875][00159] Num frames 5700...
-[2024-06-06 13:20:43,084][00159] Num frames 5800...
-[2024-06-06 13:20:43,294][00159] Num frames 5900...
-[2024-06-06 13:20:43,524][00159] Avg episode rewards: #0: 29.756, true rewards: #0: 11.956
-[2024-06-06 13:20:43,526][00159] Avg episode reward: 29.756, avg true_objective: 11.956
-[2024-06-06 13:20:43,579][00159] Num frames 6000...
-[2024-06-06 13:20:43,808][00159] Num frames 6100...
-[2024-06-06 13:20:44,003][00159] Num frames 6200...
-[2024-06-06 13:20:44,148][00159] Num frames 6300...
-[2024-06-06 13:20:44,293][00159] Num frames 6400...
-[2024-06-06 13:20:44,441][00159] Num frames 6500...
-[2024-06-06 13:20:44,598][00159] Num frames 6600...
-[2024-06-06 13:20:44,747][00159] Num frames 6700...
-[2024-06-06 13:20:44,897][00159] Num frames 6800...
-[2024-06-06 13:20:45,045][00159] Num frames 6900...
-[2024-06-06 13:20:45,206][00159] Num frames 7000...
-[2024-06-06 13:20:45,368][00159] Num frames 7100...
-[2024-06-06 13:20:45,532][00159] Num frames 7200...
-[2024-06-06 13:20:45,721][00159] Avg episode rewards: #0: 29.625, true rewards: #0: 12.125
-[2024-06-06 13:20:45,723][00159] Avg episode reward: 29.625, avg true_objective: 12.125
-[2024-06-06 13:20:45,767][00159] Num frames 7300...
-[2024-06-06 13:20:45,923][00159] Num frames 7400...
-[2024-06-06 13:20:46,071][00159] Num frames 7500...
-[2024-06-06 13:20:46,221][00159] Num frames 7600...
-[2024-06-06 13:20:46,374][00159] Num frames 7700...
-[2024-06-06 13:20:46,522][00159] Num frames 7800...
-[2024-06-06 13:20:46,682][00159] Num frames 7900...
-[2024-06-06 13:20:46,833][00159] Num frames 8000...
-[2024-06-06 13:20:46,986][00159] Num frames 8100...
-[2024-06-06 13:20:47,055][00159] Avg episode rewards: #0: 28.153, true rewards: #0: 11.581
-[2024-06-06 13:20:47,057][00159] Avg episode reward: 28.153, avg true_objective: 11.581
-[2024-06-06 13:20:47,199][00159] Num frames 8200...
-[2024-06-06 13:20:47,355][00159] Num frames 8300...
-[2024-06-06 13:20:47,510][00159] Num frames 8400...
-[2024-06-06 13:20:47,675][00159] Num frames 8500...
-[2024-06-06 13:20:47,822][00159] Num frames 8600...
-[2024-06-06 13:20:47,976][00159] Num frames 8700...
-[2024-06-06 13:20:48,122][00159] Num frames 8800...
-[2024-06-06 13:20:48,270][00159] Num frames 8900...
-[2024-06-06 13:20:48,423][00159] Num frames 9000...
-[2024-06-06 13:20:48,576][00159] Num frames 9100...
-[2024-06-06 13:20:48,730][00159] Num frames 9200...
-[2024-06-06 13:20:48,878][00159] Num frames 9300...
-[2024-06-06 13:20:49,029][00159] Num frames 9400...
-[2024-06-06 13:20:49,182][00159] Num frames 9500...
-[2024-06-06 13:20:49,332][00159] Num frames 9600...
-[2024-06-06 13:20:49,486][00159] Num frames 9700...
-[2024-06-06 13:20:49,602][00159] Avg episode rewards: #0: 29.799, true rewards: #0: 12.174
-[2024-06-06 13:20:49,604][00159] Avg episode reward: 29.799, avg true_objective: 12.174
-[2024-06-06 13:20:49,709][00159] Num frames 9800...
-[2024-06-06 13:20:49,854][00159] Num frames 9900...
-[2024-06-06 13:20:49,999][00159] Num frames 10000...
-[2024-06-06 13:20:50,148][00159] Num frames 10100...
-[2024-06-06 13:20:50,290][00159] Num frames 10200...
-[2024-06-06 13:20:50,436][00159] Num frames 10300...
-[2024-06-06 13:20:50,596][00159] Num frames 10400...
-[2024-06-06 13:20:50,755][00159] Num frames 10500...
-[2024-06-06 13:20:50,902][00159] Num frames 10600...
-[2024-06-06 13:20:51,052][00159] Num frames 10700...
-[2024-06-06 13:20:51,202][00159] Num frames 10800...
-[2024-06-06 13:20:51,348][00159] Num frames 10900...
-[2024-06-06 13:20:51,499][00159] Num frames 11000...
-[2024-06-06 13:20:51,647][00159] Num frames 11100...
-[2024-06-06 13:20:51,807][00159] Num frames 11200...
-[2024-06-06 13:20:51,931][00159] Avg episode rewards: #0: 30.381, true rewards: #0: 12.492
-[2024-06-06 13:20:51,933][00159] Avg episode reward: 30.381, avg true_objective: 12.492
-[2024-06-06 13:20:52,030][00159] Num frames 11300...
-[2024-06-06 13:20:52,178][00159] Num frames 11400...
-[2024-06-06 13:20:52,326][00159] Num frames 11500...
-[2024-06-06 13:20:52,474][00159] Num frames 11600...
-[2024-06-06 13:20:52,619][00159] Num frames 11700...
-[2024-06-06 13:20:52,772][00159] Num frames 11800...
-[2024-06-06 13:20:52,930][00159] Num frames 11900...
-[2024-06-06 13:20:53,079][00159] Num frames 12000...
-[2024-06-06 13:20:53,230][00159] Num frames 12100...
-[2024-06-06 13:20:53,384][00159] Num frames 12200...
-[2024-06-06 13:20:53,530][00159] Num frames 12300...
-[2024-06-06 13:20:53,689][00159] Num frames 12400...
-[2024-06-06 13:20:53,849][00159] Num frames 12500...
-[2024-06-06 13:20:54,050][00159] Avg episode rewards: #0: 30.687, true rewards: #0: 12.587
-[2024-06-06 13:20:54,053][00159] Avg episode reward: 30.687, avg true_objective: 12.587
-[2024-06-06 13:22:15,894][00159] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2024-06-06 13:31:43,856][00159] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-06-06 13:31:43,859][00159] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-06-06 13:31:43,863][00159] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-06-06 13:31:43,864][00159] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-06-06 13:31:43,868][00159] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-06-06 13:31:43,870][00159] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-06-06 13:31:43,871][00159] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2024-06-06 13:31:43,872][00159] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-06-06 13:31:43,873][00159] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2024-06-06 13:31:43,874][00159] Adding new argument 'hf_repository'='swritchie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2024-06-06 13:31:43,875][00159] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-06-06 13:31:43,877][00159] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-06-06 13:31:43,878][00159] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-06-06 13:31:43,880][00159] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-06-06 13:31:43,882][00159] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-06-06 13:31:43,932][00159] RunningMeanStd input shape: (3, 72, 128)
-[2024-06-06 13:31:43,935][00159] RunningMeanStd input shape: (1,)
-[2024-06-06 13:31:43,953][00159] ConvEncoder: input_channels=3
-[2024-06-06 13:31:44,026][00159] Conv encoder output size: 512
-[2024-06-06 13:31:44,028][00159] Policy head output size: 512
-[2024-06-06 13:31:44,058][00159] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-06-06 13:31:45,044][00159] Num frames 100...
-[2024-06-06 13:31:45,284][00159] Num frames 200...
-[2024-06-06 13:31:45,506][00159] Num frames 300...
-[2024-06-06 13:31:45,732][00159] Num frames 400...
-[2024-06-06 13:31:45,939][00159] Num frames 500...
-[2024-06-06 13:31:46,186][00159] Num frames 600...
-[2024-06-06 13:31:46,402][00159] Num frames 700... -[2024-06-06 13:31:46,558][00159] Avg episode rewards: #0: 15.680, true rewards: #0: 7.680 -[2024-06-06 13:31:46,560][00159] Avg episode reward: 15.680, avg true_objective: 7.680 -[2024-06-06 13:31:46,612][00159] Num frames 800... -[2024-06-06 13:31:46,769][00159] Num frames 900... -[2024-06-06 13:31:46,921][00159] Num frames 1000... -[2024-06-06 13:31:47,072][00159] Num frames 1100... -[2024-06-06 13:31:47,231][00159] Num frames 1200... -[2024-06-06 13:31:47,385][00159] Num frames 1300... -[2024-06-06 13:31:47,540][00159] Num frames 1400... -[2024-06-06 13:31:47,700][00159] Num frames 1500... -[2024-06-06 13:31:47,858][00159] Num frames 1600... -[2024-06-06 13:31:48,005][00159] Num frames 1700... -[2024-06-06 13:31:48,197][00159] Avg episode rewards: #0: 19.960, true rewards: #0: 8.960 -[2024-06-06 13:31:48,199][00159] Avg episode reward: 19.960, avg true_objective: 8.960 -[2024-06-06 13:31:48,226][00159] Num frames 1800... -[2024-06-06 13:31:48,382][00159] Num frames 1900... -[2024-06-06 13:31:48,536][00159] Num frames 2000... -[2024-06-06 13:31:48,685][00159] Num frames 2100... -[2024-06-06 13:31:48,832][00159] Num frames 2200... -[2024-06-06 13:31:48,977][00159] Num frames 2300... -[2024-06-06 13:31:49,124][00159] Num frames 2400... -[2024-06-06 13:31:49,276][00159] Avg episode rewards: #0: 18.213, true rewards: #0: 8.213 -[2024-06-06 13:31:49,278][00159] Avg episode reward: 18.213, avg true_objective: 8.213 -[2024-06-06 13:31:49,335][00159] Num frames 2500... -[2024-06-06 13:31:49,487][00159] Num frames 2600... -[2024-06-06 13:31:49,637][00159] Num frames 2700... -[2024-06-06 13:31:49,790][00159] Num frames 2800... -[2024-06-06 13:31:49,943][00159] Num frames 2900... -[2024-06-06 13:31:50,065][00159] Avg episode rewards: #0: 15.610, true rewards: #0: 7.360 -[2024-06-06 13:31:50,067][00159] Avg episode reward: 15.610, avg true_objective: 7.360 -[2024-06-06 13:31:50,157][00159] Num frames 3000... 
-[2024-06-06 13:31:50,314][00159] Num frames 3100... -[2024-06-06 13:31:50,467][00159] Num frames 3200... -[2024-06-06 13:31:50,619][00159] Num frames 3300... -[2024-06-06 13:31:50,770][00159] Num frames 3400... -[2024-06-06 13:31:50,915][00159] Num frames 3500... -[2024-06-06 13:31:51,067][00159] Num frames 3600... -[2024-06-06 13:31:51,222][00159] Num frames 3700... -[2024-06-06 13:31:51,386][00159] Num frames 3800... -[2024-06-06 13:31:51,537][00159] Num frames 3900... -[2024-06-06 13:31:51,689][00159] Num frames 4000... -[2024-06-06 13:31:51,839][00159] Num frames 4100... -[2024-06-06 13:31:51,984][00159] Num frames 4200... -[2024-06-06 13:31:52,139][00159] Num frames 4300... -[2024-06-06 13:31:52,288][00159] Num frames 4400... -[2024-06-06 13:31:52,448][00159] Num frames 4500... -[2024-06-06 13:31:52,591][00159] Avg episode rewards: #0: 21.318, true rewards: #0: 9.118 -[2024-06-06 13:31:52,593][00159] Avg episode reward: 21.318, avg true_objective: 9.118 -[2024-06-06 13:31:52,657][00159] Num frames 4600... -[2024-06-06 13:31:52,803][00159] Num frames 4700... -[2024-06-06 13:31:52,955][00159] Num frames 4800... -[2024-06-06 13:31:53,103][00159] Num frames 4900... -[2024-06-06 13:31:53,253][00159] Num frames 5000... -[2024-06-06 13:31:53,411][00159] Num frames 5100... -[2024-06-06 13:31:53,563][00159] Num frames 5200... -[2024-06-06 13:31:53,719][00159] Num frames 5300... -[2024-06-06 13:31:53,835][00159] Avg episode rewards: #0: 20.208, true rewards: #0: 8.875 -[2024-06-06 13:31:53,838][00159] Avg episode reward: 20.208, avg true_objective: 8.875 -[2024-06-06 13:31:54,165][00159] Num frames 5400... -[2024-06-06 13:31:54,319][00159] Num frames 5500... -[2024-06-06 13:31:54,489][00159] Num frames 5600... -[2024-06-06 13:31:54,641][00159] Num frames 5700... -[2024-06-06 13:31:54,800][00159] Num frames 5800... -[2024-06-06 13:31:54,954][00159] Num frames 5900... -[2024-06-06 13:31:55,102][00159] Num frames 6000... 
-[2024-06-06 13:31:55,252][00159] Num frames 6100... -[2024-06-06 13:31:55,416][00159] Num frames 6200... -[2024-06-06 13:31:55,572][00159] Num frames 6300... -[2024-06-06 13:31:55,724][00159] Num frames 6400... -[2024-06-06 13:31:55,877][00159] Num frames 6500... -[2024-06-06 13:31:56,024][00159] Num frames 6600... -[2024-06-06 13:31:56,182][00159] Num frames 6700... -[2024-06-06 13:31:56,334][00159] Num frames 6800... -[2024-06-06 13:31:56,439][00159] Avg episode rewards: #0: 22.327, true rewards: #0: 9.756 -[2024-06-06 13:31:56,442][00159] Avg episode reward: 22.327, avg true_objective: 9.756 -[2024-06-06 13:31:56,595][00159] Num frames 6900... -[2024-06-06 13:31:56,818][00159] Num frames 7000... -[2024-06-06 13:31:57,034][00159] Num frames 7100... -[2024-06-06 13:31:57,251][00159] Num frames 7200... -[2024-06-06 13:31:57,481][00159] Avg episode rewards: #0: 20.346, true rewards: #0: 9.096 -[2024-06-06 13:31:57,483][00159] Avg episode reward: 20.346, avg true_objective: 9.096 -[2024-06-06 13:31:57,539][00159] Num frames 7300... -[2024-06-06 13:31:57,780][00159] Num frames 7400... -[2024-06-06 13:31:58,009][00159] Num frames 7500... -[2024-06-06 13:31:58,239][00159] Num frames 7600... -[2024-06-06 13:31:58,479][00159] Num frames 7700... -[2024-06-06 13:31:58,723][00159] Num frames 7800... -[2024-06-06 13:31:58,951][00159] Num frames 7900... -[2024-06-06 13:31:59,173][00159] Num frames 8000... -[2024-06-06 13:31:59,428][00159] Num frames 8100... -[2024-06-06 13:31:59,651][00159] Num frames 8200... -[2024-06-06 13:31:59,887][00159] Num frames 8300... -[2024-06-06 13:32:00,049][00159] Num frames 8400... -[2024-06-06 13:32:00,195][00159] Num frames 8500... -[2024-06-06 13:32:00,339][00159] Num frames 8600... -[2024-06-06 13:32:00,484][00159] Num frames 8700... -[2024-06-06 13:32:00,651][00159] Num frames 8800... -[2024-06-06 13:32:00,803][00159] Num frames 8900... -[2024-06-06 13:32:00,952][00159] Num frames 9000... 
-[2024-06-06 13:32:01,105][00159] Num frames 9100... -[2024-06-06 13:32:01,192][00159] Avg episode rewards: #0: 23.574, true rewards: #0: 10.130 -[2024-06-06 13:32:01,193][00159] Avg episode reward: 23.574, avg true_objective: 10.130 -[2024-06-06 13:32:01,318][00159] Num frames 9200... -[2024-06-06 13:32:01,474][00159] Num frames 9300... -[2024-06-06 13:32:01,629][00159] Num frames 9400... -[2024-06-06 13:32:01,783][00159] Num frames 9500... -[2024-06-06 13:32:01,934][00159] Num frames 9600... -[2024-06-06 13:32:02,036][00159] Avg episode rewards: #0: 22.029, true rewards: #0: 9.629 -[2024-06-06 13:32:02,037][00159] Avg episode reward: 22.029, avg true_objective: 9.629 -[2024-06-06 13:33:05,226][00159] Replay video saved to /content/train_dir/default_experiment/replay.mp4! -[2024-06-06 13:33:10,908][00159] The model has been pushed to https://huggingface.co/swritchie/rl_course_vizdoom_health_gathering_supreme -[2024-06-06 13:37:26,950][20018] Saving configuration to /content/train_dir/default_experiment/config.json... -[2024-06-06 13:37:26,956][20018] Rollout worker 0 uses device cpu -[2024-06-06 13:37:26,960][20018] Rollout worker 1 uses device cpu -[2024-06-06 13:37:26,965][20018] Rollout worker 2 uses device cpu -[2024-06-06 13:37:26,969][20018] Rollout worker 3 uses device cpu -[2024-06-06 13:37:26,986][20018] Rollout worker 4 uses device cpu -[2024-06-06 13:37:26,987][20018] Rollout worker 5 uses device cpu -[2024-06-06 13:37:26,989][20018] Rollout worker 6 uses device cpu -[2024-06-06 13:37:26,991][20018] Rollout worker 7 uses device cpu -[2024-06-06 13:37:27,229][20018] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 13:37:27,232][20018] InferenceWorker_p0-w0: min num requests: 2 -[2024-06-06 13:37:27,303][20018] Starting all processes... -[2024-06-06 13:37:27,309][20018] Starting process learner_proc0 -[2024-06-06 13:37:31,191][20018] Starting all processes... 
-[2024-06-06 13:37:31,231][20018] Starting process inference_proc0-0 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc0 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc1 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc2 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc3 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc4 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc5 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc6 -[2024-06-06 13:37:31,232][20018] Starting process rollout_proc7 -[2024-06-06 13:37:48,597][20333] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 13:37:48,602][20333] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2024-06-06 13:37:48,649][20354] Worker 3 uses CPU cores [1] -[2024-06-06 13:37:48,675][20333] Num visible devices: 1 -[2024-06-06 13:37:48,702][20355] Worker 4 uses CPU cores [0] -[2024-06-06 13:37:48,712][20333] Starting seed is not provided -[2024-06-06 13:37:48,713][20333] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 13:37:48,714][20333] Initializing actor-critic model on device cuda:0 -[2024-06-06 13:37:48,715][20333] RunningMeanStd input shape: (3, 72, 128) -[2024-06-06 13:37:48,717][20333] RunningMeanStd input shape: (1,) -[2024-06-06 13:37:48,730][20018] Heartbeat connected on Batcher_0 -[2024-06-06 13:37:48,744][20353] Worker 2 uses CPU cores [0] -[2024-06-06 13:37:48,775][20333] ConvEncoder: input_channels=3 -[2024-06-06 13:37:48,789][20018] Heartbeat connected on RolloutWorker_w3 -[2024-06-06 13:37:48,861][20018] Heartbeat connected on RolloutWorker_w4 -[2024-06-06 13:37:48,874][20358] Worker 7 uses CPU cores [1] -[2024-06-06 13:37:48,884][20018] Heartbeat connected on RolloutWorker_w2 -[2024-06-06 13:37:48,920][20351] Worker 0 uses CPU cores [0] -[2024-06-06 13:37:48,953][20352] Worker 1 uses CPU cores [1] -[2024-06-06 13:37:48,988][20018] 
Heartbeat connected on RolloutWorker_w7 -[2024-06-06 13:37:48,999][20018] Heartbeat connected on RolloutWorker_w0 -[2024-06-06 13:37:49,043][20350] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 13:37:49,043][20350] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2024-06-06 13:37:49,057][20018] Heartbeat connected on RolloutWorker_w1 -[2024-06-06 13:37:49,091][20350] Num visible devices: 1 -[2024-06-06 13:37:49,115][20018] Heartbeat connected on InferenceWorker_p0-w0 -[2024-06-06 13:37:49,120][20356] Worker 5 uses CPU cores [1] -[2024-06-06 13:37:49,142][20357] Worker 6 uses CPU cores [0] -[2024-06-06 13:37:49,155][20018] Heartbeat connected on RolloutWorker_w6 -[2024-06-06 13:37:49,163][20018] Heartbeat connected on RolloutWorker_w5 -[2024-06-06 13:37:49,197][20333] Conv encoder output size: 512 -[2024-06-06 13:37:49,198][20333] Policy head output size: 512 -[2024-06-06 13:37:49,217][20333] Created Actor Critic model with architecture: -[2024-06-06 13:37:49,217][20333] ActorCriticSharedWeights( - (obs_normalizer): ObservationNormalizer( - (running_mean_std): RunningMeanStdDictInPlace( - (running_mean_std): ModuleDict( - (obs): RunningMeanStdInPlace() - ) - ) - ) - (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) - (encoder): VizdoomEncoder( - (basic_encoder): ConvEncoder( - (enc): RecursiveScriptModule( - original_name=ConvEncoderImpl - (conv_head): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Conv2d) - (1): RecursiveScriptModule(original_name=ELU) - (2): RecursiveScriptModule(original_name=Conv2d) - (3): RecursiveScriptModule(original_name=ELU) - (4): RecursiveScriptModule(original_name=Conv2d) - (5): RecursiveScriptModule(original_name=ELU) - ) - (mlp_layers): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Linear) - (1): RecursiveScriptModule(original_name=ELU) - ) - ) - ) 
- ) - (core): ModelCoreRNN( - (core): GRU(512, 512) - ) - (decoder): MlpDecoder( - (mlp): Identity() - ) - (critic_linear): Linear(in_features=512, out_features=1, bias=True) - (action_parameterization): ActionParameterizationDefault( - (distribution_linear): Linear(in_features=512, out_features=5, bias=True) - ) -) -[2024-06-06 13:37:49,491][20333] Using optimizer -[2024-06-06 13:37:50,572][20333] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-06-06 13:37:50,607][20333] Loading model from checkpoint -[2024-06-06 13:37:50,609][20333] Loaded experiment state at self.train_step=978, self.env_steps=4005888 -[2024-06-06 13:37:50,610][20333] Initialized policy 0 weights for model version 978 -[2024-06-06 13:37:50,613][20333] LearnerWorker_p0 finished initialization! -[2024-06-06 13:37:50,614][20333] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-06-06 13:37:50,614][20018] Heartbeat connected on LearnerWorker_p0 -[2024-06-06 13:37:50,832][20350] RunningMeanStd input shape: (3, 72, 128) -[2024-06-06 13:37:50,833][20350] RunningMeanStd input shape: (1,) -[2024-06-06 13:37:50,846][20350] ConvEncoder: input_channels=3 -[2024-06-06 13:37:50,962][20350] Conv encoder output size: 512 -[2024-06-06 13:37:50,962][20350] Policy head output size: 512 -[2024-06-06 13:37:51,022][20018] Inference worker 0-0 is ready! -[2024-06-06 13:37:51,024][20018] All inference workers are ready! Signal rollout workers to start! 
-[2024-06-06 13:37:51,270][20358] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:51,274][20351] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:51,274][20352] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:51,276][20355] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:51,272][20354] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:51,278][20353] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:51,273][20357] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:51,276][20356] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-06-06 13:37:52,359][20356] Decorrelating experience for 0 frames... -[2024-06-06 13:37:52,364][20358] Decorrelating experience for 0 frames... -[2024-06-06 13:37:52,797][20358] Decorrelating experience for 32 frames... -[2024-06-06 13:37:53,078][20357] Decorrelating experience for 0 frames... -[2024-06-06 13:37:53,082][20355] Decorrelating experience for 0 frames... -[2024-06-06 13:37:53,088][20353] Decorrelating experience for 0 frames... -[2024-06-06 13:37:53,090][20351] Decorrelating experience for 0 frames... -[2024-06-06 13:37:53,819][20358] Decorrelating experience for 64 frames... -[2024-06-06 13:37:54,003][20018] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-06-06 13:37:54,190][20356] Decorrelating experience for 32 frames... -[2024-06-06 13:37:54,250][20354] Decorrelating experience for 0 frames... -[2024-06-06 13:37:54,478][20353] Decorrelating experience for 32 frames... -[2024-06-06 13:37:54,479][20355] Decorrelating experience for 32 frames... -[2024-06-06 13:37:54,493][20357] Decorrelating experience for 32 frames... -[2024-06-06 13:37:55,253][20351] Decorrelating experience for 32 frames... 
-[2024-06-06 13:37:55,593][20358] Decorrelating experience for 96 frames... -[2024-06-06 13:37:55,611][20354] Decorrelating experience for 32 frames... -[2024-06-06 13:37:56,227][20352] Decorrelating experience for 0 frames... -[2024-06-06 13:37:56,660][20355] Decorrelating experience for 64 frames... -[2024-06-06 13:37:58,247][20356] Decorrelating experience for 64 frames... -[2024-06-06 13:37:58,379][20351] Decorrelating experience for 64 frames... -[2024-06-06 13:37:59,003][20018] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 1.2. Samples: 6. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-06-06 13:37:59,053][20357] Decorrelating experience for 64 frames... -[2024-06-06 13:37:59,260][20352] Decorrelating experience for 32 frames... -[2024-06-06 13:37:59,392][20353] Decorrelating experience for 64 frames... -[2024-06-06 13:37:59,600][20355] Decorrelating experience for 96 frames... -[2024-06-06 13:38:01,376][20356] Decorrelating experience for 96 frames... -[2024-06-06 13:38:01,481][20351] Decorrelating experience for 96 frames... -[2024-06-06 13:38:02,211][20354] Decorrelating experience for 64 frames... -[2024-06-06 13:38:02,733][20353] Decorrelating experience for 96 frames... -[2024-06-06 13:38:03,101][20352] Decorrelating experience for 64 frames... -[2024-06-06 13:38:04,003][20018] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 108.8. Samples: 1088. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-06-06 13:38:04,006][20018] Avg episode reward: [(0, '2.592')] -[2024-06-06 13:38:06,445][20354] Decorrelating experience for 96 frames... -[2024-06-06 13:38:06,576][20357] Decorrelating experience for 96 frames... -[2024-06-06 13:38:07,122][20333] Signal inference workers to stop experience collection... 
-[2024-06-06 13:38:07,142][20350] InferenceWorker_p0-w0: stopping experience collection -[2024-06-06 13:38:07,253][20352] Decorrelating experience for 96 frames... -[2024-06-06 13:38:08,378][20333] Signal inference workers to resume experience collection... -[2024-06-06 13:38:08,380][20350] InferenceWorker_p0-w0: resuming experience collection -[2024-06-06 13:38:09,003][20018] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4009984. Throughput: 0: 128.0. Samples: 1920. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) -[2024-06-06 13:38:09,009][20018] Avg episode reward: [(0, '2.427')] -[2024-06-06 13:38:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 4026368. Throughput: 0: 251.3. Samples: 5026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:38:14,009][20018] Avg episode reward: [(0, '7.512')] -[2024-06-06 13:38:19,003][20018] Fps is (10 sec: 2457.4, 60 sec: 1146.8, 300 sec: 1146.8). Total num frames: 4034560. Throughput: 0: 318.0. Samples: 7950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:38:19,006][20018] Avg episode reward: [(0, '8.834')] -[2024-06-06 13:38:24,003][20018] Fps is (10 sec: 1638.4, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 4042752. Throughput: 0: 306.0. Samples: 9180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:38:24,005][20018] Avg episode reward: [(0, '10.417')] -[2024-06-06 13:38:24,951][20350] Updated weights for policy 0, policy_version 988 (0.0046) -[2024-06-06 13:38:29,003][20018] Fps is (10 sec: 2457.7, 60 sec: 1521.4, 300 sec: 1521.4). Total num frames: 4059136. Throughput: 0: 354.3. Samples: 12402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:38:29,005][20018] Avg episode reward: [(0, '17.916')] -[2024-06-06 13:38:34,003][20018] Fps is (10 sec: 3276.8, 60 sec: 1740.8, 300 sec: 1740.8). Total num frames: 4075520. Throughput: 0: 433.9. Samples: 17354. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:38:34,005][20018] Avg episode reward: [(0, '25.079')] -[2024-06-06 13:38:38,948][20350] Updated weights for policy 0, policy_version 998 (0.0036) -[2024-06-06 13:38:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 1820.4, 300 sec: 1820.4). Total num frames: 4087808. Throughput: 0: 433.9. Samples: 19524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:38:39,010][20018] Avg episode reward: [(0, '24.380')] -[2024-06-06 13:38:44,003][20018] Fps is (10 sec: 2048.0, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 4096000. Throughput: 0: 502.3. Samples: 22608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:38:44,012][20018] Avg episode reward: [(0, '24.183')] -[2024-06-06 13:38:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2010.8, 300 sec: 2010.8). Total num frames: 4116480. Throughput: 0: 585.4. Samples: 27430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:38:49,005][20018] Avg episode reward: [(0, '24.553')] -[2024-06-06 13:38:52,696][20350] Updated weights for policy 0, policy_version 1008 (0.0027) -[2024-06-06 13:38:54,005][20018] Fps is (10 sec: 3276.7, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4128768. Throughput: 0: 624.5. Samples: 30022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:38:54,008][20018] Avg episode reward: [(0, '24.900')] -[2024-06-06 13:38:59,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2184.5, 300 sec: 2016.5). Total num frames: 4136960. Throughput: 0: 620.7. Samples: 32958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:38:59,008][20018] Avg episode reward: [(0, '26.942')] -[2024-06-06 13:39:04,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2106.5). Total num frames: 4153344. Throughput: 0: 640.6. Samples: 36776. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:39:04,010][20018] Avg episode reward: [(0, '27.074')] -[2024-06-06 13:39:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2075.3). Total num frames: 4161536. Throughput: 0: 666.5. Samples: 39172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-06-06 13:39:09,009][20018] Avg episode reward: [(0, '26.819')] -[2024-06-06 13:39:12,170][20350] Updated weights for policy 0, policy_version 1018 (0.0046) -[2024-06-06 13:39:14,003][20018] Fps is (10 sec: 1638.4, 60 sec: 2389.3, 300 sec: 2048.0). Total num frames: 4169728. Throughput: 0: 648.0. Samples: 41562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:39:14,008][20018] Avg episode reward: [(0, '25.730')] -[2024-06-06 13:39:19,003][20018] Fps is (10 sec: 1638.4, 60 sec: 2389.4, 300 sec: 2023.9). Total num frames: 4177920. Throughput: 0: 588.7. Samples: 43846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:39:19,011][20018] Avg episode reward: [(0, '25.004')] -[2024-06-06 13:39:19,024][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001020_4177920.pth... -[2024-06-06 13:39:19,182][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000902_3694592.pth -[2024-06-06 13:39:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2093.5). Total num frames: 4194304. Throughput: 0: 590.5. Samples: 46098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:39:24,005][20018] Avg episode reward: [(0, '25.016')] -[2024-06-06 13:39:27,843][20350] Updated weights for policy 0, policy_version 1028 (0.0036) -[2024-06-06 13:39:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2525.9, 300 sec: 2155.8). Total num frames: 4210688. Throughput: 0: 634.8. Samples: 51176. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:39:29,010][20018] Avg episode reward: [(0, '24.625')] -[2024-06-06 13:39:34,008][20018] Fps is (10 sec: 2865.6, 60 sec: 2457.4, 300 sec: 2170.8). Total num frames: 4222976. Throughput: 0: 609.7. Samples: 54870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:39:34,011][20018] Avg episode reward: [(0, '23.834')] -[2024-06-06 13:39:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2184.5). Total num frames: 4235264. Throughput: 0: 587.4. Samples: 56456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:39:39,011][20018] Avg episode reward: [(0, '22.416')] -[2024-06-06 13:39:43,121][20350] Updated weights for policy 0, policy_version 1038 (0.0018) -[2024-06-06 13:39:44,003][20018] Fps is (10 sec: 2868.8, 60 sec: 2594.1, 300 sec: 2234.2). Total num frames: 4251648. Throughput: 0: 627.0. Samples: 61174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:39:44,005][20018] Avg episode reward: [(0, '23.734')] -[2024-06-06 13:39:49,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2525.8, 300 sec: 2279.5). Total num frames: 4268032. Throughput: 0: 649.0. Samples: 65984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:39:49,009][20018] Avg episode reward: [(0, '23.523')] -[2024-06-06 13:39:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2252.8). Total num frames: 4276224. Throughput: 0: 630.6. Samples: 67550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:39:54,005][20018] Avg episode reward: [(0, '23.176')] -[2024-06-06 13:39:58,050][20350] Updated weights for policy 0, policy_version 1048 (0.0022) -[2024-06-06 13:39:59,003][20018] Fps is (10 sec: 2458.1, 60 sec: 2594.1, 300 sec: 2293.8). Total num frames: 4292608. Throughput: 0: 659.5. Samples: 71238. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:39:59,010][20018] Avg episode reward: [(0, '22.383')] -[2024-06-06 13:40:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2331.6). Total num frames: 4308992. Throughput: 0: 722.6. Samples: 76364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:40:04,005][20018] Avg episode reward: [(0, '22.037')] -[2024-06-06 13:40:09,006][20018] Fps is (10 sec: 2866.3, 60 sec: 2662.3, 300 sec: 2336.2). Total num frames: 4321280. Throughput: 0: 724.4. Samples: 78698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:40:09,008][20018] Avg episode reward: [(0, '22.129')] -[2024-06-06 13:40:13,300][20350] Updated weights for policy 0, policy_version 1058 (0.0029) -[2024-06-06 13:40:14,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2340.6). Total num frames: 4333568. Throughput: 0: 677.6. Samples: 81670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:40:14,008][20018] Avg episode reward: [(0, '23.298')] -[2024-06-06 13:40:19,003][20018] Fps is (10 sec: 2868.1, 60 sec: 2867.2, 300 sec: 2372.9). Total num frames: 4349952. Throughput: 0: 700.6. Samples: 86394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:40:19,005][20018] Avg episode reward: [(0, '21.871')] -[2024-06-06 13:40:24,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2403.0). Total num frames: 4366336. Throughput: 0: 720.8. Samples: 88892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:40:24,008][20018] Avg episode reward: [(0, '22.377')] -[2024-06-06 13:40:27,116][20350] Updated weights for policy 0, policy_version 1068 (0.0026) -[2024-06-06 13:40:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2378.3). Total num frames: 4374528. Throughput: 0: 702.4. Samples: 92784. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:40:29,005][20018] Avg episode reward: [(0, '21.844')] -[2024-06-06 13:40:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.2, 300 sec: 2406.4). Total num frames: 4390912. Throughput: 0: 675.3. Samples: 96372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:40:34,008][20018] Avg episode reward: [(0, '21.551')] -[2024-06-06 13:40:39,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2432.8). Total num frames: 4407296. Throughput: 0: 696.4. Samples: 98888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:40:39,005][20018] Avg episode reward: [(0, '23.563')] -[2024-06-06 13:40:40,930][20350] Updated weights for policy 0, policy_version 1078 (0.0041) -[2024-06-06 13:40:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2433.5). Total num frames: 4419584. Throughput: 0: 721.7. Samples: 103714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:40:44,005][20018] Avg episode reward: [(0, '24.745')] -[2024-06-06 13:40:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2434.2). Total num frames: 4431872. Throughput: 0: 674.7. Samples: 106726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:40:49,007][20018] Avg episode reward: [(0, '24.777')] -[2024-06-06 13:40:54,002][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2457.6). Total num frames: 4448256. Throughput: 0: 668.8. Samples: 108792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:40:54,005][20018] Avg episode reward: [(0, '25.201')] -[2024-06-06 13:40:56,205][20350] Updated weights for policy 0, policy_version 1088 (0.0022) -[2024-06-06 13:40:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2479.7). Total num frames: 4464640. Throughput: 0: 713.4. Samples: 113772. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:40:59,005][20018] Avg episode reward: [(0, '26.337')] -[2024-06-06 13:41:04,002][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2457.6). Total num frames: 4472832. Throughput: 0: 694.6. Samples: 117652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:41:04,007][20018] Avg episode reward: [(0, '26.051')] -[2024-06-06 13:41:09,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.8, 300 sec: 2457.6). Total num frames: 4485120. Throughput: 0: 673.9. Samples: 119216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:41:09,005][20018] Avg episode reward: [(0, '26.368')] -[2024-06-06 13:41:12,219][20350] Updated weights for policy 0, policy_version 1098 (0.0042) -[2024-06-06 13:41:14,002][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2478.1). Total num frames: 4501504. Throughput: 0: 681.0. Samples: 123430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:41:14,009][20018] Avg episode reward: [(0, '27.421')] -[2024-06-06 13:41:19,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2497.5). Total num frames: 4517888. Throughput: 0: 713.5. Samples: 128482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:41:19,010][20018] Avg episode reward: [(0, '27.133')] -[2024-06-06 13:41:19,023][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001103_4517888.pth... -[2024-06-06 13:41:19,204][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth -[2024-06-06 13:41:24,002][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2477.1). Total num frames: 4526080. Throughput: 0: 690.6. Samples: 129964. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:41:24,006][20018] Avg episode reward: [(0, '26.136')]
-[2024-06-06 13:41:27,164][20350] Updated weights for policy 0, policy_version 1108 (0.0038)
-[2024-06-06 13:41:29,002][20018] Fps is (10 sec: 2458.3, 60 sec: 2798.9, 300 sec: 2495.7). Total num frames: 4542464. Throughput: 0: 662.7. Samples: 133534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:41:29,005][20018] Avg episode reward: [(0, '25.694')]
-[2024-06-06 13:41:34,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2513.5). Total num frames: 4558848. Throughput: 0: 709.6. Samples: 138660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:41:34,011][20018] Avg episode reward: [(0, '24.416')]
-[2024-06-06 13:41:39,002][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2512.2). Total num frames: 4571136. Throughput: 0: 718.5. Samples: 141126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:41:39,007][20018] Avg episode reward: [(0, '24.509')]
-[2024-06-06 13:41:41,673][20350] Updated weights for policy 0, policy_version 1118 (0.0036)
-[2024-06-06 13:41:44,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2730.6, 300 sec: 2511.0). Total num frames: 4583424. Throughput: 0: 675.4. Samples: 144168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:41:44,010][20018] Avg episode reward: [(0, '25.813')]
-[2024-06-06 13:41:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2527.3). Total num frames: 4599808. Throughput: 0: 688.6. Samples: 148638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:41:49,011][20018] Avg episode reward: [(0, '25.435')]
-[2024-06-06 13:41:54,002][20018] Fps is (10 sec: 3277.4, 60 sec: 2798.9, 300 sec: 2542.9). Total num frames: 4616192. Throughput: 0: 709.4. Samples: 151138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:41:54,009][20018] Avg episode reward: [(0, '26.091')]
-[2024-06-06 13:41:54,882][20350] Updated weights for policy 0, policy_version 1128 (0.0031)
-[2024-06-06 13:41:59,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2662.4, 300 sec: 2524.5). Total num frames: 4624384. Throughput: 0: 707.6. Samples: 155272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:41:59,006][20018] Avg episode reward: [(0, '26.249')]
-[2024-06-06 13:42:04,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2539.5). Total num frames: 4640768. Throughput: 0: 672.4. Samples: 158736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:42:04,005][20018] Avg episode reward: [(0, '25.724')]
-[2024-06-06 13:42:09,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2867.2, 300 sec: 2554.0). Total num frames: 4657152. Throughput: 0: 696.8. Samples: 161320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:42:09,005][20018] Avg episode reward: [(0, '26.064')]
-[2024-06-06 13:42:09,849][20350] Updated weights for policy 0, policy_version 1138 (0.0023)
-[2024-06-06 13:42:14,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2552.1). Total num frames: 4669440. Throughput: 0: 728.6. Samples: 166322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:42:14,009][20018] Avg episode reward: [(0, '27.983')]
-[2024-06-06 13:42:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2550.3). Total num frames: 4681728. Throughput: 0: 684.3. Samples: 169454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:42:19,010][20018] Avg episode reward: [(0, '27.945')]
-[2024-06-06 13:42:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2563.8). Total num frames: 4698112. Throughput: 0: 672.3. Samples: 171378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:42:24,010][20018] Avg episode reward: [(0, '27.245')]
-[2024-06-06 13:42:24,913][20350] Updated weights for policy 0, policy_version 1148 (0.0050)
-[2024-06-06 13:42:29,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2867.2, 300 sec: 2576.8). Total num frames: 4714496. Throughput: 0: 719.1. Samples: 176526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:42:29,006][20018] Avg episode reward: [(0, '26.392')]
-[2024-06-06 13:42:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2574.6). Total num frames: 4726784. Throughput: 0: 709.0. Samples: 180542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:42:34,005][20018] Avg episode reward: [(0, '25.773')]
-[2024-06-06 13:42:39,003][20018] Fps is (10 sec: 2048.2, 60 sec: 2730.7, 300 sec: 2558.2). Total num frames: 4734976. Throughput: 0: 686.4. Samples: 182026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:42:39,005][20018] Avg episode reward: [(0, '26.134')]
-[2024-06-06 13:42:40,562][20350] Updated weights for policy 0, policy_version 1158 (0.0017)
-[2024-06-06 13:42:44,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2570.6). Total num frames: 4751360. Throughput: 0: 688.4. Samples: 186248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:42:44,010][20018] Avg episode reward: [(0, '26.152')]
-[2024-06-06 13:42:49,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2582.6). Total num frames: 4767744. Throughput: 0: 722.9. Samples: 191266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:42:49,007][20018] Avg episode reward: [(0, '26.560')]
-[2024-06-06 13:42:54,004][20018] Fps is (10 sec: 2866.7, 60 sec: 2730.6, 300 sec: 2624.2). Total num frames: 4780032. Throughput: 0: 701.8. Samples: 192904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:42:54,009][20018] Avg episode reward: [(0, '26.713')]
-[2024-06-06 13:42:55,456][20350] Updated weights for policy 0, policy_version 1168 (0.0018)
-[2024-06-06 13:42:59,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2665.9). Total num frames: 4792320. Throughput: 0: 665.1. Samples: 196252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:42:59,010][20018] Avg episode reward: [(0, '27.026')]
-[2024-06-06 13:43:04,006][20018] Fps is (10 sec: 2866.8, 60 sec: 2798.8, 300 sec: 2707.5). Total num frames: 4808704. Throughput: 0: 698.2. Samples: 200876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:43:04,019][20018] Avg episode reward: [(0, '26.138')]
-[2024-06-06 13:43:09,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2662.4, 300 sec: 2679.8). Total num frames: 4816896. Throughput: 0: 689.1. Samples: 202388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:43:09,008][20018] Avg episode reward: [(0, '26.782')]
-[2024-06-06 13:43:13,126][20350] Updated weights for policy 0, policy_version 1178 (0.0036)
-[2024-06-06 13:43:14,005][20018] Fps is (10 sec: 1638.5, 60 sec: 2594.0, 300 sec: 2679.7). Total num frames: 4825088. Throughput: 0: 629.6. Samples: 204860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:43:14,010][20018] Avg episode reward: [(0, '26.224')]
-[2024-06-06 13:43:19,003][20018] Fps is (10 sec: 2048.1, 60 sec: 2594.1, 300 sec: 2693.6). Total num frames: 4837376. Throughput: 0: 620.8. Samples: 208480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:43:19,008][20018] Avg episode reward: [(0, '26.233')]
-[2024-06-06 13:43:19,021][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001181_4837376.pth...
-[2024-06-06 13:43:19,163][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001020_4177920.pth
-[2024-06-06 13:43:24,003][20018] Fps is (10 sec: 3277.5, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 4857856. Throughput: 0: 643.1. Samples: 210964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:43:24,005][20018] Avg episode reward: [(0, '27.320')]
-[2024-06-06 13:43:26,526][20350] Updated weights for policy 0, policy_version 1188 (0.0036)
-[2024-06-06 13:43:29,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2594.0, 300 sec: 2693.6). Total num frames: 4870144. Throughput: 0: 659.2. Samples: 215916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:43:29,008][20018] Avg episode reward: [(0, '26.221')]
-[2024-06-06 13:43:34,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2679.8). Total num frames: 4878336. Throughput: 0: 614.2. Samples: 218906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:43:34,005][20018] Avg episode reward: [(0, '25.327')]
-[2024-06-06 13:43:39,003][20018] Fps is (10 sec: 2458.2, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 4894720. Throughput: 0: 624.6. Samples: 221012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:43:39,011][20018] Avg episode reward: [(0, '24.100')]
-[2024-06-06 13:43:42,093][20350] Updated weights for policy 0, policy_version 1198 (0.0031)
-[2024-06-06 13:43:44,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 4911104. Throughput: 0: 659.3. Samples: 225922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:43:44,005][20018] Avg episode reward: [(0, '24.614')]
-[2024-06-06 13:43:49,003][20018] Fps is (10 sec: 2867.3, 60 sec: 2594.1, 300 sec: 2693.6). Total num frames: 4923392. Throughput: 0: 643.6. Samples: 229834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:43:49,008][20018] Avg episode reward: [(0, '24.841')]
-[2024-06-06 13:43:54,003][20018] Fps is (10 sec: 2457.5, 60 sec: 2594.2, 300 sec: 2707.5). Total num frames: 4935680. Throughput: 0: 643.3. Samples: 231336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:43:54,005][20018] Avg episode reward: [(0, '26.531')]
-[2024-06-06 13:43:57,414][20350] Updated weights for policy 0, policy_version 1208 (0.0026)
-[2024-06-06 13:43:59,003][20018] Fps is (10 sec: 2867.1, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 4952064. Throughput: 0: 689.0. Samples: 235864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:43:59,011][20018] Avg episode reward: [(0, '26.444')]
-[2024-06-06 13:44:04,003][20018] Fps is (10 sec: 3276.9, 60 sec: 2662.5, 300 sec: 2735.3). Total num frames: 4968448. Throughput: 0: 718.9. Samples: 240830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:44:04,009][20018] Avg episode reward: [(0, '25.697')]
-[2024-06-06 13:44:09,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2662.3, 300 sec: 2735.3). Total num frames: 4976640. Throughput: 0: 697.9. Samples: 242372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:44:09,007][20018] Avg episode reward: [(0, '25.071')]
-[2024-06-06 13:44:12,791][20350] Updated weights for policy 0, policy_version 1218 (0.0020)
-[2024-06-06 13:44:14,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 4988928. Throughput: 0: 662.2. Samples: 245712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:44:14,006][20018] Avg episode reward: [(0, '25.129')]
-[2024-06-06 13:44:19,003][20018] Fps is (10 sec: 3277.4, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5009408. Throughput: 0: 708.8. Samples: 250800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:44:19,005][20018] Avg episode reward: [(0, '24.814')]
-[2024-06-06 13:44:24,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2730.5, 300 sec: 2749.2). Total num frames: 5021696. Throughput: 0: 718.1. Samples: 253328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:44:24,008][20018] Avg episode reward: [(0, '24.198')]
-[2024-06-06 13:44:27,256][20350] Updated weights for policy 0, policy_version 1228 (0.0038)
-[2024-06-06 13:44:29,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.5, 300 sec: 2735.3). Total num frames: 5029888. Throughput: 0: 676.8. Samples: 256378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:44:29,005][20018] Avg episode reward: [(0, '22.802')]
-[2024-06-06 13:44:34,003][20018] Fps is (10 sec: 2867.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5050368. Throughput: 0: 688.9. Samples: 260836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-06-06 13:44:34,006][20018] Avg episode reward: [(0, '22.799')]
-[2024-06-06 13:44:39,003][20018] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5066752. Throughput: 0: 711.8. Samples: 263366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:44:39,005][20018] Avg episode reward: [(0, '23.041')]
-[2024-06-06 13:44:41,049][20350] Updated weights for policy 0, policy_version 1238 (0.0036)
-[2024-06-06 13:44:44,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5074944. Throughput: 0: 698.6. Samples: 267302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:44:44,005][20018] Avg episode reward: [(0, '23.123')]
-[2024-06-06 13:44:49,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5087232. Throughput: 0: 665.2. Samples: 270766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:44:49,008][20018] Avg episode reward: [(0, '24.571')]
-[2024-06-06 13:44:54,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5107712. Throughput: 0: 687.5. Samples: 273308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:44:54,005][20018] Avg episode reward: [(0, '25.210')]
-[2024-06-06 13:44:55,121][20350] Updated weights for policy 0, policy_version 1248 (0.0025)
-[2024-06-06 13:44:59,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2749.2). Total num frames: 5120000. Throughput: 0: 734.1. Samples: 278750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:44:59,008][20018] Avg episode reward: [(0, '25.624')]
-[2024-06-06 13:45:04,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5132288. Throughput: 0: 695.2. Samples: 282086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:45:04,005][20018] Avg episode reward: [(0, '26.340')]
-[2024-06-06 13:45:09,005][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2763.0). Total num frames: 5148672. Throughput: 0: 684.1. Samples: 284114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:45:09,012][20018] Avg episode reward: [(0, '26.613')]
-[2024-06-06 13:45:09,735][20350] Updated weights for policy 0, policy_version 1258 (0.0021)
-[2024-06-06 13:45:14,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2763.1). Total num frames: 5165056. Throughput: 0: 728.5. Samples: 289160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:45:14,012][20018] Avg episode reward: [(0, '28.372')]
-[2024-06-06 13:45:19,003][20018] Fps is (10 sec: 2868.0, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5177344. Throughput: 0: 720.7. Samples: 293266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:45:19,009][20018] Avg episode reward: [(0, '28.487')]
-[2024-06-06 13:45:19,023][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001264_5177344.pth...
-[2024-06-06 13:45:19,218][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001103_4517888.pth
-[2024-06-06 13:45:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2763.1). Total num frames: 5189632. Throughput: 0: 696.0. Samples: 294686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:45:24,010][20018] Avg episode reward: [(0, '29.361')]
-[2024-06-06 13:45:24,013][20333] Saving new best policy, reward=29.361!
-[2024-06-06 13:45:24,983][20350] Updated weights for policy 0, policy_version 1268 (0.0045)
-[2024-06-06 13:45:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2763.1). Total num frames: 5206016. Throughput: 0: 709.5. Samples: 299230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:45:29,010][20018] Avg episode reward: [(0, '28.119')]
-[2024-06-06 13:45:34,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5222400. Throughput: 0: 741.4. Samples: 304130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:45:34,005][20018] Avg episode reward: [(0, '28.570')]
-[2024-06-06 13:45:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5230592. Throughput: 0: 718.2. Samples: 305628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:45:39,013][20018] Avg episode reward: [(0, '29.010')]
-[2024-06-06 13:45:40,207][20350] Updated weights for policy 0, policy_version 1278 (0.0037)
-[2024-06-06 13:45:44,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5242880. Throughput: 0: 669.8. Samples: 308888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:45:44,005][20018] Avg episode reward: [(0, '28.986')]
-[2024-06-06 13:45:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 5259264. Throughput: 0: 707.2. Samples: 313910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:45:49,010][20018] Avg episode reward: [(0, '29.509')]
-[2024-06-06 13:45:49,104][20333] Saving new best policy, reward=29.509!
-[2024-06-06 13:45:54,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5271552. Throughput: 0: 714.5. Samples: 316264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:45:54,010][20018] Avg episode reward: [(0, '30.129')]
-[2024-06-06 13:45:54,012][20333] Saving new best policy, reward=30.129!
-[2024-06-06 13:45:54,691][20350] Updated weights for policy 0, policy_version 1288 (0.0019)
-[2024-06-06 13:45:59,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2749.2). Total num frames: 5283840. Throughput: 0: 668.8. Samples: 319258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:45:59,005][20018] Avg episode reward: [(0, '29.440')]
-[2024-06-06 13:46:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5300224. Throughput: 0: 677.4. Samples: 323748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:46:04,005][20018] Avg episode reward: [(0, '27.721')]
-[2024-06-06 13:46:08,245][20350] Updated weights for policy 0, policy_version 1298 (0.0016)
-[2024-06-06 13:46:09,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2799.0, 300 sec: 2763.0). Total num frames: 5316608. Throughput: 0: 701.0. Samples: 326234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:46:09,014][20018] Avg episode reward: [(0, '28.397')]
-[2024-06-06 13:46:14,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5328896. Throughput: 0: 687.9. Samples: 330186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:46:14,005][20018] Avg episode reward: [(0, '28.027')]
-[2024-06-06 13:46:19,003][20018] Fps is (10 sec: 2048.5, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 5337088. Throughput: 0: 655.3. Samples: 333618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:46:19,008][20018] Avg episode reward: [(0, '28.289')]
-[2024-06-06 13:46:23,737][20350] Updated weights for policy 0, policy_version 1308 (0.0028)
-[2024-06-06 13:46:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5357568. Throughput: 0: 678.7. Samples: 336170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:46:24,009][20018] Avg episode reward: [(0, '27.744')]
-[2024-06-06 13:46:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5369856. Throughput: 0: 719.3. Samples: 341258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:46:29,011][20018] Avg episode reward: [(0, '27.312')]
-[2024-06-06 13:46:34,005][20018] Fps is (10 sec: 2456.9, 60 sec: 2662.3, 300 sec: 2749.2). Total num frames: 5382144. Throughput: 0: 675.3. Samples: 344302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:46:34,008][20018] Avg episode reward: [(0, '26.906')]
-[2024-06-06 13:46:38,998][20350] Updated weights for policy 0, policy_version 1318 (0.0028)
-[2024-06-06 13:46:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5398528. Throughput: 0: 665.5. Samples: 346212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:46:39,005][20018] Avg episode reward: [(0, '27.093')]
-[2024-06-06 13:46:44,003][20018] Fps is (10 sec: 2868.0, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5410816. Throughput: 0: 710.7. Samples: 351240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:46:44,006][20018] Avg episode reward: [(0, '28.390')]
-[2024-06-06 13:46:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5423104. Throughput: 0: 699.8. Samples: 355238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:46:49,007][20018] Avg episode reward: [(0, '29.759')]
-[2024-06-06 13:46:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5435392. Throughput: 0: 678.9. Samples: 356782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:46:54,010][20018] Avg episode reward: [(0, '28.427')]
-[2024-06-06 13:46:54,545][20350] Updated weights for policy 0, policy_version 1328 (0.0046)
-[2024-06-06 13:46:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5455872. Throughput: 0: 689.9. Samples: 361232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:46:59,005][20018] Avg episode reward: [(0, '27.701')]
-[2024-06-06 13:47:04,008][20018] Fps is (10 sec: 3274.9, 60 sec: 2798.7, 300 sec: 2749.1). Total num frames: 5468160. Throughput: 0: 725.8. Samples: 366284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:47:04,011][20018] Avg episode reward: [(0, '28.158')]
-[2024-06-06 13:47:08,721][20350] Updated weights for policy 0, policy_version 1338 (0.0026)
-[2024-06-06 13:47:09,007][20018] Fps is (10 sec: 2456.4, 60 sec: 2730.6, 300 sec: 2749.1). Total num frames: 5480448. Throughput: 0: 703.2. Samples: 367818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:47:09,011][20018] Avg episode reward: [(0, '28.420')]
-[2024-06-06 13:47:14,003][20018] Fps is (10 sec: 2459.0, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5492736. Throughput: 0: 665.4. Samples: 371202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:47:14,009][20018] Avg episode reward: [(0, '29.323')]
-[2024-06-06 13:47:19,003][20018] Fps is (10 sec: 2868.6, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 5509120. Throughput: 0: 708.7. Samples: 376192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:47:19,009][20018] Avg episode reward: [(0, '30.120')]
-[2024-06-06 13:47:19,023][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth...
-[2024-06-06 13:47:19,152][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001181_4837376.pth
-[2024-06-06 13:47:22,380][20350] Updated weights for policy 0, policy_version 1348 (0.0015)
-[2024-06-06 13:47:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5521408. Throughput: 0: 721.7. Samples: 378688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:47:24,007][20018] Avg episode reward: [(0, '28.871')]
-[2024-06-06 13:47:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5533696. Throughput: 0: 678.1. Samples: 381756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:47:29,005][20018] Avg episode reward: [(0, '29.723')]
-[2024-06-06 13:47:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2799.1, 300 sec: 2763.1). Total num frames: 5550080. Throughput: 0: 684.9. Samples: 386060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:47:34,017][20018] Avg episode reward: [(0, '29.338')]
-[2024-06-06 13:47:37,416][20350] Updated weights for policy 0, policy_version 1358 (0.0027)
-[2024-06-06 13:47:39,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5566464. Throughput: 0: 706.0. Samples: 388552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:47:39,009][20018] Avg episode reward: [(0, '31.329')]
-[2024-06-06 13:47:39,021][20333] Saving new best policy, reward=31.329!
-[2024-06-06 13:47:44,004][20018] Fps is (10 sec: 2457.4, 60 sec: 2730.6, 300 sec: 2735.3). Total num frames: 5574656. Throughput: 0: 696.3. Samples: 392566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:47:44,007][20018] Avg episode reward: [(0, '31.324')]
-[2024-06-06 13:47:49,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5586944. Throughput: 0: 655.0. Samples: 395754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:47:49,005][20018] Avg episode reward: [(0, '28.732')]
-[2024-06-06 13:47:54,006][20018] Fps is (10 sec: 2457.1, 60 sec: 2730.5, 300 sec: 2735.3). Total num frames: 5599232. Throughput: 0: 667.0. Samples: 397834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:47:54,010][20018] Avg episode reward: [(0, '28.208')]
-[2024-06-06 13:47:54,917][20350] Updated weights for policy 0, policy_version 1368 (0.0016)
-[2024-06-06 13:47:59,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2707.6). Total num frames: 5607424. Throughput: 0: 659.3. Samples: 400872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:47:59,011][20018] Avg episode reward: [(0, '27.807')]
-[2024-06-06 13:48:04,003][20018] Fps is (10 sec: 2048.6, 60 sec: 2526.1, 300 sec: 2721.4). Total num frames: 5619712. Throughput: 0: 614.0. Samples: 403820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:48:04,008][20018] Avg episode reward: [(0, '27.721')]
-[2024-06-06 13:48:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.1, 300 sec: 2735.3). Total num frames: 5632000. Throughput: 0: 594.4. Samples: 405434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:48:09,012][20018] Avg episode reward: [(0, '29.175')]
-[2024-06-06 13:48:11,618][20350] Updated weights for policy 0, policy_version 1378 (0.0033)
-[2024-06-06 13:48:14,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 5648384. Throughput: 0: 639.5. Samples: 410534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:48:14,005][20018] Avg episode reward: [(0, '28.412')]
-[2024-06-06 13:48:19,006][20018] Fps is (10 sec: 3275.8, 60 sec: 2594.0, 300 sec: 2735.3). Total num frames: 5664768. Throughput: 0: 638.8. Samples: 414808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:48:19,012][20018] Avg episode reward: [(0, '28.011')]
-[2024-06-06 13:48:24,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2525.8, 300 sec: 2721.4). Total num frames: 5672960. Throughput: 0: 616.5. Samples: 416296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:48:24,013][20018] Avg episode reward: [(0, '26.777')]
-[2024-06-06 13:48:27,085][20350] Updated weights for policy 0, policy_version 1388 (0.0027)
-[2024-06-06 13:48:29,003][20018] Fps is (10 sec: 2458.4, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 5689344. Throughput: 0: 619.1. Samples: 420426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:48:29,005][20018] Avg episode reward: [(0, '26.795')]
-[2024-06-06 13:48:34,003][20018] Fps is (10 sec: 3277.3, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 5705728. Throughput: 0: 659.1. Samples: 425412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:48:34,010][20018] Avg episode reward: [(0, '27.800')]
-[2024-06-06 13:48:39,004][20018] Fps is (10 sec: 2866.8, 60 sec: 2525.8, 300 sec: 2735.3). Total num frames: 5718016. Throughput: 0: 656.6. Samples: 427378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:48:39,007][20018] Avg episode reward: [(0, '28.618')]
-[2024-06-06 13:48:42,087][20350] Updated weights for policy 0, policy_version 1398 (0.0029)
-[2024-06-06 13:48:44,005][20018] Fps is (10 sec: 2456.9, 60 sec: 2594.1, 300 sec: 2735.3). Total num frames: 5730304. Throughput: 0: 658.9. Samples: 430524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:48:44,008][20018] Avg episode reward: [(0, '28.016')]
-[2024-06-06 13:48:49,003][20018] Fps is (10 sec: 2867.6, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 5746688. Throughput: 0: 709.1. Samples: 435730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:48:49,011][20018] Avg episode reward: [(0, '27.836')]
-[2024-06-06 13:48:54,003][20018] Fps is (10 sec: 3277.7, 60 sec: 2730.8, 300 sec: 2749.2). Total num frames: 5763072. Throughput: 0: 732.4. Samples: 438394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:48:54,005][20018] Avg episode reward: [(0, '28.303')]
-[2024-06-06 13:48:54,897][20350] Updated weights for policy 0, policy_version 1408 (0.0022)
-[2024-06-06 13:48:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 5775360. Throughput: 0: 696.5. Samples: 441876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:48:59,005][20018] Avg episode reward: [(0, '27.045')]
-[2024-06-06 13:49:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5791744. Throughput: 0: 698.0. Samples: 446214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:49:04,005][20018] Avg episode reward: [(0, '27.788')]
-[2024-06-06 13:49:08,580][20350] Updated weights for policy 0, policy_version 1418 (0.0015)
-[2024-06-06 13:49:09,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2777.0). Total num frames: 5808128. Throughput: 0: 725.2. Samples: 448930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:49:09,010][20018] Avg episode reward: [(0, '27.232')]
-[2024-06-06 13:49:14,004][20018] Fps is (10 sec: 2866.7, 60 sec: 2867.1, 300 sec: 2749.2). Total num frames: 5820416. Throughput: 0: 731.0. Samples: 453324. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:49:14,006][20018] Avg episode reward: [(0, '27.308')]
-[2024-06-06 13:49:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2749.2). Total num frames: 5832704. Throughput: 0: 689.1. Samples: 456422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:49:19,010][20018] Avg episode reward: [(0, '27.351')]
-[2024-06-06 13:49:19,022][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001424_5832704.pth...
-[2024-06-06 13:49:19,171][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001264_5177344.pth
-[2024-06-06 13:49:23,901][20350] Updated weights for policy 0, policy_version 1428 (0.0028)
-[2024-06-06 13:49:24,003][20018] Fps is (10 sec: 2867.6, 60 sec: 2935.5, 300 sec: 2776.9). Total num frames: 5849088. Throughput: 0: 699.8. Samples: 458870. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-06-06 13:49:24,008][20018] Avg episode reward: [(0, '25.795')]
-[2024-06-06 13:49:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 5861376. Throughput: 0: 741.7. Samples: 463900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:49:29,009][20018] Avg episode reward: [(0, '25.712')]
-[2024-06-06 13:49:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 5873664. Throughput: 0: 700.2. Samples: 467238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:49:34,005][20018] Avg episode reward: [(0, '25.408')]
-[2024-06-06 13:49:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2749.2). Total num frames: 5885952. Throughput: 0: 674.3. Samples: 468738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:49:39,011][20018] Avg episode reward: [(0, '24.941')]
-[2024-06-06 13:49:39,527][20350] Updated weights for policy 0, policy_version 1438 (0.0052)
-[2024-06-06 13:49:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.3, 300 sec: 2763.1). Total num frames: 5902336. Throughput: 0: 708.3. Samples: 473748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:49:44,005][20018] Avg episode reward: [(0, '25.818')]
-[2024-06-06 13:49:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 5914624. Throughput: 0: 707.2. Samples: 478036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:49:49,011][20018] Avg episode reward: [(0, '26.302')]
-[2024-06-06 13:49:54,004][20018] Fps is (10 sec: 2457.3, 60 sec: 2730.6, 300 sec: 2735.3). Total num frames: 5926912. Throughput: 0: 679.9. Samples: 479524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:49:54,007][20018] Avg episode reward: [(0, '26.634')]
-[2024-06-06 13:49:54,915][20350] Updated weights for policy 0, policy_version 1448 (0.0042)
-[2024-06-06 13:49:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5943296. Throughput: 0: 671.6. Samples: 483544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:49:59,006][20018] Avg episode reward: [(0, '25.409')]
-[2024-06-06 13:50:04,003][20018] Fps is (10 sec: 3277.2, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5959680. Throughput: 0: 712.5. Samples: 488486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:50:04,005][20018] Avg episode reward: [(0, '26.158')]
-[2024-06-06 13:50:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 5967872. Throughput: 0: 699.3. Samples: 490340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:50:09,005][20018] Avg episode reward: [(0, '27.691')]
-[2024-06-06 13:50:09,682][20350] Updated weights for policy 0, policy_version 1458 (0.0026)
-[2024-06-06 13:50:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.5, 300 sec: 2721.4). Total num frames: 5980160. Throughput: 0: 653.2. Samples: 493294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:50:14,009][20018] Avg episode reward: [(0, '27.932')]
-[2024-06-06 13:50:19,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5996544. Throughput: 0: 688.4. Samples: 498216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:50:19,006][20018] Avg episode reward: [(0, '28.560')]
-[2024-06-06 13:50:23,591][20350] Updated weights for policy 0, policy_version 1468 (0.0021)
-[2024-06-06 13:50:24,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6012928. Throughput: 0: 710.6. Samples: 500714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:50:24,006][20018] Avg episode reward: [(0, '29.066')]
-[2024-06-06 13:50:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 6021120. Throughput: 0: 674.4. Samples: 504094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:50:29,005][20018] Avg episode reward: [(0, '29.491')]
-[2024-06-06 13:50:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6037504. Throughput: 0: 666.8. Samples: 508040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:50:34,008][20018] Avg episode reward: [(0, '29.805')]
-[2024-06-06 13:50:38,366][20350] Updated weights for policy 0, policy_version 1478 (0.0018)
-[2024-06-06 13:50:39,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 6053888. Throughput: 0: 688.5. Samples: 510506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:50:39,005][20018] Avg episode reward: [(0, '29.112')]
-[2024-06-06 13:50:44,004][20018] Fps is (10 sec: 2866.7, 60 sec: 2730.6, 300 sec: 2735.3). Total num frames: 6066176. Throughput: 0: 698.1. Samples: 514960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-06-06 13:50:44,007][20018] Avg episode reward: [(0, '30.930')]
-[2024-06-06 13:50:49,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6074368. Throughput: 0: 654.6. Samples: 517942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:50:49,005][20018] Avg episode reward: [(0, '30.438')]
-[2024-06-06 13:50:54,003][20018] Fps is (10 sec: 2458.0, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6090752. Throughput: 0: 666.6. Samples: 520336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:50:54,011][20018] Avg episode reward: [(0, '29.671')]
-[2024-06-06 13:50:54,100][20350] Updated weights for policy 0, policy_version 1488 (0.0018)
-[2024-06-06 13:50:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6107136. Throughput: 0: 711.0. Samples: 525290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:50:59,006][20018] Avg episode reward: [(0, '29.318')]
-[2024-06-06 13:51:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6119424. Throughput: 0: 679.6. Samples: 528798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-06-06 13:51:04,009][20018] Avg episode reward: [(0, '29.295')]
-[2024-06-06 13:51:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6131712. Throughput: 0: 658.0. Samples: 530326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:51:09,005][20018] Avg episode reward: [(0, '28.874')]
-[2024-06-06 13:51:09,429][20350] Updated weights for policy 0, policy_version 1498 (0.0039)
-[2024-06-06 13:51:14,008][20018] Fps is (10 sec: 2865.6, 60 sec: 2798.7, 300 sec: 2749.1). Total num frames: 6148096. Throughput: 0: 691.1. Samples: 535198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:51:14,016][20018] Avg episode reward: [(0, '29.764')]
-[2024-06-06 13:51:19,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6160384. Throughput: 0: 703.8. Samples: 539712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:51:19,008][20018] Avg episode reward: [(0, '29.126')]
-[2024-06-06 13:51:19,064][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001505_6164480.pth...
-[2024-06-06 13:51:19,218][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth
-[2024-06-06 13:51:24,003][20018] Fps is (10 sec: 2459.0, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6172672. Throughput: 0: 682.7. Samples: 541226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-06-06 13:51:24,005][20018] Avg episode reward: [(0, '28.099')]
-[2024-06-06 13:51:25,018][20350] Updated weights for policy 0, policy_version 1508 (0.0024)
-[2024-06-06 13:51:29,003][20018] Fps is (10 sec: 2867.1, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6189056. Throughput: 0: 667.7. Samples: 545004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-06-06 13:51:29,005][20018] Avg episode reward: [(0, '27.215')]
-[2024-06-06 13:51:34,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2735.3). Total num frames: 6205440. Throughput: 0: 712.8. Samples: 550022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:51:34,007][20018] Avg episode reward: [(0, '25.411')]
-[2024-06-06 13:51:37,990][20350] Updated weights for policy 0, policy_version 1518 (0.0020)
-[2024-06-06 13:51:39,003][20018] Fps is (10 sec: 2867.3, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6217728. Throughput: 0: 711.2. Samples: 552338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:51:39,006][20018] Avg episode reward: [(0, '25.190')]
-[2024-06-06 13:51:44,003][20018] Fps is (10 sec: 2458.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6230016. Throughput: 0: 670.4. Samples: 555458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-06-06 13:51:44,011][20018] Avg episode reward: [(0, '24.904')]
-[2024-06-06 13:51:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6246400. Throughput: 0: 696.0. Samples: 560118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-06-06 13:51:49,005][20018] Avg episode reward: [(0, '23.187')]
-[2024-06-06 13:51:52,289][20350] Updated weights for policy 0, policy_version 1528 (0.0036)
-[2024-06-06 13:51:54,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 6262784. Throughput: 0: 718.1. Samples: 562642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:51:54,009][20018] Avg episode reward: [(0, '23.643')]
-[2024-06-06 13:51:59,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.5). Total num frames: 6270976. Throughput: 0: 693.8. Samples: 566416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-06-06 13:51:59,009][20018] Avg episode reward: [(0, '24.250')]
-[2024-06-06 13:52:04,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6287360. Throughput: 0: 678.8. Samples: 570258.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:52:04,005][20018] Avg episode reward: [(0, '25.145')] -[2024-06-06 13:52:07,087][20350] Updated weights for policy 0, policy_version 1538 (0.0019) -[2024-06-06 13:52:09,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6303744. Throughput: 0: 704.5. Samples: 572928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:52:09,006][20018] Avg episode reward: [(0, '26.829')] -[2024-06-06 13:52:14,004][20018] Fps is (10 sec: 2866.8, 60 sec: 2799.1, 300 sec: 2735.3). Total num frames: 6316032. Throughput: 0: 732.2. Samples: 577952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:52:14,012][20018] Avg episode reward: [(0, '26.352')] -[2024-06-06 13:52:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6328320. Throughput: 0: 688.4. Samples: 580996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:52:19,012][20018] Avg episode reward: [(0, '26.017')] -[2024-06-06 13:52:22,239][20350] Updated weights for policy 0, policy_version 1548 (0.0036) -[2024-06-06 13:52:24,003][20018] Fps is (10 sec: 2867.7, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6344704. Throughput: 0: 683.8. Samples: 583108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:52:24,008][20018] Avg episode reward: [(0, '25.576')] -[2024-06-06 13:52:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6361088. Throughput: 0: 729.5. Samples: 588286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:52:29,008][20018] Avg episode reward: [(0, '24.663')] -[2024-06-06 13:52:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2799.1, 300 sec: 2735.3). Total num frames: 6373376. Throughput: 0: 709.6. Samples: 592052. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:52:34,008][20018] Avg episode reward: [(0, '23.795')] -[2024-06-06 13:52:38,398][20350] Updated weights for policy 0, policy_version 1558 (0.0029) -[2024-06-06 13:52:39,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6381568. Throughput: 0: 682.4. Samples: 593352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:52:39,008][20018] Avg episode reward: [(0, '23.474')] -[2024-06-06 13:52:44,003][20018] Fps is (10 sec: 1638.4, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6389760. Throughput: 0: 660.3. Samples: 596130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:52:44,005][20018] Avg episode reward: [(0, '23.506')] -[2024-06-06 13:52:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2735.3). Total num frames: 6406144. Throughput: 0: 669.0. Samples: 600364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:52:49,010][20018] Avg episode reward: [(0, '23.392')] -[2024-06-06 13:52:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2735.3). Total num frames: 6414336. Throughput: 0: 650.0. Samples: 602180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:52:54,008][20018] Avg episode reward: [(0, '23.988')] -[2024-06-06 13:52:56,234][20350] Updated weights for policy 0, policy_version 1568 (0.0023) -[2024-06-06 13:52:59,003][20018] Fps is (10 sec: 2048.1, 60 sec: 2594.1, 300 sec: 2735.3). Total num frames: 6426624. Throughput: 0: 606.3. Samples: 605236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:52:59,010][20018] Avg episode reward: [(0, '23.889')] -[2024-06-06 13:53:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 6447104. Throughput: 0: 646.8. Samples: 610102. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:53:04,011][20018] Avg episode reward: [(0, '26.348')] -[2024-06-06 13:53:09,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 6459392. Throughput: 0: 656.2. Samples: 612638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:53:09,006][20018] Avg episode reward: [(0, '26.564')] -[2024-06-06 13:53:09,801][20350] Updated weights for policy 0, policy_version 1578 (0.0015) -[2024-06-06 13:53:14,003][20018] Fps is (10 sec: 2457.5, 60 sec: 2594.2, 300 sec: 2735.3). Total num frames: 6471680. Throughput: 0: 618.2. Samples: 616106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:53:14,010][20018] Avg episode reward: [(0, '27.438')] -[2024-06-06 13:53:19,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 6483968. Throughput: 0: 622.3. Samples: 620056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:53:19,006][20018] Avg episode reward: [(0, '27.859')] -[2024-06-06 13:53:19,021][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001583_6483968.pth... -[2024-06-06 13:53:19,166][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001424_5832704.pth -[2024-06-06 13:53:24,003][20018] Fps is (10 sec: 2867.4, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 6500352. Throughput: 0: 647.5. Samples: 622490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:53:24,012][20018] Avg episode reward: [(0, '28.806')] -[2024-06-06 13:53:24,277][20350] Updated weights for policy 0, policy_version 1588 (0.0020) -[2024-06-06 13:53:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2735.3). Total num frames: 6512640. Throughput: 0: 685.0. Samples: 626956. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:53:29,013][20018] Avg episode reward: [(0, '29.426')] -[2024-06-06 13:53:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2735.3). Total num frames: 6524928. Throughput: 0: 658.1. Samples: 629980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:53:34,005][20018] Avg episode reward: [(0, '28.733')] -[2024-06-06 13:53:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 6541312. Throughput: 0: 672.5. Samples: 632444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:53:39,011][20018] Avg episode reward: [(0, '28.706')] -[2024-06-06 13:53:39,721][20350] Updated weights for policy 0, policy_version 1598 (0.0020) -[2024-06-06 13:53:44,006][20018] Fps is (10 sec: 3275.6, 60 sec: 2798.8, 300 sec: 2749.1). Total num frames: 6557696. Throughput: 0: 717.3. Samples: 637516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:53:44,009][20018] Avg episode reward: [(0, '25.904')] -[2024-06-06 13:53:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6565888. Throughput: 0: 682.5. Samples: 640816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:53:49,013][20018] Avg episode reward: [(0, '26.589')] -[2024-06-06 13:53:54,003][20018] Fps is (10 sec: 2048.8, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6578176. Throughput: 0: 661.6. Samples: 642410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:53:54,011][20018] Avg episode reward: [(0, '25.272')] -[2024-06-06 13:53:55,364][20350] Updated weights for policy 0, policy_version 1608 (0.0026) -[2024-06-06 13:53:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 6598656. Throughput: 0: 694.4. Samples: 647352. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2024-06-06 13:53:59,010][20018] Avg episode reward: [(0, '25.098')] -[2024-06-06 13:54:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6610944. Throughput: 0: 706.0. Samples: 651826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:54:04,010][20018] Avg episode reward: [(0, '25.363')] -[2024-06-06 13:54:09,004][20018] Fps is (10 sec: 2047.7, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 6619136. Throughput: 0: 683.4. Samples: 653244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:54:09,007][20018] Avg episode reward: [(0, '26.583')] -[2024-06-06 13:54:10,830][20350] Updated weights for policy 0, policy_version 1618 (0.0027) -[2024-06-06 13:54:14,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6635520. Throughput: 0: 671.2. Samples: 657160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-06-06 13:54:14,009][20018] Avg episode reward: [(0, '26.282')] -[2024-06-06 13:54:19,003][20018] Fps is (10 sec: 3277.3, 60 sec: 2798.9, 300 sec: 2721.4). Total num frames: 6651904. Throughput: 0: 714.1. Samples: 662116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:54:19,010][20018] Avg episode reward: [(0, '25.950')] -[2024-06-06 13:54:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6664192. Throughput: 0: 702.9. Samples: 664074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:54:24,012][20018] Avg episode reward: [(0, '26.865')] -[2024-06-06 13:54:25,249][20350] Updated weights for policy 0, policy_version 1628 (0.0020) -[2024-06-06 13:54:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6676480. Throughput: 0: 657.3. Samples: 667094. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:54:29,005][20018] Avg episode reward: [(0, '26.736')] -[2024-06-06 13:54:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6692864. Throughput: 0: 693.7. Samples: 672034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:54:34,010][20018] Avg episode reward: [(0, '27.051')] -[2024-06-06 13:54:39,003][20350] Updated weights for policy 0, policy_version 1638 (0.0015) -[2024-06-06 13:54:39,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2798.8, 300 sec: 2735.3). Total num frames: 6709248. Throughput: 0: 714.3. Samples: 674554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:54:39,012][20018] Avg episode reward: [(0, '25.769')] -[2024-06-06 13:54:44,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.6, 300 sec: 2721.4). Total num frames: 6717440. Throughput: 0: 682.4. Samples: 678060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:54:44,008][20018] Avg episode reward: [(0, '24.369')] -[2024-06-06 13:54:49,003][20018] Fps is (10 sec: 2458.1, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6733824. Throughput: 0: 670.1. Samples: 681982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:54:49,005][20018] Avg episode reward: [(0, '24.376')] -[2024-06-06 13:54:53,706][20350] Updated weights for policy 0, policy_version 1648 (0.0020) -[2024-06-06 13:54:54,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 6750208. Throughput: 0: 691.7. Samples: 684370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:54:54,006][20018] Avg episode reward: [(0, '24.360')] -[2024-06-06 13:54:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6762496. Throughput: 0: 702.0. Samples: 688752. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:54:59,011][20018] Avg episode reward: [(0, '25.821')] -[2024-06-06 13:55:04,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6770688. Throughput: 0: 658.8. Samples: 691760. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2024-06-06 13:55:04,005][20018] Avg episode reward: [(0, '25.932')] -[2024-06-06 13:55:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2735.3). Total num frames: 6787072. Throughput: 0: 670.7. Samples: 694254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:55:09,008][20018] Avg episode reward: [(0, '27.138')] -[2024-06-06 13:55:09,221][20350] Updated weights for policy 0, policy_version 1658 (0.0029) -[2024-06-06 13:55:14,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6803456. Throughput: 0: 714.8. Samples: 699260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:55:14,010][20018] Avg episode reward: [(0, '27.601')] -[2024-06-06 13:55:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 6811648. Throughput: 0: 680.2. Samples: 702644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:55:19,005][20018] Avg episode reward: [(0, '27.858')] -[2024-06-06 13:55:19,110][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001664_6815744.pth... -[2024-06-06 13:55:19,341][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001505_6164480.pth -[2024-06-06 13:55:24,003][20018] Fps is (10 sec: 2457.8, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6828032. Throughput: 0: 657.2. Samples: 704126. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:55:24,011][20018] Avg episode reward: [(0, '29.921')] -[2024-06-06 13:55:24,907][20350] Updated weights for policy 0, policy_version 1668 (0.0025) -[2024-06-06 13:55:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6844416. Throughput: 0: 685.9. Samples: 708926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:55:29,008][20018] Avg episode reward: [(0, '30.611')] -[2024-06-06 13:55:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6856704. Throughput: 0: 700.0. Samples: 713482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:55:34,011][20018] Avg episode reward: [(0, '29.801')] -[2024-06-06 13:55:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.5, 300 sec: 2721.4). Total num frames: 6868992. Throughput: 0: 679.5. Samples: 714946. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:55:39,010][20018] Avg episode reward: [(0, '29.201')] -[2024-06-06 13:55:40,356][20350] Updated weights for policy 0, policy_version 1678 (0.0047) -[2024-06-06 13:55:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 6885376. Throughput: 0: 664.5. Samples: 718654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:55:44,005][20018] Avg episode reward: [(0, '29.323')] -[2024-06-06 13:55:49,005][20018] Fps is (10 sec: 3276.0, 60 sec: 2798.8, 300 sec: 2749.2). Total num frames: 6901760. Throughput: 0: 708.9. Samples: 723662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:55:49,012][20018] Avg episode reward: [(0, '29.066')] -[2024-06-06 13:55:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6909952. Throughput: 0: 699.1. Samples: 725714. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:55:54,005][20018] Avg episode reward: [(0, '29.126')] -[2024-06-06 13:55:54,497][20350] Updated weights for policy 0, policy_version 1688 (0.0065) -[2024-06-06 13:55:59,003][20018] Fps is (10 sec: 2048.5, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6922240. Throughput: 0: 655.7. Samples: 728768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:55:59,005][20018] Avg episode reward: [(0, '28.612')] -[2024-06-06 13:56:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6938624. Throughput: 0: 685.4. Samples: 733488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:56:04,005][20018] Avg episode reward: [(0, '28.922')] -[2024-06-06 13:56:08,421][20350] Updated weights for policy 0, policy_version 1698 (0.0027) -[2024-06-06 13:56:09,008][20018] Fps is (10 sec: 3275.0, 60 sec: 2798.7, 300 sec: 2735.3). Total num frames: 6955008. Throughput: 0: 707.8. Samples: 735982. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) -[2024-06-06 13:56:09,011][20018] Avg episode reward: [(0, '28.923')] -[2024-06-06 13:56:14,007][20018] Fps is (10 sec: 2456.5, 60 sec: 2662.2, 300 sec: 2721.4). Total num frames: 6963200. Throughput: 0: 683.2. Samples: 739672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:56:14,014][20018] Avg episode reward: [(0, '28.466')] -[2024-06-06 13:56:19,003][20018] Fps is (10 sec: 2459.0, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6979584. Throughput: 0: 664.4. Samples: 743380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:56:19,006][20018] Avg episode reward: [(0, '28.805')] -[2024-06-06 13:56:23,762][20350] Updated weights for policy 0, policy_version 1708 (0.0021) -[2024-06-06 13:56:24,003][20018] Fps is (10 sec: 3278.3, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6995968. Throughput: 0: 686.4. Samples: 745832. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:56:24,009][20018] Avg episode reward: [(0, '27.745')] -[2024-06-06 13:56:29,003][20018] Fps is (10 sec: 2867.1, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 7008256. Throughput: 0: 709.1. Samples: 750562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:56:29,008][20018] Avg episode reward: [(0, '27.989')] -[2024-06-06 13:56:34,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 7016448. Throughput: 0: 664.0. Samples: 753542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:56:34,008][20018] Avg episode reward: [(0, '27.450')] -[2024-06-06 13:56:39,005][20018] Fps is (10 sec: 2457.1, 60 sec: 2730.6, 300 sec: 2721.4). Total num frames: 7032832. Throughput: 0: 667.3. Samples: 755744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:56:39,007][20018] Avg episode reward: [(0, '27.028')] -[2024-06-06 13:56:39,108][20350] Updated weights for policy 0, policy_version 1718 (0.0045) -[2024-06-06 13:56:44,003][20018] Fps is (10 sec: 3686.4, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 7053312. Throughput: 0: 711.8. Samples: 760800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:56:44,011][20018] Avg episode reward: [(0, '26.760')] -[2024-06-06 13:56:49,007][20018] Fps is (10 sec: 2866.8, 60 sec: 2662.3, 300 sec: 2707.5). Total num frames: 7061504. Throughput: 0: 689.9. Samples: 764534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:56:49,013][20018] Avg episode reward: [(0, '26.369')] -[2024-06-06 13:56:54,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 7073792. Throughput: 0: 667.1. Samples: 765996. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:56:54,005][20018] Avg episode reward: [(0, '26.248')] -[2024-06-06 13:56:54,619][20350] Updated weights for policy 0, policy_version 1728 (0.0023) -[2024-06-06 13:56:59,003][20018] Fps is (10 sec: 2868.3, 60 sec: 2798.9, 300 sec: 2721.4). Total num frames: 7090176. Throughput: 0: 687.0. Samples: 770582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:56:59,005][20018] Avg episode reward: [(0, '26.419')] -[2024-06-06 13:57:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2721.4). Total num frames: 7106560. Throughput: 0: 711.3. Samples: 775390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:57:04,010][20018] Avg episode reward: [(0, '27.420')] -[2024-06-06 13:57:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.6, 300 sec: 2707.5). Total num frames: 7114752. Throughput: 0: 688.9. Samples: 776834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:57:09,012][20018] Avg episode reward: [(0, '28.205')] -[2024-06-06 13:57:09,775][20350] Updated weights for policy 0, policy_version 1738 (0.0039) -[2024-06-06 13:57:14,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2721.4). Total num frames: 7131136. Throughput: 0: 664.2. Samples: 780450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:57:14,005][20018] Avg episode reward: [(0, '28.978')] -[2024-06-06 13:57:19,008][20018] Fps is (10 sec: 3274.9, 60 sec: 2798.7, 300 sec: 2721.4). Total num frames: 7147520. Throughput: 0: 709.9. Samples: 785492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:57:19,011][20018] Avg episode reward: [(0, '28.194')] -[2024-06-06 13:57:19,026][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001745_7147520.pth... 
-[2024-06-06 13:57:19,174][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001583_6483968.pth -[2024-06-06 13:57:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 7155712. Throughput: 0: 707.9. Samples: 787598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:57:24,008][20018] Avg episode reward: [(0, '27.636')] -[2024-06-06 13:57:24,236][20350] Updated weights for policy 0, policy_version 1748 (0.0039) -[2024-06-06 13:57:29,003][20018] Fps is (10 sec: 2049.1, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 7168000. Throughput: 0: 662.8. Samples: 790626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:57:29,005][20018] Avg episode reward: [(0, '28.152')] -[2024-06-06 13:57:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2707.5). Total num frames: 7180288. Throughput: 0: 658.8. Samples: 794178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:57:34,006][20018] Avg episode reward: [(0, '28.515')] -[2024-06-06 13:57:39,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2594.2, 300 sec: 2707.5). Total num frames: 7188480. Throughput: 0: 660.2. Samples: 795706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:57:39,006][20018] Avg episode reward: [(0, '27.867')] -[2024-06-06 13:57:43,182][20350] Updated weights for policy 0, policy_version 1758 (0.0035) -[2024-06-06 13:57:44,007][20018] Fps is (10 sec: 2047.2, 60 sec: 2457.4, 300 sec: 2693.6). Total num frames: 7200768. Throughput: 0: 627.3. Samples: 798812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:57:44,009][20018] Avg episode reward: [(0, '28.362')] -[2024-06-06 13:57:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.0, 300 sec: 2707.5). Total num frames: 7213056. Throughput: 0: 599.7. Samples: 802376. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-06-06 13:57:49,008][20018] Avg episode reward: [(0, '28.330')] -[2024-06-06 13:57:54,003][20018] Fps is (10 sec: 2868.3, 60 sec: 2594.1, 300 sec: 2721.4). Total num frames: 7229440. Throughput: 0: 619.9. Samples: 804730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:57:54,005][20018] Avg episode reward: [(0, '28.256')] -[2024-06-06 13:57:56,791][20350] Updated weights for policy 0, policy_version 1768 (0.0034) -[2024-06-06 13:57:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2707.5). Total num frames: 7245824. Throughput: 0: 648.1. Samples: 809616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:57:59,008][20018] Avg episode reward: [(0, '26.941')] -[2024-06-06 13:58:04,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2457.5, 300 sec: 2693.6). Total num frames: 7254016. Throughput: 0: 604.1. Samples: 812674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:58:04,011][20018] Avg episode reward: [(0, '26.977')] -[2024-06-06 13:58:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2707.5). Total num frames: 7270400. Throughput: 0: 602.9. Samples: 814728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-06-06 13:58:09,011][20018] Avg episode reward: [(0, '27.320')] -[2024-06-06 13:58:11,921][20350] Updated weights for policy 0, policy_version 1778 (0.0042) -[2024-06-06 13:58:14,003][20018] Fps is (10 sec: 3277.3, 60 sec: 2594.1, 300 sec: 2721.4). Total num frames: 7286784. Throughput: 0: 648.7. Samples: 819818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-06-06 13:58:14,010][20018] Avg episode reward: [(0, '26.777')] -[2024-06-06 13:58:19,003][20018] Fps is (10 sec: 2867.0, 60 sec: 2526.1, 300 sec: 2707.5). Total num frames: 7299072. Throughput: 0: 656.2. Samples: 823706. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:58:19,006][20018] Avg episode reward: [(0, '27.121')] -[2024-06-06 13:58:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2707.5). Total num frames: 7311360. Throughput: 0: 654.2. Samples: 825144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-06-06 13:58:24,005][20018] Avg episode reward: [(0, '27.621')] -[2024-06-06 13:58:27,288][20350] Updated weights for policy 0, policy_version 1788 (0.0031) -[2024-06-06 13:58:29,003][20018] Fps is (10 sec: 2867.4, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 7327744. Throughput: 0: 688.5. Samples: 829790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-06-06 13:58:29,005][20018] Avg episode reward: [(0, '28.006')] -[2024-06-06 13:58:34,009][20018] Fps is (10 sec: 2865.3, 60 sec: 2662.1, 300 sec: 2707.5). Total num frames: 7340032. Throughput: 0: 716.7. Samples: 834632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:58:34,016][20018] Avg episode reward: [(0, '29.933')] -[2024-06-06 13:58:39,003][20018] Fps is (10 sec: 2457.5, 60 sec: 2730.6, 300 sec: 2693.7). Total num frames: 7352320. Throughput: 0: 698.5. Samples: 836162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-06-06 13:58:39,011][20018] Avg episode reward: [(0, '29.013')] -[2024-06-06 13:58:42,890][20350] Updated weights for policy 0, policy_version 1798 (0.0041) -[2024-06-06 13:58:44,003][20018] Fps is (10 sec: 2459.2, 60 sec: 2730.8, 300 sec: 2707.5). Total num frames: 7364608. Throughput: 0: 667.6. Samples: 839660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-06-06 13:58:44,011][20018] Avg episode reward: [(0, '29.093')] -[2024-06-06 13:58:49,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 7385088. Throughput: 0: 711.5. Samples: 844692. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-06-06 13:58:49,011][20018] Avg episode reward: [(0, '29.243')]
[2024-06-06 13:58:54,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2693.6). Total num frames: 7393280. Throughput: 0: 717.2. Samples: 847002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-06-06 13:58:54,008][20018] Avg episode reward: [(0, '28.239')]
[2024-06-06 13:58:58,093][20350] Updated weights for policy 0, policy_version 1808 (0.0041)
[2024-06-06 13:58:59,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 7405568. Throughput: 0: 670.0. Samples: 849968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:58:59,011][20018] Avg episode reward: [(0, '29.264')]
[2024-06-06 13:59:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2799.0, 300 sec: 2721.4). Total num frames: 7421952. Throughput: 0: 680.3. Samples: 854318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:04,005][20018] Avg episode reward: [(0, '29.837')]
[2024-06-06 13:59:09,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2707.5). Total num frames: 7434240. Throughput: 0: 689.5. Samples: 856172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:09,006][20018] Avg episode reward: [(0, '30.005')]
[2024-06-06 13:59:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2594.1, 300 sec: 2679.8). Total num frames: 7442432. Throughput: 0: 658.9. Samples: 859440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-06-06 13:59:14,009][20018] Avg episode reward: [(0, '29.818')]
[2024-06-06 13:59:14,980][20350] Updated weights for policy 0, policy_version 1818 (0.0026)
[2024-06-06 13:59:19,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2594.2, 300 sec: 2679.8). Total num frames: 7454720. Throughput: 0: 613.3. Samples: 862226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:19,009][20018] Avg episode reward: [(0, '29.373')]
[2024-06-06 13:59:19,030][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001820_7454720.pth...
[2024-06-06 13:59:19,200][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001664_6815744.pth
[2024-06-06 13:59:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2679.8). Total num frames: 7467008. Throughput: 0: 626.1. Samples: 864338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 13:59:24,005][20018] Avg episode reward: [(0, '27.982')]
[2024-06-06 13:59:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2679.8). Total num frames: 7483392. Throughput: 0: 648.6. Samples: 868846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:29,005][20018] Avg episode reward: [(0, '28.805')]
[2024-06-06 13:59:30,588][20350] Updated weights for policy 0, policy_version 1828 (0.0028)
[2024-06-06 13:59:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.1, 300 sec: 2652.0). Total num frames: 7491584. Throughput: 0: 611.2. Samples: 872194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:34,005][20018] Avg episode reward: [(0, '28.086')]
[2024-06-06 13:59:39,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2665.9). Total num frames: 7503872. Throughput: 0: 594.5. Samples: 873756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-06-06 13:59:39,005][20018] Avg episode reward: [(0, '28.578')]
[2024-06-06 13:59:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2665.9). Total num frames: 7520256. Throughput: 0: 625.4. Samples: 878112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:44,005][20018] Avg episode reward: [(0, '29.435')]
[2024-06-06 13:59:46,179][20350] Updated weights for policy 0, policy_version 1838 (0.0038)
[2024-06-06 13:59:49,006][20018] Fps is (10 sec: 2866.2, 60 sec: 2457.5, 300 sec: 2652.0). Total num frames: 7532544. Throughput: 0: 621.8. Samples: 882302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:49,014][20018] Avg episode reward: [(0, '29.261')]
[2024-06-06 13:59:54,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2457.6, 300 sec: 2638.1). Total num frames: 7540736. Throughput: 0: 612.9. Samples: 883752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 13:59:54,010][20018] Avg episode reward: [(0, '28.890')]
[2024-06-06 13:59:59,003][20018] Fps is (10 sec: 2458.5, 60 sec: 2525.9, 300 sec: 2665.9). Total num frames: 7557120. Throughput: 0: 614.4. Samples: 887086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-06-06 13:59:59,011][20018] Avg episode reward: [(0, '28.550')]
[2024-06-06 14:00:02,285][20350] Updated weights for policy 0, policy_version 1848 (0.0042)
[2024-06-06 14:00:04,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2525.9, 300 sec: 2665.9). Total num frames: 7573504. Throughput: 0: 662.1. Samples: 892022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:00:04,005][20018] Avg episode reward: [(0, '29.092')]
[2024-06-06 14:00:09,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2652.0). Total num frames: 7585792. Throughput: 0: 661.4. Samples: 894100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:00:09,005][20018] Avg episode reward: [(0, '32.084')]
[2024-06-06 14:00:09,020][20333] Saving new best policy, reward=32.084!
[2024-06-06 14:00:14,007][20018] Fps is (10 sec: 2047.1, 60 sec: 2525.7, 300 sec: 2651.9). Total num frames: 7593984. Throughput: 0: 621.5. Samples: 896818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:00:14,014][20018] Avg episode reward: [(0, '32.263')]
[2024-06-06 14:00:14,017][20333] Saving new best policy, reward=32.263!
[2024-06-06 14:00:19,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2638.1). Total num frames: 7606272. Throughput: 0: 631.0. Samples: 900588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:00:19,010][20018] Avg episode reward: [(0, '32.543')]
[2024-06-06 14:00:19,025][20333] Saving new best policy, reward=32.543!
[2024-06-06 14:00:19,670][20350] Updated weights for policy 0, policy_version 1858 (0.0034)
[2024-06-06 14:00:24,003][20018] Fps is (10 sec: 2868.5, 60 sec: 2594.1, 300 sec: 2638.1). Total num frames: 7622656. Throughput: 0: 646.2. Samples: 902836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-06-06 14:00:24,005][20018] Avg episode reward: [(0, '32.598')]
[2024-06-06 14:00:24,008][20333] Saving new best policy, reward=32.598!
[2024-06-06 14:00:29,003][20018] Fps is (10 sec: 2867.0, 60 sec: 2525.8, 300 sec: 2638.1). Total num frames: 7634944. Throughput: 0: 642.3. Samples: 907016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:00:29,006][20018] Avg episode reward: [(0, '32.495')]
[2024-06-06 14:00:34,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2624.2). Total num frames: 7643136. Throughput: 0: 617.0. Samples: 910064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:00:34,005][20018] Avg episode reward: [(0, '33.102')]
[2024-06-06 14:00:34,008][20333] Saving new best policy, reward=33.102!
[2024-06-06 14:00:35,915][20350] Updated weights for policy 0, policy_version 1868 (0.0053)
[2024-06-06 14:00:39,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2594.1, 300 sec: 2624.2). Total num frames: 7659520. Throughput: 0: 628.1. Samples: 912016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-06-06 14:00:39,008][20018] Avg episode reward: [(0, '33.295')]
[2024-06-06 14:00:39,020][20333] Saving new best policy, reward=33.295!
[2024-06-06 14:00:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2610.4). Total num frames: 7671808. Throughput: 0: 649.7. Samples: 916324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:00:44,011][20018] Avg episode reward: [(0, '32.684')]
[2024-06-06 14:00:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.0, 300 sec: 2624.2). Total num frames: 7684096. Throughput: 0: 612.2. Samples: 919572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:00:49,005][20018] Avg episode reward: [(0, '32.187')]
[2024-06-06 14:00:52,530][20350] Updated weights for policy 0, policy_version 1878 (0.0034)
[2024-06-06 14:00:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.2, 300 sec: 2624.2). Total num frames: 7696384. Throughput: 0: 598.6. Samples: 921038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:00:54,012][20018] Avg episode reward: [(0, '30.082')]
[2024-06-06 14:00:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2624.2). Total num frames: 7712768. Throughput: 0: 640.7. Samples: 925646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-06-06 14:00:59,008][20018] Avg episode reward: [(0, '28.313')]
[2024-06-06 14:01:04,003][20018] Fps is (10 sec: 2867.3, 60 sec: 2525.9, 300 sec: 2610.4). Total num frames: 7725056. Throughput: 0: 656.2. Samples: 930116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:01:04,011][20018] Avg episode reward: [(0, '26.783')]
[2024-06-06 14:01:07,433][20350] Updated weights for policy 0, policy_version 1888 (0.0020)
[2024-06-06 14:01:09,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2457.6, 300 sec: 2610.4). Total num frames: 7733248. Throughput: 0: 638.0. Samples: 931548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-06-06 14:01:09,005][20018] Avg episode reward: [(0, '27.125')]
[2024-06-06 14:01:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2526.1, 300 sec: 2596.4). Total num frames: 7745536. Throughput: 0: 615.3. Samples: 934704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-06-06 14:01:14,005][20018] Avg episode reward: [(0, '26.927')]
[2024-06-06 14:01:19,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2610.3). Total num frames: 7766016. Throughput: 0: 662.2. Samples: 939864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-06-06 14:01:19,012][20018] Avg episode reward: [(0, '26.410')]
[2024-06-06 14:01:19,028][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001896_7766016.pth...
[2024-06-06 14:01:19,176][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001745_7147520.pth
[2024-06-06 14:01:22,154][20350] Updated weights for policy 0, policy_version 1898 (0.0029)
[2024-06-06 14:01:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2596.4). Total num frames: 7774208. Throughput: 0: 665.9. Samples: 941980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:01:24,007][20018] Avg episode reward: [(0, '27.335')]
[2024-06-06 14:01:29,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2525.9, 300 sec: 2610.3). Total num frames: 7786496. Throughput: 0: 637.6. Samples: 945016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:01:29,006][20018] Avg episode reward: [(0, '26.305')]
[2024-06-06 14:01:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2610.4). Total num frames: 7802880. Throughput: 0: 658.5. Samples: 949204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:01:34,011][20018] Avg episode reward: [(0, '26.415')]
[2024-06-06 14:01:37,617][20350] Updated weights for policy 0, policy_version 1908 (0.0027)
[2024-06-06 14:01:39,005][20018] Fps is (10 sec: 3276.2, 60 sec: 2662.3, 300 sec: 2596.4). Total num frames: 7819264. Throughput: 0: 682.5. Samples: 951752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:01:39,008][20018] Avg episode reward: [(0, '28.778')]
[2024-06-06 14:01:44,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2594.1, 300 sec: 2596.5). Total num frames: 7827456. Throughput: 0: 667.5. Samples: 955686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:01:44,006][20018] Avg episode reward: [(0, '29.039')]
[2024-06-06 14:01:49,003][20018] Fps is (10 sec: 2458.2, 60 sec: 2662.4, 300 sec: 2610.3). Total num frames: 7843840. Throughput: 0: 647.1. Samples: 959234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-06-06 14:01:49,010][20018] Avg episode reward: [(0, '29.154')]
[2024-06-06 14:01:52,550][20350] Updated weights for policy 0, policy_version 1918 (0.0023)
[2024-06-06 14:01:54,003][20018] Fps is (10 sec: 3277.1, 60 sec: 2730.7, 300 sec: 2610.3). Total num frames: 7860224. Throughput: 0: 672.3. Samples: 961800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-06-06 14:01:54,006][20018] Avg episode reward: [(0, '29.461')]
[2024-06-06 14:01:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2596.4). Total num frames: 7872512. Throughput: 0: 711.3. Samples: 966714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-06-06 14:01:59,009][20018] Avg episode reward: [(0, '29.328')]
[2024-06-06 14:02:04,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2662.4, 300 sec: 2610.3). Total num frames: 7884800. Throughput: 0: 664.7. Samples: 969778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-06-06 14:02:04,006][20018] Avg episode reward: [(0, '29.197')]
[2024-06-06 14:02:08,162][20350] Updated weights for policy 0, policy_version 1928 (0.0033)
[2024-06-06 14:02:09,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2730.6, 300 sec: 2596.4). Total num frames: 7897088. Throughput: 0: 656.0. Samples: 971502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-06-06 14:02:09,014][20018] Avg episode reward: [(0, '28.760')]
[2024-06-06 14:02:14,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2867.2, 300 sec: 2610.4). Total num frames: 7917568. Throughput: 0: 706.2. Samples: 976794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-06-06 14:02:14,008][20018] Avg episode reward: [(0, '29.731')]
[2024-06-06 14:02:19,003][20018] Fps is (10 sec: 3277.1, 60 sec: 2730.6, 300 sec: 2624.2). Total num frames: 7929856. Throughput: 0: 708.1. Samples: 981068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-06-06 14:02:19,008][20018] Avg episode reward: [(0, '29.703')]
[2024-06-06 14:02:23,768][20350] Updated weights for policy 0, policy_version 1938 (0.0034)
[2024-06-06 14:02:24,004][20018] Fps is (10 sec: 2047.7, 60 sec: 2730.6, 300 sec: 2610.3). Total num frames: 7938048. Throughput: 0: 680.6. Samples: 982378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-06-06 14:02:24,011][20018] Avg episode reward: [(0, '28.214')]
[2024-06-06 14:02:29,003][20018] Fps is (10 sec: 1638.5, 60 sec: 2662.4, 300 sec: 2596.4). Total num frames: 7946240. Throughput: 0: 647.7. Samples: 984832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-06-06 14:02:29,007][20018] Avg episode reward: [(0, '28.220')]
[2024-06-06 14:02:34,003][20018] Fps is (10 sec: 2048.3, 60 sec: 2594.1, 300 sec: 2610.3). Total num frames: 7958528. Throughput: 0: 652.4. Samples: 988592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-06-06 14:02:34,005][20018] Avg episode reward: [(0, '28.241')]
[2024-06-06 14:02:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.2, 300 sec: 2624.3). Total num frames: 7974912. Throughput: 0: 654.2. Samples: 991240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-06-06 14:02:39,006][20018] Avg episode reward: [(0, '29.947')]
[2024-06-06 14:02:39,863][20350] Updated weights for policy 0, policy_version 1948 (0.0046)
[2024-06-06 14:02:44,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.2, 300 sec: 2610.3). Total num frames: 7983104. Throughput: 0: 623.1. Samples: 994754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-06-06 14:02:44,009][20018] Avg episode reward: [(0, '29.953')]
[2024-06-06 14:02:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2610.3). Total num frames: 7999488. Throughput: 0: 642.5. Samples: 998692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-06-06 14:02:49,009][20018] Avg episode reward: [(0, '30.059')]
[2024-06-06 14:02:50,689][20333] Stopping Batcher_0...
[2024-06-06 14:02:50,691][20333] Loop batcher_evt_loop terminating...
[2024-06-06 14:02:50,692][20018] Component Batcher_0 stopped!
[2024-06-06 14:02:50,709][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-06-06 14:02:50,782][20350] Weights refcount: 2 0
[2024-06-06 14:02:50,785][20350] Stopping InferenceWorker_p0-w0...
[2024-06-06 14:02:50,786][20350] Loop inference_proc0-0_evt_loop terminating...
[2024-06-06 14:02:50,786][20018] Component InferenceWorker_p0-w0 stopped!
[2024-06-06 14:02:50,799][20353] Stopping RolloutWorker_w2...
[2024-06-06 14:02:50,802][20355] Stopping RolloutWorker_w4...
[2024-06-06 14:02:50,799][20018] Component RolloutWorker_w2 stopped!
[2024-06-06 14:02:50,806][20018] Component RolloutWorker_w4 stopped!
[2024-06-06 14:02:50,803][20353] Loop rollout_proc2_evt_loop terminating...
[2024-06-06 14:02:50,819][20018] Component RolloutWorker_w6 stopped!
[2024-06-06 14:02:50,812][20355] Loop rollout_proc4_evt_loop terminating...
[2024-06-06 14:02:50,818][20357] Stopping RolloutWorker_w6...
[2024-06-06 14:02:50,822][20357] Loop rollout_proc6_evt_loop terminating...
[2024-06-06 14:02:50,835][20018] Component RolloutWorker_w5 stopped!
[2024-06-06 14:02:50,835][20356] Stopping RolloutWorker_w5...
[2024-06-06 14:02:50,845][20356] Loop rollout_proc5_evt_loop terminating...
[2024-06-06 14:02:50,851][20018] Component RolloutWorker_w0 stopped!
[2024-06-06 14:02:50,853][20351] Stopping RolloutWorker_w0...
[2024-06-06 14:02:50,854][20351] Loop rollout_proc0_evt_loop terminating...
[2024-06-06 14:02:50,895][20018] Component RolloutWorker_w7 stopped!
[2024-06-06 14:02:50,904][20352] Stopping RolloutWorker_w1...
[2024-06-06 14:02:50,904][20352] Loop rollout_proc1_evt_loop terminating...
[2024-06-06 14:02:50,904][20018] Component RolloutWorker_w1 stopped!
[2024-06-06 14:02:50,895][20358] Stopping RolloutWorker_w7...
[2024-06-06 14:02:50,917][20354] Stopping RolloutWorker_w3...
[2024-06-06 14:02:50,912][20358] Loop rollout_proc7_evt_loop terminating...
[2024-06-06 14:02:50,917][20018] Component RolloutWorker_w3 stopped!
[2024-06-06 14:02:50,919][20354] Loop rollout_proc3_evt_loop terminating...
[2024-06-06 14:02:50,933][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001820_7454720.pth
[2024-06-06 14:02:50,959][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-06-06 14:02:51,157][20018] Component LearnerWorker_p0 stopped!
[2024-06-06 14:02:51,160][20018] Waiting for process learner_proc0 to stop...
[2024-06-06 14:02:51,162][20333] Stopping LearnerWorker_p0...
[2024-06-06 14:02:51,163][20333] Loop learner_proc0_evt_loop terminating...
[2024-06-06 14:02:53,237][20018] Waiting for process inference_proc0-0 to join...
[2024-06-06 14:02:53,243][20018] Waiting for process rollout_proc0 to join...
[2024-06-06 14:02:55,217][20018] Waiting for process rollout_proc1 to join...
[2024-06-06 14:02:55,443][20018] Waiting for process rollout_proc2 to join...
[2024-06-06 14:02:55,449][20018] Waiting for process rollout_proc3 to join...
[2024-06-06 14:02:55,452][20018] Waiting for process rollout_proc4 to join...
[2024-06-06 14:02:55,457][20018] Waiting for process rollout_proc5 to join...
[2024-06-06 14:02:55,462][20018] Waiting for process rollout_proc6 to join...
[2024-06-06 14:02:55,467][20018] Waiting for process rollout_proc7 to join...
[2024-06-06 14:02:55,470][20018] Batcher 0 profile tree view:
batching: 29.3518, releasing_batches: 0.0397
[2024-06-06 14:02:55,472][20018] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0016
  wait_policy_total: 507.1168
update_model: 14.6358
  weight_update: 0.0043
one_step: 0.0068
  handle_policy_step: 910.5851
    deserialize: 24.4844, stack: 5.5174, obs_to_device_normalize: 183.6583, forward: 499.7374, send_messages: 43.8067
    prepare_outputs: 109.8175
      to_cpu: 56.8004
[2024-06-06 14:02:55,474][20018] Learner 0 profile tree view:
misc: 0.0068, prepare_batch: 14.2515
train: 79.3266
  epoch_init: 0.0070, minibatch_init: 0.0106, losses_postprocess: 0.7325, kl_divergence: 0.8510, after_optimizer: 4.9770
  calculate_losses: 28.0587
    losses_init: 0.0044, forward_head: 1.6277, bptt_initial: 17.9514, tail: 1.3531, advantages_returns: 0.4493, losses: 3.7936
    bptt: 2.3988
      bptt_forward_core: 2.2933
  update: 43.8737
    clip: 1.1407
[2024-06-06 14:02:55,477][20018] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5428, enqueue_policy_requests: 159.9715, env_step: 1153.6276, overhead: 24.4674, complete_rollouts: 9.4613
save_policy_outputs: 31.6610
  split_output_tensors: 12.4379
[2024-06-06 14:02:55,479][20018] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3626, enqueue_policy_requests: 165.2894, env_step: 1155.8202, overhead: 23.7456, complete_rollouts: 9.1582
save_policy_outputs: 30.2311
  split_output_tensors: 11.9144
[2024-06-06 14:02:55,480][20018] Loop Runner_EvtLoop terminating...
[2024-06-06 14:02:55,482][20018] Runner profile tree view:
main_loop: 1528.1796
[2024-06-06 14:02:55,483][20018] Collected {0: 8007680}, FPS: 2618.7
[2024-06-06 14:02:55,533][20018] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-06-06 14:02:55,536][20018] Overriding arg 'num_workers' with value 1 passed from command line
[2024-06-06 14:02:55,538][20018] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-06-06 14:02:55,540][20018] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-06-06 14:02:55,542][20018] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-06-06 14:02:55,544][20018] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-06-06 14:02:55,545][20018] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-06-06 14:02:55,546][20018] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-06-06 14:02:55,548][20018] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-06-06 14:02:55,549][20018] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-06-06 14:02:55,550][20018] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-06-06 14:02:55,551][20018] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-06-06 14:02:55,552][20018] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-06-06 14:02:55,553][20018] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-06-06 14:02:55,555][20018] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-06-06 14:02:55,605][20018] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-06-06 14:02:55,611][20018] RunningMeanStd input shape: (3, 72, 128)
[2024-06-06 14:02:55,613][20018] RunningMeanStd input shape: (1,)
[2024-06-06 14:02:55,634][20018] ConvEncoder: input_channels=3
[2024-06-06 14:02:55,784][20018] Conv encoder output size: 512
[2024-06-06 14:02:55,786][20018] Policy head output size: 512
[2024-06-06 14:02:56,177][20018] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-06-06 14:02:57,558][20018] Num frames 100...
[2024-06-06 14:02:57,775][20018] Num frames 200...
[2024-06-06 14:02:58,021][20018] Num frames 300...
[2024-06-06 14:02:58,250][20018] Num frames 400...
[2024-06-06 14:02:58,467][20018] Num frames 500...
[2024-06-06 14:02:58,679][20018] Num frames 600...
[2024-06-06 14:02:58,757][20018] Avg episode rewards: #0: 15.080, true rewards: #0: 6.080
[2024-06-06 14:02:58,759][20018] Avg episode reward: 15.080, avg true_objective: 6.080
[2024-06-06 14:02:58,967][20018] Num frames 700...
[2024-06-06 14:02:59,157][20018] Num frames 800...
[2024-06-06 14:02:59,316][20018] Num frames 900...
[2024-06-06 14:02:59,456][20018] Num frames 1000...
[2024-06-06 14:02:59,596][20018] Num frames 1100...
[2024-06-06 14:02:59,741][20018] Num frames 1200...
[2024-06-06 14:02:59,889][20018] Num frames 1300...
[2024-06-06 14:03:00,039][20018] Num frames 1400...
[2024-06-06 14:03:00,152][20018] Avg episode rewards: #0: 15.700, true rewards: #0: 7.200
[2024-06-06 14:03:00,155][20018] Avg episode reward: 15.700, avg true_objective: 7.200
[2024-06-06 14:03:00,244][20018] Num frames 1500...
[2024-06-06 14:03:00,398][20018] Num frames 1600...
[2024-06-06 14:03:00,541][20018] Num frames 1700...
[2024-06-06 14:03:00,683][20018] Num frames 1800...
[2024-06-06 14:03:00,830][20018] Num frames 1900...
[2024-06-06 14:03:00,970][20018] Num frames 2000...
[2024-06-06 14:03:01,122][20018] Num frames 2100...
[2024-06-06 14:03:01,272][20018] Num frames 2200...
[2024-06-06 14:03:01,442][20018] Num frames 2300...
[2024-06-06 14:03:01,590][20018] Num frames 2400...
[2024-06-06 14:03:01,734][20018] Num frames 2500...
[2024-06-06 14:03:01,881][20018] Num frames 2600...
[2024-06-06 14:03:02,022][20018] Num frames 2700...
[2024-06-06 14:03:02,172][20018] Num frames 2800...
[2024-06-06 14:03:02,327][20018] Num frames 2900...
[2024-06-06 14:03:02,475][20018] Num frames 3000...
[2024-06-06 14:03:02,634][20018] Num frames 3100...
[2024-06-06 14:03:02,785][20018] Num frames 3200...
[2024-06-06 14:03:02,932][20018] Num frames 3300...
[2024-06-06 14:03:03,083][20018] Num frames 3400...
[2024-06-06 14:03:03,228][20018] Num frames 3500...
[2024-06-06 14:03:03,345][20018] Avg episode rewards: #0: 30.133, true rewards: #0: 11.800
[2024-06-06 14:03:03,347][20018] Avg episode reward: 30.133, avg true_objective: 11.800
[2024-06-06 14:03:03,452][20018] Num frames 3600...
[2024-06-06 14:03:03,596][20018] Num frames 3700...
[2024-06-06 14:03:03,745][20018] Num frames 3800...
[2024-06-06 14:03:03,893][20018] Num frames 3900...
[2024-06-06 14:03:04,033][20018] Num frames 4000...
[2024-06-06 14:03:04,182][20018] Num frames 4100...
[2024-06-06 14:03:04,321][20018] Num frames 4200...
[2024-06-06 14:03:04,472][20018] Num frames 4300...
[2024-06-06 14:03:04,623][20018] Num frames 4400...
[2024-06-06 14:03:04,771][20018] Num frames 4500...
[2024-06-06 14:03:04,913][20018] Num frames 4600...
[2024-06-06 14:03:05,056][20018] Num frames 4700...
[2024-06-06 14:03:05,209][20018] Num frames 4800...
[2024-06-06 14:03:05,355][20018] Num frames 4900...
[2024-06-06 14:03:05,424][20018] Avg episode rewards: #0: 32.020, true rewards: #0: 12.270
[2024-06-06 14:03:05,425][20018] Avg episode reward: 32.020, avg true_objective: 12.270
[2024-06-06 14:03:05,572][20018] Num frames 5000...
[2024-06-06 14:03:05,720][20018] Num frames 5100...
[2024-06-06 14:03:05,864][20018] Num frames 5200...
[2024-06-06 14:03:06,008][20018] Num frames 5300...
[2024-06-06 14:03:06,153][20018] Num frames 5400...
[2024-06-06 14:03:06,297][20018] Num frames 5500...
[2024-06-06 14:03:06,457][20018] Num frames 5600...
[2024-06-06 14:03:06,616][20018] Num frames 5700...
[2024-06-06 14:03:06,761][20018] Num frames 5800...
[2024-06-06 14:03:06,900][20018] Num frames 5900...
[2024-06-06 14:03:07,055][20018] Avg episode rewards: #0: 29.936, true rewards: #0: 11.936
[2024-06-06 14:03:07,057][20018] Avg episode reward: 29.936, avg true_objective: 11.936
[2024-06-06 14:03:07,108][20018] Num frames 6000...
[2024-06-06 14:03:07,244][20018] Num frames 6100...
[2024-06-06 14:03:07,399][20018] Num frames 6200...
[2024-06-06 14:03:07,559][20018] Num frames 6300...
[2024-06-06 14:03:07,698][20018] Num frames 6400...
[2024-06-06 14:03:07,841][20018] Num frames 6500...
[2024-06-06 14:03:07,984][20018] Num frames 6600...
[2024-06-06 14:03:08,130][20018] Num frames 6700...
[2024-06-06 14:03:08,280][20018] Num frames 6800...
[2024-06-06 14:03:08,424][20018] Num frames 6900...
[2024-06-06 14:03:08,580][20018] Num frames 7000...
[2024-06-06 14:03:08,731][20018] Num frames 7100...
[2024-06-06 14:03:08,886][20018] Num frames 7200...
[2024-06-06 14:03:09,039][20018] Num frames 7300...
[2024-06-06 14:03:09,228][20018] Num frames 7400...
[2024-06-06 14:03:09,454][20018] Num frames 7500...
[2024-06-06 14:03:09,697][20018] Num frames 7600...
[2024-06-06 14:03:09,914][20018] Num frames 7700...
[2024-06-06 14:03:10,129][20018] Num frames 7800...
[2024-06-06 14:03:10,346][20018] Num frames 7900...
[2024-06-06 14:03:10,563][20018] Num frames 8000...
[2024-06-06 14:03:10,780][20018] Avg episode rewards: #0: 34.280, true rewards: #0: 13.447
[2024-06-06 14:03:10,783][20018] Avg episode reward: 34.280, avg true_objective: 13.447
[2024-06-06 14:03:10,861][20018] Num frames 8100...
[2024-06-06 14:03:11,085][20018] Num frames 8200...
[2024-06-06 14:03:11,316][20018] Num frames 8300...
[2024-06-06 14:03:11,540][20018] Num frames 8400...
[2024-06-06 14:03:11,768][20018] Num frames 8500...
[2024-06-06 14:03:11,984][20018] Num frames 8600...
[2024-06-06 14:03:12,249][20018] Avg episode rewards: #0: 31.128, true rewards: #0: 12.414
[2024-06-06 14:03:12,251][20018] Avg episode reward: 31.128, avg true_objective: 12.414
[2024-06-06 14:03:12,279][20018] Num frames 8700...
[2024-06-06 14:03:12,449][20018] Num frames 8800...
[2024-06-06 14:03:12,608][20018] Num frames 8900...
[2024-06-06 14:03:12,766][20018] Num frames 9000...
[2024-06-06 14:03:12,914][20018] Num frames 9100...
[2024-06-06 14:03:13,058][20018] Num frames 9200...
[2024-06-06 14:03:13,208][20018] Num frames 9300...
[2024-06-06 14:03:13,350][20018] Num frames 9400...
[2024-06-06 14:03:13,493][20018] Num frames 9500...
[2024-06-06 14:03:13,648][20018] Num frames 9600...
[2024-06-06 14:03:13,809][20018] Num frames 9700...
[2024-06-06 14:03:13,873][20018] Avg episode rewards: #0: 30.129, true rewards: #0: 12.129
[2024-06-06 14:03:13,876][20018] Avg episode reward: 30.129, avg true_objective: 12.129
[2024-06-06 14:03:14,018][20018] Num frames 9800...
[2024-06-06 14:03:14,167][20018] Num frames 9900...
[2024-06-06 14:03:14,313][20018] Num frames 10000...
[2024-06-06 14:03:14,462][20018] Num frames 10100...
[2024-06-06 14:03:14,612][20018] Num frames 10200...
[2024-06-06 14:03:14,762][20018] Num frames 10300...
[2024-06-06 14:03:14,920][20018] Num frames 10400...
[2024-06-06 14:03:15,066][20018] Num frames 10500...
[2024-06-06 14:03:15,217][20018] Num frames 10600...
[2024-06-06 14:03:15,364][20018] Num frames 10700...
[2024-06-06 14:03:15,512][20018] Num frames 10800...
[2024-06-06 14:03:15,658][20018] Num frames 10900...
[2024-06-06 14:03:15,818][20018] Num frames 11000...
[2024-06-06 14:03:15,971][20018] Num frames 11100...
[2024-06-06 14:03:16,123][20018] Num frames 11200...
[2024-06-06 14:03:16,194][20018] Avg episode rewards: #0: 31.230, true rewards: #0: 12.452
[2024-06-06 14:03:16,196][20018] Avg episode reward: 31.230, avg true_objective: 12.452
[2024-06-06 14:03:16,341][20018] Num frames 11300...
[2024-06-06 14:03:16,498][20018] Num frames 11400...
[2024-06-06 14:03:16,644][20018] Num frames 11500...
[2024-06-06 14:03:16,787][20018] Num frames 11600...
[2024-06-06 14:03:16,952][20018] Num frames 11700...
[2024-06-06 14:03:17,096][20018] Num frames 11800...
[2024-06-06 14:03:17,298][20018] Num frames 11900...
[2024-06-06 14:03:17,497][20018] Num frames 12000...
[2024-06-06 14:03:17,667][20018] Num frames 12100...
[2024-06-06 14:03:17,730][20018] Avg episode rewards: #0: 30.003, true rewards: #0: 12.103
[2024-06-06 14:03:17,732][20018] Avg episode reward: 30.003, avg true_objective: 12.103
[2024-06-06 14:04:41,427][20018] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-06-06 14:04:42,218][20018] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-06-06 14:04:42,221][20018] Overriding arg 'num_workers' with value 1 passed from command line
[2024-06-06 14:04:42,225][20018] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-06-06 14:04:42,227][20018] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-06-06 14:04:42,229][20018] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-06-06 14:04:42,230][20018] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-06-06 14:04:42,232][20018] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-06-06 14:04:42,233][20018] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-06-06 14:04:42,234][20018] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-06-06 14:04:42,236][20018] Adding new argument 'hf_repository'='swritchie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-06-06 14:04:42,237][20018] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-06-06 14:04:42,238][20018] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-06-06 14:04:42,239][20018] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-06-06 14:04:42,240][20018] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-06-06 14:04:42,241][20018] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-06-06 14:04:42,287][20018] RunningMeanStd input shape: (3, 72, 128)
[2024-06-06 14:04:42,290][20018] RunningMeanStd input shape: (1,)
[2024-06-06 14:04:42,314][20018] ConvEncoder: input_channels=3
[2024-06-06 14:04:42,381][20018] Conv encoder output size: 512
[2024-06-06 14:04:42,384][20018] Policy head output size: 512
[2024-06-06 14:04:42,413][20018] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2024-06-06 14:04:43,189][20018] Num frames 100...
[2024-06-06 14:04:43,400][20018] Num frames 200...
[2024-06-06 14:04:43,618][20018] Num frames 300...
[2024-06-06 14:04:43,841][20018] Num frames 400...
[2024-06-06 14:04:44,052][20018] Num frames 500...
[2024-06-06 14:04:44,271][20018] Num frames 600...
[2024-06-06 14:04:44,491][20018] Num frames 700...
[2024-06-06 14:04:44,725][20018] Num frames 800...
[2024-06-06 14:04:44,955][20018] Num frames 900...
[2024-06-06 14:04:45,171][20018] Num frames 1000...
[2024-06-06 14:04:45,378][20018] Num frames 1100...
[2024-06-06 14:04:45,582][20018] Num frames 1200...
[2024-06-06 14:04:45,811][20018] Num frames 1300...
[2024-06-06 14:04:46,056][20018] Num frames 1400...
[2024-06-06 14:04:46,283][20018] Num frames 1500...
[2024-06-06 14:04:46,528][20018] Num frames 1600...
[2024-06-06 14:04:46,808][20018] Num frames 1700...
[2024-06-06 14:04:47,110][20018] Num frames 1800...
[2024-06-06 14:04:47,344][20018] Num frames 1900...
[2024-06-06 14:04:47,588][20018] Num frames 2000...
[2024-06-06 14:04:47,858][20018] Num frames 2100...
[2024-06-06 14:04:47,913][20018] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
[2024-06-06 14:04:47,915][20018] Avg episode reward: 58.999, avg true_objective: 21.000
[2024-06-06 14:04:48,209][20018] Num frames 2200...
[2024-06-06 14:04:48,495][20018] Num frames 2300...
[2024-06-06 14:04:48,740][20018] Num frames 2400...
[2024-06-06 14:04:49,003][20018] Num frames 2500...
[2024-06-06 14:04:49,252][20018] Num frames 2600...
[2024-06-06 14:04:49,517][20018] Num frames 2700...
[2024-06-06 14:04:49,783][20018] Num frames 2800...
[2024-06-06 14:04:50,061][20018] Num frames 2900...
[2024-06-06 14:04:50,160][20018] Avg episode rewards: #0: 37.564, true rewards: #0: 14.565
[2024-06-06 14:04:50,163][20018] Avg episode reward: 37.564, avg true_objective: 14.565
[2024-06-06 14:04:50,382][20018] Num frames 3000...
[2024-06-06 14:04:50,646][20018] Num frames 3100...
[2024-06-06 14:04:50,911][20018] Num frames 3200...
[2024-06-06 14:04:51,245][20018] Avg episode rewards: #0: 26.323, true rewards: #0: 10.990
[2024-06-06 14:04:51,248][20018] Avg episode reward: 26.323, avg true_objective: 10.990
[2024-06-06 14:04:51,258][20018] Num frames 3300...
[2024-06-06 14:04:51,551][20018] Num frames 3400...
[2024-06-06 14:04:51,857][20018] Num frames 3500...
-[2024-06-06 14:04:52,100][20018] Num frames 3600...
-[2024-06-06 14:04:52,329][20018] Num frames 3700...
-[2024-06-06 14:04:52,425][20018] Avg episode rewards: #0: 21.040, true rewards: #0: 9.290
-[2024-06-06 14:04:52,427][20018] Avg episode reward: 21.040, avg true_objective: 9.290
-[2024-06-06 14:04:52,575][20018] Num frames 3800...
-[2024-06-06 14:04:52,723][20018] Num frames 3900...
-[2024-06-06 14:04:52,879][20018] Num frames 4000...
-[2024-06-06 14:04:53,027][20018] Num frames 4100...
-[2024-06-06 14:04:53,176][20018] Num frames 4200...
-[2024-06-06 14:04:53,336][20018] Num frames 4300...
-[2024-06-06 14:04:53,487][20018] Num frames 4400...
-[2024-06-06 14:04:53,644][20018] Num frames 4500...
-[2024-06-06 14:04:53,797][20018] Num frames 4600...
-[2024-06-06 14:04:53,956][20018] Num frames 4700...
-[2024-06-06 14:04:54,119][20018] Num frames 4800...
-[2024-06-06 14:04:54,320][20018] Num frames 4900...
-[2024-06-06 14:04:54,394][20018] Avg episode rewards: #0: 22.218, true rewards: #0: 9.818
-[2024-06-06 14:04:54,396][20018] Avg episode reward: 22.218, avg true_objective: 9.818
-[2024-06-06 14:04:54,533][20018] Num frames 5000...
-[2024-06-06 14:04:54,718][20018] Num frames 5100...
-[2024-06-06 14:04:54,893][20018] Num frames 5200...
-[2024-06-06 14:04:55,080][20018] Num frames 5300...
-[2024-06-06 14:04:55,253][20018] Num frames 5400...
-[2024-06-06 14:04:55,425][20018] Num frames 5500...
-[2024-06-06 14:04:55,604][20018] Num frames 5600...
-[2024-06-06 14:04:55,697][20018] Avg episode rewards: #0: 20.522, true rewards: #0: 9.355
-[2024-06-06 14:04:55,699][20018] Avg episode reward: 20.522, avg true_objective: 9.355
-[2024-06-06 14:04:55,830][20018] Num frames 5700...
-[2024-06-06 14:04:55,976][20018] Num frames 5800...
-[2024-06-06 14:04:56,127][20018] Num frames 5900...
-[2024-06-06 14:04:56,288][20018] Num frames 6000...
-[2024-06-06 14:04:56,476][20018] Avg episode rewards: #0: 19.116, true rewards: #0: 8.687
-[2024-06-06 14:04:56,478][20018] Avg episode reward: 19.116, avg true_objective: 8.687
-[2024-06-06 14:04:56,515][20018] Num frames 6100...
-[2024-06-06 14:04:56,727][20018] Num frames 6200...
-[2024-06-06 14:04:56,939][20018] Num frames 6300...
-[2024-06-06 14:04:57,115][20018] Num frames 6400...
-[2024-06-06 14:04:57,265][20018] Num frames 6500...
-[2024-06-06 14:04:57,431][20018] Num frames 6600...
-[2024-06-06 14:04:57,583][20018] Num frames 6700...
-[2024-06-06 14:04:57,738][20018] Num frames 6800...
-[2024-06-06 14:04:57,886][20018] Num frames 6900...
-[2024-06-06 14:04:58,062][20018] Avg episode rewards: #0: 19.467, true rewards: #0: 8.717
-[2024-06-06 14:04:58,064][20018] Avg episode reward: 19.467, avg true_objective: 8.717
-[2024-06-06 14:04:58,110][20018] Num frames 7000...
-[2024-06-06 14:04:58,258][20018] Num frames 7100...
-[2024-06-06 14:04:58,415][20018] Num frames 7200...
-[2024-06-06 14:04:58,577][20018] Num frames 7300...
-[2024-06-06 14:04:58,731][20018] Num frames 7400...
-[2024-06-06 14:04:58,890][20018] Num frames 7500...
-[2024-06-06 14:04:58,976][20018] Avg episode rewards: #0: 18.353, true rewards: #0: 8.353
-[2024-06-06 14:04:58,978][20018] Avg episode reward: 18.353, avg true_objective: 8.353
-[2024-06-06 14:04:59,107][20018] Num frames 7600...
-[2024-06-06 14:04:59,256][20018] Num frames 7700...
-[2024-06-06 14:04:59,417][20018] Num frames 7800...
-[2024-06-06 14:04:59,586][20018] Num frames 7900...
-[2024-06-06 14:04:59,738][20018] Num frames 8000...
-[2024-06-06 14:04:59,904][20018] Num frames 8100...
-[2024-06-06 14:05:00,057][20018] Num frames 8200...
-[2024-06-06 14:05:00,212][20018] Num frames 8300...
-[2024-06-06 14:05:00,374][20018] Avg episode rewards: #0: 18.369, true rewards: #0: 8.369
-[2024-06-06 14:05:00,377][20018] Avg episode reward: 18.369, avg true_objective: 8.369
-[2024-06-06 14:06:03,065][20018] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-06-06 14:15:24,463][03191] Using optimizer
+[2024-06-06 14:15:25,947][03191] No checkpoints found
+[2024-06-06 14:15:25,947][03191] Did not load from checkpoint, starting from scratch!
+[2024-06-06 14:15:25,947][03191] Initialized policy 0 weights for model version 0
+[2024-06-06 14:15:25,952][03191] LearnerWorker_p0 finished initialization!
+[2024-06-06 14:15:25,956][03191] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-06-06 14:15:26,161][03204] RunningMeanStd input shape: (3, 72, 128)
+[2024-06-06 14:15:26,162][03204] RunningMeanStd input shape: (1,)
+[2024-06-06 14:15:26,182][03204] ConvEncoder: input_channels=3
+[2024-06-06 14:15:26,342][03204] Conv encoder output size: 512
+[2024-06-06 14:15:26,343][03204] Policy head output size: 512
+[2024-06-06 14:15:26,420][01062] Inference worker 0-0 is ready!
+[2024-06-06 14:15:26,423][01062] All inference workers are ready! Signal rollout workers to start!
+[2024-06-06 14:15:26,582][01062] Heartbeat connected on Batcher_0
+[2024-06-06 14:15:26,586][01062] Heartbeat connected on LearnerWorker_p0
+[2024-06-06 14:15:26,625][01062] Heartbeat connected on InferenceWorker_p0-w0
+[2024-06-06 14:15:26,890][03208] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:26,905][03205] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:26,910][03206] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:26,966][03212] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:26,979][03207] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:26,982][03210] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:26,997][03211] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:27,022][03209] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:15:28,138][01062] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 14:15:29,163][03206] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:29,162][03209] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:29,161][03208] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:29,165][03207] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:29,164][03211] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:29,161][03205] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:30,277][03205] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:30,288][03207] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:30,291][03210] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:30,557][03208] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:30,560][03206] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:30,562][03209] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:30,633][03212] Decorrelating experience for 0 frames...
+[2024-06-06 14:15:31,295][03211] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:31,342][03207] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:31,765][03212] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:31,786][03206] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:31,783][03208] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:32,656][03209] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:32,728][03208] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:32,909][03205] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:32,940][03211] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:32,973][03210] Decorrelating experience for 32 frames...
+[2024-06-06 14:15:33,136][03207] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:33,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 14:15:33,935][03206] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:34,182][03212] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:34,315][03210] Decorrelating experience for 64 frames...
+[2024-06-06 14:15:34,361][03211] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:34,413][03209] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:35,312][03207] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:35,595][03205] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:35,974][03208] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:36,038][03212] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:36,369][03206] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:36,415][03211] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:36,954][03209] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:37,012][03210] Decorrelating experience for 96 frames...
+[2024-06-06 14:15:37,638][03205] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:37,765][03207] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:37,797][03208] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:38,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 14:15:38,505][03206] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:38,578][03212] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:39,389][03210] Decorrelating experience for 128 frames...
+[2024-06-06 14:15:39,416][03211] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:39,429][03209] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:39,947][03207] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:40,936][03212] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:41,695][03206] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:41,747][03205] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:42,196][03208] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:42,256][03209] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:42,380][03211] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:42,931][03207] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:43,139][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 14:15:43,776][03212] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:43,929][01062] Heartbeat connected on RolloutWorker_w2
+[2024-06-06 14:15:44,486][03210] Decorrelating experience for 160 frames...
+[2024-06-06 14:15:45,068][03206] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:45,276][03205] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:45,675][03209] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:45,997][03211] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:46,070][01062] Heartbeat connected on RolloutWorker_w1
+[2024-06-06 14:15:46,599][01062] Heartbeat connected on RolloutWorker_w5
+[2024-06-06 14:15:46,785][03208] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:47,086][01062] Heartbeat connected on RolloutWorker_w6
+[2024-06-06 14:15:48,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 29.6. Samples: 592. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 14:15:48,143][01062] Avg episode reward: [(0, '0.823')]
+[2024-06-06 14:15:48,208][01062] Heartbeat connected on RolloutWorker_w3
+[2024-06-06 14:15:49,190][03210] Decorrelating experience for 192 frames...
+[2024-06-06 14:15:49,353][03205] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:51,458][01062] Heartbeat connected on RolloutWorker_w0
+[2024-06-06 14:15:52,633][03212] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:53,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 77.8. Samples: 1944. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 14:15:53,141][01062] Avg episode reward: [(0, '1.875')]
+[2024-06-06 14:15:53,420][03191] Signal inference workers to stop experience collection...
+[2024-06-06 14:15:53,437][03204] InferenceWorker_p0-w0: stopping experience collection
+[2024-06-06 14:15:53,480][01062] Heartbeat connected on RolloutWorker_w7
+[2024-06-06 14:15:53,693][03210] Decorrelating experience for 224 frames...
+[2024-06-06 14:15:53,938][01062] Heartbeat connected on RolloutWorker_w4
+[2024-06-06 14:15:55,002][03191] Signal inference workers to resume experience collection...
+[2024-06-06 14:15:55,003][03204] InferenceWorker_p0-w0: resuming experience collection
+[2024-06-06 14:15:58,138][01062] Fps is (10 sec: 1638.4, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 16384. Throughput: 0: 134.4. Samples: 4032. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2024-06-06 14:15:58,140][01062] Avg episode reward: [(0, '2.384')]
+[2024-06-06 14:16:03,138][01062] Fps is (10 sec: 3276.8, 60 sec: 936.2, 300 sec: 936.2). Total num frames: 32768. Throughput: 0: 256.6. Samples: 8980. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
+[2024-06-06 14:16:03,145][01062] Avg episode reward: [(0, '3.114')]
+[2024-06-06 14:16:05,548][03204] Updated weights for policy 0, policy_version 10 (0.0025)
+[2024-06-06 14:16:08,138][01062] Fps is (10 sec: 2867.2, 60 sec: 1126.4, 300 sec: 1126.4). Total num frames: 45056. Throughput: 0: 281.3. Samples: 11252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:16:08,140][01062] Avg episode reward: [(0, '3.869')]
+[2024-06-06 14:16:13,138][01062] Fps is (10 sec: 3686.4, 60 sec: 1547.4, 300 sec: 1547.4). Total num frames: 69632. Throughput: 0: 371.1. Samples: 16700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:16:13,140][01062] Avg episode reward: [(0, '4.295')]
+[2024-06-06 14:16:15,821][03204] Updated weights for policy 0, policy_version 20 (0.0041)
+[2024-06-06 14:16:18,141][01062] Fps is (10 sec: 4504.5, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 90112. Throughput: 0: 525.9. Samples: 23668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:16:18,143][01062] Avg episode reward: [(0, '4.471')]
+[2024-06-06 14:16:23,141][01062] Fps is (10 sec: 3685.5, 60 sec: 1936.2, 300 sec: 1936.2). Total num frames: 106496. Throughput: 0: 588.6. Samples: 26488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:16:23,149][01062] Avg episode reward: [(0, '4.388')]
+[2024-06-06 14:16:23,180][03191] Saving new best policy, reward=4.388!
+[2024-06-06 14:16:27,677][03204] Updated weights for policy 0, policy_version 30 (0.0015)
+[2024-06-06 14:16:28,138][01062] Fps is (10 sec: 3277.6, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 122880. Throughput: 0: 686.8. Samples: 30908. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:16:28,144][01062] Avg episode reward: [(0, '4.407')]
+[2024-06-06 14:16:28,162][03191] Saving new best policy, reward=4.407!
+[2024-06-06 14:16:33,138][01062] Fps is (10 sec: 3687.3, 60 sec: 2389.3, 300 sec: 2205.5). Total num frames: 143360. Throughput: 0: 801.4. Samples: 36656. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-06-06 14:16:33,141][01062] Avg episode reward: [(0, '4.372')]
+[2024-06-06 14:16:38,138][01062] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2223.5). Total num frames: 155648. Throughput: 0: 826.2. Samples: 39124. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
+[2024-06-06 14:16:38,140][01062] Avg episode reward: [(0, '4.499')]
+[2024-06-06 14:16:38,156][03191] Saving new best policy, reward=4.499!
+[2024-06-06 14:16:41,557][03204] Updated weights for policy 0, policy_version 40 (0.0022)
+[2024-06-06 14:16:43,138][01062] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2239.1). Total num frames: 167936. Throughput: 0: 846.8. Samples: 42136. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:16:43,141][01062] Avg episode reward: [(0, '4.490')]
+[2024-06-06 14:16:48,141][01062] Fps is (10 sec: 2866.5, 60 sec: 3071.9, 300 sec: 2303.9). Total num frames: 184320. Throughput: 0: 840.3. Samples: 46796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:16:48,143][01062] Avg episode reward: [(0, '4.416')]
+[2024-06-06 14:16:53,003][03204] Updated weights for policy 0, policy_version 50 (0.0019)
+[2024-06-06 14:16:53,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2409.4). Total num frames: 204800. Throughput: 0: 850.0. Samples: 49504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:16:53,141][01062] Avg episode reward: [(0, '4.348')]
+[2024-06-06 14:16:58,138][01062] Fps is (10 sec: 4097.0, 60 sec: 3481.6, 300 sec: 2503.1). Total num frames: 225280. Throughput: 0: 880.7. Samples: 56332. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2024-06-06 14:16:58,140][01062] Avg episode reward: [(0, '4.553')]
+[2024-06-06 14:16:58,148][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000055_225280.pth...
+[2024-06-06 14:16:58,339][03191] Saving new best policy, reward=4.553!
+[2024-06-06 14:17:03,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2500.7). Total num frames: 237568. Throughput: 0: 826.9. Samples: 60876. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2024-06-06 14:17:03,144][01062] Avg episode reward: [(0, '4.536')]
+[2024-06-06 14:17:05,293][03204] Updated weights for policy 0, policy_version 60 (0.0014)
+[2024-06-06 14:17:08,141][01062] Fps is (10 sec: 2457.0, 60 sec: 3413.2, 300 sec: 2498.5). Total num frames: 249856. Throughput: 0: 806.0. Samples: 62760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2024-06-06 14:17:08,143][01062] Avg episode reward: [(0, '4.382')]
+[2024-06-06 14:17:13,139][01062] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 2496.6). Total num frames: 262144. Throughput: 0: 788.7. Samples: 66400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2024-06-06 14:17:13,145][01062] Avg episode reward: [(0, '4.361')]
+[2024-06-06 14:17:17,829][03204] Updated weights for policy 0, policy_version 70 (0.0021)
+[2024-06-06 14:17:18,138][01062] Fps is (10 sec: 3687.3, 60 sec: 3276.9, 300 sec: 2606.5). Total num frames: 286720. Throughput: 0: 788.4. Samples: 72136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:17:18,146][01062] Avg episode reward: [(0, '4.555')]
+[2024-06-06 14:17:18,155][03191] Saving new best policy, reward=4.555!
+[2024-06-06 14:17:23,138][01062] Fps is (10 sec: 4505.8, 60 sec: 3345.2, 300 sec: 2671.3). Total num frames: 307200. Throughput: 0: 810.0. Samples: 75572. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:17:23,144][01062] Avg episode reward: [(0, '4.358')]
+[2024-06-06 14:17:28,139][01062] Fps is (10 sec: 3686.2, 60 sec: 3345.0, 300 sec: 2696.5). Total num frames: 323584. Throughput: 0: 875.8. Samples: 81548. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:17:28,144][01062] Avg episode reward: [(0, '4.320')]
+[2024-06-06 14:17:28,464][03204] Updated weights for policy 0, policy_version 80 (0.0015)
+[2024-06-06 14:17:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2719.7). Total num frames: 339968. Throughput: 0: 874.8. Samples: 86160. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:17:33,142][01062] Avg episode reward: [(0, '4.445')]
+[2024-06-06 14:17:38,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 2772.7). Total num frames: 360448. Throughput: 0: 875.6. Samples: 88908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:17:38,140][01062] Avg episode reward: [(0, '4.510')]
+[2024-06-06 14:17:39,070][03204] Updated weights for policy 0, policy_version 90 (0.0019)
+[2024-06-06 14:17:43,139][01062] Fps is (10 sec: 4505.4, 60 sec: 3618.1, 300 sec: 2852.0). Total num frames: 385024. Throughput: 0: 879.2. Samples: 95896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
+[2024-06-06 14:17:43,143][01062] Avg episode reward: [(0, '4.438')]
+[2024-06-06 14:17:48,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 2867.2). Total num frames: 401408. Throughput: 0: 910.0. Samples: 101828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:17:48,142][01062] Avg episode reward: [(0, '4.380')]
+[2024-06-06 14:17:49,421][03204] Updated weights for policy 0, policy_version 100 (0.0025)
+[2024-06-06 14:17:53,141][01062] Fps is (10 sec: 3276.1, 60 sec: 3549.7, 300 sec: 2881.3). Total num frames: 417792. Throughput: 0: 919.5. Samples: 104136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-06-06 14:17:53,143][01062] Avg episode reward: [(0, '4.462')]
+[2024-06-06 14:17:58,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 2921.8). Total num frames: 438272. Throughput: 0: 947.9. Samples: 109056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-06-06 14:17:58,141][01062] Avg episode reward: [(0, '4.451')]
+[2024-06-06 14:18:00,560][03204] Updated weights for policy 0, policy_version 110 (0.0014)
+[2024-06-06 14:18:03,138][01062] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 2959.7). Total num frames: 458752. Throughput: 0: 970.1. Samples: 115792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:18:03,142][01062] Avg episode reward: [(0, '4.736')]
+[2024-06-06 14:18:03,149][03191] Saving new best policy, reward=4.736!
+[2024-06-06 14:18:08,140][01062] Fps is (10 sec: 4095.3, 60 sec: 3823.0, 300 sec: 2995.2). Total num frames: 479232. Throughput: 0: 967.2. Samples: 119096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:18:08,143][01062] Avg episode reward: [(0, '4.829')]
+[2024-06-06 14:18:08,150][03191] Saving new best policy, reward=4.829!
+[2024-06-06 14:18:12,072][03204] Updated weights for policy 0, policy_version 120 (0.0036)
+[2024-06-06 14:18:13,139][01062] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 2978.9). Total num frames: 491520. Throughput: 0: 930.8. Samples: 123432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:18:13,143][01062] Avg episode reward: [(0, '4.837')]
+[2024-06-06 14:18:13,150][03191] Saving new best policy, reward=4.837!
+[2024-06-06 14:18:18,138][01062] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3011.8). Total num frames: 512000. Throughput: 0: 948.1. Samples: 128824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:18:18,141][01062] Avg episode reward: [(0, '4.952')]
+[2024-06-06 14:18:18,156][03191] Saving new best policy, reward=4.952!
+[2024-06-06 14:18:22,748][03204] Updated weights for policy 0, policy_version 130 (0.0017)
+[2024-06-06 14:18:23,138][01062] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3042.7). Total num frames: 532480. Throughput: 0: 950.8. Samples: 131696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
+[2024-06-06 14:18:23,141][01062] Avg episode reward: [(0, '4.646')]
+[2024-06-06 14:18:28,139][01062] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3072.0). Total num frames: 552960. Throughput: 0: 937.5. Samples: 138084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:18:28,147][01062] Avg episode reward: [(0, '4.636')]
+[2024-06-06 14:18:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3055.4). Total num frames: 565248. Throughput: 0: 901.2. Samples: 142380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:18:33,141][01062] Avg episode reward: [(0, '4.697')]
+[2024-06-06 14:18:35,059][03204] Updated weights for policy 0, policy_version 140 (0.0014)
+[2024-06-06 14:18:38,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3082.8). Total num frames: 585728. Throughput: 0: 902.5. Samples: 144748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
+[2024-06-06 14:18:38,141][01062] Avg episode reward: [(0, '4.681')]
+[2024-06-06 14:18:43,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3108.8). Total num frames: 606208. Throughput: 0: 942.4. Samples: 151464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:18:43,144][01062] Avg episode reward: [(0, '4.927')]
+[2024-06-06 14:18:44,389][03204] Updated weights for policy 0, policy_version 150 (0.0014)
+[2024-06-06 14:18:48,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3133.4). Total num frames: 626688. Throughput: 0: 932.7. Samples: 157764. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2024-06-06 14:18:48,144][01062] Avg episode reward: [(0, '4.908')]
+[2024-06-06 14:18:53,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3136.9). Total num frames: 643072. Throughput: 0: 907.9. Samples: 159948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:18:53,141][01062] Avg episode reward: [(0, '5.032')]
+[2024-06-06 14:18:53,146][03191] Saving new best policy, reward=5.032!
+[2024-06-06 14:18:56,770][03204] Updated weights for policy 0, policy_version 160 (0.0028)
+[2024-06-06 14:18:58,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3140.3). Total num frames: 659456. Throughput: 0: 913.5. Samples: 164540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:18:58,144][01062] Avg episode reward: [(0, '4.678')]
+[2024-06-06 14:18:58,160][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000161_659456.pth...
+[2024-06-06 14:19:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3162.5). Total num frames: 679936. Throughput: 0: 940.4. Samples: 171144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
+[2024-06-06 14:19:03,140][01062] Avg episode reward: [(0, '4.798')]
+[2024-06-06 14:19:05,836][03204] Updated weights for policy 0, policy_version 170 (0.0017)
+[2024-06-06 14:19:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3183.7). Total num frames: 700416. Throughput: 0: 955.0. Samples: 174672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:19:08,141][01062] Avg episode reward: [(0, '5.174')]
+[2024-06-06 14:19:08,156][03191] Saving new best policy, reward=5.174!
+[2024-06-06 14:19:13,140][01062] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3185.8). Total num frames: 716800. Throughput: 0: 913.8. Samples: 179208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-06-06 14:19:13,142][01062] Avg episode reward: [(0, '5.269')]
+[2024-06-06 14:19:13,147][03191] Saving new best policy, reward=5.269!
+[2024-06-06 14:19:18,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3187.8). Total num frames: 733184. Throughput: 0: 937.5. Samples: 184568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-06-06 14:19:18,141][01062] Avg episode reward: [(0, '5.044')]
+[2024-06-06 14:19:18,173][03204] Updated weights for policy 0, policy_version 180 (0.0017)
+[2024-06-06 14:19:23,139][01062] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3224.5). Total num frames: 757760. Throughput: 0: 963.0. Samples: 188084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:19:23,141][01062] Avg episode reward: [(0, '5.440')]
+[2024-06-06 14:19:23,144][03191] Saving new best policy, reward=5.440!
+[2024-06-06 14:19:27,737][03204] Updated weights for policy 0, policy_version 190 (0.0033)
+[2024-06-06 14:19:28,139][01062] Fps is (10 sec: 4505.3, 60 sec: 3754.6, 300 sec: 3242.7). Total num frames: 778240. Throughput: 0: 959.5. Samples: 194644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
+[2024-06-06 14:19:28,141][01062] Avg episode reward: [(0, '6.046')]
+[2024-06-06 14:19:28,158][03191] Saving new best policy, reward=6.046!
+[2024-06-06 14:19:33,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3243.4). Total num frames: 794624. Throughput: 0: 918.3. Samples: 199088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:19:33,142][01062] Avg episode reward: [(0, '5.971')]
+[2024-06-06 14:19:38,138][01062] Fps is (10 sec: 3277.0, 60 sec: 3754.7, 300 sec: 3244.0). Total num frames: 811008. Throughput: 0: 923.8. Samples: 201520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:19:38,141][01062] Avg episode reward: [(0, '5.575')]
+[2024-06-06 14:19:39,554][03204] Updated weights for policy 0, policy_version 200 (0.0029)
+[2024-06-06 14:19:43,140][01062] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3276.8). Total num frames: 835584. Throughput: 0: 965.9. Samples: 208008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:19:43,143][01062] Avg episode reward: [(0, '5.472')]
+[2024-06-06 14:19:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3292.6). Total num frames: 856064. Throughput: 0: 971.4. Samples: 214856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:19:48,142][01062] Avg episode reward: [(0, '5.546')]
+[2024-06-06 14:19:48,882][03204] Updated weights for policy 0, policy_version 210 (0.0025)
+[2024-06-06 14:19:53,139][01062] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3292.3). Total num frames: 872448. Throughput: 0: 942.9. Samples: 217104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
+[2024-06-06 14:19:53,142][01062] Avg episode reward: [(0, '5.513')]
+[2024-06-06 14:19:58,140][01062] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3292.0). Total num frames: 888832. Throughput: 0: 943.4. Samples: 221660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
+[2024-06-06 14:19:58,143][01062] Avg episode reward: [(0, '5.777')]
+[2024-06-06 14:20:00,674][03204] Updated weights for policy 0, policy_version 220 (0.0015)
+[2024-06-06 14:20:03,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3306.6). Total num frames: 909312. Throughput: 0: 974.9. Samples: 228440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
+[2024-06-06 14:20:03,141][01062] Avg episode reward: [(0, '5.791')]
+[2024-06-06 14:20:08,138][01062] Fps is (10 sec: 4506.3, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 933888. Throughput: 0: 974.7. Samples: 231944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:20:08,145][01062] Avg episode reward: [(0, '5.619')]
+[2024-06-06 14:20:10,329][03204] Updated weights for policy 0, policy_version 230 (0.0040)
+[2024-06-06 14:20:13,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3334.3). Total num frames: 950272. Throughput: 0: 947.7. Samples: 237292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:20:13,142][01062] Avg episode reward: [(0, '5.875')]
+[2024-06-06 14:20:18,140][01062] Fps is (10 sec: 3276.2, 60 sec: 3891.1, 300 sec: 3333.3). Total num frames: 966656. Throughput: 0: 956.0. Samples: 242108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:20:18,145][01062] Avg episode reward: [(0, '5.897')]
+[2024-06-06 14:20:21,337][03204] Updated weights for policy 0, policy_version 240 (0.0020)
+[2024-06-06 14:20:23,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 982.4. Samples: 245728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:20:23,141][01062] Avg episode reward: [(0, '6.586')]
+[2024-06-06 14:20:23,147][03191] Saving new best policy, reward=6.586!
+[2024-06-06 14:20:28,138][01062] Fps is (10 sec: 4506.5, 60 sec: 3891.2, 300 sec: 3429.5). Total num frames: 1011712. Throughput: 0: 992.4. Samples: 252664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:20:28,143][01062] Avg episode reward: [(0, '7.013')]
+[2024-06-06 14:20:28,153][03191] Saving new best policy, reward=7.013!
+[2024-06-06 14:20:32,014][03204] Updated weights for policy 0, policy_version 250 (0.0017)
+[2024-06-06 14:20:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 948.7. Samples: 257548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:20:33,145][01062] Avg episode reward: [(0, '7.335')]
+[2024-06-06 14:20:33,146][03191] Saving new best policy, reward=7.335!
+[2024-06-06 14:20:38,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3526.7). Total num frames: 1040384. Throughput: 0: 949.7. Samples: 259840. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:20:38,146][01062] Avg episode reward: [(0, '7.087')]
+[2024-06-06 14:20:42,626][03204] Updated weights for policy 0, policy_version 260 (0.0027)
+[2024-06-06 14:20:43,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3610.0). Total num frames: 1064960. Throughput: 0: 985.4. Samples: 266000. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2024-06-06 14:20:43,144][01062] Avg episode reward: [(0, '6.606')]
+[2024-06-06 14:20:48,139][01062] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 988.5. Samples: 272924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2024-06-06 14:20:48,145][01062] Avg episode reward: [(0, '6.679')]
+[2024-06-06 14:20:53,142][01062] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3679.4). Total num frames: 1101824. Throughput: 0: 962.9. Samples: 275276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2024-06-06 14:20:53,148][01062] Avg episode reward: [(0, '6.956')]
+[2024-06-06 14:20:53,477][03204] Updated weights for policy 0, policy_version 270 (0.0023)
+[2024-06-06 14:20:58,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3823.0, 300 sec: 3679.5). Total num frames: 1118208. Throughput: 0: 943.4. Samples: 279744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
+[2024-06-06 14:20:58,141][01062] Avg episode reward: [(0, '7.375')]
+[2024-06-06 14:20:58,150][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth...
+[2024-06-06 14:20:58,267][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000055_225280.pth +[2024-06-06 14:20:58,287][03191] Saving new best policy, reward=7.375! +[2024-06-06 14:21:03,138][01062] Fps is (10 sec: 4097.4, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1142784. Throughput: 0: 974.8. Samples: 285972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:21:03,144][01062] Avg episode reward: [(0, '7.984')] +[2024-06-06 14:21:03,145][03191] Saving new best policy, reward=7.984! +[2024-06-06 14:21:04,187][03204] Updated weights for policy 0, policy_version 280 (0.0014) +[2024-06-06 14:21:08,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1163264. Throughput: 0: 972.2. Samples: 289476. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:21:08,141][01062] Avg episode reward: [(0, '8.775')] +[2024-06-06 14:21:08,148][03191] Saving new best policy, reward=8.775! +[2024-06-06 14:21:13,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3693.4). Total num frames: 1179648. Throughput: 0: 938.8. Samples: 294912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:21:13,145][01062] Avg episode reward: [(0, '8.699')] +[2024-06-06 14:21:15,737][03204] Updated weights for policy 0, policy_version 290 (0.0020) +[2024-06-06 14:21:18,139][01062] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 3679.5). Total num frames: 1191936. Throughput: 0: 933.2. Samples: 299540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:21:18,149][01062] Avg episode reward: [(0, '8.852')] +[2024-06-06 14:21:18,267][03191] Saving new best policy, reward=8.852! +[2024-06-06 14:21:23,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1220608. Throughput: 0: 955.7. Samples: 302848. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:21:23,144][01062] Avg episode reward: [(0, '8.610')] +[2024-06-06 14:21:25,159][03204] Updated weights for policy 0, policy_version 300 (0.0019) +[2024-06-06 14:21:28,143][01062] Fps is (10 sec: 4913.1, 60 sec: 3822.6, 300 sec: 3721.1). Total num frames: 1241088. Throughput: 0: 970.5. Samples: 309676. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-06-06 14:21:28,145][01062] Avg episode reward: [(0, '9.365')] +[2024-06-06 14:21:28,157][03191] Saving new best policy, reward=9.365! +[2024-06-06 14:21:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1257472. Throughput: 0: 928.5. Samples: 314708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-06-06 14:21:33,141][01062] Avg episode reward: [(0, '8.847')] +[2024-06-06 14:21:37,191][03204] Updated weights for policy 0, policy_version 310 (0.0020) +[2024-06-06 14:21:38,138][01062] Fps is (10 sec: 2868.5, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1269760. Throughput: 0: 927.5. Samples: 317012. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:21:38,142][01062] Avg episode reward: [(0, '9.023')] +[2024-06-06 14:21:43,141][01062] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3762.8). Total num frames: 1294336. Throughput: 0: 962.7. Samples: 323068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:21:43,142][01062] Avg episode reward: [(0, '8.614')] +[2024-06-06 14:21:46,296][03204] Updated weights for policy 0, policy_version 320 (0.0018) +[2024-06-06 14:21:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1314816. Throughput: 0: 978.8. Samples: 330020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-06-06 14:21:48,142][01062] Avg episode reward: [(0, '9.085')] +[2024-06-06 14:21:53,140][01062] Fps is (10 sec: 3277.0, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 1327104. Throughput: 0: 943.4. Samples: 331932. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:21:53,145][01062] Avg episode reward: [(0, '8.994')] +[2024-06-06 14:21:58,141][01062] Fps is (10 sec: 2457.0, 60 sec: 3686.2, 300 sec: 3735.0). Total num frames: 1339392. Throughput: 0: 904.9. Samples: 335636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:21:58,143][01062] Avg episode reward: [(0, '9.775')] +[2024-06-06 14:21:58,154][03191] Saving new best policy, reward=9.775! +[2024-06-06 14:22:02,099][03204] Updated weights for policy 0, policy_version 330 (0.0021) +[2024-06-06 14:22:03,138][01062] Fps is (10 sec: 2458.1, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 1351680. Throughput: 0: 878.6. Samples: 339076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:22:03,144][01062] Avg episode reward: [(0, '9.998')] +[2024-06-06 14:22:03,148][03191] Saving new best policy, reward=9.998! +[2024-06-06 14:22:08,139][01062] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 1376256. Throughput: 0: 873.7. Samples: 342164. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-06-06 14:22:08,146][01062] Avg episode reward: [(0, '10.185')] +[2024-06-06 14:22:08,158][03191] Saving new best policy, reward=10.185! +[2024-06-06 14:22:11,558][03204] Updated weights for policy 0, policy_version 340 (0.0027) +[2024-06-06 14:22:13,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1396736. Throughput: 0: 871.8. Samples: 348904. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:22:13,149][01062] Avg episode reward: [(0, '10.488')] +[2024-06-06 14:22:13,153][03191] Saving new best policy, reward=10.488! +[2024-06-06 14:22:18,140][01062] Fps is (10 sec: 3686.0, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 1413120. Throughput: 0: 874.9. Samples: 354080. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:22:18,144][01062] Avg episode reward: [(0, '10.401')] +[2024-06-06 14:22:23,141][01062] Fps is (10 sec: 3276.0, 60 sec: 3481.5, 300 sec: 3748.9). Total num frames: 1429504. Throughput: 0: 875.1. Samples: 356392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-06-06 14:22:23,143][01062] Avg episode reward: [(0, '10.562')] +[2024-06-06 14:22:23,146][03191] Saving new best policy, reward=10.562! +[2024-06-06 14:22:24,044][03204] Updated weights for policy 0, policy_version 350 (0.0025) +[2024-06-06 14:22:28,138][01062] Fps is (10 sec: 3686.8, 60 sec: 3481.9, 300 sec: 3762.8). Total num frames: 1449984. Throughput: 0: 871.0. Samples: 362260. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-06-06 14:22:28,141][01062] Avg episode reward: [(0, '11.011')] +[2024-06-06 14:22:28,152][03191] Saving new best policy, reward=11.011! +[2024-06-06 14:22:33,138][01062] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 1470464. Throughput: 0: 865.6. Samples: 368972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-06-06 14:22:33,147][01062] Avg episode reward: [(0, '10.808')] +[2024-06-06 14:22:33,162][03204] Updated weights for policy 0, policy_version 360 (0.0016) +[2024-06-06 14:22:38,140][01062] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3735.0). Total num frames: 1486848. Throughput: 0: 884.2. Samples: 371720. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-06-06 14:22:38,143][01062] Avg episode reward: [(0, '10.526')] +[2024-06-06 14:22:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3735.0). Total num frames: 1503232. Throughput: 0: 904.8. Samples: 376348. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-06-06 14:22:43,146][01062] Avg episode reward: [(0, '11.324')] +[2024-06-06 14:22:43,153][03191] Saving new best policy, reward=11.324! 
+[2024-06-06 14:22:45,313][03204] Updated weights for policy 0, policy_version 370 (0.0024) +[2024-06-06 14:22:48,138][01062] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 1527808. Throughput: 0: 963.6. Samples: 382436. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-06-06 14:22:48,141][01062] Avg episode reward: [(0, '11.287')] +[2024-06-06 14:22:53,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 1548288. Throughput: 0: 974.0. Samples: 385992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:22:53,140][01062] Avg episode reward: [(0, '10.889')] +[2024-06-06 14:22:54,099][03204] Updated weights for policy 0, policy_version 380 (0.0029) +[2024-06-06 14:22:58,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 1568768. Throughput: 0: 959.7. Samples: 392092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:22:58,141][01062] Avg episode reward: [(0, '11.199')] +[2024-06-06 14:22:58,151][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000383_1568768.pth... +[2024-06-06 14:22:58,319][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000161_659456.pth +[2024-06-06 14:23:03,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1581056. Throughput: 0: 943.9. Samples: 396556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:23:03,148][01062] Avg episode reward: [(0, '11.385')] +[2024-06-06 14:23:03,155][03191] Saving new best policy, reward=11.385! +[2024-06-06 14:23:06,336][03204] Updated weights for policy 0, policy_version 390 (0.0024) +[2024-06-06 14:23:08,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1601536. Throughput: 0: 953.2. Samples: 399284. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-06-06 14:23:08,152][01062] Avg episode reward: [(0, '11.939')] +[2024-06-06 14:23:08,227][03191] Saving new best policy, reward=11.939! +[2024-06-06 14:23:13,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1626112. Throughput: 0: 973.7. Samples: 406076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:23:13,144][01062] Avg episode reward: [(0, '13.326')] +[2024-06-06 14:23:13,150][03191] Saving new best policy, reward=13.326! +[2024-06-06 14:23:16,002][03204] Updated weights for policy 0, policy_version 400 (0.0013) +[2024-06-06 14:23:18,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 1642496. Throughput: 0: 951.5. Samples: 411788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:23:18,146][01062] Avg episode reward: [(0, '13.823')] +[2024-06-06 14:23:18,161][03191] Saving new best policy, reward=13.823! +[2024-06-06 14:23:23,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3748.9). Total num frames: 1658880. Throughput: 0: 939.9. Samples: 414016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:23:23,148][01062] Avg episode reward: [(0, '12.718')] +[2024-06-06 14:23:27,784][03204] Updated weights for policy 0, policy_version 410 (0.0025) +[2024-06-06 14:23:28,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 1679360. Throughput: 0: 954.5. Samples: 419300. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-06-06 14:23:28,141][01062] Avg episode reward: [(0, '12.704')] +[2024-06-06 14:23:33,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1699840. Throughput: 0: 964.2. Samples: 425824. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-06-06 14:23:33,145][01062] Avg episode reward: [(0, '12.302')] +[2024-06-06 14:23:38,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3762.8). 
Total num frames: 1716224. Throughput: 0: 954.3. Samples: 428936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:23:38,145][01062] Avg episode reward: [(0, '12.086')] +[2024-06-06 14:23:38,700][03204] Updated weights for policy 0, policy_version 420 (0.0015) +[2024-06-06 14:23:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1732608. Throughput: 0: 917.9. Samples: 433396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:23:43,141][01062] Avg episode reward: [(0, '12.367')] +[2024-06-06 14:23:48,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1753088. Throughput: 0: 945.1. Samples: 439084. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-06-06 14:23:48,141][01062] Avg episode reward: [(0, '13.654')] +[2024-06-06 14:23:49,668][03204] Updated weights for policy 0, policy_version 430 (0.0032) +[2024-06-06 14:23:53,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1777664. Throughput: 0: 960.9. Samples: 442524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:23:53,141][01062] Avg episode reward: [(0, '15.404')] +[2024-06-06 14:23:53,143][03191] Saving new best policy, reward=15.404! +[2024-06-06 14:23:58,145][01062] Fps is (10 sec: 4093.3, 60 sec: 3754.3, 300 sec: 3776.6). Total num frames: 1794048. Throughput: 0: 945.2. Samples: 448616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:23:58,147][01062] Avg episode reward: [(0, '15.927')] +[2024-06-06 14:23:58,159][03191] Saving new best policy, reward=15.927! +[2024-06-06 14:24:00,775][03204] Updated weights for policy 0, policy_version 440 (0.0033) +[2024-06-06 14:24:03,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1806336. Throughput: 0: 912.1. Samples: 452832. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-06-06 14:24:03,146][01062] Avg episode reward: [(0, '16.395')] +[2024-06-06 14:24:03,150][03191] Saving new best policy, reward=16.395! +[2024-06-06 14:24:08,138][01062] Fps is (10 sec: 3278.9, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1826816. Throughput: 0: 913.5. Samples: 455124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:24:08,141][01062] Avg episode reward: [(0, '15.661')] +[2024-06-06 14:24:11,280][03204] Updated weights for policy 0, policy_version 450 (0.0028) +[2024-06-06 14:24:13,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1847296. Throughput: 0: 945.5. Samples: 461848. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-06-06 14:24:13,140][01062] Avg episode reward: [(0, '15.786')] +[2024-06-06 14:24:18,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1867776. Throughput: 0: 936.2. Samples: 467952. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:24:18,141][01062] Avg episode reward: [(0, '16.091')] +[2024-06-06 14:24:23,141][01062] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 1880064. Throughput: 0: 917.2. Samples: 470212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:24:23,143][01062] Avg episode reward: [(0, '16.049')] +[2024-06-06 14:24:23,202][03204] Updated weights for policy 0, policy_version 460 (0.0033) +[2024-06-06 14:24:28,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1904640. Throughput: 0: 931.7. Samples: 475324. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-06-06 14:24:28,141][01062] Avg episode reward: [(0, '16.104')] +[2024-06-06 14:24:32,776][03204] Updated weights for policy 0, policy_version 470 (0.0016) +[2024-06-06 14:24:33,138][01062] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1925120. Throughput: 0: 954.9. Samples: 482056. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-06-06 14:24:33,141][01062] Avg episode reward: [(0, '18.296')] +[2024-06-06 14:24:33,145][03191] Saving new best policy, reward=18.296! +[2024-06-06 14:24:38,142][01062] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3748.9). Total num frames: 1941504. Throughput: 0: 951.4. Samples: 485340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:24:38,149][01062] Avg episode reward: [(0, '17.789')] +[2024-06-06 14:24:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1957888. Throughput: 0: 915.2. Samples: 489796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:24:43,142][01062] Avg episode reward: [(0, '17.496')] +[2024-06-06 14:24:45,061][03204] Updated weights for policy 0, policy_version 480 (0.0023) +[2024-06-06 14:24:48,138][01062] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1978368. Throughput: 0: 939.3. Samples: 495100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:24:48,143][01062] Avg episode reward: [(0, '16.777')] +[2024-06-06 14:24:53,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2002944. Throughput: 0: 964.8. Samples: 498540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 14:24:53,141][01062] Avg episode reward: [(0, '17.355')] +[2024-06-06 14:24:54,168][03204] Updated weights for policy 0, policy_version 490 (0.0021) +[2024-06-06 14:24:58,142][01062] Fps is (10 sec: 4094.7, 60 sec: 3754.9, 300 sec: 3762.7). Total num frames: 2019328. Throughput: 0: 962.3. Samples: 505156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-06-06 14:24:58,144][01062] Avg episode reward: [(0, '17.685')] +[2024-06-06 14:24:58,157][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth... 
+[2024-06-06 14:24:58,348][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth +[2024-06-06 14:25:03,141][01062] Fps is (10 sec: 3275.9, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 2035712. Throughput: 0: 924.1. Samples: 509540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:25:03,144][01062] Avg episode reward: [(0, '18.050')] +[2024-06-06 14:25:06,887][03204] Updated weights for policy 0, policy_version 500 (0.0021) +[2024-06-06 14:25:08,138][01062] Fps is (10 sec: 3277.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2052096. Throughput: 0: 923.8. Samples: 511780. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) +[2024-06-06 14:25:08,141][01062] Avg episode reward: [(0, '19.307')] +[2024-06-06 14:25:08,153][03191] Saving new best policy, reward=19.307! +[2024-06-06 14:25:13,139][01062] Fps is (10 sec: 4097.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2076672. Throughput: 0: 953.6. Samples: 518236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) +[2024-06-06 14:25:13,141][01062] Avg episode reward: [(0, '18.955')] +[2024-06-06 14:25:15,971][03204] Updated weights for policy 0, policy_version 510 (0.0016) +[2024-06-06 14:25:18,139][01062] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2093056. Throughput: 0: 942.3. Samples: 524460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-06-06 14:25:18,143][01062] Avg episode reward: [(0, '19.265')] +[2024-06-06 14:25:23,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3721.1). Total num frames: 2109440. Throughput: 0: 918.4. Samples: 526664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:25:23,151][01062] Avg episode reward: [(0, '18.438')] +[2024-06-06 14:25:28,138][01062] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2125824. Throughput: 0: 922.1. Samples: 531292. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:25:28,144][01062] Avg episode reward: [(0, '18.850')] +[2024-06-06 14:25:28,273][03204] Updated weights for policy 0, policy_version 520 (0.0017) +[2024-06-06 14:25:33,138][01062] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2150400. Throughput: 0: 950.3. Samples: 537864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:25:33,151][01062] Avg episode reward: [(0, '18.175')] +[2024-06-06 14:25:38,143][01062] Fps is (10 sec: 4094.2, 60 sec: 3754.6, 300 sec: 3734.9). Total num frames: 2166784. Throughput: 0: 948.3. Samples: 541216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-06-06 14:25:38,148][01062] Avg episode reward: [(0, '17.803')] +[2024-06-06 14:25:38,614][03204] Updated weights for policy 0, policy_version 530 (0.0018) +[2024-06-06 14:25:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2183168. Throughput: 0: 907.1. Samples: 545972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) +[2024-06-06 14:25:43,148][01062] Avg episode reward: [(0, '17.692')] +[2024-06-06 14:25:48,138][01062] Fps is (10 sec: 3278.2, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 2199552. Throughput: 0: 919.3. Samples: 550904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:25:48,144][01062] Avg episode reward: [(0, '16.980')] +[2024-06-06 14:25:50,168][03204] Updated weights for policy 0, policy_version 540 (0.0030) +[2024-06-06 14:25:53,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2224128. Throughput: 0: 946.9. Samples: 554392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:25:53,146][01062] Avg episode reward: [(0, '17.171')] +[2024-06-06 14:25:58,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.9, 300 sec: 3735.0). Total num frames: 2244608. Throughput: 0: 954.4. Samples: 561184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-06-06 14:25:58,143][01062] Avg episode reward: [(0, '17.859')] +[2024-06-06 14:26:00,186][03204] Updated weights for policy 0, policy_version 550 (0.0013) +[2024-06-06 14:26:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 2260992. Throughput: 0: 915.8. Samples: 565672. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:26:03,142][01062] Avg episode reward: [(0, '18.680')] +[2024-06-06 14:26:08,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2273280. Throughput: 0: 916.9. Samples: 567924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:26:08,140][01062] Avg episode reward: [(0, '18.439')] +[2024-06-06 14:26:11,533][03204] Updated weights for policy 0, policy_version 560 (0.0023) +[2024-06-06 14:26:13,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2297856. Throughput: 0: 953.2. Samples: 574184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:26:13,141][01062] Avg episode reward: [(0, '18.135')] +[2024-06-06 14:26:18,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2318336. Throughput: 0: 959.3. Samples: 581032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:26:18,143][01062] Avg episode reward: [(0, '19.209')] +[2024-06-06 14:26:22,657][03204] Updated weights for policy 0, policy_version 570 (0.0040) +[2024-06-06 14:26:23,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 2334720. Throughput: 0: 935.4. Samples: 583304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:26:23,142][01062] Avg episode reward: [(0, '19.396')] +[2024-06-06 14:26:23,147][03191] Saving new best policy, reward=19.396! +[2024-06-06 14:26:28,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2351104. Throughput: 0: 927.7. Samples: 587720. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:26:28,143][01062] Avg episode reward: [(0, '19.639')] +[2024-06-06 14:26:28,156][03191] Saving new best policy, reward=19.639! +[2024-06-06 14:26:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2371584. Throughput: 0: 956.8. Samples: 593960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-06-06 14:26:33,144][01062] Avg episode reward: [(0, '19.612')] +[2024-06-06 14:26:33,542][03204] Updated weights for policy 0, policy_version 580 (0.0016) +[2024-06-06 14:26:38,144][01062] Fps is (10 sec: 4093.7, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 2392064. Throughput: 0: 951.9. Samples: 597232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-06-06 14:26:38,147][01062] Avg episode reward: [(0, '18.191')] +[2024-06-06 14:26:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2404352. Throughput: 0: 908.4. Samples: 602064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:26:43,146][01062] Avg episode reward: [(0, '18.162')] +[2024-06-06 14:26:46,382][03204] Updated weights for policy 0, policy_version 590 (0.0014) +[2024-06-06 14:26:48,139][01062] Fps is (10 sec: 2458.8, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 2416640. Throughput: 0: 889.7. Samples: 605708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:26:48,149][01062] Avg episode reward: [(0, '16.772')] +[2024-06-06 14:26:53,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3707.3). Total num frames: 2433024. Throughput: 0: 881.4. Samples: 607588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:26:53,145][01062] Avg episode reward: [(0, '16.765')] +[2024-06-06 14:26:58,139][01062] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 2453504. Throughput: 0: 866.3. Samples: 613168. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-06-06 14:26:58,141][01062] Avg episode reward: [(0, '16.780')] +[2024-06-06 14:26:58,153][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000599_2453504.pth... +[2024-06-06 14:26:58,277][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000383_1568768.pth +[2024-06-06 14:26:58,494][03204] Updated weights for policy 0, policy_version 600 (0.0045) +[2024-06-06 14:27:03,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2473984. Throughput: 0: 855.3. Samples: 619520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-06-06 14:27:03,145][01062] Avg episode reward: [(0, '17.099')] +[2024-06-06 14:27:08,141][01062] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3707.2). Total num frames: 2490368. Throughput: 0: 853.8. Samples: 621728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 14:27:08,145][01062] Avg episode reward: [(0, '18.185')] +[2024-06-06 14:27:10,629][03204] Updated weights for policy 0, policy_version 610 (0.0036) +[2024-06-06 14:27:13,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 2506752. Throughput: 0: 855.1. Samples: 626200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-06-06 14:27:13,141][01062] Avg episode reward: [(0, '18.622')] +[2024-06-06 14:27:18,138][01062] Fps is (10 sec: 3687.3, 60 sec: 3481.6, 300 sec: 3721.1). Total num frames: 2527232. Throughput: 0: 860.9. Samples: 632700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:27:18,140][01062] Avg episode reward: [(0, '18.443')] +[2024-06-06 14:27:20,149][03204] Updated weights for policy 0, policy_version 620 (0.0019) +[2024-06-06 14:27:23,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2547712. Throughput: 0: 863.0. Samples: 636060. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:27:23,155][01062] Avg episode reward: [(0, '18.576')] +[2024-06-06 14:27:28,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 2564096. Throughput: 0: 873.5. Samples: 641372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-06-06 14:27:28,143][01062] Avg episode reward: [(0, '19.483')] +[2024-06-06 14:27:32,700][03204] Updated weights for policy 0, policy_version 630 (0.0038) +[2024-06-06 14:27:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 2580480. Throughput: 0: 894.7. Samples: 645968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-06-06 14:27:33,142][01062] Avg episode reward: [(0, '18.771')] +[2024-06-06 14:27:38,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3550.2, 300 sec: 3735.0). Total num frames: 2605056. Throughput: 0: 926.8. Samples: 649292. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-06-06 14:27:38,140][01062] Avg episode reward: [(0, '20.714')] +[2024-06-06 14:27:38,153][03191] Saving new best policy, reward=20.714! +[2024-06-06 14:27:41,542][03204] Updated weights for policy 0, policy_version 640 (0.0013) +[2024-06-06 14:27:43,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2625536. Throughput: 0: 954.0. Samples: 656100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-06-06 14:27:43,145][01062] Avg episode reward: [(0, '20.295')] +[2024-06-06 14:27:48,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2641920. Throughput: 0: 921.5. Samples: 660988. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-06-06 14:27:48,144][01062] Avg episode reward: [(0, '20.871')] +[2024-06-06 14:27:48,165][03191] Saving new best policy, reward=20.871! +[2024-06-06 14:27:53,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2654208. Throughput: 0: 919.6. Samples: 663108. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:27:53,145][01062] Avg episode reward: [(0, '19.491')]
[2024-06-06 14:27:54,251][03204] Updated weights for policy 0, policy_version 650 (0.0016)
[2024-06-06 14:27:58,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2674688. Throughput: 0: 950.1. Samples: 668956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-06-06 14:27:58,141][01062] Avg episode reward: [(0, '20.176')]
[2024-06-06 14:28:03,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2699264. Throughput: 0: 955.3. Samples: 675688. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:28:03,141][01062] Avg episode reward: [(0, '19.620')]
[2024-06-06 14:28:03,980][03204] Updated weights for policy 0, policy_version 660 (0.0027)
[2024-06-06 14:28:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3693.3). Total num frames: 2715648. Throughput: 0: 936.0. Samples: 678180. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:28:08,140][01062] Avg episode reward: [(0, '19.397')]
[2024-06-06 14:28:13,139][01062] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2732032. Throughput: 0: 918.1. Samples: 682688. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:28:13,146][01062] Avg episode reward: [(0, '19.502')]
[2024-06-06 14:28:16,114][03204] Updated weights for policy 0, policy_version 670 (0.0026)
[2024-06-06 14:28:18,141][01062] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 2752512. Throughput: 0: 954.4. Samples: 688920. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:28:18,152][01062] Avg episode reward: [(0, '19.841')]
[2024-06-06 14:28:23,138][01062] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2772992. Throughput: 0: 957.6. Samples: 692384. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:28:23,141][01062] Avg episode reward: [(0, '19.592')]
[2024-06-06 14:28:25,902][03204] Updated weights for policy 0, policy_version 680 (0.0019)
[2024-06-06 14:28:28,143][01062] Fps is (10 sec: 3685.7, 60 sec: 3754.4, 300 sec: 3693.3). Total num frames: 2789376. Throughput: 0: 931.4. Samples: 698016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-06-06 14:28:28,145][01062] Avg episode reward: [(0, '19.944')]
[2024-06-06 14:28:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2805760. Throughput: 0: 922.6. Samples: 702504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:28:33,146][01062] Avg episode reward: [(0, '20.631')]
[2024-06-06 14:28:37,639][03204] Updated weights for policy 0, policy_version 690 (0.0019)
[2024-06-06 14:28:38,138][01062] Fps is (10 sec: 4097.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2830336. Throughput: 0: 941.0. Samples: 705452. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:28:38,146][01062] Avg episode reward: [(0, '21.620')]
[2024-06-06 14:28:38,157][03191] Saving new best policy, reward=21.620!
[2024-06-06 14:28:43,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2850816. Throughput: 0: 961.4. Samples: 712220. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:28:43,140][01062] Avg episode reward: [(0, '20.966')]
[2024-06-06 14:28:47,177][03204] Updated weights for policy 0, policy_version 700 (0.0018)
[2024-06-06 14:28:48,142][01062] Fps is (10 sec: 3685.1, 60 sec: 3754.4, 300 sec: 3693.3). Total num frames: 2867200. Throughput: 0: 932.5. Samples: 717656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:28:48,145][01062] Avg episode reward: [(0, '20.508')]
[2024-06-06 14:28:53,141][01062] Fps is (10 sec: 3276.0, 60 sec: 3822.8, 300 sec: 3693.4). Total num frames: 2883584. Throughput: 0: 927.2. Samples: 719904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:28:53,143][01062] Avg episode reward: [(0, '20.375')]
[2024-06-06 14:28:58,139][01062] Fps is (10 sec: 3687.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2904064. Throughput: 0: 951.8. Samples: 725520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:28:58,144][01062] Avg episode reward: [(0, '18.262')]
[2024-06-06 14:28:58,155][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000709_2904064.pth...
[2024-06-06 14:28:58,278][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth
[2024-06-06 14:28:59,015][03204] Updated weights for policy 0, policy_version 710 (0.0022)
[2024-06-06 14:29:03,143][01062] Fps is (10 sec: 4095.0, 60 sec: 3754.4, 300 sec: 3721.1). Total num frames: 2924544. Throughput: 0: 965.5. Samples: 732368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:29:03,148][01062] Avg episode reward: [(0, '18.459')]
[2024-06-06 14:29:08,139][01062] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2940928. Throughput: 0: 948.0. Samples: 735044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:29:08,143][01062] Avg episode reward: [(0, '18.768')]
[2024-06-06 14:29:09,333][03204] Updated weights for policy 0, policy_version 720 (0.0020)
[2024-06-06 14:29:13,138][01062] Fps is (10 sec: 3278.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2957312. Throughput: 0: 923.9. Samples: 739588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:29:13,144][01062] Avg episode reward: [(0, '19.583')]
[2024-06-06 14:29:18,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 2977792. Throughput: 0: 953.9. Samples: 745428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:29:18,144][01062] Avg episode reward: [(0, '19.984')]
[2024-06-06 14:29:20,271][03204] Updated weights for policy 0, policy_version 730 (0.0018)
[2024-06-06 14:29:23,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3002368. Throughput: 0: 965.7. Samples: 748908. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:29:23,141][01062] Avg episode reward: [(0, '21.977')]
[2024-06-06 14:29:23,144][03191] Saving new best policy, reward=21.977!
[2024-06-06 14:29:28,139][01062] Fps is (10 sec: 4095.8, 60 sec: 3823.2, 300 sec: 3707.2). Total num frames: 3018752. Throughput: 0: 948.3. Samples: 754892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:29:28,151][01062] Avg episode reward: [(0, '23.512')]
[2024-06-06 14:29:28,161][03191] Saving new best policy, reward=23.512!
[2024-06-06 14:29:31,729][03204] Updated weights for policy 0, policy_version 740 (0.0014)
[2024-06-06 14:29:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 3035136. Throughput: 0: 927.5. Samples: 759392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:29:33,141][01062] Avg episode reward: [(0, '23.463')]
[2024-06-06 14:29:38,138][01062] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3051520. Throughput: 0: 933.3. Samples: 761900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:29:38,141][01062] Avg episode reward: [(0, '22.775')]
[2024-06-06 14:29:41,556][03204] Updated weights for policy 0, policy_version 750 (0.0018)
[2024-06-06 14:29:43,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3076096. Throughput: 0: 966.5. Samples: 769012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:29:43,144][01062] Avg episode reward: [(0, '23.050')]
[2024-06-06 14:29:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3823.2, 300 sec: 3707.2). Total num frames: 3096576. Throughput: 0: 945.4. Samples: 774908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:29:48,144][01062] Avg episode reward: [(0, '21.764')]
[2024-06-06 14:29:53,139][01062] Fps is (10 sec: 3276.7, 60 sec: 3754.8, 300 sec: 3693.4). Total num frames: 3108864. Throughput: 0: 936.9. Samples: 777204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:29:53,141][01062] Avg episode reward: [(0, '20.159')]
[2024-06-06 14:29:53,336][03204] Updated weights for policy 0, policy_version 760 (0.0020)
[2024-06-06 14:29:58,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 3133440. Throughput: 0: 955.6. Samples: 782588. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:29:58,141][01062] Avg episode reward: [(0, '20.834')]
[2024-06-06 14:30:02,512][03204] Updated weights for policy 0, policy_version 770 (0.0014)
[2024-06-06 14:30:03,138][01062] Fps is (10 sec: 4505.8, 60 sec: 3823.2, 300 sec: 3735.0). Total num frames: 3153920. Throughput: 0: 983.9. Samples: 789704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:30:03,142][01062] Avg episode reward: [(0, '20.693')]
[2024-06-06 14:30:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 3174400. Throughput: 0: 973.9. Samples: 792732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:30:08,142][01062] Avg episode reward: [(0, '21.257')]
[2024-06-06 14:30:13,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3186688. Throughput: 0: 940.1. Samples: 797196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:30:13,142][01062] Avg episode reward: [(0, '22.433')]
[2024-06-06 14:30:14,652][03204] Updated weights for policy 0, policy_version 780 (0.0048)
[2024-06-06 14:30:18,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3207168. Throughput: 0: 959.8. Samples: 802584. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:30:18,149][01062] Avg episode reward: [(0, '23.303')]
[2024-06-06 14:30:23,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3231744. Throughput: 0: 980.0. Samples: 806000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:30:23,143][01062] Avg episode reward: [(0, '23.025')]
[2024-06-06 14:30:23,829][03204] Updated weights for policy 0, policy_version 790 (0.0013)
[2024-06-06 14:30:28,140][01062] Fps is (10 sec: 4095.3, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3248128. Throughput: 0: 965.5. Samples: 812460. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 14:30:28,142][01062] Avg episode reward: [(0, '22.641')]
[2024-06-06 14:30:33,140][01062] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 3264512. Throughput: 0: 932.7. Samples: 816880. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:30:33,144][01062] Avg episode reward: [(0, '22.827')]
[2024-06-06 14:30:36,615][03204] Updated weights for policy 0, policy_version 800 (0.0030)
[2024-06-06 14:30:38,138][01062] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3280896. Throughput: 0: 931.5. Samples: 819120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-06-06 14:30:38,141][01062] Avg episode reward: [(0, '21.787')]
[2024-06-06 14:30:43,139][01062] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3305472. Throughput: 0: 965.2. Samples: 826024. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:30:43,145][01062] Avg episode reward: [(0, '20.264')]
[2024-06-06 14:30:45,466][03204] Updated weights for policy 0, policy_version 810 (0.0021)
[2024-06-06 14:30:48,141][01062] Fps is (10 sec: 4095.0, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 3321856. Throughput: 0: 942.8. Samples: 832132. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:30:48,143][01062] Avg episode reward: [(0, '21.097')]
[2024-06-06 14:30:53,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3338240. Throughput: 0: 925.0. Samples: 834356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:30:53,143][01062] Avg episode reward: [(0, '21.727')]
[2024-06-06 14:30:58,088][03204] Updated weights for policy 0, policy_version 820 (0.0028)
[2024-06-06 14:30:58,138][01062] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3358720. Throughput: 0: 931.6. Samples: 839116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:30:58,140][01062] Avg episode reward: [(0, '21.903')]
[2024-06-06 14:30:58,154][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth...
[2024-06-06 14:30:58,283][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000599_2453504.pth
[2024-06-06 14:31:03,138][01062] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3379200. Throughput: 0: 960.3. Samples: 845796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:31:03,143][01062] Avg episode reward: [(0, '23.119')]
[2024-06-06 14:31:07,863][03204] Updated weights for policy 0, policy_version 830 (0.0014)
[2024-06-06 14:31:08,140][01062] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 3399680. Throughput: 0: 959.4. Samples: 849176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:31:08,144][01062] Avg episode reward: [(0, '23.000')]
[2024-06-06 14:31:13,144][01062] Fps is (10 sec: 3275.0, 60 sec: 3754.3, 300 sec: 3707.2). Total num frames: 3411968. Throughput: 0: 917.9. Samples: 853768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:31:13,149][01062] Avg episode reward: [(0, '22.060')]
[2024-06-06 14:31:18,138][01062] Fps is (10 sec: 3277.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3432448. Throughput: 0: 931.7. Samples: 858804. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-06-06 14:31:18,141][01062] Avg episode reward: [(0, '21.845')]
[2024-06-06 14:31:19,482][03204] Updated weights for policy 0, policy_version 840 (0.0028)
[2024-06-06 14:31:23,138][01062] Fps is (10 sec: 4508.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3457024. Throughput: 0: 957.5. Samples: 862208. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 14:31:23,140][01062] Avg episode reward: [(0, '20.799')]
[2024-06-06 14:31:28,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 3473408. Throughput: 0: 953.6. Samples: 868936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:31:28,142][01062] Avg episode reward: [(0, '19.929')]
[2024-06-06 14:31:29,780][03204] Updated weights for policy 0, policy_version 850 (0.0013)
[2024-06-06 14:31:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3721.2). Total num frames: 3489792. Throughput: 0: 916.8. Samples: 873388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:31:33,141][01062] Avg episode reward: [(0, '20.564')]
[2024-06-06 14:31:38,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3502080. Throughput: 0: 909.5. Samples: 875284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-06-06 14:31:38,148][01062] Avg episode reward: [(0, '21.687')]
[2024-06-06 14:31:43,141][01062] Fps is (10 sec: 2457.0, 60 sec: 3481.5, 300 sec: 3721.1). Total num frames: 3514368. Throughput: 0: 893.1. Samples: 879308. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 14:31:43,144][01062] Avg episode reward: [(0, '21.494')]
[2024-06-06 14:31:44,518][03204] Updated weights for policy 0, policy_version 860 (0.0014)
[2024-06-06 14:31:48,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3735.0). Total num frames: 3534848. Throughput: 0: 864.7. Samples: 884708. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2024-06-06 14:31:48,141][01062] Avg episode reward: [(0, '21.781')]
[2024-06-06 14:31:53,139][01062] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3551232. Throughput: 0: 845.3. Samples: 887212. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2024-06-06 14:31:53,143][01062] Avg episode reward: [(0, '21.727')]
[2024-06-06 14:31:57,241][03204] Updated weights for policy 0, policy_version 870 (0.0030)
[2024-06-06 14:31:58,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 3567616. Throughput: 0: 842.9. Samples: 891692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:31:58,146][01062] Avg episode reward: [(0, '21.754')]
[2024-06-06 14:32:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3721.1). Total num frames: 3588096. Throughput: 0: 865.5. Samples: 897752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:32:03,143][01062] Avg episode reward: [(0, '21.759')]
[2024-06-06 14:32:06,106][03204] Updated weights for policy 0, policy_version 880 (0.0014)
[2024-06-06 14:32:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3735.0). Total num frames: 3608576. Throughput: 0: 866.7. Samples: 901208. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0)
[2024-06-06 14:32:08,142][01062] Avg episode reward: [(0, '21.220')]
[2024-06-06 14:32:13,139][01062] Fps is (10 sec: 3686.3, 60 sec: 3550.2, 300 sec: 3721.1). Total num frames: 3624960. Throughput: 0: 837.1. Samples: 906604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-06-06 14:32:13,142][01062] Avg episode reward: [(0, '21.725')]
[2024-06-06 14:32:18,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3693.3). Total num frames: 3637248. Throughput: 0: 836.2. Samples: 911016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2024-06-06 14:32:18,147][01062] Avg episode reward: [(0, '23.268')]
[2024-06-06 14:32:18,990][03204] Updated weights for policy 0, policy_version 890 (0.0021)
[2024-06-06 14:32:23,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3721.1). Total num frames: 3661824. Throughput: 0: 856.5. Samples: 913828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:32:23,141][01062] Avg episode reward: [(0, '24.169')]
[2024-06-06 14:32:23,144][03191] Saving new best policy, reward=24.169!
[2024-06-06 14:32:28,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 3682304. Throughput: 0: 917.6. Samples: 920600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:32:28,141][01062] Avg episode reward: [(0, '22.924')]
[2024-06-06 14:32:28,199][03204] Updated weights for policy 0, policy_version 900 (0.0018)
[2024-06-06 14:32:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 3698688. Throughput: 0: 918.4. Samples: 926036. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:32:33,143][01062] Avg episode reward: [(0, '23.302')]
[2024-06-06 14:32:38,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 3715072. Throughput: 0: 912.8. Samples: 928288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:32:38,141][01062] Avg episode reward: [(0, '22.850')]
[2024-06-06 14:32:40,672][03204] Updated weights for policy 0, policy_version 910 (0.0028)
[2024-06-06 14:32:43,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3707.2). Total num frames: 3735552. Throughput: 0: 933.1. Samples: 933680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-06-06 14:32:43,140][01062] Avg episode reward: [(0, '22.013')]
[2024-06-06 14:32:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3760128. Throughput: 0: 949.7. Samples: 940488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2024-06-06 14:32:48,144][01062] Avg episode reward: [(0, '22.537')]
[2024-06-06 14:32:50,405][03204] Updated weights for policy 0, policy_version 920 (0.0017)
[2024-06-06 14:32:53,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3772416. Throughput: 0: 935.6. Samples: 943308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:32:53,143][01062] Avg episode reward: [(0, '24.019')]
[2024-06-06 14:32:58,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3788800. Throughput: 0: 912.8. Samples: 947680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:32:58,144][01062] Avg episode reward: [(0, '22.859')]
[2024-06-06 14:32:58,164][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000925_3788800.pth...
[2024-06-06 14:32:58,351][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000709_2904064.pth
[2024-06-06 14:33:02,167][03204] Updated weights for policy 0, policy_version 930 (0.0017)
[2024-06-06 14:33:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3809280. Throughput: 0: 943.4. Samples: 953468. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2024-06-06 14:33:03,145][01062] Avg episode reward: [(0, '22.769')]
[2024-06-06 14:33:08,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3833856. Throughput: 0: 957.7. Samples: 956924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:33:08,147][01062] Avg episode reward: [(0, '21.846')]
[2024-06-06 14:33:12,658][03204] Updated weights for policy 0, policy_version 940 (0.0013)
[2024-06-06 14:33:13,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3850240. Throughput: 0: 937.7. Samples: 962796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:33:13,141][01062] Avg episode reward: [(0, '20.240')]
[2024-06-06 14:33:18,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3866624. Throughput: 0: 915.9. Samples: 967252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2024-06-06 14:33:18,142][01062] Avg episode reward: [(0, '19.769')]
[2024-06-06 14:33:23,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.2). Total num frames: 3887104. Throughput: 0: 923.4. Samples: 969840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:33:23,148][01062] Avg episode reward: [(0, '19.411')]
[2024-06-06 14:33:23,855][03204] Updated weights for policy 0, policy_version 950 (0.0018)
[2024-06-06 14:33:28,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3907584. Throughput: 0: 956.5. Samples: 976724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:33:28,141][01062] Avg episode reward: [(0, '19.823')]
[2024-06-06 14:33:33,140][01062] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 3923968. Throughput: 0: 929.4. Samples: 982312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-06-06 14:33:33,142][01062] Avg episode reward: [(0, '19.990')]
[2024-06-06 14:33:34,596][03204] Updated weights for policy 0, policy_version 960 (0.0030)
[2024-06-06 14:33:38,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 3940352. Throughput: 0: 917.0. Samples: 984572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2024-06-06 14:33:38,146][01062] Avg episode reward: [(0, '21.245')]
[2024-06-06 14:33:43,138][01062] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 3960832. Throughput: 0: 933.2. Samples: 989676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:33:43,141][01062] Avg episode reward: [(0, '21.985')]
[2024-06-06 14:33:45,490][03204] Updated weights for policy 0, policy_version 970 (0.0029)
[2024-06-06 14:33:48,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3981312. Throughput: 0: 955.4. Samples: 996460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2024-06-06 14:33:48,141][01062] Avg episode reward: [(0, '23.469')]
[2024-06-06 14:33:53,142][01062] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3721.1). Total num frames: 4001792. Throughput: 0: 947.3. Samples: 999556. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2024-06-06 14:33:53,148][01062] Avg episode reward: [(0, '23.932')]
[2024-06-06 14:33:54,303][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-06-06 14:33:54,315][01062] Component Batcher_0 stopped!
[2024-06-06 14:33:54,310][03191] Stopping Batcher_0...
[2024-06-06 14:33:54,316][03191] Loop batcher_evt_loop terminating...
[2024-06-06 14:33:54,394][03204] Weights refcount: 2 0
[2024-06-06 14:33:54,410][03204] Stopping InferenceWorker_p0-w0...
[2024-06-06 14:33:54,412][03204] Loop inference_proc0-0_evt_loop terminating...
[2024-06-06 14:33:54,416][01062] Component InferenceWorker_p0-w0 stopped!
[2024-06-06 14:33:54,517][01062] Component RolloutWorker_w3 stopped!
[2024-06-06 14:33:54,522][03208] Stopping RolloutWorker_w3...
[2024-06-06 14:33:54,524][03208] Loop rollout_proc3_evt_loop terminating...
[2024-06-06 14:33:54,558][03211] Stopping RolloutWorker_w6...
[2024-06-06 14:33:54,562][03211] Loop rollout_proc6_evt_loop terminating...
[2024-06-06 14:33:54,558][01062] Component RolloutWorker_w6 stopped!
[2024-06-06 14:33:54,563][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth
[2024-06-06 14:33:54,573][01062] Component RolloutWorker_w4 stopped!
[2024-06-06 14:33:54,569][03210] Stopping RolloutWorker_w4...
[2024-06-06 14:33:54,582][03210] Loop rollout_proc4_evt_loop terminating...
[2024-06-06 14:33:54,593][03191] Saving new best policy, reward=24.311!
[2024-06-06 14:33:54,598][03205] Stopping RolloutWorker_w0...
[2024-06-06 14:33:54,602][01062] Component RolloutWorker_w0 stopped!
[2024-06-06 14:33:54,602][03205] Loop rollout_proc0_evt_loop terminating...
[2024-06-06 14:33:54,641][01062] Component RolloutWorker_w7 stopped!
[2024-06-06 14:33:54,644][03207] Stopping RolloutWorker_w2...
[2024-06-06 14:33:54,644][03207] Loop rollout_proc2_evt_loop terminating...
[2024-06-06 14:33:54,648][01062] Component RolloutWorker_w2 stopped!
[2024-06-06 14:33:54,641][03212] Stopping RolloutWorker_w7...
[2024-06-06 14:33:54,654][03212] Loop rollout_proc7_evt_loop terminating...
[2024-06-06 14:33:54,680][01062] Component RolloutWorker_w1 stopped!
[2024-06-06 14:33:54,686][03206] Stopping RolloutWorker_w1...
[2024-06-06 14:33:54,686][03206] Loop rollout_proc1_evt_loop terminating...
[2024-06-06 14:33:54,696][01062] Component RolloutWorker_w5 stopped!
[2024-06-06 14:33:54,698][03209] Stopping RolloutWorker_w5...
[2024-06-06 14:33:54,700][03209] Loop rollout_proc5_evt_loop terminating...
[2024-06-06 14:33:54,839][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-06-06 14:33:55,142][03191] Stopping LearnerWorker_p0...
[2024-06-06 14:33:55,142][03191] Loop learner_proc0_evt_loop terminating...
[2024-06-06 14:33:55,143][01062] Component LearnerWorker_p0 stopped!
[2024-06-06 14:33:55,147][01062] Waiting for process learner_proc0 to stop...
[2024-06-06 14:33:57,211][01062] Waiting for process inference_proc0-0 to join...
[2024-06-06 14:33:57,218][01062] Waiting for process rollout_proc0 to join...
[2024-06-06 14:33:59,419][01062] Waiting for process rollout_proc1 to join...
[2024-06-06 14:33:59,442][01062] Waiting for process rollout_proc2 to join...
[2024-06-06 14:33:59,445][01062] Waiting for process rollout_proc3 to join...
[2024-06-06 14:33:59,449][01062] Waiting for process rollout_proc4 to join...
[2024-06-06 14:33:59,454][01062] Waiting for process rollout_proc5 to join...
[2024-06-06 14:33:59,457][01062] Waiting for process rollout_proc6 to join...
[2024-06-06 14:33:59,461][01062] Waiting for process rollout_proc7 to join...
[2024-06-06 14:33:59,465][01062] Batcher 0 profile tree view:
batching: 27.6540, releasing_batches: 0.0784
[2024-06-06 14:33:59,467][01062] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0058
  wait_policy_total: 666.2380
update_model: 6.5978
  weight_update: 0.0033
one_step: 0.0196
  handle_policy_step: 403.4676
    deserialize: 12.4335, stack: 2.3212, obs_to_device_normalize: 85.2330, forward: 215.5863, send_messages: 15.0034
    prepare_outputs: 53.5224
      to_cpu: 31.6488
[2024-06-06 14:33:59,469][01062] Learner 0 profile tree view:
misc: 0.0060, prepare_batch: 14.2031
train: 75.8572
  epoch_init: 0.0261, minibatch_init: 0.0295, losses_postprocess: 0.5978, kl_divergence: 0.6925, after_optimizer: 33.7659
  calculate_losses: 28.2320
    losses_init: 0.0048, forward_head: 1.4720, bptt_initial: 18.3517, tail: 1.4701, advantages_returns: 0.3125, losses: 4.2265
    bptt: 2.0431
      bptt_forward_core: 1.9579
  update: 11.8191
    clip: 0.9475
[2024-06-06 14:33:59,470][01062] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3227, enqueue_policy_requests: 142.7240, env_step: 863.4403, overhead: 18.4747, complete_rollouts: 4.6592
save_policy_outputs: 22.1226
  split_output_tensors: 8.9616
[2024-06-06 14:33:59,472][01062] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2890, enqueue_policy_requests: 140.4736, env_step: 865.7577, overhead: 18.1760, complete_rollouts: 4.6386
save_policy_outputs: 21.6645
  split_output_tensors: 8.4890
[2024-06-06 14:33:59,474][01062] Loop Runner_EvtLoop terminating...
[2024-06-06 14:33:59,479][01062] Runner profile tree view:
main_loop: 1132.8555
[2024-06-06 14:33:59,483][01062] Collected {0: 4005888}, FPS: 3536.1
[2024-06-06 14:33:59,530][01062] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-06-06 14:33:59,532][01062] Overriding arg 'num_workers' with value 1 passed from command line
[2024-06-06 14:33:59,534][01062] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-06-06 14:33:59,535][01062] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-06-06 14:33:59,538][01062] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-06-06 14:33:59,539][01062] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-06-06 14:33:59,541][01062] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-06-06 14:33:59,542][01062] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-06-06 14:33:59,543][01062] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-06-06 14:33:59,544][01062] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-06-06 14:33:59,545][01062] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-06-06 14:33:59,546][01062] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-06-06 14:33:59,547][01062] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-06-06 14:33:59,549][01062] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-06-06 14:33:59,550][01062] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-06-06 14:33:59,597][01062] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-06-06 14:33:59,604][01062] RunningMeanStd input shape: (3, 72, 128) +[2024-06-06 14:33:59,607][01062] RunningMeanStd input shape: (1,) +[2024-06-06 14:33:59,625][01062] ConvEncoder: input_channels=3 +[2024-06-06 14:33:59,734][01062] Conv encoder output size: 512 +[2024-06-06 14:33:59,735][01062] Policy head output size: 512 +[2024-06-06 14:33:59,909][01062] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-06-06 14:34:00,683][01062] Num frames 100... +[2024-06-06 14:34:00,812][01062] Num frames 200... +[2024-06-06 14:34:00,941][01062] Num frames 300... +[2024-06-06 14:34:01,074][01062] Num frames 400... +[2024-06-06 14:34:01,207][01062] Num frames 500... +[2024-06-06 14:34:01,337][01062] Num frames 600... +[2024-06-06 14:34:01,468][01062] Num frames 700... +[2024-06-06 14:34:01,611][01062] Num frames 800... +[2024-06-06 14:34:01,743][01062] Num frames 900... +[2024-06-06 14:34:01,880][01062] Num frames 1000... +[2024-06-06 14:34:02,054][01062] Avg episode rewards: #0: 27.880, true rewards: #0: 10.880 +[2024-06-06 14:34:02,056][01062] Avg episode reward: 27.880, avg true_objective: 10.880 +[2024-06-06 14:34:02,076][01062] Num frames 1100... +[2024-06-06 14:34:02,204][01062] Num frames 1200... +[2024-06-06 14:34:02,333][01062] Num frames 1300... +[2024-06-06 14:34:02,468][01062] Num frames 1400... +[2024-06-06 14:34:02,609][01062] Num frames 1500... +[2024-06-06 14:34:02,740][01062] Num frames 1600... +[2024-06-06 14:34:02,872][01062] Num frames 1700... +[2024-06-06 14:34:03,001][01062] Num frames 1800... +[2024-06-06 14:34:03,135][01062] Num frames 1900... 
+[2024-06-06 14:34:03,264][01062] Num frames 2000... +[2024-06-06 14:34:03,394][01062] Num frames 2100... +[2024-06-06 14:34:03,526][01062] Num frames 2200... +[2024-06-06 14:34:03,669][01062] Num frames 2300... +[2024-06-06 14:34:03,798][01062] Avg episode rewards: #0: 26.765, true rewards: #0: 11.765 +[2024-06-06 14:34:03,800][01062] Avg episode reward: 26.765, avg true_objective: 11.765 +[2024-06-06 14:34:03,861][01062] Num frames 2400... +[2024-06-06 14:34:03,992][01062] Num frames 2500... +[2024-06-06 14:34:04,124][01062] Num frames 2600... +[2024-06-06 14:34:04,255][01062] Num frames 2700... +[2024-06-06 14:34:04,383][01062] Num frames 2800... +[2024-06-06 14:34:04,524][01062] Avg episode rewards: #0: 22.550, true rewards: #0: 9.550 +[2024-06-06 14:34:04,526][01062] Avg episode reward: 22.550, avg true_objective: 9.550 +[2024-06-06 14:34:04,577][01062] Num frames 2900... +[2024-06-06 14:34:04,710][01062] Num frames 3000... +[2024-06-06 14:34:04,841][01062] Num frames 3100... +[2024-06-06 14:34:04,971][01062] Avg episode rewards: #0: 18.133, true rewards: #0: 7.882 +[2024-06-06 14:34:04,972][01062] Avg episode reward: 18.133, avg true_objective: 7.882 +[2024-06-06 14:34:05,037][01062] Num frames 3200... +[2024-06-06 14:34:05,169][01062] Num frames 3300... +[2024-06-06 14:34:05,296][01062] Num frames 3400... +[2024-06-06 14:34:05,427][01062] Num frames 3500... +[2024-06-06 14:34:05,558][01062] Num frames 3600... +[2024-06-06 14:34:05,696][01062] Num frames 3700... +[2024-06-06 14:34:05,842][01062] Avg episode rewards: #0: 17.138, true rewards: #0: 7.538 +[2024-06-06 14:34:05,843][01062] Avg episode reward: 17.138, avg true_objective: 7.538 +[2024-06-06 14:34:05,887][01062] Num frames 3800... +[2024-06-06 14:34:06,018][01062] Num frames 3900... +[2024-06-06 14:34:06,147][01062] Num frames 4000... +[2024-06-06 14:34:06,278][01062] Num frames 4100... +[2024-06-06 14:34:06,411][01062] Num frames 4200... +[2024-06-06 14:34:06,540][01062] Num frames 4300... 
+[2024-06-06 14:34:06,680][01062] Num frames 4400... +[2024-06-06 14:34:06,826][01062] Num frames 4500... +[2024-06-06 14:34:06,936][01062] Avg episode rewards: #0: 17.228, true rewards: #0: 7.562 +[2024-06-06 14:34:06,937][01062] Avg episode reward: 17.228, avg true_objective: 7.562 +[2024-06-06 14:34:07,022][01062] Num frames 4600... +[2024-06-06 14:34:07,152][01062] Num frames 4700... +[2024-06-06 14:34:07,284][01062] Num frames 4800... +[2024-06-06 14:34:07,428][01062] Num frames 4900... +[2024-06-06 14:34:07,616][01062] Num frames 5000... +[2024-06-06 14:34:07,811][01062] Num frames 5100... +[2024-06-06 14:34:08,002][01062] Num frames 5200... +[2024-06-06 14:34:08,188][01062] Num frames 5300... +[2024-06-06 14:34:08,376][01062] Num frames 5400... +[2024-06-06 14:34:08,561][01062] Num frames 5500... +[2024-06-06 14:34:08,747][01062] Num frames 5600... +[2024-06-06 14:34:08,940][01062] Num frames 5700... +[2024-06-06 14:34:09,130][01062] Num frames 5800... +[2024-06-06 14:34:09,316][01062] Num frames 5900... +[2024-06-06 14:34:09,511][01062] Num frames 6000... +[2024-06-06 14:34:09,705][01062] Num frames 6100... +[2024-06-06 14:34:09,908][01062] Num frames 6200... +[2024-06-06 14:34:10,100][01062] Num frames 6300... +[2024-06-06 14:34:10,293][01062] Num frames 6400... +[2024-06-06 14:34:10,466][01062] Num frames 6500... +[2024-06-06 14:34:10,610][01062] Num frames 6600... +[2024-06-06 14:34:10,720][01062] Avg episode rewards: #0: 23.338, true rewards: #0: 9.481 +[2024-06-06 14:34:10,722][01062] Avg episode reward: 23.338, avg true_objective: 9.481 +[2024-06-06 14:34:10,806][01062] Num frames 6700... +[2024-06-06 14:34:10,941][01062] Num frames 6800... +[2024-06-06 14:34:11,075][01062] Num frames 6900... +[2024-06-06 14:34:11,209][01062] Num frames 7000... +[2024-06-06 14:34:11,336][01062] Num frames 7100... +[2024-06-06 14:34:11,468][01062] Num frames 7200... +[2024-06-06 14:34:11,603][01062] Num frames 7300... 
+[2024-06-06 14:34:11,746][01062] Num frames 7400...
+[2024-06-06 14:34:11,907][01062] Avg episode rewards: #0: 22.472, true rewards: #0: 9.347
+[2024-06-06 14:34:11,910][01062] Avg episode reward: 22.472, avg true_objective: 9.347
+[2024-06-06 14:34:11,941][01062] Num frames 7500...
+[2024-06-06 14:34:12,086][01062] Num frames 7600...
+[2024-06-06 14:34:12,226][01062] Num frames 7700...
+[2024-06-06 14:34:12,364][01062] Num frames 7800...
+[2024-06-06 14:34:12,497][01062] Num frames 7900...
+[2024-06-06 14:34:12,631][01062] Num frames 8000...
+[2024-06-06 14:34:12,758][01062] Num frames 8100...
+[2024-06-06 14:34:12,896][01062] Num frames 8200...
+[2024-06-06 14:34:13,031][01062] Num frames 8300...
+[2024-06-06 14:34:13,160][01062] Num frames 8400...
+[2024-06-06 14:34:13,291][01062] Num frames 8500...
+[2024-06-06 14:34:13,421][01062] Num frames 8600...
+[2024-06-06 14:34:13,551][01062] Num frames 8700...
+[2024-06-06 14:34:13,681][01062] Num frames 8800...
+[2024-06-06 14:34:13,808][01062] Num frames 8900...
+[2024-06-06 14:34:13,949][01062] Num frames 9000...
+[2024-06-06 14:34:14,080][01062] Num frames 9100...
+[2024-06-06 14:34:14,250][01062] Avg episode rewards: #0: 24.874, true rewards: #0: 10.208
+[2024-06-06 14:34:14,252][01062] Avg episode reward: 24.874, avg true_objective: 10.208
+[2024-06-06 14:34:14,272][01062] Num frames 9200...
+[2024-06-06 14:34:14,398][01062] Num frames 9300...
+[2024-06-06 14:34:14,526][01062] Num frames 9400...
+[2024-06-06 14:34:14,659][01062] Num frames 9500...
+[2024-06-06 14:34:14,790][01062] Num frames 9600...
+[2024-06-06 14:34:14,919][01062] Num frames 9700...
+[2024-06-06 14:34:15,056][01062] Num frames 9800...
+[2024-06-06 14:34:15,189][01062] Num frames 9900...
+[2024-06-06 14:34:15,320][01062] Num frames 10000...
+[2024-06-06 14:34:15,454][01062] Num frames 10100...
+[2024-06-06 14:34:15,584][01062] Num frames 10200...
+[2024-06-06 14:34:15,696][01062] Avg episode rewards: #0: 25.040, true rewards: #0: 10.240
+[2024-06-06 14:34:15,697][01062] Avg episode reward: 25.040, avg true_objective: 10.240
+[2024-06-06 14:35:21,028][01062] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-06-06 14:35:21,661][01062] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-06-06 14:35:21,663][01062] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-06-06 14:35:21,664][01062] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-06-06 14:35:21,666][01062] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-06-06 14:35:21,668][01062] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-06-06 14:35:21,669][01062] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-06-06 14:35:21,670][01062] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-06-06 14:35:21,671][01062] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-06-06 14:35:21,673][01062] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-06-06 14:35:21,674][01062] Adding new argument 'hf_repository'='swritchie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-06-06 14:35:21,675][01062] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-06-06 14:35:21,676][01062] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-06-06 14:35:21,677][01062] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-06-06 14:35:21,678][01062] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-06-06 14:35:21,679][01062] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-06-06 14:35:21,721][01062] RunningMeanStd input shape: (3, 72, 128)
+[2024-06-06 14:35:21,723][01062] RunningMeanStd input shape: (1,)
+[2024-06-06 14:35:21,741][01062] ConvEncoder: input_channels=3
+[2024-06-06 14:35:21,796][01062] Conv encoder output size: 512
+[2024-06-06 14:35:21,798][01062] Policy head output size: 512
+[2024-06-06 14:35:21,824][01062] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-06-06 14:35:22,498][01062] Num frames 100...
+[2024-06-06 14:35:22,673][01062] Num frames 200...
+[2024-06-06 14:35:22,840][01062] Num frames 300...
+[2024-06-06 14:35:23,022][01062] Num frames 400...
+[2024-06-06 14:35:23,202][01062] Num frames 500...
+[2024-06-06 14:35:23,386][01062] Num frames 600...
+[2024-06-06 14:35:23,570][01062] Num frames 700...
+[2024-06-06 14:35:23,756][01062] Num frames 800...
+[2024-06-06 14:35:23,945][01062] Num frames 900...
+[2024-06-06 14:35:24,119][01062] Num frames 1000...
+[2024-06-06 14:35:24,308][01062] Num frames 1100...
+[2024-06-06 14:35:24,501][01062] Num frames 1200...
+[2024-06-06 14:35:24,644][01062] Avg episode rewards: #0: 26.480, true rewards: #0: 12.480
+[2024-06-06 14:35:24,647][01062] Avg episode reward: 26.480, avg true_objective: 12.480
+[2024-06-06 14:35:24,748][01062] Num frames 1300...
+[2024-06-06 14:35:24,956][01062] Num frames 1400...
+[2024-06-06 14:35:25,151][01062] Num frames 1500...
+[2024-06-06 14:35:25,326][01062] Num frames 1600...
+[2024-06-06 14:35:25,521][01062] Num frames 1700...
+[2024-06-06 14:35:25,738][01062] Num frames 1800...
+[2024-06-06 14:35:25,941][01062] Num frames 1900...
+[2024-06-06 14:35:26,136][01062] Num frames 2000...
+[2024-06-06 14:35:26,191][01062] Avg episode rewards: #0: 21.500, true rewards: #0: 10.000
+[2024-06-06 14:35:26,194][01062] Avg episode reward: 21.500, avg true_objective: 10.000
+[2024-06-06 14:35:26,376][01062] Num frames 2100...
+[2024-06-06 14:35:26,571][01062] Num frames 2200...
+[2024-06-06 14:35:26,753][01062] Num frames 2300...
+[2024-06-06 14:35:26,953][01062] Num frames 2400...
+[2024-06-06 14:35:27,159][01062] Num frames 2500...
+[2024-06-06 14:35:27,374][01062] Num frames 2600...
+[2024-06-06 14:35:27,565][01062] Num frames 2700...
+[2024-06-06 14:35:27,776][01062] Num frames 2800...
+[2024-06-06 14:35:27,991][01062] Num frames 2900...
+[2024-06-06 14:35:28,043][01062] Avg episode rewards: #0: 21.334, true rewards: #0: 9.667
+[2024-06-06 14:35:28,045][01062] Avg episode reward: 21.334, avg true_objective: 9.667
+[2024-06-06 14:35:28,269][01062] Num frames 3000...
+[2024-06-06 14:35:28,460][01062] Num frames 3100...
+[2024-06-06 14:35:28,660][01062] Num frames 3200...
+[2024-06-06 14:35:28,847][01062] Num frames 3300...
+[2024-06-06 14:35:29,035][01062] Num frames 3400...
+[2024-06-06 14:35:29,220][01062] Num frames 3500...
+[2024-06-06 14:35:29,414][01062] Num frames 3600...
+[2024-06-06 14:35:29,599][01062] Num frames 3700...
+[2024-06-06 14:35:29,797][01062] Num frames 3800...
+[2024-06-06 14:35:30,016][01062] Num frames 3900...
+[2024-06-06 14:35:30,210][01062] Num frames 4000...
+[2024-06-06 14:35:30,339][01062] Num frames 4100...
+[2024-06-06 14:35:30,468][01062] Num frames 4200...
+[2024-06-06 14:35:30,598][01062] Num frames 4300...
+[2024-06-06 14:35:30,736][01062] Num frames 4400...
+[2024-06-06 14:35:30,866][01062] Num frames 4500...
+[2024-06-06 14:35:30,998][01062] Num frames 4600...
+[2024-06-06 14:35:31,128][01062] Num frames 4700...
+[2024-06-06 14:35:31,257][01062] Num frames 4800...
+[2024-06-06 14:35:31,437][01062] Avg episode rewards: #0: 29.240, true rewards: #0: 12.240
+[2024-06-06 14:35:31,439][01062] Avg episode reward: 29.240, avg true_objective: 12.240
+[2024-06-06 14:35:31,449][01062] Num frames 4900...
+[2024-06-06 14:35:31,576][01062] Num frames 5000...
+[2024-06-06 14:35:31,712][01062] Num frames 5100...
+[2024-06-06 14:35:31,855][01062] Num frames 5200...
+[2024-06-06 14:35:31,988][01062] Num frames 5300...
+[2024-06-06 14:35:32,120][01062] Num frames 5400...
+[2024-06-06 14:35:32,247][01062] Num frames 5500...
+[2024-06-06 14:35:32,377][01062] Num frames 5600...
+[2024-06-06 14:35:32,503][01062] Num frames 5700...
+[2024-06-06 14:35:32,633][01062] Num frames 5800...
+[2024-06-06 14:35:32,765][01062] Num frames 5900...
+[2024-06-06 14:35:32,893][01062] Avg episode rewards: #0: 27.504, true rewards: #0: 11.904
+[2024-06-06 14:35:32,894][01062] Avg episode reward: 27.504, avg true_objective: 11.904
+[2024-06-06 14:35:32,960][01062] Num frames 6000...
+[2024-06-06 14:35:33,086][01062] Num frames 6100...
+[2024-06-06 14:35:33,214][01062] Num frames 6200...
+[2024-06-06 14:35:33,342][01062] Num frames 6300...
+[2024-06-06 14:35:33,404][01062] Avg episode rewards: #0: 23.673, true rewards: #0: 10.507
+[2024-06-06 14:35:33,405][01062] Avg episode reward: 23.673, avg true_objective: 10.507
+[2024-06-06 14:35:33,528][01062] Num frames 6400...
+[2024-06-06 14:35:33,659][01062] Num frames 6500...
+[2024-06-06 14:35:33,788][01062] Num frames 6600...
+[2024-06-06 14:35:33,922][01062] Num frames 6700...
+[2024-06-06 14:35:34,047][01062] Num frames 6800...
+[2024-06-06 14:35:34,124][01062] Avg episode rewards: #0: 21.309, true rewards: #0: 9.737
+[2024-06-06 14:35:34,126][01062] Avg episode reward: 21.309, avg true_objective: 9.737
+[2024-06-06 14:35:34,235][01062] Num frames 6900...
+[2024-06-06 14:35:34,360][01062] Num frames 7000...
+[2024-06-06 14:35:34,487][01062] Num frames 7100...
+[2024-06-06 14:35:34,619][01062] Num frames 7200...
+[2024-06-06 14:35:34,748][01062] Num frames 7300...
+[2024-06-06 14:35:34,883][01062] Num frames 7400...
+[2024-06-06 14:35:35,062][01062] Num frames 7500...
+[2024-06-06 14:35:35,239][01062] Num frames 7600...
+[2024-06-06 14:35:35,387][01062] Num frames 7700...
+[2024-06-06 14:35:35,514][01062] Num frames 7800...
+[2024-06-06 14:35:35,645][01062] Num frames 7900...
+[2024-06-06 14:35:35,789][01062] Avg episode rewards: #0: 21.835, true rewards: #0: 9.960
+[2024-06-06 14:35:35,791][01062] Avg episode reward: 21.835, avg true_objective: 9.960
+[2024-06-06 14:35:35,841][01062] Num frames 8000...
+[2024-06-06 14:35:35,976][01062] Num frames 8100...
+[2024-06-06 14:35:36,102][01062] Num frames 8200...
+[2024-06-06 14:35:36,229][01062] Num frames 8300...
+[2024-06-06 14:35:36,359][01062] Num frames 8400...
+[2024-06-06 14:35:36,528][01062] Avg episode rewards: #0: 20.311, true rewards: #0: 9.422
+[2024-06-06 14:35:36,529][01062] Avg episode reward: 20.311, avg true_objective: 9.422
+[2024-06-06 14:35:36,560][01062] Num frames 8500...
+[2024-06-06 14:35:36,689][01062] Num frames 8600...
+[2024-06-06 14:35:36,818][01062] Num frames 8700...
+[2024-06-06 14:35:36,956][01062] Num frames 8800...
+[2024-06-06 14:35:37,142][01062] Avg episode rewards: #0: 18.995, true rewards: #0: 8.895
+[2024-06-06 14:35:37,144][01062] Avg episode reward: 18.995, avg true_objective: 8.895
+[2024-06-06 14:36:34,978][01062] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-06-06 14:45:01,257][01062] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-06-06 14:45:01,260][01062] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-06-06 14:45:01,262][01062] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-06-06 14:45:01,264][01062] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-06-06 14:45:01,266][01062] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-06-06 14:45:01,268][01062] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-06-06 14:45:01,270][01062] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-06-06 14:45:01,271][01062] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-06-06 14:45:01,272][01062] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-06-06 14:45:01,274][01062] Adding new argument 'hf_repository'='swritchie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-06-06 14:45:01,275][01062] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-06-06 14:45:01,276][01062] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-06-06 14:45:01,277][01062] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-06-06 14:45:01,278][01062] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-06-06 14:45:01,279][01062] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-06-06 14:45:01,322][01062] RunningMeanStd input shape: (3, 72, 128)
+[2024-06-06 14:45:01,323][01062] RunningMeanStd input shape: (1,)
+[2024-06-06 14:45:01,338][01062] ConvEncoder: input_channels=3
+[2024-06-06 14:45:01,377][01062] Conv encoder output size: 512
+[2024-06-06 14:45:01,378][01062] Policy head output size: 512
+[2024-06-06 14:45:01,401][01062] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-06-06 14:45:01,859][01062] Num frames 100...
+[2024-06-06 14:45:01,988][01062] Num frames 200...
+[2024-06-06 14:45:02,120][01062] Num frames 300...
+[2024-06-06 14:45:02,254][01062] Num frames 400...
+[2024-06-06 14:45:02,389][01062] Num frames 500...
+[2024-06-06 14:45:02,526][01062] Num frames 600...
+[2024-06-06 14:45:02,663][01062] Num frames 700...
+[2024-06-06 14:45:02,736][01062] Avg episode rewards: #0: 14.120, true rewards: #0: 7.120
+[2024-06-06 14:45:02,738][01062] Avg episode reward: 14.120, avg true_objective: 7.120
+[2024-06-06 14:45:02,861][01062] Num frames 800...
+[2024-06-06 14:45:03,028][01062] Num frames 900...
+[2024-06-06 14:45:03,244][01062] Num frames 1000...
+[2024-06-06 14:45:03,429][01062] Num frames 1100...
+[2024-06-06 14:45:03,617][01062] Num frames 1200...
+[2024-06-06 14:45:03,809][01062] Num frames 1300...
+[2024-06-06 14:45:03,993][01062] Num frames 1400...
+[2024-06-06 14:45:04,175][01062] Num frames 1500...
+[2024-06-06 14:45:04,355][01062] Num frames 1600...
+[2024-06-06 14:45:04,545][01062] Num frames 1700...
+[2024-06-06 14:45:04,741][01062] Avg episode rewards: #0: 20.340, true rewards: #0: 8.840
+[2024-06-06 14:45:04,743][01062] Avg episode reward: 20.340, avg true_objective: 8.840
+[2024-06-06 14:45:04,812][01062] Num frames 1800...
+[2024-06-06 14:45:05,008][01062] Num frames 1900...
+[2024-06-06 14:45:05,200][01062] Num frames 2000...
+[2024-06-06 14:45:05,406][01062] Num frames 2100...
+[2024-06-06 14:45:05,603][01062] Num frames 2200...
+[2024-06-06 14:45:05,765][01062] Num frames 2300...
+[2024-06-06 14:45:05,893][01062] Num frames 2400...
+[2024-06-06 14:45:06,031][01062] Num frames 2500...
+[2024-06-06 14:45:06,162][01062] Num frames 2600...
+[2024-06-06 14:45:06,297][01062] Num frames 2700...
+[2024-06-06 14:45:06,433][01062] Avg episode rewards: #0: 21.200, true rewards: #0: 9.200
+[2024-06-06 14:45:06,435][01062] Avg episode reward: 21.200, avg true_objective: 9.200
+[2024-06-06 14:45:06,492][01062] Num frames 2800...
+[2024-06-06 14:45:06,624][01062] Num frames 2900...
+[2024-06-06 14:45:06,769][01062] Num frames 3000...
+[2024-06-06 14:45:06,904][01062] Num frames 3100...
+[2024-06-06 14:45:07,064][01062] Avg episode rewards: #0: 17.690, true rewards: #0: 7.940
+[2024-06-06 14:45:07,067][01062] Avg episode reward: 17.690, avg true_objective: 7.940
+[2024-06-06 14:45:07,103][01062] Num frames 3200...
+[2024-06-06 14:45:07,237][01062] Num frames 3300...
+[2024-06-06 14:45:07,368][01062] Num frames 3400...
+[2024-06-06 14:45:07,497][01062] Num frames 3500...
+[2024-06-06 14:45:07,626][01062] Num frames 3600...
+[2024-06-06 14:45:07,763][01062] Num frames 3700...
+[2024-06-06 14:45:07,891][01062] Num frames 3800...
+[2024-06-06 14:45:08,024][01062] Num frames 3900...
+[2024-06-06 14:45:08,152][01062] Num frames 4000...
+[2024-06-06 14:45:08,283][01062] Num frames 4100...
+[2024-06-06 14:45:08,415][01062] Num frames 4200...
+[2024-06-06 14:45:08,544][01062] Num frames 4300...
+[2024-06-06 14:45:08,678][01062] Num frames 4400...
+[2024-06-06 14:45:08,859][01062] Avg episode rewards: #0: 20.576, true rewards: #0: 8.976
+[2024-06-06 14:45:08,861][01062] Avg episode reward: 20.576, avg true_objective: 8.976
+[2024-06-06 14:45:08,881][01062] Num frames 4500...
+[2024-06-06 14:45:09,014][01062] Num frames 4600...
+[2024-06-06 14:45:09,144][01062] Num frames 4700...
+[2024-06-06 14:45:09,277][01062] Num frames 4800...
+[2024-06-06 14:45:09,407][01062] Num frames 4900...
+[2024-06-06 14:45:09,535][01062] Num frames 5000...
+[2024-06-06 14:45:09,674][01062] Num frames 5100...
+[2024-06-06 14:45:09,735][01062] Avg episode rewards: #0: 19.338, true rewards: #0: 8.505
+[2024-06-06 14:45:09,737][01062] Avg episode reward: 19.338, avg true_objective: 8.505
+[2024-06-06 14:45:09,866][01062] Num frames 5200...
+[2024-06-06 14:45:09,995][01062] Num frames 5300...
+[2024-06-06 14:45:10,124][01062] Num frames 5400...
+[2024-06-06 14:45:10,264][01062] Num frames 5500...
+[2024-06-06 14:45:10,408][01062] Num frames 5600...
+[2024-06-06 14:45:10,552][01062] Num frames 5700...
+[2024-06-06 14:45:10,705][01062] Num frames 5800...
+[2024-06-06 14:45:10,827][01062] Avg episode rewards: #0: 18.627, true rewards: #0: 8.341
+[2024-06-06 14:45:10,829][01062] Avg episode reward: 18.627, avg true_objective: 8.341
+[2024-06-06 14:45:10,908][01062] Num frames 5900...
+[2024-06-06 14:45:11,067][01062] Num frames 6000...
+[2024-06-06 14:45:11,203][01062] Num frames 6100...
+[2024-06-06 14:45:11,335][01062] Num frames 6200...
+[2024-06-06 14:45:11,469][01062] Num frames 6300...
+[2024-06-06 14:45:11,545][01062] Avg episode rewards: #0: 17.393, true rewards: #0: 7.892
+[2024-06-06 14:45:11,546][01062] Avg episode reward: 17.393, avg true_objective: 7.892
+[2024-06-06 14:45:11,670][01062] Num frames 6400...
+[2024-06-06 14:45:11,809][01062] Num frames 6500...
+[2024-06-06 14:45:11,940][01062] Num frames 6600...
+[2024-06-06 14:45:12,072][01062] Num frames 6700...
+[2024-06-06 14:45:12,211][01062] Num frames 6800...
+[2024-06-06 14:45:12,342][01062] Num frames 6900...
+[2024-06-06 14:45:12,471][01062] Num frames 7000...
+[2024-06-06 14:45:12,593][01062] Avg episode rewards: #0: 17.278, true rewards: #0: 7.833
+[2024-06-06 14:45:12,597][01062] Avg episode reward: 17.278, avg true_objective: 7.833
+[2024-06-06 14:45:12,663][01062] Num frames 7100...
+[2024-06-06 14:45:12,799][01062] Num frames 7200...
+[2024-06-06 14:45:12,932][01062] Num frames 7300...
+[2024-06-06 14:45:13,065][01062] Num frames 7400...
+[2024-06-06 14:45:13,195][01062] Num frames 7500...
+[2024-06-06 14:45:13,326][01062] Num frames 7600...
+[2024-06-06 14:45:13,458][01062] Num frames 7700...
+[2024-06-06 14:45:13,544][01062] Avg episode rewards: #0: 17.022, true rewards: #0: 7.722
+[2024-06-06 14:45:13,545][01062] Avg episode reward: 17.022, avg true_objective: 7.722
+[2024-06-06 14:46:03,108][01062] Replay video saved to /content/train_dir/default_experiment/replay.mp4!