diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,50 @@ -[2024-08-11 11:07:14,139][00221] Saving configuration to /content/train_dir/default_experiment/config.json... -[2024-08-11 11:07:14,145][00221] Rollout worker 0 uses device cpu -[2024-08-11 11:07:14,146][00221] Rollout worker 1 uses device cpu -[2024-08-11 11:07:14,148][00221] Rollout worker 2 uses device cpu -[2024-08-11 11:07:14,149][00221] Rollout worker 3 uses device cpu -[2024-08-11 11:07:14,150][00221] Rollout worker 4 uses device cpu -[2024-08-11 11:07:14,151][00221] Rollout worker 5 uses device cpu -[2024-08-11 11:07:14,152][00221] Rollout worker 6 uses device cpu -[2024-08-11 11:07:14,153][00221] Rollout worker 7 uses device cpu -[2024-08-11 11:07:14,316][00221] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-08-11 11:07:14,318][00221] InferenceWorker_p0-w0: min num requests: 2 -[2024-08-11 11:07:14,350][00221] Starting all processes... -[2024-08-11 11:07:14,351][00221] Starting process learner_proc0 -[2024-08-11 11:07:15,657][00221] Starting all processes... -[2024-08-11 11:07:15,669][00221] Starting process inference_proc0-0 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc0 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc1 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc2 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc3 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc4 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc5 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc6 -[2024-08-11 11:07:15,670][00221] Starting process rollout_proc7 -[2024-08-11 11:07:30,549][02645] Worker 7 uses CPU cores [1] -[2024-08-11 11:07:30,556][02639] Worker 1 uses CPU cores [1] -[2024-08-11 11:07:30,596][02641] Worker 3 uses CPU cores [1] -[2024-08-11 11:07:30,685][02624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-08-11 11:07:30,685][02624] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2024-08-11 11:07:30,767][02624] Num visible devices: 1 -[2024-08-11 11:07:30,794][02624] Starting seed is not provided -[2024-08-11 11:07:30,794][02624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-08-11 11:07:30,795][02624] Initializing actor-critic model on device cuda:0 -[2024-08-11 11:07:30,796][02624] RunningMeanStd input shape: (3, 72, 128) -[2024-08-11 11:07:30,798][02624] RunningMeanStd input shape: (1,) -[2024-08-11 11:07:30,883][02624] ConvEncoder: input_channels=3 -[2024-08-11 11:07:30,886][02638] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-08-11 11:07:30,887][02638] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2024-08-11 11:07:31,007][02637] Worker 0 uses CPU cores [0] -[2024-08-11 11:07:31,011][02638] Num visible devices: 1 -[2024-08-11 11:07:31,084][02643] Worker 5 uses CPU cores [1] -[2024-08-11 11:07:31,098][02640] Worker 2 uses CPU cores [0] -[2024-08-11 11:07:31,158][02642] Worker 4 uses CPU cores [0] -[2024-08-11 11:07:31,169][02644] Worker 6 uses CPU cores [0] -[2024-08-11 11:07:31,319][02624] Conv encoder output size: 512 -[2024-08-11 11:07:31,320][02624] Policy head output size: 512 -[2024-08-11 11:07:31,388][02624] Created Actor Critic model with architecture: -[2024-08-11 11:07:31,388][02624] ActorCriticSharedWeights( +[2024-10-03 11:11:26,918][00426] Saving configuration to /content/train_dir/default_experiment/config.json... 
+[2024-10-03 11:11:26,920][00426] Rollout worker 0 uses device cpu +[2024-10-03 11:11:26,923][00426] Rollout worker 1 uses device cpu +[2024-10-03 11:11:26,925][00426] Rollout worker 2 uses device cpu +[2024-10-03 11:11:26,926][00426] Rollout worker 3 uses device cpu +[2024-10-03 11:11:26,930][00426] Rollout worker 4 uses device cpu +[2024-10-03 11:11:26,932][00426] Rollout worker 5 uses device cpu +[2024-10-03 11:11:26,933][00426] Rollout worker 6 uses device cpu +[2024-10-03 11:11:26,937][00426] Rollout worker 7 uses device cpu +[2024-10-03 11:11:27,152][00426] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-10-03 11:11:27,155][00426] InferenceWorker_p0-w0: min num requests: 2 +[2024-10-03 11:11:27,199][00426] Starting all processes... +[2024-10-03 11:11:27,203][00426] Starting process learner_proc0 +[2024-10-03 11:11:28,046][00426] Starting all processes... +[2024-10-03 11:11:28,067][00426] Starting process inference_proc0-0 +[2024-10-03 11:11:28,067][00426] Starting process rollout_proc0 +[2024-10-03 11:11:28,069][00426] Starting process rollout_proc1 +[2024-10-03 11:11:28,069][00426] Starting process rollout_proc2 +[2024-10-03 11:11:28,069][00426] Starting process rollout_proc3 +[2024-10-03 11:11:28,069][00426] Starting process rollout_proc4 +[2024-10-03 11:11:28,070][00426] Starting process rollout_proc5 +[2024-10-03 11:11:28,070][00426] Starting process rollout_proc6 +[2024-10-03 11:11:28,070][00426] Starting process rollout_proc7 +[2024-10-03 11:11:45,908][02746] Worker 4 uses CPU cores [0] +[2024-10-03 11:11:45,915][02728] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-10-03 11:11:45,915][02728] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-10-03 11:11:45,973][02747] Worker 5 uses CPU cores [1] +[2024-10-03 11:11:45,984][02728] Num visible devices: 1 +[2024-10-03 11:11:46,019][02728] Starting seed is not provided +[2024-10-03 11:11:46,020][02728] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-10-03 11:11:46,020][02728] Initializing actor-critic model on device cuda:0 +[2024-10-03 11:11:46,021][02728] RunningMeanStd input shape: (3, 72, 128) +[2024-10-03 11:11:46,024][02728] RunningMeanStd input shape: (1,) +[2024-10-03 11:11:46,069][02744] Worker 3 uses CPU cores [1] +[2024-10-03 11:11:46,110][02728] ConvEncoder: input_channels=3 +[2024-10-03 11:11:46,269][02748] Worker 7 uses CPU cores [1] +[2024-10-03 11:11:46,292][02743] Worker 1 uses CPU cores [1] +[2024-10-03 11:11:46,348][02741] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-10-03 11:11:46,349][02741] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-10-03 11:11:46,363][02742] Worker 0 uses CPU cores [0] +[2024-10-03 11:11:46,377][02749] Worker 6 uses CPU cores [0] +[2024-10-03 11:11:46,391][02741] Num visible devices: 1 +[2024-10-03 11:11:46,390][02745] Worker 2 uses CPU cores [0] +[2024-10-03 11:11:46,482][02728] Conv encoder output size: 512 +[2024-10-03 11:11:46,482][02728] Policy head output size: 512 +[2024-10-03 11:11:46,539][02728] Created Actor Critic model with architecture: +[2024-10-03 11:11:46,539][02728] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( @@ -85,1223 +85,1202 @@ (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) -[2024-08-11 11:07:31,723][02624] Using optimizer -[2024-08-11 11:07:32,799][02624] No checkpoints found 
-[2024-08-11 11:07:32,799][02624] Did not load from checkpoint, starting from scratch! -[2024-08-11 11:07:32,800][02624] Initialized policy 0 weights for model version 0 -[2024-08-11 11:07:32,803][02624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2024-08-11 11:07:32,811][02624] LearnerWorker_p0 finished initialization! -[2024-08-11 11:07:33,020][02638] RunningMeanStd input shape: (3, 72, 128) -[2024-08-11 11:07:33,022][02638] RunningMeanStd input shape: (1,) -[2024-08-11 11:07:33,041][02638] ConvEncoder: input_channels=3 -[2024-08-11 11:07:33,206][02638] Conv encoder output size: 512 -[2024-08-11 11:07:33,206][02638] Policy head output size: 512 -[2024-08-11 11:07:33,296][00221] Inference worker 0-0 is ready! -[2024-08-11 11:07:33,299][00221] All inference workers are ready! Signal rollout workers to start! -[2024-08-11 11:07:33,645][02644] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:33,668][02642] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:33,674][02645] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:33,649][02640] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:33,696][02639] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:33,699][02641] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:33,706][02643] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:33,776][02637] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-08-11 11:07:34,310][00221] Heartbeat connected on Batcher_0 -[2024-08-11 11:07:34,312][00221] Heartbeat connected on LearnerWorker_p0 -[2024-08-11 11:07:34,363][00221] Heartbeat connected on InferenceWorker_p0-w0 -[2024-08-11 11:07:35,026][00221] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-08-11 11:07:35,542][02645] Decorrelating experience for 0 frames... -[2024-08-11 11:07:35,541][02639] Decorrelating experience for 0 frames... -[2024-08-11 11:07:35,622][02642] Decorrelating experience for 0 frames... -[2024-08-11 11:07:35,624][02640] Decorrelating experience for 0 frames... -[2024-08-11 11:07:35,620][02644] Decorrelating experience for 0 frames... -[2024-08-11 11:07:35,668][02637] Decorrelating experience for 0 frames... -[2024-08-11 11:07:36,901][02643] Decorrelating experience for 0 frames... -[2024-08-11 11:07:36,936][02645] Decorrelating experience for 32 frames... -[2024-08-11 11:07:37,120][02644] Decorrelating experience for 32 frames... -[2024-08-11 11:07:37,129][02642] Decorrelating experience for 32 frames... -[2024-08-11 11:07:37,126][02640] Decorrelating experience for 32 frames... -[2024-08-11 11:07:37,178][02641] Decorrelating experience for 0 frames... -[2024-08-11 11:07:37,201][02637] Decorrelating experience for 32 frames... -[2024-08-11 11:07:37,237][02639] Decorrelating experience for 32 frames... -[2024-08-11 11:07:38,183][02643] Decorrelating experience for 32 frames... -[2024-08-11 11:07:38,467][02641] Decorrelating experience for 32 frames... -[2024-08-11 11:07:38,739][02644] Decorrelating experience for 64 frames... -[2024-08-11 11:07:38,741][02640] Decorrelating experience for 64 frames... -[2024-08-11 11:07:38,777][02642] Decorrelating experience for 64 frames... -[2024-08-11 11:07:38,969][02639] Decorrelating experience for 64 frames... -[2024-08-11 11:07:39,848][02645] Decorrelating experience for 64 frames... 
-[2024-08-11 11:07:39,885][02643] Decorrelating experience for 64 frames... -[2024-08-11 11:07:40,016][02640] Decorrelating experience for 96 frames... -[2024-08-11 11:07:40,026][00221] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-08-11 11:07:40,057][02642] Decorrelating experience for 96 frames... -[2024-08-11 11:07:40,157][02641] Decorrelating experience for 64 frames... -[2024-08-11 11:07:40,219][02637] Decorrelating experience for 64 frames... -[2024-08-11 11:07:40,291][00221] Heartbeat connected on RolloutWorker_w2 -[2024-08-11 11:07:40,342][00221] Heartbeat connected on RolloutWorker_w4 -[2024-08-11 11:07:41,520][02644] Decorrelating experience for 96 frames... -[2024-08-11 11:07:41,609][02645] Decorrelating experience for 96 frames... -[2024-08-11 11:07:41,800][00221] Heartbeat connected on RolloutWorker_w6 -[2024-08-11 11:07:42,012][00221] Heartbeat connected on RolloutWorker_w7 -[2024-08-11 11:07:42,138][02643] Decorrelating experience for 96 frames... -[2024-08-11 11:07:42,197][02637] Decorrelating experience for 96 frames... -[2024-08-11 11:07:42,328][00221] Heartbeat connected on RolloutWorker_w0 -[2024-08-11 11:07:42,336][02641] Decorrelating experience for 96 frames... -[2024-08-11 11:07:42,472][00221] Heartbeat connected on RolloutWorker_w5 -[2024-08-11 11:07:42,682][00221] Heartbeat connected on RolloutWorker_w3 -[2024-08-11 11:07:43,487][02639] Decorrelating experience for 96 frames... -[2024-08-11 11:07:44,022][00221] Heartbeat connected on RolloutWorker_w1 -[2024-08-11 11:07:45,026][00221] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 14.6. Samples: 146. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2024-08-11 11:07:45,029][00221] Avg episode reward: [(0, '1.887')] -[2024-08-11 11:07:45,961][02624] Signal inference workers to stop experience collection... -[2024-08-11 11:07:46,003][02638] InferenceWorker_p0-w0: stopping experience collection -[2024-08-11 11:07:48,838][02624] Signal inference workers to resume experience collection... -[2024-08-11 11:07:48,839][02638] InferenceWorker_p0-w0: resuming experience collection -[2024-08-11 11:07:50,026][00221] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 176.5. Samples: 2648. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) -[2024-08-11 11:07:50,029][00221] Avg episode reward: [(0, '2.365')] -[2024-08-11 11:07:55,025][00221] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 379.5. Samples: 7590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:07:55,028][00221] Avg episode reward: [(0, '3.581')] -[2024-08-11 11:07:57,903][02638] Updated weights for policy 0, policy_version 10 (0.0028) -[2024-08-11 11:08:00,026][00221] Fps is (10 sec: 4505.6, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 411.3. Samples: 10282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:08:00,030][00221] Avg episode reward: [(0, '4.249')] -[2024-08-11 11:08:05,028][00221] Fps is (10 sec: 3276.0, 60 sec: 2047.8, 300 sec: 2047.8). Total num frames: 61440. Throughput: 0: 519.8. Samples: 15594. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2024-08-11 11:08:05,030][00221] Avg episode reward: [(0, '4.344')] -[2024-08-11 11:08:09,961][02638] Updated weights for policy 0, policy_version 20 (0.0030) -[2024-08-11 11:08:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 577.4. Samples: 20210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:08:10,030][00221] Avg episode reward: [(0, '4.291')] -[2024-08-11 11:08:15,025][00221] Fps is (10 sec: 4097.0, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 593.7. Samples: 23748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:08:15,030][00221] Avg episode reward: [(0, '4.206')] -[2024-08-11 11:08:15,033][02624] Saving new best policy, reward=4.206! -[2024-08-11 11:08:19,696][02638] Updated weights for policy 0, policy_version 30 (0.0038) -[2024-08-11 11:08:20,026][00221] Fps is (10 sec: 4095.7, 60 sec: 2730.6, 300 sec: 2730.6). Total num frames: 122880. Throughput: 0: 676.1. Samples: 30424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:08:20,031][00221] Avg episode reward: [(0, '4.498')] -[2024-08-11 11:08:20,042][02624] Saving new best policy, reward=4.498! -[2024-08-11 11:08:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 767.1. Samples: 34518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:08:25,029][00221] Avg episode reward: [(0, '4.519')] -[2024-08-11 11:08:25,035][02624] Saving new best policy, reward=4.519! -[2024-08-11 11:08:30,026][00221] Fps is (10 sec: 3686.7, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 829.3. Samples: 37464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:08:30,028][00221] Avg episode reward: [(0, '4.483')] -[2024-08-11 11:08:30,817][02638] Updated weights for policy 0, policy_version 40 (0.0031) -[2024-08-11 11:08:35,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 928.4. Samples: 44428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:08:35,029][00221] Avg episode reward: [(0, '4.519')] -[2024-08-11 11:08:40,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 927.6. Samples: 49330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:08:40,029][00221] Avg episode reward: [(0, '4.503')] -[2024-08-11 11:08:42,608][02638] Updated weights for policy 0, policy_version 50 (0.0027) -[2024-08-11 11:08:45,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 917.8. Samples: 51584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:08:45,032][00221] Avg episode reward: [(0, '4.420')] -[2024-08-11 11:08:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3167.6). Total num frames: 237568. Throughput: 0: 955.4. Samples: 58586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:08:50,029][00221] Avg episode reward: [(0, '4.429')] -[2024-08-11 11:08:51,502][02638] Updated weights for policy 0, policy_version 60 (0.0043) -[2024-08-11 11:08:55,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3174.4). Total num frames: 253952. Throughput: 0: 986.8. Samples: 64618. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:08:55,033][00221] Avg episode reward: [(0, '4.443')] -[2024-08-11 11:09:00,026][00221] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 955.5. Samples: 66746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:09:00,032][00221] Avg episode reward: [(0, '4.466')] -[2024-08-11 11:09:02,932][02638] Updated weights for policy 0, policy_version 70 (0.0026) -[2024-08-11 11:09:05,030][00221] Fps is (10 sec: 4094.3, 60 sec: 3891.1, 300 sec: 3276.7). Total num frames: 294912. Throughput: 0: 935.6. Samples: 72528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:05,031][00221] Avg episode reward: [(0, '4.518')] -[2024-08-11 11:09:10,026][00221] Fps is (10 sec: 4915.4, 60 sec: 3959.5, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 1002.1. Samples: 79614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:09:10,028][00221] Avg episode reward: [(0, '4.654')] -[2024-08-11 11:09:10,039][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth... -[2024-08-11 11:09:10,256][02624] Saving new best policy, reward=4.654! -[2024-08-11 11:09:13,094][02638] Updated weights for policy 0, policy_version 80 (0.0027) -[2024-08-11 11:09:15,026][00221] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 987.3. Samples: 81892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:09:15,036][00221] Avg episode reward: [(0, '4.569')] -[2024-08-11 11:09:20,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 934.1. Samples: 86462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:20,030][00221] Avg episode reward: [(0, '4.548')] -[2024-08-11 11:09:23,620][02638] Updated weights for policy 0, policy_version 90 (0.0042) -[2024-08-11 11:09:25,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3388.5). Total num frames: 372736. Throughput: 0: 982.0. Samples: 93522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:25,031][00221] Avg episode reward: [(0, '4.648')] -[2024-08-11 11:09:30,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 1008.4. Samples: 96964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:30,028][00221] Avg episode reward: [(0, '4.812')] -[2024-08-11 11:09:30,034][02624] Saving new best policy, reward=4.812! -[2024-08-11 11:09:35,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 945.3. Samples: 101124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:35,030][00221] Avg episode reward: [(0, '5.051')] -[2024-08-11 11:09:35,033][02624] Saving new best policy, reward=5.051! -[2024-08-11 11:09:35,576][02638] Updated weights for policy 0, policy_version 100 (0.0043) -[2024-08-11 11:09:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 946.1. Samples: 107192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:40,028][00221] Avg episode reward: [(0, '4.925')] -[2024-08-11 11:09:44,444][02638] Updated weights for policy 0, policy_version 110 (0.0016) -[2024-08-11 11:09:45,026][00221] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 977.8. Samples: 110746. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:45,032][00221] Avg episode reward: [(0, '4.736')] -[2024-08-11 11:09:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 970.8. Samples: 116208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:09:50,032][00221] Avg episode reward: [(0, '4.835')] -[2024-08-11 11:09:55,026][00221] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3452.3). Total num frames: 483328. Throughput: 0: 927.9. Samples: 121370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:09:55,032][00221] Avg episode reward: [(0, '4.950')] -[2024-08-11 11:09:56,096][02638] Updated weights for policy 0, policy_version 120 (0.0035) -[2024-08-11 11:10:00,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 956.9. Samples: 124954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:10:00,028][00221] Avg episode reward: [(0, '5.510')] -[2024-08-11 11:10:00,039][02624] Saving new best policy, reward=5.510! -[2024-08-11 11:10:05,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 992.4. Samples: 131122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:10:05,031][00221] Avg episode reward: [(0, '5.280')] -[2024-08-11 11:10:07,161][02638] Updated weights for policy 0, policy_version 130 (0.0022) -[2024-08-11 11:10:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3488.2). Total num frames: 540672. Throughput: 0: 929.4. Samples: 135346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:10:10,028][00221] Avg episode reward: [(0, '5.245')] -[2024-08-11 11:10:15,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3507.2). Total num frames: 561152. Throughput: 0: 925.2. Samples: 138600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:10:15,028][00221] Avg episode reward: [(0, '5.166')] -[2024-08-11 11:10:16,873][02638] Updated weights for policy 0, policy_version 140 (0.0027) -[2024-08-11 11:10:20,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 585728. Throughput: 0: 988.3. Samples: 145596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:10:20,028][00221] Avg episode reward: [(0, '5.137')] -[2024-08-11 11:10:25,027][00221] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 960.0. Samples: 150394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:10:25,036][00221] Avg episode reward: [(0, '5.073')] -[2024-08-11 11:10:28,523][02638] Updated weights for policy 0, policy_version 150 (0.0034) -[2024-08-11 11:10:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3534.3). Total num frames: 618496. Throughput: 0: 930.5. Samples: 152616. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) -[2024-08-11 11:10:30,028][00221] Avg episode reward: [(0, '5.416')] -[2024-08-11 11:10:35,026][00221] Fps is (10 sec: 4506.2, 60 sec: 3959.5, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 958.7. Samples: 159348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:10:35,032][00221] Avg episode reward: [(0, '5.517')] -[2024-08-11 11:10:35,036][02624] Saving new best policy, reward=5.517! -[2024-08-11 11:10:38,186][02638] Updated weights for policy 0, policy_version 160 (0.0044) -[2024-08-11 11:10:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3564.6). 
Total num frames: 659456. Throughput: 0: 975.2. Samples: 165252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:10:40,029][00221] Avg episode reward: [(0, '5.320')] -[2024-08-11 11:10:45,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3535.5). Total num frames: 671744. Throughput: 0: 941.3. Samples: 167312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:10:45,032][00221] Avg episode reward: [(0, '5.314')] -[2024-08-11 11:10:49,536][02638] Updated weights for policy 0, policy_version 170 (0.0024) -[2024-08-11 11:10:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 934.9. Samples: 173192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:10:50,028][00221] Avg episode reward: [(0, '5.312')] -[2024-08-11 11:10:55,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 716800. Throughput: 0: 992.7. Samples: 180018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:10:55,028][00221] Avg episode reward: [(0, '5.576')] -[2024-08-11 11:10:55,031][02624] Saving new best policy, reward=5.576! -[2024-08-11 11:11:00,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 965.8. Samples: 182060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:11:00,028][00221] Avg episode reward: [(0, '5.473')] -[2024-08-11 11:11:01,383][02638] Updated weights for policy 0, policy_version 180 (0.0013) -[2024-08-11 11:11:05,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3569.4). Total num frames: 749568. Throughput: 0: 917.9. Samples: 186900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:11:05,029][00221] Avg episode reward: [(0, '5.703')] -[2024-08-11 11:11:05,033][02624] Saving new best policy, reward=5.703! -[2024-08-11 11:11:10,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 958.1. Samples: 193508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:11:10,029][00221] Avg episode reward: [(0, '5.576')] -[2024-08-11 11:11:10,054][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth... -[2024-08-11 11:11:10,812][02638] Updated weights for policy 0, policy_version 190 (0.0033) -[2024-08-11 11:11:15,030][00221] Fps is (10 sec: 4094.2, 60 sec: 3822.7, 300 sec: 3593.2). Total num frames: 790528. Throughput: 0: 979.1. Samples: 196682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:11:15,032][00221] Avg episode reward: [(0, '5.626')] -[2024-08-11 11:11:20,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 917.8. Samples: 200648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:11:20,028][00221] Avg episode reward: [(0, '5.952')] -[2024-08-11 11:11:20,036][02624] Saving new best policy, reward=5.952! -[2024-08-11 11:11:23,302][02638] Updated weights for policy 0, policy_version 200 (0.0044) -[2024-08-11 11:11:25,026][00221] Fps is (10 sec: 3278.2, 60 sec: 3754.7, 300 sec: 3579.5). Total num frames: 823296. Throughput: 0: 915.1. Samples: 206432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:11:25,033][00221] Avg episode reward: [(0, '6.141')] -[2024-08-11 11:11:25,036][02624] Saving new best policy, reward=6.141! -[2024-08-11 11:11:30,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 939.6. 
Samples: 209592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-08-11 11:11:30,028][00221] Avg episode reward: [(0, '6.266')] -[2024-08-11 11:11:30,037][02624] Saving new best policy, reward=6.266! -[2024-08-11 11:11:35,014][02638] Updated weights for policy 0, policy_version 210 (0.0040) -[2024-08-11 11:11:35,034][00221] Fps is (10 sec: 3683.3, 60 sec: 3617.6, 300 sec: 3583.9). Total num frames: 860160. Throughput: 0: 916.9. Samples: 214460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-08-11 11:11:35,036][00221] Avg episode reward: [(0, '6.211')] -[2024-08-11 11:11:40,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 874.5. Samples: 219372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:11:40,032][00221] Avg episode reward: [(0, '5.928')] -[2024-08-11 11:11:45,026][00221] Fps is (10 sec: 3689.5, 60 sec: 3754.7, 300 sec: 3588.1). Total num frames: 897024. Throughput: 0: 903.5. Samples: 222718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:11:45,028][00221] Avg episode reward: [(0, '6.145')] -[2024-08-11 11:11:45,065][02638] Updated weights for policy 0, policy_version 220 (0.0027) -[2024-08-11 11:11:50,030][00221] Fps is (10 sec: 4094.3, 60 sec: 3686.1, 300 sec: 3598.0). Total num frames: 917504. Throughput: 0: 935.6. Samples: 229004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:11:50,040][00221] Avg episode reward: [(0, '6.251')] -[2024-08-11 11:11:55,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3576.1). Total num frames: 929792. Throughput: 0: 876.3. Samples: 232940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:11:55,029][00221] Avg episode reward: [(0, '6.556')] -[2024-08-11 11:11:55,032][02624] Saving new best policy, reward=6.556! -[2024-08-11 11:11:57,385][02638] Updated weights for policy 0, policy_version 230 (0.0027) -[2024-08-11 11:12:00,026][00221] Fps is (10 sec: 3278.1, 60 sec: 3618.1, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 875.2. Samples: 236062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:12:00,028][00221] Avg episode reward: [(0, '6.573')] -[2024-08-11 11:12:00,041][02624] Saving new best policy, reward=6.573! -[2024-08-11 11:12:05,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3595.4). Total num frames: 970752. Throughput: 0: 925.5. Samples: 242294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:12:05,031][00221] Avg episode reward: [(0, '6.943')] -[2024-08-11 11:12:05,033][02624] Saving new best policy, reward=6.943! -[2024-08-11 11:12:08,652][02638] Updated weights for policy 0, policy_version 240 (0.0014) -[2024-08-11 11:12:10,027][00221] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3574.7). Total num frames: 983040. Throughput: 0: 897.0. Samples: 246798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:12:10,030][00221] Avg episode reward: [(0, '7.336')] -[2024-08-11 11:12:10,045][02624] Saving new best policy, reward=7.336! -[2024-08-11 11:12:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3584.0). Total num frames: 1003520. Throughput: 0: 876.9. Samples: 249054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:12:15,031][00221] Avg episode reward: [(0, '7.270')] -[2024-08-11 11:12:18,836][02638] Updated weights for policy 0, policy_version 250 (0.0026) -[2024-08-11 11:12:20,026][00221] Fps is (10 sec: 4506.3, 60 sec: 3754.7, 300 sec: 3607.4). Total num frames: 1028096. Throughput: 0: 924.2. 
Samples: 256042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:12:20,032][00221] Avg episode reward: [(0, '7.300')] -[2024-08-11 11:12:25,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3601.7). Total num frames: 1044480. Throughput: 0: 943.6. Samples: 261834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:12:25,029][00221] Avg episode reward: [(0, '8.103')] -[2024-08-11 11:12:25,034][02624] Saving new best policy, reward=8.103! -[2024-08-11 11:12:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 1060864. Throughput: 0: 914.5. Samples: 263870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:12:30,029][00221] Avg episode reward: [(0, '8.483')] -[2024-08-11 11:12:30,039][02624] Saving new best policy, reward=8.483! -[2024-08-11 11:12:30,692][02638] Updated weights for policy 0, policy_version 260 (0.0020) -[2024-08-11 11:12:35,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.9, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 908.6. Samples: 269888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:12:35,031][00221] Avg episode reward: [(0, '8.058')] -[2024-08-11 11:12:39,572][02638] Updated weights for policy 0, policy_version 270 (0.0021) -[2024-08-11 11:12:40,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 975.4. Samples: 276834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:12:40,032][00221] Avg episode reward: [(0, '7.896')] -[2024-08-11 11:12:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1118208. Throughput: 0: 953.0. Samples: 278948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:12:45,028][00221] Avg episode reward: [(0, '7.974')] -[2024-08-11 11:12:50,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3762.8). Total num frames: 1138688. Throughput: 0: 930.0. Samples: 284144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:12:50,032][00221] Avg episode reward: [(0, '8.498')] -[2024-08-11 11:12:50,040][02624] Saving new best policy, reward=8.498! -[2024-08-11 11:12:51,185][02638] Updated weights for policy 0, policy_version 280 (0.0017) -[2024-08-11 11:12:55,025][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1163264. Throughput: 0: 979.9. Samples: 290892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:12:55,028][00221] Avg episode reward: [(0, '8.551')] -[2024-08-11 11:12:55,030][02624] Saving new best policy, reward=8.551! -[2024-08-11 11:13:00,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 1179648. Throughput: 0: 998.2. Samples: 293972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:13:00,038][00221] Avg episode reward: [(0, '8.753')] -[2024-08-11 11:13:00,051][02624] Saving new best policy, reward=8.753! -[2024-08-11 11:13:02,861][02638] Updated weights for policy 0, policy_version 290 (0.0041) -[2024-08-11 11:13:05,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1191936. Throughput: 0: 932.2. Samples: 297990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:13:05,027][00221] Avg episode reward: [(0, '9.329')] -[2024-08-11 11:13:05,092][02624] Saving new best policy, reward=9.329! -[2024-08-11 11:13:10,026][00221] Fps is (10 sec: 3686.5, 60 sec: 3891.3, 300 sec: 3776.6). Total num frames: 1216512. Throughput: 0: 953.7. Samples: 304750. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2024-08-11 11:13:10,028][00221] Avg episode reward: [(0, '9.344')] -[2024-08-11 11:13:10,043][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth... -[2024-08-11 11:13:10,195][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth -[2024-08-11 11:13:10,218][02624] Saving new best policy, reward=9.344! -[2024-08-11 11:13:12,035][02638] Updated weights for policy 0, policy_version 300 (0.0026) -[2024-08-11 11:13:15,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1236992. Throughput: 0: 980.2. Samples: 307980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:13:15,035][00221] Avg episode reward: [(0, '9.644')] -[2024-08-11 11:13:15,037][02624] Saving new best policy, reward=9.644! -[2024-08-11 11:13:20,027][00221] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 1253376. Throughput: 0: 954.6. Samples: 312846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-08-11 11:13:20,033][00221] Avg episode reward: [(0, '9.587')] -[2024-08-11 11:13:24,011][02638] Updated weights for policy 0, policy_version 310 (0.0026) -[2024-08-11 11:13:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1269760. Throughput: 0: 923.5. Samples: 318392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:13:25,028][00221] Avg episode reward: [(0, '10.096')] -[2024-08-11 11:13:25,033][02624] Saving new best policy, reward=10.096! -[2024-08-11 11:13:30,026][00221] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1294336. Throughput: 0: 952.0. Samples: 321788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:13:30,028][00221] Avg episode reward: [(0, '10.197')] -[2024-08-11 11:13:30,035][02624] Saving new best policy, reward=10.197! -[2024-08-11 11:13:33,912][02638] Updated weights for policy 0, policy_version 320 (0.0021) -[2024-08-11 11:13:35,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1310720. Throughput: 0: 968.3. Samples: 327718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:13:35,030][00221] Avg episode reward: [(0, '9.887')] -[2024-08-11 11:13:40,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1327104. Throughput: 0: 918.9. Samples: 332242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:13:40,030][00221] Avg episode reward: [(0, '10.191')] -[2024-08-11 11:13:44,529][02638] Updated weights for policy 0, policy_version 330 (0.0022) -[2024-08-11 11:13:45,026][00221] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1351680. Throughput: 0: 929.7. Samples: 335806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:13:45,033][00221] Avg episode reward: [(0, '10.427')] -[2024-08-11 11:13:45,036][02624] Saving new best policy, reward=10.427! -[2024-08-11 11:13:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1372160. Throughput: 0: 994.2. Samples: 342730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-08-11 11:13:50,032][00221] Avg episode reward: [(0, '11.068')] -[2024-08-11 11:13:50,047][02624] Saving new best policy, reward=11.068! -[2024-08-11 11:13:55,029][00221] Fps is (10 sec: 3275.7, 60 sec: 3686.2, 300 sec: 3776.6). Total num frames: 1384448. Throughput: 0: 941.0. Samples: 347100. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:13:55,031][00221] Avg episode reward: [(0, '11.449')] -[2024-08-11 11:13:55,041][02624] Saving new best policy, reward=11.449! -[2024-08-11 11:13:56,588][02638] Updated weights for policy 0, policy_version 340 (0.0033) -[2024-08-11 11:14:00,026][00221] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1409024. Throughput: 0: 928.5. Samples: 349764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:14:00,032][00221] Avg episode reward: [(0, '11.639')] -[2024-08-11 11:14:00,043][02624] Saving new best policy, reward=11.639! -[2024-08-11 11:14:05,026][00221] Fps is (10 sec: 4507.1, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 1429504. Throughput: 0: 972.4. Samples: 356602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:14:05,028][00221] Avg episode reward: [(0, '11.222')] -[2024-08-11 11:14:05,499][02638] Updated weights for policy 0, policy_version 350 (0.0028) -[2024-08-11 11:14:10,026][00221] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1445888. Throughput: 0: 969.3. Samples: 362012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-08-11 11:14:10,030][00221] Avg episode reward: [(0, '12.143')] -[2024-08-11 11:14:10,037][02624] Saving new best policy, reward=12.143! -[2024-08-11 11:14:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1462272. Throughput: 0: 941.0. Samples: 364134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:14:15,028][00221] Avg episode reward: [(0, '12.381')] -[2024-08-11 11:14:15,030][02624] Saving new best policy, reward=12.381! -[2024-08-11 11:14:17,058][02638] Updated weights for policy 0, policy_version 360 (0.0027) -[2024-08-11 11:14:20,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3776.7). Total num frames: 1486848. Throughput: 0: 953.5. Samples: 370624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:14:20,029][00221] Avg episode reward: [(0, '12.402')] -[2024-08-11 11:14:20,037][02624] Saving new best policy, reward=12.402! -[2024-08-11 11:14:25,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1507328. Throughput: 0: 992.3. Samples: 376896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:14:25,029][00221] Avg episode reward: [(0, '13.962')] -[2024-08-11 11:14:25,036][02624] Saving new best policy, reward=13.962! -[2024-08-11 11:14:28,242][02638] Updated weights for policy 0, policy_version 370 (0.0034) -[2024-08-11 11:14:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1519616. Throughput: 0: 955.7. Samples: 378812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:14:30,028][00221] Avg episode reward: [(0, '14.123')] -[2024-08-11 11:14:30,043][02624] Saving new best policy, reward=14.123! -[2024-08-11 11:14:35,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 1540096. Throughput: 0: 919.7. Samples: 384116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:14:35,031][00221] Avg episode reward: [(0, '13.593')] -[2024-08-11 11:14:38,188][02638] Updated weights for policy 0, policy_version 380 (0.0042) -[2024-08-11 11:14:40,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1564672. Throughput: 0: 976.1. Samples: 391020. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:14:40,028][00221] Avg episode reward: [(0, '12.575')] -[2024-08-11 11:14:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1576960. Throughput: 0: 974.2. Samples: 393602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:14:45,032][00221] Avg episode reward: [(0, '11.420')] -[2024-08-11 11:14:49,858][02638] Updated weights for policy 0, policy_version 390 (0.0032) -[2024-08-11 11:14:50,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 1597440. Throughput: 0: 921.5. Samples: 398070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:14:50,028][00221] Avg episode reward: [(0, '12.136')] -[2024-08-11 11:14:55,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3762.8). Total num frames: 1617920. Throughput: 0: 954.1. Samples: 404948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:14:55,028][00221] Avg episode reward: [(0, '13.016')] -[2024-08-11 11:14:59,575][02638] Updated weights for policy 0, policy_version 400 (0.0027) -[2024-08-11 11:15:00,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 1638400. Throughput: 0: 981.6. Samples: 408308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:15:00,028][00221] Avg episode reward: [(0, '13.780')] -[2024-08-11 11:15:05,028][00221] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3762.7). Total num frames: 1650688. Throughput: 0: 932.8. Samples: 412604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:15:05,030][00221] Avg episode reward: [(0, '14.839')] -[2024-08-11 11:15:05,034][02624] Saving new best policy, reward=14.839! -[2024-08-11 11:15:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1671168. Throughput: 0: 924.1. Samples: 418480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:15:10,027][00221] Avg episode reward: [(0, '14.709')] -[2024-08-11 11:15:10,122][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth... -[2024-08-11 11:15:10,252][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000189_774144.pth -[2024-08-11 11:15:11,089][02638] Updated weights for policy 0, policy_version 410 (0.0040) -[2024-08-11 11:15:15,028][00221] Fps is (10 sec: 4505.5, 60 sec: 3891.0, 300 sec: 3762.7). Total num frames: 1695744. Throughput: 0: 957.6. Samples: 421908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:15:15,030][00221] Avg episode reward: [(0, '14.394')] -[2024-08-11 11:15:20,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1712128. Throughput: 0: 962.0. Samples: 427406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-08-11 11:15:20,032][00221] Avg episode reward: [(0, '14.244')] -[2024-08-11 11:15:22,893][02638] Updated weights for policy 0, policy_version 420 (0.0049) -[2024-08-11 11:15:25,026][00221] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1728512. Throughput: 0: 916.6. Samples: 432266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:15:25,030][00221] Avg episode reward: [(0, '13.350')] -[2024-08-11 11:15:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1748992. Throughput: 0: 935.6. Samples: 435702. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:15:30,033][00221] Avg episode reward: [(0, '13.590')] -[2024-08-11 11:15:31,812][02638] Updated weights for policy 0, policy_version 430 (0.0030) -[2024-08-11 11:15:35,026][00221] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1769472. Throughput: 0: 980.5. Samples: 442194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:15:35,029][00221] Avg episode reward: [(0, '14.427')] -[2024-08-11 11:15:40,026][00221] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1781760. Throughput: 0: 919.3. Samples: 446316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:15:40,033][00221] Avg episode reward: [(0, '15.065')] -[2024-08-11 11:15:40,042][02624] Saving new best policy, reward=15.065! -[2024-08-11 11:15:43,822][02638] Updated weights for policy 0, policy_version 440 (0.0026) -[2024-08-11 11:15:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1806336. Throughput: 0: 912.2. Samples: 449358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:15:45,028][00221] Avg episode reward: [(0, '16.714')] -[2024-08-11 11:15:45,032][02624] Saving new best policy, reward=16.714! -[2024-08-11 11:15:50,026][00221] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1826816. Throughput: 0: 969.1. Samples: 456210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:15:50,028][00221] Avg episode reward: [(0, '17.284')] -[2024-08-11 11:15:50,034][02624] Saving new best policy, reward=17.284! -[2024-08-11 11:15:54,743][02638] Updated weights for policy 0, policy_version 450 (0.0024) -[2024-08-11 11:15:55,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1843200. Throughput: 0: 945.8. Samples: 461042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:15:55,028][00221] Avg episode reward: [(0, '17.711')] -[2024-08-11 11:15:55,036][02624] Saving new best policy, reward=17.711! -[2024-08-11 11:16:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1859584. Throughput: 0: 916.6. Samples: 463154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:16:00,031][00221] Avg episode reward: [(0, '17.663')] -[2024-08-11 11:16:04,964][02638] Updated weights for policy 0, policy_version 460 (0.0025) -[2024-08-11 11:16:05,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3762.8). Total num frames: 1884160. Throughput: 0: 941.4. Samples: 469768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:16:05,028][00221] Avg episode reward: [(0, '16.361')] -[2024-08-11 11:16:10,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1900544. Throughput: 0: 965.4. Samples: 475708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:16:10,029][00221] Avg episode reward: [(0, '16.539')] -[2024-08-11 11:16:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3776.7). Total num frames: 1916928. Throughput: 0: 933.9. Samples: 477728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:16:15,032][00221] Avg episode reward: [(0, '16.211')] -[2024-08-11 11:16:16,754][02638] Updated weights for policy 0, policy_version 470 (0.0044) -[2024-08-11 11:16:20,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 1937408. Throughput: 0: 922.3. Samples: 483696. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:16:20,030][00221] Avg episode reward: [(0, '16.116')] -[2024-08-11 11:16:25,029][00221] Fps is (10 sec: 4504.0, 60 sec: 3891.0, 300 sec: 3790.5). Total num frames: 1961984. Throughput: 0: 983.7. Samples: 490586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-08-11 11:16:25,035][00221] Avg episode reward: [(0, '15.525')] -[2024-08-11 11:16:26,052][02638] Updated weights for policy 0, policy_version 480 (0.0033) -[2024-08-11 11:16:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.8). Total num frames: 1974272. Throughput: 0: 964.8. Samples: 492772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:16:30,029][00221] Avg episode reward: [(0, '16.422')] -[2024-08-11 11:16:35,026][00221] Fps is (10 sec: 2868.2, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1990656. Throughput: 0: 911.9. Samples: 497244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:16:35,028][00221] Avg episode reward: [(0, '18.527')] -[2024-08-11 11:16:35,034][02624] Saving new best policy, reward=18.527! -[2024-08-11 11:16:37,971][02638] Updated weights for policy 0, policy_version 490 (0.0037) -[2024-08-11 11:16:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2015232. Throughput: 0: 946.3. Samples: 503624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:16:40,033][00221] Avg episode reward: [(0, '18.219')] -[2024-08-11 11:16:45,026][00221] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2027520. Throughput: 0: 966.2. Samples: 506634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:16:45,032][00221] Avg episode reward: [(0, '18.134')] -[2024-08-11 11:16:50,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2043904. Throughput: 0: 905.0. Samples: 510492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:16:50,031][00221] Avg episode reward: [(0, '18.496')] -[2024-08-11 11:16:50,669][02638] Updated weights for policy 0, policy_version 500 (0.0048) -[2024-08-11 11:16:55,026][00221] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2064384. Throughput: 0: 910.8. Samples: 516694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:16:55,032][00221] Avg episode reward: [(0, '17.020')] -[2024-08-11 11:16:59,729][02638] Updated weights for policy 0, policy_version 510 (0.0025) -[2024-08-11 11:17:00,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2088960. Throughput: 0: 943.2. Samples: 520170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:17:00,030][00221] Avg episode reward: [(0, '16.019')] -[2024-08-11 11:17:05,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 2101248. Throughput: 0: 922.4. Samples: 525206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:17:05,042][00221] Avg episode reward: [(0, '16.993')] -[2024-08-11 11:17:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2121728. Throughput: 0: 883.1. Samples: 530322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-08-11 11:17:10,028][00221] Avg episode reward: [(0, '17.511')] -[2024-08-11 11:17:10,043][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000518_2121728.pth... 
-[2024-08-11 11:17:10,207][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth -[2024-08-11 11:17:11,817][02638] Updated weights for policy 0, policy_version 520 (0.0028) -[2024-08-11 11:17:15,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2142208. Throughput: 0: 908.1. Samples: 533638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:17:15,031][00221] Avg episode reward: [(0, '20.203')] -[2024-08-11 11:17:15,034][02624] Saving new best policy, reward=20.203! -[2024-08-11 11:17:20,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 2158592. Throughput: 0: 948.6. Samples: 539930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:17:20,031][00221] Avg episode reward: [(0, '20.424')] -[2024-08-11 11:17:20,044][02624] Saving new best policy, reward=20.424! -[2024-08-11 11:17:23,197][02638] Updated weights for policy 0, policy_version 530 (0.0018) -[2024-08-11 11:17:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3776.7). Total num frames: 2174976. Throughput: 0: 897.3. Samples: 544002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:17:25,027][00221] Avg episode reward: [(0, '21.389')] -[2024-08-11 11:17:25,033][02624] Saving new best policy, reward=21.389! -[2024-08-11 11:17:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2195456. Throughput: 0: 900.3. Samples: 547146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:17:30,028][00221] Avg episode reward: [(0, '20.842')] -[2024-08-11 11:17:32,892][02638] Updated weights for policy 0, policy_version 540 (0.0032) -[2024-08-11 11:17:35,029][00221] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2220032. Throughput: 0: 968.0. Samples: 554050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:17:35,031][00221] Avg episode reward: [(0, '20.329')] -[2024-08-11 11:17:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2232320. Throughput: 0: 931.2. Samples: 558598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:17:40,031][00221] Avg episode reward: [(0, '18.347')] -[2024-08-11 11:17:44,730][02638] Updated weights for policy 0, policy_version 550 (0.0019) -[2024-08-11 11:17:45,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2252800. Throughput: 0: 905.3. Samples: 560908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-08-11 11:17:45,028][00221] Avg episode reward: [(0, '18.289')] -[2024-08-11 11:17:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 2277376. Throughput: 0: 950.2. Samples: 567966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-08-11 11:17:50,028][00221] Avg episode reward: [(0, '17.787')] -[2024-08-11 11:17:54,442][02638] Updated weights for policy 0, policy_version 560 (0.0037) -[2024-08-11 11:17:55,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2293760. Throughput: 0: 966.4. Samples: 573810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-08-11 11:17:55,031][00221] Avg episode reward: [(0, '17.534')] -[2024-08-11 11:18:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2310144. Throughput: 0: 938.9. Samples: 575890. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:18:00,028][00221] Avg episode reward: [(0, '18.843')]
-[2024-08-11 11:18:05,029][00221] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3776.6). Total num frames: 2330624. Throughput: 0: 938.6. Samples: 582172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:18:05,031][00221] Avg episode reward: [(0, '18.744')]
-[2024-08-11 11:18:05,111][02638] Updated weights for policy 0, policy_version 570 (0.0045)
-[2024-08-11 11:18:10,025][00221] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2355200. Throughput: 0: 997.6. Samples: 588892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:18:10,031][00221] Avg episode reward: [(0, '18.280')]
-[2024-08-11 11:18:15,026][00221] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2367488. Throughput: 0: 974.4. Samples: 590992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:18:15,032][00221] Avg episode reward: [(0, '18.767')]
-[2024-08-11 11:18:16,946][02638] Updated weights for policy 0, policy_version 580 (0.0024)
-[2024-08-11 11:18:20,026][00221] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2387968. Throughput: 0: 934.9. Samples: 596120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:18:20,029][00221] Avg episode reward: [(0, '18.884')]
-[2024-08-11 11:18:25,026][00221] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3790.5). Total num frames: 2412544. Throughput: 0: 989.5. Samples: 603128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:18:25,033][00221] Avg episode reward: [(0, '18.700')]
-[2024-08-11 11:18:25,704][02638] Updated weights for policy 0, policy_version 590 (0.0041)
-[2024-08-11 11:18:30,026][00221] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2428928. Throughput: 0: 1004.6. Samples: 606114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:18:30,029][00221] Avg episode reward: [(0, '19.671')]
-[2024-08-11 11:18:35,026][00221] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2445312. Throughput: 0: 942.6. Samples: 610382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:18:35,032][00221] Avg episode reward: [(0, '19.638')]
-[2024-08-11 11:18:37,507][02638] Updated weights for policy 0, policy_version 600 (0.0024)
-[2024-08-11 11:18:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2465792. Throughput: 0: 961.2. Samples: 617062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:18:40,027][00221] Avg episode reward: [(0, '21.888')]
-[2024-08-11 11:18:40,036][02624] Saving new best policy, reward=21.888!
-[2024-08-11 11:18:45,028][00221] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3776.6). Total num frames: 2486272. Throughput: 0: 989.1. Samples: 620400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:18:45,034][00221] Avg episode reward: [(0, '21.658')]
-[2024-08-11 11:18:48,378][02638] Updated weights for policy 0, policy_version 610 (0.0031)
-[2024-08-11 11:18:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 2502656. Throughput: 0: 956.7. Samples: 625222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:18:50,027][00221] Avg episode reward: [(0, '22.797')]
-[2024-08-11 11:18:50,044][02624] Saving new best policy, reward=22.797!
-[2024-08-11 11:18:55,026][00221] Fps is (10 sec: 3687.3, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2523136. Throughput: 0: 933.5. Samples: 630898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:18:55,033][00221] Avg episode reward: [(0, '22.441')]
-[2024-08-11 11:18:58,269][02638] Updated weights for policy 0, policy_version 620 (0.0031)
-[2024-08-11 11:19:00,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2547712. Throughput: 0: 964.5. Samples: 634396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:19:00,032][00221] Avg episode reward: [(0, '21.793')]
-[2024-08-11 11:19:05,027][00221] Fps is (10 sec: 3686.0, 60 sec: 3823.1, 300 sec: 3776.6). Total num frames: 2560000. Throughput: 0: 983.5. Samples: 640376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:19:05,031][00221] Avg episode reward: [(0, '21.292')]
-[2024-08-11 11:19:09,890][02638] Updated weights for policy 0, policy_version 630 (0.0044)
-[2024-08-11 11:19:10,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2580480. Throughput: 0: 927.8. Samples: 644880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:19:10,032][00221] Avg episode reward: [(0, '21.962')]
-[2024-08-11 11:19:10,044][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000630_2580480.pth...
-[2024-08-11 11:19:10,171][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000409_1675264.pth
-[2024-08-11 11:19:15,026][00221] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2600960. Throughput: 0: 938.1. Samples: 648328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-08-11 11:19:15,031][00221] Avg episode reward: [(0, '21.234')]
-[2024-08-11 11:19:18,803][02638] Updated weights for policy 0, policy_version 640 (0.0038)
-[2024-08-11 11:19:20,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2621440. Throughput: 0: 997.6. Samples: 655276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:19:20,028][00221] Avg episode reward: [(0, '20.938')]
-[2024-08-11 11:19:25,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2637824. Throughput: 0: 945.3. Samples: 659602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:19:25,028][00221] Avg episode reward: [(0, '22.264')]
-[2024-08-11 11:19:30,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2658304. Throughput: 0: 931.9. Samples: 662332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:19:30,032][00221] Avg episode reward: [(0, '21.439')]
-[2024-08-11 11:19:30,586][02638] Updated weights for policy 0, policy_version 650 (0.0031)
-[2024-08-11 11:19:35,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2682880. Throughput: 0: 984.2. Samples: 669510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-08-11 11:19:35,033][00221] Avg episode reward: [(0, '19.851')]
-[2024-08-11 11:19:40,028][00221] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 2695168. Throughput: 0: 972.0. Samples: 674642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:19:40,030][00221] Avg episode reward: [(0, '19.685')]
-[2024-08-11 11:19:41,972][02638] Updated weights for policy 0, policy_version 660 (0.0046)
-[2024-08-11 11:19:45,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2711552. Throughput: 0: 940.8. Samples: 676732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:19:45,027][00221] Avg episode reward: [(0, '18.727')]
-[2024-08-11 11:19:50,026][00221] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2736128. Throughput: 0: 953.8. Samples: 683294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:19:50,031][00221] Avg episode reward: [(0, '18.373')]
-[2024-08-11 11:19:51,339][02638] Updated weights for policy 0, policy_version 670 (0.0034)
-[2024-08-11 11:19:55,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2756608. Throughput: 0: 994.2. Samples: 689618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:19:55,031][00221] Avg episode reward: [(0, '18.635')]
-[2024-08-11 11:20:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 2768896. Throughput: 0: 963.8. Samples: 691698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:20:00,032][00221] Avg episode reward: [(0, '19.018')]
-[2024-08-11 11:20:02,983][02638] Updated weights for policy 0, policy_version 680 (0.0031)
-[2024-08-11 11:20:05,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 2793472. Throughput: 0: 931.8. Samples: 697206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:20:05,033][00221] Avg episode reward: [(0, '19.700')]
-[2024-08-11 11:20:10,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 2813952. Throughput: 0: 985.4. Samples: 703946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-08-11 11:20:10,032][00221] Avg episode reward: [(0, '20.203')]
-[2024-08-11 11:20:13,090][02638] Updated weights for policy 0, policy_version 690 (0.0025)
-[2024-08-11 11:20:15,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2830336. Throughput: 0: 982.6. Samples: 706548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:20:15,030][00221] Avg episode reward: [(0, '21.855')]
-[2024-08-11 11:20:20,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2846720. Throughput: 0: 921.0. Samples: 710956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-08-11 11:20:20,033][00221] Avg episode reward: [(0, '22.166')]
-[2024-08-11 11:20:23,950][02638] Updated weights for policy 0, policy_version 700 (0.0041)
-[2024-08-11 11:20:25,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2871296. Throughput: 0: 958.9. Samples: 717788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:20:25,028][00221] Avg episode reward: [(0, '22.410')]
-[2024-08-11 11:20:30,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2887680. Throughput: 0: 988.7. Samples: 721224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-08-11 11:20:30,028][00221] Avg episode reward: [(0, '23.317')]
-[2024-08-11 11:20:30,038][02624] Saving new best policy, reward=23.317!
-[2024-08-11 11:20:35,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2904064. Throughput: 0: 939.0. Samples: 725548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:20:35,031][00221] Avg episode reward: [(0, '22.615')]
-[2024-08-11 11:20:35,858][02638] Updated weights for policy 0, policy_version 710 (0.0031)
-[2024-08-11 11:20:40,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 2924544. Throughput: 0: 929.0. Samples: 731422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:20:40,028][00221] Avg episode reward: [(0, '22.969')]
-[2024-08-11 11:20:44,954][02638] Updated weights for policy 0, policy_version 720 (0.0017)
-[2024-08-11 11:20:45,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2949120. Throughput: 0: 957.4. Samples: 734780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:20:45,029][00221] Avg episode reward: [(0, '22.177')]
-[2024-08-11 11:20:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2961408. Throughput: 0: 958.8. Samples: 740354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:20:50,032][00221] Avg episode reward: [(0, '22.404')]
-[2024-08-11 11:20:55,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2977792. Throughput: 0: 914.9. Samples: 745116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:20:55,028][00221] Avg episode reward: [(0, '21.720')]
-[2024-08-11 11:20:57,091][02638] Updated weights for policy 0, policy_version 730 (0.0032)
-[2024-08-11 11:21:00,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3002368. Throughput: 0: 927.5. Samples: 748288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:21:00,033][00221] Avg episode reward: [(0, '21.466')]
-[2024-08-11 11:21:05,029][00221] Fps is (10 sec: 4094.7, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 3018752. Throughput: 0: 970.5. Samples: 754632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:21:05,032][00221] Avg episode reward: [(0, '19.561')]
-[2024-08-11 11:21:08,814][02638] Updated weights for policy 0, policy_version 740 (0.0020)
-[2024-08-11 11:21:10,026][00221] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 3776.6). Total num frames: 3031040. Throughput: 0: 905.4. Samples: 758530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:21:10,028][00221] Avg episode reward: [(0, '22.074')]
-[2024-08-11 11:21:10,041][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000740_3031040.pth...
-[2024-08-11 11:21:10,218][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000518_2121728.pth
-[2024-08-11 11:21:15,026][00221] Fps is (10 sec: 3277.9, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3051520. Throughput: 0: 884.6. Samples: 761030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:21:15,028][00221] Avg episode reward: [(0, '21.431')]
-[2024-08-11 11:21:20,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3067904. Throughput: 0: 918.6. Samples: 766884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:21:20,028][00221] Avg episode reward: [(0, '22.100')]
-[2024-08-11 11:21:20,056][02638] Updated weights for policy 0, policy_version 750 (0.0022)
-[2024-08-11 11:21:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3084288. Throughput: 0: 887.8. Samples: 771372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:21:25,029][00221] Avg episode reward: [(0, '22.699')]
-[2024-08-11 11:21:30,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3748.9). Total num frames: 3096576. Throughput: 0: 853.2. Samples: 773172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:21:30,029][00221] Avg episode reward: [(0, '21.897')]
-[2024-08-11 11:21:33,078][02638] Updated weights for policy 0, policy_version 760 (0.0020)
-[2024-08-11 11:21:35,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3121152. Throughput: 0: 861.0. Samples: 779100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:21:35,028][00221] Avg episode reward: [(0, '23.582')]
-[2024-08-11 11:21:35,032][02624] Saving new best policy, reward=23.582!
-[2024-08-11 11:21:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3137536. Throughput: 0: 889.6. Samples: 785146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:21:40,028][00221] Avg episode reward: [(0, '22.574')]
-[2024-08-11 11:21:44,933][02638] Updated weights for policy 0, policy_version 770 (0.0028)
-[2024-08-11 11:21:45,031][00221] Fps is (10 sec: 3274.9, 60 sec: 3413.0, 300 sec: 3762.7). Total num frames: 3153920. Throughput: 0: 862.0. Samples: 787082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:21:45,034][00221] Avg episode reward: [(0, '23.081')]
-[2024-08-11 11:21:50,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 3174400. Throughput: 0: 838.4. Samples: 792356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-08-11 11:21:50,032][00221] Avg episode reward: [(0, '22.650')]
-[2024-08-11 11:21:54,600][02638] Updated weights for policy 0, policy_version 780 (0.0030)
-[2024-08-11 11:21:55,026][00221] Fps is (10 sec: 4098.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3194880. Throughput: 0: 899.2. Samples: 798994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:21:55,031][00221] Avg episode reward: [(0, '23.446')]
-[2024-08-11 11:22:00,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3762.8). Total num frames: 3211264. Throughput: 0: 902.6. Samples: 801648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:22:00,031][00221] Avg episode reward: [(0, '23.298')]
-[2024-08-11 11:22:05,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3748.9). Total num frames: 3227648. Throughput: 0: 868.4. Samples: 805960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:22:05,027][00221] Avg episode reward: [(0, '23.506')]
-[2024-08-11 11:22:06,404][02638] Updated weights for policy 0, policy_version 790 (0.0032)
-[2024-08-11 11:22:10,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3248128. Throughput: 0: 915.9. Samples: 812588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:22:10,028][00221] Avg episode reward: [(0, '23.944')]
-[2024-08-11 11:22:10,039][02624] Saving new best policy, reward=23.944!
-[2024-08-11 11:22:15,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3268608. Throughput: 0: 944.7. Samples: 815682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:22:15,028][00221] Avg episode reward: [(0, '25.160')]
-[2024-08-11 11:22:15,033][02624] Saving new best policy, reward=25.160!
-[2024-08-11 11:22:17,841][02638] Updated weights for policy 0, policy_version 800 (0.0033)
-[2024-08-11 11:22:20,029][00221] Fps is (10 sec: 3275.7, 60 sec: 3549.7, 300 sec: 3748.8). Total num frames: 3280896. Throughput: 0: 910.6. Samples: 820080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:22:20,031][00221] Avg episode reward: [(0, '25.180')]
-[2024-08-11 11:22:20,045][02624] Saving new best policy, reward=25.180!
-[2024-08-11 11:22:25,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3301376. Throughput: 0: 898.4. Samples: 825572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:22:25,032][00221] Avg episode reward: [(0, '22.755')]
-[2024-08-11 11:22:28,292][02638] Updated weights for policy 0, policy_version 810 (0.0035)
-[2024-08-11 11:22:30,026][00221] Fps is (10 sec: 4507.1, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3325952. Throughput: 0: 930.3. Samples: 828938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:22:30,028][00221] Avg episode reward: [(0, '22.099')]
-[2024-08-11 11:22:35,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3338240. Throughput: 0: 934.9. Samples: 834426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:22:35,030][00221] Avg episode reward: [(0, '23.050')]
-[2024-08-11 11:22:40,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3354624. Throughput: 0: 889.2. Samples: 839008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:22:40,028][00221] Avg episode reward: [(0, '21.480')]
-[2024-08-11 11:22:40,507][02638] Updated weights for policy 0, policy_version 820 (0.0047)
-[2024-08-11 11:22:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.8, 300 sec: 3721.1). Total num frames: 3375104. Throughput: 0: 901.8. Samples: 842228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:22:45,032][00221] Avg episode reward: [(0, '19.371')]
-[2024-08-11 11:22:50,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3395584. Throughput: 0: 955.3. Samples: 848948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:22:50,028][00221] Avg episode reward: [(0, '20.863')]
-[2024-08-11 11:22:50,688][02638] Updated weights for policy 0, policy_version 830 (0.0026)
-[2024-08-11 11:22:55,027][00221] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3721.1). Total num frames: 3407872. Throughput: 0: 895.8. Samples: 852902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:22:55,031][00221] Avg episode reward: [(0, '21.220')]
-[2024-08-11 11:23:00,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3432448. Throughput: 0: 891.3. Samples: 855792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:23:00,032][00221] Avg episode reward: [(0, '20.061')]
-[2024-08-11 11:23:01,824][02638] Updated weights for policy 0, policy_version 840 (0.0037)
-[2024-08-11 11:23:05,026][00221] Fps is (10 sec: 4506.1, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3452928. Throughput: 0: 943.1. Samples: 862516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:23:05,031][00221] Avg episode reward: [(0, '21.955')]
-[2024-08-11 11:23:10,026][00221] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3469312. Throughput: 0: 933.4. Samples: 867574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:23:10,032][00221] Avg episode reward: [(0, '22.055')]
-[2024-08-11 11:23:10,045][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000847_3469312.pth...
-[2024-08-11 11:23:10,206][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000630_2580480.pth
-[2024-08-11 11:23:13,874][02638] Updated weights for policy 0, policy_version 850 (0.0016)
-[2024-08-11 11:23:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 3485696. Throughput: 0: 902.0. Samples: 869530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:23:15,032][00221] Avg episode reward: [(0, '22.322')]
-[2024-08-11 11:23:20,026][00221] Fps is (10 sec: 3686.5, 60 sec: 3754.9, 300 sec: 3707.2). Total num frames: 3506176. Throughput: 0: 925.4. Samples: 876068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:23:20,028][00221] Avg episode reward: [(0, '25.090')]
-[2024-08-11 11:23:22,905][02638] Updated weights for policy 0, policy_version 860 (0.0042)
-[2024-08-11 11:23:25,027][00221] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 3526656. Throughput: 0: 958.1. Samples: 882124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:23:25,029][00221] Avg episode reward: [(0, '25.432')]
-[2024-08-11 11:23:25,033][02624] Saving new best policy, reward=25.432!
-[2024-08-11 11:23:30,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3538944. Throughput: 0: 931.6. Samples: 884152. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:23:30,029][00221] Avg episode reward: [(0, '26.744')]
-[2024-08-11 11:23:30,045][02624] Saving new best policy, reward=26.744!
-[2024-08-11 11:23:35,026][00221] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3559424. Throughput: 0: 894.0. Samples: 889176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:23:35,028][00221] Avg episode reward: [(0, '26.132')]
-[2024-08-11 11:23:35,571][02638] Updated weights for policy 0, policy_version 870 (0.0027)
-[2024-08-11 11:23:40,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 3579904. Throughput: 0: 952.1. Samples: 895744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:23:40,029][00221] Avg episode reward: [(0, '27.634')]
-[2024-08-11 11:23:40,044][02624] Saving new best policy, reward=27.634!
-[2024-08-11 11:23:45,029][00221] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3707.2). Total num frames: 3596288. Throughput: 0: 944.9. Samples: 898314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:23:45,031][00221] Avg episode reward: [(0, '26.240')]
-[2024-08-11 11:23:47,467][02638] Updated weights for policy 0, policy_version 880 (0.0026)
-[2024-08-11 11:23:50,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 3612672. Throughput: 0: 888.8. Samples: 902512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:23:50,030][00221] Avg episode reward: [(0, '25.763')]
-[2024-08-11 11:23:55,026][00221] Fps is (10 sec: 4097.3, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 3637248. Throughput: 0: 923.2. Samples: 909118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:23:55,033][00221] Avg episode reward: [(0, '25.150')]
-[2024-08-11 11:23:56,670][02638] Updated weights for policy 0, policy_version 890 (0.0024)
-[2024-08-11 11:24:00,026][00221] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3653632. Throughput: 0: 956.8. Samples: 912586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-08-11 11:24:00,032][00221] Avg episode reward: [(0, '26.175')]
-[2024-08-11 11:24:05,026][00221] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 3665920. Throughput: 0: 905.7. Samples: 916824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:24:05,032][00221] Avg episode reward: [(0, '26.069')]
-[2024-08-11 11:24:08,956][02638] Updated weights for policy 0, policy_version 900 (0.0036)
-[2024-08-11 11:24:10,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3690496. Throughput: 0: 899.4. Samples: 922596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:24:10,028][00221] Avg episode reward: [(0, '25.586')]
-[2024-08-11 11:24:15,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 3710976. Throughput: 0: 929.9. Samples: 925998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-08-11 11:24:15,034][00221] Avg episode reward: [(0, '26.683')]
-[2024-08-11 11:24:20,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3723264. Throughput: 0: 934.2. Samples: 931216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-08-11 11:24:20,033][00221] Avg episode reward: [(0, '26.923')]
-[2024-08-11 11:24:20,185][02638] Updated weights for policy 0, policy_version 910 (0.0029)
-[2024-08-11 11:24:25,026][00221] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3739648. Throughput: 0: 888.5. Samples: 935726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:24:25,033][00221] Avg episode reward: [(0, '27.951')]
-[2024-08-11 11:24:25,125][02624] Saving new best policy, reward=27.951!
-[2024-08-11 11:24:30,026][00221] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3764224. Throughput: 0: 901.6. Samples: 938884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:24:30,033][00221] Avg episode reward: [(0, '28.608')]
-[2024-08-11 11:24:30,044][02624] Saving new best policy, reward=28.608!
-[2024-08-11 11:24:30,747][02638] Updated weights for policy 0, policy_version 920 (0.0023)
-[2024-08-11 11:24:35,026][00221] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3780608. Throughput: 0: 950.2. Samples: 945272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:24:35,028][00221] Avg episode reward: [(0, '28.215')]
-[2024-08-11 11:24:40,026][00221] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3792896. Throughput: 0: 892.0. Samples: 949260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:24:40,034][00221] Avg episode reward: [(0, '29.099')]
-[2024-08-11 11:24:40,092][02624] Saving new best policy, reward=29.099!
-[2024-08-11 11:24:43,063][02638] Updated weights for policy 0, policy_version 930 (0.0028)
-[2024-08-11 11:24:45,026][00221] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 3817472. Throughput: 0: 876.8. Samples: 952044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:24:45,027][00221] Avg episode reward: [(0, '27.512')]
-[2024-08-11 11:24:50,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3837952. Throughput: 0: 933.5. Samples: 958832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:24:50,032][00221] Avg episode reward: [(0, '26.756')]
-[2024-08-11 11:24:53,269][02638] Updated weights for policy 0, policy_version 940 (0.0018)
-[2024-08-11 11:24:55,030][00221] Fps is (10 sec: 3684.8, 60 sec: 3617.9, 300 sec: 3679.4). Total num frames: 3854336. Throughput: 0: 916.2. Samples: 963830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:24:55,032][00221] Avg episode reward: [(0, '26.608')]
-[2024-08-11 11:25:00,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3870720. Throughput: 0: 887.2. Samples: 965920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:25:00,033][00221] Avg episode reward: [(0, '24.298')]
-[2024-08-11 11:25:04,216][02638] Updated weights for policy 0, policy_version 950 (0.0019)
-[2024-08-11 11:25:05,026][00221] Fps is (10 sec: 3688.1, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 3891200. Throughput: 0: 914.5. Samples: 972370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:25:05,034][00221] Avg episode reward: [(0, '22.812')]
-[2024-08-11 11:25:10,026][00221] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3911680. Throughput: 0: 952.8. Samples: 978600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:25:10,033][00221] Avg episode reward: [(0, '22.324')]
-[2024-08-11 11:25:10,044][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000955_3911680.pth...
-[2024-08-11 11:25:10,219][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000740_3031040.pth
-[2024-08-11 11:25:15,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3923968. Throughput: 0: 925.5. Samples: 980532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-08-11 11:25:15,027][00221] Avg episode reward: [(0, '23.255')]
-[2024-08-11 11:25:16,408][02638] Updated weights for policy 0, policy_version 960 (0.0037)
-[2024-08-11 11:25:20,026][00221] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3944448. Throughput: 0: 897.4. Samples: 985654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:25:20,032][00221] Avg episode reward: [(0, '24.524')]
-[2024-08-11 11:25:25,026][00221] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3665.6). Total num frames: 3969024. Throughput: 0: 958.1. Samples: 992374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:25:25,028][00221] Avg episode reward: [(0, '24.152')]
-[2024-08-11 11:25:25,679][02638] Updated weights for policy 0, policy_version 970 (0.0017)
-[2024-08-11 11:25:30,028][00221] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3651.7). Total num frames: 3981312. Throughput: 0: 951.8. Samples: 994876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-08-11 11:25:30,036][00221] Avg episode reward: [(0, '25.939')]
-[2024-08-11 11:25:35,026][00221] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3997696. Throughput: 0: 887.3. Samples: 998762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-08-11 11:25:35,032][00221] Avg episode reward: [(0, '24.894')]
-[2024-08-11 11:25:36,308][02624] Stopping Batcher_0...
-[2024-08-11 11:25:36,309][02624] Loop batcher_evt_loop terminating...
-[2024-08-11 11:25:36,309][00221] Component Batcher_0 stopped!
-[2024-08-11 11:25:36,311][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-08-11 11:25:36,395][00221] Component RolloutWorker_w1 stopped!
-[2024-08-11 11:25:36,400][02643] Stopping RolloutWorker_w5...
-[2024-08-11 11:25:36,402][00221] Component RolloutWorker_w5 stopped!
-[2024-08-11 11:25:36,410][02638] Weights refcount: 2 0
-[2024-08-11 11:25:36,395][02639] Stopping RolloutWorker_w1...
-[2024-08-11 11:25:36,403][02643] Loop rollout_proc5_evt_loop terminating...
-[2024-08-11 11:25:36,415][00221] Component RolloutWorker_w4 stopped!
-[2024-08-11 11:25:36,419][02642] Stopping RolloutWorker_w4...
-[2024-08-11 11:25:36,421][00221] Component InferenceWorker_p0-w0 stopped!
-[2024-08-11 11:25:36,413][02639] Loop rollout_proc1_evt_loop terminating...
-[2024-08-11 11:25:36,425][02638] Stopping InferenceWorker_p0-w0...
-[2024-08-11 11:25:36,426][02638] Loop inference_proc0-0_evt_loop terminating...
-[2024-08-11 11:25:36,433][02644] Stopping RolloutWorker_w6...
-[2024-08-11 11:25:36,433][00221] Component RolloutWorker_w6 stopped!
-[2024-08-11 11:25:36,434][00221] Component RolloutWorker_w7 stopped!
-[2024-08-11 11:25:36,434][02642] Loop rollout_proc4_evt_loop terminating...
-[2024-08-11 11:25:36,443][02644] Loop rollout_proc6_evt_loop terminating...
-[2024-08-11 11:25:36,433][02645] Stopping RolloutWorker_w7...
-[2024-08-11 11:25:36,449][00221] Component RolloutWorker_w2 stopped!
-[2024-08-11 11:25:36,455][00221] Component RolloutWorker_w0 stopped!
-[2024-08-11 11:25:36,449][02645] Loop rollout_proc7_evt_loop terminating...
-[2024-08-11 11:25:36,451][02637] Stopping RolloutWorker_w0...
-[2024-08-11 11:25:36,463][02637] Loop rollout_proc0_evt_loop terminating...
-[2024-08-11 11:25:36,449][02640] Stopping RolloutWorker_w2...
-[2024-08-11 11:25:36,469][02641] Stopping RolloutWorker_w3...
-[2024-08-11 11:25:36,469][00221] Component RolloutWorker_w3 stopped!
-[2024-08-11 11:25:36,472][02641] Loop rollout_proc3_evt_loop terminating...
-[2024-08-11 11:25:36,468][02640] Loop rollout_proc2_evt_loop terminating...
-[2024-08-11 11:25:36,497][02624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000847_3469312.pth
-[2024-08-11 11:25:36,513][02624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-08-11 11:25:36,690][00221] Component LearnerWorker_p0 stopped!
-[2024-08-11 11:25:36,690][02624] Stopping LearnerWorker_p0...
-[2024-08-11 11:25:36,693][02624] Loop learner_proc0_evt_loop terminating...
-[2024-08-11 11:25:36,693][00221] Waiting for process learner_proc0 to stop...
-[2024-08-11 11:25:38,211][00221] Waiting for process inference_proc0-0 to join...
-[2024-08-11 11:25:38,216][00221] Waiting for process rollout_proc0 to join...
-[2024-08-11 11:25:40,086][00221] Waiting for process rollout_proc1 to join...
-[2024-08-11 11:25:40,090][00221] Waiting for process rollout_proc2 to join...
-[2024-08-11 11:25:40,095][00221] Waiting for process rollout_proc3 to join...
-[2024-08-11 11:25:40,099][00221] Waiting for process rollout_proc4 to join...
-[2024-08-11 11:25:40,104][00221] Waiting for process rollout_proc5 to join...
-[2024-08-11 11:25:40,108][00221] Waiting for process rollout_proc6 to join...
-[2024-08-11 11:25:40,111][00221] Waiting for process rollout_proc7 to join...
-[2024-08-11 11:25:40,114][00221] Batcher 0 profile tree view:
-batching: 27.8949, releasing_batches: 0.0314
-[2024-08-11 11:25:40,118][00221] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0049
-  wait_policy_total: 411.5365
-update_model: 9.2979
-  weight_update: 0.0019
-one_step: 0.0032
-  handle_policy_step: 615.1092
-    deserialize: 16.0363, stack: 3.0893, obs_to_device_normalize: 124.3913, forward: 328.9067, send_messages: 30.8385
-    prepare_outputs: 82.2664
-      to_cpu: 47.2236
-[2024-08-11 11:25:40,120][00221] Learner 0 profile tree view:
-misc: 0.0073, prepare_batch: 14.9899
-train: 73.7830
-  epoch_init: 0.0057, minibatch_init: 0.0180, losses_postprocess: 0.6744, kl_divergence: 0.6177, after_optimizer: 34.0486
-  calculate_losses: 25.9707
-    losses_init: 0.0037, forward_head: 1.3265, bptt_initial: 16.9622, tail: 1.1766, advantages_returns: 0.2774, losses: 3.7369
-    bptt: 2.0988
-      bptt_forward_core: 2.0037
-  update: 11.7642
-    clip: 0.9665
-[2024-08-11 11:25:40,123][00221] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.3571, enqueue_policy_requests: 101.0353, env_step: 841.4191, overhead: 14.3779, complete_rollouts: 7.6473
-save_policy_outputs: 21.0938
-  split_output_tensors: 8.3709
-[2024-08-11 11:25:40,124][00221] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.3584, enqueue_policy_requests: 103.4353, env_step: 837.6421, overhead: 14.2160, complete_rollouts: 6.8723
-save_policy_outputs: 21.4510
-  split_output_tensors: 8.8589
-[2024-08-11 11:25:40,125][00221] Loop Runner_EvtLoop terminating...
-[2024-08-11 11:25:40,127][00221] Runner profile tree view:
-main_loop: 1105.7774
-[2024-08-11 11:25:40,129][00221] Collected {0: 4005888}, FPS: 3622.7
-[2024-08-11 11:25:40,585][00221] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-08-11 11:25:40,587][00221] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-08-11 11:25:40,590][00221] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-08-11 11:25:40,593][00221] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-08-11 11:25:40,594][00221] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-08-11 11:25:40,597][00221] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-08-11 11:25:40,598][00221] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2024-08-11 11:25:40,599][00221] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-08-11 11:25:40,601][00221] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2024-08-11 11:25:40,602][00221] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2024-08-11 11:25:40,603][00221] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-08-11 11:25:40,604][00221] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-08-11 11:25:40,605][00221] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-08-11 11:25:40,606][00221] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-08-11 11:25:40,607][00221] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-08-11 11:25:40,640][00221] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-08-11 11:25:40,644][00221] RunningMeanStd input shape: (3, 72, 128)
-[2024-08-11 11:25:40,646][00221] RunningMeanStd input shape: (1,)
-[2024-08-11 11:25:40,666][00221] ConvEncoder: input_channels=3
-[2024-08-11 11:25:40,772][00221] Conv encoder output size: 512
-[2024-08-11 11:25:40,774][00221] Policy head output size: 512
-[2024-08-11 11:25:40,965][00221] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-08-11 11:25:41,779][00221] Num frames 100...
-[2024-08-11 11:25:41,899][00221] Num frames 200...
-[2024-08-11 11:25:42,030][00221] Num frames 300...
-[2024-08-11 11:25:42,151][00221] Num frames 400...
-[2024-08-11 11:25:42,270][00221] Num frames 500...
-[2024-08-11 11:25:42,394][00221] Num frames 600...
-[2024-08-11 11:25:42,516][00221] Num frames 700...
-[2024-08-11 11:25:42,641][00221] Num frames 800...
-[2024-08-11 11:25:42,764][00221] Num frames 900...
-[2024-08-11 11:25:42,888][00221] Num frames 1000...
-[2024-08-11 11:25:43,019][00221] Num frames 1100...
-[2024-08-11 11:25:43,172][00221] Num frames 1200...
-[2024-08-11 11:25:43,347][00221] Num frames 1300...
-[2024-08-11 11:25:43,513][00221] Num frames 1400...
-[2024-08-11 11:25:43,680][00221] Num frames 1500...
-[2024-08-11 11:25:43,848][00221] Num frames 1600...
-[2024-08-11 11:25:44,016][00221] Num frames 1700...
-[2024-08-11 11:25:44,197][00221] Num frames 1800...
-[2024-08-11 11:25:44,368][00221] Num frames 1900...
-[2024-08-11 11:25:44,542][00221] Num frames 2000...
-[2024-08-11 11:25:44,714][00221] Num frames 2100...
-[2024-08-11 11:25:44,769][00221] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
-[2024-08-11 11:25:44,771][00221] Avg episode reward: 58.999, avg true_objective: 21.000
-[2024-08-11 11:25:44,944][00221] Num frames 2200...
-[2024-08-11 11:25:45,122][00221] Num frames 2300...
-[2024-08-11 11:25:45,294][00221] Num frames 2400...
-[2024-08-11 11:25:45,470][00221] Num frames 2500...
-[2024-08-11 11:25:45,638][00221] Num frames 2600...
-[2024-08-11 11:25:45,757][00221] Num frames 2700...
-[2024-08-11 11:25:45,876][00221] Num frames 2800...
-[2024-08-11 11:25:45,994][00221] Num frames 2900...
-[2024-08-11 11:25:46,121][00221] Num frames 3000...
-[2024-08-11 11:25:46,250][00221] Num frames 3100...
-[2024-08-11 11:25:46,375][00221] Num frames 3200...
-[2024-08-11 11:25:46,500][00221] Num frames 3300...
-[2024-08-11 11:25:46,604][00221] Avg episode rewards: #0: 47.184, true rewards: #0: 16.685
-[2024-08-11 11:25:46,605][00221] Avg episode reward: 47.184, avg true_objective: 16.685
-[2024-08-11 11:25:46,681][00221] Num frames 3400...
-[2024-08-11 11:25:46,803][00221] Num frames 3500...
-[2024-08-11 11:25:46,921][00221] Num frames 3600...
-[2024-08-11 11:25:47,038][00221] Num frames 3700...
-[2024-08-11 11:25:47,173][00221] Num frames 3800...
-[2024-08-11 11:25:47,291][00221] Num frames 3900...
-[2024-08-11 11:25:47,412][00221] Num frames 4000...
-[2024-08-11 11:25:47,496][00221] Avg episode rewards: #0: 37.746, true rewards: #0: 13.413
-[2024-08-11 11:25:47,498][00221] Avg episode reward: 37.746, avg true_objective: 13.413
-[2024-08-11 11:25:47,601][00221] Num frames 4100...
-[2024-08-11 11:25:47,722][00221] Num frames 4200...
-[2024-08-11 11:25:47,844][00221] Num frames 4300...
-[2024-08-11 11:25:47,967][00221] Num frames 4400...
-[2024-08-11 11:25:48,085][00221] Num frames 4500...
-[2024-08-11 11:25:48,223][00221] Num frames 4600...
-[2024-08-11 11:25:48,342][00221] Num frames 4700...
-[2024-08-11 11:25:48,462][00221] Num frames 4800...
-[2024-08-11 11:25:48,587][00221] Num frames 4900...
-[2024-08-11 11:25:48,720][00221] Num frames 5000...
-[2024-08-11 11:25:48,846][00221] Num frames 5100...
-[2024-08-11 11:25:48,969][00221] Num frames 5200...
-[2024-08-11 11:25:49,090][00221] Num frames 5300...
-[2024-08-11 11:25:49,194][00221] Avg episode rewards: #0: 35.590, true rewards: #0: 13.340
-[2024-08-11 11:25:49,196][00221] Avg episode reward: 35.590, avg true_objective: 13.340
-[2024-08-11 11:25:49,276][00221] Num frames 5400...
-[2024-08-11 11:25:49,394][00221] Num frames 5500...
-[2024-08-11 11:25:49,520][00221] Num frames 5600...
-[2024-08-11 11:25:49,640][00221] Num frames 5700...
-[2024-08-11 11:25:49,768][00221] Num frames 5800...
-[2024-08-11 11:25:49,899][00221] Num frames 5900...
-[2024-08-11 11:25:50,029][00221] Num frames 6000...
-[2024-08-11 11:25:50,156][00221] Num frames 6100...
-[2024-08-11 11:25:50,286][00221] Num frames 6200...
-[2024-08-11 11:25:50,404][00221] Num frames 6300...
-[2024-08-11 11:25:50,529][00221] Num frames 6400...
-[2024-08-11 11:25:50,650][00221] Num frames 6500...
-[2024-08-11 11:25:50,776][00221] Num frames 6600...
-[2024-08-11 11:25:50,896][00221] Num frames 6700...
-[2024-08-11 11:25:51,020][00221] Num frames 6800...
-[2024-08-11 11:25:51,145][00221] Num frames 6900...
-[2024-08-11 11:25:51,278][00221] Num frames 7000...
-[2024-08-11 11:25:51,403][00221] Num frames 7100...
-[2024-08-11 11:25:51,505][00221] Avg episode rewards: #0: 36.672, true rewards: #0: 14.272
-[2024-08-11 11:25:51,507][00221] Avg episode reward: 36.672, avg true_objective: 14.272
-[2024-08-11 11:25:51,588][00221] Num frames 7200...
-[2024-08-11 11:25:51,707][00221] Num frames 7300...
-[2024-08-11 11:25:51,827][00221] Num frames 7400...
-[2024-08-11 11:25:51,948][00221] Num frames 7500...
-[2024-08-11 11:25:52,069][00221] Num frames 7600...
-[2024-08-11 11:25:52,198][00221] Num frames 7700...
-[2024-08-11 11:25:52,325][00221] Num frames 7800...
-[2024-08-11 11:25:52,444][00221] Num frames 7900...
-[2024-08-11 11:25:52,567][00221] Num frames 8000...
-[2024-08-11 11:25:52,687][00221] Num frames 8100...
-[2024-08-11 11:25:52,853][00221] Avg episode rewards: #0: 34.320, true rewards: #0: 13.653
-[2024-08-11 11:25:52,856][00221] Avg episode reward: 34.320, avg true_objective: 13.653
-[2024-08-11 11:25:52,869][00221] Num frames 8200...
-[2024-08-11 11:25:52,984][00221] Num frames 8300...
-[2024-08-11 11:25:53,110][00221] Num frames 8400...
-[2024-08-11 11:25:53,229][00221] Num frames 8500...
-[2024-08-11 11:25:53,359][00221] Num frames 8600...
-[2024-08-11 11:25:53,477][00221] Num frames 8700...
-[2024-08-11 11:25:53,598][00221] Num frames 8800...
-[2024-08-11 11:25:53,719][00221] Num frames 8900...
-[2024-08-11 11:25:53,842][00221] Num frames 9000...
-[2024-08-11 11:25:53,964][00221] Num frames 9100...
-[2024-08-11 11:25:54,049][00221] Avg episode rewards: #0: 32.317, true rewards: #0: 13.031
-[2024-08-11 11:25:54,050][00221] Avg episode reward: 32.317, avg true_objective: 13.031
-[2024-08-11 11:25:54,156][00221] Num frames 9200...
-[2024-08-11 11:25:54,278][00221] Num frames 9300...
-[2024-08-11 11:25:54,409][00221] Num frames 9400...
-[2024-08-11 11:25:54,533][00221] Num frames 9500...
-[2024-08-11 11:25:54,650][00221] Num frames 9600...
-[2024-08-11 11:25:54,770][00221] Num frames 9700...
-[2024-08-11 11:25:54,887][00221] Num frames 9800...
-[2024-08-11 11:25:55,009][00221] Num frames 9900...
-[2024-08-11 11:25:55,136][00221] Num frames 10000...
-[2024-08-11 11:25:55,258][00221] Num frames 10100...
-[2024-08-11 11:25:55,385][00221] Num frames 10200...
-[2024-08-11 11:25:55,507][00221] Num frames 10300...
-[2024-08-11 11:25:55,645][00221] Num frames 10400...
-[2024-08-11 11:25:55,821][00221] Num frames 10500...
-[2024-08-11 11:25:55,992][00221] Avg episode rewards: #0: 32.702, true rewards: #0: 13.202
-[2024-08-11 11:25:55,994][00221] Avg episode reward: 32.702, avg true_objective: 13.202
-[2024-08-11 11:25:56,066][00221] Num frames 10600...
-[2024-08-11 11:25:56,247][00221] Num frames 10700...
-[2024-08-11 11:25:56,420][00221] Num frames 10800...
-[2024-08-11 11:25:56,580][00221] Num frames 10900...
-[2024-08-11 11:25:56,742][00221] Num frames 11000...
-[2024-08-11 11:25:56,912][00221] Num frames 11100...
-[2024-08-11 11:25:57,084][00221] Num frames 11200...
-[2024-08-11 11:25:57,253][00221] Num frames 11300...
-[2024-08-11 11:25:57,435][00221] Num frames 11400...
-[2024-08-11 11:25:57,608][00221] Num frames 11500...
-[2024-08-11 11:25:57,807][00221] Avg episode rewards: #0: 31.762, true rewards: #0: 12.873
-[2024-08-11 11:25:57,809][00221] Avg episode reward: 31.762, avg true_objective: 12.873
-[2024-08-11 11:25:57,837][00221] Num frames 11600...
-[2024-08-11 11:25:58,020][00221] Num frames 11700...
-[2024-08-11 11:25:58,177][00221] Num frames 11800...
-[2024-08-11 11:25:58,303][00221] Num frames 11900...
-[2024-08-11 11:25:58,424][00221] Num frames 12000...
-[2024-08-11 11:25:58,551][00221] Num frames 12100...
-[2024-08-11 11:25:58,672][00221] Num frames 12200...
-[2024-08-11 11:25:58,796][00221] Avg episode rewards: #0: 30.058, true rewards: #0: 12.258
-[2024-08-11 11:25:58,798][00221] Avg episode reward: 30.058, avg true_objective: 12.258
-[2024-08-11 11:27:14,671][00221] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2024-08-11 11:29:31,052][00221] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-08-11 11:29:31,054][00221] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-08-11 11:29:31,056][00221] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-08-11 11:29:31,058][00221] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-08-11 11:29:31,059][00221] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-08-11 11:29:31,061][00221] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-08-11 11:29:31,063][00221] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2024-08-11 11:29:31,067][00221] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-08-11 11:29:31,069][00221] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2024-08-11 11:29:31,070][00221] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2024-08-11 11:29:31,071][00221] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-08-11 11:29:31,072][00221] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-08-11 11:29:31,073][00221] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-08-11 11:29:31,074][00221] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-08-11 11:29:31,075][00221] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-08-11 11:29:31,108][00221] RunningMeanStd input shape: (3, 72, 128)
-[2024-08-11 11:29:31,110][00221] RunningMeanStd input shape: (1,)
-[2024-08-11 11:29:31,126][00221] ConvEncoder: input_channels=3
-[2024-08-11 11:29:31,172][00221] Conv encoder output size: 512
-[2024-08-11 11:29:31,174][00221] Policy head output size: 512
-[2024-08-11 11:29:31,194][00221] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-08-11 11:29:31,617][00221] Num frames 100...
-[2024-08-11 11:29:31,747][00221] Num frames 200...
-[2024-08-11 11:29:31,876][00221] Num frames 300...
-[2024-08-11 11:29:31,998][00221] Num frames 400...
-[2024-08-11 11:29:32,122][00221] Num frames 500...
-[2024-08-11 11:29:32,246][00221] Num frames 600...
-[2024-08-11 11:29:32,369][00221] Num frames 700...
-[2024-08-11 11:29:32,493][00221] Num frames 800...
-[2024-08-11 11:29:32,618][00221] Num frames 900...
-[2024-08-11 11:29:32,747][00221] Num frames 1000...
-[2024-08-11 11:29:32,873][00221] Num frames 1100...
-[2024-08-11 11:29:32,992][00221] Num frames 1200...
-[2024-08-11 11:29:33,119][00221] Num frames 1300...
-[2024-08-11 11:29:33,278][00221] Num frames 1400...
-[2024-08-11 11:29:33,350][00221] Avg episode rewards: #0: 33.080, true rewards: #0: 14.080
-[2024-08-11 11:29:33,352][00221] Avg episode reward: 33.080, avg true_objective: 14.080
-[2024-08-11 11:29:33,509][00221] Num frames 1500...
-[2024-08-11 11:29:33,671][00221] Num frames 1600...
-[2024-08-11 11:29:33,849][00221] Num frames 1700...
-[2024-08-11 11:29:34,019][00221] Num frames 1800...
-[2024-08-11 11:29:34,196][00221] Num frames 1900...
-[2024-08-11 11:29:34,355][00221] Num frames 2000...
-[2024-08-11 11:29:34,519][00221] Num frames 2100...
-[2024-08-11 11:29:34,685][00221] Num frames 2200...
-[2024-08-11 11:29:34,864][00221] Num frames 2300...
-[2024-08-11 11:29:35,037][00221] Num frames 2400...
-[2024-08-11 11:29:35,219][00221] Num frames 2500...
-[2024-08-11 11:29:35,433][00221] Avg episode rewards: #0: 30.960, true rewards: #0: 12.960
-[2024-08-11 11:29:35,436][00221] Avg episode reward: 30.960, avg true_objective: 12.960
-[2024-08-11 11:29:35,454][00221] Num frames 2600...
-[2024-08-11 11:29:35,623][00221] Num frames 2700...
-[2024-08-11 11:29:35,774][00221] Num frames 2800...
-[2024-08-11 11:29:35,898][00221] Num frames 2900...
-[2024-08-11 11:29:36,018][00221] Num frames 3000...
-[2024-08-11 11:29:36,141][00221] Num frames 3100...
-[2024-08-11 11:29:36,261][00221] Num frames 3200...
-[2024-08-11 11:29:36,377][00221] Num frames 3300...
-[2024-08-11 11:29:36,505][00221] Avg episode rewards: #0: 25.533, true rewards: #0: 11.200
-[2024-08-11 11:29:36,507][00221] Avg episode reward: 25.533, avg true_objective: 11.200
-[2024-08-11 11:29:36,557][00221] Num frames 3400...
-[2024-08-11 11:29:36,682][00221] Num frames 3500...
-[2024-08-11 11:29:36,805][00221] Num frames 3600...
-[2024-08-11 11:29:36,935][00221] Num frames 3700...
-[2024-08-11 11:29:37,061][00221] Num frames 3800...
-[2024-08-11 11:29:37,193][00221] Num frames 3900...
-[2024-08-11 11:29:37,321][00221] Num frames 4000...
-[2024-08-11 11:29:37,441][00221] Num frames 4100...
-[2024-08-11 11:29:37,563][00221] Num frames 4200...
-[2024-08-11 11:29:37,681][00221] Num frames 4300...
-[2024-08-11 11:29:37,800][00221] Num frames 4400...
-[2024-08-11 11:29:37,933][00221] Num frames 4500...
-[2024-08-11 11:29:38,004][00221] Avg episode rewards: #0: 26.530, true rewards: #0: 11.280
-[2024-08-11 11:29:38,005][00221] Avg episode reward: 26.530, avg true_objective: 11.280
-[2024-08-11 11:29:38,117][00221] Num frames 4600...
-[2024-08-11 11:29:38,242][00221] Num frames 4700...
-[2024-08-11 11:29:38,372][00221] Num frames 4800...
-[2024-08-11 11:29:38,492][00221] Num frames 4900...
-[2024-08-11 11:29:38,615][00221] Num frames 5000...
-[2024-08-11 11:29:38,735][00221] Num frames 5100...
-[2024-08-11 11:29:38,864][00221] Num frames 5200...
-[2024-08-11 11:29:38,993][00221] Num frames 5300...
-[2024-08-11 11:29:39,123][00221] Num frames 5400...
-[2024-08-11 11:29:39,249][00221] Num frames 5500...
-[2024-08-11 11:29:39,374][00221] Num frames 5600...
-[2024-08-11 11:29:39,495][00221] Num frames 5700...
-[2024-08-11 11:29:39,622][00221] Avg episode rewards: #0: 27.520, true rewards: #0: 11.520
-[2024-08-11 11:29:39,626][00221] Avg episode reward: 27.520, avg true_objective: 11.520
-[2024-08-11 11:29:39,678][00221] Num frames 5800...
-[2024-08-11 11:29:39,798][00221] Num frames 5900...
-[2024-08-11 11:29:39,926][00221] Num frames 6000...
-[2024-08-11 11:29:40,059][00221] Num frames 6100...
-[2024-08-11 11:29:40,189][00221] Num frames 6200...
-[2024-08-11 11:29:40,317][00221] Num frames 6300...
-[2024-08-11 11:29:40,470][00221] Num frames 6400...
-[2024-08-11 11:29:40,611][00221] Avg episode rewards: #0: 26.115, true rewards: #0: 10.782
-[2024-08-11 11:29:40,614][00221] Avg episode reward: 26.115, avg true_objective: 10.782
-[2024-08-11 11:29:40,652][00221] Num frames 6500...
-[2024-08-11 11:29:40,773][00221] Num frames 6600...
-[2024-08-11 11:29:40,896][00221] Num frames 6700...
-[2024-08-11 11:29:41,026][00221] Num frames 6800...
-[2024-08-11 11:29:41,156][00221] Num frames 6900...
-[2024-08-11 11:29:41,277][00221] Num frames 7000...
-[2024-08-11 11:29:41,405][00221] Num frames 7100...
-[2024-08-11 11:29:41,527][00221] Num frames 7200...
-[2024-08-11 11:29:41,634][00221] Avg episode rewards: #0: 25.202, true rewards: #0: 10.344
-[2024-08-11 11:29:41,637][00221] Avg episode reward: 25.202, avg true_objective: 10.344
-[2024-08-11 11:29:41,709][00221] Num frames 7300...
-[2024-08-11 11:29:41,832][00221] Num frames 7400...
-[2024-08-11 11:29:41,950][00221] Num frames 7500...
-[2024-08-11 11:29:42,075][00221] Num frames 7600...
-[2024-08-11 11:29:42,206][00221] Num frames 7700...
-[2024-08-11 11:29:42,323][00221] Num frames 7800...
-[2024-08-11 11:29:42,443][00221] Num frames 7900...
-[2024-08-11 11:29:42,561][00221] Num frames 8000...
-[2024-08-11 11:29:42,684][00221] Num frames 8100...
-[2024-08-11 11:29:42,799][00221] Num frames 8200...
-[2024-08-11 11:29:42,923][00221] Num frames 8300...
-[2024-08-11 11:29:43,052][00221] Num frames 8400...
-[2024-08-11 11:29:43,180][00221] Num frames 8500...
-[2024-08-11 11:29:43,304][00221] Num frames 8600...
-[2024-08-11 11:29:43,425][00221] Num frames 8700...
-[2024-08-11 11:29:43,550][00221] Num frames 8800...
-[2024-08-11 11:29:43,673][00221] Num frames 8900...
-[2024-08-11 11:29:43,791][00221] Avg episode rewards: #0: 27.185, true rewards: #0: 11.185
-[2024-08-11 11:29:43,793][00221] Avg episode reward: 27.185, avg true_objective: 11.185
-[2024-08-11 11:29:43,861][00221] Num frames 9000...
-[2024-08-11 11:29:43,985][00221] Num frames 9100...
-[2024-08-11 11:29:44,122][00221] Num frames 9200...
-[2024-08-11 11:29:44,256][00221] Num frames 9300...
-[2024-08-11 11:29:44,383][00221] Num frames 9400...
-[2024-08-11 11:29:44,503][00221] Num frames 9500...
-[2024-08-11 11:29:44,628][00221] Num frames 9600...
-[2024-08-11 11:29:44,752][00221] Num frames 9700...
-[2024-08-11 11:29:44,877][00221] Num frames 9800...
-[2024-08-11 11:29:45,024][00221] Avg episode rewards: #0: 26.862, true rewards: #0: 10.973
-[2024-08-11 11:29:45,025][00221] Avg episode reward: 26.862, avg true_objective: 10.973
-[2024-08-11 11:29:45,068][00221] Num frames 9900...
-[2024-08-11 11:29:45,201][00221] Num frames 10000...
-[2024-08-11 11:29:45,322][00221] Num frames 10100...
-[2024-08-11 11:29:45,445][00221] Num frames 10200...
-[2024-08-11 11:29:45,565][00221] Num frames 10300...
-[2024-08-11 11:29:45,690][00221] Num frames 10400...
-[2024-08-11 11:29:45,860][00221] Num frames 10500...
-[2024-08-11 11:29:46,029][00221] Num frames 10600...
-[2024-08-11 11:29:46,105][00221] Avg episode rewards: #0: 25.712, true rewards: #0: 10.612
-[2024-08-11 11:29:46,107][00221] Avg episode reward: 25.712, avg true_objective: 10.612
-[2024-08-11 11:30:57,504][00221] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
-[2024-08-11 11:31:06,931][00221] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-08-11 11:31:06,933][00221] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-08-11 11:31:06,934][00221] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-08-11 11:31:06,936][00221] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-08-11 11:31:06,938][00221] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-08-11 11:31:06,940][00221] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-08-11 11:31:06,941][00221] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2024-08-11 11:31:06,943][00221] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-08-11 11:31:06,944][00221] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2024-08-11 11:31:06,946][00221] Adding new argument 'hf_repository'='maavaneck/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2024-08-11 11:31:06,947][00221] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2024-08-11 11:31:06,949][00221] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2024-08-11 11:31:06,950][00221] Adding new argument 'train_script'=None that is not in the saved config file!
-[2024-08-11 11:31:06,951][00221] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2024-08-11 11:31:06,952][00221] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2024-08-11 11:31:06,994][00221] RunningMeanStd input shape: (3, 72, 128)
-[2024-08-11 11:31:06,997][00221] RunningMeanStd input shape: (1,)
-[2024-08-11 11:31:07,016][00221] ConvEncoder: input_channels=3
-[2024-08-11 11:31:07,059][00221] Conv encoder output size: 512
-[2024-08-11 11:31:07,061][00221] Policy head output size: 512
-[2024-08-11 11:31:07,079][00221] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-08-11 11:31:07,519][00221] Num frames 100...
-[2024-08-11 11:31:07,641][00221] Num frames 200...
-[2024-08-11 11:31:07,760][00221] Num frames 300...
-[2024-08-11 11:31:07,886][00221] Num frames 400...
-[2024-08-11 11:31:07,962][00221] Avg episode rewards: #0: 6.160, true rewards: #0: 4.160
-[2024-08-11 11:31:07,965][00221] Avg episode reward: 6.160, avg true_objective: 4.160
-[2024-08-11 11:31:08,074][00221] Num frames 500...
-[2024-08-11 11:31:08,205][00221] Num frames 600...
-[2024-08-11 11:31:08,325][00221] Num frames 700...
-[2024-08-11 11:31:08,447][00221] Num frames 800...
-[2024-08-11 11:31:08,570][00221] Num frames 900...
-[2024-08-11 11:31:08,735][00221] Avg episode rewards: #0: 8.460, true rewards: #0: 4.960
-[2024-08-11 11:31:08,736][00221] Avg episode reward: 8.460, avg true_objective: 4.960
-[2024-08-11 11:31:08,751][00221] Num frames 1000...
-[2024-08-11 11:31:08,873][00221] Num frames 1100...
-[2024-08-11 11:31:08,995][00221] Num frames 1200...
-[2024-08-11 11:31:09,132][00221] Num frames 1300...
-[2024-08-11 11:31:09,249][00221] Num frames 1400...
-[2024-08-11 11:31:09,375][00221] Num frames 1500...
-[2024-08-11 11:31:09,496][00221] Num frames 1600...
-[2024-08-11 11:31:09,614][00221] Num frames 1700...
-[2024-08-11 11:31:09,694][00221] Avg episode rewards: #0: 10.390, true rewards: #0: 5.723
-[2024-08-11 11:31:09,696][00221] Avg episode reward: 10.390, avg true_objective: 5.723
-[2024-08-11 11:31:09,796][00221] Num frames 1800...
-[2024-08-11 11:31:09,919][00221] Num frames 1900...
-[2024-08-11 11:31:10,040][00221] Num frames 2000...
-[2024-08-11 11:31:10,177][00221] Num frames 2100...
-[2024-08-11 11:31:10,306][00221] Num frames 2200...
-[2024-08-11 11:31:10,432][00221] Num frames 2300...
-[2024-08-11 11:31:10,557][00221] Num frames 2400...
-[2024-08-11 11:31:10,678][00221] Num frames 2500...
-[2024-08-11 11:31:10,809][00221] Num frames 2600...
-[2024-08-11 11:31:10,961][00221] Num frames 2700...
-[2024-08-11 11:31:11,084][00221] Num frames 2800...
-[2024-08-11 11:31:11,230][00221] Num frames 2900...
-[2024-08-11 11:31:11,356][00221] Num frames 3000...
-[2024-08-11 11:31:11,477][00221] Num frames 3100...
-[2024-08-11 11:31:11,599][00221] Num frames 3200...
-[2024-08-11 11:31:11,720][00221] Num frames 3300...
-[2024-08-11 11:31:11,851][00221] Num frames 3400...
-[2024-08-11 11:31:11,970][00221] Num frames 3500...
-[2024-08-11 11:31:12,093][00221] Num frames 3600...
-[2024-08-11 11:31:12,248][00221] Avg episode rewards: #0: 20.920, true rewards: #0: 9.170
-[2024-08-11 11:31:12,250][00221] Avg episode reward: 20.920, avg true_objective: 9.170
-[2024-08-11 11:31:12,292][00221] Num frames 3700...
-[2024-08-11 11:31:12,415][00221] Num frames 3800...
-[2024-08-11 11:31:12,531][00221] Num frames 3900...
-[2024-08-11 11:31:12,648][00221] Num frames 4000...
-[2024-08-11 11:31:12,770][00221] Num frames 4100...
-[2024-08-11 11:31:12,887][00221] Num frames 4200...
-[2024-08-11 11:31:12,998][00221] Avg episode rewards: #0: 18.488, true rewards: #0: 8.488
-[2024-08-11 11:31:12,999][00221] Avg episode reward: 18.488, avg true_objective: 8.488
-[2024-08-11 11:31:13,068][00221] Num frames 4300...
-[2024-08-11 11:31:13,204][00221] Num frames 4400...
-[2024-08-11 11:31:13,328][00221] Num frames 4500...
-[2024-08-11 11:31:13,448][00221] Num frames 4600...
-[2024-08-11 11:31:13,565][00221] Num frames 4700...
-[2024-08-11 11:31:13,686][00221] Num frames 4800...
-[2024-08-11 11:31:13,804][00221] Num frames 4900...
-[2024-08-11 11:31:13,925][00221] Num frames 5000...
-[2024-08-11 11:31:14,043][00221] Num frames 5100...
-[2024-08-11 11:31:14,195][00221] Avg episode rewards: #0: 18.787, true rewards: #0: 8.620
-[2024-08-11 11:31:14,199][00221] Avg episode reward: 18.787, avg true_objective: 8.620
-[2024-08-11 11:31:14,243][00221] Num frames 5200...
-[2024-08-11 11:31:14,364][00221] Num frames 5300...
-[2024-08-11 11:31:14,488][00221] Num frames 5400...
-[2024-08-11 11:31:14,609][00221] Num frames 5500...
-[2024-08-11 11:31:14,731][00221] Num frames 5600...
-[2024-08-11 11:31:14,855][00221] Num frames 5700...
-[2024-08-11 11:31:14,932][00221] Avg episode rewards: #0: 17.166, true rewards: #0: 8.166
-[2024-08-11 11:31:14,935][00221] Avg episode reward: 17.166, avg true_objective: 8.166
-[2024-08-11 11:31:15,042][00221] Num frames 5800...
-[2024-08-11 11:31:15,172][00221] Num frames 5900...
-[2024-08-11 11:31:15,303][00221] Num frames 6000...
-[2024-08-11 11:31:15,424][00221] Num frames 6100...
-[2024-08-11 11:31:15,545][00221] Num frames 6200...
-[2024-08-11 11:31:15,667][00221] Num frames 6300...
-[2024-08-11 11:31:15,788][00221] Num frames 6400...
-[2024-08-11 11:31:15,907][00221] Num frames 6500...
-[2024-08-11 11:31:16,036][00221] Num frames 6600...
-[2024-08-11 11:31:16,166][00221] Num frames 6700...
-[2024-08-11 11:31:16,304][00221] Num frames 6800...
-[2024-08-11 11:31:16,429][00221] Num frames 6900...
-[2024-08-11 11:31:16,550][00221] Num frames 7000...
-[2024-08-11 11:31:16,676][00221] Num frames 7100...
-[2024-08-11 11:31:16,849][00221] Num frames 7200...
-[2024-08-11 11:31:17,018][00221] Num frames 7300...
-[2024-08-11 11:31:17,201][00221] Num frames 7400...
-[2024-08-11 11:31:17,392][00221] Num frames 7500...
-[2024-08-11 11:31:17,559][00221] Num frames 7600...
-[2024-08-11 11:31:17,721][00221] Num frames 7700...
-[2024-08-11 11:31:17,881][00221] Num frames 7800...
-[2024-08-11 11:31:17,968][00221] Avg episode rewards: #0: 21.020, true rewards: #0: 9.770
-[2024-08-11 11:31:17,970][00221] Avg episode reward: 21.020, avg true_objective: 9.770
-[2024-08-11 11:31:18,119][00221] Num frames 7900...
-[2024-08-11 11:31:18,290][00221] Num frames 8000...
-[2024-08-11 11:31:18,474][00221] Num frames 8100...
-[2024-08-11 11:31:18,655][00221] Num frames 8200...
-[2024-08-11 11:31:18,829][00221] Num frames 8300...
-[2024-08-11 11:31:19,013][00221] Num frames 8400...
-[2024-08-11 11:31:19,212][00221] Num frames 8500...
-[2024-08-11 11:31:19,342][00221] Num frames 8600...
-[2024-08-11 11:31:19,471][00221] Num frames 8700...
-[2024-08-11 11:31:19,594][00221] Num frames 8800...
-[2024-08-11 11:31:19,715][00221] Num frames 8900...
-[2024-08-11 11:31:19,839][00221] Num frames 9000...
-[2024-08-11 11:31:19,966][00221] Num frames 9100...
-[2024-08-11 11:31:20,087][00221] Num frames 9200...
-[2024-08-11 11:31:20,215][00221] Avg episode rewards: #0: 23.062, true rewards: #0: 10.284
-[2024-08-11 11:31:20,218][00221] Avg episode reward: 23.062, avg true_objective: 10.284
-[2024-08-11 11:31:20,274][00221] Num frames 9300...
-[2024-08-11 11:31:20,403][00221] Num frames 9400...
-[2024-08-11 11:31:20,528][00221] Num frames 9500...
-[2024-08-11 11:31:20,652][00221] Num frames 9600...
-[2024-08-11 11:31:20,782][00221] Num frames 9700...
-[2024-08-11 11:31:20,906][00221] Num frames 9800...
-[2024-08-11 11:31:21,027][00221] Num frames 9900...
-[2024-08-11 11:31:21,157][00221] Num frames 10000...
-[2024-08-11 11:31:21,281][00221] Num frames 10100...
-[2024-08-11 11:31:21,407][00221] Num frames 10200...
-[2024-08-11 11:31:21,538][00221] Num frames 10300...
-[2024-08-11 11:31:21,659][00221] Num frames 10400...
-[2024-08-11 11:31:21,790][00221] Num frames 10500...
-[2024-08-11 11:31:21,921][00221] Num frames 10600...
-[2024-08-11 11:31:22,048][00221] Num frames 10700...
-[2024-08-11 11:31:22,174][00221] Num frames 10800...
-[2024-08-11 11:31:22,298][00221] Num frames 10900...
-[2024-08-11 11:31:22,419][00221] Num frames 11000...
-[2024-08-11 11:31:22,552][00221] Num frames 11100...
-[2024-08-11 11:31:22,675][00221] Num frames 11200...
-[2024-08-11 11:31:22,794][00221] Num frames 11300...
-[2024-08-11 11:31:22,921][00221] Avg episode rewards: #0: 26.856, true rewards: #0: 11.356
-[2024-08-11 11:31:22,923][00221] Avg episode reward: 26.856, avg true_objective: 11.356
-[2024-08-11 11:32:32,804][00221] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-10-03 11:11:46,960][02728] Using optimizer
+[2024-10-03 11:11:47,144][00426] Heartbeat connected on Batcher_0
+[2024-10-03 11:11:47,152][00426] Heartbeat connected on InferenceWorker_p0-w0
+[2024-10-03 11:11:47,164][00426] Heartbeat connected on RolloutWorker_w0
+[2024-10-03 11:11:47,169][00426] Heartbeat connected on RolloutWorker_w1
+[2024-10-03 11:11:47,174][00426] Heartbeat connected on RolloutWorker_w2
+[2024-10-03 11:11:47,179][00426] Heartbeat connected on RolloutWorker_w3
+[2024-10-03 11:11:47,184][00426] Heartbeat connected on RolloutWorker_w4
+[2024-10-03 11:11:47,192][00426] Heartbeat connected on RolloutWorker_w5
+[2024-10-03 11:11:47,194][00426] Heartbeat connected on RolloutWorker_w6
+[2024-10-03 11:11:47,199][00426] Heartbeat connected on RolloutWorker_w7
+[2024-10-03 11:11:47,697][02728] No checkpoints found
+[2024-10-03 11:11:47,697][02728] Did not load from checkpoint, starting from scratch!
+[2024-10-03 11:11:47,698][02728] Initialized policy 0 weights for model version 0
+[2024-10-03 11:11:47,703][02728] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-10-03 11:11:47,711][02728] LearnerWorker_p0 finished initialization!
+[2024-10-03 11:11:47,712][00426] Heartbeat connected on LearnerWorker_p0
+[2024-10-03 11:11:47,904][02741] RunningMeanStd input shape: (3, 72, 128)
+[2024-10-03 11:11:47,905][02741] RunningMeanStd input shape: (1,)
+[2024-10-03 11:11:47,919][02741] ConvEncoder: input_channels=3
+[2024-10-03 11:11:48,033][02741] Conv encoder output size: 512
+[2024-10-03 11:11:48,034][02741] Policy head output size: 512
+[2024-10-03 11:11:48,100][00426] Inference worker 0-0 is ready!
+[2024-10-03 11:11:48,102][00426] All inference workers are ready! Signal rollout workers to start!
+[2024-10-03 11:11:48,301][02742] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:48,303][02745] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:48,303][02746] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:48,309][02749] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:48,315][02744] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:48,306][02747] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:48,317][02743] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:48,318][02748] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:11:49,640][02742] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:49,639][02749] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:49,971][02743] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:49,973][02744] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:49,978][02748] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:50,394][02744] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:51,036][02749] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:51,040][02745] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:51,050][02746] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:51,062][02742] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:51,554][00426] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-10-03 11:11:52,210][02743] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:52,373][02744] Decorrelating experience for 64 frames...
+[2024-10-03 11:11:52,395][02746] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:52,400][02745] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:52,422][02748] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:52,737][02749] Decorrelating experience for 64 frames...
+[2024-10-03 11:11:54,300][02745] Decorrelating experience for 64 frames...
+[2024-10-03 11:11:54,302][02746] Decorrelating experience for 64 frames...
+[2024-10-03 11:11:54,532][02747] Decorrelating experience for 0 frames...
+[2024-10-03 11:11:54,641][02744] Decorrelating experience for 96 frames...
+[2024-10-03 11:11:54,715][02749] Decorrelating experience for 96 frames...
+[2024-10-03 11:11:55,014][02743] Decorrelating experience for 64 frames...
+[2024-10-03 11:11:55,397][02748] Decorrelating experience for 64 frames...
+[2024-10-03 11:11:56,554][00426] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-10-03 11:11:56,867][02742] Decorrelating experience for 64 frames...
+[2024-10-03 11:11:56,931][02746] Decorrelating experience for 96 frames...
+[2024-10-03 11:11:56,934][02745] Decorrelating experience for 96 frames...
+[2024-10-03 11:11:57,288][02747] Decorrelating experience for 32 frames...
+[2024-10-03 11:11:58,128][02743] Decorrelating experience for 96 frames...
+[2024-10-03 11:11:58,816][02742] Decorrelating experience for 96 frames...
+[2024-10-03 11:12:00,610][02748] Decorrelating experience for 96 frames...
+[2024-10-03 11:12:01,558][00426] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 99.8. Samples: 998. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-10-03 11:12:01,565][00426] Avg episode reward: [(0, '2.116')]
+[2024-10-03 11:12:02,430][02747] Decorrelating experience for 64 frames...
+[2024-10-03 11:12:03,622][02728] Signal inference workers to stop experience collection...
+[2024-10-03 11:12:03,633][02741] InferenceWorker_p0-w0: stopping experience collection
+[2024-10-03 11:12:03,896][02747] Decorrelating experience for 96 frames...
+[2024-10-03 11:12:06,554][00426] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 172.1. Samples: 2582. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-10-03 11:12:06,556][00426] Avg episode reward: [(0, '2.678')]
+[2024-10-03 11:12:06,853][02728] Signal inference workers to resume experience collection...
+[2024-10-03 11:12:06,855][02741] InferenceWorker_p0-w0: resuming experience collection
+[2024-10-03 11:12:11,554][00426] Fps is (10 sec: 2458.7, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 230.6. Samples: 4612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:12:11,559][00426] Avg episode reward: [(0, '3.617')]
+[2024-10-03 11:12:16,558][00426] Fps is (10 sec: 3685.0, 60 sec: 1474.3, 300 sec: 1474.3). Total num frames: 36864. Throughput: 0: 378.0. Samples: 9452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:12:16,563][00426] Avg episode reward: [(0, '3.792')]
+[2024-10-03 11:12:17,383][02741] Updated weights for policy 0, policy_version 10 (0.0179)
+[2024-10-03 11:12:21,554][00426] Fps is (10 sec: 2867.2, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 53248. Throughput: 0: 477.1. Samples: 14314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:12:21,561][00426] Avg episode reward: [(0, '4.352')]
+[2024-10-03 11:12:26,554][00426] Fps is (10 sec: 4097.6, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 500.6. Samples: 17522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:12:26,557][00426] Avg episode reward: [(0, '4.463')]
+[2024-10-03 11:12:27,338][02741] Updated weights for policy 0, policy_version 20 (0.0037)
+[2024-10-03 11:12:31,554][00426] Fps is (10 sec: 4096.0, 60 sec: 2355.2, 300 sec: 2355.2). Total num frames: 94208. Throughput: 0: 589.6. Samples: 23584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:12:31,557][00426] Avg episode reward: [(0, '4.346')]
+[2024-10-03 11:12:36,556][00426] Fps is (10 sec: 2866.7, 60 sec: 2366.5, 300 sec: 2366.5). Total num frames: 106496. Throughput: 0: 609.5. Samples: 27430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:12:36,558][00426] Avg episode reward: [(0, '4.251')]
+[2024-10-03 11:12:36,562][02728] Saving new best policy, reward=4.251!
+[2024-10-03 11:12:40,263][02741] Updated weights for policy 0, policy_version 30 (0.0049)
+[2024-10-03 11:12:41,554][00426] Fps is (10 sec: 3276.8, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 126976. Throughput: 0: 672.7. Samples: 30272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:12:41,557][00426] Avg episode reward: [(0, '4.344')]
+[2024-10-03 11:12:41,566][02728] Saving new best policy, reward=4.344!
+[2024-10-03 11:12:46,554][00426] Fps is (10 sec: 4096.8, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 147456. Throughput: 0: 792.6. Samples: 36662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:12:46,557][00426] Avg episode reward: [(0, '4.424')]
+[2024-10-03 11:12:46,559][02728] Saving new best policy, reward=4.424!
+[2024-10-03 11:12:51,382][02741] Updated weights for policy 0, policy_version 40 (0.0033)
+[2024-10-03 11:12:51,554][00426] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 163840. Throughput: 0: 857.6. Samples: 41176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-10-03 11:12:51,563][00426] Avg episode reward: [(0, '4.443')]
+[2024-10-03 11:12:51,577][02728] Saving new best policy, reward=4.443!
+[2024-10-03 11:12:56,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2772.7). Total num frames: 180224. Throughput: 0: 856.2. Samples: 43140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:12:56,562][00426] Avg episode reward: [(0, '4.426')]
+[2024-10-03 11:13:01,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3345.3, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 888.6. Samples: 49436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:13:01,560][00426] Avg episode reward: [(0, '4.489')]
+[2024-10-03 11:13:01,570][02728] Saving new best policy, reward=4.489!
+[2024-10-03 11:13:02,314][02741] Updated weights for policy 0, policy_version 50 (0.0017)
+[2024-10-03 11:13:06,555][00426] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 2894.5). Total num frames: 217088. Throughput: 0: 901.8. Samples: 54894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:13:06,562][00426] Avg episode reward: [(0, '4.525')]
+[2024-10-03 11:13:06,564][02728] Saving new best policy, reward=4.525!
+[2024-10-03 11:13:11,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 2867.2). Total num frames: 229376. Throughput: 0: 870.7. Samples: 56704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:13:11,562][00426] Avg episode reward: [(0, '4.499')]
+[2024-10-03 11:13:15,199][02741] Updated weights for policy 0, policy_version 60 (0.0039)
+[2024-10-03 11:13:16,554][00426] Fps is (10 sec: 3277.0, 60 sec: 3550.1, 300 sec: 2939.5). Total num frames: 249856. Throughput: 0: 851.3. Samples: 61894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:13:16,562][00426] Avg episode reward: [(0, '4.393')]
+[2024-10-03 11:13:21,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3003.7). Total num frames: 270336. Throughput: 0: 906.7. Samples: 68232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:13:21,557][00426] Avg episode reward: [(0, '4.367')]
+[2024-10-03 11:13:21,568][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth...
+[2024-10-03 11:13:26,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2975.0). Total num frames: 282624. Throughput: 0: 893.9. Samples: 70496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:13:26,560][00426] Avg episode reward: [(0, '4.429')]
+[2024-10-03 11:13:26,668][02741] Updated weights for policy 0, policy_version 70 (0.0025)
+[2024-10-03 11:13:31,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 2990.1). Total num frames: 299008. Throughput: 0: 841.8. Samples: 74544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:13:31,561][00426] Avg episode reward: [(0, '4.327')]
+[2024-10-03 11:13:36,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3081.8). Total num frames: 323584. Throughput: 0: 883.0. Samples: 80912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:13:36,563][00426] Avg episode reward: [(0, '4.467')]
+[2024-10-03 11:13:37,480][02741] Updated weights for policy 0, policy_version 80 (0.0038)
+[2024-10-03 11:13:41,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3090.6). Total num frames: 339968. Throughput: 0: 910.9. Samples: 84132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:13:41,561][00426] Avg episode reward: [(0, '4.451')]
+[2024-10-03 11:13:46,561][00426] Fps is (10 sec: 2865.3, 60 sec: 3412.9, 300 sec: 3062.9). Total num frames: 352256. Throughput: 0: 863.0. Samples: 88276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:13:46,566][00426] Avg episode reward: [(0, '4.275')]
+[2024-10-03 11:13:50,158][02741] Updated weights for policy 0, policy_version 90 (0.0048)
+[2024-10-03 11:13:51,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3106.1). Total num frames: 372736. Throughput: 0: 865.2. Samples: 93826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:13:51,557][00426] Avg episode reward: [(0, '4.376')]
+[2024-10-03 11:13:56,554][00426] Fps is (10 sec: 4098.8, 60 sec: 3549.9, 300 sec: 3145.7). Total num frames: 393216. Throughput: 0: 896.8. Samples: 97060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:13:56,560][00426] Avg episode reward: [(0, '4.622')]
+[2024-10-03 11:13:56,563][02728] Saving new best policy, reward=4.622!
+[2024-10-03 11:14:01,324][02741] Updated weights for policy 0, policy_version 100 (0.0020)
+[2024-10-03 11:14:01,560][00426] Fps is (10 sec: 3684.3, 60 sec: 3481.3, 300 sec: 3150.6). Total num frames: 409600. Throughput: 0: 895.8. Samples: 102208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:14:01,567][00426] Avg episode reward: [(0, '4.697')]
+[2024-10-03 11:14:01,578][02728] Saving new best policy, reward=4.697!
+[2024-10-03 11:14:06,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3125.1). Total num frames: 421888. Throughput: 0: 850.9. Samples: 106524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:14:06,561][00426] Avg episode reward: [(0, '4.498')]
+[2024-10-03 11:14:11,554][00426] Fps is (10 sec: 3688.5, 60 sec: 3618.1, 300 sec: 3189.0). Total num frames: 446464. Throughput: 0: 870.8. Samples: 109682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:14:11,557][00426] Avg episode reward: [(0, '4.278')]
+[2024-10-03 11:14:12,546][02741] Updated weights for policy 0, policy_version 110 (0.0034)
+[2024-10-03 11:14:16,555][00426] Fps is (10 sec: 4095.7, 60 sec: 3549.8, 300 sec: 3192.0). Total num frames: 462848. Throughput: 0: 924.1. Samples: 116128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:14:16,557][00426] Avg episode reward: [(0, '4.356')]
+[2024-10-03 11:14:21,557][00426] Fps is (10 sec: 2866.4, 60 sec: 3413.2, 300 sec: 3167.5). Total num frames: 475136. Throughput: 0: 870.2. Samples: 120074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:14:21,560][00426] Avg episode reward: [(0, '4.456')]
+[2024-10-03 11:14:25,025][02741] Updated weights for policy 0, policy_version 120 (0.0024)
+[2024-10-03 11:14:26,554][00426] Fps is (10 sec: 3277.0, 60 sec: 3549.9, 300 sec: 3197.5). Total num frames: 495616. Throughput: 0: 855.2. Samples: 122616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:14:26,558][00426] Avg episode reward: [(0, '4.430')]
+[2024-10-03 11:14:31,554][00426] Fps is (10 sec: 4506.9, 60 sec: 3686.4, 300 sec: 3251.2). Total num frames: 520192. Throughput: 0: 908.9. Samples: 129170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:14:31,561][00426] Avg episode reward: [(0, '4.555')]
+[2024-10-03 11:14:35,321][02741] Updated weights for policy 0, policy_version 130 (0.0024)
+[2024-10-03 11:14:36,554][00426] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3227.1). Total num frames: 532480. Throughput: 0: 895.1. Samples: 134104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:14:36,556][00426] Avg episode reward: [(0, '4.545')]
+[2024-10-03 11:14:41,555][00426] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3228.6). Total num frames: 548864. Throughput: 0: 866.5. Samples: 136052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:14:41,557][00426] Avg episode reward: [(0, '4.506')]
+[2024-10-03 11:14:46,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.5, 300 sec: 3253.4). Total num frames: 569344. Throughput: 0: 884.6. Samples: 142010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:14:46,562][00426] Avg episode reward: [(0, '4.402')]
+[2024-10-03 11:14:46,944][02741] Updated weights for policy 0, policy_version 140 (0.0030)
+[2024-10-03 11:14:51,554][00426] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 589824. Throughput: 0: 924.3. Samples: 148118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:14:51,559][00426] Avg episode reward: [(0, '4.329')]
+[2024-10-03 11:14:56,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3254.7). Total num frames: 602112. Throughput: 0: 897.0. Samples: 150048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:14:56,558][00426] Avg episode reward: [(0, '4.352')]
+[2024-10-03 11:14:59,413][02741] Updated weights for policy 0, policy_version 150 (0.0030)
+[2024-10-03 11:15:01,554][00426] Fps is (10 sec: 3276.7, 60 sec: 3550.2, 300 sec: 3276.8). Total num frames: 622592. Throughput: 0: 861.0. Samples: 154872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:15:01,561][00426] Avg episode reward: [(0, '4.289')]
+[2024-10-03 11:15:06,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3297.8). Total num frames: 643072. Throughput: 0: 912.9. Samples: 161150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:15:06,561][00426] Avg episode reward: [(0, '4.444')]
+[2024-10-03 11:15:09,844][02741] Updated weights for policy 0, policy_version 160 (0.0022)
+[2024-10-03 11:15:11,554][00426] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3297.3). Total num frames: 659456. Throughput: 0: 917.2. Samples: 163890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:15:11,563][00426] Avg episode reward: [(0, '4.555')]
+[2024-10-03 11:15:16,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 671744. Throughput: 0: 856.0. Samples: 167690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:15:16,561][00426] Avg episode reward: [(0, '4.519')]
+[2024-10-03 11:15:21,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3296.3). Total num frames: 692224. Throughput: 0: 885.0. Samples: 173930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:15:21,557][00426] Avg episode reward: [(0, '4.502')]
+[2024-10-03 11:15:21,571][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000169_692224.pth...
+[2024-10-03 11:15:21,848][02741] Updated weights for policy 0, policy_version 170 (0.0040)
+[2024-10-03 11:15:26,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3314.9). Total num frames: 712704. Throughput: 0: 912.1. Samples: 177096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:15:26,561][00426] Avg episode reward: [(0, '4.594')]
+[2024-10-03 11:15:31,556][00426] Fps is (10 sec: 3276.4, 60 sec: 3413.3, 300 sec: 3295.4). Total num frames: 724992. Throughput: 0: 880.6. Samples: 181638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:15:31,558][00426] Avg episode reward: [(0, '4.682')]
+[2024-10-03 11:15:34,384][02741] Updated weights for policy 0, policy_version 180 (0.0033)
+[2024-10-03 11:15:36,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3313.2). Total num frames: 745472. Throughput: 0: 857.9. Samples: 186722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:15:36,559][00426] Avg episode reward: [(0, '4.748')]
+[2024-10-03 11:15:36,564][02728] Saving new best policy, reward=4.748!
+[2024-10-03 11:15:41,554][00426] Fps is (10 sec: 4096.5, 60 sec: 3618.2, 300 sec: 3330.2). Total num frames: 765952. Throughput: 0: 882.5. Samples: 189760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:15:41,559][00426] Avg episode reward: [(0, '4.721')]
+[2024-10-03 11:15:44,500][02741] Updated weights for policy 0, policy_version 190 (0.0027)
+[2024-10-03 11:15:46,554][00426] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3329.1). Total num frames: 782336. Throughput: 0: 900.8. Samples: 195408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:15:46,559][00426] Avg episode reward: [(0, '4.663')]
+[2024-10-03 11:15:51,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3310.9). Total num frames: 794624. Throughput: 0: 848.9. Samples: 199352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:15:51,560][00426] Avg episode reward: [(0, '4.479')]
+[2024-10-03 11:15:56,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3327.0). Total num frames: 815104. Throughput: 0: 858.4. Samples: 202516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:15:56,563][00426] Avg episode reward: [(0, '4.320')]
+[2024-10-03 11:15:56,852][02741] Updated weights for policy 0, policy_version 200 (0.0041)
+[2024-10-03 11:16:01,556][00426] Fps is (10 sec: 4095.2, 60 sec: 3549.8, 300 sec: 3342.3). Total num frames: 835584. Throughput: 0: 916.8. Samples: 208948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:16:01,559][00426] Avg episode reward: [(0, '4.301')]
+[2024-10-03 11:16:06,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3325.0). Total num frames: 847872. Throughput: 0: 870.7. Samples: 213110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:16:06,562][00426] Avg episode reward: [(0, '4.452')]
+[2024-10-03 11:16:09,521][02741] Updated weights for policy 0, policy_version 210 (0.0029)
+[2024-10-03 11:16:11,554][00426] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3339.8). Total num frames: 868352. Throughput: 0: 845.5. Samples: 215144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:16:11,557][00426] Avg episode reward: [(0, '4.341')]
+[2024-10-03 11:16:16,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3354.1). Total num frames: 888832. Throughput: 0: 887.0. Samples: 221554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:16:16,557][00426] Avg episode reward: [(0, '4.183')]
+[2024-10-03 11:16:19,402][02741] Updated weights for policy 0, policy_version 220 (0.0040)
+[2024-10-03 11:16:21,555][00426] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3352.6). Total num frames: 905216. Throughput: 0: 895.5. Samples: 227022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:16:21,561][00426] Avg episode reward: [(0, '4.357')]
+[2024-10-03 11:16:26,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3336.4). Total num frames: 917504. Throughput: 0: 870.0. Samples: 228910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:16:26,557][00426] Avg episode reward: [(0, '4.445')]
+[2024-10-03 11:16:31,554][00426] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3349.9). Total num frames: 937984. Throughput: 0: 864.7. Samples: 234318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:16:31,559][00426] Avg episode reward: [(0, '4.508')]
+[2024-10-03 11:16:31,809][02741] Updated weights for policy 0, policy_version 230 (0.0047)
+[2024-10-03 11:16:36,555][00426] Fps is (10 sec: 4505.5, 60 sec: 3618.1, 300 sec: 3377.4). Total num frames: 962560. Throughput: 0: 920.6. Samples: 240778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:16:36,562][00426] Avg episode reward: [(0, '4.696')]
+[2024-10-03 11:16:41,555][00426] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3361.5). Total num frames: 974848. Throughput: 0: 895.8. Samples: 242828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:16:41,560][00426] Avg episode reward: [(0, '4.708')]
+[2024-10-03 11:16:44,307][02741] Updated weights for policy 0, policy_version 240 (0.0021)
+[2024-10-03 11:16:46,554][00426] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 848.2. Samples: 247114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:16:46,559][00426] Avg episode reward: [(0, '4.573')]
+[2024-10-03 11:16:51,554][00426] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3429.5). Total num frames: 1011712. Throughput: 0: 897.6. Samples: 253504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:16:51,560][00426] Avg episode reward: [(0, '4.374')]
+[2024-10-03 11:16:53,904][02741] Updated weights for policy 0, policy_version 250 (0.0033)
+[2024-10-03 11:16:56,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 923.7. Samples: 256710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:16:56,556][00426] Avg episode reward: [(0, '4.439')]
+[2024-10-03 11:17:01,555][00426] Fps is (10 sec: 2866.9, 60 sec: 3413.4, 300 sec: 3526.7). Total num frames: 1040384. Throughput: 0: 867.2. Samples: 260580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:17:01,560][00426] Avg episode reward: [(0, '4.622')]
+[2024-10-03 11:17:06,490][02741] Updated weights for policy 0, policy_version 260 (0.0026)
+[2024-10-03 11:17:06,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1064960. Throughput: 0: 878.2. Samples: 266542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:17:06,559][00426] Avg episode reward: [(0, '4.468')]
+[2024-10-03 11:17:11,554][00426] Fps is (10 sec: 4506.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1085440. Throughput: 0: 903.7. Samples: 269578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:17:11,557][00426] Avg episode reward: [(0, '4.286')]
+[2024-10-03 11:17:16,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1097728. Throughput: 0: 894.2. Samples: 274556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:17:16,560][00426] Avg episode reward: [(0, '4.435')]
+[2024-10-03 11:17:18,980][02741] Updated weights for policy 0, policy_version 270 (0.0022)
+[2024-10-03 11:17:21,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1114112. Throughput: 0: 856.1. Samples: 279302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:17:21,560][00426] Avg episode reward: [(0, '4.516')]
+[2024-10-03 11:17:21,572][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000272_1114112.pth...
+[2024-10-03 11:17:21,710][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth
+[2024-10-03 11:17:26,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1134592. Throughput: 0: 881.4. Samples: 282492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:17:26,559][00426] Avg episode reward: [(0, '4.777')]
+[2024-10-03 11:17:26,567][02728] Saving new best policy, reward=4.777!
+[2024-10-03 11:17:28,674][02741] Updated weights for policy 0, policy_version 280 (0.0035) +[2024-10-03 11:17:31,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1155072. Throughput: 0: 922.6. Samples: 288630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:17:31,559][00426] Avg episode reward: [(0, '4.650')] +[2024-10-03 11:17:36,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1167360. Throughput: 0: 868.0. Samples: 292564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-10-03 11:17:36,557][00426] Avg episode reward: [(0, '4.584')] +[2024-10-03 11:17:41,294][02741] Updated weights for policy 0, policy_version 290 (0.0019) +[2024-10-03 11:17:41,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1187840. Throughput: 0: 857.6. Samples: 295304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:17:41,562][00426] Avg episode reward: [(0, '4.547')] +[2024-10-03 11:17:46,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1208320. Throughput: 0: 918.5. Samples: 301912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:17:46,562][00426] Avg episode reward: [(0, '4.577')] +[2024-10-03 11:17:51,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1224704. Throughput: 0: 891.0. Samples: 306638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:17:51,561][00426] Avg episode reward: [(0, '4.653')] +[2024-10-03 11:17:52,927][02741] Updated weights for policy 0, policy_version 300 (0.0023) +[2024-10-03 11:17:56,555][00426] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 1241088. Throughput: 0: 870.0. Samples: 308730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:17:56,557][00426] Avg episode reward: [(0, '4.815')] +[2024-10-03 11:17:56,560][02728] Saving new best policy, reward=4.815! +[2024-10-03 11:18:01,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3540.6). Total num frames: 1261568. Throughput: 0: 900.9. Samples: 315096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:18:01,557][00426] Avg episode reward: [(0, '4.953')] +[2024-10-03 11:18:01,569][02728] Saving new best policy, reward=4.953! +[2024-10-03 11:18:02,947][02741] Updated weights for policy 0, policy_version 310 (0.0036) +[2024-10-03 11:18:06,554][00426] Fps is (10 sec: 3686.6, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1277952. Throughput: 0: 925.0. Samples: 320928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:18:06,558][00426] Avg episode reward: [(0, '5.166')] +[2024-10-03 11:18:06,605][02728] Saving new best policy, reward=5.166! +[2024-10-03 11:18:11,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1294336. Throughput: 0: 895.3. Samples: 322782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:18:11,562][00426] Avg episode reward: [(0, '5.208')] +[2024-10-03 11:18:11,583][02728] Saving new best policy, reward=5.208! +[2024-10-03 11:18:15,513][02741] Updated weights for policy 0, policy_version 320 (0.0036) +[2024-10-03 11:18:16,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1314816. Throughput: 0: 874.7. Samples: 327992. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-10-03 11:18:16,556][00426] Avg episode reward: [(0, '4.953')] +[2024-10-03 11:18:21,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1335296. Throughput: 0: 931.8. Samples: 334496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:18:21,557][00426] Avg episode reward: [(0, '4.695')] +[2024-10-03 11:18:26,531][02741] Updated weights for policy 0, policy_version 330 (0.0028) +[2024-10-03 11:18:26,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1351680. Throughput: 0: 925.3. Samples: 336942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:18:26,558][00426] Avg episode reward: [(0, '4.846')] +[2024-10-03 11:18:31,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1363968. Throughput: 0: 866.4. Samples: 340900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:18:31,557][00426] Avg episode reward: [(0, '4.975')] +[2024-10-03 11:18:36,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1384448. Throughput: 0: 903.0. Samples: 347272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:18:36,557][00426] Avg episode reward: [(0, '4.855')] +[2024-10-03 11:18:37,674][02741] Updated weights for policy 0, policy_version 340 (0.0019) +[2024-10-03 11:18:41,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.5). Total num frames: 1404928. Throughput: 0: 927.4. Samples: 350464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:18:41,559][00426] Avg episode reward: [(0, '5.172')] +[2024-10-03 11:18:46,557][00426] Fps is (10 sec: 3275.9, 60 sec: 3481.4, 300 sec: 3540.6). Total num frames: 1417216. Throughput: 0: 877.6. Samples: 354592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:18:46,561][00426] Avg episode reward: [(0, '5.128')] +[2024-10-03 11:18:50,061][02741] Updated weights for policy 0, policy_version 350 (0.0036) +[2024-10-03 11:18:51,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1437696. Throughput: 0: 870.8. Samples: 360116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:18:51,559][00426] Avg episode reward: [(0, '4.917')] +[2024-10-03 11:18:56,554][00426] Fps is (10 sec: 4097.1, 60 sec: 3618.2, 300 sec: 3554.6). Total num frames: 1458176. Throughput: 0: 900.6. Samples: 363308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-10-03 11:18:56,561][00426] Avg episode reward: [(0, '4.550')] +[2024-10-03 11:19:01,133][02741] Updated weights for policy 0, policy_version 360 (0.0028) +[2024-10-03 11:19:01,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1474560. Throughput: 0: 900.6. Samples: 368520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:19:01,556][00426] Avg episode reward: [(0, '4.623')] +[2024-10-03 11:19:06,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1490944. Throughput: 0: 856.1. Samples: 373020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:19:06,558][00426] Avg episode reward: [(0, '4.748')] +[2024-10-03 11:19:11,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1511424. Throughput: 0: 873.1. Samples: 376232. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:19:11,557][00426] Avg episode reward: [(0, '5.010')] +[2024-10-03 11:19:12,216][02741] Updated weights for policy 0, policy_version 370 (0.0022) +[2024-10-03 11:19:16,554][00426] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1527808. Throughput: 0: 921.2. Samples: 382352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:19:16,558][00426] Avg episode reward: [(0, '5.140')] +[2024-10-03 11:19:21,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 1544192. Throughput: 0: 867.1. Samples: 386292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-10-03 11:19:21,566][00426] Avg episode reward: [(0, '4.983')] +[2024-10-03 11:19:21,578][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000377_1544192.pth... +[2024-10-03 11:19:21,747][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000169_692224.pth +[2024-10-03 11:19:24,870][02741] Updated weights for policy 0, policy_version 380 (0.0041) +[2024-10-03 11:19:26,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1560576. Throughput: 0: 852.5. Samples: 388828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:19:26,556][00426] Avg episode reward: [(0, '4.992')] +[2024-10-03 11:19:31,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1585152. Throughput: 0: 903.0. Samples: 395226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-10-03 11:19:31,557][00426] Avg episode reward: [(0, '5.005')] +[2024-10-03 11:19:35,705][02741] Updated weights for policy 0, policy_version 390 (0.0031) +[2024-10-03 11:19:36,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1597440. Throughput: 0: 888.5. Samples: 400100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:19:36,561][00426] Avg episode reward: [(0, '5.061')] +[2024-10-03 11:19:41,554][00426] Fps is (10 sec: 2048.0, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1605632. Throughput: 0: 837.7. Samples: 401006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:19:41,560][00426] Avg episode reward: [(0, '5.130')] +[2024-10-03 11:19:46,554][00426] Fps is (10 sec: 2457.5, 60 sec: 3413.5, 300 sec: 3499.0). Total num frames: 1622016. Throughput: 0: 811.1. Samples: 405020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:19:46,557][00426] Avg episode reward: [(0, '5.454')] +[2024-10-03 11:19:46,560][02728] Saving new best policy, reward=5.454! +[2024-10-03 11:19:49,945][02741] Updated weights for policy 0, policy_version 400 (0.0033) +[2024-10-03 11:19:51,559][00426] Fps is (10 sec: 3684.6, 60 sec: 3413.1, 300 sec: 3526.7). Total num frames: 1642496. Throughput: 0: 844.5. Samples: 411028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:19:51,562][00426] Avg episode reward: [(0, '5.563')] +[2024-10-03 11:19:51,583][02728] Saving new best policy, reward=5.563! +[2024-10-03 11:19:56,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 1654784. Throughput: 0: 815.0. Samples: 412908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-10-03 11:19:56,560][00426] Avg episode reward: [(0, '5.921')] +[2024-10-03 11:19:56,563][02728] Saving new best policy, reward=5.921! +[2024-10-03 11:20:01,554][00426] Fps is (10 sec: 2868.6, 60 sec: 3276.8, 300 sec: 3485.1). Total num frames: 1671168. Throughput: 0: 785.5. 
Samples: 417700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:20:01,557][00426] Avg episode reward: [(0, '5.625')] +[2024-10-03 11:20:02,694][02741] Updated weights for policy 0, policy_version 410 (0.0026) +[2024-10-03 11:20:06,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1695744. Throughput: 0: 841.4. Samples: 424156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:20:06,560][00426] Avg episode reward: [(0, '5.133')] +[2024-10-03 11:20:11,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 1708032. Throughput: 0: 845.6. Samples: 426878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:20:11,561][00426] Avg episode reward: [(0, '5.333')] +[2024-10-03 11:20:14,933][02741] Updated weights for policy 0, policy_version 420 (0.0029) +[2024-10-03 11:20:16,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 1724416. Throughput: 0: 787.9. Samples: 430682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:20:16,561][00426] Avg episode reward: [(0, '5.444')] +[2024-10-03 11:20:21,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 1744896. Throughput: 0: 815.3. Samples: 436790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:20:21,557][00426] Avg episode reward: [(0, '5.583')] +[2024-10-03 11:20:25,018][02741] Updated weights for policy 0, policy_version 430 (0.0020) +[2024-10-03 11:20:26,559][00426] Fps is (10 sec: 4094.0, 60 sec: 3413.1, 300 sec: 3526.7). Total num frames: 1765376. Throughput: 0: 865.9. Samples: 439974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:20:26,562][00426] Avg episode reward: [(0, '5.671')] +[2024-10-03 11:20:31,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3499.0). Total num frames: 1777664. Throughput: 0: 880.1. Samples: 444626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:20:31,560][00426] Avg episode reward: [(0, '6.308')] +[2024-10-03 11:20:31,570][02728] Saving new best policy, reward=6.308! +[2024-10-03 11:20:36,554][00426] Fps is (10 sec: 2868.6, 60 sec: 3276.8, 300 sec: 3485.1). Total num frames: 1794048. Throughput: 0: 855.6. Samples: 449528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:20:36,559][00426] Avg episode reward: [(0, '6.485')] +[2024-10-03 11:20:36,566][02728] Saving new best policy, reward=6.485! +[2024-10-03 11:20:37,826][02741] Updated weights for policy 0, policy_version 440 (0.0024) +[2024-10-03 11:20:41,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1814528. Throughput: 0: 882.2. Samples: 452608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:20:41,565][00426] Avg episode reward: [(0, '6.482')] +[2024-10-03 11:20:46,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1830912. Throughput: 0: 900.6. Samples: 458226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:20:46,556][00426] Avg episode reward: [(0, '6.766')] +[2024-10-03 11:20:46,565][02728] Saving new best policy, reward=6.766! +[2024-10-03 11:20:50,801][02741] Updated weights for policy 0, policy_version 450 (0.0043) +[2024-10-03 11:20:51,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3345.3, 300 sec: 3485.1). Total num frames: 1843200. Throughput: 0: 830.7. Samples: 461538. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:20:51,557][00426] Avg episode reward: [(0, '6.895')] +[2024-10-03 11:20:51,567][02728] Saving new best policy, reward=6.895! +[2024-10-03 11:20:56,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1863680. Throughput: 0: 840.2. Samples: 464686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:20:56,563][00426] Avg episode reward: [(0, '7.555')] +[2024-10-03 11:20:56,566][02728] Saving new best policy, reward=7.555! +[2024-10-03 11:21:00,701][02741] Updated weights for policy 0, policy_version 460 (0.0054) +[2024-10-03 11:21:01,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1884160. Throughput: 0: 897.9. Samples: 471086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:21:01,557][00426] Avg episode reward: [(0, '7.359')] +[2024-10-03 11:21:06,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 1896448. Throughput: 0: 857.1. Samples: 475360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:21:06,560][00426] Avg episode reward: [(0, '7.410')] +[2024-10-03 11:21:11,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1916928. Throughput: 0: 832.0. Samples: 477410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:21:11,560][00426] Avg episode reward: [(0, '7.511')] +[2024-10-03 11:21:13,500][02741] Updated weights for policy 0, policy_version 470 (0.0033) +[2024-10-03 11:21:16,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1937408. Throughput: 0: 867.7. Samples: 483672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:21:16,561][00426] Avg episode reward: [(0, '7.691')] +[2024-10-03 11:21:16,564][02728] Saving new best policy, reward=7.691! +[2024-10-03 11:21:21,555][00426] Fps is (10 sec: 3686.1, 60 sec: 3481.5, 300 sec: 3512.8). Total num frames: 1953792. Throughput: 0: 878.5. Samples: 489062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:21:21,558][00426] Avg episode reward: [(0, '7.983')] +[2024-10-03 11:21:21,575][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000477_1953792.pth... +[2024-10-03 11:21:21,752][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000272_1114112.pth +[2024-10-03 11:21:21,776][02728] Saving new best policy, reward=7.983! +[2024-10-03 11:21:26,001][02741] Updated weights for policy 0, policy_version 480 (0.0022) +[2024-10-03 11:21:26,555][00426] Fps is (10 sec: 2867.1, 60 sec: 3345.3, 300 sec: 3485.1). Total num frames: 1966080. Throughput: 0: 850.5. Samples: 490882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:21:26,559][00426] Avg episode reward: [(0, '8.210')] +[2024-10-03 11:21:26,563][02728] Saving new best policy, reward=8.210! +[2024-10-03 11:21:31,554][00426] Fps is (10 sec: 3277.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1986560. Throughput: 0: 843.2. Samples: 496170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:21:31,559][00426] Avg episode reward: [(0, '8.211')] +[2024-10-03 11:21:35,964][02741] Updated weights for policy 0, policy_version 490 (0.0032) +[2024-10-03 11:21:36,554][00426] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2007040. Throughput: 0: 912.0. Samples: 502576. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:21:36,560][00426] Avg episode reward: [(0, '7.942')] +[2024-10-03 11:21:41,555][00426] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 2019328. Throughput: 0: 890.0. Samples: 504738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:21:41,560][00426] Avg episode reward: [(0, '7.934')] +[2024-10-03 11:21:46,554][00426] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2035712. Throughput: 0: 840.8. Samples: 508922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:21:46,564][00426] Avg episode reward: [(0, '8.253')] +[2024-10-03 11:21:46,607][02728] Saving new best policy, reward=8.253! +[2024-10-03 11:21:48,674][02741] Updated weights for policy 0, policy_version 500 (0.0044) +[2024-10-03 11:21:51,554][00426] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2060288. Throughput: 0: 887.1. Samples: 515278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:21:51,557][00426] Avg episode reward: [(0, '9.687')] +[2024-10-03 11:21:51,568][02728] Saving new best policy, reward=9.687! +[2024-10-03 11:21:56,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 2076672. Throughput: 0: 910.3. Samples: 518374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:21:56,560][00426] Avg episode reward: [(0, '9.484')] +[2024-10-03 11:22:00,556][02741] Updated weights for policy 0, policy_version 510 (0.0021) +[2024-10-03 11:22:01,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2088960. Throughput: 0: 861.5. Samples: 522440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-10-03 11:22:01,561][00426] Avg episode reward: [(0, '9.700')] +[2024-10-03 11:22:01,575][02728] Saving new best policy, reward=9.700! +[2024-10-03 11:22:06,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2109440. Throughput: 0: 864.8. Samples: 527976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-10-03 11:22:06,558][00426] Avg episode reward: [(0, '10.754')] +[2024-10-03 11:22:06,564][02728] Saving new best policy, reward=10.754! +[2024-10-03 11:22:10,946][02741] Updated weights for policy 0, policy_version 520 (0.0020) +[2024-10-03 11:22:11,554][00426] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2129920. Throughput: 0: 894.9. Samples: 531154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:22:11,560][00426] Avg episode reward: [(0, '10.454')] +[2024-10-03 11:22:16,559][00426] Fps is (10 sec: 3684.7, 60 sec: 3481.3, 300 sec: 3498.9). Total num frames: 2146304. Throughput: 0: 890.4. Samples: 536244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:22:16,563][00426] Avg episode reward: [(0, '11.380')] +[2024-10-03 11:22:16,569][02728] Saving new best policy, reward=11.380! +[2024-10-03 11:22:21,554][00426] Fps is (10 sec: 2867.3, 60 sec: 3413.4, 300 sec: 3471.2). Total num frames: 2158592. Throughput: 0: 842.7. Samples: 540498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:22:21,557][00426] Avg episode reward: [(0, '11.511')] +[2024-10-03 11:22:21,566][02728] Saving new best policy, reward=11.511! +[2024-10-03 11:22:23,912][02741] Updated weights for policy 0, policy_version 530 (0.0050) +[2024-10-03 11:22:26,554][00426] Fps is (10 sec: 3278.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2179072. Throughput: 0: 863.9. Samples: 543614. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:22:26,557][00426] Avg episode reward: [(0, '10.935')] +[2024-10-03 11:22:31,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2199552. Throughput: 0: 912.7. Samples: 549992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:22:31,561][00426] Avg episode reward: [(0, '11.075')] +[2024-10-03 11:22:35,580][02741] Updated weights for policy 0, policy_version 540 (0.0040) +[2024-10-03 11:22:36,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2211840. Throughput: 0: 857.7. Samples: 553876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:22:36,566][00426] Avg episode reward: [(0, '10.855')] +[2024-10-03 11:22:41,554][00426] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2232320. Throughput: 0: 844.2. Samples: 556362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-10-03 11:22:41,562][00426] Avg episode reward: [(0, '10.406')] +[2024-10-03 11:22:46,226][02741] Updated weights for policy 0, policy_version 550 (0.0026) +[2024-10-03 11:22:46,554][00426] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2252800. Throughput: 0: 894.6. Samples: 562698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:22:46,562][00426] Avg episode reward: [(0, '10.324')] +[2024-10-03 11:22:51,557][00426] Fps is (10 sec: 3685.5, 60 sec: 3481.4, 300 sec: 3485.0). Total num frames: 2269184. Throughput: 0: 882.4. Samples: 567688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:22:51,563][00426] Avg episode reward: [(0, '10.058')] +[2024-10-03 11:22:56,554][00426] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2281472. Throughput: 0: 855.1. Samples: 569634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:22:56,556][00426] Avg episode reward: [(0, '10.879')] +[2024-10-03 11:22:58,818][02741] Updated weights for policy 0, policy_version 560 (0.0048) +[2024-10-03 11:23:01,554][00426] Fps is (10 sec: 3277.7, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2301952. Throughput: 0: 870.8. Samples: 575424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:23:01,557][00426] Avg episode reward: [(0, '10.844')] +[2024-10-03 11:23:06,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2322432. Throughput: 0: 914.6. Samples: 581654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:06,557][00426] Avg episode reward: [(0, '11.004')] +[2024-10-03 11:23:10,203][02741] Updated weights for policy 0, policy_version 570 (0.0028) +[2024-10-03 11:23:11,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2334720. Throughput: 0: 886.2. Samples: 583494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:11,556][00426] Avg episode reward: [(0, '11.958')] +[2024-10-03 11:23:11,571][02728] Saving new best policy, reward=11.958! +[2024-10-03 11:23:16,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3457.3). Total num frames: 2355200. Throughput: 0: 844.8. Samples: 588006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:16,556][00426] Avg episode reward: [(0, '11.627')] +[2024-10-03 11:23:21,220][02741] Updated weights for policy 0, policy_version 580 (0.0030) +[2024-10-03 11:23:21,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3471.2). Total num frames: 2375680. Throughput: 0: 901.8. Samples: 594458. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:21,564][00426] Avg episode reward: [(0, '12.623')] +[2024-10-03 11:23:21,577][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000580_2375680.pth... +[2024-10-03 11:23:21,718][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000377_1544192.pth +[2024-10-03 11:23:21,737][02728] Saving new best policy, reward=12.623! +[2024-10-03 11:23:26,555][00426] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 2392064. Throughput: 0: 906.8. Samples: 597170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:26,561][00426] Avg episode reward: [(0, '12.556')] +[2024-10-03 11:23:31,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2404352. Throughput: 0: 851.1. Samples: 600998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:23:31,557][00426] Avg episode reward: [(0, '12.329')] +[2024-10-03 11:23:33,958][02741] Updated weights for policy 0, policy_version 590 (0.0045) +[2024-10-03 11:23:36,554][00426] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 2424832. Throughput: 0: 874.1. Samples: 607018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-10-03 11:23:36,562][00426] Avg episode reward: [(0, '12.423')] +[2024-10-03 11:23:41,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2445312. Throughput: 0: 900.9. Samples: 610174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-10-03 11:23:41,565][00426] Avg episode reward: [(0, '12.696')] +[2024-10-03 11:23:41,576][02728] Saving new best policy, reward=12.696! +[2024-10-03 11:23:45,578][02741] Updated weights for policy 0, policy_version 600 (0.0035) +[2024-10-03 11:23:46,555][00426] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2457600. Throughput: 0: 874.0. Samples: 614754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:46,559][00426] Avg episode reward: [(0, '13.346')] +[2024-10-03 11:23:46,571][02728] Saving new best policy, reward=13.346! +[2024-10-03 11:23:51,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3457.3). Total num frames: 2478080. Throughput: 0: 844.9. Samples: 619676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:51,556][00426] Avg episode reward: [(0, '13.907')] +[2024-10-03 11:23:51,569][02728] Saving new best policy, reward=13.907! +[2024-10-03 11:23:56,303][02741] Updated weights for policy 0, policy_version 610 (0.0015) +[2024-10-03 11:23:56,556][00426] Fps is (10 sec: 4095.4, 60 sec: 3618.0, 300 sec: 3471.2). Total num frames: 2498560. Throughput: 0: 874.5. Samples: 622850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-10-03 11:23:56,561][00426] Avg episode reward: [(0, '13.860')] +[2024-10-03 11:24:01,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2514944. Throughput: 0: 905.7. Samples: 628762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-10-03 11:24:01,559][00426] Avg episode reward: [(0, '13.990')] +[2024-10-03 11:24:01,572][02728] Saving new best policy, reward=13.990! +[2024-10-03 11:24:06,555][00426] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2527232. Throughput: 0: 848.9. Samples: 632658. 
+[2024-10-03 11:24:06,559][00426] Avg episode reward: [(0, '13.752')]
+[2024-10-03 11:24:08,837][02741] Updated weights for policy 0, policy_version 620 (0.0025)
+[2024-10-03 11:24:11,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 2547712. Throughput: 0: 860.5. Samples: 635894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:24:11,562][00426] Avg episode reward: [(0, '14.253')]
+[2024-10-03 11:24:11,609][02728] Saving new best policy, reward=14.253!
+[2024-10-03 11:24:16,555][00426] Fps is (10 sec: 4095.9, 60 sec: 3549.8, 300 sec: 3471.2). Total num frames: 2568192. Throughput: 0: 918.9. Samples: 642350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:24:16,559][00426] Avg episode reward: [(0, '15.397')]
+[2024-10-03 11:24:16,564][02728] Saving new best policy, reward=15.397!
+[2024-10-03 11:24:19,836][02741] Updated weights for policy 0, policy_version 630 (0.0030)
+[2024-10-03 11:24:21,558][00426] Fps is (10 sec: 3684.9, 60 sec: 3481.4, 300 sec: 3471.1). Total num frames: 2584576. Throughput: 0: 880.6. Samples: 646648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:24:21,561][00426] Avg episode reward: [(0, '15.703')]
+[2024-10-03 11:24:21,581][02728] Saving new best policy, reward=15.703!
+[2024-10-03 11:24:26,554][00426] Fps is (10 sec: 3277.1, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2600960. Throughput: 0: 856.3. Samples: 648706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:24:26,561][00426] Avg episode reward: [(0, '15.453')]
+[2024-10-03 11:24:30,808][02741] Updated weights for policy 0, policy_version 640 (0.0021)
+[2024-10-03 11:24:31,554][00426] Fps is (10 sec: 3687.9, 60 sec: 3618.1, 300 sec: 3471.2). Total num frames: 2621440. Throughput: 0: 900.1. Samples: 655258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:24:31,559][00426] Avg episode reward: [(0, '15.787')]
+[2024-10-03 11:24:31,569][02728] Saving new best policy, reward=15.787!
+[2024-10-03 11:24:36,558][00426] Fps is (10 sec: 3685.0, 60 sec: 3549.6, 300 sec: 3498.9). Total num frames: 2637824. Throughput: 0: 913.3. Samples: 660778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:24:36,561][00426] Avg episode reward: [(0, '16.689')]
+[2024-10-03 11:24:36,568][02728] Saving new best policy, reward=16.689!
+[2024-10-03 11:24:41,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2654208. Throughput: 0: 883.5. Samples: 662604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:24:41,559][00426] Avg episode reward: [(0, '16.304')]
+[2024-10-03 11:24:43,267][02741] Updated weights for policy 0, policy_version 650 (0.0019)
+[2024-10-03 11:24:46,554][00426] Fps is (10 sec: 3687.8, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2674688. Throughput: 0: 878.6. Samples: 668298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:24:46,561][00426] Avg episode reward: [(0, '16.762')]
+[2024-10-03 11:24:46,567][02728] Saving new best policy, reward=16.762!
+[2024-10-03 11:24:51,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2695168. Throughput: 0: 934.6. Samples: 674716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:24:51,557][00426] Avg episode reward: [(0, '18.225')]
+[2024-10-03 11:24:51,565][02728] Saving new best policy, reward=18.225!
+[2024-10-03 11:24:53,468][02741] Updated weights for policy 0, policy_version 660 (0.0041)
+[2024-10-03 11:24:56,558][00426] Fps is (10 sec: 3275.5, 60 sec: 3481.5, 300 sec: 3512.8). Total num frames: 2707456. Throughput: 0: 905.9. Samples: 676664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:24:56,560][00426] Avg episode reward: [(0, '18.599')]
+[2024-10-03 11:24:56,571][02728] Saving new best policy, reward=18.599!
+[2024-10-03 11:25:01,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2727936. Throughput: 0: 863.7. Samples: 681216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:25:01,557][00426] Avg episode reward: [(0, '17.567')]
+[2024-10-03 11:25:05,179][02741] Updated weights for policy 0, policy_version 670 (0.0040)
+[2024-10-03 11:25:06,555][00426] Fps is (10 sec: 4097.6, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 2748416. Throughput: 0: 914.4. Samples: 687794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:25:06,557][00426] Avg episode reward: [(0, '17.400')]
+[2024-10-03 11:25:11,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2764800. Throughput: 0: 936.8. Samples: 690862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-10-03 11:25:11,558][00426] Avg episode reward: [(0, '16.783')]
+[2024-10-03 11:25:16,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2781184. Throughput: 0: 878.0. Samples: 694768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:25:16,557][00426] Avg episode reward: [(0, '15.799')]
+[2024-10-03 11:25:17,550][02741] Updated weights for policy 0, policy_version 680 (0.0034)
+[2024-10-03 11:25:21,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3512.9). Total num frames: 2801664. Throughput: 0: 888.1. Samples: 700740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:25:21,562][00426] Avg episode reward: [(0, '15.249')]
+[2024-10-03 11:25:21,579][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000684_2801664.pth...
+[2024-10-03 11:25:21,737][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000477_1953792.pth
+[2024-10-03 11:25:26,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 2822144. Throughput: 0: 916.6. Samples: 703850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:25:26,557][00426] Avg episode reward: [(0, '16.879')]
+[2024-10-03 11:25:27,571][02741] Updated weights for policy 0, policy_version 690 (0.0027)
+[2024-10-03 11:25:31,555][00426] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 2834432. Throughput: 0: 895.0. Samples: 708574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-10-03 11:25:31,560][00426] Avg episode reward: [(0, '17.303')]
+[2024-10-03 11:25:36,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3550.1, 300 sec: 3512.8). Total num frames: 2850816. Throughput: 0: 864.7. Samples: 713626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:25:36,559][00426] Avg episode reward: [(0, '17.267')]
+[2024-10-03 11:25:39,563][02741] Updated weights for policy 0, policy_version 700 (0.0019)
+[2024-10-03 11:25:41,554][00426] Fps is (10 sec: 4096.4, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 2875392. Throughput: 0: 893.8. Samples: 716880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:25:41,558][00426] Avg episode reward: [(0, '18.231')]
+[2024-10-03 11:25:46,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2891776. Throughput: 0: 923.6. Samples: 722778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:25:46,557][00426] Avg episode reward: [(0, '17.848')]
+[2024-10-03 11:25:51,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2904064. Throughput: 0: 862.7. Samples: 726614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:25:51,558][00426] Avg episode reward: [(0, '18.622')]
+[2024-10-03 11:25:51,570][02728] Saving new best policy, reward=18.622!
+[2024-10-03 11:25:52,330][02741] Updated weights for policy 0, policy_version 710 (0.0036)
+[2024-10-03 11:25:56,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3526.7). Total num frames: 2924544. Throughput: 0: 859.4. Samples: 729536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:25:56,562][00426] Avg episode reward: [(0, '18.604')]
+[2024-10-03 11:26:01,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2945024. Throughput: 0: 913.7. Samples: 735886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:26:01,557][00426] Avg episode reward: [(0, '20.526')]
+[2024-10-03 11:26:01,569][02728] Saving new best policy, reward=20.526!
+[2024-10-03 11:26:02,262][02741] Updated weights for policy 0, policy_version 720 (0.0038)
+[2024-10-03 11:26:06,557][00426] Fps is (10 sec: 3275.8, 60 sec: 3481.4, 300 sec: 3526.7). Total num frames: 2957312. Throughput: 0: 881.1. Samples: 740394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:26:06,562][00426] Avg episode reward: [(0, '20.319')]
+[2024-10-03 11:26:11,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 2973696. Throughput: 0: 855.0. Samples: 742324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:26:11,557][00426] Avg episode reward: [(0, '20.740')]
+[2024-10-03 11:26:11,570][02728] Saving new best policy, reward=20.740!
+[2024-10-03 11:26:14,707][02741] Updated weights for policy 0, policy_version 730 (0.0018)
+[2024-10-03 11:26:16,554][00426] Fps is (10 sec: 3687.5, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2994176. Throughput: 0: 888.6. Samples: 748560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:26:16,557][00426] Avg episode reward: [(0, '22.541')]
+[2024-10-03 11:26:16,620][02728] Saving new best policy, reward=22.541!
+[2024-10-03 11:26:21,555][00426] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3540.6). Total num frames: 3010560. Throughput: 0: 895.6. Samples: 753928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:26:21,564][00426] Avg episode reward: [(0, '21.312')]
+[2024-10-03 11:26:26,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3026944. Throughput: 0: 865.1. Samples: 755808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:26:26,562][00426] Avg episode reward: [(0, '21.547')]
+[2024-10-03 11:26:27,720][02741] Updated weights for policy 0, policy_version 740 (0.0044)
+[2024-10-03 11:26:31,554][00426] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3047424. Throughput: 0: 849.8. Samples: 761018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:26:31,557][00426] Avg episode reward: [(0, '21.987')]
+[2024-10-03 11:26:36,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3067904. Throughput: 0: 908.4. Samples: 767490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:26:36,562][00426] Avg episode reward: [(0, '22.828')]
+[2024-10-03 11:26:36,567][02728] Saving new best policy, reward=22.828!
+[2024-10-03 11:26:37,082][02741] Updated weights for policy 0, policy_version 750 (0.0024)
+[2024-10-03 11:26:41,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 3080192. Throughput: 0: 896.8. Samples: 769894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:26:41,557][00426] Avg episode reward: [(0, '24.204')]
+[2024-10-03 11:26:41,637][02728] Saving new best policy, reward=24.204!
+[2024-10-03 11:26:46,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 3096576. Throughput: 0: 843.2. Samples: 773832. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-10-03 11:26:46,562][00426] Avg episode reward: [(0, '22.365')]
+[2024-10-03 11:26:49,784][02741] Updated weights for policy 0, policy_version 760 (0.0038)
+[2024-10-03 11:26:51,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3117056. Throughput: 0: 881.4. Samples: 780056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:26:51,560][00426] Avg episode reward: [(0, '22.493')]
+[2024-10-03 11:26:56,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3137536. Throughput: 0: 910.2. Samples: 783284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:26:56,559][00426] Avg episode reward: [(0, '21.575')]
+[2024-10-03 11:27:01,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3149824. Throughput: 0: 863.0. Samples: 787394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:27:01,556][00426] Avg episode reward: [(0, '21.250')]
+[2024-10-03 11:27:02,603][02741] Updated weights for policy 0, policy_version 770 (0.0030)
+[2024-10-03 11:27:06,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 3170304. Throughput: 0: 866.3. Samples: 792910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:27:06,561][00426] Avg episode reward: [(0, '20.860')]
+[2024-10-03 11:27:11,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.7). Total num frames: 3190784. Throughput: 0: 894.8. Samples: 796076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:27:11,560][00426] Avg episode reward: [(0, '19.887')]
+[2024-10-03 11:27:11,953][02741] Updated weights for policy 0, policy_version 780 (0.0028)
+[2024-10-03 11:27:16,554][00426] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3207168. Throughput: 0: 896.2. Samples: 801348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:27:16,561][00426] Avg episode reward: [(0, '20.686')]
+[2024-10-03 11:27:21,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 3219456. Throughput: 0: 847.0. Samples: 805606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:27:21,557][00426] Avg episode reward: [(0, '21.324')]
+[2024-10-03 11:27:21,571][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000786_3219456.pth...
+[2024-10-03 11:27:21,702][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000580_2375680.pth
+[2024-10-03 11:27:24,684][02741] Updated weights for policy 0, policy_version 790 (0.0027)
+[2024-10-03 11:27:26,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3239936. Throughput: 0: 863.6. Samples: 808756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:27:26,561][00426] Avg episode reward: [(0, '21.694')]
+[2024-10-03 11:27:31,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3260416. Throughput: 0: 919.1. Samples: 815190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:27:31,557][00426] Avg episode reward: [(0, '21.084')]
+[2024-10-03 11:27:36,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3272704. Throughput: 0: 866.2. Samples: 819036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:27:36,558][00426] Avg episode reward: [(0, '21.211')]
+[2024-10-03 11:27:37,170][02741] Updated weights for policy 0, policy_version 800 (0.0026)
+[2024-10-03 11:27:41,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3293184. Throughput: 0: 851.5. Samples: 821600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:27:41,559][00426] Avg episode reward: [(0, '20.836')]
+[2024-10-03 11:27:46,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3313664. Throughput: 0: 904.1. Samples: 828078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:27:46,561][00426] Avg episode reward: [(0, '18.894')]
+[2024-10-03 11:27:46,877][02741] Updated weights for policy 0, policy_version 810 (0.0029)
+[2024-10-03 11:27:51,555][00426] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3330048. Throughput: 0: 893.5. Samples: 833118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:27:51,557][00426] Avg episode reward: [(0, '18.402')]
+[2024-10-03 11:27:56,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3342336. Throughput: 0: 866.2. Samples: 835054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:27:56,562][00426] Avg episode reward: [(0, '17.940')]
+[2024-10-03 11:27:59,385][02741] Updated weights for policy 0, policy_version 820 (0.0034)
+[2024-10-03 11:28:01,554][00426] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3366912. Throughput: 0: 879.6. Samples: 840928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:28:01,559][00426] Avg episode reward: [(0, '19.156')]
+[2024-10-03 11:28:06,554][00426] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3387392. Throughput: 0: 924.8. Samples: 847220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:28:06,557][00426] Avg episode reward: [(0, '18.692')]
+[2024-10-03 11:28:10,930][02741] Updated weights for policy 0, policy_version 830 (0.0017)
+[2024-10-03 11:28:11,555][00426] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3399680. Throughput: 0: 898.6. Samples: 849194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:28:11,557][00426] Avg episode reward: [(0, '20.163')]
+[2024-10-03 11:28:16,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 3416064. Throughput: 0: 862.5. Samples: 854002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:28:16,556][00426] Avg episode reward: [(0, '21.444')]
+[2024-10-03 11:28:21,518][02741] Updated weights for policy 0, policy_version 840 (0.0026)
+[2024-10-03 11:28:21,554][00426] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 3440640. Throughput: 0: 918.5. Samples: 860370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:28:21,558][00426] Avg episode reward: [(0, '21.903')]
+[2024-10-03 11:28:26,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3457024. Throughput: 0: 925.5. Samples: 863246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:28:26,561][00426] Avg episode reward: [(0, '21.173')]
+[2024-10-03 11:28:31,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3469312. Throughput: 0: 867.9. Samples: 867132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:28:31,557][00426] Avg episode reward: [(0, '21.121')]
+[2024-10-03 11:28:33,949][02741] Updated weights for policy 0, policy_version 850 (0.0031)
+[2024-10-03 11:28:36,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3489792. Throughput: 0: 892.9. Samples: 873298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:28:36,561][00426] Avg episode reward: [(0, '19.744')]
+[2024-10-03 11:28:41,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3510272. Throughput: 0: 921.6. Samples: 876528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:28:41,561][00426] Avg episode reward: [(0, '20.272')]
+[2024-10-03 11:28:44,972][02741] Updated weights for policy 0, policy_version 860 (0.0027)
+[2024-10-03 11:28:46,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3526656. Throughput: 0: 894.4. Samples: 881174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:28:46,561][00426] Avg episode reward: [(0, '19.587')]
+[2024-10-03 11:28:51,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3543040. Throughput: 0: 863.0. Samples: 886054. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:28:51,561][00426] Avg episode reward: [(0, '20.282')]
+[2024-10-03 11:28:56,223][02741] Updated weights for policy 0, policy_version 870 (0.0033)
+[2024-10-03 11:28:56,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 3563520. Throughput: 0: 890.9. Samples: 889286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:28:56,556][00426] Avg episode reward: [(0, '20.205')]
+[2024-10-03 11:29:01,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3579904. Throughput: 0: 914.5. Samples: 895154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:29:01,559][00426] Avg episode reward: [(0, '20.820')]
+[2024-10-03 11:29:06,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 3592192. Throughput: 0: 859.7. Samples: 899056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:29:06,561][00426] Avg episode reward: [(0, '20.030')]
+[2024-10-03 11:29:08,612][02741] Updated weights for policy 0, policy_version 880 (0.0032)
+[2024-10-03 11:29:11,555][00426] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3616768. Throughput: 0: 868.7. Samples: 902336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:29:11,558][00426] Avg episode reward: [(0, '20.477')]
+[2024-10-03 11:29:16,554][00426] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3637248. Throughput: 0: 929.2. Samples: 908946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:29:16,557][00426] Avg episode reward: [(0, '18.994')]
+[2024-10-03 11:29:19,074][02741] Updated weights for policy 0, policy_version 890 (0.0036)
+[2024-10-03 11:29:21,557][00426] Fps is (10 sec: 3275.9, 60 sec: 3481.4, 300 sec: 3554.5). Total num frames: 3649536. Throughput: 0: 888.7. Samples: 913294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:29:21,560][00426] Avg episode reward: [(0, '19.360')]
+[2024-10-03 11:29:21,570][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000891_3649536.pth...
+[2024-10-03 11:29:21,744][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000684_2801664.pth
+[2024-10-03 11:29:26,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3670016. Throughput: 0: 866.4. Samples: 915514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:29:26,561][00426] Avg episode reward: [(0, '18.296')]
+[2024-10-03 11:29:30,247][02741] Updated weights for policy 0, policy_version 900 (0.0021)
+[2024-10-03 11:29:31,554][00426] Fps is (10 sec: 4097.2, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3690496. Throughput: 0: 908.7. Samples: 922066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:29:31,556][00426] Avg episode reward: [(0, '18.642')]
+[2024-10-03 11:29:36,559][00426] Fps is (10 sec: 3684.8, 60 sec: 3617.9, 300 sec: 3568.3). Total num frames: 3706880. Throughput: 0: 918.8. Samples: 927404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:29:36,561][00426] Avg episode reward: [(0, '19.664')]
+[2024-10-03 11:29:41,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3719168. Throughput: 0: 889.9. Samples: 929332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:29:41,556][00426] Avg episode reward: [(0, '20.124')]
+[2024-10-03 11:29:42,672][02741] Updated weights for policy 0, policy_version 910 (0.0051)
+[2024-10-03 11:29:46,554][00426] Fps is (10 sec: 3688.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3743744. Throughput: 0: 887.6. Samples: 935098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:29:46,560][00426] Avg episode reward: [(0, '22.237')]
+[2024-10-03 11:29:51,555][00426] Fps is (10 sec: 4505.3, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3764224. Throughput: 0: 947.6. Samples: 941698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:29:51,559][00426] Avg episode reward: [(0, '22.357')]
+[2024-10-03 11:29:52,552][02741] Updated weights for policy 0, policy_version 920 (0.0027)
+[2024-10-03 11:29:56,558][00426] Fps is (10 sec: 3275.5, 60 sec: 3549.6, 300 sec: 3554.5). Total num frames: 3776512. Throughput: 0: 919.2. Samples: 943702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:29:56,563][00426] Avg episode reward: [(0, '22.374')]
+[2024-10-03 11:30:01,554][00426] Fps is (10 sec: 2867.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3792896. Throughput: 0: 873.3. Samples: 948244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:30:01,559][00426] Avg episode reward: [(0, '21.979')]
+[2024-10-03 11:30:04,307][02741] Updated weights for policy 0, policy_version 930 (0.0021)
+[2024-10-03 11:30:06,554][00426] Fps is (10 sec: 4097.5, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 3817472. Throughput: 0: 924.5. Samples: 954894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:30:06,561][00426] Avg episode reward: [(0, '23.607')]
+[2024-10-03 11:30:11,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 3833856. Throughput: 0: 945.6. Samples: 958064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:30:11,557][00426] Avg episode reward: [(0, '22.510')]
+[2024-10-03 11:30:16,533][02741] Updated weights for policy 0, policy_version 940 (0.0027)
+[2024-10-03 11:30:16,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3850240. Throughput: 0: 888.1. Samples: 962030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:30:16,560][00426] Avg episode reward: [(0, '23.691')]
+[2024-10-03 11:30:21,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3554.5). Total num frames: 3870720. Throughput: 0: 905.6. Samples: 968154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:30:21,557][00426] Avg episode reward: [(0, '23.783')]
+[2024-10-03 11:30:26,063][02741] Updated weights for policy 0, policy_version 950 (0.0033)
+[2024-10-03 11:30:26,555][00426] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3891200. Throughput: 0: 931.8. Samples: 971264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:30:26,561][00426] Avg episode reward: [(0, '24.240')]
+[2024-10-03 11:30:26,565][02728] Saving new best policy, reward=24.240!
+[2024-10-03 11:30:31,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3903488. Throughput: 0: 911.1. Samples: 976098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:30:31,561][00426] Avg episode reward: [(0, '23.641')]
+[2024-10-03 11:30:36,554][00426] Fps is (10 sec: 3276.9, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 3923968. Throughput: 0: 876.0. Samples: 981116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:30:36,559][00426] Avg episode reward: [(0, '23.265')]
+[2024-10-03 11:30:38,271][02741] Updated weights for policy 0, policy_version 960 (0.0018)
+[2024-10-03 11:30:41,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 3944448. Throughput: 0: 904.5. Samples: 984402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:30:41,562][00426] Avg episode reward: [(0, '24.224')]
+[2024-10-03 11:30:46,557][00426] Fps is (10 sec: 3685.4, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 3960832. Throughput: 0: 936.3. Samples: 990378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:30:46,563][00426] Avg episode reward: [(0, '23.073')]
+[2024-10-03 11:30:50,263][02741] Updated weights for policy 0, policy_version 970 (0.0030)
+[2024-10-03 11:30:51,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 3973120. Throughput: 0: 879.7. Samples: 994482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:30:51,558][00426] Avg episode reward: [(0, '23.243')]
+[2024-10-03 11:30:56,560][00426] Fps is (10 sec: 3686.6, 60 sec: 3686.5, 300 sec: 3568.4). Total num frames: 3997696. Throughput: 0: 875.5. Samples: 997462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:30:56,562][00426] Avg episode reward: [(0, '23.533')]
+[2024-10-03 11:31:00,142][02741] Updated weights for policy 0, policy_version 980 (0.0038)
+[2024-10-03 11:31:01,554][00426] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3596.2). Total num frames: 4018176. Throughput: 0: 935.7. Samples: 1004136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:31:01,557][00426] Avg episode reward: [(0, '23.369')]
+[2024-10-03 11:31:06,554][00426] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4030464. Throughput: 0: 903.1. Samples: 1008792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:31:06,560][00426] Avg episode reward: [(0, '23.343')]
+[2024-10-03 11:31:11,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 4050944. Throughput: 0: 881.8. Samples: 1010946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:31:11,556][00426] Avg episode reward: [(0, '22.209')]
+[2024-10-03 11:31:12,156][02741] Updated weights for policy 0, policy_version 990 (0.0021)
+[2024-10-03 11:31:16,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 4071424. Throughput: 0: 921.9. Samples: 1017584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:31:16,556][00426] Avg episode reward: [(0, '21.712')]
+[2024-10-03 11:31:21,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 4087808. Throughput: 0: 938.3. Samples: 1023340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:31:21,557][00426] Avg episode reward: [(0, '21.927')]
+[2024-10-03 11:31:21,612][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000999_4091904.pth...
+[2024-10-03 11:31:21,769][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000786_3219456.pth
+[2024-10-03 11:31:23,185][02741] Updated weights for policy 0, policy_version 1000 (0.0029)
+[2024-10-03 11:31:26,554][00426] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4104192. Throughput: 0: 906.7. Samples: 1025202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:31:26,559][00426] Avg episode reward: [(0, '22.862')]
+[2024-10-03 11:31:31,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 4124672. Throughput: 0: 898.5. Samples: 1030810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:31:31,557][00426] Avg episode reward: [(0, '21.659')]
+[2024-10-03 11:31:33,784][02741] Updated weights for policy 0, policy_version 1010 (0.0023)
+[2024-10-03 11:31:36,554][00426] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 4149248. Throughput: 0: 956.1. Samples: 1037508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:31:36,557][00426] Avg episode reward: [(0, '22.714')]
+[2024-10-03 11:31:41,559][00426] Fps is (10 sec: 3684.6, 60 sec: 3617.8, 300 sec: 3610.0). Total num frames: 4161536. Throughput: 0: 937.8. Samples: 1039664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:31:41,564][00426] Avg episode reward: [(0, '23.251')]
+[2024-10-03 11:31:45,809][02741] Updated weights for policy 0, policy_version 1020 (0.0020)
+[2024-10-03 11:31:46,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3596.1). Total num frames: 4177920. Throughput: 0: 889.7. Samples: 1044174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:31:46,560][00426] Avg episode reward: [(0, '23.840')]
+[2024-10-03 11:31:51,554][00426] Fps is (10 sec: 4098.0, 60 sec: 3822.9, 300 sec: 3610.0). Total num frames: 4202496. Throughput: 0: 935.2. Samples: 1050874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:31:51,564][00426] Avg episode reward: [(0, '23.965')]
+[2024-10-03 11:31:56,002][02741] Updated weights for policy 0, policy_version 1030 (0.0031)
+[2024-10-03 11:31:56,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3623.9). Total num frames: 4218880. Throughput: 0: 958.5. Samples: 1054078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:31:56,557][00426] Avg episode reward: [(0, '23.854')]
+[2024-10-03 11:32:01,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 4231168. Throughput: 0: 900.8. Samples: 1058118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:32:01,557][00426] Avg episode reward: [(0, '25.329')]
+[2024-10-03 11:32:01,565][02728] Saving new best policy, reward=25.329!
+[2024-10-03 11:32:06,554][00426] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3610.0). Total num frames: 4255744. Throughput: 0: 905.6. Samples: 1064090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-10-03 11:32:06,557][00426] Avg episode reward: [(0, '23.480')]
+[2024-10-03 11:32:07,471][02741] Updated weights for policy 0, policy_version 1040 (0.0017)
+[2024-10-03 11:32:11,554][00426] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 4276224. Throughput: 0: 937.7. Samples: 1067398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:32:11,556][00426] Avg episode reward: [(0, '23.066')]
+[2024-10-03 11:32:16,556][00426] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3623.9). Total num frames: 4288512. Throughput: 0: 925.2. Samples: 1072446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:32:16,558][00426] Avg episode reward: [(0, '23.940')]
+[2024-10-03 11:32:19,643][02741] Updated weights for policy 0, policy_version 1050 (0.0035)
+[2024-10-03 11:32:21,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 4308992. Throughput: 0: 889.6. Samples: 1077542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:32:21,556][00426] Avg episode reward: [(0, '22.909')]
+[2024-10-03 11:32:26,554][00426] Fps is (10 sec: 4096.7, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 4329472. Throughput: 0: 915.6. Samples: 1080860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:32:26,557][00426] Avg episode reward: [(0, '21.949')]
+[2024-10-03 11:32:28,714][02741] Updated weights for policy 0, policy_version 1060 (0.0048)
+[2024-10-03 11:32:31,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 4345856. Throughput: 0: 949.5. Samples: 1086900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-10-03 11:32:31,557][00426] Avg episode reward: [(0, '22.560')]
+[2024-10-03 11:32:36,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 4362240. Throughput: 0: 892.0. Samples: 1091016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:32:36,561][00426] Avg episode reward: [(0, '23.510')]
+[2024-10-03 11:32:40,922][02741] Updated weights for policy 0, policy_version 1070 (0.0026)
+[2024-10-03 11:32:41,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3623.9). Total num frames: 4382720. Throughput: 0: 892.5. Samples: 1094240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:32:41,561][00426] Avg episode reward: [(0, '22.491')]
+[2024-10-03 11:32:46,554][00426] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 4407296. Throughput: 0: 953.2. Samples: 1101010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:32:46,558][00426] Avg episode reward: [(0, '22.750')]
+[2024-10-03 11:32:51,556][00426] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3651.7). Total num frames: 4419584. Throughput: 0: 921.7. Samples: 1105566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:32:51,558][00426] Avg episode reward: [(0, '22.703')]
+[2024-10-03 11:32:52,476][02741] Updated weights for policy 0, policy_version 1080 (0.0052)
+[2024-10-03 11:32:56,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 4435968. Throughput: 0: 898.4. Samples: 1107826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:32:56,561][00426] Avg episode reward: [(0, '22.862')]
+[2024-10-03 11:33:01,554][00426] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 4460544. Throughput: 0: 930.7. Samples: 1114326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:33:01,556][00426] Avg episode reward: [(0, '22.523')]
+[2024-10-03 11:33:02,225][02741] Updated weights for policy 0, policy_version 1090 (0.0027)
+[2024-10-03 11:33:06,556][00426] Fps is (10 sec: 4095.2, 60 sec: 3686.3, 300 sec: 3651.7). Total num frames: 4476928. Throughput: 0: 944.7. Samples: 1120056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:33:06,561][00426] Avg episode reward: [(0, '23.853')]
+[2024-10-03 11:33:11,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 4493312. Throughput: 0: 914.4. Samples: 1122006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:33:11,558][00426] Avg episode reward: [(0, '25.096')]
+[2024-10-03 11:33:14,377][02741] Updated weights for policy 0, policy_version 1100 (0.0044)
+[2024-10-03 11:33:16,554][00426] Fps is (10 sec: 3687.1, 60 sec: 3754.8, 300 sec: 3637.8). Total num frames: 4513792. Throughput: 0: 912.8. Samples: 1127974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:33:16,561][00426] Avg episode reward: [(0, '24.346')]
+[2024-10-03 11:33:21,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 4534272. Throughput: 0: 970.8. Samples: 1134700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:33:21,560][00426] Avg episode reward: [(0, '24.859')]
+[2024-10-03 11:33:21,571][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001107_4534272.pth...
+[2024-10-03 11:33:21,776][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000891_3649536.pth
+[2024-10-03 11:33:24,936][02741] Updated weights for policy 0, policy_version 1110 (0.0027)
+[2024-10-03 11:33:26,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 4546560. Throughput: 0: 941.4. Samples: 1136602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:33:26,561][00426] Avg episode reward: [(0, '25.543')]
+[2024-10-03 11:33:26,604][02728] Saving new best policy, reward=25.543!
+[2024-10-03 11:33:31,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 4567040. Throughput: 0: 894.1. Samples: 1141246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:33:31,556][00426] Avg episode reward: [(0, '25.742')]
+[2024-10-03 11:33:31,571][02728] Saving new best policy, reward=25.742!
+[2024-10-03 11:33:35,694][02741] Updated weights for policy 0, policy_version 1120 (0.0036)
+[2024-10-03 11:33:36,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 4587520. Throughput: 0: 940.4. Samples: 1147882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:33:36,561][00426] Avg episode reward: [(0, '23.445')]
+[2024-10-03 11:33:41,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 4608000. Throughput: 0: 958.0. Samples: 1150936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:33:41,557][00426] Avg episode reward: [(0, '24.795')]
+[2024-10-03 11:33:46,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 4620288. Throughput: 0: 904.4. Samples: 1155022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:33:46,560][00426] Avg episode reward: [(0, '24.380')]
+[2024-10-03 11:33:47,838][02741] Updated weights for policy 0, policy_version 1130 (0.0014)
+[2024-10-03 11:33:51,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3665.6). Total num frames: 4644864. Throughput: 0: 917.3. Samples: 1161334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:33:51,561][00426] Avg episode reward: [(0, '24.464')]
+[2024-10-03 11:33:56,554][00426] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 4665344. Throughput: 0: 948.9. Samples: 1164708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:33:56,566][00426] Avg episode reward: [(0, '22.315')]
+[2024-10-03 11:33:57,758][02741] Updated weights for policy 0, policy_version 1140 (0.0033)
+[2024-10-03 11:34:01,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 4677632. Throughput: 0: 921.1. Samples: 1169422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:34:01,560][00426] Avg episode reward: [(0, '23.618')]
+[2024-10-03 11:34:06,554][00426] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3651.7). Total num frames: 4694016. Throughput: 0: 887.8. Samples: 1174652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:34:06,557][00426] Avg episode reward: [(0, '23.495')]
+[2024-10-03 11:34:09,208][02741] Updated weights for policy 0, policy_version 1150 (0.0026)
+[2024-10-03 11:34:11,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 4718592. Throughput: 0: 921.0. Samples: 1178046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:34:11,557][00426] Avg episode reward: [(0, '23.352')]
+[2024-10-03 11:34:16,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 4734976. Throughput: 0: 952.0. Samples: 1184084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:34:16,557][00426] Avg episode reward: [(0, '24.382')]
+[2024-10-03 11:34:21,255][02741] Updated weights for policy 0, policy_version 1160 (0.0020)
+[2024-10-03 11:34:21,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 4751360. Throughput: 0: 897.0. Samples: 1188246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:34:21,557][00426] Avg episode reward: [(0, '23.850')]
+[2024-10-03 11:34:26,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 4771840. Throughput: 0: 901.8. Samples: 1191518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:34:26,557][00426] Avg episode reward: [(0, '24.794')]
+[2024-10-03 11:34:30,623][02741] Updated weights for policy 0, policy_version 1170 (0.0023)
+[2024-10-03 11:34:31,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 4792320. Throughput: 0: 957.8. Samples: 1198122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:34:31,562][00426] Avg episode reward: [(0, '24.829')]
+[2024-10-03 11:34:36,555][00426] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 4804608. Throughput: 0: 914.5. Samples: 1202486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:34:36,556][00426] Avg episode reward: [(0, '24.389')]
+[2024-10-03 11:34:41,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 4825088. Throughput: 0: 889.6. Samples: 1204740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:34:41,557][00426] Avg episode reward: [(0, '23.711')]
+[2024-10-03 11:34:42,816][02741] Updated weights for policy 0, policy_version 1180 (0.0026)
+[2024-10-03 11:34:46,554][00426] Fps is (10 sec: 4505.8, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 4849664. Throughput: 0: 934.9. Samples: 1211494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:34:46,560][00426] Avg episode reward: [(0, '24.340')]
+[2024-10-03 11:34:51,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 4866048. Throughput: 0: 945.6. Samples: 1217204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:34:51,561][00426] Avg episode reward: [(0, '23.273')]
+[2024-10-03 11:34:54,016][02741] Updated weights for policy 0, policy_version 1190 (0.0044)
+[2024-10-03 11:34:56,554][00426] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 4878336. Throughput: 0: 914.8. Samples: 1219214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:34:56,557][00426] Avg episode reward: [(0, '23.695')]
+[2024-10-03 11:35:01,554][00426] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 4902912. Throughput: 0: 909.7. Samples: 1225022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:35:01,560][00426] Avg episode reward: [(0, '22.857')]
+[2024-10-03 11:35:04,048][02741] Updated weights for policy 0, policy_version 1200 (0.0024)
+[2024-10-03 11:35:06,554][00426] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 4923392. Throughput: 0: 965.5. Samples: 1231692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-10-03 11:35:06,561][00426] Avg episode reward: [(0, '23.572')]
+[2024-10-03 11:35:11,554][00426] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 4935680. Throughput: 0: 937.8. Samples: 1233718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:35:11,562][00426] Avg episode reward: [(0, '23.246')]
+[2024-10-03 11:35:16,059][02741] Updated weights for policy 0, policy_version 1210 (0.0015)
+[2024-10-03 11:35:16,555][00426] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 4956160. Throughput: 0: 898.1. Samples: 1238538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-10-03 11:35:16,559][00426] Avg episode reward: [(0, '23.124')]
+[2024-10-03 11:35:21,554][00426] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 4976640. Throughput: 0: 950.1. Samples: 1245242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-10-03 11:35:21,561][00426] Avg episode reward: [(0, '24.660')]
+[2024-10-03 11:35:21,594][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001216_4980736.pth...
+[2024-10-03 11:35:21,725][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000999_4091904.pth
+[2024-10-03 11:35:26,556][00426] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3693.3). Total num frames: 4993024. Throughput: 0: 964.9. Samples: 1248162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-10-03 11:35:26,559][00426] Avg episode reward: [(0, '24.000')]
+[2024-10-03 11:35:26,684][02741] Updated weights for policy 0, policy_version 1220 (0.0036)
+[2024-10-03 11:35:30,084][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
+[2024-10-03 11:35:30,096][02728] Stopping Batcher_0...
+[2024-10-03 11:35:30,097][02728] Loop batcher_evt_loop terminating...
+[2024-10-03 11:35:30,099][00426] Component Batcher_0 stopped!
+[2024-10-03 11:35:30,210][02741] Weights refcount: 2 0
+[2024-10-03 11:35:30,219][00426] Component InferenceWorker_p0-w0 stopped!
+[2024-10-03 11:35:30,225][02741] Stopping InferenceWorker_p0-w0...
+[2024-10-03 11:35:30,225][02741] Loop inference_proc0-0_evt_loop terminating...
+[2024-10-03 11:35:30,283][02728] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001107_4534272.pth
+[2024-10-03 11:35:30,311][02728] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
+[2024-10-03 11:35:30,587][00426] Component LearnerWorker_p0 stopped!
+[2024-10-03 11:35:30,589][02728] Stopping LearnerWorker_p0...
+[2024-10-03 11:35:30,590][02728] Loop learner_proc0_evt_loop terminating...
+[2024-10-03 11:35:30,853][02743] Stopping RolloutWorker_w1...
+[2024-10-03 11:35:30,854][02743] Loop rollout_proc1_evt_loop terminating...
+[2024-10-03 11:35:30,848][00426] Component RolloutWorker_w1 stopped!
+[2024-10-03 11:35:30,864][00426] Component RolloutWorker_w7 stopped!
+[2024-10-03 11:35:30,867][02747] Stopping RolloutWorker_w5...
+[2024-10-03 11:35:30,873][02747] Loop rollout_proc5_evt_loop terminating...
+[2024-10-03 11:35:30,869][00426] Component RolloutWorker_w5 stopped!
+[2024-10-03 11:35:30,864][02748] Stopping RolloutWorker_w7...
+[2024-10-03 11:35:30,885][02748] Loop rollout_proc7_evt_loop terminating...
+[2024-10-03 11:35:30,894][02744] Stopping RolloutWorker_w3...
+[2024-10-03 11:35:30,896][02744] Loop rollout_proc3_evt_loop terminating...
+[2024-10-03 11:35:30,896][00426] Component RolloutWorker_w3 stopped!
+[2024-10-03 11:35:30,966][00426] Component RolloutWorker_w4 stopped!
+[2024-10-03 11:35:30,968][02746] Stopping RolloutWorker_w4...
+[2024-10-03 11:35:30,979][00426] Component RolloutWorker_w2 stopped!
+[2024-10-03 11:35:30,984][02745] Stopping RolloutWorker_w2...
+[2024-10-03 11:35:30,984][02745] Loop rollout_proc2_evt_loop terminating...
+[2024-10-03 11:35:30,988][02746] Loop rollout_proc4_evt_loop terminating...
+[2024-10-03 11:35:31,075][00426] Component RolloutWorker_w6 stopped!
+[2024-10-03 11:35:31,077][02749] Stopping RolloutWorker_w6...
+[2024-10-03 11:35:31,088][02749] Loop rollout_proc6_evt_loop terminating...
+[2024-10-03 11:35:31,128][00426] Component RolloutWorker_w0 stopped!
+[2024-10-03 11:35:31,133][00426] Waiting for process learner_proc0 to stop...
+[2024-10-03 11:35:31,139][02742] Stopping RolloutWorker_w0...
+[2024-10-03 11:35:31,140][02742] Loop rollout_proc0_evt_loop terminating...
+[2024-10-03 11:35:32,418][00426] Waiting for process inference_proc0-0 to join...
+[2024-10-03 11:35:32,425][00426] Waiting for process rollout_proc0 to join...
+[2024-10-03 11:35:34,587][00426] Waiting for process rollout_proc1 to join...
+[2024-10-03 11:35:34,593][00426] Waiting for process rollout_proc2 to join...
+[2024-10-03 11:35:34,596][00426] Waiting for process rollout_proc3 to join...
+[2024-10-03 11:35:34,599][00426] Waiting for process rollout_proc4 to join...
+[2024-10-03 11:35:34,604][00426] Waiting for process rollout_proc5 to join...
+[2024-10-03 11:35:34,608][00426] Waiting for process rollout_proc6 to join...
+[2024-10-03 11:35:34,612][00426] Waiting for process rollout_proc7 to join...
+[2024-10-03 11:35:34,615][00426] Batcher 0 profile tree view:
+batching: 35.2146, releasing_batches: 0.0375
+[2024-10-03 11:35:34,619][00426] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 538.8484
+update_model: 12.1759
+  weight_update: 0.0022
+one_step: 0.0156
+  handle_policy_step: 808.9313
+    deserialize: 19.9541, stack: 4.2855, obs_to_device_normalize: 160.2709, forward: 431.4136, send_messages: 41.1244
+    prepare_outputs: 112.0741
+      to_cpu: 63.3618
+[2024-10-03 11:35:34,622][00426] Learner 0 profile tree view:
+misc: 0.0064, prepare_batch: 16.4991
+train: 93.6749
+  epoch_init: 0.0244, minibatch_init: 0.0133, losses_postprocess: 0.7770, kl_divergence: 0.9469, after_optimizer: 42.9201
+  calculate_losses: 33.2916
+    losses_init: 0.0049, forward_head: 1.6026, bptt_initial: 22.0421, tail: 1.4390, advantages_returns: 0.3449, losses: 4.8251
+    bptt: 2.6393
+      bptt_forward_core: 2.4985
+  update: 14.8179
+    clip: 1.1531
+[2024-10-03 11:35:34,624][00426] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.4399, enqueue_policy_requests: 139.8889, env_step: 1098.1560, overhead: 19.4920, complete_rollouts: 9.7938
+save_policy_outputs: 29.7660
+  split_output_tensors: 11.8874
+[2024-10-03 11:35:34,627][00426] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.4412, enqueue_policy_requests: 140.6351, env_step: 1098.1984, overhead: 19.1132, complete_rollouts: 8.4768
+save_policy_outputs: 29.0579
+  split_output_tensors: 11.3268
+[2024-10-03 11:35:34,628][00426] Loop Runner_EvtLoop terminating...
+[2024-10-03 11:35:34,629][00426] Runner profile tree view:
+main_loop: 1447.4305
+[2024-10-03 11:35:34,630][00426] Collected {0: 5005312}, FPS: 3458.1
+[2024-10-03 11:35:55,648][00426] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-10-03 11:35:55,650][00426] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-10-03 11:35:55,653][00426] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-10-03 11:35:55,655][00426] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-10-03 11:35:55,657][00426] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-10-03 11:35:55,659][00426] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-10-03 11:35:55,662][00426] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-10-03 11:35:55,663][00426] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-10-03 11:35:55,664][00426] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-10-03 11:35:55,665][00426] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-10-03 11:35:55,666][00426] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-10-03 11:35:55,667][00426] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-10-03 11:35:55,668][00426] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-10-03 11:35:55,669][00426] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-10-03 11:35:55,670][00426] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-10-03 11:35:55,708][00426] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-10-03 11:35:55,712][00426] RunningMeanStd input shape: (3, 72, 128)
+[2024-10-03 11:35:55,714][00426] RunningMeanStd input shape: (1,)
+[2024-10-03 11:35:55,731][00426] ConvEncoder: input_channels=3
+[2024-10-03 11:35:55,855][00426] Conv encoder output size: 512
+[2024-10-03 11:35:55,858][00426] Policy head output size: 512
+[2024-10-03 11:35:56,159][00426] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
+[2024-10-03 11:35:57,029][00426] Num frames 100...
+[2024-10-03 11:35:57,173][00426] Num frames 200...
+[2024-10-03 11:35:57,306][00426] Num frames 300...
+[2024-10-03 11:35:57,435][00426] Num frames 400...
+[2024-10-03 11:35:57,569][00426] Num frames 500...
+[2024-10-03 11:35:57,705][00426] Num frames 600...
+[2024-10-03 11:35:57,834][00426] Num frames 700...
+[2024-10-03 11:35:57,972][00426] Num frames 800...
+[2024-10-03 11:35:58,145][00426] Num frames 900...
+[2024-10-03 11:35:58,278][00426] Num frames 1000...
+[2024-10-03 11:35:58,409][00426] Num frames 1100...
+[2024-10-03 11:35:58,545][00426] Num frames 1200...
+[2024-10-03 11:35:58,675][00426] Num frames 1300...
+[2024-10-03 11:35:58,809][00426] Num frames 1400...
+[2024-10-03 11:35:58,951][00426] Num frames 1500...
+[2024-10-03 11:35:59,085][00426] Num frames 1600...
+[2024-10-03 11:35:59,226][00426] Num frames 1700...
+[2024-10-03 11:35:59,357][00426] Num frames 1800...
+[2024-10-03 11:35:59,489][00426] Num frames 1900...
+[2024-10-03 11:35:59,622][00426] Num frames 2000...
+[2024-10-03 11:35:59,762][00426] Num frames 2100...
+[2024-10-03 11:35:59,814][00426] Avg episode rewards: #0: 56.999, true rewards: #0: 21.000
+[2024-10-03 11:35:59,817][00426] Avg episode reward: 56.999, avg true_objective: 21.000
+[2024-10-03 11:35:59,952][00426] Num frames 2200...
+[2024-10-03 11:36:00,083][00426] Num frames 2300...
+[2024-10-03 11:36:00,221][00426] Num frames 2400...
+[2024-10-03 11:36:00,357][00426] Num frames 2500...
+[2024-10-03 11:36:00,489][00426] Num frames 2600...
+[2024-10-03 11:36:00,621][00426] Num frames 2700...
+[2024-10-03 11:36:00,751][00426] Num frames 2800...
+[2024-10-03 11:36:00,885][00426] Num frames 2900...
+[2024-10-03 11:36:01,022][00426] Num frames 3000...
+[2024-10-03 11:36:01,159][00426] Avg episode rewards: #0: 39.799, true rewards: #0: 15.300
+[2024-10-03 11:36:01,161][00426] Avg episode reward: 39.799, avg true_objective: 15.300
+[2024-10-03 11:36:01,223][00426] Num frames 3100...
+[2024-10-03 11:36:01,352][00426] Num frames 3200...
+[2024-10-03 11:36:01,483][00426] Num frames 3300...
+[2024-10-03 11:36:01,614][00426] Num frames 3400...
+[2024-10-03 11:36:01,743][00426] Num frames 3500...
+[2024-10-03 11:36:01,874][00426] Num frames 3600...
+[2024-10-03 11:36:02,003][00426] Num frames 3700...
+[2024-10-03 11:36:02,133][00426] Num frames 3800...
+[2024-10-03 11:36:02,278][00426] Num frames 3900...
+[2024-10-03 11:36:02,411][00426] Num frames 4000...
+[2024-10-03 11:36:02,542][00426] Num frames 4100...
+[2024-10-03 11:36:02,675][00426] Num frames 4200...
+[2024-10-03 11:36:02,814][00426] Num frames 4300...
+[2024-10-03 11:36:02,952][00426] Num frames 4400...
+[2024-10-03 11:36:03,096][00426] Avg episode rewards: #0: 37.226, true rewards: #0: 14.893
+[2024-10-03 11:36:03,098][00426] Avg episode reward: 37.226, avg true_objective: 14.893
+[2024-10-03 11:36:03,143][00426] Num frames 4500...
+[2024-10-03 11:36:03,280][00426] Num frames 4600...
+[2024-10-03 11:36:03,410][00426] Num frames 4700...
+[2024-10-03 11:36:03,544][00426] Num frames 4800...
+[2024-10-03 11:36:03,678][00426] Num frames 4900...
+[2024-10-03 11:36:03,814][00426] Num frames 5000...
+[2024-10-03 11:36:03,952][00426] Num frames 5100...
+[2024-10-03 11:36:04,082][00426] Num frames 5200...
+[2024-10-03 11:36:04,211][00426] Num frames 5300...
+[2024-10-03 11:36:04,357][00426] Num frames 5400...
+[2024-10-03 11:36:04,542][00426] Num frames 5500...
+[2024-10-03 11:36:04,718][00426] Num frames 5600...
+[2024-10-03 11:36:04,906][00426] Num frames 5700...
+[2024-10-03 11:36:05,083][00426] Num frames 5800...
+[2024-10-03 11:36:05,263][00426] Num frames 5900...
+[2024-10-03 11:36:05,454][00426] Num frames 6000...
+[2024-10-03 11:36:05,627][00426] Num frames 6100...
+[2024-10-03 11:36:05,821][00426] Num frames 6200...
+[2024-10-03 11:36:05,913][00426] Avg episode rewards: #0: 39.780, true rewards: #0: 15.530
+[2024-10-03 11:36:05,916][00426] Avg episode reward: 39.780, avg true_objective: 15.530
+[2024-10-03 11:36:06,079][00426] Num frames 6300...
+[2024-10-03 11:36:06,272][00426] Num frames 6400...
+[2024-10-03 11:36:06,486][00426] Num frames 6500...
+[2024-10-03 11:36:06,601][00426] Avg episode rewards: #0: 32.858, true rewards: #0: 13.058
+[2024-10-03 11:36:06,603][00426] Avg episode reward: 32.858, avg true_objective: 13.058
+[2024-10-03 11:36:06,749][00426] Num frames 6600...
+[2024-10-03 11:36:06,958][00426] Num frames 6700...
+[2024-10-03 11:36:07,100][00426] Num frames 6800...
+[2024-10-03 11:36:07,229][00426] Num frames 6900...
+[2024-10-03 11:36:07,358][00426] Num frames 7000...
+[2024-10-03 11:36:07,504][00426] Num frames 7100...
+[2024-10-03 11:36:07,639][00426] Num frames 7200...
+[2024-10-03 11:36:07,782][00426] Avg episode rewards: #0: 29.941, true rewards: #0: 12.108
+[2024-10-03 11:36:07,784][00426] Avg episode reward: 29.941, avg true_objective: 12.108
+[2024-10-03 11:36:07,833][00426] Num frames 7300...
+[2024-10-03 11:36:07,974][00426] Num frames 7400...
+[2024-10-03 11:36:08,106][00426] Num frames 7500...
+[2024-10-03 11:36:08,237][00426] Num frames 7600...
+[2024-10-03 11:36:08,364][00426] Num frames 7700...
+[2024-10-03 11:36:08,504][00426] Num frames 7800...
+[2024-10-03 11:36:08,635][00426] Num frames 7900...
+[2024-10-03 11:36:08,778][00426] Avg episode rewards: #0: 27.670, true rewards: #0: 11.384
+[2024-10-03 11:36:08,780][00426] Avg episode reward: 27.670, avg true_objective: 11.384
+[2024-10-03 11:36:08,822][00426] Num frames 8000...
+[2024-10-03 11:36:08,958][00426] Num frames 8100...
+[2024-10-03 11:36:09,095][00426] Num frames 8200...
+[2024-10-03 11:36:09,224][00426] Num frames 8300...
+[2024-10-03 11:36:09,320][00426] Avg episode rewards: #0: 24.912, true rewards: #0: 10.412
+[2024-10-03 11:36:09,321][00426] Avg episode reward: 24.912, avg true_objective: 10.412
+[2024-10-03 11:36:09,414][00426] Num frames 8400...
+[2024-10-03 11:36:09,550][00426] Num frames 8500...
+[2024-10-03 11:36:09,682][00426] Num frames 8600...
+[2024-10-03 11:36:09,812][00426] Num frames 8700...
+[2024-10-03 11:36:09,945][00426] Num frames 8800...
+[2024-10-03 11:36:10,069][00426] Num frames 8900...
+[2024-10-03 11:36:10,173][00426] Avg episode rewards: #0: 23.264, true rewards: #0: 9.931
+[2024-10-03 11:36:10,174][00426] Avg episode reward: 23.264, avg true_objective: 9.931
+[2024-10-03 11:36:10,255][00426] Num frames 9000...
+[2024-10-03 11:36:10,386][00426] Num frames 9100...
+[2024-10-03 11:36:10,529][00426] Num frames 9200...
+[2024-10-03 11:36:10,661][00426] Num frames 9300...
+[2024-10-03 11:36:10,791][00426] Num frames 9400...
+[2024-10-03 11:36:10,925][00426] Num frames 9500...
+[2024-10-03 11:36:11,066][00426] Num frames 9600...
+[2024-10-03 11:36:11,200][00426] Num frames 9700...
+[2024-10-03 11:36:11,329][00426] Num frames 9800...
+[2024-10-03 11:36:11,458][00426] Num frames 9900...
+[2024-10-03 11:36:11,598][00426] Num frames 10000...
+[2024-10-03 11:36:11,734][00426] Num frames 10100...
+[2024-10-03 11:36:11,880][00426] Num frames 10200...
+[2024-10-03 11:36:12,043][00426] Avg episode rewards: #0: 23.982, true rewards: #0: 10.282
+[2024-10-03 11:36:12,046][00426] Avg episode reward: 23.982, avg true_objective: 10.282
+[2024-10-03 11:37:19,119][00426] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-10-03 11:42:31,642][00426] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-10-03 11:42:31,644][00426] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-10-03 11:42:31,646][00426] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-10-03 11:42:31,648][00426] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-10-03 11:42:31,650][00426] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-10-03 11:42:31,651][00426] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-10-03 11:42:31,653][00426] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-10-03 11:42:31,655][00426] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-10-03 11:42:31,656][00426] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-10-03 11:42:31,657][00426] Adding new argument 'hf_repository'='maavaneck/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-10-03 11:42:31,658][00426] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-10-03 11:42:31,659][00426] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-10-03 11:42:31,660][00426] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-10-03 11:42:31,661][00426] Adding new argument 'enjoy_script'=None that is not in the saved config file!
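
[Editor's note] The 'Overriding arg' / 'Adding new argument' lines above show how the evaluation entry point reconciles command-line flags with the config.json saved at training time: keys already present in the saved config are overridden, unknown keys are appended with a warning. Below is a minimal Python sketch of that merge behavior; merge_cli_args is a hypothetical helper written for illustration, not Sample Factory's actual API, while the flag names and values are taken directly from the log.

    def merge_cli_args(saved_config: dict, cli_args: dict) -> dict:
        """Merge CLI overrides into a saved experiment config.

        Illustrative only: mimics the override/add-with-warning
        behavior implied by the log lines above.
        """
        config = dict(saved_config)  # don't mutate the caller's dict
        for key, value in cli_args.items():
            if key in config:
                print(f"Overriding arg {key!r} with value {value} passed from command line")
            else:
                print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
            config[key] = value
        return config

    # Values mirror the second run above (this time with push_to_hub enabled);
    # the saved-config dict here is a stand-in, assuming it already had num_workers.
    cfg = merge_cli_args(
        {"num_workers": 8},
        {
            "num_workers": 1,
            "no_render": True,
            "save_video": True,
            "max_num_frames": 100000,
            "max_num_episodes": 10,
            "push_to_hub": True,
            "hf_repository": "maavaneck/rl_course_vizdoom_health_gathering_supreme",
        },
    )
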
+[2024-10-03 11:42:31,662][00426] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-10-03 11:42:31,693][00426] RunningMeanStd input shape: (3, 72, 128)
+[2024-10-03 11:42:31,694][00426] RunningMeanStd input shape: (1,)
+[2024-10-03 11:42:31,708][00426] ConvEncoder: input_channels=3
+[2024-10-03 11:42:31,752][00426] Conv encoder output size: 512
+[2024-10-03 11:42:31,754][00426] Policy head output size: 512
+[2024-10-03 11:42:31,773][00426] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
+[2024-10-03 11:42:32,259][00426] Num frames 100...
+[2024-10-03 11:42:32,390][00426] Num frames 200...
+[2024-10-03 11:42:32,530][00426] Num frames 300...
+[2024-10-03 11:42:32,663][00426] Num frames 400...
+[2024-10-03 11:42:32,796][00426] Num frames 500...
+[2024-10-03 11:42:32,933][00426] Num frames 600...
+[2024-10-03 11:42:33,066][00426] Num frames 700...
+[2024-10-03 11:42:33,195][00426] Num frames 800...
+[2024-10-03 11:42:33,321][00426] Num frames 900...
+[2024-10-03 11:42:33,461][00426] Num frames 1000...
+[2024-10-03 11:42:33,591][00426] Num frames 1100...
+[2024-10-03 11:42:33,773][00426] Num frames 1200...
+[2024-10-03 11:42:33,949][00426] Num frames 1300...
+[2024-10-03 11:42:34,121][00426] Num frames 1400...
+[2024-10-03 11:42:34,292][00426] Num frames 1500...
+[2024-10-03 11:42:34,474][00426] Num frames 1600...
+[2024-10-03 11:42:34,647][00426] Num frames 1700...
+[2024-10-03 11:42:34,827][00426] Num frames 1800...
+[2024-10-03 11:42:35,000][00426] Num frames 1900...
+[2024-10-03 11:42:35,186][00426] Num frames 2000...
+[2024-10-03 11:42:35,373][00426] Num frames 2100...
+[2024-10-03 11:42:35,429][00426] Avg episode rewards: #0: 57.999, true rewards: #0: 21.000
+[2024-10-03 11:42:35,432][00426] Avg episode reward: 57.999, avg true_objective: 21.000
+[2024-10-03 11:42:35,638][00426] Num frames 2200...
+[2024-10-03 11:42:35,827][00426] Num frames 2300...
+[2024-10-03 11:42:36,015][00426] Num frames 2400...
+[2024-10-03 11:42:36,199][00426] Num frames 2500...
+[2024-10-03 11:42:36,367][00426] Num frames 2600...
+[2024-10-03 11:42:36,498][00426] Num frames 2700...
+[2024-10-03 11:42:36,640][00426] Num frames 2800...
+[2024-10-03 11:42:36,773][00426] Num frames 2900...
+[2024-10-03 11:42:36,918][00426] Num frames 3000...
+[2024-10-03 11:42:37,061][00426] Num frames 3100...
+[2024-10-03 11:42:37,191][00426] Num frames 3200...
+[2024-10-03 11:42:37,321][00426] Num frames 3300...
+[2024-10-03 11:42:37,450][00426] Num frames 3400...
+[2024-10-03 11:42:37,593][00426] Num frames 3500...
+[2024-10-03 11:42:37,731][00426] Num frames 3600...
+[2024-10-03 11:42:37,871][00426] Num frames 3700...
+[2024-10-03 11:42:37,969][00426] Avg episode rewards: #0: 47.159, true rewards: #0: 18.660
+[2024-10-03 11:42:37,972][00426] Avg episode reward: 47.159, avg true_objective: 18.660
+[2024-10-03 11:42:38,066][00426] Num frames 3800...
+[2024-10-03 11:42:38,198][00426] Num frames 3900...
+[2024-10-03 11:42:38,336][00426] Num frames 4000...
+[2024-10-03 11:42:38,465][00426] Num frames 4100...
+[2024-10-03 11:42:38,594][00426] Num frames 4200...
+[2024-10-03 11:42:38,731][00426] Num frames 4300...
+[2024-10-03 11:42:38,866][00426] Num frames 4400...
+[2024-10-03 11:42:38,996][00426] Num frames 4500...
+[2024-10-03 11:42:39,125][00426] Num frames 4600...
+[2024-10-03 11:42:39,258][00426] Num frames 4700...
+[2024-10-03 11:42:39,393][00426] Num frames 4800...
+[2024-10-03 11:42:39,558][00426] Avg episode rewards: #0: 39.613, true rewards: #0: 16.280
+[2024-10-03 11:42:39,560][00426] Avg episode reward: 39.613, avg true_objective: 16.280
+[2024-10-03 11:42:39,585][00426] Num frames 4900...
+[2024-10-03 11:42:39,719][00426] Num frames 5000...
+[2024-10-03 11:42:39,853][00426] Num frames 5100...
+[2024-10-03 11:42:39,985][00426] Num frames 5200...
+[2024-10-03 11:42:40,117][00426] Num frames 5300...
+[2024-10-03 11:42:40,253][00426] Num frames 5400...
+[2024-10-03 11:42:40,386][00426] Avg episode rewards: #0: 32.150, true rewards: #0: 13.650
+[2024-10-03 11:42:40,388][00426] Avg episode reward: 32.150, avg true_objective: 13.650
+[2024-10-03 11:42:40,445][00426] Num frames 5500...
+[2024-10-03 11:42:40,577][00426] Num frames 5600...
+[2024-10-03 11:42:40,713][00426] Num frames 5700...
+[2024-10-03 11:42:40,851][00426] Num frames 5800...
+[2024-10-03 11:42:40,985][00426] Num frames 5900...
+[2024-10-03 11:42:41,118][00426] Num frames 6000...
+[2024-10-03 11:42:41,179][00426] Avg episode rewards: #0: 27.808, true rewards: #0: 12.008
+[2024-10-03 11:42:41,181][00426] Avg episode reward: 27.808, avg true_objective: 12.008
+[2024-10-03 11:42:41,309][00426] Num frames 6100...
+[2024-10-03 11:42:41,438][00426] Num frames 6200...
+[2024-10-03 11:42:41,563][00426] Num frames 6300...
+[2024-10-03 11:42:41,697][00426] Num frames 6400...
+[2024-10-03 11:42:41,832][00426] Num frames 6500...
+[2024-10-03 11:42:41,966][00426] Num frames 6600...
+[2024-10-03 11:42:42,093][00426] Num frames 6700...
+[2024-10-03 11:42:42,219][00426] Num frames 6800...
+[2024-10-03 11:42:42,349][00426] Num frames 6900...
+[2024-10-03 11:42:42,530][00426] Avg episode rewards: #0: 26.327, true rewards: #0: 11.660
+[2024-10-03 11:42:42,532][00426] Avg episode reward: 26.327, avg true_objective: 11.660
+[2024-10-03 11:42:42,541][00426] Num frames 7000...
+[2024-10-03 11:42:42,680][00426] Num frames 7100...
+[2024-10-03 11:42:42,821][00426] Num frames 7200...
+[2024-10-03 11:42:42,957][00426] Num frames 7300...
+[2024-10-03 11:42:43,087][00426] Num frames 7400...
+[2024-10-03 11:42:43,212][00426] Num frames 7500...
+[2024-10-03 11:42:43,356][00426] Avg episode rewards: #0: 23.960, true rewards: #0: 10.817
+[2024-10-03 11:42:43,357][00426] Avg episode reward: 23.960, avg true_objective: 10.817
+[2024-10-03 11:42:43,398][00426] Num frames 7600...
+[2024-10-03 11:42:43,529][00426] Num frames 7700...
+[2024-10-03 11:42:43,655][00426] Num frames 7800...
+[2024-10-03 11:42:43,799][00426] Num frames 7900...
+[2024-10-03 11:42:43,931][00426] Num frames 8000...
+[2024-10-03 11:42:44,056][00426] Num frames 8100...
+[2024-10-03 11:42:44,133][00426] Avg episode rewards: #0: 22.270, true rewards: #0: 10.145
+[2024-10-03 11:42:44,134][00426] Avg episode reward: 22.270, avg true_objective: 10.145
+[2024-10-03 11:42:44,241][00426] Num frames 8200...
+[2024-10-03 11:42:44,364][00426] Num frames 8300...
+[2024-10-03 11:42:44,492][00426] Num frames 8400...
+[2024-10-03 11:42:44,621][00426] Num frames 8500...
+[2024-10-03 11:42:44,751][00426] Num frames 8600...
+[2024-10-03 11:42:44,900][00426] Num frames 8700...
+[2024-10-03 11:42:45,028][00426] Num frames 8800...
+[2024-10-03 11:42:45,155][00426] Num frames 8900...
+[2024-10-03 11:42:45,280][00426] Num frames 9000...
+[2024-10-03 11:42:45,406][00426] Num frames 9100...
+[2024-10-03 11:42:45,536][00426] Num frames 9200...
+[2024-10-03 11:42:45,662][00426] Num frames 9300...
+[2024-10-03 11:42:45,792][00426] Num frames 9400...
+[2024-10-03 11:42:45,943][00426] Avg episode rewards: #0: 22.955, true rewards: #0: 10.511
+[2024-10-03 11:42:45,945][00426] Avg episode reward: 22.955, avg true_objective: 10.511
+[2024-10-03 11:42:45,998][00426] Num frames 9500...
+[2024-10-03 11:42:46,127][00426] Num frames 9600...
+[2024-10-03 11:42:46,254][00426] Num frames 9700...
+[2024-10-03 11:42:46,408][00426] Num frames 9800...
+[2024-10-03 11:42:46,607][00426] Avg episode rewards: #0: 21.276, true rewards: #0: 9.876
+[2024-10-03 11:42:46,609][00426] Avg episode reward: 21.276, avg true_objective: 9.876
+[2024-10-03 11:43:48,464][00426] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
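
[Editor's note] Each 'Avg episode rewards' line above is a cumulative mean over the episodes completed so far, so the individual episode returns can be recovered from consecutive running means. A small plain-Python sketch (not Sample Factory code); the averages are the true-reward values from the second evaluation run, so the recovered returns inherit the log's 3-decimal rounding:

    # Running mean of the true reward after each of the 10 episodes,
    # copied from the log entries above:
    running_avg_true = [21.000, 18.660, 16.280, 13.650, 12.008,
                        11.660, 10.817, 10.145, 10.511, 9.876]

    # Invert the running mean: ep_n = n * avg_n - (n - 1) * avg_{n-1}
    episode_returns = [
        round(n * avg - (n - 1) * prev, 3)
        for n, (prev, avg) in enumerate(zip([0.0] + running_avg_true,
                                            running_avg_true), start=1)
    ]
    print(episode_returns)  # [21.0, 16.32, 11.52, 5.76, 5.44, ...]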