diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1105 @@
+[2024-08-25 05:50:38,007][00480] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-08-25 05:50:38,017][00480] Rollout worker 0 uses device cpu
+[2024-08-25 05:50:38,023][00480] Rollout worker 1 uses device cpu
+[2024-08-25 05:50:38,027][00480] Rollout worker 2 uses device cpu
+[2024-08-25 05:50:38,030][00480] Rollout worker 3 uses device cpu
+[2024-08-25 05:50:38,032][00480] Rollout worker 4 uses device cpu
+[2024-08-25 05:50:38,035][00480] Rollout worker 5 uses device cpu
+[2024-08-25 05:50:38,038][00480] Rollout worker 6 uses device cpu
+[2024-08-25 05:50:38,041][00480] Rollout worker 7 uses device cpu
+[2024-08-25 05:50:38,255][00480] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-08-25 05:50:38,261][00480] InferenceWorker_p0-w0: min num requests: 2
+[2024-08-25 05:50:38,309][00480] Starting all processes...
+[2024-08-25 05:50:38,313][00480] Starting process learner_proc0
+[2024-08-25 05:50:38,375][00480] Starting all processes...
+[2024-08-25 05:50:38,488][00480] Starting process inference_proc0-0
+[2024-08-25 05:50:38,491][00480] Starting process rollout_proc0
+[2024-08-25 05:50:38,492][00480] Starting process rollout_proc1
+[2024-08-25 05:50:38,492][00480] Starting process rollout_proc2
+[2024-08-25 05:50:38,492][00480] Starting process rollout_proc3
+[2024-08-25 05:50:38,492][00480] Starting process rollout_proc4
+[2024-08-25 05:50:38,493][00480] Starting process rollout_proc5
+[2024-08-25 05:50:38,493][00480] Starting process rollout_proc6
+[2024-08-25 05:50:38,494][00480] Starting process rollout_proc7
+[2024-08-25 05:50:50,518][03619] Worker 2 uses CPU cores [0]
+[2024-08-25 05:50:50,569][03602] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-08-25 05:50:50,571][03602] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-08-25 05:50:50,629][03602] Num visible devices: 1
+[2024-08-25 05:50:50,656][03602] Starting seed is not provided
+[2024-08-25 05:50:50,657][03602] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-08-25 05:50:50,658][03602] Initializing actor-critic model on device cuda:0
+[2024-08-25 05:50:50,659][03602] RunningMeanStd input shape: (3, 72, 128)
+[2024-08-25 05:50:50,660][03602] RunningMeanStd input shape: (1,)
+[2024-08-25 05:50:50,730][03602] ConvEncoder: input_channels=3
+[2024-08-25 05:50:50,805][03620] Worker 4 uses CPU cores [0]
+[2024-08-25 05:50:50,880][03621] Worker 5 uses CPU cores [1]
+[2024-08-25 05:50:50,959][03616] Worker 0 uses CPU cores [0]
+[2024-08-25 05:50:50,970][03615] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-08-25 05:50:50,971][03615] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-08-25 05:50:50,989][03615] Num visible devices: 1
+[2024-08-25 05:50:51,096][03623] Worker 7 uses CPU cores [1]
+[2024-08-25 05:50:51,118][03617] Worker 1 uses CPU cores [1]
+[2024-08-25 05:50:51,165][03622] Worker 6 uses CPU cores [0]
+[2024-08-25 05:50:51,201][03618] Worker 3 uses CPU cores [1]
+[2024-08-25 05:50:51,233][03602] Conv encoder output size: 512
+[2024-08-25 05:50:51,234][03602] Policy head output size: 512
+[2024-08-25 05:50:51,249][03602] Created Actor Critic model with architecture:
+[2024-08-25 05:50:51,249][03602] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-08-25 05:50:55,125][03602] Using optimizer
+[2024-08-25 05:50:55,126][03602] No checkpoints found
+[2024-08-25 05:50:55,126][03602] Did not load from checkpoint, starting from scratch!
+[2024-08-25 05:50:55,126][03602] Initialized policy 0 weights for model version 0
+[2024-08-25 05:50:55,135][03602] LearnerWorker_p0 finished initialization!
+[2024-08-25 05:50:55,135][03602] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-08-25 05:50:55,349][03615] RunningMeanStd input shape: (3, 72, 128)
+[2024-08-25 05:50:55,350][03615] RunningMeanStd input shape: (1,)
+[2024-08-25 05:50:55,369][03615] ConvEncoder: input_channels=3
+[2024-08-25 05:50:55,516][03615] Conv encoder output size: 512
+[2024-08-25 05:50:55,517][03615] Policy head output size: 512
+[2024-08-25 05:50:57,447][00480] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-08-25 05:50:57,707][00480] Inference worker 0-0 is ready!
+[2024-08-25 05:50:57,709][00480] All inference workers are ready! Signal rollout workers to start!
+[2024-08-25 05:50:57,809][03620] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:57,820][03616] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:57,830][03619] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:57,849][03622] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:57,880][03618] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:57,902][03617] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:57,916][03623] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:57,935][03621] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-08-25 05:50:58,245][00480] Heartbeat connected on Batcher_0
+[2024-08-25 05:50:58,250][00480] Heartbeat connected on LearnerWorker_p0
+[2024-08-25 05:50:58,301][00480] Heartbeat connected on InferenceWorker_p0-w0
+[2024-08-25 05:50:59,111][03617] Decorrelating experience for 0 frames...
+[2024-08-25 05:50:59,111][03616] Decorrelating experience for 0 frames...
+[2024-08-25 05:50:59,112][03619] Decorrelating experience for 0 frames...
+[2024-08-25 05:50:59,506][03616] Decorrelating experience for 32 frames...
+[2024-08-25 05:50:59,816][03617] Decorrelating experience for 32 frames...
+[2024-08-25 05:50:59,873][03618] Decorrelating experience for 0 frames...
+[2024-08-25 05:50:59,992][03619] Decorrelating experience for 32 frames...
+[2024-08-25 05:51:00,270][03618] Decorrelating experience for 32 frames...
+[2024-08-25 05:51:00,702][03618] Decorrelating experience for 64 frames...
+[2024-08-25 05:51:00,906][03620] Decorrelating experience for 0 frames...
+[2024-08-25 05:51:00,987][03616] Decorrelating experience for 64 frames...
+[2024-08-25 05:51:01,084][03619] Decorrelating experience for 64 frames...
+[2024-08-25 05:51:01,418][03620] Decorrelating experience for 32 frames...
+[2024-08-25 05:51:01,707][03617] Decorrelating experience for 64 frames...
+[2024-08-25 05:51:01,850][03616] Decorrelating experience for 96 frames...
+[2024-08-25 05:51:01,889][03618] Decorrelating experience for 96 frames...
+[2024-08-25 05:51:01,981][00480] Heartbeat connected on RolloutWorker_w0
+[2024-08-25 05:51:02,028][00480] Heartbeat connected on RolloutWorker_w3
+[2024-08-25 05:51:02,358][03617] Decorrelating experience for 96 frames...
+[2024-08-25 05:51:02,446][00480] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-08-25 05:51:02,454][00480] Heartbeat connected on RolloutWorker_w1
+[2024-08-25 05:51:02,673][03620] Decorrelating experience for 64 frames...
+[2024-08-25 05:51:02,720][03619] Decorrelating experience for 96 frames...
+[2024-08-25 05:51:02,811][00480] Heartbeat connected on RolloutWorker_w2
+[2024-08-25 05:51:03,072][03620] Decorrelating experience for 96 frames...
+[2024-08-25 05:51:03,139][00480] Heartbeat connected on RolloutWorker_w4
+[2024-08-25 05:51:07,446][00480] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 186.0. Samples: 1860. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-08-25 05:51:07,451][00480] Avg episode reward: [(0, '3.190')]
+[2024-08-25 05:51:07,634][03602] Signal inference workers to stop experience collection...
+[2024-08-25 05:51:07,643][03615] InferenceWorker_p0-w0: stopping experience collection
+[2024-08-25 05:51:09,220][03602] Signal inference workers to resume experience collection...
+[2024-08-25 05:51:09,221][03615] InferenceWorker_p0-w0: resuming experience collection
+[2024-08-25 05:51:12,447][00480] Fps is (10 sec: 1228.7, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 179.9. Samples: 2698. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
+[2024-08-25 05:51:12,452][00480] Avg episode reward: [(0, '3.517')]
+[2024-08-25 05:51:17,446][00480] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 357.4. Samples: 7148. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:51:17,453][00480] Avg episode reward: [(0, '4.125')]
+[2024-08-25 05:51:19,147][03615] Updated weights for policy 0, policy_version 10 (0.0014)
+[2024-08-25 05:51:22,447][00480] Fps is (10 sec: 4096.2, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 537.3. Samples: 13432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-08-25 05:51:22,450][00480] Avg episode reward: [(0, '4.377')]
+[2024-08-25 05:51:27,447][00480] Fps is (10 sec: 3276.7, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 527.1. Samples: 15814. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2024-08-25 05:51:27,454][00480] Avg episode reward: [(0, '4.451')]
+[2024-08-25 05:51:31,351][03615] Updated weights for policy 0, policy_version 20 (0.0016)
+[2024-08-25 05:51:32,446][00480] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 575.8. Samples: 20154. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:51:32,449][00480] Avg episode reward: [(0, '4.551')]
+[2024-08-25 05:51:37,446][00480] Fps is (10 sec: 4096.1, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 663.8. Samples: 26550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:51:37,449][00480] Avg episode reward: [(0, '4.401')]
+[2024-08-25 05:51:37,455][03602] Saving new best policy, reward=4.401!
+[2024-08-25 05:51:41,690][03615] Updated weights for policy 0, policy_version 30 (0.0025)
+[2024-08-25 05:51:42,447][00480] Fps is (10 sec: 3686.2, 60 sec: 2730.6, 300 sec: 2730.6). Total num frames: 122880. Throughput: 0: 657.7. Samples: 29598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:51:42,449][00480] Avg episode reward: [(0, '4.290')]
+[2024-08-25 05:51:47,446][00480] Fps is (10 sec: 3276.8, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 750.2. Samples: 33758. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:51:47,450][00480] Avg episode reward: [(0, '4.437')]
+[2024-08-25 05:51:47,454][03602] Saving new best policy, reward=4.437!
+[2024-08-25 05:51:52,446][00480] Fps is (10 sec: 3686.6, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 848.1. Samples: 40026. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:51:52,449][00480] Avg episode reward: [(0, '4.604')]
+[2024-08-25 05:51:52,459][03602] Saving new best policy, reward=4.604!
+[2024-08-25 05:51:52,951][03615] Updated weights for policy 0, policy_version 40 (0.0019)
+[2024-08-25 05:51:57,447][00480] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 176128. Throughput: 0: 897.5. Samples: 43084. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:51:57,448][00480] Avg episode reward: [(0, '4.589')]
+[2024-08-25 05:52:02,447][00480] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 894.4. Samples: 47398. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:52:02,449][00480] Avg episode reward: [(0, '4.405')]
+[2024-08-25 05:52:05,059][03615] Updated weights for policy 0, policy_version 50 (0.0016)
+[2024-08-25 05:52:07,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 888.6. Samples: 53418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:52:07,449][00480] Avg episode reward: [(0, '4.304')]
+[2024-08-25 05:52:12,446][00480] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3113.0). Total num frames: 233472. Throughput: 0: 905.4. Samples: 56558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:52:12,452][00480] Avg episode reward: [(0, '4.420')]
+[2024-08-25 05:52:16,245][03615] Updated weights for policy 0, policy_version 60 (0.0015)
+[2024-08-25 05:52:17,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 914.8. Samples: 61320. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:52:17,449][00480] Avg episode reward: [(0, '4.418')]
+[2024-08-25 05:52:22,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 894.9. Samples: 66820. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:52:22,452][00480] Avg episode reward: [(0, '4.585')]
+[2024-08-25 05:52:27,225][03615] Updated weights for policy 0, policy_version 70 (0.0020)
+[2024-08-25 05:52:27,447][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3185.8). Total num frames: 286720. Throughput: 0: 896.5. Samples: 69938. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:52:27,450][00480] Avg episode reward: [(0, '4.510')]
+[2024-08-25 05:52:32,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 905.2. Samples: 74490. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:52:32,450][00480] Avg episode reward: [(0, '4.509')]
+[2024-08-25 05:52:32,459][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
+[2024-08-25 05:52:37,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 883.9. Samples: 79800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:52:37,449][00480] Avg episode reward: [(0, '4.516')]
+[2024-08-25 05:52:39,079][03615] Updated weights for policy 0, policy_version 80 (0.0014)
+[2024-08-25 05:52:42,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3237.8). Total num frames: 339968. Throughput: 0: 885.2. Samples: 82920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:52:42,449][00480] Avg episode reward: [(0, '4.397')]
+[2024-08-25 05:52:47,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3239.6). Total num frames: 356352. Throughput: 0: 911.3. Samples: 88406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:52:47,454][00480] Avg episode reward: [(0, '4.406')]
+[2024-08-25 05:52:50,922][03615] Updated weights for policy 0, policy_version 90 (0.0013)
+[2024-08-25 05:52:52,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3241.2). Total num frames: 372736. Throughput: 0: 889.2. Samples: 93432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:52:52,452][00480] Avg episode reward: [(0, '4.486')]
+[2024-08-25 05:52:57,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 888.7. Samples: 96548. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:52:57,451][00480] Avg episode reward: [(0, '4.588')]
+[2024-08-25 05:53:01,193][03615] Updated weights for policy 0, policy_version 100 (0.0015)
+[2024-08-25 05:53:02,446][00480] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 409600. Throughput: 0: 911.0. Samples: 102314. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:53:02,454][00480] Avg episode reward: [(0, '4.485')]
+[2024-08-25 05:53:07,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 425984. Throughput: 0: 890.8. Samples: 106906. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:53:07,449][00480] Avg episode reward: [(0, '4.451')]
+[2024-08-25 05:53:12,416][03615] Updated weights for policy 0, policy_version 110 (0.0014)
+[2024-08-25 05:53:12,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3337.5). Total num frames: 450560. Throughput: 0: 890.5. Samples: 110012. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:53:12,448][00480] Avg episode reward: [(0, '4.635')]
+[2024-08-25 05:53:12,459][03602] Saving new best policy, reward=4.635!
+[2024-08-25 05:53:17,447][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3335.3). Total num frames: 466944. Throughput: 0: 923.6. Samples: 116054. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:53:17,452][00480] Avg episode reward: [(0, '4.601')]
+[2024-08-25 05:53:22,449][00480] Fps is (10 sec: 2866.5, 60 sec: 3549.7, 300 sec: 3305.0). Total num frames: 479232. Throughput: 0: 902.9. Samples: 120432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:53:22,452][00480] Avg episode reward: [(0, '4.488')]
+[2024-08-25 05:53:24,355][03615] Updated weights for policy 0, policy_version 120 (0.0020)
+[2024-08-25 05:53:27,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3331.4). Total num frames: 499712. Throughput: 0: 903.5. Samples: 123576. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:53:27,459][00480] Avg episode reward: [(0, '4.414')]
+[2024-08-25 05:53:32,446][00480] Fps is (10 sec: 3687.4, 60 sec: 3618.1, 300 sec: 3329.7). Total num frames: 516096. Throughput: 0: 899.4. Samples: 128878. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:53:32,450][00480] Avg episode reward: [(0, '4.550')]
+[2024-08-25 05:53:37,446][00480] Fps is (10 sec: 2457.7, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 524288. Throughput: 0: 845.2. Samples: 131468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:53:37,449][00480] Avg episode reward: [(0, '4.520')]
+[2024-08-25 05:53:38,855][03615] Updated weights for policy 0, policy_version 130 (0.0013)
+[2024-08-25 05:53:42,447][00480] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3301.6). Total num frames: 544768. Throughput: 0: 840.2. Samples: 134356. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:53:42,453][00480] Avg episode reward: [(0, '4.641')]
+[2024-08-25 05:53:42,463][03602] Saving new best policy, reward=4.641!
+[2024-08-25 05:53:47,447][00480] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3325.0). Total num frames: 565248. Throughput: 0: 850.1. Samples: 140568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:53:47,454][00480] Avg episode reward: [(0, '4.726')]
+[2024-08-25 05:53:47,456][03602] Saving new best policy, reward=4.726!
+[2024-08-25 05:53:49,884][03615] Updated weights for policy 0, policy_version 140 (0.0012)
+[2024-08-25 05:53:52,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3300.2). Total num frames: 577536. Throughput: 0: 845.0. Samples: 144932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:53:52,449][00480] Avg episode reward: [(0, '4.725')]
+[2024-08-25 05:53:57,446][00480] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3322.3). Total num frames: 598016. Throughput: 0: 837.2. Samples: 147688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:53:57,448][00480] Avg episode reward: [(0, '4.495')]
+[2024-08-25 05:54:00,725][03615] Updated weights for policy 0, policy_version 150 (0.0013)
+[2024-08-25 05:54:02,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3343.2). Total num frames: 618496. Throughput: 0: 840.9. Samples: 153896. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:54:02,451][00480] Avg episode reward: [(0, '4.488')]
+[2024-08-25 05:54:07,447][00480] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3341.5). Total num frames: 634880. Throughput: 0: 849.8. Samples: 158672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:54:07,449][00480] Avg episode reward: [(0, '4.521')]
+[2024-08-25 05:54:12,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3339.8). Total num frames: 651264. Throughput: 0: 833.0. Samples: 161062. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:54:12,448][00480] Avg episode reward: [(0, '4.679')]
+[2024-08-25 05:54:12,735][03615] Updated weights for policy 0, policy_version 160 (0.0012)
+[2024-08-25 05:54:17,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3358.7). Total num frames: 671744. Throughput: 0: 856.8. Samples: 167436. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:54:17,454][00480] Avg episode reward: [(0, '4.761')]
+[2024-08-25 05:54:17,533][03602] Saving new best policy, reward=4.761!
+[2024-08-25 05:54:22,447][00480] Fps is (10 sec: 3686.2, 60 sec: 3481.7, 300 sec: 3356.7). Total num frames: 688128. Throughput: 0: 908.5. Samples: 172352. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:54:22,450][00480] Avg episode reward: [(0, '4.626')]
+[2024-08-25 05:54:24,682][03615] Updated weights for policy 0, policy_version 170 (0.0015)
+[2024-08-25 05:54:27,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3354.8). Total num frames: 704512. Throughput: 0: 889.9. Samples: 174402. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:54:27,448][00480] Avg episode reward: [(0, '4.603')]
+[2024-08-25 05:54:32,446][00480] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3372.1). Total num frames: 724992. Throughput: 0: 888.3. Samples: 180540. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:54:32,449][00480] Avg episode reward: [(0, '4.642')]
+[2024-08-25 05:54:32,459][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth...
+[2024-08-25 05:54:34,926][03615] Updated weights for policy 0, policy_version 180 (0.0019)
+[2024-08-25 05:54:37,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3369.9). Total num frames: 741376. Throughput: 0: 908.7. Samples: 185824. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:54:37,448][00480] Avg episode reward: [(0, '4.556')]
+[2024-08-25 05:54:42,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3367.8). Total num frames: 757760. Throughput: 0: 891.1. Samples: 187786. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:54:42,452][00480] Avg episode reward: [(0, '4.673')]
+[2024-08-25 05:54:47,115][03615] Updated weights for policy 0, policy_version 190 (0.0014)
+[2024-08-25 05:54:47,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3383.7). Total num frames: 778240. Throughput: 0: 878.4. Samples: 193426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:54:47,448][00480] Avg episode reward: [(0, '4.650')]
+[2024-08-25 05:54:52,453][00480] Fps is (10 sec: 3684.1, 60 sec: 3617.7, 300 sec: 3381.3). Total num frames: 794624. Throughput: 0: 900.9. Samples: 199220. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-08-25 05:54:52,455][00480] Avg episode reward: [(0, '4.640')]
+[2024-08-25 05:54:57,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3379.2). Total num frames: 811008. Throughput: 0: 893.4. Samples: 201266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:54:57,449][00480] Avg episode reward: [(0, '4.767')]
+[2024-08-25 05:54:57,453][03602] Saving new best policy, reward=4.767!
+[2024-08-25 05:54:58,951][03615] Updated weights for policy 0, policy_version 200 (0.0020)
+[2024-08-25 05:55:02,447][00480] Fps is (10 sec: 3688.5, 60 sec: 3549.8, 300 sec: 3393.8). Total num frames: 831488. Throughput: 0: 877.1. Samples: 206904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:55:02,453][00480] Avg episode reward: [(0, '4.780')]
+[2024-08-25 05:55:02,466][03602] Saving new best policy, reward=4.780!
+[2024-08-25 05:55:07,447][00480] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3407.9). Total num frames: 851968. Throughput: 0: 905.0. Samples: 213078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:55:07,452][00480] Avg episode reward: [(0, '4.886')]
+[2024-08-25 05:55:07,458][03602] Saving new best policy, reward=4.886!
+[2024-08-25 05:55:10,441][03615] Updated weights for policy 0, policy_version 210 (0.0016)
+[2024-08-25 05:55:12,447][00480] Fps is (10 sec: 3277.0, 60 sec: 3549.9, 300 sec: 3389.2). Total num frames: 864256. Throughput: 0: 902.3. Samples: 215004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:55:12,453][00480] Avg episode reward: [(0, '4.778')]
+[2024-08-25 05:55:17,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3402.8). Total num frames: 884736. Throughput: 0: 885.7. Samples: 220398. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:55:17,452][00480] Avg episode reward: [(0, '4.816')]
+[2024-08-25 05:55:20,619][03615] Updated weights for policy 0, policy_version 220 (0.0013)
+[2024-08-25 05:55:22,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3415.9). Total num frames: 905216. Throughput: 0: 910.5. Samples: 226798. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:55:22,453][00480] Avg episode reward: [(0, '4.992')]
+[2024-08-25 05:55:22,462][03602] Saving new best policy, reward=4.992!
+[2024-08-25 05:55:27,447][00480] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 3413.3). Total num frames: 921600. Throughput: 0: 916.6. Samples: 229032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:55:27,453][00480] Avg episode reward: [(0, '5.157')]
+[2024-08-25 05:55:27,456][03602] Saving new best policy, reward=5.157!
+[2024-08-25 05:55:32,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3410.9). Total num frames: 937984. Throughput: 0: 899.7. Samples: 233914. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2024-08-25 05:55:32,451][00480] Avg episode reward: [(0, '4.989')]
+[2024-08-25 05:55:32,483][03615] Updated weights for policy 0, policy_version 230 (0.0018)
+[2024-08-25 05:55:37,446][00480] Fps is (10 sec: 4096.4, 60 sec: 3686.4, 300 sec: 3437.7). Total num frames: 962560. Throughput: 0: 914.4. Samples: 240362. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:55:37,450][00480] Avg episode reward: [(0, '5.265')]
+[2024-08-25 05:55:37,454][03602] Saving new best policy, reward=5.265!
+[2024-08-25 05:55:42,447][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3420.5). Total num frames: 974848. Throughput: 0: 925.2. Samples: 242900. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:55:42,454][00480] Avg episode reward: [(0, '5.231')]
+[2024-08-25 05:55:44,417][03615] Updated weights for policy 0, policy_version 240 (0.0017)
+[2024-08-25 05:55:47,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3432.2). Total num frames: 995328. Throughput: 0: 901.9. Samples: 247490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:55:47,452][00480] Avg episode reward: [(0, '4.929')]
+[2024-08-25 05:55:52,447][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.8, 300 sec: 3443.4). Total num frames: 1015808. Throughput: 0: 908.7. Samples: 253970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:55:52,454][00480] Avg episode reward: [(0, '4.880')]
+[2024-08-25 05:55:53,991][03615] Updated weights for policy 0, policy_version 250 (0.0022)
+[2024-08-25 05:55:57,448][00480] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3498.9). Total num frames: 1032192. Throughput: 0: 929.9. Samples: 256852. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:55:57,453][00480] Avg episode reward: [(0, '5.168')]
+[2024-08-25 05:56:02,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 905.6. Samples: 261150. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:02,451][00480] Avg episode reward: [(0, '5.147')]
+[2024-08-25 05:56:05,787][03615] Updated weights for policy 0, policy_version 260 (0.0018)
+[2024-08-25 05:56:07,446][00480] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1069056. Throughput: 0: 905.8. Samples: 267560. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:07,451][00480] Avg episode reward: [(0, '5.093')]
+[2024-08-25 05:56:12,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 1089536. Throughput: 0: 925.8. Samples: 270694. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:56:12,450][00480] Avg episode reward: [(0, '5.071')]
+[2024-08-25 05:56:17,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1101824. Throughput: 0: 912.9. Samples: 274994. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:17,456][00480] Avg episode reward: [(0, '5.058')]
+[2024-08-25 05:56:17,555][03615] Updated weights for policy 0, policy_version 270 (0.0016)
+[2024-08-25 05:56:22,447][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 1126400. Throughput: 0: 912.2. Samples: 281412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:56:22,452][00480] Avg episode reward: [(0, '5.307')]
+[2024-08-25 05:56:22,460][03602] Saving new best policy, reward=5.307!
+[2024-08-25 05:56:27,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3582.3). Total num frames: 1142784. Throughput: 0: 925.6. Samples: 284554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:27,454][00480] Avg episode reward: [(0, '5.452')]
+[2024-08-25 05:56:27,457][03602] Saving new best policy, reward=5.452!
+[2024-08-25 05:56:27,815][03615] Updated weights for policy 0, policy_version 280 (0.0015)
+[2024-08-25 05:56:32,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1159168. Throughput: 0: 923.6. Samples: 289052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:56:32,449][00480] Avg episode reward: [(0, '5.567')]
+[2024-08-25 05:56:32,457][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth...
+[2024-08-25 05:56:32,571][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
+[2024-08-25 05:56:32,586][03602] Saving new best policy, reward=5.567!
+[2024-08-25 05:56:37,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1179648. Throughput: 0: 909.2. Samples: 294884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:37,449][00480] Avg episode reward: [(0, '5.488')]
+[2024-08-25 05:56:39,001][03615] Updated weights for policy 0, policy_version 290 (0.0013)
+[2024-08-25 05:56:42,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 1200128. Throughput: 0: 914.5. Samples: 298002. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:42,449][00480] Avg episode reward: [(0, '5.447')]
+[2024-08-25 05:56:47,447][00480] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1212416. Throughput: 0: 929.4. Samples: 302974. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:47,449][00480] Avg episode reward: [(0, '5.696')]
+[2024-08-25 05:56:47,454][03602] Saving new best policy, reward=5.696!
+[2024-08-25 05:56:50,941][03615] Updated weights for policy 0, policy_version 300 (0.0015)
+[2024-08-25 05:56:52,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1232896. Throughput: 0: 910.8. Samples: 308546. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:52,450][00480] Avg episode reward: [(0, '6.153')]
+[2024-08-25 05:56:52,459][03602] Saving new best policy, reward=6.153!
+[2024-08-25 05:56:57,446][00480] Fps is (10 sec: 4096.1, 60 sec: 3686.5, 300 sec: 3596.2). Total num frames: 1253376. Throughput: 0: 909.7. Samples: 311632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:56:57,449][00480] Avg episode reward: [(0, '6.296')]
+[2024-08-25 05:56:57,454][03602] Saving new best policy, reward=6.296!
+[2024-08-25 05:57:01,995][03615] Updated weights for policy 0, policy_version 310 (0.0026)
+[2024-08-25 05:57:02,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1269760. Throughput: 0: 929.7. Samples: 316830. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:57:02,457][00480] Avg episode reward: [(0, '5.952')]
+[2024-08-25 05:57:07,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1286144. Throughput: 0: 900.0. Samples: 321914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:57:07,453][00480] Avg episode reward: [(0, '5.917')]
+[2024-08-25 05:57:12,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1306624. Throughput: 0: 900.9. Samples: 325096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:57:12,453][00480] Avg episode reward: [(0, '5.698')]
+[2024-08-25 05:57:12,465][03615] Updated weights for policy 0, policy_version 320 (0.0013)
+[2024-08-25 05:57:17,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 1327104. Throughput: 0: 930.8. Samples: 330936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:57:17,449][00480] Avg episode reward: [(0, '6.340')]
+[2024-08-25 05:57:17,461][03602] Saving new best policy, reward=6.340!
+[2024-08-25 05:57:22,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1343488. Throughput: 0: 907.4. Samples: 335716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-08-25 05:57:22,449][00480] Avg episode reward: [(0, '6.131')]
+[2024-08-25 05:57:24,133][03615] Updated weights for policy 0, policy_version 330 (0.0023)
+[2024-08-25 05:57:27,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1363968. Throughput: 0: 908.5. Samples: 338884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-08-25 05:57:27,449][00480] Avg episode reward: [(0, '6.742')]
+[2024-08-25 05:57:27,451][03602] Saving new best policy, reward=6.742!
+[2024-08-25 05:57:32,450][00480] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3596.1). Total num frames: 1380352. Throughput: 0: 931.4. Samples: 344890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2024-08-25 05:57:32,453][00480] Avg episode reward: [(0, '7.073')]
+[2024-08-25 05:57:32,467][03602] Saving new best policy, reward=7.073!
+[2024-08-25 05:57:35,949][03615] Updated weights for policy 0, policy_version 340 (0.0013)
+[2024-08-25 05:57:37,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1396736. Throughput: 0: 905.0. Samples: 349272.
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 05:57:37,448][00480] Avg episode reward: [(0, '7.846')] +[2024-08-25 05:57:37,458][03602] Saving new best policy, reward=7.846! +[2024-08-25 05:57:42,447][00480] Fps is (10 sec: 3687.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1417216. Throughput: 0: 905.6. Samples: 352384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:57:42,448][00480] Avg episode reward: [(0, '8.207')] +[2024-08-25 05:57:42,467][03602] Saving new best policy, reward=8.207! +[2024-08-25 05:57:45,918][03615] Updated weights for policy 0, policy_version 350 (0.0012) +[2024-08-25 05:57:47,449][00480] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3610.0). Total num frames: 1437696. Throughput: 0: 929.8. Samples: 358674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 05:57:47,455][00480] Avg episode reward: [(0, '8.300')] +[2024-08-25 05:57:47,457][03602] Saving new best policy, reward=8.300! +[2024-08-25 05:57:52,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1449984. Throughput: 0: 907.5. Samples: 362750. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 05:57:52,453][00480] Avg episode reward: [(0, '7.970')] +[2024-08-25 05:57:57,447][00480] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1470464. Throughput: 0: 907.6. Samples: 365940. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 05:57:57,453][00480] Avg episode reward: [(0, '7.920')] +[2024-08-25 05:57:57,656][03615] Updated weights for policy 0, policy_version 360 (0.0024) +[2024-08-25 05:58:02,450][00480] Fps is (10 sec: 4094.5, 60 sec: 3686.2, 300 sec: 3610.0). Total num frames: 1490944. Throughput: 0: 915.2. Samples: 372124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:02,452][00480] Avg episode reward: [(0, '8.029')] +[2024-08-25 05:58:07,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1503232. 
Throughput: 0: 908.0. Samples: 376574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:07,452][00480] Avg episode reward: [(0, '7.760')] +[2024-08-25 05:58:09,600][03615] Updated weights for policy 0, policy_version 370 (0.0018) +[2024-08-25 05:58:12,446][00480] Fps is (10 sec: 3687.7, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 1527808. Throughput: 0: 898.7. Samples: 379326. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:12,454][00480] Avg episode reward: [(0, '7.511')] +[2024-08-25 05:58:17,446][00480] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3624.0). Total num frames: 1548288. Throughput: 0: 907.4. Samples: 385722. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:17,455][00480] Avg episode reward: [(0, '7.336')] +[2024-08-25 05:58:20,113][03615] Updated weights for policy 0, policy_version 380 (0.0021) +[2024-08-25 05:58:22,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 1560576. Throughput: 0: 917.7. Samples: 390570. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 05:58:22,449][00480] Avg episode reward: [(0, '7.697')] +[2024-08-25 05:58:27,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1581056. Throughput: 0: 903.4. Samples: 393036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:27,449][00480] Avg episode reward: [(0, '8.244')] +[2024-08-25 05:58:31,097][03615] Updated weights for policy 0, policy_version 390 (0.0012) +[2024-08-25 05:58:32,447][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3651.7). Total num frames: 1601536. Throughput: 0: 903.5. Samples: 399330. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:32,449][00480] Avg episode reward: [(0, '8.843')] +[2024-08-25 05:58:32,461][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000391_1601536.pth... 
+[2024-08-25 05:58:32,580][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth +[2024-08-25 05:58:32,592][03602] Saving new best policy, reward=8.843! +[2024-08-25 05:58:37,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1613824. Throughput: 0: 925.7. Samples: 404408. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:37,452][00480] Avg episode reward: [(0, '8.907')] +[2024-08-25 05:58:37,487][03602] Saving new best policy, reward=8.907! +[2024-08-25 05:58:42,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1634304. Throughput: 0: 898.4. Samples: 406366. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 05:58:42,455][00480] Avg episode reward: [(0, '8.824')] +[2024-08-25 05:58:42,953][03615] Updated weights for policy 0, policy_version 400 (0.0013) +[2024-08-25 05:58:47,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3651.7). Total num frames: 1654784. Throughput: 0: 904.2. Samples: 412808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 05:58:47,449][00480] Avg episode reward: [(0, '9.529')] +[2024-08-25 05:58:47,452][03602] Saving new best policy, reward=9.529! +[2024-08-25 05:58:52,450][00480] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 1671168. Throughput: 0: 927.3. Samples: 418308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 05:58:52,453][00480] Avg episode reward: [(0, '9.471')] +[2024-08-25 05:58:54,354][03615] Updated weights for policy 0, policy_version 410 (0.0017) +[2024-08-25 05:58:57,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1687552. Throughput: 0: 910.1. Samples: 420280. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:58:57,449][00480] Avg episode reward: [(0, '9.609')] +[2024-08-25 05:58:57,451][03602] Saving new best policy, reward=9.609! 
+[2024-08-25 05:59:02,446][00480] Fps is (10 sec: 3687.7, 60 sec: 3618.3, 300 sec: 3637.8). Total num frames: 1708032. Throughput: 0: 900.0. Samples: 426224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 05:59:02,452][00480] Avg episode reward: [(0, '10.149')] +[2024-08-25 05:59:02,460][03602] Saving new best policy, reward=10.149! +[2024-08-25 05:59:04,830][03615] Updated weights for policy 0, policy_version 420 (0.0012) +[2024-08-25 05:59:07,450][00480] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 1724416. Throughput: 0: 918.9. Samples: 431924. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 05:59:07,453][00480] Avg episode reward: [(0, '10.921')] +[2024-08-25 05:59:07,456][03602] Saving new best policy, reward=10.921! +[2024-08-25 05:59:12,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 1740800. Throughput: 0: 906.9. Samples: 433848. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:59:12,449][00480] Avg episode reward: [(0, '11.253')] +[2024-08-25 05:59:12,457][03602] Saving new best policy, reward=11.253! +[2024-08-25 05:59:16,737][03615] Updated weights for policy 0, policy_version 430 (0.0012) +[2024-08-25 05:59:17,447][00480] Fps is (10 sec: 3687.7, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 1761280. Throughput: 0: 895.5. Samples: 439626. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:59:17,449][00480] Avg episode reward: [(0, '10.513')] +[2024-08-25 05:59:22,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1781760. Throughput: 0: 919.9. Samples: 445802. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:59:22,449][00480] Avg episode reward: [(0, '11.748')] +[2024-08-25 05:59:22,460][03602] Saving new best policy, reward=11.748! +[2024-08-25 05:59:27,446][00480] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 1794048. Throughput: 0: 919.3. 
Samples: 447734. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:59:27,453][00480] Avg episode reward: [(0, '11.013')] +[2024-08-25 05:59:28,643][03615] Updated weights for policy 0, policy_version 440 (0.0024) +[2024-08-25 05:59:32,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 1814528. Throughput: 0: 897.5. Samples: 453194. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 05:59:32,449][00480] Avg episode reward: [(0, '11.937')] +[2024-08-25 05:59:32,456][03602] Saving new best policy, reward=11.937! +[2024-08-25 05:59:37,447][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1835008. Throughput: 0: 914.5. Samples: 459458. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2024-08-25 05:59:37,449][00480] Avg episode reward: [(0, '11.308')] +[2024-08-25 05:59:38,892][03615] Updated weights for policy 0, policy_version 450 (0.0021) +[2024-08-25 05:59:42,447][00480] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1851392. Throughput: 0: 918.1. Samples: 461594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 05:59:42,450][00480] Avg episode reward: [(0, '11.311')] +[2024-08-25 05:59:47,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.8). Total num frames: 1871872. Throughput: 0: 895.6. Samples: 466524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 05:59:47,454][00480] Avg episode reward: [(0, '10.943')] +[2024-08-25 05:59:50,350][03615] Updated weights for policy 0, policy_version 460 (0.0015) +[2024-08-25 05:59:52,446][00480] Fps is (10 sec: 4096.1, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 1892352. Throughput: 0: 910.7. Samples: 472902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-25 05:59:52,449][00480] Avg episode reward: [(0, '11.379')] +[2024-08-25 05:59:57,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1904640. Throughput: 0: 925.2. 
Samples: 475482. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 05:59:57,455][00480] Avg episode reward: [(0, '11.407')] +[2024-08-25 06:00:01,927][03615] Updated weights for policy 0, policy_version 470 (0.0022) +[2024-08-25 06:00:02,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1925120. Throughput: 0: 901.9. Samples: 480210. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:02,451][00480] Avg episode reward: [(0, '12.072')] +[2024-08-25 06:00:02,465][03602] Saving new best policy, reward=12.072! +[2024-08-25 06:00:07,451][00480] Fps is (10 sec: 4094.1, 60 sec: 3686.3, 300 sec: 3665.5). Total num frames: 1945600. Throughput: 0: 903.8. Samples: 486476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:00:07,455][00480] Avg episode reward: [(0, '11.538')] +[2024-08-25 06:00:12,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1961984. Throughput: 0: 924.5. Samples: 489336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:12,454][00480] Avg episode reward: [(0, '12.173')] +[2024-08-25 06:00:12,469][03602] Saving new best policy, reward=12.173! +[2024-08-25 06:00:13,398][03615] Updated weights for policy 0, policy_version 480 (0.0017) +[2024-08-25 06:00:17,446][00480] Fps is (10 sec: 3278.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1978368. Throughput: 0: 902.0. Samples: 493782. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:17,449][00480] Avg episode reward: [(0, '11.923')] +[2024-08-25 06:00:22,447][00480] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 1998848. Throughput: 0: 906.0. Samples: 500226. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:22,448][00480] Avg episode reward: [(0, '12.402')] +[2024-08-25 06:00:22,458][03602] Saving new best policy, reward=12.402! 
+[2024-08-25 06:00:23,463][03615] Updated weights for policy 0, policy_version 490 (0.0020) +[2024-08-25 06:00:27,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2019328. Throughput: 0: 928.6. Samples: 503382. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:27,456][00480] Avg episode reward: [(0, '13.813')] +[2024-08-25 06:00:27,461][03602] Saving new best policy, reward=13.813! +[2024-08-25 06:00:32,447][00480] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2035712. Throughput: 0: 911.9. Samples: 507560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:00:32,454][00480] Avg episode reward: [(0, '14.506')] +[2024-08-25 06:00:32,468][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth... +[2024-08-25 06:00:32,468][00480] Components not started: RolloutWorker_w5, RolloutWorker_w6, RolloutWorker_w7, wait_time=600.0 seconds +[2024-08-25 06:00:32,585][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth +[2024-08-25 06:00:32,601][03602] Saving new best policy, reward=14.506! +[2024-08-25 06:00:35,415][03615] Updated weights for policy 0, policy_version 500 (0.0016) +[2024-08-25 06:00:37,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2056192. Throughput: 0: 906.8. Samples: 513708. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:37,449][00480] Avg episode reward: [(0, '15.643')] +[2024-08-25 06:00:37,451][03602] Saving new best policy, reward=15.643! +[2024-08-25 06:00:42,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2072576. Throughput: 0: 918.0. Samples: 516792. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:42,449][00480] Avg episode reward: [(0, '16.508')] +[2024-08-25 06:00:42,468][03602] Saving new best policy, reward=16.508! 
+[2024-08-25 06:00:47,260][03615] Updated weights for policy 0, policy_version 510 (0.0014) +[2024-08-25 06:00:47,447][00480] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2088960. Throughput: 0: 909.8. Samples: 521150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:00:47,453][00480] Avg episode reward: [(0, '15.648')] +[2024-08-25 06:00:52,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2109440. Throughput: 0: 908.4. Samples: 527348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:52,448][00480] Avg episode reward: [(0, '16.676')] +[2024-08-25 06:00:52,459][03602] Saving new best policy, reward=16.676! +[2024-08-25 06:00:57,321][03615] Updated weights for policy 0, policy_version 520 (0.0012) +[2024-08-25 06:00:57,446][00480] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2129920. Throughput: 0: 914.4. Samples: 530482. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:00:57,449][00480] Avg episode reward: [(0, '16.454')] +[2024-08-25 06:01:02,447][00480] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2142208. Throughput: 0: 918.1. Samples: 535098. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:01:02,449][00480] Avg episode reward: [(0, '16.847')] +[2024-08-25 06:01:02,457][03602] Saving new best policy, reward=16.847! +[2024-08-25 06:01:07,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3637.8). Total num frames: 2162688. Throughput: 0: 898.5. Samples: 540660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:01:07,452][00480] Avg episode reward: [(0, '16.685')] +[2024-08-25 06:01:09,071][03615] Updated weights for policy 0, policy_version 530 (0.0015) +[2024-08-25 06:01:12,447][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2183168. Throughput: 0: 898.7. Samples: 543824. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:01:12,452][00480] Avg episode reward: [(0, '16.363')] +[2024-08-25 06:01:17,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2195456. Throughput: 0: 922.8. Samples: 549084. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:01:17,451][00480] Avg episode reward: [(0, '16.161')] +[2024-08-25 06:01:20,592][03615] Updated weights for policy 0, policy_version 540 (0.0014) +[2024-08-25 06:01:22,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2215936. Throughput: 0: 905.3. Samples: 554446. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:01:22,449][00480] Avg episode reward: [(0, '15.664')] +[2024-08-25 06:01:27,446][00480] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2240512. Throughput: 0: 908.8. Samples: 557686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:01:27,449][00480] Avg episode reward: [(0, '15.687')] +[2024-08-25 06:01:31,032][03615] Updated weights for policy 0, policy_version 550 (0.0017) +[2024-08-25 06:01:32,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2252800. Throughput: 0: 937.1. Samples: 563320. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:01:32,448][00480] Avg episode reward: [(0, '14.680')] +[2024-08-25 06:01:37,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2273280. Throughput: 0: 908.8. Samples: 568246. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:01:37,454][00480] Avg episode reward: [(0, '14.742')] +[2024-08-25 06:01:41,868][03615] Updated weights for policy 0, policy_version 560 (0.0012) +[2024-08-25 06:01:42,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2293760. Throughput: 0: 910.3. Samples: 571446. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:01:42,454][00480] Avg episode reward: [(0, '14.283')] +[2024-08-25 06:01:47,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2310144. Throughput: 0: 942.2. Samples: 577496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:01:47,449][00480] Avg episode reward: [(0, '14.161')] +[2024-08-25 06:01:52,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2326528. Throughput: 0: 920.7. Samples: 582090. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:01:52,454][00480] Avg episode reward: [(0, '14.466')] +[2024-08-25 06:01:53,466][03615] Updated weights for policy 0, policy_version 570 (0.0023) +[2024-08-25 06:01:57,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2351104. Throughput: 0: 921.8. Samples: 585304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:01:57,453][00480] Avg episode reward: [(0, '14.510')] +[2024-08-25 06:02:02,449][00480] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3665.5). Total num frames: 2367488. Throughput: 0: 948.6. Samples: 591772. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:02:02,452][00480] Avg episode reward: [(0, '14.813')] +[2024-08-25 06:02:04,133][03615] Updated weights for policy 0, policy_version 580 (0.0013) +[2024-08-25 06:02:07,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2383872. Throughput: 0: 922.9. Samples: 595976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:02:07,449][00480] Avg episode reward: [(0, '14.748')] +[2024-08-25 06:02:12,446][00480] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2404352. Throughput: 0: 920.9. Samples: 599126. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:02:12,448][00480] Avg episode reward: [(0, '16.496')] +[2024-08-25 06:02:14,705][03615] Updated weights for policy 0, policy_version 590 (0.0012) +[2024-08-25 06:02:17,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2424832. Throughput: 0: 938.4. Samples: 605546. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:02:17,452][00480] Avg episode reward: [(0, '17.592')] +[2024-08-25 06:02:17,469][03602] Saving new best policy, reward=17.592! +[2024-08-25 06:02:22,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2437120. Throughput: 0: 925.9. Samples: 609912. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:02:22,449][00480] Avg episode reward: [(0, '17.952')] +[2024-08-25 06:02:22,458][03602] Saving new best policy, reward=17.952! +[2024-08-25 06:02:26,613][03615] Updated weights for policy 0, policy_version 600 (0.0018) +[2024-08-25 06:02:27,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2457600. Throughput: 0: 917.8. Samples: 612746. Policy #0 lag: (min: 0.0, avg: 0.2, max: 2.0) +[2024-08-25 06:02:27,449][00480] Avg episode reward: [(0, '18.678')] +[2024-08-25 06:02:27,452][03602] Saving new best policy, reward=18.678! +[2024-08-25 06:02:32,447][00480] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2482176. Throughput: 0: 923.8. Samples: 619068. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:02:32,455][00480] Avg episode reward: [(0, '17.784')] +[2024-08-25 06:02:32,468][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000606_2482176.pth... +[2024-08-25 06:02:32,614][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000391_1601536.pth +[2024-08-25 06:02:37,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2494464. 
Throughput: 0: 926.1. Samples: 623766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-25 06:02:37,449][00480] Avg episode reward: [(0, '18.023')] +[2024-08-25 06:02:38,478][03615] Updated weights for policy 0, policy_version 610 (0.0013) +[2024-08-25 06:02:42,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2514944. Throughput: 0: 910.4. Samples: 626274. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:02:42,454][00480] Avg episode reward: [(0, '17.891')] +[2024-08-25 06:02:47,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2535424. Throughput: 0: 910.8. Samples: 632754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:02:47,449][00480] Avg episode reward: [(0, '18.803')] +[2024-08-25 06:02:47,457][03602] Saving new best policy, reward=18.803! +[2024-08-25 06:02:48,121][03615] Updated weights for policy 0, policy_version 620 (0.0014) +[2024-08-25 06:02:52,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2551808. Throughput: 0: 928.6. Samples: 637762. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:02:52,456][00480] Avg episode reward: [(0, '18.727')] +[2024-08-25 06:02:57,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2568192. Throughput: 0: 907.5. Samples: 639964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:02:57,455][00480] Avg episode reward: [(0, '19.218')] +[2024-08-25 06:02:57,458][03602] Saving new best policy, reward=19.218! +[2024-08-25 06:02:59,791][03615] Updated weights for policy 0, policy_version 630 (0.0021) +[2024-08-25 06:03:02,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 2588672. Throughput: 0: 907.6. Samples: 646388. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:03:02,452][00480] Avg episode reward: [(0, '18.159')] +[2024-08-25 06:03:07,452][00480] Fps is (10 sec: 3684.3, 60 sec: 3686.1, 300 sec: 3651.6). Total num frames: 2605056. Throughput: 0: 928.7. Samples: 651710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:03:07,461][00480] Avg episode reward: [(0, '18.746')] +[2024-08-25 06:03:11,710][03615] Updated weights for policy 0, policy_version 640 (0.0015) +[2024-08-25 06:03:12,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2621440. Throughput: 0: 911.2. Samples: 653752. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:03:12,449][00480] Avg episode reward: [(0, '19.480')] +[2024-08-25 06:03:12,460][03602] Saving new best policy, reward=19.480! +[2024-08-25 06:03:17,446][00480] Fps is (10 sec: 3688.5, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2641920. Throughput: 0: 908.0. Samples: 659928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:03:17,453][00480] Avg episode reward: [(0, '20.983')] +[2024-08-25 06:03:17,476][03602] Saving new best policy, reward=20.983! +[2024-08-25 06:03:22,046][03615] Updated weights for policy 0, policy_version 650 (0.0027) +[2024-08-25 06:03:22,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2662400. Throughput: 0: 933.2. Samples: 665762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:03:22,451][00480] Avg episode reward: [(0, '21.454')] +[2024-08-25 06:03:22,461][03602] Saving new best policy, reward=21.454! +[2024-08-25 06:03:27,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2678784. Throughput: 0: 921.4. Samples: 667738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:03:27,449][00480] Avg episode reward: [(0, '21.886')] +[2024-08-25 06:03:27,452][03602] Saving new best policy, reward=21.886! 
+[2024-08-25 06:03:32,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2699264. Throughput: 0: 905.2. Samples: 673488. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:03:32,451][00480] Avg episode reward: [(0, '21.374')] +[2024-08-25 06:03:33,168][03615] Updated weights for policy 0, policy_version 660 (0.0015) +[2024-08-25 06:03:37,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2715648. Throughput: 0: 927.9. Samples: 679518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:03:37,449][00480] Avg episode reward: [(0, '20.750')] +[2024-08-25 06:03:42,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2732032. Throughput: 0: 923.2. Samples: 681508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:03:42,449][00480] Avg episode reward: [(0, '19.632')] +[2024-08-25 06:03:45,114][03615] Updated weights for policy 0, policy_version 670 (0.0013) +[2024-08-25 06:03:47,447][00480] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2752512. Throughput: 0: 904.6. Samples: 687096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:03:47,449][00480] Avg episode reward: [(0, '18.347')] +[2024-08-25 06:03:52,448][00480] Fps is (10 sec: 4095.5, 60 sec: 3686.3, 300 sec: 3679.4). Total num frames: 2772992. Throughput: 0: 929.0. Samples: 693512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:03:52,451][00480] Avg episode reward: [(0, '18.751')] +[2024-08-25 06:03:56,169][03615] Updated weights for policy 0, policy_version 680 (0.0012) +[2024-08-25 06:03:57,449][00480] Fps is (10 sec: 3276.1, 60 sec: 3618.0, 300 sec: 3651.7). Total num frames: 2785280. Throughput: 0: 929.8. Samples: 695594. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:03:57,451][00480] Avg episode reward: [(0, '18.917')] +[2024-08-25 06:04:02,446][00480] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2805760. Throughput: 0: 908.4. Samples: 700804. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:02,449][00480] Avg episode reward: [(0, '18.106')] +[2024-08-25 06:04:06,395][03615] Updated weights for policy 0, policy_version 690 (0.0013) +[2024-08-25 06:04:07,446][00480] Fps is (10 sec: 4506.7, 60 sec: 3755.0, 300 sec: 3693.3). Total num frames: 2830336. Throughput: 0: 920.7. Samples: 707192. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:07,454][00480] Avg episode reward: [(0, '18.207')] +[2024-08-25 06:04:12,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2842624. Throughput: 0: 933.6. Samples: 709748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:04:12,456][00480] Avg episode reward: [(0, '18.302')] +[2024-08-25 06:04:17,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2863104. Throughput: 0: 915.1. Samples: 714668. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:17,449][00480] Avg episode reward: [(0, '18.936')] +[2024-08-25 06:04:18,109][03615] Updated weights for policy 0, policy_version 700 (0.0014) +[2024-08-25 06:04:22,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2883584. Throughput: 0: 925.6. Samples: 721170. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:04:22,453][00480] Avg episode reward: [(0, '19.119')] +[2024-08-25 06:04:27,447][00480] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2899968. Throughput: 0: 946.1. Samples: 724084. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:27,455][00480] Avg episode reward: [(0, '19.149')] +[2024-08-25 06:04:29,387][03615] Updated weights for policy 0, policy_version 710 (0.0015) +[2024-08-25 06:04:32,447][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2920448. Throughput: 0: 923.1. Samples: 728634. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:04:32,453][00480] Avg episode reward: [(0, '19.275')] +[2024-08-25 06:04:32,469][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000713_2920448.pth... +[2024-08-25 06:04:32,571][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth +[2024-08-25 06:04:37,446][00480] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2936832. Throughput: 0: 915.9. Samples: 734728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:37,451][00480] Avg episode reward: [(0, '19.119')] +[2024-08-25 06:04:39,585][03615] Updated weights for policy 0, policy_version 720 (0.0013) +[2024-08-25 06:04:42,450][00480] Fps is (10 sec: 3275.6, 60 sec: 3686.2, 300 sec: 3665.5). Total num frames: 2953216. Throughput: 0: 937.8. Samples: 737796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:42,454][00480] Avg episode reward: [(0, '18.364')] +[2024-08-25 06:04:47,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2969600. Throughput: 0: 911.8. Samples: 741836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:47,449][00480] Avg episode reward: [(0, '17.643')] +[2024-08-25 06:04:51,539][03615] Updated weights for policy 0, policy_version 730 (0.0014) +[2024-08-25 06:04:52,446][00480] Fps is (10 sec: 3687.7, 60 sec: 3618.2, 300 sec: 3679.5). Total num frames: 2990080. Throughput: 0: 908.6. Samples: 748080. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:04:52,450][00480] Avg episode reward: [(0, '16.909')] +[2024-08-25 06:04:57,448][00480] Fps is (10 sec: 4095.4, 60 sec: 3754.7, 300 sec: 3679.4). Total num frames: 3010560. Throughput: 0: 921.1. Samples: 751200. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:04:57,455][00480] Avg episode reward: [(0, '17.785')] +[2024-08-25 06:05:02,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3026944. Throughput: 0: 908.0. Samples: 755526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:05:02,453][00480] Avg episode reward: [(0, '18.018')] +[2024-08-25 06:05:03,356][03615] Updated weights for policy 0, policy_version 740 (0.0018) +[2024-08-25 06:05:07,446][00480] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3047424. Throughput: 0: 896.0. Samples: 761488. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:05:07,451][00480] Avg episode reward: [(0, '20.393')] +[2024-08-25 06:05:12,447][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3063808. Throughput: 0: 901.3. Samples: 764640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:05:12,451][00480] Avg episode reward: [(0, '21.753')] +[2024-08-25 06:05:14,197][03615] Updated weights for policy 0, policy_version 750 (0.0012) +[2024-08-25 06:05:17,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3080192. Throughput: 0: 906.4. Samples: 769422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:05:17,456][00480] Avg episode reward: [(0, '22.673')] +[2024-08-25 06:05:17,460][03602] Saving new best policy, reward=22.673! +[2024-08-25 06:05:22,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3100672. Throughput: 0: 896.9. Samples: 775090. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:05:22,450][00480] Avg episode reward: [(0, '23.062')] +[2024-08-25 06:05:22,459][03602] Saving new best policy, reward=23.062! +[2024-08-25 06:05:25,182][03615] Updated weights for policy 0, policy_version 760 (0.0012) +[2024-08-25 06:05:27,447][00480] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3121152. Throughput: 0: 895.5. Samples: 778090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:05:27,449][00480] Avg episode reward: [(0, '21.614')] +[2024-08-25 06:05:32,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3133440. Throughput: 0: 914.8. Samples: 783002. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:05:32,451][00480] Avg episode reward: [(0, '21.938')] +[2024-08-25 06:05:37,148][03615] Updated weights for policy 0, policy_version 770 (0.0015) +[2024-08-25 06:05:37,447][00480] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3153920. Throughput: 0: 894.1. Samples: 788316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:05:37,449][00480] Avg episode reward: [(0, '19.841')] +[2024-08-25 06:05:42,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 3174400. Throughput: 0: 895.5. Samples: 791496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:05:42,449][00480] Avg episode reward: [(0, '19.912')] +[2024-08-25 06:05:47,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3186688. Throughput: 0: 917.1. Samples: 796796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:05:47,452][00480] Avg episode reward: [(0, '19.923')] +[2024-08-25 06:05:48,903][03615] Updated weights for policy 0, policy_version 780 (0.0015) +[2024-08-25 06:05:52,447][00480] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3207168. Throughput: 0: 902.5. Samples: 802102. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:05:52,448][00480] Avg episode reward: [(0, '19.104')] +[2024-08-25 06:05:57,446][00480] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 3231744. Throughput: 0: 901.9. Samples: 805226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:05:57,449][00480] Avg episode reward: [(0, '18.008')] +[2024-08-25 06:05:58,472][03615] Updated weights for policy 0, policy_version 790 (0.0013) +[2024-08-25 06:06:02,448][00480] Fps is (10 sec: 3685.9, 60 sec: 3618.0, 300 sec: 3665.6). Total num frames: 3244032. Throughput: 0: 924.1. Samples: 811006. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:06:02,455][00480] Avg episode reward: [(0, '18.920')] +[2024-08-25 06:06:07,448][00480] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3665.6). Total num frames: 3264512. Throughput: 0: 902.6. Samples: 815708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:06:07,450][00480] Avg episode reward: [(0, '19.297')] +[2024-08-25 06:06:10,357][03615] Updated weights for policy 0, policy_version 800 (0.0019) +[2024-08-25 06:06:12,447][00480] Fps is (10 sec: 4096.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3284992. Throughput: 0: 906.8. Samples: 818894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:06:12,450][00480] Avg episode reward: [(0, '18.989')] +[2024-08-25 06:06:17,446][00480] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3301376. Throughput: 0: 936.0. Samples: 825124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:06:17,449][00480] Avg episode reward: [(0, '21.119')] +[2024-08-25 06:06:21,909][03615] Updated weights for policy 0, policy_version 810 (0.0024) +[2024-08-25 06:06:22,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3317760. Throughput: 0: 917.5. Samples: 829604. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:06:22,452][00480] Avg episode reward: [(0, '21.175')] +[2024-08-25 06:06:27,449][00480] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 3338240. Throughput: 0: 917.6. Samples: 832790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:06:27,451][00480] Avg episode reward: [(0, '21.815')] +[2024-08-25 06:06:31,634][03615] Updated weights for policy 0, policy_version 820 (0.0017) +[2024-08-25 06:06:32,446][00480] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3358720. Throughput: 0: 941.8. Samples: 839178. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:06:32,455][00480] Avg episode reward: [(0, '20.788')] +[2024-08-25 06:06:32,474][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth... +[2024-08-25 06:06:32,595][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000606_2482176.pth +[2024-08-25 06:06:37,450][00480] Fps is (10 sec: 3276.5, 60 sec: 3617.9, 300 sec: 3651.6). Total num frames: 3371008. Throughput: 0: 917.1. Samples: 843374. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:06:37,459][00480] Avg episode reward: [(0, '21.850')] +[2024-08-25 06:06:42,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3395584. Throughput: 0: 915.3. Samples: 846414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:06:42,449][00480] Avg episode reward: [(0, '21.540')] +[2024-08-25 06:06:43,387][03615] Updated weights for policy 0, policy_version 830 (0.0026) +[2024-08-25 06:06:47,449][00480] Fps is (10 sec: 4506.3, 60 sec: 3822.8, 300 sec: 3693.3). Total num frames: 3416064. Throughput: 0: 929.5. Samples: 852832. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:06:47,451][00480] Avg episode reward: [(0, '22.087')] +[2024-08-25 06:06:52,447][00480] Fps is (10 sec: 3276.5, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3428352. Throughput: 0: 928.1. Samples: 857470. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:06:52,450][00480] Avg episode reward: [(0, '21.354')] +[2024-08-25 06:06:54,955][03615] Updated weights for policy 0, policy_version 840 (0.0017) +[2024-08-25 06:06:57,447][00480] Fps is (10 sec: 3277.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3448832. Throughput: 0: 918.8. Samples: 860240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-08-25 06:06:57,452][00480] Avg episode reward: [(0, '21.976')] +[2024-08-25 06:07:02,446][00480] Fps is (10 sec: 4096.3, 60 sec: 3754.8, 300 sec: 3679.5). Total num frames: 3469312. Throughput: 0: 921.6. Samples: 866596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:07:02,453][00480] Avg episode reward: [(0, '21.892')] +[2024-08-25 06:07:05,526][03615] Updated weights for policy 0, policy_version 850 (0.0012) +[2024-08-25 06:07:07,446][00480] Fps is (10 sec: 3686.5, 60 sec: 3686.5, 300 sec: 3665.6). Total num frames: 3485696. Throughput: 0: 928.2. Samples: 871372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:07:07,452][00480] Avg episode reward: [(0, '21.878')] +[2024-08-25 06:07:12,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3502080. Throughput: 0: 909.4. Samples: 873710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:07:12,456][00480] Avg episode reward: [(0, '20.616')] +[2024-08-25 06:07:16,749][03615] Updated weights for policy 0, policy_version 860 (0.0012) +[2024-08-25 06:07:17,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3522560. Throughput: 0: 906.8. Samples: 879986. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:07:17,455][00480] Avg episode reward: [(0, '20.713')] +[2024-08-25 06:07:22,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3538944. Throughput: 0: 926.7. Samples: 885072. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:07:22,450][00480] Avg episode reward: [(0, '20.813')] +[2024-08-25 06:07:27,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3637.8). Total num frames: 3555328. Throughput: 0: 901.8. Samples: 886996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:07:27,452][00480] Avg episode reward: [(0, '20.510')] +[2024-08-25 06:07:28,950][03615] Updated weights for policy 0, policy_version 870 (0.0011) +[2024-08-25 06:07:32,447][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3575808. Throughput: 0: 894.8. Samples: 893098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:07:32,449][00480] Avg episode reward: [(0, '20.538')] +[2024-08-25 06:07:37,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3651.7). Total num frames: 3592192. Throughput: 0: 910.6. Samples: 898448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:07:37,451][00480] Avg episode reward: [(0, '21.309')] +[2024-08-25 06:07:41,109][03615] Updated weights for policy 0, policy_version 880 (0.0012) +[2024-08-25 06:07:42,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 3608576. Throughput: 0: 893.2. Samples: 900434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:07:42,448][00480] Avg episode reward: [(0, '20.922')] +[2024-08-25 06:07:47,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3651.7). Total num frames: 3629056. Throughput: 0: 887.0. Samples: 906510. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:07:47,449][00480] Avg episode reward: [(0, '19.915')] +[2024-08-25 06:07:50,614][03615] Updated weights for policy 0, policy_version 890 (0.0014) +[2024-08-25 06:07:52,447][00480] Fps is (10 sec: 4095.7, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3649536. Throughput: 0: 912.7. Samples: 912446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:07:52,453][00480] Avg episode reward: [(0, '19.539')] +[2024-08-25 06:07:57,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 3661824. Throughput: 0: 904.1. Samples: 914396. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:07:57,452][00480] Avg episode reward: [(0, '20.936')] +[2024-08-25 06:08:02,446][00480] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3651.8). Total num frames: 3682304. Throughput: 0: 889.9. Samples: 920030. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:08:02,452][00480] Avg episode reward: [(0, '20.399')] +[2024-08-25 06:08:02,467][03615] Updated weights for policy 0, policy_version 900 (0.0013) +[2024-08-25 06:08:07,451][00480] Fps is (10 sec: 4094.1, 60 sec: 3617.9, 300 sec: 3665.5). Total num frames: 3702784. Throughput: 0: 914.6. Samples: 926234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:08:07,453][00480] Avg episode reward: [(0, '20.627')] +[2024-08-25 06:08:12,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3719168. Throughput: 0: 916.4. Samples: 928232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:08:12,452][00480] Avg episode reward: [(0, '21.845')] +[2024-08-25 06:08:14,285][03615] Updated weights for policy 0, policy_version 910 (0.0012) +[2024-08-25 06:08:17,446][00480] Fps is (10 sec: 3688.1, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3739648. Throughput: 0: 902.0. Samples: 933690. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:08:17,450][00480] Avg episode reward: [(0, '22.722')] +[2024-08-25 06:08:22,447][00480] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3760128. Throughput: 0: 924.1. Samples: 940032. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:08:22,451][00480] Avg episode reward: [(0, '21.768')] +[2024-08-25 06:08:24,889][03615] Updated weights for policy 0, policy_version 920 (0.0012) +[2024-08-25 06:08:27,447][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3772416. Throughput: 0: 930.1. Samples: 942290. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:08:27,449][00480] Avg episode reward: [(0, '22.452')] +[2024-08-25 06:08:32,447][00480] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3792896. Throughput: 0: 901.8. Samples: 947090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:08:32,449][00480] Avg episode reward: [(0, '23.764')] +[2024-08-25 06:08:32,462][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000926_3792896.pth... +[2024-08-25 06:08:32,583][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000713_2920448.pth +[2024-08-25 06:08:32,597][03602] Saving new best policy, reward=23.764! +[2024-08-25 06:08:36,162][03615] Updated weights for policy 0, policy_version 930 (0.0013) +[2024-08-25 06:08:37,446][00480] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3813376. Throughput: 0: 905.1. Samples: 953176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:08:37,449][00480] Avg episode reward: [(0, '23.282')] +[2024-08-25 06:08:42,451][00480] Fps is (10 sec: 3275.2, 60 sec: 3617.8, 300 sec: 3637.7). Total num frames: 3825664. Throughput: 0: 919.6. Samples: 955782. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:08:42,454][00480] Avg episode reward: [(0, '22.241')] +[2024-08-25 06:08:47,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3846144. Throughput: 0: 896.6. Samples: 960376. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:08:47,449][00480] Avg episode reward: [(0, '21.647')] +[2024-08-25 06:08:48,129][03615] Updated weights for policy 0, policy_version 940 (0.0013) +[2024-08-25 06:08:52,447][00480] Fps is (10 sec: 4098.0, 60 sec: 3618.2, 300 sec: 3665.6). Total num frames: 3866624. Throughput: 0: 898.4. Samples: 966656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:08:52,453][00480] Avg episode reward: [(0, '21.906')] +[2024-08-25 06:08:57,446][00480] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3883008. Throughput: 0: 919.2. Samples: 969598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:08:57,453][00480] Avg episode reward: [(0, '20.080')] +[2024-08-25 06:09:00,087][03615] Updated weights for policy 0, policy_version 950 (0.0018) +[2024-08-25 06:09:02,446][00480] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3899392. Throughput: 0: 891.6. Samples: 973810. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:09:02,449][00480] Avg episode reward: [(0, '20.054')] +[2024-08-25 06:09:07,448][00480] Fps is (10 sec: 3685.8, 60 sec: 3618.3, 300 sec: 3651.7). Total num frames: 3919872. Throughput: 0: 885.8. Samples: 979896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:09:07,452][00480] Avg episode reward: [(0, '19.286')] +[2024-08-25 06:09:10,515][03615] Updated weights for policy 0, policy_version 960 (0.0016) +[2024-08-25 06:09:12,450][00480] Fps is (10 sec: 3685.1, 60 sec: 3617.9, 300 sec: 3637.8). Total num frames: 3936256. Throughput: 0: 903.0. Samples: 982930. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-08-25 06:09:12,457][00480] Avg episode reward: [(0, '19.020')] +[2024-08-25 06:09:17,446][00480] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3952640. Throughput: 0: 886.4. Samples: 986978. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:09:17,449][00480] Avg episode reward: [(0, '19.082')] +[2024-08-25 06:09:22,275][03615] Updated weights for policy 0, policy_version 970 (0.0022) +[2024-08-25 06:09:22,447][00480] Fps is (10 sec: 3687.6, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 3973120. Throughput: 0: 888.2. Samples: 993144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-08-25 06:09:22,454][00480] Avg episode reward: [(0, '19.797')] +[2024-08-25 06:09:27,447][00480] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3989504. Throughput: 0: 900.4. Samples: 996294. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-08-25 06:09:27,451][00480] Avg episode reward: [(0, '21.077')] +[2024-08-25 06:09:32,196][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-08-25 06:09:32,202][00480] Component Batcher_0 stopped! +[2024-08-25 06:09:32,210][00480] Component RolloutWorker_w5 process died already! Don't wait for it. +[2024-08-25 06:09:32,197][03602] Stopping Batcher_0... +[2024-08-25 06:09:32,216][03602] Loop batcher_evt_loop terminating... +[2024-08-25 06:09:32,217][00480] Component RolloutWorker_w6 process died already! Don't wait for it. +[2024-08-25 06:09:32,222][00480] Component RolloutWorker_w7 process died already! Don't wait for it. +[2024-08-25 06:09:32,262][03615] Weights refcount: 2 0 +[2024-08-25 06:09:32,265][00480] Component InferenceWorker_p0-w0 stopped! +[2024-08-25 06:09:32,274][03615] Stopping InferenceWorker_p0-w0... +[2024-08-25 06:09:32,274][03615] Loop inference_proc0-0_evt_loop terminating... 
+[2024-08-25 06:09:32,296][03602] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth +[2024-08-25 06:09:32,311][03602] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-08-25 06:09:32,478][03602] Stopping LearnerWorker_p0... +[2024-08-25 06:09:32,479][03602] Loop learner_proc0_evt_loop terminating... +[2024-08-25 06:09:32,478][00480] Component LearnerWorker_p0 stopped! +[2024-08-25 06:09:32,562][00480] Component RolloutWorker_w2 stopped! +[2024-08-25 06:09:32,567][03619] Stopping RolloutWorker_w2... +[2024-08-25 06:09:32,574][00480] Component RolloutWorker_w0 stopped! +[2024-08-25 06:09:32,580][03616] Stopping RolloutWorker_w0... +[2024-08-25 06:09:32,571][03619] Loop rollout_proc2_evt_loop terminating... +[2024-08-25 06:09:32,582][03616] Loop rollout_proc0_evt_loop terminating... +[2024-08-25 06:09:32,587][00480] Component RolloutWorker_w4 stopped! +[2024-08-25 06:09:32,591][03620] Stopping RolloutWorker_w4... +[2024-08-25 06:09:32,593][03620] Loop rollout_proc4_evt_loop terminating... +[2024-08-25 06:09:32,703][03618] Stopping RolloutWorker_w3... +[2024-08-25 06:09:32,705][00480] Component RolloutWorker_w3 stopped! +[2024-08-25 06:09:32,720][03618] Loop rollout_proc3_evt_loop terminating... +[2024-08-25 06:09:32,738][03617] Stopping RolloutWorker_w1... +[2024-08-25 06:09:32,739][00480] Component RolloutWorker_w1 stopped! +[2024-08-25 06:09:32,750][00480] Waiting for process learner_proc0 to stop... +[2024-08-25 06:09:32,740][03617] Loop rollout_proc1_evt_loop terminating... +[2024-08-25 06:09:33,930][00480] Waiting for process inference_proc0-0 to join... +[2024-08-25 06:09:34,098][00480] Waiting for process rollout_proc0 to join... +[2024-08-25 06:09:34,896][00480] Waiting for process rollout_proc1 to join... +[2024-08-25 06:09:34,901][00480] Waiting for process rollout_proc2 to join... +[2024-08-25 06:09:34,905][00480] Waiting for process rollout_proc3 to join... 
+[2024-08-25 06:09:34,908][00480] Waiting for process rollout_proc4 to join... +[2024-08-25 06:09:34,913][00480] Waiting for process rollout_proc5 to join... +[2024-08-25 06:09:34,915][00480] Waiting for process rollout_proc6 to join... +[2024-08-25 06:09:34,918][00480] Waiting for process rollout_proc7 to join... +[2024-08-25 06:09:34,920][00480] Batcher 0 profile tree view: +batching: 24.1132, releasing_batches: 0.0209 +[2024-08-25 06:09:34,925][00480] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0032 + wait_policy_total: 468.6072 +update_model: 8.0840 + weight_update: 0.0016 +one_step: 0.0027 + handle_policy_step: 592.4449 + deserialize: 15.4385, stack: 3.2987, obs_to_device_normalize: 130.1822, forward: 301.0249, send_messages: 22.8562 + prepare_outputs: 89.9754 + to_cpu: 57.0471 +[2024-08-25 06:09:34,927][00480] Learner 0 profile tree view: +misc: 0.0049, prepare_batch: 15.6415 +train: 70.3593 + epoch_init: 0.0057, minibatch_init: 0.0066, losses_postprocess: 0.5199, kl_divergence: 0.4614, after_optimizer: 32.9373 + calculate_losses: 22.7038 + losses_init: 0.0075, forward_head: 1.5812, bptt_initial: 15.0678, tail: 1.0409, advantages_returns: 0.2310, losses: 2.5043 + bptt: 1.9685 + bptt_forward_core: 1.8872 + update: 13.2225 + clip: 1.3945 +[2024-08-25 06:09:34,929][00480] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.4991, enqueue_policy_requests: 119.2249, env_step: 857.7475, overhead: 17.8712, complete_rollouts: 8.7535 +save_policy_outputs: 33.2262 + split_output_tensors: 11.4967 +[2024-08-25 06:09:34,930][00480] Loop Runner_EvtLoop terminating... 
+[2024-08-25 06:09:34,932][00480] Runner profile tree view: +main_loop: 1136.6234 +[2024-08-25 06:09:34,933][00480] Collected {0: 4005888}, FPS: 3524.4 +[2024-08-25 06:09:35,184][00480] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-08-25 06:09:35,186][00480] Overriding arg 'num_workers' with value 1 passed from command line +[2024-08-25 06:09:35,189][00480] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-08-25 06:09:35,191][00480] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-08-25 06:09:35,193][00480] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-08-25 06:09:35,195][00480] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-08-25 06:09:35,196][00480] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-08-25 06:09:35,197][00480] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-08-25 06:09:35,199][00480] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-08-25 06:09:35,200][00480] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-08-25 06:09:35,201][00480] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-08-25 06:09:35,202][00480] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-08-25 06:09:35,203][00480] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-08-25 06:09:35,204][00480] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-08-25 06:09:35,206][00480] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-08-25 06:09:35,229][00480] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-08-25 06:09:35,231][00480] RunningMeanStd input shape: (3, 72, 128) +[2024-08-25 06:09:35,233][00480] RunningMeanStd input shape: (1,) +[2024-08-25 06:09:35,250][00480] ConvEncoder: input_channels=3 +[2024-08-25 06:09:35,376][00480] Conv encoder output size: 512 +[2024-08-25 06:09:35,378][00480] Policy head output size: 512 +[2024-08-25 06:09:36,963][00480] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-08-25 06:09:37,800][00480] Num frames 100... +[2024-08-25 06:09:37,919][00480] Num frames 200... +[2024-08-25 06:09:38,047][00480] Num frames 300... +[2024-08-25 06:09:38,165][00480] Num frames 400... +[2024-08-25 06:09:38,293][00480] Num frames 500... +[2024-08-25 06:09:38,417][00480] Num frames 600... +[2024-08-25 06:09:38,539][00480] Num frames 700... +[2024-08-25 06:09:38,659][00480] Num frames 800... +[2024-08-25 06:09:38,711][00480] Avg episode rewards: #0: 19.000, true rewards: #0: 8.000 +[2024-08-25 06:09:38,712][00480] Avg episode reward: 19.000, avg true_objective: 8.000 +[2024-08-25 06:09:38,831][00480] Num frames 900... +[2024-08-25 06:09:38,964][00480] Num frames 1000... +[2024-08-25 06:09:39,080][00480] Num frames 1100... +[2024-08-25 06:09:39,197][00480] Num frames 1200... +[2024-08-25 06:09:39,323][00480] Num frames 1300... +[2024-08-25 06:09:39,488][00480] Avg episode rewards: #0: 14.465, true rewards: #0: 6.965 +[2024-08-25 06:09:39,490][00480] Avg episode reward: 14.465, avg true_objective: 6.965 +[2024-08-25 06:09:39,501][00480] Num frames 1400... +[2024-08-25 06:09:39,617][00480] Num frames 1500... +[2024-08-25 06:09:39,735][00480] Num frames 1600... +[2024-08-25 06:09:39,852][00480] Num frames 1700... +[2024-08-25 06:09:39,978][00480] Num frames 1800... 
+[2024-08-25 06:09:40,098][00480] Num frames 1900... +[2024-08-25 06:09:40,216][00480] Num frames 2000... +[2024-08-25 06:09:40,345][00480] Num frames 2100... +[2024-08-25 06:09:40,435][00480] Avg episode rewards: #0: 14.097, true rewards: #0: 7.097 +[2024-08-25 06:09:40,437][00480] Avg episode reward: 14.097, avg true_objective: 7.097 +[2024-08-25 06:09:40,519][00480] Num frames 2200... +[2024-08-25 06:09:40,638][00480] Num frames 2300... +[2024-08-25 06:09:40,753][00480] Num frames 2400... +[2024-08-25 06:09:40,874][00480] Num frames 2500... +[2024-08-25 06:09:40,999][00480] Num frames 2600... +[2024-08-25 06:09:41,108][00480] Avg episode rewards: #0: 12.353, true rewards: #0: 6.602 +[2024-08-25 06:09:41,110][00480] Avg episode reward: 12.353, avg true_objective: 6.602 +[2024-08-25 06:09:41,177][00480] Num frames 2700... +[2024-08-25 06:09:41,302][00480] Num frames 2800... +[2024-08-25 06:09:41,419][00480] Num frames 2900... +[2024-08-25 06:09:41,545][00480] Num frames 3000... +[2024-08-25 06:09:41,699][00480] Num frames 3100... +[2024-08-25 06:09:41,859][00480] Num frames 3200... +[2024-08-25 06:09:42,028][00480] Num frames 3300... +[2024-08-25 06:09:42,186][00480] Num frames 3400... +[2024-08-25 06:09:42,359][00480] Num frames 3500... +[2024-08-25 06:09:42,517][00480] Num frames 3600... +[2024-08-25 06:09:42,675][00480] Num frames 3700... +[2024-08-25 06:09:42,836][00480] Num frames 3800... +[2024-08-25 06:09:43,013][00480] Num frames 3900... +[2024-08-25 06:09:43,138][00480] Avg episode rewards: #0: 15.684, true rewards: #0: 7.884 +[2024-08-25 06:09:43,140][00480] Avg episode reward: 15.684, avg true_objective: 7.884 +[2024-08-25 06:09:43,237][00480] Num frames 4000... +[2024-08-25 06:09:43,413][00480] Num frames 4100... +[2024-08-25 06:09:43,586][00480] Num frames 4200... +[2024-08-25 06:09:43,752][00480] Num frames 4300... +[2024-08-25 06:09:43,923][00480] Num frames 4400... +[2024-08-25 06:09:44,109][00480] Num frames 4500... 
+[2024-08-25 06:09:44,227][00480] Num frames 4600... +[2024-08-25 06:09:44,347][00480] Num frames 4700... +[2024-08-25 06:09:44,475][00480] Num frames 4800... +[2024-08-25 06:09:44,593][00480] Num frames 4900... +[2024-08-25 06:09:44,652][00480] Avg episode rewards: #0: 16.837, true rewards: #0: 8.170 +[2024-08-25 06:09:44,653][00480] Avg episode reward: 16.837, avg true_objective: 8.170 +[2024-08-25 06:09:44,772][00480] Num frames 5000... +[2024-08-25 06:09:44,890][00480] Num frames 5100... +[2024-08-25 06:09:45,019][00480] Num frames 5200... +[2024-08-25 06:09:45,135][00480] Num frames 5300... +[2024-08-25 06:09:45,250][00480] Num frames 5400... +[2024-08-25 06:09:45,371][00480] Num frames 5500... +[2024-08-25 06:09:45,496][00480] Num frames 5600... +[2024-08-25 06:09:45,616][00480] Num frames 5700... +[2024-08-25 06:09:45,734][00480] Num frames 5800... +[2024-08-25 06:09:45,826][00480] Avg episode rewards: #0: 16.900, true rewards: #0: 8.329 +[2024-08-25 06:09:45,828][00480] Avg episode reward: 16.900, avg true_objective: 8.329 +[2024-08-25 06:09:45,914][00480] Num frames 5900... +[2024-08-25 06:09:46,045][00480] Num frames 6000... +[2024-08-25 06:09:46,160][00480] Num frames 6100... +[2024-08-25 06:09:46,280][00480] Num frames 6200... +[2024-08-25 06:09:46,401][00480] Num frames 6300... +[2024-08-25 06:09:46,531][00480] Num frames 6400... +[2024-08-25 06:09:46,648][00480] Num frames 6500... +[2024-08-25 06:09:46,766][00480] Num frames 6600... +[2024-08-25 06:09:46,882][00480] Num frames 6700... +[2024-08-25 06:09:46,988][00480] Avg episode rewards: #0: 17.173, true rewards: #0: 8.422 +[2024-08-25 06:09:46,989][00480] Avg episode reward: 17.173, avg true_objective: 8.422 +[2024-08-25 06:09:47,066][00480] Num frames 6800... +[2024-08-25 06:09:47,182][00480] Num frames 6900... +[2024-08-25 06:09:47,299][00480] Num frames 7000... +[2024-08-25 06:09:47,417][00480] Num frames 7100... +[2024-08-25 06:09:47,539][00480] Num frames 7200... 
+[2024-08-25 06:09:47,651][00480] Num frames 7300...
+[2024-08-25 06:09:47,768][00480] Num frames 7400...
+[2024-08-25 06:09:47,882][00480] Num frames 7500...
+[2024-08-25 06:09:48,012][00480] Num frames 7600...
+[2024-08-25 06:09:48,126][00480] Num frames 7700...
+[2024-08-25 06:09:48,243][00480] Num frames 7800...
+[2024-08-25 06:09:48,363][00480] Num frames 7900...
+[2024-08-25 06:09:48,479][00480] Num frames 8000...
+[2024-08-25 06:09:48,609][00480] Num frames 8100...
+[2024-08-25 06:09:48,731][00480] Num frames 8200...
+[2024-08-25 06:09:48,838][00480] Avg episode rewards: #0: 19.158, true rewards: #0: 9.158
+[2024-08-25 06:09:48,839][00480] Avg episode reward: 19.158, avg true_objective: 9.158
+[2024-08-25 06:09:48,911][00480] Num frames 8300...
+[2024-08-25 06:09:49,033][00480] Num frames 8400...
+[2024-08-25 06:09:49,150][00480] Num frames 8500...
+[2024-08-25 06:09:49,264][00480] Num frames 8600...
+[2024-08-25 06:09:49,384][00480] Num frames 8700...
+[2024-08-25 06:09:49,499][00480] Num frames 8800...
+[2024-08-25 06:09:49,622][00480] Num frames 8900...
+[2024-08-25 06:09:49,739][00480] Num frames 9000...
+[2024-08-25 06:09:49,854][00480] Num frames 9100...
+[2024-08-25 06:09:49,983][00480] Num frames 9200...
+[2024-08-25 06:09:50,105][00480] Num frames 9300...
+[2024-08-25 06:09:50,223][00480] Num frames 9400...
+[2024-08-25 06:09:50,310][00480] Avg episode rewards: #0: 20.026, true rewards: #0: 9.426
+[2024-08-25 06:09:50,311][00480] Avg episode reward: 20.026, avg true_objective: 9.426
+[2024-08-25 06:10:45,670][00480] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-08-25 06:12:24,134][00480] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-08-25 06:12:24,136][00480] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-08-25 06:12:24,138][00480] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-08-25 06:12:24,140][00480] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-08-25 06:12:24,141][00480] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-08-25 06:12:24,143][00480] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-08-25 06:12:24,145][00480] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-08-25 06:12:24,146][00480] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-08-25 06:12:24,148][00480] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-08-25 06:12:24,149][00480] Adding new argument 'hf_repository'='hugging-robot/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-08-25 06:12:24,151][00480] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-08-25 06:12:24,152][00480] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-08-25 06:12:24,155][00480] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-08-25 06:12:24,156][00480] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-08-25 06:12:24,158][00480] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-08-25 06:12:24,166][00480] RunningMeanStd input shape: (3, 72, 128)
+[2024-08-25 06:12:24,171][00480] RunningMeanStd input shape: (1,)
+[2024-08-25 06:12:24,193][00480] ConvEncoder: input_channels=3
+[2024-08-25 06:12:24,233][00480] Conv encoder output size: 512
+[2024-08-25 06:12:24,235][00480] Policy head output size: 512
+[2024-08-25 06:12:24,253][00480] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-08-25 06:12:24,716][00480] Num frames 100...
+[2024-08-25 06:12:24,833][00480] Num frames 200...
+[2024-08-25 06:12:24,959][00480] Num frames 300...
+[2024-08-25 06:12:25,128][00480] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2024-08-25 06:12:25,130][00480] Avg episode reward: 3.840, avg true_objective: 3.840
+[2024-08-25 06:12:25,151][00480] Num frames 400...
+[2024-08-25 06:12:25,279][00480] Num frames 500...
+[2024-08-25 06:12:25,400][00480] Num frames 600...
+[2024-08-25 06:12:25,519][00480] Num frames 700...
+[2024-08-25 06:12:25,633][00480] Num frames 800...
+[2024-08-25 06:12:25,748][00480] Num frames 900...
+[2024-08-25 06:12:25,860][00480] Num frames 1000...
+[2024-08-25 06:12:25,989][00480] Num frames 1100...
+[2024-08-25 06:12:26,109][00480] Num frames 1200...
+[2024-08-25 06:12:26,224][00480] Num frames 1300...
+[2024-08-25 06:12:26,351][00480] Num frames 1400...
+[2024-08-25 06:12:26,472][00480] Num frames 1500...
+[2024-08-25 06:12:26,594][00480] Num frames 1600...
+[2024-08-25 06:12:26,715][00480] Num frames 1700...
+[2024-08-25 06:12:26,833][00480] Num frames 1800...
+[2024-08-25 06:12:26,961][00480] Num frames 1900...
+[2024-08-25 06:12:27,088][00480] Num frames 2000...
+[2024-08-25 06:12:27,205][00480] Num frames 2100...
+[2024-08-25 06:12:27,332][00480] Num frames 2200...
+[2024-08-25 06:12:27,455][00480] Num frames 2300...
+[2024-08-25 06:12:27,534][00480] Avg episode rewards: #0: 29.075, true rewards: #0: 11.575
+[2024-08-25 06:12:27,536][00480] Avg episode reward: 29.075, avg true_objective: 11.575
+[2024-08-25 06:12:27,635][00480] Num frames 2400...
+[2024-08-25 06:12:27,754][00480] Num frames 2500...
+[2024-08-25 06:12:27,876][00480] Num frames 2600...
+[2024-08-25 06:12:28,000][00480] Num frames 2700...
+[2024-08-25 06:12:28,124][00480] Num frames 2800...
+[2024-08-25 06:12:28,237][00480] Num frames 2900...
+[2024-08-25 06:12:28,364][00480] Num frames 3000...
+[2024-08-25 06:12:28,485][00480] Num frames 3100...
+[2024-08-25 06:12:28,601][00480] Num frames 3200...
+[2024-08-25 06:12:28,683][00480] Avg episode rewards: #0: 25.743, true rewards: #0: 10.743
+[2024-08-25 06:12:28,685][00480] Avg episode reward: 25.743, avg true_objective: 10.743
+[2024-08-25 06:12:28,778][00480] Num frames 3300...
+[2024-08-25 06:12:28,899][00480] Num frames 3400...
+[2024-08-25 06:12:29,025][00480] Num frames 3500...
+[2024-08-25 06:12:29,145][00480] Num frames 3600...
+[2024-08-25 06:12:29,263][00480] Num frames 3700...
+[2024-08-25 06:12:29,386][00480] Num frames 3800...
+[2024-08-25 06:12:29,502][00480] Num frames 3900...
+[2024-08-25 06:12:29,619][00480] Num frames 4000...
+[2024-08-25 06:12:29,735][00480] Num frames 4100...
+[2024-08-25 06:12:29,853][00480] Num frames 4200...
+[2024-08-25 06:12:29,979][00480] Num frames 4300...
+[2024-08-25 06:12:30,063][00480] Avg episode rewards: #0: 25.810, true rewards: #0: 10.810
+[2024-08-25 06:12:30,066][00480] Avg episode reward: 25.810, avg true_objective: 10.810
+[2024-08-25 06:12:30,159][00480] Num frames 4400...
+[2024-08-25 06:12:30,291][00480] Num frames 4500...
+[2024-08-25 06:12:30,467][00480] Num frames 4600...
+[2024-08-25 06:12:30,627][00480] Num frames 4700...
+[2024-08-25 06:12:30,790][00480] Num frames 4800...
+[2024-08-25 06:12:30,955][00480] Num frames 4900...
+[2024-08-25 06:12:31,126][00480] Num frames 5000...
+[2024-08-25 06:12:31,280][00480] Num frames 5100...
+[2024-08-25 06:12:31,445][00480] Num frames 5200...
+[2024-08-25 06:12:31,578][00480] Avg episode rewards: #0: 24.692, true rewards: #0: 10.492
+[2024-08-25 06:12:31,580][00480] Avg episode reward: 24.692, avg true_objective: 10.492
+[2024-08-25 06:12:31,677][00480] Num frames 5300...
+[2024-08-25 06:12:31,843][00480] Num frames 5400...
+[2024-08-25 06:12:32,012][00480] Num frames 5500...
+[2024-08-25 06:12:32,181][00480] Num frames 5600...
+[2024-08-25 06:12:32,348][00480] Num frames 5700...
+[2024-08-25 06:12:32,516][00480] Num frames 5800...
+[2024-08-25 06:12:32,687][00480] Num frames 5900...
+[2024-08-25 06:12:32,835][00480] Num frames 6000...
+[2024-08-25 06:12:32,958][00480] Num frames 6100...
+[2024-08-25 06:12:33,078][00480] Num frames 6200...
+[2024-08-25 06:12:33,197][00480] Num frames 6300...
+[2024-08-25 06:12:33,292][00480] Avg episode rewards: #0: 25.057, true rewards: #0: 10.557
+[2024-08-25 06:12:33,294][00480] Avg episode reward: 25.057, avg true_objective: 10.557
+[2024-08-25 06:12:33,374][00480] Num frames 6400...
+[2024-08-25 06:12:33,489][00480] Num frames 6500...
+[2024-08-25 06:12:33,616][00480] Num frames 6600...
+[2024-08-25 06:12:33,733][00480] Num frames 6700...
+[2024-08-25 06:12:33,853][00480] Num frames 6800...
+[2024-08-25 06:12:33,980][00480] Num frames 6900...
+[2024-08-25 06:12:34,122][00480] Avg episode rewards: #0: 23.249, true rewards: #0: 9.963
+[2024-08-25 06:12:34,124][00480] Avg episode reward: 23.249, avg true_objective: 9.963
+[2024-08-25 06:12:34,158][00480] Num frames 7000...
+[2024-08-25 06:12:34,275][00480] Num frames 7100...
+[2024-08-25 06:12:34,395][00480] Num frames 7200...
+[2024-08-25 06:12:34,510][00480] Num frames 7300...
+[2024-08-25 06:12:34,634][00480] Num frames 7400...
+[2024-08-25 06:12:34,749][00480] Num frames 7500...
+[2024-08-25 06:12:34,868][00480] Num frames 7600...
+[2024-08-25 06:12:34,990][00480] Num frames 7700...
+[2024-08-25 06:12:35,114][00480] Num frames 7800...
+[2024-08-25 06:12:35,236][00480] Num frames 7900...
+[2024-08-25 06:12:35,351][00480] Num frames 8000...
+[2024-08-25 06:12:35,470][00480] Num frames 8100...
+[2024-08-25 06:12:35,599][00480] Num frames 8200...
+[2024-08-25 06:12:35,682][00480] Avg episode rewards: #0: 24.028, true rewards: #0: 10.277
+[2024-08-25 06:12:35,684][00480] Avg episode reward: 24.028, avg true_objective: 10.277
+[2024-08-25 06:12:35,777][00480] Num frames 8300...
+[2024-08-25 06:12:35,895][00480] Num frames 8400...
+[2024-08-25 06:12:36,023][00480] Num frames 8500...
+[2024-08-25 06:12:36,144][00480] Num frames 8600...
+[2024-08-25 06:12:36,262][00480] Num frames 8700...
+[2024-08-25 06:12:36,382][00480] Num frames 8800...
+[2024-08-25 06:12:36,497][00480] Num frames 8900...
+[2024-08-25 06:12:36,620][00480] Num frames 9000...
+[2024-08-25 06:12:36,738][00480] Num frames 9100...
+[2024-08-25 06:12:36,854][00480] Num frames 9200...
+[2024-08-25 06:12:36,979][00480] Num frames 9300...
+[2024-08-25 06:12:37,104][00480] Num frames 9400...
+[2024-08-25 06:12:37,203][00480] Avg episode rewards: #0: 24.709, true rewards: #0: 10.487
+[2024-08-25 06:12:37,204][00480] Avg episode reward: 24.709, avg true_objective: 10.487
+[2024-08-25 06:12:37,279][00480] Num frames 9500...
+[2024-08-25 06:12:37,394][00480] Num frames 9600...
+[2024-08-25 06:12:37,544][00480] Num frames 9700...
+[2024-08-25 06:12:37,673][00480] Num frames 9800...
+[2024-08-25 06:12:37,789][00480] Num frames 9900...
+[2024-08-25 06:12:37,907][00480] Num frames 10000...
+[2024-08-25 06:12:38,037][00480] Num frames 10100...
+[2024-08-25 06:12:38,154][00480] Num frames 10200...
+[2024-08-25 06:12:38,247][00480] Avg episode rewards: #0: 23.632, true rewards: #0: 10.232
+[2024-08-25 06:12:38,248][00480] Avg episode reward: 23.632, avg true_objective: 10.232
+[2024-08-25 06:13:39,751][00480] Replay video saved to /content/train_dir/default_experiment/replay.mp4!