[2023-03-27 00:39:49,442][00453] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-03-27 00:39:49,445][00453] Rollout worker 0 uses device cpu
[2023-03-27 00:39:49,446][00453] Rollout worker 1 uses device cpu
[2023-03-27 00:39:49,449][00453] Rollout worker 2 uses device cpu
[2023-03-27 00:39:49,449][00453] Rollout worker 3 uses device cpu
[2023-03-27 00:39:49,450][00453] Rollout worker 4 uses device cpu
[2023-03-27 00:39:49,452][00453] Rollout worker 5 uses device cpu
[2023-03-27 00:39:49,456][00453] Rollout worker 6 uses device cpu
[2023-03-27 00:39:49,457][00453] Rollout worker 7 uses device cpu
[2023-03-27 00:39:49,674][00453] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-27 00:39:49,679][00453] InferenceWorker_p0-w0: min num requests: 2
[2023-03-27 00:39:49,718][00453] Starting all processes...
[2023-03-27 00:39:49,722][00453] Starting process learner_proc0
[2023-03-27 00:39:49,807][00453] Starting all processes...
[2023-03-27 00:39:49,827][00453] Starting process inference_proc0-0
[2023-03-27 00:39:49,829][00453] Starting process rollout_proc0
[2023-03-27 00:39:49,829][00453] Starting process rollout_proc1
[2023-03-27 00:39:49,829][00453] Starting process rollout_proc2
[2023-03-27 00:39:49,830][00453] Starting process rollout_proc3
[2023-03-27 00:39:49,848][00453] Starting process rollout_proc4
[2023-03-27 00:39:49,848][00453] Starting process rollout_proc5
[2023-03-27 00:39:49,848][00453] Starting process rollout_proc6
[2023-03-27 00:39:49,849][00453] Starting process rollout_proc7
[2023-03-27 00:39:58,814][17679] Worker 2 uses CPU cores [0]
[2023-03-27 00:39:59,090][17678] Worker 4 uses CPU cores [0]
[2023-03-27 00:39:59,178][17676] Worker 3 uses CPU cores [1]
[2023-03-27 00:39:59,220][17658] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-27 00:39:59,229][17658] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-03-27 00:39:59,254][17673] Worker 1 uses CPU cores [1]
[2023-03-27 00:39:59,264][17658] Num visible devices: 1
[2023-03-27 00:39:59,286][17658] Starting seed is not provided
[2023-03-27 00:39:59,287][17658] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-27 00:39:59,288][17658] Initializing actor-critic model on device cuda:0
[2023-03-27 00:39:59,288][17658] RunningMeanStd input shape: (3, 72, 128)
[2023-03-27 00:39:59,290][17658] RunningMeanStd input shape: (1,)
[2023-03-27 00:39:59,336][17681] Worker 6 uses CPU cores [0]
[2023-03-27 00:39:59,339][17658] ConvEncoder: input_channels=3
[2023-03-27 00:39:59,356][17680] Worker 5 uses CPU cores [1]
[2023-03-27 00:39:59,366][17674] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-27 00:39:59,366][17674] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-03-27 00:39:59,370][17671] Worker 0 uses CPU cores [0]
[2023-03-27 00:39:59,382][17674] Num visible devices: 1
[2023-03-27 00:39:59,396][17682] Worker 7 uses CPU cores [1]
[2023-03-27 00:39:59,596][17658] Conv encoder output size: 512
[2023-03-27 00:39:59,597][17658] Policy head output size: 512
[2023-03-27 00:39:59,643][17658] Created Actor Critic model with architecture:
[2023-03-27 00:39:59,643][17658] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-03-27 00:40:06,752][17658] Using optimizer
[2023-03-27 00:40:06,753][17658] No checkpoints found
[2023-03-27 00:40:06,753][17658] Did not load from checkpoint, starting from scratch!
[2023-03-27 00:40:06,754][17658] Initialized policy 0 weights for model version 0
[2023-03-27 00:40:06,757][17658] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-27 00:40:06,764][17658] LearnerWorker_p0 finished initialization!
[2023-03-27 00:40:06,965][17674] RunningMeanStd input shape: (3, 72, 128)
[2023-03-27 00:40:06,966][17674] RunningMeanStd input shape: (1,)
[2023-03-27 00:40:06,979][17674] ConvEncoder: input_channels=3
[2023-03-27 00:40:07,078][17674] Conv encoder output size: 512
[2023-03-27 00:40:07,079][17674] Policy head output size: 512
[2023-03-27 00:40:09,384][00453] Inference worker 0-0 is ready!
[2023-03-27 00:40:09,385][00453] All inference workers are ready! Signal rollout workers to start!
[2023-03-27 00:40:09,530][17680] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,526][17676] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,536][17682] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,544][17673] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,542][17671] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,563][17678] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,584][17679] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,592][17681] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-27 00:40:09,665][00453] Heartbeat connected on Batcher_0
[2023-03-27 00:40:09,672][00453] Heartbeat connected on LearnerWorker_p0
[2023-03-27 00:40:09,722][00453] Heartbeat connected on InferenceWorker_p0-w0
[2023-03-27 00:40:09,828][00453] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-27 00:40:10,442][17679] Decorrelating experience for 0 frames...
[2023-03-27 00:40:10,444][17678] Decorrelating experience for 0 frames...
[2023-03-27 00:40:10,785][17680] Decorrelating experience for 0 frames...
[2023-03-27 00:40:10,787][17673] Decorrelating experience for 0 frames...
[2023-03-27 00:40:10,789][17682] Decorrelating experience for 0 frames...
[2023-03-27 00:40:10,793][17676] Decorrelating experience for 0 frames...
[2023-03-27 00:40:11,151][17679] Decorrelating experience for 32 frames...
[2023-03-27 00:40:11,875][17681] Decorrelating experience for 0 frames...
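For readers who want the printed module tree above as concrete code, here is a minimal PyTorch sketch of the same shared-weights actor-critic. The log names the modules and the 512-unit sizes, but not the conv kernel sizes, strides, or channel counts; the [[32, 8, 4], [64, 4, 2], [128, 3, 2]] filters below are an assumption (Sample Factory's default VizDoom encoder), and the class is an illustration, not the framework's actual implementation:

```python
# Sketch of the printed ActorCriticSharedWeights model (assumptions noted below).
import torch
import torch.nn as nn

class ActorCriticSharedWeights(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        # conv filters are ASSUMED defaults; only Conv2d/ELU layering is in the log
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer flattened conv output size from a dummy pass
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)           # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)    # value head, 512 -> 1
        self.distribution_linear = nn.Linear(hidden, num_actions)  # 512 -> 5 logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, new_state = self.core(x.unsqueeze(0), rnn_state)  # (1, batch, 512)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), new_state
```

With a (3, 72, 128) observation this produces the 512-dimensional encoder and policy-head outputs reported in the log, and the 5 logits match this scenario's discrete action space.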
[2023-03-27 00:40:11,961][17679] Decorrelating experience for 64 frames...
[2023-03-27 00:40:12,140][17676] Decorrelating experience for 32 frames...
[2023-03-27 00:40:12,142][17682] Decorrelating experience for 32 frames...
[2023-03-27 00:40:12,144][17673] Decorrelating experience for 32 frames...
[2023-03-27 00:40:12,147][17680] Decorrelating experience for 32 frames...
[2023-03-27 00:40:12,525][17681] Decorrelating experience for 32 frames...
[2023-03-27 00:40:12,905][17671] Decorrelating experience for 0 frames...
[2023-03-27 00:40:13,144][17681] Decorrelating experience for 64 frames...
[2023-03-27 00:40:13,163][17682] Decorrelating experience for 64 frames...
[2023-03-27 00:40:13,380][17673] Decorrelating experience for 64 frames...
[2023-03-27 00:40:13,698][17681] Decorrelating experience for 96 frames...
[2023-03-27 00:40:13,834][00453] Heartbeat connected on RolloutWorker_w6
[2023-03-27 00:40:14,099][17679] Decorrelating experience for 96 frames...
[2023-03-27 00:40:14,138][17680] Decorrelating experience for 64 frames...
[2023-03-27 00:40:14,244][00453] Heartbeat connected on RolloutWorker_w2
[2023-03-27 00:40:14,604][17678] Decorrelating experience for 32 frames...
[2023-03-27 00:40:14,828][00453] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-27 00:40:15,188][17682] Decorrelating experience for 96 frames...
[2023-03-27 00:40:15,307][17678] Decorrelating experience for 64 frames...
[2023-03-27 00:40:15,711][00453] Heartbeat connected on RolloutWorker_w7
[2023-03-27 00:40:15,855][17678] Decorrelating experience for 96 frames...
[2023-03-27 00:40:15,879][17673] Decorrelating experience for 96 frames...
[2023-03-27 00:40:15,941][00453] Heartbeat connected on RolloutWorker_w4
[2023-03-27 00:40:16,254][00453] Heartbeat connected on RolloutWorker_w1
[2023-03-27 00:40:16,668][17676] Decorrelating experience for 64 frames...
[2023-03-27 00:40:17,339][17680] Decorrelating experience for 96 frames...
[2023-03-27 00:40:17,477][00453] Heartbeat connected on RolloutWorker_w5
[2023-03-27 00:40:17,644][17676] Decorrelating experience for 96 frames...
[2023-03-27 00:40:17,722][17671] Decorrelating experience for 32 frames...
[2023-03-27 00:40:17,806][00453] Heartbeat connected on RolloutWorker_w3
[2023-03-27 00:40:19,101][17671] Decorrelating experience for 64 frames...
[2023-03-27 00:40:19,828][00453] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 5.2. Samples: 52. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-27 00:40:19,835][00453] Avg episode reward: [(0, '1.280')]
[2023-03-27 00:40:24,229][17671] Decorrelating experience for 96 frames...
[2023-03-27 00:40:24,278][17658] Signal inference workers to stop experience collection...
[2023-03-27 00:40:24,311][17674] InferenceWorker_p0-w0: stopping experience collection
[2023-03-27 00:40:24,432][00453] Heartbeat connected on RolloutWorker_w0
[2023-03-27 00:40:24,828][00453] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 159.2. Samples: 2388. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-27 00:40:24,842][00453] Avg episode reward: [(0, '2.614')]
[2023-03-27 00:40:27,966][17658] Signal inference workers to resume experience collection...
[2023-03-27 00:40:27,967][17674] InferenceWorker_p0-w0: resuming experience collection
[2023-03-27 00:40:29,828][00453] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 119.4. Samples: 2388. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
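The "Decorrelating experience" lines above show each of the eight rollout workers warming up for a different number of frames (0, 32, 64, 96) before regular collection begins, so the workers do not stream synchronized, highly correlated trajectories to the learner. A toy illustration of the idea, assuming a Gymnasium-style 5-tuple env API (this is not Sample Factory's actual code):

```python
def decorrelate(env, num_chunks, chunk=32):
    """Burn `num_chunks` chunks of random-action frames before real collection.

    The log above shows workers reporting 0, 32, 64, 96 frames as they progress;
    different workers receive different `num_chunks`, staggering their episodes.
    """
    env.reset()
    for done_frames in range(0, num_chunks * chunk, chunk):
        print(f"Decorrelating experience for {done_frames} frames...")
        for _ in range(chunk):
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()
```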
[2023-03-27 00:40:29,835][00453] Avg episode reward: [(0, '2.993')]
[2023-03-27 00:40:34,828][00453] Fps is (10 sec: 1638.5, 60 sec: 655.4, 300 sec: 655.4). Total num frames: 16384. Throughput: 0: 208.8. Samples: 5220. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-03-27 00:40:34,835][00453] Avg episode reward: [(0, '3.436')]
[2023-03-27 00:40:39,828][00453] Fps is (10 sec: 2048.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 24576. Throughput: 0: 243.8. Samples: 7314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:40:39,838][00453] Avg episode reward: [(0, '3.685')]
[2023-03-27 00:40:44,828][00453] Fps is (10 sec: 1638.4, 60 sec: 936.2, 300 sec: 936.2). Total num frames: 32768. Throughput: 0: 232.9. Samples: 8150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:40:44,834][00453] Avg episode reward: [(0, '3.769')]
[2023-03-27 00:40:45,628][17674] Updated weights for policy 0, policy_version 10 (0.0013)
[2023-03-27 00:40:49,828][00453] Fps is (10 sec: 2867.2, 60 sec: 1331.2, 300 sec: 1331.2). Total num frames: 53248. Throughput: 0: 320.8. Samples: 12832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:40:49,830][00453] Avg episode reward: [(0, '4.298')]
[2023-03-27 00:40:54,829][00453] Fps is (10 sec: 3276.5, 60 sec: 1456.3, 300 sec: 1456.3). Total num frames: 65536. Throughput: 0: 381.0. Samples: 17146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:40:54,835][00453] Avg episode reward: [(0, '4.388')]
[2023-03-27 00:40:59,828][00453] Fps is (10 sec: 2048.0, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 73728. Throughput: 0: 408.2. Samples: 18370. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-27 00:40:59,834][00453] Avg episode reward: [(0, '4.604')]
[2023-03-27 00:41:02,332][17674] Updated weights for policy 0, policy_version 20 (0.0041)
[2023-03-27 00:41:04,828][00453] Fps is (10 sec: 2457.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 90112. Throughput: 0: 483.6. Samples: 21812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:41:04,831][00453] Avg episode reward: [(0, '4.383')]
[2023-03-27 00:41:09,828][00453] Fps is (10 sec: 3276.8, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 106496. Throughput: 0: 545.1. Samples: 26918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:41:09,830][00453] Avg episode reward: [(0, '4.266')]
[2023-03-27 00:41:09,836][17658] Saving new best policy, reward=4.266!
[2023-03-27 00:41:14,011][17674] Updated weights for policy 0, policy_version 30 (0.0026)
[2023-03-27 00:41:14,828][00453] Fps is (10 sec: 3276.8, 60 sec: 2048.0, 300 sec: 1890.5). Total num frames: 122880. Throughput: 0: 610.2. Samples: 29846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:41:14,831][00453] Avg episode reward: [(0, '4.305')]
[2023-03-27 00:41:14,851][17658] Saving new best policy, reward=4.305!
[2023-03-27 00:41:19,828][00453] Fps is (10 sec: 2457.4, 60 sec: 2184.5, 300 sec: 1872.4). Total num frames: 131072. Throughput: 0: 620.4. Samples: 33140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:41:19,831][00453] Avg episode reward: [(0, '4.416')]
[2023-03-27 00:41:19,840][17658] Saving new best policy, reward=4.416!
[2023-03-27 00:41:24,828][00453] Fps is (10 sec: 2048.0, 60 sec: 2389.3, 300 sec: 1911.5). Total num frames: 143360. Throughput: 0: 649.6. Samples: 36548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:41:24,831][00453] Avg episode reward: [(0, '4.380')]
[2023-03-27 00:41:29,648][17674] Updated weights for policy 0, policy_version 40 (0.0026)
[2023-03-27 00:41:29,828][00453] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2048.0). Total num frames: 163840. Throughput: 0: 693.7. Samples: 39366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:41:29,836][00453] Avg episode reward: [(0, '4.309')]
[2023-03-27 00:41:34,828][00453] Fps is (10 sec: 3686.5, 60 sec: 2730.7, 300 sec: 2120.3). Total num frames: 180224. Throughput: 0: 721.1. Samples: 45282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:41:34,833][00453] Avg episode reward: [(0, '4.472')]
[2023-03-27 00:41:34,845][17658] Saving new best policy, reward=4.472!
[2023-03-27 00:41:39,828][00453] Fps is (10 sec: 2867.3, 60 sec: 2798.9, 300 sec: 2139.0). Total num frames: 192512. Throughput: 0: 707.3. Samples: 48974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:41:39,830][00453] Avg episode reward: [(0, '4.658')]
[2023-03-27 00:41:39,836][17658] Saving new best policy, reward=4.658!
[2023-03-27 00:41:43,292][17674] Updated weights for policy 0, policy_version 50 (0.0022)
[2023-03-27 00:41:44,828][00453] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2198.9). Total num frames: 208896. Throughput: 0: 727.6. Samples: 51110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:41:44,838][00453] Avg episode reward: [(0, '4.560')]
[2023-03-27 00:41:44,854][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000051_208896.pth...
[2023-03-27 00:41:49,828][00453] Fps is (10 sec: 3276.9, 60 sec: 2867.2, 300 sec: 2252.8). Total num frames: 225280. Throughput: 0: 774.8. Samples: 56676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:41:49,830][00453] Avg episode reward: [(0, '4.392')]
[2023-03-27 00:41:54,828][00453] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2262.5). Total num frames: 237568. Throughput: 0: 739.8. Samples: 60208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:41:54,832][00453] Avg episode reward: [(0, '4.366')]
[2023-03-27 00:41:56,967][17674] Updated weights for policy 0, policy_version 60 (0.0021)
[2023-03-27 00:41:59,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2271.4). Total num frames: 249856. Throughput: 0: 721.3. Samples: 62304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:41:59,833][00453] Avg episode reward: [(0, '4.490')]
[2023-03-27 00:42:04,828][00453] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 2350.7). Total num frames: 270336. Throughput: 0: 744.4. Samples: 66638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:42:04,831][00453] Avg episode reward: [(0, '4.813')]
[2023-03-27 00:42:04,842][17658] Saving new best policy, reward=4.813!
[2023-03-27 00:42:08,202][17674] Updated weights for policy 0, policy_version 70 (0.0017)
[2023-03-27 00:42:09,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3072.0, 300 sec: 2423.5). Total num frames: 290816. Throughput: 0: 822.5. Samples: 73560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:42:09,835][00453] Avg episode reward: [(0, '4.708')]
[2023-03-27 00:42:14,828][00453] Fps is (10 sec: 4096.2, 60 sec: 3140.3, 300 sec: 2490.4). Total num frames: 311296. Throughput: 0: 836.8. Samples: 77020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:42:14,829][00453] Avg episode reward: [(0, '4.447')]
[2023-03-27 00:42:19,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 2489.1). Total num frames: 323584. Throughput: 0: 810.8. Samples: 81766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:42:19,831][00453] Avg episode reward: [(0, '4.333')]
[2023-03-27 00:42:20,060][17674] Updated weights for policy 0, policy_version 80 (0.0016)
[2023-03-27 00:42:24,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2518.3). Total num frames: 339968. Throughput: 0: 818.8. Samples: 85818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:42:24,829][00453] Avg episode reward: [(0, '4.295')]
[2023-03-27 00:42:29,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 2603.9). Total num frames: 364544. Throughput: 0: 851.8. Samples: 89440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:42:29,830][00453] Avg episode reward: [(0, '4.382')]
[2023-03-27 00:42:30,149][17674] Updated weights for policy 0, policy_version 90 (0.0035)
[2023-03-27 00:42:34,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3413.3, 300 sec: 2655.3). Total num frames: 385024. Throughput: 0: 879.5. Samples: 96254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:42:34,834][00453] Avg episode reward: [(0, '4.637')]
[2023-03-27 00:42:39,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2648.7). Total num frames: 397312. Throughput: 0: 880.9. Samples: 99850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:42:39,833][00453] Avg episode reward: [(0, '4.733')]
[2023-03-27 00:42:44,828][00453] Fps is (10 sec: 2048.0, 60 sec: 3276.8, 300 sec: 2616.2). Total num frames: 405504. Throughput: 0: 856.8. Samples: 100858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:42:44,833][00453] Avg episode reward: [(0, '4.718')]
[2023-03-27 00:42:45,266][17674] Updated weights for policy 0, policy_version 100 (0.0027)
[2023-03-27 00:42:49,828][00453] Fps is (10 sec: 2048.0, 60 sec: 3208.5, 300 sec: 2611.2). Total num frames: 417792. Throughput: 0: 842.4. Samples: 104544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:42:49,834][00453] Avg episode reward: [(0, '4.826')]
[2023-03-27 00:42:49,837][17658] Saving new best policy, reward=4.826!
[2023-03-27 00:42:54,829][00453] Fps is (10 sec: 2457.2, 60 sec: 3208.4, 300 sec: 2606.5). Total num frames: 430080. Throughput: 0: 766.2. Samples: 108042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:42:54,837][00453] Avg episode reward: [(0, '4.615')]
[2023-03-27 00:42:59,828][00453] Fps is (10 sec: 2048.0, 60 sec: 3140.3, 300 sec: 2578.1). Total num frames: 438272. Throughput: 0: 726.4. Samples: 109708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:42:59,830][00453] Avg episode reward: [(0, '4.668')]
[2023-03-27 00:43:04,507][17674] Updated weights for policy 0, policy_version 110 (0.0026)
[2023-03-27 00:43:04,828][00453] Fps is (10 sec: 2048.4, 60 sec: 3003.8, 300 sec: 2574.6). Total num frames: 450560. Throughput: 0: 681.2. Samples: 112420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:43:04,841][00453] Avg episode reward: [(0, '4.637')]
[2023-03-27 00:43:09,828][00453] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2594.1). Total num frames: 466944. Throughput: 0: 709.1. Samples: 117726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:43:09,830][00453] Avg episode reward: [(0, '4.817')]
[2023-03-27 00:43:14,828][00453] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2634.7). Total num frames: 487424. Throughput: 0: 694.8. Samples: 120708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:43:14,829][00453] Avg episode reward: [(0, '4.909')]
[2023-03-27 00:43:14,845][17658] Saving new best policy, reward=4.909!
[2023-03-27 00:43:15,477][17674] Updated weights for policy 0, policy_version 120 (0.0029)
[2023-03-27 00:43:19,828][00453] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2630.1). Total num frames: 499712. Throughput: 0: 641.1. Samples: 125104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:43:19,832][00453] Avg episode reward: [(0, '4.772')]
[2023-03-27 00:43:24,828][00453] Fps is (10 sec: 2457.5, 60 sec: 2867.2, 300 sec: 2625.6). Total num frames: 512000. Throughput: 0: 646.0. Samples: 128918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:43:24,835][00453] Avg episode reward: [(0, '4.579')]
[2023-03-27 00:43:29,828][00453] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2641.9). Total num frames: 528384. Throughput: 0: 681.0. Samples: 131502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:43:29,829][00453] Avg episode reward: [(0, '4.691')]
[2023-03-27 00:43:30,319][17674] Updated weights for policy 0, policy_version 130 (0.0026)
[2023-03-27 00:43:34,828][00453] Fps is (10 sec: 3276.9, 60 sec: 2662.4, 300 sec: 2657.4). Total num frames: 544768. Throughput: 0: 700.9. Samples: 136084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:43:34,836][00453] Avg episode reward: [(0, '4.661')]
[2023-03-27 00:43:39,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2633.1). Total num frames: 552960. Throughput: 0: 695.7. Samples: 139346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:43:39,833][00453] Avg episode reward: [(0, '4.803')]
[2023-03-27 00:43:44,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2648.1). Total num frames: 569344. Throughput: 0: 700.0. Samples: 141208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:43:44,830][00453] Avg episode reward: [(0, '4.597')]
[2023-03-27 00:43:44,845][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000139_569344.pth...
[2023-03-27 00:43:45,725][17674] Updated weights for policy 0, policy_version 140 (0.0019)
[2023-03-27 00:43:49,828][00453] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2662.4). Total num frames: 585728. Throughput: 0: 751.4. Samples: 146234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:43:49,834][00453] Avg episode reward: [(0, '4.574')]
[2023-03-27 00:43:54,828][00453] Fps is (10 sec: 3686.4, 60 sec: 2935.6, 300 sec: 2694.3). Total num frames: 606208. Throughput: 0: 760.9. Samples: 151966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:43:54,833][00453] Avg episode reward: [(0, '4.486')]
[2023-03-27 00:43:57,708][17674] Updated weights for policy 0, policy_version 150 (0.0023)
[2023-03-27 00:43:59,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2689.1). Total num frames: 618496. Throughput: 0: 741.4. Samples: 154070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:43:59,830][00453] Avg episode reward: [(0, '4.507')]
[2023-03-27 00:44:04,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2684.2). Total num frames: 630784. Throughput: 0: 720.9. Samples: 157544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:44:04,831][00453] Avg episode reward: [(0, '4.543')]
[2023-03-27 00:44:09,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2679.5). Total num frames: 643072. Throughput: 0: 718.8. Samples: 161266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:44:09,835][00453] Avg episode reward: [(0, '4.884')]
[2023-03-27 00:44:13,390][17674] Updated weights for policy 0, policy_version 160 (0.0021)
[2023-03-27 00:44:14,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2674.9). Total num frames: 655360. Throughput: 0: 709.7. Samples: 163438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:44:14,830][00453] Avg episode reward: [(0, '5.097')]
[2023-03-27 00:44:14,845][17658] Saving new best policy, reward=5.097!
[2023-03-27 00:44:19,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2670.6). Total num frames: 667648. Throughput: 0: 690.7. Samples: 167166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:44:19,837][00453] Avg episode reward: [(0, '4.994')]
[2023-03-27 00:44:24,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2666.4). Total num frames: 679936. Throughput: 0: 694.1. Samples: 170582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:44:24,830][00453] Avg episode reward: [(0, '4.833')]
[2023-03-27 00:44:28,970][17674] Updated weights for policy 0, policy_version 170 (0.0019)
[2023-03-27 00:44:29,828][00453] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2678.2). Total num frames: 696320. Throughput: 0: 727.3. Samples: 173936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:44:29,830][00453] Avg episode reward: [(0, '4.580')]
[2023-03-27 00:44:34,828][00453] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2674.0). Total num frames: 708608. Throughput: 0: 701.6. Samples: 177806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:44:34,832][00453] Avg episode reward: [(0, '4.718')]
[2023-03-27 00:44:39,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2670.0). Total num frames: 720896. Throughput: 0: 640.5. Samples: 180790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:44:39,833][00453] Avg episode reward: [(0, '4.675')]
[2023-03-27 00:44:44,570][17674] Updated weights for policy 0, policy_version 180 (0.0018)
[2023-03-27 00:44:44,828][00453] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2681.0). Total num frames: 737280. Throughput: 0: 632.8. Samples: 182548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:44:44,830][00453] Avg episode reward: [(0, '4.797')]
[2023-03-27 00:44:49,828][00453] Fps is (10 sec: 4096.0, 60 sec: 2935.5, 300 sec: 2720.9). Total num frames: 761856. Throughput: 0: 713.8. Samples: 189666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:44:49,830][00453] Avg episode reward: [(0, '4.846')]
[2023-03-27 00:44:54,828][00453] Fps is (10 sec: 3686.3, 60 sec: 2798.9, 300 sec: 2716.3). Total num frames: 774144. Throughput: 0: 739.3. Samples: 194534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:44:54,830][00453] Avg episode reward: [(0, '4.980')]
[2023-03-27 00:44:55,051][17674] Updated weights for policy 0, policy_version 190 (0.0019)
[2023-03-27 00:44:59,828][00453] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2711.8). Total num frames: 786432. Throughput: 0: 740.5. Samples: 196762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:44:59,833][00453] Avg episode reward: [(0, '4.901')]
[2023-03-27 00:45:04,828][00453] Fps is (10 sec: 3276.9, 60 sec: 2935.5, 300 sec: 2735.3). Total num frames: 806912. Throughput: 0: 742.0. Samples: 200558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:45:04,835][00453] Avg episode reward: [(0, '5.069')]
[2023-03-27 00:45:08,113][17674] Updated weights for policy 0, policy_version 200 (0.0019)
[2023-03-27 00:45:09,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2790.8). Total num frames: 823296. Throughput: 0: 803.0. Samples: 206716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:45:09,835][00453] Avg episode reward: [(0, '4.922')]
[2023-03-27 00:45:14,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2832.5). Total num frames: 835584. Throughput: 0: 785.6. Samples: 209288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:45:14,830][00453] Avg episode reward: [(0, '4.822')]
[2023-03-27 00:45:19,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2874.1). Total num frames: 847872. Throughput: 0: 756.3. Samples: 211838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:45:19,834][00453] Avg episode reward: [(0, '4.779')]
[2023-03-27 00:45:23,523][17674] Updated weights for policy 0, policy_version 210 (0.0033)
[2023-03-27 00:45:24,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 860160. Throughput: 0: 788.8. Samples: 216286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:45:24,830][00453] Avg episode reward: [(0, '4.819')]
[2023-03-27 00:45:29,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2943.6). Total num frames: 884736. Throughput: 0: 814.8. Samples: 219212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:45:29,830][00453] Avg episode reward: [(0, '5.001')]
[2023-03-27 00:45:34,315][17674] Updated weights for policy 0, policy_version 220 (0.0015)
[2023-03-27 00:45:34,829][00453] Fps is (10 sec: 4095.6, 60 sec: 3208.5, 300 sec: 2971.3). Total num frames: 901120. Throughput: 0: 795.4. Samples: 225460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:45:34,833][00453] Avg episode reward: [(0, '5.266')]
[2023-03-27 00:45:34,848][17658] Saving new best policy, reward=5.266!
[2023-03-27 00:45:39,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2985.2). Total num frames: 913408. Throughput: 0: 763.8. Samples: 228906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:45:39,833][00453] Avg episode reward: [(0, '5.616')]
[2023-03-27 00:45:39,841][17658] Saving new best policy, reward=5.616!
[2023-03-27 00:45:44,828][00453] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 2971.3). Total num frames: 929792. Throughput: 0: 766.8. Samples: 231268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:45:44,835][00453] Avg episode reward: [(0, '5.441')]
[2023-03-27 00:45:44,849][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000227_929792.pth...
[2023-03-27 00:45:45,033][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000051_208896.pth
[2023-03-27 00:45:48,364][17674] Updated weights for policy 0, policy_version 230 (0.0016)
[2023-03-27 00:45:49,828][00453] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 946176. Throughput: 0: 793.3. Samples: 236258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:45:49,829][00453] Avg episode reward: [(0, '5.471')]
[2023-03-27 00:45:54,828][00453] Fps is (10 sec: 3686.6, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 966656. Throughput: 0: 789.0. Samples: 242222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:45:54,832][00453] Avg episode reward: [(0, '5.446')]
[2023-03-27 00:45:59,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3013.0). Total num frames: 978944. Throughput: 0: 780.5. Samples: 244412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:45:59,830][00453] Avg episode reward: [(0, '5.419')]
[2023-03-27 00:45:59,930][17674] Updated weights for policy 0, policy_version 240 (0.0026)
[2023-03-27 00:46:04,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 1003520. Throughput: 0: 849.5. Samples: 250066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:46:04,830][00453] Avg episode reward: [(0, '5.344')]
[2023-03-27 00:46:08,981][17674] Updated weights for policy 0, policy_version 250 (0.0026)
[2023-03-27 00:46:09,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 3054.6). Total num frames: 1024000. Throughput: 0: 906.5. Samples: 257080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-27 00:46:09,830][00453] Avg episode reward: [(0, '5.457')]
[2023-03-27 00:46:14,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3096.3). Total num frames: 1044480. Throughput: 0: 908.0. Samples: 260074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:46:14,836][00453] Avg episode reward: [(0, '5.769')]
[2023-03-27 00:46:14,852][17658] Saving new best policy, reward=5.769!
[2023-03-27 00:46:19,828][00453] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3096.3). Total num frames: 1056768. Throughput: 0: 867.3. Samples: 264490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:46:19,833][00453] Avg episode reward: [(0, '5.880')]
[2023-03-27 00:46:19,845][17658] Saving new best policy, reward=5.880!
[2023-03-27 00:46:21,471][17674] Updated weights for policy 0, policy_version 260 (0.0014)
[2023-03-27 00:46:24,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3096.3). Total num frames: 1077248. Throughput: 0: 916.0. Samples: 270126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:46:24,830][00453] Avg episode reward: [(0, '6.447')]
[2023-03-27 00:46:24,847][17658] Saving new best policy, reward=6.447!
[2023-03-27 00:46:29,828][00453] Fps is (10 sec: 4505.7, 60 sec: 3618.1, 300 sec: 3124.1). Total num frames: 1101824. Throughput: 0: 940.9. Samples: 273608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:46:29,833][00453] Avg episode reward: [(0, '6.471')]
[2023-03-27 00:46:29,836][17658] Saving new best policy, reward=6.471!
[2023-03-27 00:46:30,424][17674] Updated weights for policy 0, policy_version 270 (0.0015)
[2023-03-27 00:46:34,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3138.0). Total num frames: 1118208. Throughput: 0: 963.5. Samples: 279616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:46:34,830][00453] Avg episode reward: [(0, '6.361')]
[2023-03-27 00:46:39,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3110.2). Total num frames: 1126400. Throughput: 0: 900.5. Samples: 282746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:46:39,833][00453] Avg episode reward: [(0, '6.484')]
[2023-03-27 00:46:39,841][17658] Saving new best policy, reward=6.484!
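Each status line above reports FPS over 10-, 60-, and 300-second windows alongside the cumulative frame count. One way such windowed rates can be computed is to keep (wall time, total frames) samples and difference the ends of each window; the sketch below is illustrative (class and method names are assumptions, not Sample Factory's API):

```python
import time
from collections import deque

class FpsTracker:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (wall_time, total_env_frames)

    def record(self, total_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_frames))
        # keep slightly more history than the largest window needs
        while self.samples and now - self.samples[0][0] > max(self.windows) + 1:
            self.samples.popleft()

    def fps(self):
        if not self.samples:
            return {}
        now, frames = self.samples[-1]
        out = {}
        for w in self.windows:
            past = [(t, f) for t, f in self.samples if now - t <= w]
            t0, f0 = past[0]
            # a single sample in the window yields nan, matching the first
            # "Fps is (10 sec: nan, ...)" report in the log above
            out[w] = (frames - f0) / (now - t0) if now > t0 else float("nan")
        return out
```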
[2023-03-27 00:46:43,795][17674] Updated weights for policy 0, policy_version 280 (0.0026)
[2023-03-27 00:46:44,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3138.0). Total num frames: 1150976. Throughput: 0: 916.9. Samples: 285672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:46:44,830][00453] Avg episode reward: [(0, '6.493')]
[2023-03-27 00:46:44,840][17658] Saving new best policy, reward=6.493!
[2023-03-27 00:46:49,828][00453] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3179.6). Total num frames: 1175552. Throughput: 0: 948.6. Samples: 292752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:46:49,833][00453] Avg episode reward: [(0, '6.936')]
[2023-03-27 00:46:49,840][17658] Saving new best policy, reward=6.936!
[2023-03-27 00:46:53,636][17674] Updated weights for policy 0, policy_version 290 (0.0023)
[2023-03-27 00:46:54,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3179.6). Total num frames: 1187840. Throughput: 0: 913.8. Samples: 298202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:46:54,830][00453] Avg episode reward: [(0, '7.661')]
[2023-03-27 00:46:54,844][17658] Saving new best policy, reward=7.661!
[2023-03-27 00:46:59,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3165.7). Total num frames: 1204224. Throughput: 0: 895.2. Samples: 300360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:46:59,833][00453] Avg episode reward: [(0, '7.713')]
[2023-03-27 00:46:59,835][17658] Saving new best policy, reward=7.713!
[2023-03-27 00:47:04,633][17674] Updated weights for policy 0, policy_version 300 (0.0017)
[2023-03-27 00:47:04,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3179.6). Total num frames: 1228800. Throughput: 0: 927.5. Samples: 306228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:47:04,833][00453] Avg episode reward: [(0, '7.805')]
[2023-03-27 00:47:04,848][17658] Saving new best policy, reward=7.805!
[2023-03-27 00:47:09,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3179.6). Total num frames: 1249280. Throughput: 0: 959.2. Samples: 313288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:47:09,830][00453] Avg episode reward: [(0, '7.679')]
[2023-03-27 00:47:14,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3193.5). Total num frames: 1265664. Throughput: 0: 938.5. Samples: 315840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:47:14,841][00453] Avg episode reward: [(0, '7.603')]
[2023-03-27 00:47:15,376][17674] Updated weights for policy 0, policy_version 310 (0.0018)
[2023-03-27 00:47:19,828][00453] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3179.6). Total num frames: 1277952. Throughput: 0: 897.8. Samples: 320016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:47:19,830][00453] Avg episode reward: [(0, '7.515')]
[2023-03-27 00:47:24,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3165.7). Total num frames: 1298432. Throughput: 0: 940.9. Samples: 325088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:47:24,830][00453] Avg episode reward: [(0, '7.952')]
[2023-03-27 00:47:24,842][17658] Saving new best policy, reward=7.952!
[2023-03-27 00:47:29,425][17674] Updated weights for policy 0, policy_version 320 (0.0011)
[2023-03-27 00:47:29,828][00453] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3138.0). Total num frames: 1310720. Throughput: 0: 921.9. Samples: 327156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:47:29,838][00453] Avg episode reward: [(0, '8.017')]
[2023-03-27 00:47:29,844][17658] Saving new best policy, reward=8.017!
[2023-03-27 00:47:34,828][00453] Fps is (10 sec: 2048.0, 60 sec: 3345.1, 300 sec: 3124.1). Total num frames: 1318912. Throughput: 0: 831.1. Samples: 330150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:47:34,835][00453] Avg episode reward: [(0, '8.217')]
[2023-03-27 00:47:34,849][17658] Saving new best policy, reward=8.217!
[2023-03-27 00:47:39,828][00453] Fps is (10 sec: 2047.9, 60 sec: 3413.3, 300 sec: 3138.0). Total num frames: 1331200. Throughput: 0: 790.9. Samples: 333794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:47:39,836][00453] Avg episode reward: [(0, '8.625')]
[2023-03-27 00:47:39,838][17658] Saving new best policy, reward=8.625!
[2023-03-27 00:47:43,515][17674] Updated weights for policy 0, policy_version 330 (0.0016)
[2023-03-27 00:47:44,828][00453] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3179.6). Total num frames: 1355776. Throughput: 0: 818.4. Samples: 337188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:47:44,830][00453] Avg episode reward: [(0, '9.491')]
[2023-03-27 00:47:44,837][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000331_1355776.pth...
[2023-03-27 00:47:44,948][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000139_569344.pth
[2023-03-27 00:47:44,956][17658] Saving new best policy, reward=9.491!
[2023-03-27 00:47:49,828][00453] Fps is (10 sec: 4915.2, 60 sec: 3413.3, 300 sec: 3221.3). Total num frames: 1380352. Throughput: 0: 845.5. Samples: 344276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:47:49,830][00453] Avg episode reward: [(0, '9.325')]
[2023-03-27 00:47:54,828][00453] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1388544. Throughput: 0: 772.8. Samples: 348064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:47:54,832][00453] Avg episode reward: [(0, '9.308')]
[2023-03-27 00:47:56,574][17674] Updated weights for policy 0, policy_version 340 (0.0029)
[2023-03-27 00:47:59,828][00453] Fps is (10 sec: 1638.4, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1396736. Throughput: 0: 741.3. Samples: 349198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:47:59,834][00453] Avg episode reward: [(0, '8.905')]
[2023-03-27 00:48:04,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1417216. Throughput: 0: 751.4. Samples: 353830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:48:04,830][00453] Avg episode reward: [(0, '9.390')]
[2023-03-27 00:48:08,205][17674] Updated weights for policy 0, policy_version 350 (0.0021)
[2023-03-27 00:48:09,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1437696. Throughput: 0: 781.6. Samples: 360262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:48:09,830][00453] Avg episode reward: [(0, '9.945')]
[2023-03-27 00:48:09,832][17658] Saving new best policy, reward=9.945!
[2023-03-27 00:48:14,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 1454080. Throughput: 0: 783.9. Samples: 362432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:48:14,832][00453] Avg episode reward: [(0, '10.303')]
[2023-03-27 00:48:14,845][17658] Saving new best policy, reward=10.303!
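The checkpoint lines above show the rotation scheme at work: periodic checkpoints named checkpoint_{version}_{frames}.pth are written, older ones are removed after a fixed depth, and the best policy is saved separately whenever the average episode reward improves. A sketch of that bookkeeping, assuming PyTorch state dicts (the keep depth, helper names, and the best-policy filename are assumptions, not Sample Factory's internals):

```python
import os
import torch

def save_checkpoint(model, ckpt_dir, policy_version, env_frames, keep_last=2):
    # e.g. checkpoint_000000331_1355776.pth: 9-digit policy version, then frames
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path = os.path.join(ckpt_dir, name)
    print(f"Saving {path}...")
    torch.save(model.state_dict(), path)
    # zero-padded versions sort lexicographically; drop all but the newest few
    ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.startswith("checkpoint_"))
    for old in ckpts[:-keep_last]:
        print(f"Removing {os.path.join(ckpt_dir, old)}")
        os.remove(os.path.join(ckpt_dir, old))

def maybe_save_best(model, ckpt_dir, avg_reward, best_so_far):
    # best policy is tracked separately from the rotating checkpoints
    if avg_reward > best_so_far:
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        torch.save(model.state_dict(), os.path.join(ckpt_dir, "best.pth"))
        return avg_reward
    return best_so_far
```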
[2023-03-27 00:48:19,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3249.0). Total num frames: 1470464. Throughput: 0: 811.2. Samples: 366652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:48:19,830][00453] Avg episode reward: [(0, '10.067')]
[2023-03-27 00:48:20,691][17674] Updated weights for policy 0, policy_version 360 (0.0013)
[2023-03-27 00:48:24,828][00453] Fps is (10 sec: 4096.1, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1495040. Throughput: 0: 888.8. Samples: 373790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:48:24,834][00453] Avg episode reward: [(0, '9.749')]
[2023-03-27 00:48:29,266][17674] Updated weights for policy 0, policy_version 370 (0.0013)
[2023-03-27 00:48:29,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1515520. Throughput: 0: 892.9. Samples: 377368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:48:29,830][00453] Avg episode reward: [(0, '9.568')]
[2023-03-27 00:48:34,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1531904. Throughput: 0: 852.6. Samples: 382642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:48:34,839][00453] Avg episode reward: [(0, '9.745')]
[2023-03-27 00:48:39,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3304.6). Total num frames: 1544192. Throughput: 0: 854.0. Samples: 386492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:48:39,833][00453] Avg episode reward: [(0, '9.723')]
[2023-03-27 00:48:43,028][17674] Updated weights for policy 0, policy_version 380 (0.0015)
[2023-03-27 00:48:44,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1560576. Throughput: 0: 885.3. Samples: 389036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-27 00:48:44,834][00453] Avg episode reward: [(0, '9.935')]
[2023-03-27 00:48:49,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 1576960. Throughput: 0: 901.5. Samples: 394396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:48:49,829][00453] Avg episode reward: [(0, '10.160')]
[2023-03-27 00:48:54,830][00453] Fps is (10 sec: 2866.6, 60 sec: 3345.0, 300 sec: 3290.7). Total num frames: 1589248. Throughput: 0: 848.9. Samples: 398462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:48:54,838][00453] Avg episode reward: [(0, '10.650')]
[2023-03-27 00:48:54,848][17658] Saving new best policy, reward=10.650!
[2023-03-27 00:48:57,863][17674] Updated weights for policy 0, policy_version 390 (0.0012)
[2023-03-27 00:48:59,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 1601536. Throughput: 0: 829.1. Samples: 399740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:48:59,831][00453] Avg episode reward: [(0, '10.888')]
[2023-03-27 00:48:59,832][17658] Saving new best policy, reward=10.888!
[2023-03-27 00:49:04,828][00453] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 1622016. Throughput: 0: 860.4. Samples: 405372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:49:04,830][00453] Avg episode reward: [(0, '11.823')]
[2023-03-27 00:49:04,842][17658] Saving new best policy, reward=11.823!
[2023-03-27 00:49:08,602][17674] Updated weights for policy 0, policy_version 400 (0.0011)
[2023-03-27 00:49:09,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 1638400. Throughput: 0: 820.6. Samples: 410718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:49:09,834][00453] Avg episode reward: [(0, '11.766')]
[2023-03-27 00:49:14,830][00453] Fps is (10 sec: 2866.7, 60 sec: 3276.7, 300 sec: 3332.3). Total num frames: 1650688. Throughput: 0: 773.7. Samples: 412188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:49:14,832][00453] Avg episode reward: [(0, '11.612')]
[2023-03-27 00:49:19,828][00453] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1667072. Throughput: 0: 749.2. Samples: 416354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:49:19,830][00453] Avg episode reward: [(0, '11.927')]
[2023-03-27 00:49:19,839][17658] Saving new best policy, reward=11.927!
[2023-03-27 00:49:21,830][17674] Updated weights for policy 0, policy_version 410 (0.0030)
[2023-03-27 00:49:24,828][00453] Fps is (10 sec: 4096.8, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 1691648. Throughput: 0: 822.7. Samples: 423514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:49:24,829][00453] Avg episode reward: [(0, '11.835')]
[2023-03-27 00:49:29,828][00453] Fps is (10 sec: 4505.7, 60 sec: 3276.8, 300 sec: 3401.8). Total num frames: 1712128. Throughput: 0: 847.6. Samples: 427176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:49:29,832][00453] Avg episode reward: [(0, '12.798')]
[2023-03-27 00:49:29,837][17658] Saving new best policy, reward=12.798!
[2023-03-27 00:49:32,313][17674] Updated weights for policy 0, policy_version 420 (0.0025)
[2023-03-27 00:49:34,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3401.8). Total num frames: 1724416. Throughput: 0: 820.8. Samples: 431334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:49:34,842][00453] Avg episode reward: [(0, '12.746')]
[2023-03-27 00:49:39,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3387.9). Total num frames: 1736704. Throughput: 0: 811.1. Samples: 434962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:49:39,833][00453] Avg episode reward: [(0, '12.758')]
[2023-03-27 00:49:44,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3346.2). Total num frames: 1748992. Throughput: 0: 825.0. Samples: 436864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:49:44,834][00453] Avg episode reward: [(0, '13.470')]
[2023-03-27 00:49:44,847][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000427_1748992.pth...
[2023-03-27 00:49:45,017][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000227_929792.pth
[2023-03-27 00:49:45,039][17658] Saving new best policy, reward=13.470!
[2023-03-27 00:49:47,768][17674] Updated weights for policy 0, policy_version 430 (0.0011)
[2023-03-27 00:49:49,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3360.1). Total num frames: 1765376. Throughput: 0: 801.8. Samples: 441454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:49:49,831][00453] Avg episode reward: [(0, '13.843')]
[2023-03-27 00:49:49,837][17658] Saving new best policy, reward=13.843!
[2023-03-27 00:49:54,828][00453] Fps is (10 sec: 3276.9, 60 sec: 3208.6, 300 sec: 3374.0). Total num frames: 1781760. Throughput: 0: 776.6. Samples: 445664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:49:54,830][00453] Avg episode reward: [(0, '13.667')]
[2023-03-27 00:49:59,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3346.2). Total num frames: 1794048. Throughput: 0: 789.9. Samples: 447734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:49:59,830][00453] Avg episode reward: [(0, '14.709')]
[2023-03-27 00:49:59,885][17658] Saving new best policy, reward=14.709!
[2023-03-27 00:50:01,584][17674] Updated weights for policy 0, policy_version 440 (0.0032)
[2023-03-27 00:50:04,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3360.1). Total num frames: 1814528. Throughput: 0: 816.4. Samples: 453090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:50:04,833][00453] Avg episode reward: [(0, '14.670')]
[2023-03-27 00:50:09,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 1835008. Throughput: 0: 794.2. Samples: 459252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:50:09,832][00453] Avg episode reward: [(0, '13.902')]
[2023-03-27 00:50:12,425][17674] Updated weights for policy 0, policy_version 450 (0.0039)
[2023-03-27 00:50:14,832][00453] Fps is (10 sec: 3275.2, 60 sec: 3276.6, 300 sec: 3387.8). Total num frames: 1847296. Throughput: 0: 761.4. Samples: 461442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:50:14,836][00453] Avg episode reward: [(0, '13.436')]
[2023-03-27 00:50:19,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 1867776. Throughput: 0: 771.7. Samples: 466060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:50:19,835][00453] Avg episode reward: [(0, '12.950')]
[2023-03-27 00:50:23,378][17674] Updated weights for policy 0, policy_version 460 (0.0027)
[2023-03-27 00:50:24,828][00453] Fps is (10 sec: 4098.0, 60 sec: 3276.8, 300 sec: 3401.8). Total num frames: 1888256. Throughput: 0: 839.8. Samples: 472752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:50:24,835][00453] Avg episode reward: [(0, '12.865')]
[2023-03-27 00:50:29,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3415.7). Total num frames: 1908736. Throughput: 0: 874.6. Samples: 476220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:50:29,832][00453] Avg episode reward: [(0, '13.750')]
[2023-03-27 00:50:34,828][00453] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3415.6). Total num frames: 1921024. Throughput: 0: 869.4. Samples: 480578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:50:34,830][00453] Avg episode reward: [(0, '13.610')]
[2023-03-27 00:50:35,180][17674] Updated weights for policy 0, policy_version 470 (0.0018)
[2023-03-27 00:50:39,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1941504. Throughput: 0: 894.7. Samples: 485926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:50:39,830][00453] Avg episode reward: [(0, '13.424')]
[2023-03-27 00:50:44,616][17674] Updated weights for policy 0, policy_version 480 (0.0017)
[2023-03-27 00:50:44,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3457.3). Total num frames: 1966080. Throughput: 0: 927.1. Samples: 489452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:50:44,830][00453] Avg episode reward: [(0, '13.419')]
[2023-03-27 00:50:49,831][00453] Fps is (10 sec: 4094.4, 60 sec: 3617.9, 300 sec: 3443.4). Total num frames: 1982464. Throughput: 0: 951.0. Samples: 495888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:50:49,834][00453] Avg episode reward: [(0, '13.906')]
[2023-03-27 00:50:54,829][00453] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3457.3). Total num frames: 1998848. Throughput: 0: 907.2. Samples: 500076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:50:54,832][00453] Avg episode reward: [(0, '14.116')]
[2023-03-27 00:50:57,430][17674] Updated weights for policy 0, policy_version 490 (0.0022)
[2023-03-27 00:50:59,828][00453] Fps is (10 sec: 3278.1, 60 sec: 3686.4, 300 sec: 3429.5). Total num frames: 2015232. Throughput: 0: 907.4. Samples: 502270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:50:59,835][00453] Avg episode reward: [(0, '14.371')]
[2023-03-27 00:51:04,828][00453] Fps is (10 sec: 4096.7, 60 sec: 3754.7, 300 sec: 3443.4). Total num frames: 2039808. Throughput: 0: 957.2. Samples: 509136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-27 00:51:04,834][00453] Avg episode reward: [(0, '15.319')]
[2023-03-27 00:51:04,848][17658] Saving new best policy, reward=15.319!
[2023-03-27 00:51:06,421][17674] Updated weights for policy 0, policy_version 500 (0.0024)
[2023-03-27 00:51:09,828][00453] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3429.5). Total num frames: 2056192. Throughput: 0: 939.2. Samples: 515018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:51:09,830][00453] Avg episode reward: [(0, '14.546')]
[2023-03-27 00:51:14,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3686.7, 300 sec: 3429.5). Total num frames: 2068480. Throughput: 0: 902.6. Samples: 516836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:51:14,835][00453] Avg episode reward: [(0, '14.542')]
[2023-03-27 00:51:19,829][00453] Fps is (10 sec: 2457.2, 60 sec: 3549.8, 300 sec: 3401.7). Total num frames: 2080768. Throughput: 0: 860.6. Samples: 519308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:51:19,832][00453] Avg episode reward: [(0, '14.894')]
[2023-03-27 00:51:21,398][17674] Updated weights for policy 0, policy_version 510 (0.0012)
[2023-03-27 00:51:24,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 2101248. Throughput: 0: 895.7. Samples: 526234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:51:24,834][00453] Avg episode reward: [(0, '16.794')]
[2023-03-27 00:51:24,851][17658] Saving new best policy, reward=16.794!
[2023-03-27 00:51:29,834][00453] Fps is (10 sec: 3684.6, 60 sec: 3481.2, 300 sec: 3387.8). Total num frames: 2117632. Throughput: 0: 883.2. Samples: 529202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:51:29,837][00453] Avg episode reward: [(0, '16.844')]
[2023-03-27 00:51:29,844][17658] Saving new best policy, reward=16.844!
[2023-03-27 00:51:33,600][17674] Updated weights for policy 0, policy_version 520 (0.0031)
[2023-03-27 00:51:34,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 2129920. Throughput: 0: 822.6. Samples: 532902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:51:34,832][00453] Avg episode reward: [(0, '17.299')]
[2023-03-27 00:51:34,852][17658] Saving new best policy, reward=17.299!
[2023-03-27 00:51:39,828][00453] Fps is (10 sec: 3279.0, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 2150400. Throughput: 0: 848.3. Samples: 538248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-27 00:51:39,833][00453] Avg episode reward: [(0, '17.656')]
[2023-03-27 00:51:39,838][17658] Saving new best policy, reward=17.656!
[2023-03-27 00:51:43,762][17674] Updated weights for policy 0, policy_version 530 (0.0019)
[2023-03-27 00:51:44,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 2174976. Throughput: 0: 875.2. Samples: 541652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:51:44,829][00453] Avg episode reward: [(0, '17.825')]
[2023-03-27 00:51:44,843][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth...
[2023-03-27 00:51:44,966][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000331_1355776.pth
[2023-03-27 00:51:44,989][17658] Saving new best policy, reward=17.825!
[2023-03-27 00:51:49,829][00453] Fps is (10 sec: 4095.2, 60 sec: 3481.7, 300 sec: 3401.7). Total num frames: 2191360. Throughput: 0: 853.9. Samples: 547562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:51:49,833][00453] Avg episode reward: [(0, '17.180')]
[2023-03-27 00:51:54,828][00453] Fps is (10 sec: 2867.0, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 2203648. Throughput: 0: 818.8. Samples: 551866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:51:54,830][00453] Avg episode reward: [(0, '16.851')]
[2023-03-27 00:51:56,764][17674] Updated weights for policy 0, policy_version 540 (0.0022)
[2023-03-27 00:51:59,828][00453] Fps is (10 sec: 3277.3, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 2224128. Throughput: 0: 834.3. Samples: 554380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:51:59,830][00453] Avg episode reward: [(0, '16.377')]
[2023-03-27 00:52:04,828][00453] Fps is (10 sec: 4505.9, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 2248704. Throughput: 0: 931.3. Samples: 561214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:52:04,833][00453] Avg episode reward: [(0, '17.444')]
[2023-03-27 00:52:05,581][17674] Updated weights for policy 0, policy_version 550 (0.0014)
[2023-03-27 00:52:09,828][00453] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 2265088. Throughput: 0: 903.3. Samples: 566882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:52:09,832][00453] Avg episode reward: [(0, '17.662')]
[2023-03-27 00:52:14,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 2281472. Throughput: 0: 886.5. Samples: 569090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-27 00:52:14,834][00453] Avg episode reward: [(0, '18.182')]
[2023-03-27 00:52:14,847][17658] Saving new best policy, reward=18.182!
[2023-03-27 00:52:18,927][17674] Updated weights for policy 0, policy_version 560 (0.0021)
[2023-03-27 00:52:19,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3387.9). Total num frames: 2297856. Throughput: 0: 903.3. Samples: 573552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:52:19,830][00453] Avg episode reward: [(0, '17.172')]
[2023-03-27 00:52:24,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3415.6). Total num frames: 2318336. Throughput: 0: 938.4. Samples: 580478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:52:24,836][00453] Avg episode reward: [(0, '17.849')]
[2023-03-27 00:52:28,819][17674] Updated weights for policy 0, policy_version 570 (0.0017)
[2023-03-27 00:52:29,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3618.5, 300 sec: 3443.4). Total num frames: 2334720. Throughput: 0: 928.3. Samples: 583426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-03-27 00:52:29,833][00453] Avg episode reward: [(0, '17.311')]
[2023-03-27 00:52:34,829][00453] Fps is (10 sec: 3276.4, 60 sec: 3686.3, 300 sec: 3457.3). Total num frames: 2351104. Throughput: 0: 896.1. Samples: 587884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-03-27 00:52:34,835][00453] Avg episode reward: [(0, '17.690')]
[2023-03-27 00:52:39,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3443.4). Total num frames: 2371584. Throughput: 0: 934.9. Samples: 593938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:52:39,830][00453] Avg episode reward: [(0, '17.613')]
[2023-03-27 00:52:39,907][17674] Updated weights for policy 0, policy_version 580 (0.0027)
[2023-03-27 00:52:44,828][00453] Fps is (10 sec: 4096.5, 60 sec: 3618.1, 300 sec: 3429.5). Total num frames: 2392064. Throughput: 0: 957.6. Samples: 597474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:52:44,830][00453] Avg episode reward: [(0, '17.342')]
[2023-03-27 00:52:49,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3457.3). Total num frames: 2408448. Throughput: 0: 916.4. Samples: 602454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:52:49,836][00453] Avg episode reward: [(0, '17.872')]
[2023-03-27 00:52:51,558][17674] Updated weights for policy 0, policy_version 590 (0.0011)
[2023-03-27 00:52:54,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3485.1). Total num frames: 2424832. Throughput: 0: 886.7. Samples: 606784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:52:54,831][00453] Avg episode reward: [(0, '17.885')]
[2023-03-27 00:52:59,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3485.1). Total num frames: 2445312. Throughput: 0: 904.3. Samples: 609782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-27 00:52:59,830][00453] Avg episode reward: [(0, '18.287')]
[2023-03-27 00:52:59,837][17658] Saving new best policy, reward=18.287!
[2023-03-27 00:53:01,970][17674] Updated weights for policy 0, policy_version 600 (0.0029)
[2023-03-27 00:53:04,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 2469888. Throughput: 0: 957.8. Samples: 616654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:53:04,829][00453] Avg episode reward: [(0, '17.864')]
[2023-03-27 00:53:09,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 2486272. Throughput: 0: 922.0. Samples: 621970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:53:09,831][00453] Avg episode reward: [(0, '18.205')]
[2023-03-27 00:53:13,907][17674] Updated weights for policy 0, policy_version 610 (0.0017)
[2023-03-27 00:53:14,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2498560. Throughput: 0: 903.8. Samples: 624098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:53:14,834][00453] Avg episode reward: [(0, '17.820')]
[2023-03-27 00:53:19,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3485.1). Total num frames: 2523136. Throughput: 0: 933.7. Samples: 629898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:53:19,832][00453] Avg episode reward: [(0, '18.554')]
[2023-03-27 00:53:19,838][17658] Saving new best policy, reward=18.554!
[2023-03-27 00:53:23,331][17674] Updated weights for policy 0, policy_version 620 (0.0011)
[2023-03-27 00:53:24,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3485.1). Total num frames: 2543616. Throughput: 0: 952.3. Samples: 636790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-27 00:53:24,835][00453] Avg episode reward: [(0, '18.379')]
[2023-03-27 00:53:29,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3485.1). Total num frames: 2560000. Throughput: 0: 933.8. Samples: 639496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-27 00:53:29,830][00453] Avg episode reward: [(0, '18.958')]
[2023-03-27 00:53:29,832][17658] Saving new best policy, reward=18.958!
[2023-03-27 00:53:34,828][00453] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3499.0). Total num frames: 2576384. Throughput: 0: 921.4. Samples: 643916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:53:34,833][00453] Avg episode reward: [(0, '19.689')]
[2023-03-27 00:53:34,851][17658] Saving new best policy, reward=19.689!
[2023-03-27 00:53:35,593][17674] Updated weights for policy 0, policy_version 630 (0.0028)
[2023-03-27 00:53:39,828][00453] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3526.7). Total num frames: 2600960. Throughput: 0: 968.1. Samples: 650350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:53:39,835][00453] Avg episode reward: [(0, '20.872')]
[2023-03-27 00:53:39,839][17658] Saving new best policy, reward=20.872!
[2023-03-27 00:53:44,140][17674] Updated weights for policy 0, policy_version 640 (0.0011)
[2023-03-27 00:53:44,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3540.6). Total num frames: 2621440. Throughput: 0: 979.2. Samples: 653848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:53:44,835][00453] Avg episode reward: [(0, '20.402')]
[2023-03-27 00:53:44,846][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000640_2621440.pth...
[2023-03-27 00:53:44,955][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000427_1748992.pth
[2023-03-27 00:53:49,828][00453] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3554.5). Total num frames: 2637824. Throughput: 0: 953.3. Samples: 659554. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-27 00:53:49,832][00453] Avg episode reward: [(0, '20.544')]
[2023-03-27 00:53:54,828][00453] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3568.4). Total num frames: 2654208. Throughput: 0: 937.2. Samples: 664144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-27 00:53:54,835][00453] Avg episode reward: [(0, '20.346')]
[2023-03-27 00:53:56,507][17674] Updated weights for policy 0, policy_version 650 (0.0017)
[2023-03-27 00:53:59,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3568.4). Total num frames: 2674688. Throughput: 0: 948.6. Samples: 666784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:53:59,829][00453] Avg episode reward: [(0, '19.519')]
[2023-03-27 00:54:04,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 2691072. Throughput: 0: 947.1. Samples: 672518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-27 00:54:04,835][00453] Avg episode reward: [(0, '20.615')]
[2023-03-27 00:54:08,890][17674] Updated weights for policy 0, policy_version 660 (0.0032)
[2023-03-27 00:54:09,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2703360. Throughput: 0: 889.2. Samples: 676802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:54:09,830][00453] Avg episode reward: [(0, '19.492')]
[2023-03-27 00:54:14,828][00453] Fps is (10 sec: 2457.6, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2715648. Throughput: 0: 862.2. Samples: 678294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-27 00:54:14,830][00453] Avg episode reward: [(0, '20.013')]
[2023-03-27 00:54:19,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2740224.
Throughput: 0: 898.2. Samples: 684336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-27 00:54:19,830][00453] Avg episode reward: [(0, '20.762')] [2023-03-27 00:54:20,397][17674] Updated weights for policy 0, policy_version 670 (0.0014) [2023-03-27 00:54:24,829][00453] Fps is (10 sec: 4914.3, 60 sec: 3686.3, 300 sec: 3568.4). Total num frames: 2764800. Throughput: 0: 912.1. Samples: 691394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:54:24,832][00453] Avg episode reward: [(0, '20.607')] [2023-03-27 00:54:29,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2777088. Throughput: 0: 885.8. Samples: 693710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:54:29,830][00453] Avg episode reward: [(0, '18.828')] [2023-03-27 00:54:31,886][17674] Updated weights for policy 0, policy_version 680 (0.0023) [2023-03-27 00:54:34,828][00453] Fps is (10 sec: 2867.7, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 2793472. Throughput: 0: 856.2. Samples: 698082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:54:34,830][00453] Avg episode reward: [(0, '18.234')] [2023-03-27 00:54:39,828][00453] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2818048. Throughput: 0: 903.5. Samples: 704800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-27 00:54:39,833][00453] Avg episode reward: [(0, '17.542')] [2023-03-27 00:54:41,373][17674] Updated weights for policy 0, policy_version 690 (0.0022) [2023-03-27 00:54:44,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2838528. Throughput: 0: 921.3. Samples: 708242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:54:44,830][00453] Avg episode reward: [(0, '17.549')] [2023-03-27 00:54:49,828][00453] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2854912. Throughput: 0: 908.8. Samples: 713414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-27 00:54:49,829][00453] Avg episode reward: [(0, '18.004')] [2023-03-27 00:54:53,583][17674] Updated weights for policy 0, policy_version 700 (0.0016) [2023-03-27 00:54:54,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2871296. Throughput: 0: 917.6. Samples: 718096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-27 00:54:54,830][00453] Avg episode reward: [(0, '17.649')] [2023-03-27 00:54:59,828][00453] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2891776. Throughput: 0: 958.6. Samples: 721430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:54:59,829][00453] Avg episode reward: [(0, '17.943')] [2023-03-27 00:55:02,798][17674] Updated weights for policy 0, policy_version 710 (0.0012) [2023-03-27 00:55:04,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2916352. Throughput: 0: 979.6. Samples: 728418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:55:04,831][00453] Avg episode reward: [(0, '18.630')] [2023-03-27 00:55:09,828][00453] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2928640. Throughput: 0: 927.6. Samples: 733136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:55:09,830][00453] Avg episode reward: [(0, '17.868')] [2023-03-27 00:55:14,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 2945024. Throughput: 0: 925.5. Samples: 735356. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-27 00:55:14,830][00453] Avg episode reward: [(0, '17.488')] [2023-03-27 00:55:15,137][17674] Updated weights for policy 0, policy_version 720 (0.0020) [2023-03-27 00:55:19,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2969600. Throughput: 0: 967.8. Samples: 741632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:55:19,830][00453] Avg episode reward: [(0, '17.354')] [2023-03-27 00:55:24,081][17674] Updated weights for policy 0, policy_version 730 (0.0011) [2023-03-27 00:55:24,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3754.8, 300 sec: 3665.6). Total num frames: 2990080. Throughput: 0: 968.5. Samples: 748382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:55:24,830][00453] Avg episode reward: [(0, '16.447')] [2023-03-27 00:55:29,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 3006464. Throughput: 0: 941.7. Samples: 750620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:55:29,832][00453] Avg episode reward: [(0, '17.186')] [2023-03-27 00:55:34,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 3022848. Throughput: 0: 926.2. Samples: 755094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:55:34,830][00453] Avg episode reward: [(0, '17.865')] [2023-03-27 00:55:36,314][17674] Updated weights for policy 0, policy_version 740 (0.0015) [2023-03-27 00:55:39,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3665.6). Total num frames: 3047424. Throughput: 0: 974.5. Samples: 761950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:55:39,830][00453] Avg episode reward: [(0, '18.175')] [2023-03-27 00:55:44,828][00453] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 3067904. Throughput: 0: 979.7. Samples: 765518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:55:44,833][00453] Avg episode reward: [(0, '19.188')] [2023-03-27 00:55:44,846][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000749_3067904.pth... [2023-03-27 00:55:44,993][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth [2023-03-27 00:55:46,068][17674] Updated weights for policy 0, policy_version 750 (0.0011) [2023-03-27 00:55:49,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3080192. Throughput: 0: 928.0. Samples: 770178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-27 00:55:49,835][00453] Avg episode reward: [(0, '20.171')] [2023-03-27 00:55:54,828][00453] Fps is (10 sec: 2867.3, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3096576. Throughput: 0: 925.2. Samples: 774772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:55:54,835][00453] Avg episode reward: [(0, '20.217')] [2023-03-27 00:55:57,771][17674] Updated weights for policy 0, policy_version 760 (0.0017) [2023-03-27 00:55:59,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3665.6). Total num frames: 3121152. Throughput: 0: 953.8. Samples: 778278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-27 00:55:59,830][00453] Avg episode reward: [(0, '20.518')] [2023-03-27 00:56:04,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3137536. Throughput: 0: 943.1. Samples: 784072. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:56:04,833][00453] Avg episode reward: [(0, '20.891')] [2023-03-27 00:56:04,846][17658] Saving new best policy, reward=20.891! [2023-03-27 00:56:09,828][00453] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3149824. Throughput: 0: 884.3. Samples: 788176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:56:09,830][00453] Avg episode reward: [(0, '22.898')] [2023-03-27 00:56:09,835][17658] Saving new best policy, reward=22.898! [2023-03-27 00:56:10,812][17674] Updated weights for policy 0, policy_version 770 (0.0017) [2023-03-27 00:56:14,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3166208. Throughput: 0: 878.9. Samples: 790172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:56:14,836][00453] Avg episode reward: [(0, '23.162')] [2023-03-27 00:56:14,848][17658] Saving new best policy, reward=23.162! [2023-03-27 00:56:19,828][00453] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3190784. Throughput: 0: 922.9. Samples: 796626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-27 00:56:19,830][00453] Avg episode reward: [(0, '23.517')] [2023-03-27 00:56:19,833][17658] Saving new best policy, reward=23.517! [2023-03-27 00:56:20,626][17674] Updated weights for policy 0, policy_version 780 (0.0020) [2023-03-27 00:56:24,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 3207168. Throughput: 0: 908.7. Samples: 802840. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-27 00:56:24,831][00453] Avg episode reward: [(0, '23.268')] [2023-03-27 00:56:29,828][00453] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 3223552. Throughput: 0: 881.2. Samples: 805170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:56:29,832][00453] Avg episode reward: [(0, '23.044')] [2023-03-27 00:56:33,050][17674] Updated weights for policy 0, policy_version 790 (0.0035) [2023-03-27 00:56:34,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3244032. Throughput: 0: 884.5. Samples: 809980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:56:34,830][00453] Avg episode reward: [(0, '21.888')] [2023-03-27 00:56:39,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3268608. Throughput: 0: 944.2. Samples: 817260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:56:39,833][00453] Avg episode reward: [(0, '20.251')] [2023-03-27 00:56:41,445][17674] Updated weights for policy 0, policy_version 800 (0.0018) [2023-03-27 00:56:44,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3289088. Throughput: 0: 946.5. Samples: 820872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-27 00:56:44,837][00453] Avg episode reward: [(0, '19.962')] [2023-03-27 00:56:49,830][00453] Fps is (10 sec: 3275.9, 60 sec: 3686.2, 300 sec: 3721.1). Total num frames: 3301376. Throughput: 0: 920.0. Samples: 825474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-27 00:56:49,836][00453] Avg episode reward: [(0, '20.007')] [2023-03-27 00:56:53,805][17674] Updated weights for policy 0, policy_version 810 (0.0018) [2023-03-27 00:56:54,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3321856. Throughput: 0: 947.7. Samples: 830822. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-27 00:56:54,830][00453] Avg episode reward: [(0, '21.068')] [2023-03-27 00:56:59,828][00453] Fps is (10 sec: 4506.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3346432. Throughput: 0: 982.6. Samples: 834388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:56:59,836][00453] Avg episode reward: [(0, '21.478')] [2023-03-27 00:57:02,498][17674] Updated weights for policy 0, policy_version 820 (0.0025) [2023-03-27 00:57:04,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3362816. Throughput: 0: 983.0. Samples: 840862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-27 00:57:04,834][00453] Avg episode reward: [(0, '21.317')] [2023-03-27 00:57:09,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3379200. Throughput: 0: 943.8. Samples: 845310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:57:09,833][00453] Avg episode reward: [(0, '21.397')] [2023-03-27 00:57:14,806][17674] Updated weights for policy 0, policy_version 830 (0.0036) [2023-03-27 00:57:14,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 3399680. Throughput: 0: 945.2. Samples: 847704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:57:14,830][00453] Avg episode reward: [(0, '21.242')] [2023-03-27 00:57:19,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 3420160. Throughput: 0: 990.6. Samples: 854556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-27 00:57:19,830][00453] Avg episode reward: [(0, '20.408')] [2023-03-27 00:57:24,834][00453] Fps is (10 sec: 3683.9, 60 sec: 3822.5, 300 sec: 3734.9). Total num frames: 3436544. Throughput: 0: 957.1. Samples: 860336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:57:24,850][00453] Avg episode reward: [(0, '19.716')] [2023-03-27 00:57:25,033][17674] Updated weights for policy 0, policy_version 840 (0.0022) [2023-03-27 00:57:29,829][00453] Fps is (10 sec: 3276.2, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 3452928. Throughput: 0: 923.3. Samples: 862420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:57:29,833][00453] Avg episode reward: [(0, '20.328')] [2023-03-27 00:57:34,828][00453] Fps is (10 sec: 3688.8, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 3473408. Throughput: 0: 932.1. Samples: 867414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:57:34,832][00453] Avg episode reward: [(0, '20.978')] [2023-03-27 00:57:36,474][17674] Updated weights for policy 0, policy_version 850 (0.0024) [2023-03-27 00:57:39,828][00453] Fps is (10 sec: 4096.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3493888. Throughput: 0: 972.4. Samples: 874580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-27 00:57:39,830][00453] Avg episode reward: [(0, '22.225')] [2023-03-27 00:57:44,828][00453] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3514368. Throughput: 0: 967.0. Samples: 877902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:57:44,830][00453] Avg episode reward: [(0, '22.041')] [2023-03-27 00:57:44,846][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000858_3514368.pth... 
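The `Saving .../checkpoint_000000858_3514368.pth` entry above (and the matching `Removing` entry just below) shows the learner's rolling checkpoint scheme: it periodically writes `checkpoint_<policy_version>_<env_frames>.pth` and prunes the oldest file, so only the most recent few checkpoints survive. The two numbers stay in lockstep throughout this log (858 x 4096 = 3,514,368, and later 978 x 4096 = 4,005,888), suggesting 4096 env frames per policy update. A minimal sketch of locating and inspecting the newest checkpoint, assuming only the naming visible here plus standard torch.load serialization (the dict's keys are printed rather than assumed):

    # Sketch only: find the newest rolling checkpoint kept by the learner.
    # Zero-padded names sort lexicographically by policy version, so max() works.
    from pathlib import Path
    import torch

    ckpt_dir = Path("/content/train_dir/default_experiment/checkpoint_p0")
    newest = max(ckpt_dir.glob("checkpoint_*.pth"))
    _, version, frames = newest.stem.split("_")  # e.g. 000000858, 3514368
    print(f"policy_version={int(version)}, env_frames={int(frames)}")

    state = torch.load(newest, map_location="cpu")  # saved on cuda:0 in this run
    print(list(state.keys()))  # inspect the layout rather than assume it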
[2023-03-27 00:57:45,000][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000640_2621440.pth [2023-03-27 00:57:47,192][17674] Updated weights for policy 0, policy_version 860 (0.0023) [2023-03-27 00:57:49,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 3526656. Throughput: 0: 917.4. Samples: 882144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:57:49,830][00453] Avg episode reward: [(0, '23.048')] [2023-03-27 00:57:54,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3547136. Throughput: 0: 940.4. Samples: 887626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-27 00:57:54,830][00453] Avg episode reward: [(0, '23.704')] [2023-03-27 00:57:54,845][17658] Saving new best policy, reward=23.704! [2023-03-27 00:57:57,873][17674] Updated weights for policy 0, policy_version 870 (0.0030) [2023-03-27 00:57:59,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3571712. Throughput: 0: 961.0. Samples: 890948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:57:59,830][00453] Avg episode reward: [(0, '24.518')] [2023-03-27 00:57:59,833][17658] Saving new best policy, reward=24.518! [2023-03-27 00:58:04,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3588096. Throughput: 0: 942.2. Samples: 896956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:58:04,830][00453] Avg episode reward: [(0, '22.862')] [2023-03-27 00:58:09,704][17674] Updated weights for policy 0, policy_version 880 (0.0021) [2023-03-27 00:58:09,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3604480. Throughput: 0: 915.4. Samples: 901522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:58:09,830][00453] Avg episode reward: [(0, '22.781')] [2023-03-27 00:58:14,828][00453] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 3624960. Throughput: 0: 935.1. Samples: 904496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-27 00:58:14,835][00453] Avg episode reward: [(0, '22.437')] [2023-03-27 00:58:18,602][17674] Updated weights for policy 0, policy_version 890 (0.0011) [2023-03-27 00:58:19,828][00453] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3649536. Throughput: 0: 983.2. Samples: 911658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:58:19,830][00453] Avg episode reward: [(0, '21.002')] [2023-03-27 00:58:24,828][00453] Fps is (10 sec: 4096.1, 60 sec: 3823.4, 300 sec: 3748.9). Total num frames: 3665920. Throughput: 0: 946.1. Samples: 917156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-27 00:58:24,830][00453] Avg episode reward: [(0, '20.111')] [2023-03-27 00:58:29,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 3682304. Throughput: 0: 923.2. Samples: 919448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-27 00:58:29,831][00453] Avg episode reward: [(0, '20.431')] [2023-03-27 00:58:30,784][17674] Updated weights for policy 0, policy_version 900 (0.0026) [2023-03-27 00:58:34,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 3702784. Throughput: 0: 955.4. Samples: 925138. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:58:34,835][00453] Avg episode reward: [(0, '21.708')] [2023-03-27 00:58:39,540][17674] Updated weights for policy 0, policy_version 910 (0.0012) [2023-03-27 00:58:39,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3727360. Throughput: 0: 990.7. Samples: 932208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:58:39,830][00453] Avg episode reward: [(0, '21.167')] [2023-03-27 00:58:44,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3743744. Throughput: 0: 977.8. Samples: 934948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-27 00:58:44,831][00453] Avg episode reward: [(0, '21.254')] [2023-03-27 00:58:49,828][00453] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 3756032. Throughput: 0: 941.3. Samples: 939314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-27 00:58:49,835][00453] Avg episode reward: [(0, '21.184')] [2023-03-27 00:58:51,777][17674] Updated weights for policy 0, policy_version 920 (0.0029) [2023-03-27 00:58:54,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3780608. Throughput: 0: 981.7. Samples: 945700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-27 00:58:54,830][00453] Avg episode reward: [(0, '21.569')] [2023-03-27 00:58:59,828][00453] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3805184. Throughput: 0: 998.1. Samples: 949408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-27 00:58:59,833][00453] Avg episode reward: [(0, '21.527')] [2023-03-27 00:59:00,292][17674] Updated weights for policy 0, policy_version 930 (0.0012) [2023-03-27 00:59:04,830][00453] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3790.5). Total num frames: 3821568. Throughput: 0: 968.8. Samples: 955254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:59:04,833][00453] Avg episode reward: [(0, '21.340')] [2023-03-27 00:59:09,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3837952. Throughput: 0: 948.0. Samples: 959814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:59:09,830][00453] Avg episode reward: [(0, '21.401')] [2023-03-27 00:59:12,372][17674] Updated weights for policy 0, policy_version 940 (0.0012) [2023-03-27 00:59:14,828][00453] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3858432. Throughput: 0: 969.9. Samples: 963094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:59:14,833][00453] Avg episode reward: [(0, '23.293')] [2023-03-27 00:59:19,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 3883008. Throughput: 0: 1003.7. Samples: 970306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:59:19,834][00453] Avg episode reward: [(0, '22.652')] [2023-03-27 00:59:21,197][17674] Updated weights for policy 0, policy_version 950 (0.0011) [2023-03-27 00:59:24,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3899392. Throughput: 0: 961.6. Samples: 975478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-27 00:59:24,831][00453] Avg episode reward: [(0, '22.069')] [2023-03-27 00:59:29,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3915776. Throughput: 0: 948.2. Samples: 977618. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:59:29,833][00453] Avg episode reward: [(0, '23.893')] [2023-03-27 00:59:33,294][17674] Updated weights for policy 0, policy_version 960 (0.0019) [2023-03-27 00:59:34,828][00453] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3936256. Throughput: 0: 987.2. Samples: 983740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:59:34,834][00453] Avg episode reward: [(0, '23.765')] [2023-03-27 00:59:39,828][00453] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3960832. Throughput: 0: 1007.0. Samples: 991016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-27 00:59:39,834][00453] Avg episode reward: [(0, '23.544')] [2023-03-27 00:59:42,912][17674] Updated weights for policy 0, policy_version 970 (0.0011) [2023-03-27 00:59:44,828][00453] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3977216. Throughput: 0: 980.8. Samples: 993544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-27 00:59:44,831][00453] Avg episode reward: [(0, '23.531')] [2023-03-27 00:59:44,841][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000971_3977216.pth... [2023-03-27 00:59:45,021][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000749_3067904.pth [2023-03-27 00:59:49,828][00453] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3993600. Throughput: 0: 950.1. Samples: 998008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-27 00:59:49,834][00453] Avg episode reward: [(0, '23.660')] [2023-03-27 00:59:52,411][00453] Component Batcher_0 stopped! [2023-03-27 00:59:52,413][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-27 00:59:52,414][17658] Stopping Batcher_0... [2023-03-27 00:59:52,425][17658] Loop batcher_evt_loop terminating... [2023-03-27 00:59:52,464][17674] Weights refcount: 2 0 [2023-03-27 00:59:52,467][00453] Component InferenceWorker_p0-w0 stopped! [2023-03-27 00:59:52,470][17674] Stopping InferenceWorker_p0-w0... [2023-03-27 00:59:52,483][17674] Loop inference_proc0-0_evt_loop terminating... [2023-03-27 00:59:52,492][17676] Stopping RolloutWorker_w3... [2023-03-27 00:59:52,492][00453] Component RolloutWorker_w3 stopped! [2023-03-27 00:59:52,503][17673] Stopping RolloutWorker_w1... [2023-03-27 00:59:52,504][17673] Loop rollout_proc1_evt_loop terminating... [2023-03-27 00:59:52,505][17676] Loop rollout_proc3_evt_loop terminating... [2023-03-27 00:59:52,508][17682] Stopping RolloutWorker_w7... [2023-03-27 00:59:52,509][17680] Stopping RolloutWorker_w5... [2023-03-27 00:59:52,510][17682] Loop rollout_proc7_evt_loop terminating... [2023-03-27 00:59:52,510][17680] Loop rollout_proc5_evt_loop terminating... [2023-03-27 00:59:52,503][00453] Component RolloutWorker_w1 stopped! [2023-03-27 00:59:52,513][00453] Component RolloutWorker_w7 stopped! [2023-03-27 00:59:52,517][00453] Component RolloutWorker_w5 stopped! [2023-03-27 00:59:52,544][00453] Component RolloutWorker_w2 stopped! [2023-03-27 00:59:52,546][17679] Stopping RolloutWorker_w2... [2023-03-27 00:59:52,551][17679] Loop rollout_proc2_evt_loop terminating... [2023-03-27 00:59:52,553][00453] Component RolloutWorker_w4 stopped! [2023-03-27 00:59:52,580][00453] Component RolloutWorker_w6 stopped! [2023-03-27 00:59:52,582][00453] Component RolloutWorker_w0 stopped! [2023-03-27 00:59:52,555][17678] Stopping RolloutWorker_w4... 
[2023-03-27 00:59:52,563][17681] Stopping RolloutWorker_w6... [2023-03-27 00:59:52,577][17671] Stopping RolloutWorker_w0... [2023-03-27 00:59:52,584][17678] Loop rollout_proc4_evt_loop terminating... [2023-03-27 00:59:52,585][17681] Loop rollout_proc6_evt_loop terminating... [2023-03-27 00:59:52,597][17671] Loop rollout_proc0_evt_loop terminating... [2023-03-27 00:59:52,643][17658] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000858_3514368.pth [2023-03-27 00:59:52,662][17658] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-27 00:59:52,843][00453] Component LearnerWorker_p0 stopped! [2023-03-27 00:59:52,845][00453] Waiting for process learner_proc0 to stop... [2023-03-27 00:59:52,848][17658] Stopping LearnerWorker_p0... [2023-03-27 00:59:52,851][17658] Loop learner_proc0_evt_loop terminating... [2023-03-27 00:59:54,705][00453] Waiting for process inference_proc0-0 to join... [2023-03-27 00:59:54,916][00453] Waiting for process rollout_proc0 to join... [2023-03-27 00:59:55,422][00453] Waiting for process rollout_proc1 to join... [2023-03-27 00:59:55,424][00453] Waiting for process rollout_proc2 to join... [2023-03-27 00:59:55,427][00453] Waiting for process rollout_proc3 to join... [2023-03-27 00:59:55,430][00453] Waiting for process rollout_proc4 to join... [2023-03-27 00:59:55,433][00453] Waiting for process rollout_proc5 to join... [2023-03-27 00:59:55,435][00453] Waiting for process rollout_proc6 to join... [2023-03-27 00:59:55,437][00453] Waiting for process rollout_proc7 to join... [2023-03-27 00:59:55,439][00453] Batcher 0 profile tree view: batching: 26.0807, releasing_batches: 0.0244 [2023-03-27 00:59:55,442][00453] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 593.3086 update_model: 7.7360 weight_update: 0.0011 one_step: 0.0024 handle_policy_step: 535.7402 deserialize: 15.6303, stack: 2.9852, obs_to_device_normalize: 117.2372, forward: 255.3240, send_messages: 29.6418 prepare_outputs: 87.2372 to_cpu: 53.3602 [2023-03-27 00:59:55,443][00453] Learner 0 profile tree view: misc: 0.0054, prepare_batch: 19.2895 train: 77.4745 epoch_init: 0.0057, minibatch_init: 0.0193, losses_postprocess: 0.6649, kl_divergence: 0.5757, after_optimizer: 33.3207 calculate_losses: 27.4732 losses_init: 0.0062, forward_head: 1.7666, bptt_initial: 18.0663, tail: 1.1123, advantages_returns: 0.2559, losses: 3.7402 bptt: 2.1515 bptt_forward_core: 2.0827 update: 14.7421 clip: 1.4334 [2023-03-27 00:59:55,446][00453] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3869, enqueue_policy_requests: 164.7930, env_step: 878.4039, overhead: 22.4423, complete_rollouts: 7.9020 save_policy_outputs: 20.4083 split_output_tensors: 9.9071 [2023-03-27 00:59:55,451][00453] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.4203, enqueue_policy_requests: 170.8000, env_step: 876.5041, overhead: 21.7759, complete_rollouts: 7.4201 save_policy_outputs: 20.1820 split_output_tensors: 9.6807 [2023-03-27 00:59:55,452][00453] Loop Runner_EvtLoop terminating... 
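The profile trees above account for where the run's wall time went: rollout workers spent most of it stepping the environment (env_step ~878 s, plus ~165 s enqueueing policy requests), the inference worker split its time between waiting for the policy (~593 s) and handling policy steps (~536 s, of which ~255 s is the forward pass), and the learner needed only ~77 s of actual training. The Runner summary just below collapses this into one throughput figure; a quick check, with the numbers taken verbatim from the log:

    # Overall throughput is total collected env frames over main-loop wall time.
    total_env_frames = 4_005_888   # "Collected {0: 4005888}"
    main_loop_seconds = 1205.7407  # "main_loop: 1205.7407"
    print(f"{total_env_frames / main_loop_seconds:.1f} FPS")  # -> 3322.3

This whole-run average sits below the instantaneous 10-second readings (which peak near 4900 FPS above) because it includes process startup and the slower early phase of training.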
[2023-03-27 00:59:55,459][00453] Runner profile tree view: main_loop: 1205.7407 [2023-03-27 00:59:55,460][00453] Collected {0: 4005888}, FPS: 3322.3 [2023-03-27 00:59:56,020][00453] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-27 00:59:56,022][00453] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-27 00:59:56,025][00453] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-27 00:59:56,027][00453] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-27 00:59:56,030][00453] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-27 00:59:56,032][00453] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-27 00:59:56,034][00453] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-03-27 00:59:56,036][00453] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-27 00:59:56,037][00453] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-03-27 00:59:56,038][00453] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-03-27 00:59:56,040][00453] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-27 00:59:56,041][00453] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-27 00:59:56,042][00453] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-27 00:59:56,044][00453] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-03-27 00:59:56,045][00453] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-27 00:59:56,070][00453] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-27 00:59:56,073][00453] RunningMeanStd input shape: (3, 72, 128) [2023-03-27 00:59:56,075][00453] RunningMeanStd input shape: (1,) [2023-03-27 00:59:56,091][00453] ConvEncoder: input_channels=3 [2023-03-27 00:59:56,225][00453] Conv encoder output size: 512 [2023-03-27 00:59:56,227][00453] Policy head output size: 512 [2023-03-27 00:59:58,643][00453] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-27 01:00:00,041][00453] Num frames 100... [2023-03-27 01:00:00,194][00453] Num frames 200... [2023-03-27 01:00:00,358][00453] Num frames 300... [2023-03-27 01:00:00,512][00453] Num frames 400... [2023-03-27 01:00:00,668][00453] Num frames 500... [2023-03-27 01:00:00,791][00453] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440 [2023-03-27 01:00:00,794][00453] Avg episode reward: 7.440, avg true_objective: 5.440 [2023-03-27 01:00:00,886][00453] Num frames 600... [2023-03-27 01:00:01,038][00453] Num frames 700... [2023-03-27 01:00:01,193][00453] Num frames 800... [2023-03-27 01:00:01,346][00453] Num frames 900... [2023-03-27 01:00:01,539][00453] Avg episode rewards: #0: 7.460, true rewards: #0: 4.960 [2023-03-27 01:00:01,540][00453] Avg episode reward: 7.460, avg true_objective: 4.960 [2023-03-27 01:00:01,556][00453] Num frames 1000... [2023-03-27 01:00:01,726][00453] Num frames 1100... [2023-03-27 01:00:01,891][00453] Num frames 1200... [2023-03-27 01:00:02,054][00453] Num frames 1300... [2023-03-27 01:00:02,217][00453] Num frames 1400... [2023-03-27 01:00:02,376][00453] Num frames 1500... 
[2023-03-27 01:00:02,545][00453] Num frames 1600... [2023-03-27 01:00:02,705][00453] Num frames 1700... [2023-03-27 01:00:02,880][00453] Num frames 1800... [2023-03-27 01:00:03,040][00453] Num frames 1900... [2023-03-27 01:00:03,189][00453] Num frames 2000... [2023-03-27 01:00:03,309][00453] Num frames 2100... [2023-03-27 01:00:03,425][00453] Num frames 2200... [2023-03-27 01:00:03,524][00453] Avg episode rewards: #0: 13.800, true rewards: #0: 7.467 [2023-03-27 01:00:03,525][00453] Avg episode reward: 13.800, avg true_objective: 7.467 [2023-03-27 01:00:03,596][00453] Num frames 2300... [2023-03-27 01:00:03,711][00453] Num frames 2400... [2023-03-27 01:00:03,829][00453] Num frames 2500... [2023-03-27 01:00:03,951][00453] Num frames 2600... [2023-03-27 01:00:04,067][00453] Num frames 2700... [2023-03-27 01:00:04,185][00453] Num frames 2800... [2023-03-27 01:00:04,297][00453] Avg episode rewards: #0: 13.120, true rewards: #0: 7.120 [2023-03-27 01:00:04,298][00453] Avg episode reward: 13.120, avg true_objective: 7.120 [2023-03-27 01:00:04,370][00453] Num frames 2900... [2023-03-27 01:00:04,487][00453] Num frames 3000... [2023-03-27 01:00:04,599][00453] Num frames 3100... [2023-03-27 01:00:04,709][00453] Num frames 3200... [2023-03-27 01:00:04,825][00453] Num frames 3300... [2023-03-27 01:00:04,952][00453] Num frames 3400... [2023-03-27 01:00:05,063][00453] Num frames 3500... [2023-03-27 01:00:05,176][00453] Num frames 3600... [2023-03-27 01:00:05,296][00453] Num frames 3700... [2023-03-27 01:00:05,414][00453] Num frames 3800... [2023-03-27 01:00:05,558][00453] Avg episode rewards: #0: 15.346, true rewards: #0: 7.746 [2023-03-27 01:00:05,562][00453] Avg episode reward: 15.346, avg true_objective: 7.746 [2023-03-27 01:00:05,593][00453] Num frames 3900... [2023-03-27 01:00:05,708][00453] Num frames 4000... [2023-03-27 01:00:05,825][00453] Num frames 4100... [2023-03-27 01:00:05,950][00453] Num frames 4200... [2023-03-27 01:00:06,065][00453] Num frames 4300... [2023-03-27 01:00:06,179][00453] Num frames 4400... [2023-03-27 01:00:06,292][00453] Num frames 4500... [2023-03-27 01:00:06,408][00453] Num frames 4600... [2023-03-27 01:00:06,518][00453] Num frames 4700... [2023-03-27 01:00:06,635][00453] Num frames 4800... [2023-03-27 01:00:06,693][00453] Avg episode rewards: #0: 15.502, true rewards: #0: 8.002 [2023-03-27 01:00:06,695][00453] Avg episode reward: 15.502, avg true_objective: 8.002 [2023-03-27 01:00:06,806][00453] Num frames 4900... [2023-03-27 01:00:06,927][00453] Num frames 5000... [2023-03-27 01:00:07,044][00453] Num frames 5100... [2023-03-27 01:00:07,162][00453] Num frames 5200... [2023-03-27 01:00:07,277][00453] Num frames 5300... [2023-03-27 01:00:07,392][00453] Num frames 5400... [2023-03-27 01:00:07,510][00453] Num frames 5500... [2023-03-27 01:00:07,643][00453] Num frames 5600... [2023-03-27 01:00:07,813][00453] Avg episode rewards: #0: 15.853, true rewards: #0: 8.139 [2023-03-27 01:00:07,815][00453] Avg episode reward: 15.853, avg true_objective: 8.139 [2023-03-27 01:00:07,830][00453] Num frames 5700... [2023-03-27 01:00:07,956][00453] Num frames 5800... [2023-03-27 01:00:08,065][00453] Num frames 5900... [2023-03-27 01:00:08,180][00453] Num frames 6000... [2023-03-27 01:00:08,303][00453] Num frames 6100... [2023-03-27 01:00:08,436][00453] Num frames 6200... [2023-03-27 01:00:08,566][00453] Num frames 6300... [2023-03-27 01:00:08,676][00453] Num frames 6400... [2023-03-27 01:00:08,794][00453] Num frames 6500... 
[2023-03-27 01:00:08,919][00453] Avg episode rewards: #0: 15.951, true rewards: #0: 8.201 [2023-03-27 01:00:08,920][00453] Avg episode reward: 15.951, avg true_objective: 8.201 [2023-03-27 01:00:08,974][00453] Num frames 6600... [2023-03-27 01:00:09,086][00453] Num frames 6700... [2023-03-27 01:00:09,203][00453] Num frames 6800... [2023-03-27 01:00:09,319][00453] Num frames 6900... [2023-03-27 01:00:09,437][00453] Num frames 7000... [2023-03-27 01:00:09,572][00453] Avg episode rewards: #0: 15.526, true rewards: #0: 7.859 [2023-03-27 01:00:09,575][00453] Avg episode reward: 15.526, avg true_objective: 7.859 [2023-03-27 01:00:09,606][00453] Num frames 7100... [2023-03-27 01:00:09,725][00453] Num frames 7200... [2023-03-27 01:00:09,843][00453] Num frames 7300... [2023-03-27 01:00:09,978][00453] Num frames 7400... [2023-03-27 01:00:10,092][00453] Num frames 7500... [2023-03-27 01:00:10,206][00453] Num frames 7600... [2023-03-27 01:00:10,329][00453] Num frames 7700... [2023-03-27 01:00:10,474][00453] Avg episode rewards: #0: 15.177, true rewards: #0: 7.777 [2023-03-27 01:00:10,476][00453] Avg episode reward: 15.177, avg true_objective: 7.777 [2023-03-27 01:00:54,565][00453] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-27 01:02:33,335][00453] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-27 01:02:33,337][00453] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-27 01:02:33,339][00453] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-27 01:02:33,342][00453] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-27 01:02:33,344][00453] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-27 01:02:33,347][00453] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-27 01:02:33,348][00453] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-03-27 01:02:33,349][00453] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-27 01:02:33,351][00453] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-03-27 01:02:33,352][00453] Adding new argument 'hf_repository'='SebastianS/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-03-27 01:02:33,353][00453] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-27 01:02:33,355][00453] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-27 01:02:33,356][00453] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-27 01:02:33,357][00453] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-03-27 01:02:33,359][00453] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-27 01:02:33,384][00453] RunningMeanStd input shape: (3, 72, 128) [2023-03-27 01:02:33,386][00453] RunningMeanStd input shape: (1,) [2023-03-27 01:02:33,400][00453] ConvEncoder: input_channels=3 [2023-03-27 01:02:33,435][00453] Conv encoder output size: 512 [2023-03-27 01:02:33,436][00453] Policy head output size: 512 [2023-03-27 01:02:33,462][00453] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-27 01:02:33,929][00453] Num frames 100... 
[2023-03-27 01:02:34,041][00453] Num frames 200... [2023-03-27 01:02:34,155][00453] Num frames 300... [2023-03-27 01:02:34,271][00453] Num frames 400... [2023-03-27 01:02:34,393][00453] Num frames 500... [2023-03-27 01:02:34,514][00453] Num frames 600... [2023-03-27 01:02:34,670][00453] Num frames 700... [2023-03-27 01:02:34,825][00453] Num frames 800... [2023-03-27 01:02:34,989][00453] Num frames 900... [2023-03-27 01:02:35,156][00453] Num frames 1000... [2023-03-27 01:02:35,316][00453] Num frames 1100... [2023-03-27 01:02:35,467][00453] Num frames 1200... [2023-03-27 01:02:35,616][00453] Avg episode rewards: #0: 25.480, true rewards: #0: 12.480 [2023-03-27 01:02:35,617][00453] Avg episode reward: 25.480, avg true_objective: 12.480 [2023-03-27 01:02:35,705][00453] Num frames 1300... [2023-03-27 01:02:35,864][00453] Num frames 1400... [2023-03-27 01:02:36,019][00453] Num frames 1500... [2023-03-27 01:02:36,172][00453] Num frames 1600... [2023-03-27 01:02:36,337][00453] Num frames 1700... [2023-03-27 01:02:36,496][00453] Num frames 1800... [2023-03-27 01:02:36,661][00453] Num frames 1900... [2023-03-27 01:02:36,822][00453] Num frames 2000... [2023-03-27 01:02:37,009][00453] Avg episode rewards: #0: 23.400, true rewards: #0: 10.400 [2023-03-27 01:02:37,011][00453] Avg episode reward: 23.400, avg true_objective: 10.400 [2023-03-27 01:02:37,051][00453] Num frames 2100... [2023-03-27 01:02:37,206][00453] Num frames 2200... [2023-03-27 01:02:37,365][00453] Num frames 2300... [2023-03-27 01:02:37,525][00453] Num frames 2400... [2023-03-27 01:02:37,701][00453] Num frames 2500... [2023-03-27 01:02:37,866][00453] Num frames 2600... [2023-03-27 01:02:38,034][00453] Num frames 2700... [2023-03-27 01:02:38,177][00453] Num frames 2800... [2023-03-27 01:02:38,249][00453] Avg episode rewards: #0: 19.710, true rewards: #0: 9.377 [2023-03-27 01:02:38,250][00453] Avg episode reward: 19.710, avg true_objective: 9.377 [2023-03-27 01:02:38,354][00453] Num frames 2900... [2023-03-27 01:02:38,466][00453] Num frames 3000... [2023-03-27 01:02:38,582][00453] Num frames 3100... [2023-03-27 01:02:38,704][00453] Num frames 3200... [2023-03-27 01:02:38,831][00453] Num frames 3300... [2023-03-27 01:02:38,954][00453] Num frames 3400... [2023-03-27 01:02:39,071][00453] Num frames 3500... [2023-03-27 01:02:39,190][00453] Num frames 3600... [2023-03-27 01:02:39,308][00453] Num frames 3700... [2023-03-27 01:02:39,423][00453] Num frames 3800... [2023-03-27 01:02:39,538][00453] Num frames 3900... [2023-03-27 01:02:39,655][00453] Num frames 4000... [2023-03-27 01:02:39,781][00453] Num frames 4100... [2023-03-27 01:02:39,885][00453] Avg episode rewards: #0: 21.840, true rewards: #0: 10.340 [2023-03-27 01:02:39,887][00453] Avg episode reward: 21.840, avg true_objective: 10.340 [2023-03-27 01:02:39,963][00453] Num frames 4200... [2023-03-27 01:02:40,078][00453] Num frames 4300... [2023-03-27 01:02:40,191][00453] Num frames 4400... [2023-03-27 01:02:40,301][00453] Num frames 4500... [2023-03-27 01:02:40,413][00453] Num frames 4600... [2023-03-27 01:02:40,563][00453] Avg episode rewards: #0: 18.960, true rewards: #0: 9.360 [2023-03-27 01:02:40,565][00453] Avg episode reward: 18.960, avg true_objective: 9.360 [2023-03-27 01:02:40,591][00453] Num frames 4700... [2023-03-27 01:02:40,712][00453] Num frames 4800... [2023-03-27 01:02:40,831][00453] Num frames 4900... [2023-03-27 01:02:40,953][00453] Num frames 5000... [2023-03-27 01:02:41,064][00453] Num frames 5100... [2023-03-27 01:02:41,179][00453] Num frames 5200... 
[2023-03-27 01:02:41,290][00453] Num frames 5300... [2023-03-27 01:02:41,397][00453] Num frames 5400... [2023-03-27 01:02:41,512][00453] Num frames 5500... [2023-03-27 01:02:41,634][00453] Num frames 5600... [2023-03-27 01:02:41,733][00453] Avg episode rewards: #0: 19.567, true rewards: #0: 9.400 [2023-03-27 01:02:41,735][00453] Avg episode reward: 19.567, avg true_objective: 9.400 [2023-03-27 01:02:41,809][00453] Num frames 5700... [2023-03-27 01:02:41,928][00453] Num frames 5800... [2023-03-27 01:02:42,071][00453] Num frames 5900... [2023-03-27 01:02:42,191][00453] Num frames 6000... [2023-03-27 01:02:42,314][00453] Num frames 6100... [2023-03-27 01:02:42,436][00453] Num frames 6200... [2023-03-27 01:02:42,557][00453] Num frames 6300... [2023-03-27 01:02:42,680][00453] Num frames 6400... [2023-03-27 01:02:42,797][00453] Num frames 6500... [2023-03-27 01:02:42,925][00453] Num frames 6600... [2023-03-27 01:02:43,044][00453] Num frames 6700... [2023-03-27 01:02:43,154][00453] Num frames 6800... [2023-03-27 01:02:43,319][00453] Avg episode rewards: #0: 21.420, true rewards: #0: 9.849 [2023-03-27 01:02:43,321][00453] Avg episode reward: 21.420, avg true_objective: 9.849 [2023-03-27 01:02:43,332][00453] Num frames 6900... [2023-03-27 01:02:43,442][00453] Num frames 7000... [2023-03-27 01:02:43,555][00453] Num frames 7100... [2023-03-27 01:02:43,675][00453] Num frames 7200... [2023-03-27 01:02:43,800][00453] Num frames 7300... [2023-03-27 01:02:43,922][00453] Num frames 7400... [2023-03-27 01:02:44,047][00453] Num frames 7500... [2023-03-27 01:02:44,169][00453] Num frames 7600... [2023-03-27 01:02:44,285][00453] Num frames 7700... [2023-03-27 01:02:44,371][00453] Avg episode rewards: #0: 20.908, true rewards: #0: 9.657 [2023-03-27 01:02:44,372][00453] Avg episode reward: 20.908, avg true_objective: 9.657 [2023-03-27 01:02:44,456][00453] Num frames 7800... [2023-03-27 01:02:44,575][00453] Num frames 7900... [2023-03-27 01:02:44,706][00453] Num frames 8000... [2023-03-27 01:02:44,829][00453] Num frames 8100... [2023-03-27 01:02:44,950][00453] Num frames 8200... [2023-03-27 01:02:45,069][00453] Num frames 8300... [2023-03-27 01:02:45,189][00453] Num frames 8400... [2023-03-27 01:02:45,301][00453] Num frames 8500... [2023-03-27 01:02:45,421][00453] Num frames 8600... [2023-03-27 01:02:45,563][00453] Avg episode rewards: #0: 20.972, true rewards: #0: 9.639 [2023-03-27 01:02:45,566][00453] Avg episode reward: 20.972, avg true_objective: 9.639 [2023-03-27 01:02:45,596][00453] Num frames 8700... [2023-03-27 01:02:45,718][00453] Num frames 8800... [2023-03-27 01:02:45,840][00453] Num frames 8900... [2023-03-27 01:02:45,959][00453] Num frames 9000... [2023-03-27 01:02:46,093][00453] Num frames 9100... [2023-03-27 01:02:46,210][00453] Num frames 9200... [2023-03-27 01:02:46,330][00453] Num frames 9300... [2023-03-27 01:02:46,452][00453] Num frames 9400... [2023-03-27 01:02:46,574][00453] Num frames 9500... [2023-03-27 01:02:46,699][00453] Num frames 9600... [2023-03-27 01:02:46,821][00453] Num frames 9700... [2023-03-27 01:02:46,958][00453] Num frames 9800... [2023-03-27 01:02:47,075][00453] Num frames 9900... [2023-03-27 01:02:47,188][00453] Num frames 10000... [2023-03-27 01:02:47,305][00453] Num frames 10100... [2023-03-27 01:02:47,415][00453] Num frames 10200... [2023-03-27 01:02:47,525][00453] Num frames 10300... 
[2023-03-27 01:02:47,662][00453] Avg episode rewards: #0: 22.571, true rewards: #0: 10.371 [2023-03-27 01:02:47,663][00453] Avg episode reward: 22.571, avg true_objective: 10.371 [2023-03-27 01:03:48,135][00453] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
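Both evaluation passes above use the same enjoy-style entry point against the final checkpoint_000000978_4005888.pth: the first saved a local replay.mp4, the second re-ran with max_num_frames=100000, push_to_hub=True, and an hf_repository so the video and model land on the Hub. A hypothetical re-invocation of that second pass; the module path and the env name (inferred from the repository name) are assumptions, while the flags mirror the "Adding new argument" lines echoed in the log:

    # Sketch only: reproduce the push-to-hub evaluation run recorded above.
    import subprocess, sys

    subprocess.run(
        [
            sys.executable, "-m", "sf_examples.vizdoom.enjoy_vizdoom",  # assumed entry point
            "--env=doom_health_gathering_supreme",  # inferred from the repo name
            "--train_dir=/content/train_dir",
            "--experiment=default_experiment",
            "--num_workers=1",
            "--no_render",
            "--save_video",
            "--max_num_episodes=10",
            "--max_num_frames=100000",
            "--push_to_hub",
            "--hf_repository=SebastianS/rl_course_vizdoom_health_gathering_supreme",
        ],
        check=True,
    )

Note that the two passes disagree noticeably (avg true objective ~7.8 vs ~10.4) despite loading the identical checkpoint; with only ten stochastic episodes per pass (eval_deterministic=False), that spread is plausibly sampling noise.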