[2023-12-28 17:31:27,010][00255] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-12-28 17:31:27,016][00255] Rollout worker 0 uses device cpu
[2023-12-28 17:31:27,018][00255] Rollout worker 1 uses device cpu
[2023-12-28 17:31:27,020][00255] Rollout worker 2 uses device cpu
[2023-12-28 17:31:27,021][00255] Rollout worker 3 uses device cpu
[2023-12-28 17:31:27,024][00255] Rollout worker 4 uses device cpu
[2023-12-28 17:31:27,027][00255] Rollout worker 5 uses device cpu
[2023-12-28 17:31:27,037][00255] Rollout worker 6 uses device cpu
[2023-12-28 17:31:27,049][00255] Rollout worker 7 uses device cpu
[2023-12-28 17:31:27,253][00255] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-28 17:31:27,259][00255] InferenceWorker_p0-w0: min num requests: 2
[2023-12-28 17:31:27,307][00255] Starting all processes...
[2023-12-28 17:31:27,311][00255] Starting process learner_proc0
[2023-12-28 17:31:27,412][00255] Starting all processes...
[2023-12-28 17:31:27,552][00255] Starting process inference_proc0-0
[2023-12-28 17:31:27,553][00255] Starting process rollout_proc0
[2023-12-28 17:31:27,556][00255] Starting process rollout_proc1
[2023-12-28 17:31:27,556][00255] Starting process rollout_proc2
[2023-12-28 17:31:27,557][00255] Starting process rollout_proc3
[2023-12-28 17:31:27,559][00255] Starting process rollout_proc4
[2023-12-28 17:31:27,559][00255] Starting process rollout_proc5
[2023-12-28 17:31:27,559][00255] Starting process rollout_proc6
[2023-12-28 17:31:27,559][00255] Starting process rollout_proc7
[2023-12-28 17:31:47,869][00812] Worker 3 uses CPU cores [1]
[2023-12-28 17:31:47,865][00795] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-28 17:31:47,870][00795] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-12-28 17:31:47,974][00795] Num visible devices: 1
[2023-12-28 17:31:48,003][00795] Starting seed is not provided
[2023-12-28 17:31:48,003][00795] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-28 17:31:48,004][00795] Initializing actor-critic model on device cuda:0
[2023-12-28 17:31:48,005][00795] RunningMeanStd input shape: (3, 72, 128)
[2023-12-28 17:31:48,007][00255] Heartbeat connected on Batcher_0
[2023-12-28 17:31:48,013][00795] RunningMeanStd input shape: (1,)
[2023-12-28 17:31:48,032][00255] Heartbeat connected on RolloutWorker_w3
[2023-12-28 17:31:48,157][00795] ConvEncoder: input_channels=3
[2023-12-28 17:31:48,213][00816] Worker 7 uses CPU cores [1]
[2023-12-28 17:31:48,271][00813] Worker 4 uses CPU cores [0]
[2023-12-28 17:31:48,318][00255] Heartbeat connected on RolloutWorker_w7
[2023-12-28 17:31:48,356][00814] Worker 5 uses CPU cores [1]
[2023-12-28 17:31:48,382][00255] Heartbeat connected on RolloutWorker_w5
[2023-12-28 17:31:48,494][00255] Heartbeat connected on RolloutWorker_w4
[2023-12-28 17:31:48,636][00810] Worker 0 uses CPU cores [0]
[2023-12-28 17:31:48,673][00811] Worker 2 uses CPU cores [0]
[2023-12-28 17:31:48,695][00808] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-28 17:31:48,697][00808] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-12-28 17:31:48,728][00255] Heartbeat connected on RolloutWorker_w0
[2023-12-28 17:31:48,750][00808] Num visible devices: 1
[2023-12-28 17:31:48,765][00255] Heartbeat connected on RolloutWorker_w2
[2023-12-28 17:31:48,781][00255] Heartbeat connected on InferenceWorker_p0-w0
[2023-12-28 17:31:48,826][00815] Worker 6 uses CPU cores [0]
[2023-12-28 17:31:48,846][00809] Worker 1 uses CPU cores [1]
[2023-12-28 17:31:48,861][00255] Heartbeat connected on RolloutWorker_w6
[2023-12-28 17:31:48,882][00255] Heartbeat connected on RolloutWorker_w1
[2023-12-28 17:31:48,904][00795] Conv encoder output size: 512
[2023-12-28 17:31:48,904][00795] Policy head output size: 512
[2023-12-28 17:31:48,964][00795] Created Actor Critic model with architecture:
[2023-12-28 17:31:48,964][00795] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-12-28 17:31:49,409][00795] Using optimizer
[2023-12-28 17:31:51,109][00795] No checkpoints found
[2023-12-28 17:31:51,109][00795] Did not load from checkpoint, starting from scratch!
[2023-12-28 17:31:51,109][00795] Initialized policy 0 weights for model version 0
[2023-12-28 17:31:51,114][00795] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-28 17:31:51,122][00795] LearnerWorker_p0 finished initialization!
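The encoder sizes in the dump above (input `(3, 72, 128)`, three Conv2d/ELU pairs, then a Linear to the logged "Conv encoder output size: 512") can be checked with a little arithmetic. The kernel/stride schedule below is an assumption (Sample Factory's default VizDoom encoder uses a similar three-layer 8/4, 4/2, 3/2 stack; the log does not print the exact values):

```python
def conv2d_out(size, kernel, stride):
    """Output spatial size of an unpadded Conv2d along one dimension."""
    return (size - kernel) // stride + 1

# Assumed (kernel, stride) per layer; channel widths 32/64/128 are also assumptions.
layers = [(8, 4), (4, 2), (3, 2)]
h, w = 72, 128  # from the log: frames resized to (128, 72), i.e. H x W = 72 x 128
channels = 3
for out_ch, (k, s) in zip((32, 64, 128), layers):
    h, w = conv2d_out(h, k, s), conv2d_out(w, k, s)
    channels = out_ch

# Flattened conv features; a Linear(flat, 512) + ELU (the mlp_layers block in the
# dump) would then produce the 512-dim encoder output reported in the log.
flat = channels * h * w
print(h, w, flat)
```

Under these assumed kernels the conv head ends at a 3x6x128 feature map, so the MLP's input would be 2304 features; only the 512 output size is confirmed by the log.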
[2023-12-28 17:31:51,126][00255] Heartbeat connected on LearnerWorker_p0
[2023-12-28 17:31:51,359][00808] RunningMeanStd input shape: (3, 72, 128)
[2023-12-28 17:31:51,360][00808] RunningMeanStd input shape: (1,)
[2023-12-28 17:31:51,382][00808] ConvEncoder: input_channels=3
[2023-12-28 17:31:51,546][00808] Conv encoder output size: 512
[2023-12-28 17:31:51,546][00808] Policy head output size: 512
[2023-12-28 17:31:51,645][00255] Inference worker 0-0 is ready!
[2023-12-28 17:31:51,647][00255] All inference workers are ready! Signal rollout workers to start!
[2023-12-28 17:31:51,925][00814] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:51,928][00816] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:51,929][00812] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:51,935][00809] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:51,957][00810] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:51,960][00813] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:51,967][00811] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:51,984][00815] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:31:53,466][00809] Decorrelating experience for 0 frames...
[2023-12-28 17:31:53,465][00812] Decorrelating experience for 0 frames...
[2023-12-28 17:31:53,468][00814] Decorrelating experience for 0 frames...
[2023-12-28 17:31:53,596][00811] Decorrelating experience for 0 frames...
[2023-12-28 17:31:53,598][00813] Decorrelating experience for 0 frames...
[2023-12-28 17:31:53,605][00810] Decorrelating experience for 0 frames...
[2023-12-28 17:31:53,935][00255] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-12-28 17:31:54,279][00812] Decorrelating experience for 32 frames...
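The "Decorrelating experience for N frames..." lines show each rollout worker advancing its environments by a different amount before collection starts, so trajectories across workers are out of phase. In this log every worker steps through the same schedule in 32-frame increments up to 96; a minimal sketch of that schedule (the increment and cap are read off the log, not from any config):

```python
def decorrelation_schedule(max_frames=96, step=32):
    """Frame offsets each worker logs while decorrelating (as seen in this run)."""
    return list(range(0, max_frames + 1, step))

print(decorrelation_schedule())
```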
[2023-12-28 17:31:54,284][00814] Decorrelating experience for 32 frames...
[2023-12-28 17:31:54,316][00815] Decorrelating experience for 0 frames...
[2023-12-28 17:31:54,376][00811] Decorrelating experience for 32 frames...
[2023-12-28 17:31:55,194][00815] Decorrelating experience for 32 frames...
[2023-12-28 17:31:55,278][00813] Decorrelating experience for 32 frames...
[2023-12-28 17:31:55,716][00809] Decorrelating experience for 32 frames...
[2023-12-28 17:31:55,731][00816] Decorrelating experience for 0 frames...
[2023-12-28 17:31:56,047][00812] Decorrelating experience for 64 frames...
[2023-12-28 17:31:56,431][00815] Decorrelating experience for 64 frames...
[2023-12-28 17:31:56,446][00814] Decorrelating experience for 64 frames...
[2023-12-28 17:31:56,562][00813] Decorrelating experience for 64 frames...
[2023-12-28 17:31:56,952][00811] Decorrelating experience for 64 frames...
[2023-12-28 17:31:57,240][00812] Decorrelating experience for 96 frames...
[2023-12-28 17:31:57,770][00815] Decorrelating experience for 96 frames...
[2023-12-28 17:31:57,907][00813] Decorrelating experience for 96 frames...
[2023-12-28 17:31:57,996][00816] Decorrelating experience for 32 frames...
[2023-12-28 17:31:58,173][00811] Decorrelating experience for 96 frames...
[2023-12-28 17:31:58,312][00809] Decorrelating experience for 64 frames...
[2023-12-28 17:31:58,593][00814] Decorrelating experience for 96 frames...
[2023-12-28 17:31:58,935][00255] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-12-28 17:31:59,116][00809] Decorrelating experience for 96 frames...
[2023-12-28 17:31:59,344][00816] Decorrelating experience for 64 frames...
[2023-12-28 17:32:01,836][00810] Decorrelating experience for 32 frames...
[2023-12-28 17:32:02,567][00816] Decorrelating experience for 96 frames...
[2023-12-28 17:32:03,935][00255] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 214.2. Samples: 2142. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-12-28 17:32:03,953][00255] Avg episode reward: [(0, '1.945')]
[2023-12-28 17:32:05,840][00795] Signal inference workers to stop experience collection...
[2023-12-28 17:32:05,879][00808] InferenceWorker_p0-w0: stopping experience collection
[2023-12-28 17:32:06,217][00810] Decorrelating experience for 64 frames...
[2023-12-28 17:32:07,605][00810] Decorrelating experience for 96 frames...
[2023-12-28 17:32:08,521][00795] Signal inference workers to resume experience collection...
[2023-12-28 17:32:08,522][00808] InferenceWorker_p0-w0: resuming experience collection
[2023-12-28 17:32:08,937][00255] Fps is (10 sec: 409.5, 60 sec: 273.0, 300 sec: 273.0). Total num frames: 4096. Throughput: 0: 182.9. Samples: 2744. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-12-28 17:32:08,939][00255] Avg episode reward: [(0, '2.653')]
[2023-12-28 17:32:13,935][00255] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 228.9. Samples: 4578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:32:13,937][00255] Avg episode reward: [(0, '3.399')]
[2023-12-28 17:32:18,866][00808] Updated weights for policy 0, policy_version 10 (0.0884)
[2023-12-28 17:32:18,935][00255] Fps is (10 sec: 3687.1, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 425.3. Samples: 10632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:32:18,941][00255] Avg episode reward: [(0, '3.953')]
[2023-12-28 17:32:23,935][00255] Fps is (10 sec: 3686.4, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 459.8. Samples: 13794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:32:23,938][00255] Avg episode reward: [(0, '4.529')]
[2023-12-28 17:32:28,935][00255] Fps is (10 sec: 2867.2, 60 sec: 1989.5, 300 sec: 1989.5). Total num frames: 69632. Throughput: 0: 514.6. Samples: 18010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:32:28,941][00255] Avg episode reward: [(0, '4.510')]
[2023-12-28 17:32:33,935][00255] Fps is (10 sec: 2048.0, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 77824. Throughput: 0: 518.0. Samples: 20722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:32:33,938][00255] Avg episode reward: [(0, '4.379')]
[2023-12-28 17:32:34,542][00808] Updated weights for policy 0, policy_version 20 (0.0024)
[2023-12-28 17:32:38,935][00255] Fps is (10 sec: 2457.6, 60 sec: 2093.5, 300 sec: 2093.5). Total num frames: 94208. Throughput: 0: 496.4. Samples: 22340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:32:38,940][00255] Avg episode reward: [(0, '4.095')]
[2023-12-28 17:32:43,935][00255] Fps is (10 sec: 3276.8, 60 sec: 2211.8, 300 sec: 2211.8). Total num frames: 110592. Throughput: 0: 617.1. Samples: 27768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:32:43,941][00255] Avg episode reward: [(0, '4.110')]
[2023-12-28 17:32:43,943][00795] Saving new best policy, reward=4.110!
[2023-12-28 17:32:48,017][00808] Updated weights for policy 0, policy_version 30 (0.0021)
[2023-12-28 17:32:48,935][00255] Fps is (10 sec: 2867.2, 60 sec: 2234.2, 300 sec: 2234.2). Total num frames: 122880. Throughput: 0: 655.2. Samples: 31624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:32:48,941][00255] Avg episode reward: [(0, '4.277')]
[2023-12-28 17:32:48,964][00795] Saving new best policy, reward=4.277!
[2023-12-28 17:32:53,935][00255] Fps is (10 sec: 2457.6, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 135168. Throughput: 0: 674.7. Samples: 33102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:32:53,942][00255] Avg episode reward: [(0, '4.432')]
[2023-12-28 17:32:53,945][00795] Saving new best policy, reward=4.432!
[2023-12-28 17:32:58,935][00255] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2331.6). Total num frames: 151552. Throughput: 0: 737.8. Samples: 37778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:32:58,941][00255] Avg episode reward: [(0, '4.500')]
[2023-12-28 17:32:58,950][00795] Saving new best policy, reward=4.500!
[2023-12-28 17:33:01,441][00808] Updated weights for policy 0, policy_version 40 (0.0033)
[2023-12-28 17:33:03,935][00255] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2399.1). Total num frames: 167936. Throughput: 0: 726.4. Samples: 43322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:33:03,949][00255] Avg episode reward: [(0, '4.401')]
[2023-12-28 17:33:08,937][00255] Fps is (10 sec: 2866.7, 60 sec: 2935.5, 300 sec: 2402.9). Total num frames: 180224. Throughput: 0: 691.0. Samples: 44890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:33:08,946][00255] Avg episode reward: [(0, '4.304')]
[2023-12-28 17:33:13,935][00255] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2406.4). Total num frames: 192512. Throughput: 0: 679.8. Samples: 48602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:33:13,938][00255] Avg episode reward: [(0, '4.316')]
[2023-12-28 17:33:16,027][00808] Updated weights for policy 0, policy_version 50 (0.0035)
[2023-12-28 17:33:18,935][00255] Fps is (10 sec: 3277.3, 60 sec: 2867.2, 300 sec: 2505.8). Total num frames: 212992. Throughput: 0: 744.7. Samples: 54234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:33:18,941][00255] Avg episode reward: [(0, '4.347')]
[2023-12-28 17:33:18,953][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000052_212992.pth...
[2023-12-28 17:33:23,937][00255] Fps is (10 sec: 3685.7, 60 sec: 2867.1, 300 sec: 2548.6). Total num frames: 229376. Throughput: 0: 771.2. Samples: 57046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:33:23,939][00255] Avg episode reward: [(0, '4.473')]
[2023-12-28 17:33:28,935][00255] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2543.8). Total num frames: 241664. Throughput: 0: 724.8. Samples: 60384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:33:28,943][00255] Avg episode reward: [(0, '4.488')]
[2023-12-28 17:33:30,045][00808] Updated weights for policy 0, policy_version 60 (0.0018)
[2023-12-28 17:33:33,935][00255] Fps is (10 sec: 2458.1, 60 sec: 2935.5, 300 sec: 2539.5). Total num frames: 253952. Throughput: 0: 724.4. Samples: 64224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:33:33,937][00255] Avg episode reward: [(0, '4.568')]
[2023-12-28 17:33:33,943][00795] Saving new best policy, reward=4.568!
[2023-12-28 17:33:38,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2613.6). Total num frames: 274432. Throughput: 0: 762.2. Samples: 67402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:33:38,937][00255] Avg episode reward: [(0, '4.420')]
[2023-12-28 17:33:41,037][00808] Updated weights for policy 0, policy_version 70 (0.0022)
[2023-12-28 17:33:43,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3072.0, 300 sec: 2681.0). Total num frames: 294912. Throughput: 0: 802.7. Samples: 73900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:33:43,938][00255] Avg episode reward: [(0, '4.400')]
[2023-12-28 17:33:48,936][00255] Fps is (10 sec: 3686.0, 60 sec: 3140.2, 300 sec: 2706.9). Total num frames: 311296. Throughput: 0: 776.2. Samples: 78254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:33:48,944][00255] Avg episode reward: [(0, '4.423')]
[2023-12-28 17:33:53,935][00255] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 2696.5). Total num frames: 323584. Throughput: 0: 786.0. Samples: 80258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:33:53,943][00255] Avg episode reward: [(0, '4.367')]
[2023-12-28 17:33:54,498][00808] Updated weights for policy 0, policy_version 80 (0.0018)
[2023-12-28 17:33:58,935][00255] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 2752.5). Total num frames: 344064. Throughput: 0: 826.8. Samples: 85810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:33:58,938][00255] Avg episode reward: [(0, '4.408')]
[2023-12-28 17:34:03,935][00255] Fps is (10 sec: 4096.1, 60 sec: 3276.8, 300 sec: 2804.2). Total num frames: 364544. Throughput: 0: 837.5. Samples: 91922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:34:03,941][00255] Avg episode reward: [(0, '4.427')]
[2023-12-28 17:34:04,877][00808] Updated weights for policy 0, policy_version 90 (0.0021)
[2023-12-28 17:34:08,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 2791.3). Total num frames: 376832. Throughput: 0: 817.9. Samples: 93852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:34:08,937][00255] Avg episode reward: [(0, '4.482')]
[2023-12-28 17:34:13,936][00255] Fps is (10 sec: 2457.3, 60 sec: 3276.7, 300 sec: 2779.4). Total num frames: 389120. Throughput: 0: 831.8. Samples: 97814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:34:13,939][00255] Avg episode reward: [(0, '4.553')]
[2023-12-28 17:34:18,006][00808] Updated weights for policy 0, policy_version 100 (0.0014)
[2023-12-28 17:34:18,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2824.8). Total num frames: 409600. Throughput: 0: 874.2. Samples: 103564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:34:18,938][00255] Avg episode reward: [(0, '4.406')]
[2023-12-28 17:34:23,935][00255] Fps is (10 sec: 4096.5, 60 sec: 3345.2, 300 sec: 2867.2). Total num frames: 430080. Throughput: 0: 871.8. Samples: 106632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:34:23,937][00255] Avg episode reward: [(0, '4.315')]
[2023-12-28 17:34:28,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2854.0). Total num frames: 442368. Throughput: 0: 830.9. Samples: 111292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:34:28,942][00255] Avg episode reward: [(0, '4.521')]
[2023-12-28 17:34:30,619][00808] Updated weights for policy 0, policy_version 110 (0.0028)
[2023-12-28 17:34:33,936][00255] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 2867.2). Total num frames: 458752. Throughput: 0: 824.9. Samples: 115374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:34:33,940][00255] Avg episode reward: [(0, '4.603')]
[2023-12-28 17:34:33,943][00795] Saving new best policy, reward=4.603!
[2023-12-28 17:34:38,935][00255] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 2879.6). Total num frames: 475136. Throughput: 0: 839.0. Samples: 118014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:34:38,937][00255] Avg episode reward: [(0, '4.530')]
[2023-12-28 17:34:42,564][00808] Updated weights for policy 0, policy_version 120 (0.0053)
[2023-12-28 17:34:43,935][00255] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 2915.4). Total num frames: 495616. Throughput: 0: 838.4. Samples: 123540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:34:43,940][00255] Avg episode reward: [(0, '4.450')]
[2023-12-28 17:34:48,935][00255] Fps is (10 sec: 3276.9, 60 sec: 3276.9, 300 sec: 2902.3). Total num frames: 507904. Throughput: 0: 800.4. Samples: 127938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:34:48,938][00255] Avg episode reward: [(0, '4.428')]
[2023-12-28 17:34:53,938][00255] Fps is (10 sec: 2456.9, 60 sec: 3276.7, 300 sec: 2889.9). Total num frames: 520192. Throughput: 0: 797.8. Samples: 129756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:34:53,944][00255] Avg episode reward: [(0, '4.577')]
[2023-12-28 17:34:57,377][00808] Updated weights for policy 0, policy_version 130 (0.0025)
[2023-12-28 17:34:58,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2900.4). Total num frames: 536576. Throughput: 0: 802.6. Samples: 133930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:34:58,942][00255] Avg episode reward: [(0, '4.578')]
[2023-12-28 17:35:03,935][00255] Fps is (10 sec: 3687.5, 60 sec: 3208.5, 300 sec: 2931.9). Total num frames: 557056. Throughput: 0: 803.8. Samples: 139736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:35:03,941][00255] Avg episode reward: [(0, '4.493')]
[2023-12-28 17:35:08,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2919.7). Total num frames: 569344. Throughput: 0: 788.4. Samples: 142108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:35:08,939][00255] Avg episode reward: [(0, '4.485')]
[2023-12-28 17:35:09,565][00808] Updated weights for policy 0, policy_version 140 (0.0018)
[2023-12-28 17:35:13,940][00255] Fps is (10 sec: 2456.4, 60 sec: 3208.3, 300 sec: 2908.1). Total num frames: 581632. Throughput: 0: 767.9. Samples: 145850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:35:13,943][00255] Avg episode reward: [(0, '4.533')]
[2023-12-28 17:35:18,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2937.1). Total num frames: 602112. Throughput: 0: 788.8. Samples: 150870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:35:18,941][00255] Avg episode reward: [(0, '4.564')]
[2023-12-28 17:35:18,953][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000147_602112.pth...
[2023-12-28 17:35:21,914][00808] Updated weights for policy 0, policy_version 150 (0.0013)
[2023-12-28 17:35:23,935][00255] Fps is (10 sec: 4098.0, 60 sec: 3208.5, 300 sec: 2964.7). Total num frames: 622592. Throughput: 0: 799.7. Samples: 154002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:35:23,937][00255] Avg episode reward: [(0, '4.625')]
[2023-12-28 17:35:23,940][00795] Saving new best policy, reward=4.625!
[2023-12-28 17:35:28,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2972.0). Total num frames: 638976. Throughput: 0: 803.5. Samples: 159698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:35:28,938][00255] Avg episode reward: [(0, '4.690')]
[2023-12-28 17:35:28,953][00795] Saving new best policy, reward=4.690!
[2023-12-28 17:35:33,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2960.3). Total num frames: 651264. Throughput: 0: 791.2. Samples: 163542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:35:33,937][00255] Avg episode reward: [(0, '4.687')]
[2023-12-28 17:35:34,793][00808] Updated weights for policy 0, policy_version 160 (0.0018)
[2023-12-28 17:35:38,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2967.3). Total num frames: 667648. Throughput: 0: 796.5. Samples: 165594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:35:38,938][00255] Avg episode reward: [(0, '4.569')]
[2023-12-28 17:35:43,936][00255] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 2991.9). Total num frames: 688128. Throughput: 0: 843.3. Samples: 171880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:35:43,943][00255] Avg episode reward: [(0, '4.509')]
[2023-12-28 17:35:45,898][00808] Updated weights for policy 0, policy_version 170 (0.0015)
[2023-12-28 17:35:48,939][00255] Fps is (10 sec: 3685.0, 60 sec: 3276.6, 300 sec: 2997.9). Total num frames: 704512. Throughput: 0: 822.5. Samples: 176752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:35:48,948][00255] Avg episode reward: [(0, '4.452')]
[2023-12-28 17:35:53,938][00255] Fps is (10 sec: 2866.6, 60 sec: 3276.8, 300 sec: 2986.6). Total num frames: 716800. Throughput: 0: 812.0. Samples: 178650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:35:53,945][00255] Avg episode reward: [(0, '4.412')]
[2023-12-28 17:35:58,935][00255] Fps is (10 sec: 2868.3, 60 sec: 3276.8, 300 sec: 2992.6). Total num frames: 733184. Throughput: 0: 819.6. Samples: 182726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:35:58,942][00255] Avg episode reward: [(0, '4.501')]
[2023-12-28 17:35:59,898][00808] Updated weights for policy 0, policy_version 180 (0.0015)
[2023-12-28 17:36:03,935][00255] Fps is (10 sec: 3277.6, 60 sec: 3208.5, 300 sec: 2998.3). Total num frames: 749568. Throughput: 0: 839.0. Samples: 188626. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:36:03,942][00255] Avg episode reward: [(0, '4.831')]
[2023-12-28 17:36:03,969][00795] Saving new best policy, reward=4.831!
[2023-12-28 17:36:08,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3003.7). Total num frames: 765952. Throughput: 0: 833.6. Samples: 191516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:36:08,941][00255] Avg episode reward: [(0, '4.986')]
[2023-12-28 17:36:09,012][00795] Saving new best policy, reward=4.986!
[2023-12-28 17:36:12,340][00808] Updated weights for policy 0, policy_version 190 (0.0014)
[2023-12-28 17:36:13,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3277.1, 300 sec: 2993.2). Total num frames: 778240. Throughput: 0: 787.7. Samples: 195144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:36:13,937][00255] Avg episode reward: [(0, '4.966')]
[2023-12-28 17:36:18,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2998.6). Total num frames: 794624. Throughput: 0: 794.3. Samples: 199286. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:36:18,937][00255] Avg episode reward: [(0, '5.108')]
[2023-12-28 17:36:18,950][00795] Saving new best policy, reward=5.108!
[2023-12-28 17:36:23,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3003.7). Total num frames: 811008. Throughput: 0: 812.1. Samples: 202140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:36:23,938][00255] Avg episode reward: [(0, '5.069')]
[2023-12-28 17:36:24,996][00808] Updated weights for policy 0, policy_version 200 (0.0030)
[2023-12-28 17:36:28,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3023.6). Total num frames: 831488. Throughput: 0: 800.1. Samples: 207884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:36:28,938][00255] Avg episode reward: [(0, '5.128')]
[2023-12-28 17:36:28,947][00795] Saving new best policy, reward=5.128!
[2023-12-28 17:36:33,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3013.5). Total num frames: 843776. Throughput: 0: 775.7. Samples: 211656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:36:33,937][00255] Avg episode reward: [(0, '4.889')]
[2023-12-28 17:36:38,862][00808] Updated weights for policy 0, policy_version 210 (0.0024)
[2023-12-28 17:36:38,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3018.1). Total num frames: 860160. Throughput: 0: 777.0. Samples: 213614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:36:38,938][00255] Avg episode reward: [(0, '5.165')]
[2023-12-28 17:36:38,949][00795] Saving new best policy, reward=5.165!
[2023-12-28 17:36:43,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 3036.7). Total num frames: 880640. Throughput: 0: 815.9. Samples: 219442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:36:43,942][00255] Avg episode reward: [(0, '5.363')]
[2023-12-28 17:36:43,945][00795] Saving new best policy, reward=5.363!
[2023-12-28 17:36:48,937][00255] Fps is (10 sec: 3276.2, 60 sec: 3140.4, 300 sec: 3026.9). Total num frames: 892928. Throughput: 0: 792.0. Samples: 224266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:36:48,944][00255] Avg episode reward: [(0, '5.371')]
[2023-12-28 17:36:48,955][00795] Saving new best policy, reward=5.371!
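The `Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)` lines report frame throughput averaged over three sliding windows; early in training the longer windows are clipped to the elapsed time, which is why the first non-zero report shows 409.5 / 273.0 / 273.0 for 4096 frames about 15 s in. A sketch of that arithmetic (the exact window bookkeeping is an assumption; the real reporter's windows are slightly longer, hence 409.5 rather than exactly 4096/10):

```python
def windowed_fps(samples, window):
    """samples: ordered [(t_seconds, total_frames)]; average FPS over the last
    `window` seconds, clipped to the elapsed time when the run is younger."""
    t_now, f_now = samples[-1]
    t_ref, f_ref = samples[0]  # fall back to the first sample (clipped window)
    for t, f in samples:
        if t_now - t <= window:
            t_ref, f_ref = t, f
            break
    dt = t_now - t_ref
    return (f_now - f_ref) / dt if dt > 0 else float("nan")

# timestamps relative to the first (all-nan) report in the log
samples = [(0.0, 0), (5.0, 0), (10.0, 0), (15.0, 4096)]
print(windowed_fps(samples, 10))  # last 10 s of frames
print(windowed_fps(samples, 60))  # window clipped to the 15 s elapsed
```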
[2023-12-28 17:36:52,018][00808] Updated weights for policy 0, policy_version 220 (0.0013)
[2023-12-28 17:36:53,935][00255] Fps is (10 sec: 2048.0, 60 sec: 3072.1, 300 sec: 3054.6). Total num frames: 901120. Throughput: 0: 758.8. Samples: 225660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:36:53,938][00255] Avg episode reward: [(0, '5.257')]
[2023-12-28 17:36:58,935][00255] Fps is (10 sec: 2048.4, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 913408. Throughput: 0: 745.5. Samples: 228690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:36:58,938][00255] Avg episode reward: [(0, '5.238')]
[2023-12-28 17:37:03,935][00255] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 3124.1). Total num frames: 925696. Throughput: 0: 734.0. Samples: 232314. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:37:03,938][00255] Avg episode reward: [(0, '5.014')]
[2023-12-28 17:37:07,125][00808] Updated weights for policy 0, policy_version 230 (0.0032)
[2023-12-28 17:37:08,936][00255] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 3137.9). Total num frames: 946176. Throughput: 0: 739.2. Samples: 235402. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-12-28 17:37:08,937][00255] Avg episode reward: [(0, '4.976')]
[2023-12-28 17:37:13,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3138.0). Total num frames: 966656. Throughput: 0: 747.6. Samples: 241526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:37:13,939][00255] Avg episode reward: [(0, '5.306')]
[2023-12-28 17:37:18,935][00255] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 978944. Throughput: 0: 748.4. Samples: 245336. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:37:18,942][00255] Avg episode reward: [(0, '5.316')]
[2023-12-28 17:37:18,956][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000239_978944.pth...
[2023-12-28 17:37:19,119][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000052_212992.pth
[2023-12-28 17:37:20,345][00808] Updated weights for policy 0, policy_version 240 (0.0034)
[2023-12-28 17:37:23,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3124.1). Total num frames: 991232. Throughput: 0: 745.2. Samples: 247146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:37:23,942][00255] Avg episode reward: [(0, '5.385')]
[2023-12-28 17:37:23,945][00795] Saving new best policy, reward=5.385!
[2023-12-28 17:37:28,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 1011712. Throughput: 0: 735.7. Samples: 252550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:37:28,940][00255] Avg episode reward: [(0, '5.108')]
[2023-12-28 17:37:31,459][00808] Updated weights for policy 0, policy_version 250 (0.0028)
[2023-12-28 17:37:33,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3179.6). Total num frames: 1032192. Throughput: 0: 761.9. Samples: 258552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:37:33,938][00255] Avg episode reward: [(0, '5.271')]
[2023-12-28 17:37:38,935][00255] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1044480. Throughput: 0: 774.4. Samples: 260508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:37:38,937][00255] Avg episode reward: [(0, '5.469')]
[2023-12-28 17:37:38,947][00795] Saving new best policy, reward=5.469!
[2023-12-28 17:37:43,935][00255] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 3165.7). Total num frames: 1056768. Throughput: 0: 793.3. Samples: 264388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:37:43,940][00255] Avg episode reward: [(0, '5.549')]
[2023-12-28 17:37:43,943][00795] Saving new best policy, reward=5.549!
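The checkpoint filenames in the log encode the policy version and the total environment frames, and the frame counts are consistently version x 4096 (4096 frames per policy update in this run; the multiplier is inferred from the log, not read from a config). A sketch of the naming pattern:

```python
def checkpoint_name(policy_version, frames_per_version=4096):
    """Reconstruct the checkpoint filename pattern seen in this log:
    checkpoint_<version, zero-padded to 9 digits>_<total frames>.pth"""
    total_frames = policy_version * frames_per_version
    return f"checkpoint_{policy_version:09d}_{total_frames}.pth"

# the three checkpoints saved so far in this run
for v in (52, 147, 239):
    print(checkpoint_name(v))
```

This also explains the rotation above: when the third checkpoint (version 239) is saved, the oldest one (version 52) is removed.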
[2023-12-28 17:37:45,448][00808] Updated weights for policy 0, policy_version 260 (0.0026)
[2023-12-28 17:37:48,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 3193.5). Total num frames: 1077248. Throughput: 0: 839.0. Samples: 270068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:37:48,938][00255] Avg episode reward: [(0, '5.491')]
[2023-12-28 17:37:53,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1097728. Throughput: 0: 839.1. Samples: 273162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:37:53,938][00255] Avg episode reward: [(0, '5.789')]
[2023-12-28 17:37:53,940][00795] Saving new best policy, reward=5.789!
[2023-12-28 17:37:56,313][00808] Updated weights for policy 0, policy_version 270 (0.0017)
[2023-12-28 17:37:58,939][00255] Fps is (10 sec: 3275.5, 60 sec: 3276.6, 300 sec: 3193.4). Total num frames: 1110016. Throughput: 0: 804.6. Samples: 277736. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:37:58,947][00255] Avg episode reward: [(0, '5.489')]
[2023-12-28 17:38:03,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 1122304. Throughput: 0: 808.8. Samples: 281730. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:38:03,937][00255] Avg episode reward: [(0, '5.506')]
[2023-12-28 17:38:08,777][00808] Updated weights for policy 0, policy_version 280 (0.0019)
[2023-12-28 17:38:08,935][00255] Fps is (10 sec: 3687.9, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 1146880. Throughput: 0: 834.6. Samples: 284702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:38:08,943][00255] Avg episode reward: [(0, '5.721')]
[2023-12-28 17:38:13,935][00255] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 1167360. Throughput: 0: 862.3. Samples: 291352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:38:13,941][00255] Avg episode reward: [(0, '5.902')]
[2023-12-28 17:38:13,944][00795] Saving new best policy, reward=5.902!
[2023-12-28 17:38:18,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1179648. Throughput: 0: 835.8. Samples: 296164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:38:18,939][00255] Avg episode reward: [(0, '5.784')]
[2023-12-28 17:38:20,884][00808] Updated weights for policy 0, policy_version 290 (0.0023)
[2023-12-28 17:38:23,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3235.1). Total num frames: 1196032. Throughput: 0: 838.7. Samples: 298248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-12-28 17:38:23,939][00255] Avg episode reward: [(0, '5.872')]
[2023-12-28 17:38:28,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 1216512. Throughput: 0: 869.8. Samples: 303530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:38:28,944][00255] Avg episode reward: [(0, '5.842')]
[2023-12-28 17:38:31,626][00808] Updated weights for policy 0, policy_version 300 (0.0021)
[2023-12-28 17:38:33,937][00255] Fps is (10 sec: 4095.1, 60 sec: 3413.2, 300 sec: 3262.9). Total num frames: 1236992. Throughput: 0: 884.3. Samples: 309864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:38:33,943][00255] Avg episode reward: [(0, '5.829')]
[2023-12-28 17:38:38,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3235.1). Total num frames: 1249280. Throughput: 0: 866.3. Samples: 312144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:38:38,941][00255] Avg episode reward: [(0, '5.993')]
[2023-12-28 17:38:38,952][00795] Saving new best policy, reward=5.993!
[2023-12-28 17:38:43,935][00255] Fps is (10 sec: 2867.8, 60 sec: 3481.6, 300 sec: 3235.2). Total num frames: 1265664. Throughput: 0: 850.3. Samples: 315998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:38:43,937][00255] Avg episode reward: [(0, '6.367')]
[2023-12-28 17:38:43,942][00795] Saving new best policy, reward=6.367!
[2023-12-28 17:38:45,490][00808] Updated weights for policy 0, policy_version 310 (0.0023)
[2023-12-28 17:38:48,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 1282048. Throughput: 0: 879.0. Samples: 321284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:38:48,938][00255] Avg episode reward: [(0, '6.376')]
[2023-12-28 17:38:48,955][00795] Saving new best policy, reward=6.376!
[2023-12-28 17:38:53,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 1302528. Throughput: 0: 881.7. Samples: 324378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:38:53,940][00255] Avg episode reward: [(0, '6.268')]
[2023-12-28 17:38:56,075][00808] Updated weights for policy 0, policy_version 320 (0.0024)
[2023-12-28 17:38:58,936][00255] Fps is (10 sec: 3276.5, 60 sec: 3413.5, 300 sec: 3221.2). Total num frames: 1314816. Throughput: 0: 843.0. Samples: 329290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:38:58,938][00255] Avg episode reward: [(0, '6.170')]
[2023-12-28 17:39:03,942][00255] Fps is (10 sec: 2455.9, 60 sec: 3412.9, 300 sec: 3221.2). Total num frames: 1327104. Throughput: 0: 816.9. Samples: 332928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:39:03,948][00255] Avg episode reward: [(0, '6.059')]
[2023-12-28 17:39:08,935][00255] Fps is (10 sec: 2867.5, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 1343488. Throughput: 0: 821.8. Samples: 335230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:39:08,938][00255] Avg episode reward: [(0, '6.254')]
[2023-12-28 17:39:09,988][00808] Updated weights for policy 0, policy_version 330 (0.0028)
[2023-12-28 17:39:13,936][00255] Fps is (10 sec: 4098.7, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 1368064. Throughput: 0: 840.3. Samples: 341344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:39:13,940][00255] Avg episode reward: [(0, '6.382')]
[2023-12-28 17:39:13,946][00795] Saving new best policy, reward=6.382!
[2023-12-28 17:39:18,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1380352. Throughput: 0: 808.3. Samples: 346234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:39:18,938][00255] Avg episode reward: [(0, '6.331')]
[2023-12-28 17:39:18,953][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000337_1380352.pth...
[2023-12-28 17:39:19,114][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000147_602112.pth
[2023-12-28 17:39:22,390][00808] Updated weights for policy 0, policy_version 340 (0.0025)
[2023-12-28 17:39:23,935][00255] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 1396736. Throughput: 0: 800.0. Samples: 348144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:39:23,942][00255] Avg episode reward: [(0, '6.163')]
[2023-12-28 17:39:28,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1413120. Throughput: 0: 821.6. Samples: 352972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:39:28,937][00255] Avg episode reward: [(0, '6.052')]
[2023-12-28 17:39:32,972][00808] Updated weights for policy 0, policy_version 350 (0.0027)
[2023-12-28 17:39:33,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3262.9). Total num frames: 1437696. Throughput: 0: 853.7. Samples: 359702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:39:33,937][00255] Avg episode reward: [(0, '6.432')]
[2023-12-28 17:39:33,941][00795] Saving new best policy, reward=6.432!
[2023-12-28 17:39:38,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 1449984. Throughput: 0: 847.2. Samples: 362504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:39:38,940][00255] Avg episode reward: [(0, '6.955')]
[2023-12-28 17:39:38,955][00795] Saving new best policy, reward=6.955!
[2023-12-28 17:39:43,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 1466368. Throughput: 0: 829.1. Samples: 366598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:39:43,939][00255] Avg episode reward: [(0, '7.139')]
[2023-12-28 17:39:43,943][00795] Saving new best policy, reward=7.139!
[2023-12-28 17:39:46,482][00808] Updated weights for policy 0, policy_version 360 (0.0017)
[2023-12-28 17:39:48,936][00255] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1482752. Throughput: 0: 859.1. Samples: 371580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:39:48,941][00255] Avg episode reward: [(0, '7.324')]
[2023-12-28 17:39:48,952][00795] Saving new best policy, reward=7.324!
[2023-12-28 17:39:53,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 1503232. Throughput: 0: 877.6. Samples: 374724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:39:53,941][00255] Avg episode reward: [(0, '7.307')]
[2023-12-28 17:39:56,215][00808] Updated weights for policy 0, policy_version 370 (0.0030)
[2023-12-28 17:39:58,938][00255] Fps is (10 sec: 3685.3, 60 sec: 3413.2, 300 sec: 3262.9). Total num frames: 1519616. Throughput: 0: 868.7. Samples: 380438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:39:58,941][00255] Avg episode reward: [(0, '8.075')]
[2023-12-28 17:39:58,958][00795] Saving new best policy, reward=8.075!
[2023-12-28 17:40:03,939][00255] Fps is (10 sec: 2866.1, 60 sec: 3413.5, 300 sec: 3262.9). Total num frames: 1531904. Throughput: 0: 847.5. Samples: 384374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:40:03,941][00255] Avg episode reward: [(0, '7.914')]
[2023-12-28 17:40:08,935][00255] Fps is (10 sec: 2868.1, 60 sec: 3413.3, 300 sec: 3276.9). Total num frames: 1548288. Throughput: 0: 848.1. Samples: 386308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:40:08,938][00255] Avg episode reward: [(0, '8.117')]
[2023-12-28 17:40:08,945][00795] Saving new best policy, reward=8.117!
[2023-12-28 17:40:10,287][00808] Updated weights for policy 0, policy_version 380 (0.0030)
[2023-12-28 17:40:13,938][00255] Fps is (10 sec: 3686.9, 60 sec: 3344.9, 300 sec: 3276.8). Total num frames: 1568768. Throughput: 0: 866.1. Samples: 391950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:40:13,943][00255] Avg episode reward: [(0, '8.500')]
[2023-12-28 17:40:13,948][00795] Saving new best policy, reward=8.500!
[2023-12-28 17:40:18,935][00255] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 1585152. Throughput: 0: 841.5. Samples: 397570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:40:18,938][00255] Avg episode reward: [(0, '8.645')]
[2023-12-28 17:40:18,957][00795] Saving new best policy, reward=8.645!
[2023-12-28 17:40:22,180][00808] Updated weights for policy 0, policy_version 390 (0.0023)
[2023-12-28 17:40:23,935][00255] Fps is (10 sec: 3277.6, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 1601536. Throughput: 0: 823.1. Samples: 399544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:40:23,940][00255] Avg episode reward: [(0, '8.780')]
[2023-12-28 17:40:23,942][00795] Saving new best policy, reward=8.780!
[2023-12-28 17:40:28,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1613824. Throughput: 0: 820.8. Samples: 403534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:40:28,943][00255] Avg episode reward: [(0, '8.315')]
[2023-12-28 17:40:33,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1634304. Throughput: 0: 846.1. Samples: 409656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:40:33,941][00255] Avg episode reward: [(0, '8.735')]
[2023-12-28 17:40:34,227][00808] Updated weights for policy 0, policy_version 400 (0.0025)
[2023-12-28 17:40:38,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 1654784. Throughput: 0: 844.4. Samples: 412720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:40:38,938][00255] Avg episode reward: [(0, '8.351')]
[2023-12-28 17:40:43,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3263.0). Total num frames: 1667072. Throughput: 0: 813.1. Samples: 417024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:40:43,942][00255] Avg episode reward: [(0, '8.824')]
[2023-12-28 17:40:43,949][00795] Saving new best policy, reward=8.824!
[2023-12-28 17:40:48,055][00808] Updated weights for policy 0, policy_version 410 (0.0033)
[2023-12-28 17:40:48,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1679360. Throughput: 0: 817.5. Samples: 421158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:40:48,941][00255] Avg episode reward: [(0, '8.400')]
[2023-12-28 17:40:53,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1699840. Throughput: 0: 840.7. Samples: 424138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:40:53,937][00255] Avg episode reward: [(0, '8.727')]
[2023-12-28 17:40:58,360][00808] Updated weights for policy 0, policy_version 420 (0.0031)
[2023-12-28 17:40:58,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3290.7). Total num frames: 1720320. Throughput: 0: 846.4. Samples: 430038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:40:58,938][00255] Avg episode reward: [(0, '8.620')]
[2023-12-28 17:41:03,938][00255] Fps is (10 sec: 3275.9, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 1732608. Throughput: 0: 808.0. Samples: 433932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:41:03,941][00255] Avg episode reward: [(0, '9.349')]
[2023-12-28 17:41:03,945][00795] Saving new best policy, reward=9.349!
[2023-12-28 17:41:08,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1744896. Throughput: 0: 805.0. Samples: 435768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:41:08,937][00255] Avg episode reward: [(0, '9.561')]
[2023-12-28 17:41:08,950][00795] Saving new best policy, reward=9.561!
[2023-12-28 17:41:12,533][00808] Updated weights for policy 0, policy_version 430 (0.0013)
[2023-12-28 17:41:13,935][00255] Fps is (10 sec: 3277.7, 60 sec: 3276.9, 300 sec: 3290.7). Total num frames: 1765376. Throughput: 0: 831.9. Samples: 440970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:41:13,938][00255] Avg episode reward: [(0, '10.113')]
[2023-12-28 17:41:13,944][00795] Saving new best policy, reward=10.113!
[2023-12-28 17:41:18,937][00255] Fps is (10 sec: 3685.7, 60 sec: 3276.7, 300 sec: 3290.7). Total num frames: 1781760. Throughput: 0: 828.7. Samples: 446948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:41:18,946][00255] Avg episode reward: [(0, '11.030')]
[2023-12-28 17:41:18,959][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000436_1785856.pth...
[2023-12-28 17:41:19,126][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000239_978944.pth
[2023-12-28 17:41:19,150][00795] Saving new best policy, reward=11.030!
[2023-12-28 17:41:23,935][00255] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1798144. Throughput: 0: 798.7. Samples: 448662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:41:23,941][00255] Avg episode reward: [(0, '11.130')]
[2023-12-28 17:41:23,947][00795] Saving new best policy, reward=11.130!
[2023-12-28 17:41:25,628][00808] Updated weights for policy 0, policy_version 440 (0.0022)
[2023-12-28 17:41:28,935][00255] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1810432. Throughput: 0: 786.4. Samples: 452412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:41:28,938][00255] Avg episode reward: [(0, '11.524')]
[2023-12-28 17:41:28,947][00795] Saving new best policy, reward=11.524!
[2023-12-28 17:41:33,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 1826816. Throughput: 0: 814.0. Samples: 457790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:41:33,941][00255] Avg episode reward: [(0, '10.917')]
[2023-12-28 17:41:37,274][00808] Updated weights for policy 0, policy_version 450 (0.0032)
[2023-12-28 17:41:38,937][00255] Fps is (10 sec: 3685.8, 60 sec: 3208.4, 300 sec: 3276.8). Total num frames: 1847296. Throughput: 0: 810.4. Samples: 460606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:41:38,939][00255] Avg episode reward: [(0, '10.977')]
[2023-12-28 17:41:43,938][00255] Fps is (10 sec: 3275.9, 60 sec: 3208.4, 300 sec: 3276.8). Total num frames: 1859584. Throughput: 0: 780.9. Samples: 465180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:41:43,940][00255] Avg episode reward: [(0, '11.011')]
[2023-12-28 17:41:48,937][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.4, 300 sec: 3290.7). Total num frames: 1871872. Throughput: 0: 777.8. Samples: 468932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:41:48,943][00255] Avg episode reward: [(0, '10.602')]
[2023-12-28 17:41:51,628][00808] Updated weights for policy 0, policy_version 460 (0.0043)
[2023-12-28 17:41:53,935][00255] Fps is (10 sec: 3277.7, 60 sec: 3208.5, 300 sec: 3318.5). Total num frames: 1892352. Throughput: 0: 796.1. Samples: 471594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:41:53,938][00255] Avg episode reward: [(0, '11.219')]
[2023-12-28 17:41:58,935][00255] Fps is (10 sec: 4096.7, 60 sec: 3208.5, 300 sec: 3346.2). Total num frames: 1912832. Throughput: 0: 814.6. Samples: 477626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:41:58,938][00255] Avg episode reward: [(0, '11.373')]
[2023-12-28 17:42:03,498][00808] Updated weights for policy 0, policy_version 470 (0.0040)
[2023-12-28 17:42:03,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.7, 300 sec: 3318.5). Total num frames: 1925120. Throughput: 0: 776.4. Samples: 481884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:42:03,941][00255] Avg episode reward: [(0, '12.285')]
[2023-12-28 17:42:03,943][00795] Saving new best policy, reward=12.285!
[2023-12-28 17:42:08,936][00255] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 1937408. Throughput: 0: 777.9. Samples: 483670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:42:08,941][00255] Avg episode reward: [(0, '12.658')]
[2023-12-28 17:42:08,954][00795] Saving new best policy, reward=12.658!
[2023-12-28 17:42:13,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3304.6). Total num frames: 1953792. Throughput: 0: 799.5. Samples: 488390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:42:13,942][00255] Avg episode reward: [(0, '13.400')]
[2023-12-28 17:42:13,951][00795] Saving new best policy, reward=13.400!
[2023-12-28 17:42:16,660][00808] Updated weights for policy 0, policy_version 480 (0.0013)
[2023-12-28 17:42:18,935][00255] Fps is (10 sec: 3686.5, 60 sec: 3208.6, 300 sec: 3332.3). Total num frames: 1974272. Throughput: 0: 804.8. Samples: 494008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:42:18,943][00255] Avg episode reward: [(0, '13.395')]
[2023-12-28 17:42:23,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3304.6). Total num frames: 1986560. Throughput: 0: 792.3. Samples: 496256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:42:23,942][00255] Avg episode reward: [(0, '14.593')]
[2023-12-28 17:42:23,945][00795] Saving new best policy, reward=14.593!
[2023-12-28 17:42:28,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 1998848. Throughput: 0: 770.6. Samples: 499854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:42:28,939][00255] Avg episode reward: [(0, '14.443')]
[2023-12-28 17:42:31,063][00808] Updated weights for policy 0, policy_version 490 (0.0023)
[2023-12-28 17:42:33,942][00255] Fps is (10 sec: 2865.1, 60 sec: 3139.9, 300 sec: 3290.6). Total num frames: 2015232. Throughput: 0: 796.7. Samples: 504788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:42:33,945][00255] Avg episode reward: [(0, '14.903')]
[2023-12-28 17:42:33,947][00795] Saving new best policy, reward=14.903!
[2023-12-28 17:42:38,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3140.4, 300 sec: 3318.5). Total num frames: 2035712. Throughput: 0: 800.6. Samples: 507622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:42:38,943][00255] Avg episode reward: [(0, '15.509')]
[2023-12-28 17:42:38,958][00795] Saving new best policy, reward=15.509!
[2023-12-28 17:42:42,719][00808] Updated weights for policy 0, policy_version 500 (0.0029)
[2023-12-28 17:42:43,935][00255] Fps is (10 sec: 3279.2, 60 sec: 3140.4, 300 sec: 3290.7). Total num frames: 2048000. Throughput: 0: 778.6. Samples: 512662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:42:43,939][00255] Avg episode reward: [(0, '15.945')]
[2023-12-28 17:42:43,941][00795] Saving new best policy, reward=15.945!
[2023-12-28 17:42:48,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3140.4, 300 sec: 3262.9). Total num frames: 2060288. Throughput: 0: 763.9. Samples: 516258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-12-28 17:42:48,940][00255] Avg episode reward: [(0, '16.270')]
[2023-12-28 17:42:48,949][00795] Saving new best policy, reward=16.270!
[2023-12-28 17:42:53,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3276.8). Total num frames: 2076672. Throughput: 0: 771.7. Samples: 518398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:42:53,938][00255] Avg episode reward: [(0, '16.805')]
[2023-12-28 17:42:54,020][00795] Saving new best policy, reward=16.805!
[2023-12-28 17:42:56,374][00808] Updated weights for policy 0, policy_version 510 (0.0016)
[2023-12-28 17:42:58,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3304.6). Total num frames: 2097152. Throughput: 0: 794.6. Samples: 524148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:42:58,937][00255] Avg episode reward: [(0, '17.000')]
[2023-12-28 17:42:58,952][00795] Saving new best policy, reward=17.000!
[2023-12-28 17:43:03,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 2113536. Throughput: 0: 775.2. Samples: 528894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:43:03,940][00255] Avg episode reward: [(0, '16.913')]
[2023-12-28 17:43:08,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 2125824. Throughput: 0: 766.7. Samples: 530758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:43:08,943][00255] Avg episode reward: [(0, '17.169')]
[2023-12-28 17:43:08,956][00795] Saving new best policy, reward=17.169!
[2023-12-28 17:43:10,225][00808] Updated weights for policy 0, policy_version 520 (0.0021)
[2023-12-28 17:43:13,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 2142208. Throughput: 0: 784.9. Samples: 535176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:43:13,943][00255] Avg episode reward: [(0, '16.625')]
[2023-12-28 17:43:18,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 2162688. Throughput: 0: 812.7. Samples: 541356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:43:18,941][00255] Avg episode reward: [(0, '16.557')]
[2023-12-28 17:43:18,953][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000528_2162688.pth...
[2023-12-28 17:43:19,086][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000337_1380352.pth
[2023-12-28 17:43:20,589][00808] Updated weights for policy 0, policy_version 530 (0.0023)
[2023-12-28 17:43:23,938][00255] Fps is (10 sec: 3685.5, 60 sec: 3208.4, 300 sec: 3262.9). Total num frames: 2179072. Throughput: 0: 814.2. Samples: 544262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:43:23,940][00255] Avg episode reward: [(0, '15.746')]
[2023-12-28 17:43:28,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3235.2). Total num frames: 2191360. Throughput: 0: 793.6. Samples: 548372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:43:28,940][00255] Avg episode reward: [(0, '16.158')]
[2023-12-28 17:43:33,876][00808] Updated weights for policy 0, policy_version 540 (0.0012)
[2023-12-28 17:43:33,935][00255] Fps is (10 sec: 3277.6, 60 sec: 3277.2, 300 sec: 3262.9). Total num frames: 2211840. Throughput: 0: 823.1. Samples: 553298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:43:33,938][00255] Avg episode reward: [(0, '16.880')]
[2023-12-28 17:43:38,935][00255] Fps is (10 sec: 4095.9, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2232320. Throughput: 0: 846.5. Samples: 556490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:43:38,938][00255] Avg episode reward: [(0, '16.997')]
[2023-12-28 17:43:43,939][00255] Fps is (10 sec: 3684.8, 60 sec: 3344.8, 300 sec: 3276.8). Total num frames: 2248704. Throughput: 0: 857.2. Samples: 562724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:43:43,942][00255] Avg episode reward: [(0, '17.987')]
[2023-12-28 17:43:43,951][00795] Saving new best policy, reward=17.987!
[2023-12-28 17:43:44,283][00808] Updated weights for policy 0, policy_version 550 (0.0020)
[2023-12-28 17:43:48,939][00255] Fps is (10 sec: 3275.5, 60 sec: 3413.1, 300 sec: 3262.9). Total num frames: 2265088. Throughput: 0: 842.3. Samples: 566800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:43:48,943][00255] Avg episode reward: [(0, '19.954')]
[2023-12-28 17:43:48,957][00795] Saving new best policy, reward=19.954!
[2023-12-28 17:43:53,935][00255] Fps is (10 sec: 2868.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2277376. Throughput: 0: 843.7. Samples: 568726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:43:53,937][00255] Avg episode reward: [(0, '20.771')]
[2023-12-28 17:43:53,948][00795] Saving new best policy, reward=20.771!
[2023-12-28 17:43:56,972][00808] Updated weights for policy 0, policy_version 560 (0.0014)
[2023-12-28 17:43:58,935][00255] Fps is (10 sec: 3278.1, 60 sec: 3345.1, 300 sec: 3290.8). Total num frames: 2297856. Throughput: 0: 879.0. Samples: 574730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:43:58,942][00255] Avg episode reward: [(0, '19.836')]
[2023-12-28 17:44:03,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 2318336. Throughput: 0: 865.7. Samples: 580314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:44:03,938][00255] Avg episode reward: [(0, '19.648')]
[2023-12-28 17:44:08,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 2330624. Throughput: 0: 844.6. Samples: 582268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:44:08,938][00255] Avg episode reward: [(0, '20.396')]
[2023-12-28 17:44:09,745][00808] Updated weights for policy 0, policy_version 570 (0.0021)
[2023-12-28 17:44:13,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 2347008. Throughput: 0: 842.1. Samples: 586266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:44:13,943][00255] Avg episode reward: [(0, '19.650')]
[2023-12-28 17:44:18,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2367488. Throughput: 0: 874.0. Samples: 592630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:44:18,943][00255] Avg episode reward: [(0, '18.466')]
[2023-12-28 17:44:20,374][00808] Updated weights for policy 0, policy_version 580 (0.0013)
[2023-12-28 17:44:23,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3304.6). Total num frames: 2387968. Throughput: 0: 876.0. Samples: 595908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:44:23,944][00255] Avg episode reward: [(0, '18.138')]
[2023-12-28 17:44:28,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3262.9). Total num frames: 2400256. Throughput: 0: 836.7. Samples: 600372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:44:28,937][00255] Avg episode reward: [(0, '18.111')]
[2023-12-28 17:44:33,748][00808] Updated weights for policy 0, policy_version 590 (0.0032)
[2023-12-28 17:44:33,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 2416640. Throughput: 0: 841.4. Samples: 604660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:44:33,938][00255] Avg episode reward: [(0, '17.470')]
[2023-12-28 17:44:38,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2437120. Throughput: 0: 868.7. Samples: 607816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:44:38,938][00255] Avg episode reward: [(0, '18.358')]
[2023-12-28 17:44:43,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3290.7). Total num frames: 2453504. Throughput: 0: 869.1. Samples: 613840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:44:43,940][00255] Avg episode reward: [(0, '18.525')]
[2023-12-28 17:44:44,192][00808] Updated weights for policy 0, policy_version 600 (0.0020)
[2023-12-28 17:44:48,936][00255] Fps is (10 sec: 2866.9, 60 sec: 3345.2, 300 sec: 3262.9). Total num frames: 2465792. Throughput: 0: 832.4. Samples: 617774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:44:48,942][00255] Avg episode reward: [(0, '17.778')]
[2023-12-28 17:44:53,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3263.0). Total num frames: 2482176. Throughput: 0: 833.6. Samples: 619778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:44:53,938][00255] Avg episode reward: [(0, '18.788')]
[2023-12-28 17:44:57,326][00808] Updated weights for policy 0, policy_version 610 (0.0030)
[2023-12-28 17:44:58,935][00255] Fps is (10 sec: 3686.8, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2502656. Throughput: 0: 873.4. Samples: 625570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:44:58,940][00255] Avg episode reward: [(0, '17.873')]
[2023-12-28 17:45:03,941][00255] Fps is (10 sec: 4502.8, 60 sec: 3481.2, 300 sec: 3318.4). Total num frames: 2527232. Throughput: 0: 878.0. Samples: 632146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:45:03,943][00255] Avg episode reward: [(0, '16.710')]
[2023-12-28 17:45:08,496][00808] Updated weights for policy 0, policy_version 620 (0.0026)
[2023-12-28 17:45:08,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 2539520. Throughput: 0: 850.4. Samples: 634174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:45:08,942][00255] Avg episode reward: [(0, '16.771')]
[2023-12-28 17:45:13,937][00255] Fps is (10 sec: 2458.7, 60 sec: 3413.2, 300 sec: 3276.8). Total num frames: 2551808. Throughput: 0: 842.5. Samples: 638284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:45:13,945][00255] Avg episode reward: [(0, '18.180')]
[2023-12-28 17:45:18,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2572288. Throughput: 0: 879.6. Samples: 644240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:45:18,938][00255] Avg episode reward: [(0, '18.880')]
[2023-12-28 17:45:18,951][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth...
[2023-12-28 17:45:19,106][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000436_1785856.pth
[2023-12-28 17:45:20,068][00808] Updated weights for policy 0, policy_version 630 (0.0028)
[2023-12-28 17:45:23,935][00255] Fps is (10 sec: 4096.8, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 2592768. Throughput: 0: 877.2. Samples: 647290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:45:23,940][00255] Avg episode reward: [(0, '19.399')]
[2023-12-28 17:45:28,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3290.7). Total num frames: 2605056. Throughput: 0: 847.9. Samples: 651994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:45:28,939][00255] Avg episode reward: [(0, '19.803')]
[2023-12-28 17:45:33,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2617344. Throughput: 0: 840.5. Samples: 655594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:45:33,938][00255] Avg episode reward: [(0, '20.558')]
[2023-12-28 17:45:34,053][00808] Updated weights for policy 0, policy_version 640 (0.0013)
[2023-12-28 17:45:38,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2637824. Throughput: 0: 852.4. Samples: 658134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:45:38,937][00255] Avg episode reward: [(0, '18.920')]
[2023-12-28 17:45:43,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 2658304. Throughput: 0: 856.2. Samples: 664098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:45:43,937][00255] Avg episode reward: [(0, '18.585')]
[2023-12-28 17:45:44,924][00808] Updated weights for policy 0, policy_version 650 (0.0019)
[2023-12-28 17:45:48,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3290.7). Total num frames: 2670592. Throughput: 0: 807.8. Samples: 668494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:45:48,938][00255] Avg episode reward: [(0, '20.001')]
[2023-12-28 17:45:53,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2682880. Throughput: 0: 803.4. Samples: 670328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:45:53,940][00255] Avg episode reward: [(0, '20.653')]
[2023-12-28 17:45:58,914][00808] Updated weights for policy 0, policy_version 660 (0.0035)
[2023-12-28 17:45:58,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 2703360. Throughput: 0: 818.9. Samples: 675134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:45:58,937][00255] Avg episode reward: [(0, '20.488')]
[2023-12-28 17:46:03,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3277.1, 300 sec: 3318.5). Total num frames: 2723840. Throughput: 0: 824.4. Samples: 681336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:46:03,937][00255] Avg episode reward: [(0, '20.289')]
[2023-12-28 17:46:08,939][00255] Fps is (10 sec: 3275.5, 60 sec: 3276.6, 300 sec: 3290.6). Total num frames: 2736128. Throughput: 0: 809.3. Samples: 683710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:46:08,945][00255] Avg episode reward: [(0, '21.344')]
[2023-12-28 17:46:08,966][00795] Saving new best policy, reward=21.344!
[2023-12-28 17:46:11,029][00808] Updated weights for policy 0, policy_version 670 (0.0013)
[2023-12-28 17:46:13,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3276.9, 300 sec: 3276.8). Total num frames: 2748416. Throughput: 0: 792.5. Samples: 687656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:46:13,938][00255] Avg episode reward: [(0, '21.237')]
[2023-12-28 17:46:18,935][00255] Fps is (10 sec: 3278.1, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 2768896. Throughput: 0: 823.8. Samples: 692666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:46:18,938][00255] Avg episode reward: [(0, '19.055')]
[2023-12-28 17:46:22,920][00808] Updated weights for policy 0, policy_version 680 (0.0014)
[2023-12-28 17:46:23,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2785280. Throughput: 0: 833.3. Samples: 695632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:46:23,938][00255] Avg episode reward: [(0, '19.430')]
[2023-12-28 17:46:28,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 2801664. Throughput: 0: 816.3. Samples: 700830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:46:28,937][00255] Avg episode reward: [(0, '19.012')]
[2023-12-28 17:46:33,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2813952. Throughput: 0: 801.0. Samples: 704540.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-28 17:46:33,940][00255] Avg episode reward: [(0, '19.258')] [2023-12-28 17:46:37,255][00808] Updated weights for policy 0, policy_version 690 (0.0025) [2023-12-28 17:46:38,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2830336. Throughput: 0: 805.4. Samples: 706570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-28 17:46:38,942][00255] Avg episode reward: [(0, '18.966')] [2023-12-28 17:46:43,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3318.5). Total num frames: 2850816. Throughput: 0: 828.9. Samples: 712436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-28 17:46:43,938][00255] Avg episode reward: [(0, '19.415')] [2023-12-28 17:46:48,368][00808] Updated weights for policy 0, policy_version 700 (0.0013) [2023-12-28 17:46:48,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 2867200. Throughput: 0: 801.9. Samples: 717420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-28 17:46:48,939][00255] Avg episode reward: [(0, '20.308')] [2023-12-28 17:46:53,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2879488. Throughput: 0: 791.1. Samples: 719308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-28 17:46:53,941][00255] Avg episode reward: [(0, '19.452')] [2023-12-28 17:46:58,936][00255] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 2895872. Throughput: 0: 796.9. Samples: 723516. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-12-28 17:46:58,938][00255] Avg episode reward: [(0, '19.740')] [2023-12-28 17:47:01,614][00808] Updated weights for policy 0, policy_version 710 (0.0015) [2023-12-28 17:47:03,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3318.5). Total num frames: 2916352. Throughput: 0: 820.8. Samples: 729600. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-28 17:47:03,938][00255] Avg episode reward: [(0, '19.890')] [2023-12-28 17:47:08,935][00255] Fps is (10 sec: 3686.5, 60 sec: 3277.0, 300 sec: 3318.5). Total num frames: 2932736. Throughput: 0: 822.8. Samples: 732658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-28 17:47:08,937][00255] Avg episode reward: [(0, '20.294')] [2023-12-28 17:47:13,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 2945024. Throughput: 0: 798.7. Samples: 736770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-28 17:47:13,938][00255] Avg episode reward: [(0, '20.722')] [2023-12-28 17:47:14,016][00808] Updated weights for policy 0, policy_version 720 (0.0019) [2023-12-28 17:47:18,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3304.6). Total num frames: 2961408. Throughput: 0: 821.7. Samples: 741516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-28 17:47:18,940][00255] Avg episode reward: [(0, '19.981')] [2023-12-28 17:47:18,953][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000723_2961408.pth... [2023-12-28 17:47:19,103][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000528_2162688.pth [2023-12-28 17:47:23,935][00255] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 2981888. Throughput: 0: 846.0. Samples: 744640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-28 17:47:23,943][00255] Avg episode reward: [(0, '20.190')] [2023-12-28 17:47:24,943][00808] Updated weights for policy 0, policy_version 730 (0.0023) [2023-12-28 17:47:28,935][00255] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3346.3). Total num frames: 3002368. Throughput: 0: 851.3. Samples: 750746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-28 17:47:28,941][00255] Avg episode reward: [(0, '21.800')] [2023-12-28 17:47:28,953][00795] Saving new best policy, reward=21.800! 
[2023-12-28 17:47:33,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 3014656. Throughput: 0: 830.3. Samples: 754782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:47:33,944][00255] Avg episode reward: [(0, '21.542')]
[2023-12-28 17:47:38,509][00808] Updated weights for policy 0, policy_version 740 (0.0021)
[2023-12-28 17:47:38,935][00255] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3031040. Throughput: 0: 834.1. Samples: 756844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:47:38,940][00255] Avg episode reward: [(0, '21.856')]
[2023-12-28 17:47:38,949][00795] Saving new best policy, reward=21.856!
[2023-12-28 17:47:43,935][00255] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 3051520. Throughput: 0: 869.9. Samples: 762660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:47:43,945][00255] Avg episode reward: [(0, '20.698')]
[2023-12-28 17:47:48,820][00808] Updated weights for policy 0, policy_version 750 (0.0021)
[2023-12-28 17:47:48,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3072000. Throughput: 0: 864.4. Samples: 768500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:47:48,938][00255] Avg episode reward: [(0, '20.995')]
[2023-12-28 17:47:53,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 3084288. Throughput: 0: 839.9. Samples: 770454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:47:53,945][00255] Avg episode reward: [(0, '21.684')]
[2023-12-28 17:47:58,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3096576. Throughput: 0: 835.2. Samples: 774356. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:47:58,938][00255] Avg episode reward: [(0, '20.698')]
[2023-12-28 17:48:02,075][00808] Updated weights for policy 0, policy_version 760 (0.0026)
[2023-12-28 17:48:03,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 3117056. Throughput: 0: 862.9. Samples: 780348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:48:03,943][00255] Avg episode reward: [(0, '19.410')]
[2023-12-28 17:48:08,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3137536. Throughput: 0: 863.5. Samples: 783496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:48:08,938][00255] Avg episode reward: [(0, '19.813')]
[2023-12-28 17:48:13,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 3149824. Throughput: 0: 830.2. Samples: 788106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:48:13,941][00255] Avg episode reward: [(0, '19.066')]
[2023-12-28 17:48:14,110][00808] Updated weights for policy 0, policy_version 770 (0.0017)
[2023-12-28 17:48:18,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3346.3). Total num frames: 3166208. Throughput: 0: 829.9. Samples: 792126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:48:18,944][00255] Avg episode reward: [(0, '17.824')]
[2023-12-28 17:48:23,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3186688. Throughput: 0: 848.6. Samples: 795030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:48:23,938][00255] Avg episode reward: [(0, '17.344')]
[2023-12-28 17:48:25,958][00808] Updated weights for policy 0, policy_version 780 (0.0015)
[2023-12-28 17:48:28,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3207168. Throughput: 0: 854.4. Samples: 801106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:48:28,940][00255] Avg episode reward: [(0, '17.894')]
[2023-12-28 17:48:33,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 3215360. Throughput: 0: 818.5. Samples: 805332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:48:33,941][00255] Avg episode reward: [(0, '19.840')]
[2023-12-28 17:48:38,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3332.4). Total num frames: 3231744. Throughput: 0: 818.8. Samples: 807302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:48:38,941][00255] Avg episode reward: [(0, '20.065')]
[2023-12-28 17:48:39,874][00808] Updated weights for policy 0, policy_version 790 (0.0033)
[2023-12-28 17:48:43,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3346.3). Total num frames: 3252224. Throughput: 0: 851.0. Samples: 812650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:48:43,937][00255] Avg episode reward: [(0, '20.373')]
[2023-12-28 17:48:48,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 3272704. Throughput: 0: 864.3. Samples: 819242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:48:48,943][00255] Avg episode reward: [(0, '21.806')]
[2023-12-28 17:48:49,156][00808] Updated weights for policy 0, policy_version 800 (0.0018)
[2023-12-28 17:48:53,937][00255] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3360.1). Total num frames: 3289088. Throughput: 0: 847.5. Samples: 821636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:48:53,944][00255] Avg episode reward: [(0, '21.665')]
[2023-12-28 17:48:58,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3301376. Throughput: 0: 836.3. Samples: 825738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:48:58,939][00255] Avg episode reward: [(0, '21.234')]
[2023-12-28 17:49:02,446][00808] Updated weights for policy 0, policy_version 810 (0.0025)
[2023-12-28 17:49:03,935][00255] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3321856. Throughput: 0: 871.2. Samples: 831328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:49:03,944][00255] Avg episode reward: [(0, '21.153')]
[2023-12-28 17:49:08,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3342336. Throughput: 0: 876.1. Samples: 834456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:49:08,939][00255] Avg episode reward: [(0, '20.431')]
[2023-12-28 17:49:13,504][00808] Updated weights for policy 0, policy_version 820 (0.0016)
[2023-12-28 17:49:13,940][00255] Fps is (10 sec: 3684.5, 60 sec: 3481.3, 300 sec: 3360.1). Total num frames: 3358720. Throughput: 0: 856.8. Samples: 839668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:49:13,946][00255] Avg episode reward: [(0, '20.526')]
[2023-12-28 17:49:18,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3371008. Throughput: 0: 853.0. Samples: 843718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:49:18,940][00255] Avg episode reward: [(0, '20.464')]
[2023-12-28 17:49:18,954][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000823_3371008.pth...
[2023-12-28 17:49:19,121][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth
[2023-12-28 17:49:23,935][00255] Fps is (10 sec: 3278.5, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3391488. Throughput: 0: 860.7. Samples: 846032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:49:23,938][00255] Avg episode reward: [(0, '21.013')]
[2023-12-28 17:49:25,846][00808] Updated weights for policy 0, policy_version 830 (0.0023)
[2023-12-28 17:49:28,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3411968. Throughput: 0: 880.5. Samples: 852274. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:49:28,942][00255] Avg episode reward: [(0, '19.778')]
[2023-12-28 17:49:33,938][00255] Fps is (10 sec: 3275.9, 60 sec: 3481.5, 300 sec: 3346.2). Total num frames: 3424256. Throughput: 0: 846.0. Samples: 857314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:49:33,940][00255] Avg episode reward: [(0, '19.626')]
[2023-12-28 17:49:38,743][00808] Updated weights for policy 0, policy_version 840 (0.0031)
[2023-12-28 17:49:38,936][00255] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 3440640. Throughput: 0: 836.6. Samples: 859282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:49:38,938][00255] Avg episode reward: [(0, '20.512')]
[2023-12-28 17:49:43,935][00255] Fps is (10 sec: 3277.6, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 3457024. Throughput: 0: 847.2. Samples: 863860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:49:43,938][00255] Avg episode reward: [(0, '20.560')]
[2023-12-28 17:49:48,935][00255] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3477504. Throughput: 0: 865.3. Samples: 870268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:49:48,938][00255] Avg episode reward: [(0, '19.558')]
[2023-12-28 17:49:49,286][00808] Updated weights for policy 0, policy_version 850 (0.0017)
[2023-12-28 17:49:53,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3360.1). Total num frames: 3493888. Throughput: 0: 861.4. Samples: 873218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:49:53,942][00255] Avg episode reward: [(0, '20.157')]
[2023-12-28 17:49:58,935][00255] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 3506176. Throughput: 0: 835.0. Samples: 877240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-28 17:49:58,938][00255] Avg episode reward: [(0, '20.648')]
[2023-12-28 17:50:03,026][00808] Updated weights for policy 0, policy_version 860 (0.0015)
[2023-12-28 17:50:03,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 3526656. Throughput: 0: 851.5. Samples: 882036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-28 17:50:03,941][00255] Avg episode reward: [(0, '20.474')]
[2023-12-28 17:50:08,936][00255] Fps is (10 sec: 4095.8, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 3547136. Throughput: 0: 870.8. Samples: 885220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:50:08,944][00255] Avg episode reward: [(0, '20.931')]
[2023-12-28 17:50:13,099][00808] Updated weights for policy 0, policy_version 870 (0.0029)
[2023-12-28 17:50:13,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3360.1). Total num frames: 3563520. Throughput: 0: 862.5. Samples: 891086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:50:13,938][00255] Avg episode reward: [(0, '20.818')]
[2023-12-28 17:50:18,935][00255] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 3575808. Throughput: 0: 839.2. Samples: 895076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:50:18,938][00255] Avg episode reward: [(0, '20.952')]
[2023-12-28 17:50:23,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 3588096. Throughput: 0: 836.9. Samples: 896944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:50:23,938][00255] Avg episode reward: [(0, '19.848')]
[2023-12-28 17:50:27,692][00808] Updated weights for policy 0, policy_version 880 (0.0013)
[2023-12-28 17:50:28,935][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 3608576. Throughput: 0: 839.7. Samples: 901648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:50:28,938][00255] Avg episode reward: [(0, '20.172')]
[2023-12-28 17:50:33,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3346.2). Total num frames: 3624960. Throughput: 0: 812.5. Samples: 906830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:50:33,943][00255] Avg episode reward: [(0, '19.360')]
[2023-12-28 17:50:38,937][00255] Fps is (10 sec: 2457.2, 60 sec: 3208.5, 300 sec: 3304.5). Total num frames: 3633152. Throughput: 0: 781.3. Samples: 908380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:50:38,939][00255] Avg episode reward: [(0, '18.815')]
[2023-12-28 17:50:43,101][00808] Updated weights for policy 0, policy_version 890 (0.0020)
[2023-12-28 17:50:43,935][00255] Fps is (10 sec: 2048.0, 60 sec: 3140.3, 300 sec: 3304.6). Total num frames: 3645440. Throughput: 0: 765.9. Samples: 911704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-28 17:50:43,937][00255] Avg episode reward: [(0, '18.380')]
[2023-12-28 17:50:48,935][00255] Fps is (10 sec: 2867.7, 60 sec: 3072.0, 300 sec: 3318.5). Total num frames: 3661824. Throughput: 0: 758.3. Samples: 916160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:50:48,938][00255] Avg episode reward: [(0, '19.157')]
[2023-12-28 17:50:53,937][00255] Fps is (10 sec: 3276.1, 60 sec: 3071.9, 300 sec: 3304.5). Total num frames: 3678208. Throughput: 0: 748.3. Samples: 918896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:50:53,943][00255] Avg episode reward: [(0, '19.779')]
[2023-12-28 17:50:56,065][00808] Updated weights for policy 0, policy_version 900 (0.0027)
[2023-12-28 17:50:58,937][00255] Fps is (10 sec: 2866.7, 60 sec: 3071.9, 300 sec: 3276.8). Total num frames: 3690496. Throughput: 0: 714.0. Samples: 923216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:50:58,942][00255] Avg episode reward: [(0, '19.993')]
[2023-12-28 17:51:03,935][00255] Fps is (10 sec: 2867.8, 60 sec: 3003.7, 300 sec: 3290.7). Total num frames: 3706880. Throughput: 0: 710.4. Samples: 927044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:51:03,939][00255] Avg episode reward: [(0, '20.746')]
[2023-12-28 17:51:08,935][00255] Fps is (10 sec: 2867.7, 60 sec: 2867.2, 300 sec: 3290.7). Total num frames: 3719168. Throughput: 0: 711.4. Samples: 928956. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-28 17:51:08,943][00255] Avg episode reward: [(0, '21.705')]
[2023-12-28 17:51:10,662][00808] Updated weights for policy 0, policy_version 910 (0.0017)
[2023-12-28 17:51:13,937][00255] Fps is (10 sec: 3276.1, 60 sec: 2935.4, 300 sec: 3290.7). Total num frames: 3739648. Throughput: 0: 729.5. Samples: 934478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:51:13,941][00255] Avg episode reward: [(0, '22.242')]
[2023-12-28 17:51:13,946][00795] Saving new best policy, reward=22.242!
[2023-12-28 17:51:18,938][00255] Fps is (10 sec: 3275.9, 60 sec: 2935.3, 300 sec: 3276.8). Total num frames: 3751936. Throughput: 0: 723.3. Samples: 939382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:51:18,941][00255] Avg episode reward: [(0, '21.866')]
[2023-12-28 17:51:19,025][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000917_3756032.pth...
[2023-12-28 17:51:19,261][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000723_2961408.pth
[2023-12-28 17:51:23,778][00808] Updated weights for policy 0, policy_version 920 (0.0033)
[2023-12-28 17:51:23,935][00255] Fps is (10 sec: 2867.8, 60 sec: 3003.7, 300 sec: 3276.8). Total num frames: 3768320. Throughput: 0: 729.4. Samples: 941200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:51:23,938][00255] Avg episode reward: [(0, '20.979')]
[2023-12-28 17:51:28,935][00255] Fps is (10 sec: 2868.0, 60 sec: 2867.2, 300 sec: 3276.8). Total num frames: 3780608. Throughput: 0: 746.5. Samples: 945296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:51:28,938][00255] Avg episode reward: [(0, '22.019')]
[2023-12-28 17:51:33,935][00255] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3290.7). Total num frames: 3801088. Throughput: 0: 788.8. Samples: 951658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:51:33,937][00255] Avg episode reward: [(0, '21.044')]
[2023-12-28 17:51:34,969][00808] Updated weights for policy 0, policy_version 930 (0.0025)
[2023-12-28 17:51:38,935][00255] Fps is (10 sec: 4096.0, 60 sec: 3140.4, 300 sec: 3290.7). Total num frames: 3821568. Throughput: 0: 796.0. Samples: 954714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:51:38,946][00255] Avg episode reward: [(0, '20.353')]
[2023-12-28 17:51:43,936][00255] Fps is (10 sec: 3276.4, 60 sec: 3140.2, 300 sec: 3276.8). Total num frames: 3833856. Throughput: 0: 793.7. Samples: 958932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:51:43,951][00255] Avg episode reward: [(0, '20.139')]
[2023-12-28 17:51:48,764][00808] Updated weights for policy 0, policy_version 940 (0.0024)
[2023-12-28 17:51:48,935][00255] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3290.7). Total num frames: 3850240. Throughput: 0: 799.3. Samples: 963012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-28 17:51:48,938][00255] Avg episode reward: [(0, '19.159')]
[2023-12-28 17:51:53,936][00255] Fps is (10 sec: 3686.6, 60 sec: 3208.6, 300 sec: 3304.6). Total num frames: 3870720. Throughput: 0: 822.9. Samples: 965988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:51:53,938][00255] Avg episode reward: [(0, '18.944')]
[2023-12-28 17:51:58,935][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3290.7). Total num frames: 3887104. Throughput: 0: 836.4. Samples: 972114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:51:58,938][00255] Avg episode reward: [(0, '19.720')]
[2023-12-28 17:51:59,265][00808] Updated weights for policy 0, policy_version 950 (0.0024)
[2023-12-28 17:52:03,939][00255] Fps is (10 sec: 2866.3, 60 sec: 3208.3, 300 sec: 3276.8). Total num frames: 3899392. Throughput: 0: 812.7. Samples: 975956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:52:03,942][00255] Avg episode reward: [(0, '19.356')]
[2023-12-28 17:52:08,935][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 3911680. Throughput: 0: 812.0. Samples: 977740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-28 17:52:08,946][00255] Avg episode reward: [(0, '19.955')]
[2023-12-28 17:52:13,321][00808] Updated weights for policy 0, policy_version 960 (0.0028)
[2023-12-28 17:52:13,938][00255] Fps is (10 sec: 3277.2, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 3932160. Throughput: 0: 834.0. Samples: 982830. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:52:13,941][00255] Avg episode reward: [(0, '20.062')]
[2023-12-28 17:52:18,938][00255] Fps is (10 sec: 4094.8, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 3952640. Throughput: 0: 837.1. Samples: 989328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:52:18,940][00255] Avg episode reward: [(0, '21.802')]
[2023-12-28 17:52:23,937][00255] Fps is (10 sec: 3686.7, 60 sec: 3345.0, 300 sec: 3276.8). Total num frames: 3969024. Throughput: 0: 822.9. Samples: 991748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-28 17:52:23,940][00255] Avg episode reward: [(0, '22.859')]
[2023-12-28 17:52:23,946][00795] Saving new best policy, reward=22.859!
[2023-12-28 17:52:24,698][00808] Updated weights for policy 0, policy_version 970 (0.0037)
[2023-12-28 17:52:28,936][00255] Fps is (10 sec: 2868.0, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3981312. Throughput: 0: 813.1. Samples: 995520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:52:28,944][00255] Avg episode reward: [(0, '22.055')]
[2023-12-28 17:52:33,935][00255] Fps is (10 sec: 2458.1, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3993600. Throughput: 0: 794.3. Samples: 998754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-28 17:52:33,940][00255] Avg episode reward: [(0, '21.657')]
[2023-12-28 17:52:38,932][00795] Stopping Batcher_0...
[2023-12-28 17:52:38,932][00795] Loop batcher_evt_loop terminating...
[2023-12-28 17:52:38,934][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-28 17:52:38,933][00255] Component Batcher_0 stopped!
[2023-12-28 17:52:38,999][00255] Component RolloutWorker_w4 stopped!
[2023-12-28 17:52:39,007][00813] Stopping RolloutWorker_w4...
[2023-12-28 17:52:39,007][00813] Loop rollout_proc4_evt_loop terminating...
[2023-12-28 17:52:39,014][00255] Component RolloutWorker_w2 stopped!
[2023-12-28 17:52:39,021][00811] Stopping RolloutWorker_w2...
[2023-12-28 17:52:39,022][00811] Loop rollout_proc2_evt_loop terminating...
[2023-12-28 17:52:39,028][00255] Component RolloutWorker_w6 stopped!
[2023-12-28 17:52:39,035][00815] Stopping RolloutWorker_w6...
[2023-12-28 17:52:39,036][00815] Loop rollout_proc6_evt_loop terminating...
[2023-12-28 17:52:39,052][00255] Component RolloutWorker_w0 stopped!
[2023-12-28 17:52:39,059][00810] Stopping RolloutWorker_w0...
[2023-12-28 17:52:39,060][00810] Loop rollout_proc0_evt_loop terminating...
[2023-12-28 17:52:39,088][00808] Weights refcount: 2 0
[2023-12-28 17:52:39,110][00808] Stopping InferenceWorker_p0-w0...
[2023-12-28 17:52:39,111][00808] Loop inference_proc0-0_evt_loop terminating...
[2023-12-28 17:52:39,111][00255] Component InferenceWorker_p0-w0 stopped!
[2023-12-28 17:52:39,137][00812] Stopping RolloutWorker_w3...
[2023-12-28 17:52:39,140][00814] Stopping RolloutWorker_w5...
[2023-12-28 17:52:39,137][00255] Component RolloutWorker_w3 stopped!
[2023-12-28 17:52:39,138][00812] Loop rollout_proc3_evt_loop terminating...
[2023-12-28 17:52:39,147][00255] Component RolloutWorker_w5 stopped!
[2023-12-28 17:52:39,141][00814] Loop rollout_proc5_evt_loop terminating...
[2023-12-28 17:52:39,161][00809] Stopping RolloutWorker_w1...
[2023-12-28 17:52:39,161][00809] Loop rollout_proc1_evt_loop terminating...
[2023-12-28 17:52:39,162][00255] Component RolloutWorker_w1 stopped!
[2023-12-28 17:52:39,168][00816] Stopping RolloutWorker_w7...
[2023-12-28 17:52:39,168][00255] Component RolloutWorker_w7 stopped!
[2023-12-28 17:52:39,169][00816] Loop rollout_proc7_evt_loop terminating...
[2023-12-28 17:52:39,213][00795] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000823_3371008.pth
[2023-12-28 17:52:39,240][00795] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-28 17:52:39,518][00255] Component LearnerWorker_p0 stopped!
[2023-12-28 17:52:39,520][00255] Waiting for process learner_proc0 to stop...
[2023-12-28 17:52:39,522][00795] Stopping LearnerWorker_p0...
[2023-12-28 17:52:39,527][00795] Loop learner_proc0_evt_loop terminating...
[2023-12-28 17:52:41,803][00255] Waiting for process inference_proc0-0 to join...
[2023-12-28 17:52:41,809][00255] Waiting for process rollout_proc0 to join...
[2023-12-28 17:52:45,004][00255] Waiting for process rollout_proc1 to join...
[2023-12-28 17:52:45,051][00255] Waiting for process rollout_proc2 to join...
[2023-12-28 17:52:45,053][00255] Waiting for process rollout_proc3 to join...
[2023-12-28 17:52:45,055][00255] Waiting for process rollout_proc4 to join...
[2023-12-28 17:52:45,056][00255] Waiting for process rollout_proc5 to join...
[2023-12-28 17:52:45,059][00255] Waiting for process rollout_proc6 to join...
[2023-12-28 17:52:45,061][00255] Waiting for process rollout_proc7 to join...
[2023-12-28 17:52:45,064][00255] Batcher 0 profile tree view:
batching: 28.0898, releasing_batches: 0.0312
[2023-12-28 17:52:45,067][00255] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0028
  wait_policy_total: 585.9297
update_model: 9.9226
  weight_update: 0.0030
one_step: 0.0126
  handle_policy_step: 602.7236
    deserialize: 16.5802, stack: 3.3224, obs_to_device_normalize: 121.3483, forward: 322.0175, send_messages: 29.0396
    prepare_outputs: 79.7466
      to_cpu: 45.1381
[2023-12-28 17:52:45,068][00255] Learner 0 profile tree view:
misc: 0.0058, prepare_batch: 14.7778
train: 76.8353
  epoch_init: 0.0077, minibatch_init: 0.0076, losses_postprocess: 0.5790, kl_divergence: 0.7600, after_optimizer: 34.5202
  calculate_losses: 27.8262
    losses_init: 0.0092, forward_head: 1.3088, bptt_initial: 18.9094, tail: 1.2852, advantages_returns: 0.2868, losses: 3.6225
    bptt: 2.0795
      bptt_forward_core: 1.9783
  update: 12.4267
    clip: 0.9647
[2023-12-28 17:52:45,069][00255] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4220, enqueue_policy_requests: 170.7749, env_step: 929.0903, overhead: 24.9393, complete_rollouts: 7.5572
save_policy_outputs: 21.9222
  split_output_tensors: 10.3722
[2023-12-28 17:52:45,070][00255] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4138, enqueue_policy_requests: 177.6018, env_step: 923.1664, overhead: 25.6922, complete_rollouts: 7.9335
save_policy_outputs: 22.3020
  split_output_tensors: 10.8770
[2023-12-28 17:52:45,074][00255] Loop Runner_EvtLoop terminating...
[2023-12-28 17:52:45,075][00255] Runner profile tree view:
main_loop: 1277.7687
[2023-12-28 17:52:45,078][00255] Collected {0: 4005888}, FPS: 3135.1
[2023-12-28 17:52:45,406][00255] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-12-28 17:52:45,408][00255] Overriding arg 'num_workers' with value 1 passed from command line
[2023-12-28 17:52:45,410][00255] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-12-28 17:52:45,414][00255] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-12-28 17:52:45,416][00255] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-12-28 17:52:45,417][00255] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-12-28 17:52:45,420][00255] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-12-28 17:52:45,421][00255] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-12-28 17:52:45,423][00255] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-12-28 17:52:45,426][00255] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-12-28 17:52:45,429][00255] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-12-28 17:52:45,430][00255] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-12-28 17:52:45,431][00255] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-12-28 17:52:45,432][00255] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-12-28 17:52:45,433][00255] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-12-28 17:52:45,493][00255] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-28 17:52:45,498][00255] RunningMeanStd input shape: (3, 72, 128)
[2023-12-28 17:52:45,502][00255] RunningMeanStd input shape: (1,)
[2023-12-28 17:52:45,527][00255] ConvEncoder: input_channels=3
[2023-12-28 17:52:45,701][00255] Conv encoder output size: 512
[2023-12-28 17:52:45,705][00255] Policy head output size: 512
[2023-12-28 17:52:46,080][00255] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-28 17:52:47,245][00255] Num frames 100...
[2023-12-28 17:52:47,452][00255] Num frames 200...
[2023-12-28 17:52:47,665][00255] Num frames 300...
[2023-12-28 17:52:47,854][00255] Num frames 400...
[2023-12-28 17:52:48,023][00255] Num frames 500...
[2023-12-28 17:52:48,222][00255] Avg episode rewards: #0: 9.760, true rewards: #0: 5.760
[2023-12-28 17:52:48,224][00255] Avg episode reward: 9.760, avg true_objective: 5.760
[2023-12-28 17:52:48,258][00255] Num frames 600...
[2023-12-28 17:52:48,399][00255] Num frames 700...
[2023-12-28 17:52:48,566][00255] Num frames 800...
[2023-12-28 17:52:48,730][00255] Num frames 900...
[2023-12-28 17:52:48,879][00255] Num frames 1000...
[2023-12-28 17:52:49,032][00255] Num frames 1100...
[2023-12-28 17:52:49,201][00255] Num frames 1200...
[2023-12-28 17:52:49,360][00255] Num frames 1300...
[2023-12-28 17:52:49,505][00255] Num frames 1400...
[2023-12-28 17:52:49,641][00255] Num frames 1500...
[2023-12-28 17:52:49,764][00255] Num frames 1600...
[2023-12-28 17:52:49,895][00255] Num frames 1700...
[2023-12-28 17:52:50,020][00255] Num frames 1800...
[2023-12-28 17:52:50,148][00255] Num frames 1900...
[2023-12-28 17:52:50,281][00255] Num frames 2000...
[2023-12-28 17:52:50,357][00255] Avg episode rewards: #0: 26.080, true rewards: #0: 10.080
[2023-12-28 17:52:50,359][00255] Avg episode reward: 26.080, avg true_objective: 10.080
[2023-12-28 17:52:50,467][00255] Num frames 2100...
[2023-12-28 17:52:50,601][00255] Num frames 2200...
[2023-12-28 17:52:50,734][00255] Num frames 2300...
[2023-12-28 17:52:50,863][00255] Num frames 2400...
[2023-12-28 17:52:50,996][00255] Num frames 2500...
[2023-12-28 17:52:51,125][00255] Num frames 2600...
[2023-12-28 17:52:51,254][00255] Num frames 2700...
[2023-12-28 17:52:51,390][00255] Num frames 2800...
[2023-12-28 17:52:51,521][00255] Num frames 2900...
[2023-12-28 17:52:51,651][00255] Num frames 3000...
[2023-12-28 17:52:51,790][00255] Num frames 3100...
[2023-12-28 17:52:51,917][00255] Num frames 3200...
[2023-12-28 17:52:52,052][00255] Num frames 3300...
[2023-12-28 17:52:52,185][00255] Num frames 3400...
[2023-12-28 17:52:52,313][00255] Num frames 3500...
[2023-12-28 17:52:52,421][00255] Avg episode rewards: #0: 30.123, true rewards: #0: 11.790
[2023-12-28 17:52:52,423][00255] Avg episode reward: 30.123, avg true_objective: 11.790
[2023-12-28 17:52:52,503][00255] Num frames 3600...
[2023-12-28 17:52:52,637][00255] Num frames 3700...
[2023-12-28 17:52:52,778][00255] Num frames 3800...
[2023-12-28 17:52:52,907][00255] Num frames 3900...
[2023-12-28 17:52:53,037][00255] Num frames 4000...
[2023-12-28 17:52:53,173][00255] Num frames 4100...
[2023-12-28 17:52:53,303][00255] Num frames 4200...
[2023-12-28 17:52:53,437][00255] Num frames 4300...
[2023-12-28 17:52:53,589][00255] Avg episode rewards: #0: 26.922, true rewards: #0: 10.922
[2023-12-28 17:52:53,591][00255] Avg episode reward: 26.922, avg true_objective: 10.922
[2023-12-28 17:52:53,633][00255] Num frames 4400...
[2023-12-28 17:52:53,772][00255] Num frames 4500...
[2023-12-28 17:52:53,903][00255] Num frames 4600...
[2023-12-28 17:52:54,032][00255] Num frames 4700...
[2023-12-28 17:52:54,162][00255] Num frames 4800...
[2023-12-28 17:52:54,289][00255] Num frames 4900...
[2023-12-28 17:52:54,423][00255] Num frames 5000...
[2023-12-28 17:52:54,556][00255] Num frames 5100...
[2023-12-28 17:52:54,705][00255] Avg episode rewards: #0: 24.350, true rewards: #0: 10.350
[2023-12-28 17:52:54,707][00255] Avg episode reward: 24.350, avg true_objective: 10.350
[2023-12-28 17:52:54,746][00255] Num frames 5200...
[2023-12-28 17:52:54,883][00255] Num frames 5300...
[2023-12-28 17:52:55,022][00255] Num frames 5400...
[2023-12-28 17:52:55,154][00255] Num frames 5500...
[2023-12-28 17:52:55,286][00255] Num frames 5600...
[2023-12-28 17:52:55,413][00255] Num frames 5700...
[2023-12-28 17:52:55,488][00255] Avg episode rewards: #0: 22.357, true rewards: #0: 9.523
[2023-12-28 17:52:55,489][00255] Avg episode reward: 22.357, avg true_objective: 9.523
[2023-12-28 17:52:55,604][00255] Num frames 5800...
[2023-12-28 17:52:55,729][00255] Num frames 5900...
[2023-12-28 17:52:55,865][00255] Num frames 6000...
[2023-12-28 17:52:55,998][00255] Num frames 6100...
[2023-12-28 17:52:56,126][00255] Num frames 6200...
[2023-12-28 17:52:56,255][00255] Num frames 6300...
[2023-12-28 17:52:56,383][00255] Num frames 6400...
[2023-12-28 17:52:56,509][00255] Num frames 6500...
[2023-12-28 17:52:56,643][00255] Num frames 6600...
[2023-12-28 17:52:56,753][00255] Avg episode rewards: #0: 21.917, true rewards: #0: 9.489
[2023-12-28 17:52:56,755][00255] Avg episode reward: 21.917, avg true_objective: 9.489
[2023-12-28 17:52:56,843][00255] Num frames 6700...
[2023-12-28 17:52:56,977][00255] Num frames 6800...
[2023-12-28 17:52:57,112][00255] Num frames 6900...
[2023-12-28 17:52:57,243][00255] Num frames 7000...
[2023-12-28 17:52:57,376][00255] Num frames 7100...
[2023-12-28 17:52:57,508][00255] Num frames 7200...
[2023-12-28 17:52:57,647][00255] Num frames 7300...
[2023-12-28 17:52:57,774][00255] Num frames 7400...
[2023-12-28 17:52:57,928][00255] Num frames 7500...
[2023-12-28 17:52:58,116][00255] Num frames 7600...
[2023-12-28 17:52:58,298][00255] Num frames 7700...
[2023-12-28 17:52:58,479][00255] Num frames 7800...
[2023-12-28 17:52:58,669][00255] Num frames 7900...
[2023-12-28 17:52:58,851][00255] Num frames 8000...
[2023-12-28 17:52:59,050][00255] Num frames 8100...
[2023-12-28 17:52:59,250][00255] Avg episode rewards: #0: 23.472, true rewards: #0: 10.222
[2023-12-28 17:52:59,253][00255] Avg episode reward: 23.472, avg true_objective: 10.222
[2023-12-28 17:52:59,296][00255] Num frames 8200...
[2023-12-28 17:52:59,476][00255] Num frames 8300...
[2023-12-28 17:52:59,649][00255] Num frames 8400...
[2023-12-28 17:52:59,838][00255] Num frames 8500...
[2023-12-28 17:53:00,057][00255] Num frames 8600...
[2023-12-28 17:53:00,200][00255] Avg episode rewards: #0: 21.938, true rewards: #0: 9.604
[2023-12-28 17:53:00,203][00255] Avg episode reward: 21.938, avg true_objective: 9.604
[2023-12-28 17:53:00,317][00255] Num frames 8700...
[2023-12-28 17:53:00,524][00255] Num frames 8800...
[2023-12-28 17:53:00,749][00255] Num frames 8900...
[2023-12-28 17:53:00,960][00255] Num frames 9000...
[2023-12-28 17:53:01,190][00255] Num frames 9100...
[2023-12-28 17:53:01,381][00255] Num frames 9200...
[2023-12-28 17:53:01,558][00255] Num frames 9300...
[2023-12-28 17:53:01,736][00255] Num frames 9400...
[2023-12-28 17:53:01,906][00255] Num frames 9500...
[2023-12-28 17:53:02,083][00255] Num frames 9600...
[2023-12-28 17:53:02,258][00255] Num frames 9700...
[2023-12-28 17:53:02,414][00255] Num frames 9800...
[2023-12-28 17:53:02,570][00255] Num frames 9900...
[2023-12-28 17:53:02,739][00255] Num frames 10000...
[2023-12-28 17:53:02,896][00255] Num frames 10100...
[2023-12-28 17:53:03,029][00255] Avg episode rewards: #0: 23.647, true rewards: #0: 10.147
[2023-12-28 17:53:03,031][00255] Avg episode reward: 23.647, avg true_objective: 10.147
[2023-12-28 17:54:08,960][00255] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-12-28 17:59:56,881][00255] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-12-28 17:59:56,883][00255] Overriding arg 'num_workers' with value 1 passed from command line
[2023-12-28 17:59:56,885][00255] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-12-28 17:59:56,887][00255] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-12-28 17:59:56,889][00255] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-12-28 17:59:56,891][00255] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-12-28 17:59:56,894][00255] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-12-28 17:59:56,896][00255] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-12-28 17:59:56,897][00255] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-12-28 17:59:56,898][00255] Adding new argument 'hf_repository'='andreatorch/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-12-28 17:59:56,900][00255] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-12-28 17:59:56,901][00255] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-12-28 17:59:56,902][00255] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-12-28 17:59:56,903][00255] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-12-28 17:59:56,905][00255] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-12-28 17:59:56,939][00255] RunningMeanStd input shape: (3, 72, 128)
[2023-12-28 17:59:56,940][00255] RunningMeanStd input shape: (1,)
[2023-12-28 17:59:56,955][00255] ConvEncoder: input_channels=3
[2023-12-28 17:59:56,992][00255] Conv encoder output size: 512
[2023-12-28 17:59:56,993][00255] Policy head output size: 512
[2023-12-28 17:59:57,013][00255] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-28 17:59:57,427][00255] Num frames 100...
[2023-12-28 17:59:57,564][00255] Num frames 200...
[2023-12-28 17:59:57,691][00255] Num frames 300...
[2023-12-28 17:59:57,817][00255] Num frames 400...
[2023-12-28 17:59:57,944][00255] Num frames 500...
[2023-12-28 17:59:58,076][00255] Num frames 600...
[2023-12-28 17:59:58,202][00255] Num frames 700...
[2023-12-28 17:59:58,337][00255] Num frames 800...
[2023-12-28 17:59:58,442][00255] Avg episode rewards: #0: 17.320, true rewards: #0: 8.320
[2023-12-28 17:59:58,444][00255] Avg episode reward: 17.320, avg true_objective: 8.320
[2023-12-28 17:59:58,543][00255] Num frames 900...
[2023-12-28 17:59:58,666][00255] Num frames 1000...
[2023-12-28 17:59:58,792][00255] Num frames 1100...
[2023-12-28 17:59:58,920][00255] Num frames 1200...
[2023-12-28 17:59:59,050][00255] Num frames 1300...
[2023-12-28 17:59:59,109][00255] Avg episode rewards: #0: 13.005, true rewards: #0: 6.505
[2023-12-28 17:59:59,110][00255] Avg episode reward: 13.005, avg true_objective: 6.505
[2023-12-28 17:59:59,238][00255] Num frames 1400...
[2023-12-28 17:59:59,367][00255] Num frames 1500...
[2023-12-28 17:59:59,498][00255] Num frames 1600...
[2023-12-28 17:59:59,629][00255] Num frames 1700...
[2023-12-28 17:59:59,753][00255] Num frames 1800...
[2023-12-28 17:59:59,880][00255] Num frames 1900...
[2023-12-28 18:00:00,006][00255] Num frames 2000...
[2023-12-28 18:00:00,134][00255] Num frames 2100...
[2023-12-28 18:00:00,268][00255] Num frames 2200...
[2023-12-28 18:00:00,397][00255] Num frames 2300...
[2023-12-28 18:00:00,533][00255] Num frames 2400...
[2023-12-28 18:00:00,663][00255] Num frames 2500...
[2023-12-28 18:00:00,790][00255] Num frames 2600...
[2023-12-28 18:00:00,921][00255] Num frames 2700...
[2023-12-28 18:00:01,055][00255] Num frames 2800...
[2023-12-28 18:00:01,185][00255] Num frames 2900...
[2023-12-28 18:00:01,314][00255] Num frames 3000...
[2023-12-28 18:00:01,452][00255] Num frames 3100...
[2023-12-28 18:00:01,596][00255] Num frames 3200...
[2023-12-28 18:00:01,728][00255] Num frames 3300...
[2023-12-28 18:00:01,854][00255] Num frames 3400...
[2023-12-28 18:00:01,913][00255] Avg episode rewards: #0: 27.003, true rewards: #0: 11.337
[2023-12-28 18:00:01,914][00255] Avg episode reward: 27.003, avg true_objective: 11.337
[2023-12-28 18:00:02,063][00255] Num frames 3500...
[2023-12-28 18:00:02,268][00255] Num frames 3600...
[2023-12-28 18:00:02,447][00255] Num frames 3700...
[2023-12-28 18:00:02,638][00255] Num frames 3800...
[2023-12-28 18:00:02,815][00255] Num frames 3900...
[2023-12-28 18:00:02,997][00255] Num frames 4000...
[2023-12-28 18:00:03,180][00255] Num frames 4100...
[2023-12-28 18:00:03,372][00255] Num frames 4200...
[2023-12-28 18:00:03,548][00255] Num frames 4300...
[2023-12-28 18:00:03,727][00255] Num frames 4400...
[2023-12-28 18:00:03,905][00255] Num frames 4500...
[2023-12-28 18:00:04,095][00255] Num frames 4600...
[2023-12-28 18:00:04,297][00255] Num frames 4700...
[2023-12-28 18:00:04,493][00255] Num frames 4800...
[2023-12-28 18:00:04,678][00255] Num frames 4900...
[2023-12-28 18:00:04,861][00255] Num frames 5000...
[2023-12-28 18:00:05,048][00255] Num frames 5100...
[2023-12-28 18:00:05,186][00255] Num frames 5200...
[2023-12-28 18:00:05,320][00255] Num frames 5300...
[2023-12-28 18:00:05,405][00255] Avg episode rewards: #0: 31.802, true rewards: #0: 13.303
[2023-12-28 18:00:05,406][00255] Avg episode reward: 31.802, avg true_objective: 13.303
[2023-12-28 18:00:05,508][00255] Num frames 5400...
[2023-12-28 18:00:05,640][00255] Num frames 5500...
[2023-12-28 18:00:05,771][00255] Num frames 5600...
[2023-12-28 18:00:05,897][00255] Num frames 5700...
[2023-12-28 18:00:06,029][00255] Num frames 5800...
[2023-12-28 18:00:06,159][00255] Num frames 5900...
[2023-12-28 18:00:06,293][00255] Num frames 6000...
[2023-12-28 18:00:06,465][00255] Avg episode rewards: #0: 27.978, true rewards: #0: 12.178
[2023-12-28 18:00:06,467][00255] Avg episode reward: 27.978, avg true_objective: 12.178
[2023-12-28 18:00:06,485][00255] Num frames 6100...
[2023-12-28 18:00:06,612][00255] Num frames 6200...
[2023-12-28 18:00:06,742][00255] Num frames 6300...
[2023-12-28 18:00:06,867][00255] Num frames 6400...
[2023-12-28 18:00:07,002][00255] Num frames 6500...
[2023-12-28 18:00:07,130][00255] Num frames 6600...
[2023-12-28 18:00:07,256][00255] Num frames 6700...
[2023-12-28 18:00:07,386][00255] Num frames 6800...
[2023-12-28 18:00:07,508][00255] Num frames 6900...
[2023-12-28 18:00:07,636][00255] Num frames 7000...
[2023-12-28 18:00:07,716][00255] Avg episode rewards: #0: 26.695, true rewards: #0: 11.695
[2023-12-28 18:00:07,717][00255] Avg episode reward: 26.695, avg true_objective: 11.695
[2023-12-28 18:00:07,828][00255] Num frames 7100...
[2023-12-28 18:00:07,953][00255] Num frames 7200...
[2023-12-28 18:00:08,089][00255] Num frames 7300...
[2023-12-28 18:00:08,220][00255] Num frames 7400...
[2023-12-28 18:00:08,355][00255] Num frames 7500...
[2023-12-28 18:00:08,497][00255] Avg episode rewards: #0: 24.668, true rewards: #0: 10.811
[2023-12-28 18:00:08,498][00255] Avg episode reward: 24.668, avg true_objective: 10.811
[2023-12-28 18:00:08,550][00255] Num frames 7600...
[2023-12-28 18:00:08,691][00255] Num frames 7700...
[2023-12-28 18:00:08,829][00255] Num frames 7800...
[2023-12-28 18:00:08,955][00255] Num frames 7900...
[2023-12-28 18:00:09,087][00255] Num frames 8000...
[2023-12-28 18:00:09,215][00255] Num frames 8100...
[2023-12-28 18:00:09,341][00255] Num frames 8200...
[2023-12-28 18:00:09,472][00255] Num frames 8300...
[2023-12-28 18:00:09,597][00255] Num frames 8400...
[2023-12-28 18:00:09,721][00255] Num frames 8500...
[2023-12-28 18:00:09,859][00255] Num frames 8600...
[2023-12-28 18:00:09,984][00255] Num frames 8700...
[2023-12-28 18:00:10,143][00255] Num frames 8800...
[2023-12-28 18:00:10,275][00255] Num frames 8900...
[2023-12-28 18:00:10,408][00255] Num frames 9000...
[2023-12-28 18:00:10,537][00255] Num frames 9100...
[2023-12-28 18:00:10,668][00255] Num frames 9200...
[2023-12-28 18:00:10,797][00255] Num frames 9300...
[2023-12-28 18:00:10,941][00255] Num frames 9400...
[2023-12-28 18:00:11,138][00255] Avg episode rewards: #0: 27.621, true rewards: #0: 11.871
[2023-12-28 18:00:11,140][00255] Avg episode reward: 27.621, avg true_objective: 11.871
[2023-12-28 18:00:11,150][00255] Num frames 9500...
[2023-12-28 18:00:11,281][00255] Num frames 9600...
[2023-12-28 18:00:11,413][00255] Num frames 9700...
[2023-12-28 18:00:11,542][00255] Num frames 9800...
[2023-12-28 18:00:11,674][00255] Num frames 9900...
[2023-12-28 18:00:11,799][00255] Num frames 10000...
[2023-12-28 18:00:11,945][00255] Num frames 10100...
[2023-12-28 18:00:12,089][00255] Num frames 10200...
[2023-12-28 18:00:12,220][00255] Num frames 10300...
[2023-12-28 18:00:12,354][00255] Num frames 10400...
[2023-12-28 18:00:12,482][00255] Num frames 10500...
[2023-12-28 18:00:12,618][00255] Num frames 10600...
[2023-12-28 18:00:12,746][00255] Num frames 10700...
[2023-12-28 18:00:12,879][00255] Num frames 10800...
[2023-12-28 18:00:13,010][00255] Num frames 10900...
[2023-12-28 18:00:13,142][00255] Num frames 11000...
[2023-12-28 18:00:13,270][00255] Num frames 11100...
[2023-12-28 18:00:13,405][00255] Num frames 11200...
[2023-12-28 18:00:13,540][00255] Num frames 11300...
[2023-12-28 18:00:13,668][00255] Num frames 11400...
[2023-12-28 18:00:13,799][00255] Num frames 11500...
[2023-12-28 18:00:13,994][00255] Avg episode rewards: #0: 30.330, true rewards: #0: 12.886
[2023-12-28 18:00:13,995][00255] Avg episode reward: 30.330, avg true_objective: 12.886
[2023-12-28 18:00:14,004][00255] Num frames 11600...
[2023-12-28 18:00:14,137][00255] Num frames 11700...
[2023-12-28 18:00:14,267][00255] Num frames 11800...
[2023-12-28 18:00:14,402][00255] Num frames 11900...
[2023-12-28 18:00:14,529][00255] Num frames 12000...
[2023-12-28 18:00:14,659][00255] Num frames 12100...
[2023-12-28 18:00:14,787][00255] Num frames 12200...
[2023-12-28 18:00:14,890][00255] Avg episode rewards: #0: 28.537, true rewards: #0: 12.237
[2023-12-28 18:00:14,893][00255] Avg episode reward: 28.537, avg true_objective: 12.237
[2023-12-28 18:01:31,820][00255] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-12-28 18:02:48,857][00255] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-12-28 18:02:48,859][00255] Overriding arg 'num_workers' with value 1 passed from command line
[2023-12-28 18:02:48,861][00255] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-12-28 18:02:48,863][00255] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-12-28 18:02:48,865][00255] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-12-28 18:02:48,867][00255] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-12-28 18:02:48,869][00255] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-12-28 18:02:48,872][00255] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-12-28 18:02:48,873][00255] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-12-28 18:02:48,874][00255] Adding new argument 'hf_repository'='andreatorch/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-12-28 18:02:48,876][00255] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-12-28 18:02:48,877][00255] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-12-28 18:02:48,880][00255] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-12-28 18:02:48,881][00255] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-12-28 18:02:48,882][00255] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-12-28 18:02:48,952][00255] RunningMeanStd input shape: (3, 72, 128)
[2023-12-28 18:02:48,956][00255] RunningMeanStd input shape: (1,)
[2023-12-28 18:02:48,979][00255] ConvEncoder: input_channels=3
[2023-12-28 18:02:49,053][00255] Conv encoder output size: 512
[2023-12-28 18:02:49,055][00255] Policy head output size: 512
[2023-12-28 18:02:49,090][00255] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-28 18:02:49,725][00255] Num frames 100...
[2023-12-28 18:02:49,911][00255] Num frames 200...
[2023-12-28 18:02:50,103][00255] Num frames 300...
[2023-12-28 18:02:50,294][00255] Num frames 400...
[2023-12-28 18:02:50,477][00255] Num frames 500...
[2023-12-28 18:02:50,643][00255] Num frames 600...
[2023-12-28 18:02:50,775][00255] Num frames 700...
[2023-12-28 18:02:50,913][00255] Num frames 800...
[2023-12-28 18:02:51,048][00255] Num frames 900...
[2023-12-28 18:02:51,194][00255] Num frames 1000...
[2023-12-28 18:02:51,333][00255] Num frames 1100...
[2023-12-28 18:02:51,465][00255] Num frames 1200...
[2023-12-28 18:02:51,594][00255] Num frames 1300...
[2023-12-28 18:02:51,722][00255] Num frames 1400...
[2023-12-28 18:02:51,850][00255] Num frames 1500...
[2023-12-28 18:02:51,979][00255] Num frames 1600...
[2023-12-28 18:02:52,113][00255] Num frames 1700...
[2023-12-28 18:02:52,258][00255] Num frames 1800...
[2023-12-28 18:02:52,392][00255] Num frames 1900...
[2023-12-28 18:02:52,521][00255] Num frames 2000...
[2023-12-28 18:02:52,654][00255] Num frames 2100...
[2023-12-28 18:02:52,707][00255] Avg episode rewards: #0: 50.999, true rewards: #0: 21.000
[2023-12-28 18:02:52,709][00255] Avg episode reward: 50.999, avg true_objective: 21.000
[2023-12-28 18:02:52,834][00255] Num frames 2200...
[2023-12-28 18:02:52,967][00255] Num frames 2300...
[2023-12-28 18:02:53,103][00255] Num frames 2400...
[2023-12-28 18:02:53,240][00255] Num frames 2500...
[2023-12-28 18:02:53,376][00255] Num frames 2600...
[2023-12-28 18:02:53,500][00255] Num frames 2700...
[2023-12-28 18:02:53,634][00255] Num frames 2800...
[2023-12-28 18:02:53,756][00255] Num frames 2900...
[2023-12-28 18:02:53,882][00255] Num frames 3000...
[2023-12-28 18:02:54,017][00255] Num frames 3100...
[2023-12-28 18:02:54,134][00255] Avg episode rewards: #0: 38.225, true rewards: #0: 15.725
[2023-12-28 18:02:54,136][00255] Avg episode reward: 38.225, avg true_objective: 15.725
[2023-12-28 18:02:54,211][00255] Num frames 3200...
[2023-12-28 18:02:54,357][00255] Num frames 3300...
[2023-12-28 18:02:54,484][00255] Num frames 3400...
[2023-12-28 18:02:54,615][00255] Num frames 3500...
[2023-12-28 18:02:54,741][00255] Num frames 3600...
[2023-12-28 18:02:54,868][00255] Num frames 3700...
[2023-12-28 18:02:54,997][00255] Num frames 3800...
[2023-12-28 18:02:55,128][00255] Num frames 3900...
[2023-12-28 18:02:55,266][00255] Num frames 4000...
[2023-12-28 18:02:55,397][00255] Num frames 4100...
[2023-12-28 18:02:55,473][00255] Avg episode rewards: #0: 32.050, true rewards: #0: 13.717
[2023-12-28 18:02:55,475][00255] Avg episode reward: 32.050, avg true_objective: 13.717
[2023-12-28 18:02:55,588][00255] Num frames 4200...
[2023-12-28 18:02:55,717][00255] Num frames 4300...
[2023-12-28 18:02:55,845][00255] Num frames 4400...
[2023-12-28 18:02:55,974][00255] Num frames 4500...
[2023-12-28 18:02:56,106][00255] Num frames 4600...
[2023-12-28 18:02:56,234][00255] Num frames 4700...
[2023-12-28 18:02:56,367][00255] Avg episode rewards: #0: 27.137, true rewards: #0: 11.888
[2023-12-28 18:02:56,369][00255] Avg episode reward: 27.137, avg true_objective: 11.888
[2023-12-28 18:02:56,427][00255] Num frames 4800...
[2023-12-28 18:02:56,554][00255] Num frames 4900...
[2023-12-28 18:02:56,680][00255] Num frames 5000...
[2023-12-28 18:02:56,807][00255] Num frames 5100...
[2023-12-28 18:02:56,936][00255] Num frames 5200...
[2023-12-28 18:02:56,998][00255] Avg episode rewards: #0: 22.806, true rewards: #0: 10.406
[2023-12-28 18:02:56,999][00255] Avg episode reward: 22.806, avg true_objective: 10.406
[2023-12-28 18:02:57,127][00255] Num frames 5300...
[2023-12-28 18:02:57,259][00255] Num frames 5400...
[2023-12-28 18:02:57,400][00255] Num frames 5500...
[2023-12-28 18:02:57,532][00255] Num frames 5600...
[2023-12-28 18:02:57,660][00255] Num frames 5700...
[2023-12-28 18:02:57,793][00255] Num frames 5800...
[2023-12-28 18:02:57,929][00255] Num frames 5900...
[2023-12-28 18:02:58,060][00255] Num frames 6000...
[2023-12-28 18:02:58,188][00255] Num frames 6100...
[2023-12-28 18:02:58,327][00255] Num frames 6200...
[2023-12-28 18:02:58,458][00255] Num frames 6300...
[2023-12-28 18:02:58,590][00255] Num frames 6400...
[2023-12-28 18:02:58,716][00255] Num frames 6500...
[2023-12-28 18:02:58,847][00255] Num frames 6600...
[2023-12-28 18:02:58,978][00255] Num frames 6700...
[2023-12-28 18:02:59,109][00255] Num frames 6800...
[2023-12-28 18:02:59,239][00255] Num frames 6900...
[2023-12-28 18:02:59,375][00255] Num frames 7000...
[2023-12-28 18:02:59,509][00255] Num frames 7100...
[2023-12-28 18:02:59,644][00255] Num frames 7200...
[2023-12-28 18:02:59,774][00255] Num frames 7300...
[2023-12-28 18:02:59,835][00255] Avg episode rewards: #0: 28.171, true rewards: #0: 12.172
[2023-12-28 18:02:59,836][00255] Avg episode reward: 28.171, avg true_objective: 12.172
[2023-12-28 18:02:59,976][00255] Num frames 7400...
[2023-12-28 18:03:00,111][00255] Num frames 7500...
[2023-12-28 18:03:00,240][00255] Num frames 7600...
[2023-12-28 18:03:00,326][00255] Avg episode rewards: #0: 25.033, true rewards: #0: 10.890
[2023-12-28 18:03:00,328][00255] Avg episode reward: 25.033, avg true_objective: 10.890
[2023-12-28 18:03:00,438][00255] Num frames 7700...
[2023-12-28 18:03:00,565][00255] Num frames 7800...
[2023-12-28 18:03:00,743][00255] Num frames 7900...
[2023-12-28 18:03:00,935][00255] Num frames 8000...
[2023-12-28 18:03:01,124][00255] Num frames 8100...
[2023-12-28 18:03:01,308][00255] Num frames 8200...
[2023-12-28 18:03:01,505][00255] Num frames 8300...
[2023-12-28 18:03:01,693][00255] Num frames 8400...
[2023-12-28 18:03:01,885][00255] Num frames 8500...
[2023-12-28 18:03:01,970][00255] Avg episode rewards: #0: 24.515, true rewards: #0: 10.640
[2023-12-28 18:03:01,973][00255] Avg episode reward: 24.515, avg true_objective: 10.640
[2023-12-28 18:03:02,166][00255] Num frames 8600...
[2023-12-28 18:03:02,353][00255] Num frames 8700...
[2023-12-28 18:03:02,545][00255] Num frames 8800...
[2023-12-28 18:03:02,732][00255] Num frames 8900...
[2023-12-28 18:03:02,927][00255] Num frames 9000...
[2023-12-28 18:03:03,127][00255] Num frames 9100...
[2023-12-28 18:03:03,317][00255] Num frames 9200...
[2023-12-28 18:03:03,510][00255] Num frames 9300...
[2023-12-28 18:03:03,690][00255] Num frames 9400...
[2023-12-28 18:03:03,816][00255] Num frames 9500...
[2023-12-28 18:03:03,953][00255] Num frames 9600...
[2023-12-28 18:03:04,006][00255] Avg episode rewards: #0: 24.222, true rewards: #0: 10.667
[2023-12-28 18:03:04,007][00255] Avg episode reward: 24.222, avg true_objective: 10.667
[2023-12-28 18:03:04,145][00255] Num frames 9700...
[2023-12-28 18:03:04,279][00255] Num frames 9800...
[2023-12-28 18:03:04,408][00255] Num frames 9900...
[2023-12-28 18:03:04,550][00255] Num frames 10000...
[2023-12-28 18:03:04,687][00255] Num frames 10100...
[2023-12-28 18:03:04,826][00255] Num frames 10200...
[2023-12-28 18:03:04,955][00255] Num frames 10300...
[2023-12-28 18:03:05,088][00255] Num frames 10400...
[2023-12-28 18:03:05,226][00255] Num frames 10500...
[2023-12-28 18:03:05,360][00255] Num frames 10600...
[2023-12-28 18:03:05,493][00255] Num frames 10700...
[2023-12-28 18:03:05,627][00255] Num frames 10800...
[2023-12-28 18:03:05,757][00255] Num frames 10900...
[2023-12-28 18:03:05,885][00255] Num frames 11000...
[2023-12-28 18:03:06,017][00255] Num frames 11100...
[2023-12-28 18:03:06,163][00255] Avg episode rewards: #0: 25.468, true rewards: #0: 11.168
[2023-12-28 18:03:06,165][00255] Avg episode reward: 25.468, avg true_objective: 11.168
[2023-12-28 18:04:15,463][00255] Replay video saved to /content/train_dir/default_experiment/replay.mp4!