[2024-01-05 12:41:58,383][00209] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-01-05 12:41:58,388][00209] Rollout worker 0 uses device cpu
[2024-01-05 12:41:58,389][00209] Rollout worker 1 uses device cpu
[2024-01-05 12:41:58,390][00209] Rollout worker 2 uses device cpu
[2024-01-05 12:41:58,396][00209] Rollout worker 3 uses device cpu
[2024-01-05 12:41:58,398][00209] Rollout worker 4 uses device cpu
[2024-01-05 12:41:58,399][00209] Rollout worker 5 uses device cpu
[2024-01-05 12:41:58,403][00209] Rollout worker 6 uses device cpu
[2024-01-05 12:41:58,404][00209] Rollout worker 7 uses device cpu
[2024-01-05 12:41:58,581][00209] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 12:41:58,583][00209] InferenceWorker_p0-w0: min num requests: 2
[2024-01-05 12:41:58,618][00209] Starting all processes...
[2024-01-05 12:41:58,620][00209] Starting process learner_proc0
[2024-01-05 12:41:58,670][00209] Starting all processes...
[2024-01-05 12:41:58,677][00209] Starting process inference_proc0-0
[2024-01-05 12:41:58,678][00209] Starting process rollout_proc0
[2024-01-05 12:41:58,679][00209] Starting process rollout_proc1
[2024-01-05 12:41:58,679][00209] Starting process rollout_proc2
[2024-01-05 12:41:58,679][00209] Starting process rollout_proc3
[2024-01-05 12:41:58,680][00209] Starting process rollout_proc4
[2024-01-05 12:41:58,680][00209] Starting process rollout_proc5
[2024-01-05 12:41:58,680][00209] Starting process rollout_proc6
[2024-01-05 12:41:58,680][00209] Starting process rollout_proc7
[2024-01-05 12:42:15,173][02317] Worker 3 uses CPU cores [1]
[2024-01-05 12:42:15,252][02312] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 12:42:15,260][02312] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-01-05 12:42:15,353][02312] Num visible devices: 1
[2024-01-05 12:42:15,380][02316] Worker 5 uses CPU cores [1]
[2024-01-05 12:42:15,405][02315] Worker 2 uses CPU cores [0]
[2024-01-05 12:42:15,451][02313] Worker 0 uses CPU cores [0]
[2024-01-05 12:42:15,472][02299] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 12:42:15,473][02299] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-01-05 12:42:15,510][02318] Worker 6 uses CPU cores [0]
[2024-01-05 12:42:15,520][02299] Num visible devices: 1
[2024-01-05 12:42:15,536][02299] Starting seed is not provided
[2024-01-05 12:42:15,537][02299] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 12:42:15,537][02299] Initializing actor-critic model on device cuda:0
[2024-01-05 12:42:15,537][02299] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 12:42:15,540][02299] RunningMeanStd input shape: (1,)
[2024-01-05 12:42:15,564][02314] Worker 1 uses CPU cores [1]
[2024-01-05 12:42:15,565][02319] Worker 4 uses CPU cores [0]
[2024-01-05 12:42:15,575][02299] ConvEncoder: input_channels=3
[2024-01-05 12:42:15,586][02322] Worker 7 uses CPU cores [1]
[2024-01-05 12:42:15,815][02299] Conv encoder output size: 512
[2024-01-05 12:42:15,815][02299] Policy head output size: 512
[2024-01-05 12:42:15,867][02299] Created Actor Critic model with architecture:
[2024-01-05 12:42:15,867][02299] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-01-05 12:42:16,290][02299] Using optimizer
[2024-01-05 12:42:17,567][02299] No checkpoints found
[2024-01-05 12:42:17,568][02299] Did not load from checkpoint, starting from scratch!
[2024-01-05 12:42:17,568][02299] Initialized policy 0 weights for model version 0
[2024-01-05 12:42:17,572][02299] LearnerWorker_p0 finished initialization!
[2024-01-05 12:42:17,573][02299] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 12:42:17,770][02312] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 12:42:17,773][02312] RunningMeanStd input shape: (1,)
[2024-01-05 12:42:17,789][02312] ConvEncoder: input_channels=3
[2024-01-05 12:42:17,891][02312] Conv encoder output size: 512
[2024-01-05 12:42:17,891][02312] Policy head output size: 512
[2024-01-05 12:42:17,959][00209] Inference worker 0-0 is ready!
[2024-01-05 12:42:17,961][00209] All inference workers are ready! Signal rollout workers to start!
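The "Conv encoder output size: 512" line above can be reproduced with a little shape arithmetic. This is a sketch, not the run's actual configuration: the log does not print the conv filter sizes, so the 32x8s4 / 64x4s2 / 128x3s2 layout below is an assumed, commonly used default; only the (3, 72, 128) input and the 512 output come from the log.

```python
# Sketch: derive the conv_head output shape for the (3, 72, 128) observation
# reported in the log. The filter spec is an ASSUMPTION (a typical three-layer
# layout: 32 filters 8x8 stride 4, 64 filters 4x4 stride 2, 128 filters 3x3
# stride 2); the final Linear in mlp_layers then maps the flattened
# activations to the logged encoder output size of 512.

def conv2d_out(size: int, kernel: int, stride: int) -> int:
    # Conv2d output size with padding=0, dilation=1
    return (size - kernel) // stride + 1

h, w, channels = 72, 128, 3
for out_channels, kernel, stride in [(32, 8, 4), (64, 4, 2), (128, 3, 2)]:
    h, w = conv2d_out(h, kernel, stride), conv2d_out(w, kernel, stride)
    channels = out_channels

flat = channels * h * w  # flattened conv_head output feeding mlp_layers
print(channels, h, w, flat)  # -> 128 3 6 2304; Linear(2304, 512) gives 512
```

Under these assumed filters the flattened size is 2304, consistent with a single `Linear -> ELU` in `mlp_layers` producing the 512-dimensional encoder output that the GRU core consumes.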
[2024-01-05 12:42:18,171][02318] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,173][02319] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,170][02315] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,176][02313] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,179][02316] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,188][02317] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,187][02314] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,179][02322] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 12:42:18,574][00209] Heartbeat connected on Batcher_0
[2024-01-05 12:42:18,577][00209] Heartbeat connected on LearnerWorker_p0
[2024-01-05 12:42:18,609][00209] Heartbeat connected on InferenceWorker_p0-w0
[2024-01-05 12:42:18,852][02316] Decorrelating experience for 0 frames...
[2024-01-05 12:42:19,243][02316] Decorrelating experience for 32 frames...
[2024-01-05 12:42:19,432][02318] Decorrelating experience for 0 frames...
[2024-01-05 12:42:19,434][02313] Decorrelating experience for 0 frames...
[2024-01-05 12:42:19,436][02315] Decorrelating experience for 0 frames...
[2024-01-05 12:42:20,392][02318] Decorrelating experience for 32 frames...
[2024-01-05 12:42:20,394][02313] Decorrelating experience for 32 frames...
[2024-01-05 12:42:20,405][02315] Decorrelating experience for 32 frames...
[2024-01-05 12:42:20,411][02319] Decorrelating experience for 0 frames...
[2024-01-05 12:42:20,797][02316] Decorrelating experience for 64 frames...
[2024-01-05 12:42:21,637][02317] Decorrelating experience for 0 frames...
[2024-01-05 12:42:21,660][02319] Decorrelating experience for 32 frames...
[2024-01-05 12:42:21,672][02316] Decorrelating experience for 96 frames...
[2024-01-05 12:42:21,830][00209] Heartbeat connected on RolloutWorker_w5
[2024-01-05 12:42:22,002][02318] Decorrelating experience for 64 frames...
[2024-01-05 12:42:22,009][02315] Decorrelating experience for 64 frames...
[2024-01-05 12:42:22,361][02317] Decorrelating experience for 32 frames...
[2024-01-05 12:42:22,552][00209] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-01-05 12:42:23,011][02322] Decorrelating experience for 0 frames...
[2024-01-05 12:42:23,104][02313] Decorrelating experience for 64 frames...
[2024-01-05 12:42:23,108][02319] Decorrelating experience for 64 frames...
[2024-01-05 12:42:23,427][02322] Decorrelating experience for 32 frames...
[2024-01-05 12:42:25,215][02315] Decorrelating experience for 96 frames...
[2024-01-05 12:42:25,462][02313] Decorrelating experience for 96 frames...
[2024-01-05 12:42:25,464][02319] Decorrelating experience for 96 frames...
[2024-01-05 12:42:25,679][02314] Decorrelating experience for 0 frames...
[2024-01-05 12:42:26,029][00209] Heartbeat connected on RolloutWorker_w2
[2024-01-05 12:42:26,251][02322] Decorrelating experience for 64 frames...
[2024-01-05 12:42:26,519][00209] Heartbeat connected on RolloutWorker_w4
[2024-01-05 12:42:26,526][00209] Heartbeat connected on RolloutWorker_w0
[2024-01-05 12:42:27,556][00209] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 136.7. Samples: 684. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-01-05 12:42:27,561][00209] Avg episode reward: [(0, '4.184')]
[2024-01-05 12:42:28,064][02317] Decorrelating experience for 64 frames...
[2024-01-05 12:42:28,081][02314] Decorrelating experience for 32 frames...
[2024-01-05 12:42:28,961][02318] Decorrelating experience for 96 frames...
[2024-01-05 12:42:30,310][00209] Heartbeat connected on RolloutWorker_w6
[2024-01-05 12:42:32,552][00209] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 158.8. Samples: 1588. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-01-05 12:42:32,556][00209] Avg episode reward: [(0, '3.341')]
[2024-01-05 12:42:32,856][02322] Decorrelating experience for 96 frames...
[2024-01-05 12:42:33,127][02299] Signal inference workers to stop experience collection...
[2024-01-05 12:42:33,145][02312] InferenceWorker_p0-w0: stopping experience collection
[2024-01-05 12:42:33,141][02317] Decorrelating experience for 96 frames...
[2024-01-05 12:42:33,254][00209] Heartbeat connected on RolloutWorker_w7
[2024-01-05 12:42:33,346][00209] Heartbeat connected on RolloutWorker_w3
[2024-01-05 12:42:33,374][02314] Decorrelating experience for 64 frames...
[2024-01-05 12:42:33,750][02314] Decorrelating experience for 96 frames...
[2024-01-05 12:42:33,823][00209] Heartbeat connected on RolloutWorker_w1
[2024-01-05 12:42:34,534][02299] Signal inference workers to resume experience collection...
[2024-01-05 12:42:34,536][02312] InferenceWorker_p0-w0: resuming experience collection
[2024-01-05 12:42:37,552][00209] Fps is (10 sec: 1639.0, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 16384. Throughput: 0: 233.1. Samples: 3496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-01-05 12:42:37,555][00209] Avg episode reward: [(0, '3.509')]
[2024-01-05 12:42:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 467.1. Samples: 9342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 12:42:42,554][00209] Avg episode reward: [(0, '3.884')]
[2024-01-05 12:42:44,197][02312] Updated weights for policy 0, policy_version 10 (0.0017)
[2024-01-05 12:42:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 463.4. Samples: 11586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-01-05 12:42:47,557][00209] Avg episode reward: [(0, '4.250')]
[2024-01-05 12:42:52,552][00209] Fps is (10 sec: 2867.2, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 523.3. Samples: 15700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:42:52,554][00209] Avg episode reward: [(0, '4.501')]
[2024-01-05 12:42:56,917][02312] Updated weights for policy 0, policy_version 20 (0.0014)
[2024-01-05 12:42:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 600.8. Samples: 21028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-01-05 12:42:57,554][00209] Avg episode reward: [(0, '4.432')]
[2024-01-05 12:43:02,552][00209] Fps is (10 sec: 4096.0, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 607.0. Samples: 24280. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:43:02,554][00209] Avg episode reward: [(0, '4.374')]
[2024-01-05 12:43:02,577][02299] Saving new best policy, reward=4.374!
[2024-01-05 12:43:07,552][00209] Fps is (10 sec: 3686.4, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 656.1. Samples: 29524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 12:43:07,554][00209] Avg episode reward: [(0, '4.409')]
[2024-01-05 12:43:07,557][02299] Saving new best policy, reward=4.409!
[2024-01-05 12:43:08,502][02312] Updated weights for policy 0, policy_version 30 (0.0031)
[2024-01-05 12:43:12,553][00209] Fps is (10 sec: 2866.7, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 131072. Throughput: 0: 731.4. Samples: 33594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 12:43:12,558][00209] Avg episode reward: [(0, '4.253')]
[2024-01-05 12:43:17,555][00209] Fps is (10 sec: 3275.8, 60 sec: 2755.3, 300 sec: 2755.3). Total num frames: 151552. Throughput: 0: 771.5. Samples: 36306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:43:17,564][00209] Avg episode reward: [(0, '4.156')]
[2024-01-05 12:43:19,670][02312] Updated weights for policy 0, policy_version 40 (0.0013)
[2024-01-05 12:43:22,552][00209] Fps is (10 sec: 4506.3, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 176128. Throughput: 0: 875.1. Samples: 42874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:43:22,557][00209] Avg episode reward: [(0, '4.358')]
[2024-01-05 12:43:27,552][00209] Fps is (10 sec: 3687.6, 60 sec: 3140.5, 300 sec: 2898.7). Total num frames: 188416. Throughput: 0: 860.1. Samples: 48046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 12:43:27,555][00209] Avg episode reward: [(0, '4.369')]
[2024-01-05 12:43:32,164][02312] Updated weights for policy 0, policy_version 50 (0.0026)
[2024-01-05 12:43:32,552][00209] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 856.0. Samples: 50106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:43:32,557][00209] Avg episode reward: [(0, '4.388')]
[2024-01-05 12:43:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3003.7). Total num frames: 225280. Throughput: 0: 877.8. Samples: 55200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:43:37,559][00209] Avg episode reward: [(0, '4.313')]
[2024-01-05 12:43:42,356][02312] Updated weights for policy 0, policy_version 60 (0.0014)
[2024-01-05 12:43:42,552][00209] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 905.7. Samples: 61784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:43:42,554][00209] Avg episode reward: [(0, '4.232')]
[2024-01-05 12:43:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 894.6. Samples: 64536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:43:47,555][00209] Avg episode reward: [(0, '4.329')]
[2024-01-05 12:43:52,553][00209] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3049.2). Total num frames: 274432. Throughput: 0: 871.4. Samples: 68736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:43:52,560][00209] Avg episode reward: [(0, '4.465')]
[2024-01-05 12:43:52,570][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth...
[2024-01-05 12:43:52,772][02299] Saving new best policy, reward=4.465!
[2024-01-05 12:43:55,346][02312] Updated weights for policy 0, policy_version 70 (0.0016)
[2024-01-05 12:43:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3104.3). Total num frames: 294912. Throughput: 0: 899.3. Samples: 74062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 12:43:57,554][00209] Avg episode reward: [(0, '4.550')]
[2024-01-05 12:43:57,560][02299] Saving new best policy, reward=4.550!
[2024-01-05 12:44:02,552][00209] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3153.9). Total num frames: 315392. Throughput: 0: 909.0. Samples: 77210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:44:02,555][00209] Avg episode reward: [(0, '4.459')]
[2024-01-05 12:44:05,727][02312] Updated weights for policy 0, policy_version 80 (0.0019)
[2024-01-05 12:44:07,553][00209] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3159.7). Total num frames: 331776. Throughput: 0: 883.2. Samples: 82620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 12:44:07,556][00209] Avg episode reward: [(0, '4.396')]
[2024-01-05 12:44:12,554][00209] Fps is (10 sec: 2866.7, 60 sec: 3549.8, 300 sec: 3127.8). Total num frames: 344064. Throughput: 0: 861.2. Samples: 86800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:44:12,559][00209] Avg episode reward: [(0, '4.381')]
[2024-01-05 12:44:17,552][00209] Fps is (10 sec: 3277.3, 60 sec: 3550.1, 300 sec: 3170.0). Total num frames: 364544. Throughput: 0: 868.7. Samples: 89198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:44:17,557][00209] Avg episode reward: [(0, '4.423')]
[2024-01-05 12:44:18,235][02312] Updated weights for policy 0, policy_version 90 (0.0021)
[2024-01-05 12:44:22,552][00209] Fps is (10 sec: 4096.7, 60 sec: 3481.6, 300 sec: 3208.5). Total num frames: 385024. Throughput: 0: 900.4. Samples: 95720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 12:44:22,555][00209] Avg episode reward: [(0, '4.428')]
[2024-01-05 12:44:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3211.3). Total num frames: 401408. Throughput: 0: 873.2. Samples: 101080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 12:44:27,557][00209] Avg episode reward: [(0, '4.677')]
[2024-01-05 12:44:27,561][02299] Saving new best policy, reward=4.677!
[2024-01-05 12:44:29,683][02312] Updated weights for policy 0, policy_version 100 (0.0013)
[2024-01-05 12:44:32,552][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3182.3). Total num frames: 413696. Throughput: 0: 856.9. Samples: 103096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:44:32,556][00209] Avg episode reward: [(0, '4.803')]
[2024-01-05 12:44:32,572][02299] Saving new best policy, reward=4.803!
[2024-01-05 12:44:37,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3216.1). Total num frames: 434176. Throughput: 0: 865.4. Samples: 107678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:44:37,554][00209] Avg episode reward: [(0, '4.762')]
[2024-01-05 12:44:41,095][02312] Updated weights for policy 0, policy_version 110 (0.0028)
[2024-01-05 12:44:42,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3247.5). Total num frames: 454656. Throughput: 0: 890.3. Samples: 114124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:44:42,554][00209] Avg episode reward: [(0, '4.584')]
[2024-01-05 12:44:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3248.6). Total num frames: 471040. Throughput: 0: 888.0. Samples: 117168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:44:47,554][00209] Avg episode reward: [(0, '4.578')]
[2024-01-05 12:44:52,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3222.2). Total num frames: 483328. Throughput: 0: 858.0. Samples: 121230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 12:44:52,561][00209] Avg episode reward: [(0, '4.548')]
[2024-01-05 12:44:54,122][02312] Updated weights for policy 0, policy_version 120 (0.0031)
[2024-01-05 12:44:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3250.4). Total num frames: 503808. Throughput: 0: 876.9. Samples: 126258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:44:57,554][00209] Avg episode reward: [(0, '4.611')]
[2024-01-05 12:45:02,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 524288. Throughput: 0: 895.2. Samples: 129482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:45:02,558][00209] Avg episode reward: [(0, '4.682')]
[2024-01-05 12:45:04,074][02312] Updated weights for policy 0, policy_version 130 (0.0020)
[2024-01-05 12:45:07,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3276.8). Total num frames: 540672. Throughput: 0: 877.9. Samples: 135224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:45:07,557][00209] Avg episode reward: [(0, '4.574')]
[2024-01-05 12:45:12,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3252.7). Total num frames: 552960. Throughput: 0: 850.4. Samples: 139346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:45:12,554][00209] Avg episode reward: [(0, '4.606')]
[2024-01-05 12:45:17,350][02312] Updated weights for policy 0, policy_version 140 (0.0015)
[2024-01-05 12:45:17,555][00209] Fps is (10 sec: 3275.9, 60 sec: 3481.4, 300 sec: 3276.7). Total num frames: 573440. Throughput: 0: 852.2. Samples: 141446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:45:17,561][00209] Avg episode reward: [(0, '4.524')]
[2024-01-05 12:45:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3299.6). Total num frames: 593920. Throughput: 0: 893.3. Samples: 147876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:45:22,561][00209] Avg episode reward: [(0, '4.383')]
[2024-01-05 12:45:27,552][00209] Fps is (10 sec: 3687.4, 60 sec: 3481.6, 300 sec: 3298.9). Total num frames: 610304. Throughput: 0: 879.4. Samples: 153698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:45:27,554][00209] Avg episode reward: [(0, '4.429')]
[2024-01-05 12:45:27,595][02312] Updated weights for policy 0, policy_version 150 (0.0023)
[2024-01-05 12:45:32,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3298.4). Total num frames: 626688. Throughput: 0: 858.0. Samples: 155780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:45:32,559][00209] Avg episode reward: [(0, '4.360')]
[2024-01-05 12:45:37,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3297.8). Total num frames: 643072. Throughput: 0: 863.7. Samples: 160098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 12:45:37,554][00209] Avg episode reward: [(0, '4.765')]
[2024-01-05 12:45:40,022][02312] Updated weights for policy 0, policy_version 160 (0.0014)
[2024-01-05 12:45:42,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3317.8). Total num frames: 663552. Throughput: 0: 895.4. Samples: 166550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:45:42,554][00209] Avg episode reward: [(0, '4.640')]
[2024-01-05 12:45:47,557][00209] Fps is (10 sec: 4094.1, 60 sec: 3549.6, 300 sec: 3336.7). Total num frames: 684032. Throughput: 0: 897.5. Samples: 169874. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-01-05 12:45:47,559][00209] Avg episode reward: [(0, '4.404')]
[2024-01-05 12:45:51,655][02312] Updated weights for policy 0, policy_version 170 (0.0029)
[2024-01-05 12:45:52,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3315.8). Total num frames: 696320. Throughput: 0: 864.3. Samples: 174116. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-01-05 12:45:52,559][00209] Avg episode reward: [(0, '4.535')]
[2024-01-05 12:45:52,570][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000170_696320.pth...
[2024-01-05 12:45:57,552][00209] Fps is (10 sec: 2868.5, 60 sec: 3481.6, 300 sec: 3314.9). Total num frames: 712704. Throughput: 0: 875.3. Samples: 178736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:45:57,554][00209] Avg episode reward: [(0, '4.412')]
[2024-01-05 12:46:02,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3332.7). Total num frames: 733184. Throughput: 0: 900.0. Samples: 181942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-01-05 12:46:02,554][00209] Avg episode reward: [(0, '4.559')]
[2024-01-05 12:46:02,570][02312] Updated weights for policy 0, policy_version 180 (0.0036)
[2024-01-05 12:46:07,556][00209] Fps is (10 sec: 4094.1, 60 sec: 3549.6, 300 sec: 3349.6). Total num frames: 753664. Throughput: 0: 897.7. Samples: 188278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:46:07,559][00209] Avg episode reward: [(0, '4.753')]
[2024-01-05 12:46:12,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3330.2). Total num frames: 765952. Throughput: 0: 859.5. Samples: 192376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:46:12,558][00209] Avg episode reward: [(0, '4.744')]
[2024-01-05 12:46:15,711][02312] Updated weights for policy 0, policy_version 190 (0.0028)
[2024-01-05 12:46:17,552][00209] Fps is (10 sec: 2868.5, 60 sec: 3481.8, 300 sec: 3329.1). Total num frames: 782336. Throughput: 0: 858.8. Samples: 194428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 12:46:17,560][00209] Avg episode reward: [(0, '4.720')]
[2024-01-05 12:46:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3362.1). Total num frames: 806912. Throughput: 0: 895.4. Samples: 200392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:46:22,554][00209] Avg episode reward: [(0, '4.819')]
[2024-01-05 12:46:22,565][02299] Saving new best policy, reward=4.819!
[2024-01-05 12:46:25,413][02312] Updated weights for policy 0, policy_version 200 (0.0016)
[2024-01-05 12:46:27,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3360.4). Total num frames: 823296. Throughput: 0: 888.4. Samples: 206530. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-01-05 12:46:27,560][00209] Avg episode reward: [(0, '4.749')]
[2024-01-05 12:46:32,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3358.7). Total num frames: 839680. Throughput: 0: 860.4. Samples: 208590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:46:32,554][00209] Avg episode reward: [(0, '4.599')]
[2024-01-05 12:46:37,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3341.1). Total num frames: 851968. Throughput: 0: 858.3. Samples: 212738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 12:46:37,560][00209] Avg episode reward: [(0, '4.579')]
[2024-01-05 12:46:38,943][02312] Updated weights for policy 0, policy_version 210 (0.0025)
[2024-01-05 12:46:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3355.6). Total num frames: 872448. Throughput: 0: 891.2. Samples: 218838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:46:42,558][00209] Avg episode reward: [(0, '4.491')]
[2024-01-05 12:46:47,553][00209] Fps is (10 sec: 4095.6, 60 sec: 3481.8, 300 sec: 3369.5). Total num frames: 892928. Throughput: 0: 892.2. Samples: 222092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 12:46:47,559][00209] Avg episode reward: [(0, '4.432')]
[2024-01-05 12:46:49,528][02312] Updated weights for policy 0, policy_version 220 (0.0027)
[2024-01-05 12:46:52,556][00209] Fps is (10 sec: 3684.8, 60 sec: 3549.6, 300 sec: 3367.8). Total num frames: 909312. Throughput: 0: 858.2. Samples: 226896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:46:52,563][00209] Avg episode reward: [(0, '4.545')]
[2024-01-05 12:46:57,552][00209] Fps is (10 sec: 2867.5, 60 sec: 3481.6, 300 sec: 3351.3). Total num frames: 921600. Throughput: 0: 859.2. Samples: 231042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-01-05 12:46:57,554][00209] Avg episode reward: [(0, '4.546')]
[2024-01-05 12:47:01,874][02312] Updated weights for policy 0, policy_version 230 (0.0026)
[2024-01-05 12:47:02,552][00209] Fps is (10 sec: 3278.2, 60 sec: 3481.6, 300 sec: 3364.6). Total num frames: 942080. Throughput: 0: 884.0. Samples: 234208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:47:02,554][00209] Avg episode reward: [(0, '4.741')]
[2024-01-05 12:47:07,552][00209] Fps is (10 sec: 4095.8, 60 sec: 3481.8, 300 sec: 3377.4). Total num frames: 962560. Throughput: 0: 892.7. Samples: 240566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:47:07,556][00209] Avg episode reward: [(0, '4.850')]
[2024-01-05 12:47:07,558][02299] Saving new best policy, reward=4.850!
[2024-01-05 12:47:12,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3375.7). Total num frames: 978944. Throughput: 0: 855.3. Samples: 245020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 12:47:12,554][00209] Avg episode reward: [(0, '4.652')]
[2024-01-05 12:47:13,799][02312] Updated weights for policy 0, policy_version 240 (0.0030)
[2024-01-05 12:47:17,554][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 855.3. Samples: 247080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 12:47:17,561][00209] Avg episode reward: [(0, '4.691')]
[2024-01-05 12:47:22,552][00209] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 1011712. Throughput: 0: 881.9. Samples: 252424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:47:22,555][00209] Avg episode reward: [(0, '4.708')]
[2024-01-05 12:47:24,839][02312] Updated weights for policy 0, policy_version 250 (0.0013)
[2024-01-05 12:47:27,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1032192. Throughput: 0: 889.6. Samples: 258872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 12:47:27,555][00209] Avg episode reward: [(0, '4.771')]
[2024-01-05 12:47:32,552][00209] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1048576. Throughput: 0: 870.5. Samples: 261262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 12:47:32,557][00209] Avg episode reward: [(0, '4.810')]
[2024-01-05 12:47:37,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1060864. Throughput: 0: 856.8. Samples: 265448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 12:47:37,559][00209] Avg episode reward: [(0, '4.735')]
[2024-01-05 12:47:37,903][02312] Updated weights for policy 0, policy_version 260 (0.0019)
[2024-01-05 12:47:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1081344. Throughput: 0: 889.6. Samples: 271076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:47:42,554][00209] Avg episode reward: [(0, '5.062')]
[2024-01-05 12:47:42,622][02299] Saving new best policy, reward=5.062!
[2024-01-05 12:47:47,368][02312] Updated weights for policy 0, policy_version 270 (0.0017)
[2024-01-05 12:47:47,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1105920. Throughput: 0: 892.5. Samples: 274370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:47:47,560][00209] Avg episode reward: [(0, '5.255')]
[2024-01-05 12:47:47,563][02299] Saving new best policy, reward=5.255!
[2024-01-05 12:47:52,556][00209] Fps is (10 sec: 3685.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1118208. Throughput: 0: 873.6. Samples: 279882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:47:52,560][00209] Avg episode reward: [(0, '5.249')]
[2024-01-05 12:47:52,578][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth...
[2024-01-05 12:47:52,751][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth
[2024-01-05 12:47:57,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1134592. Throughput: 0: 867.4. Samples: 284054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:47:57,559][00209] Avg episode reward: [(0, '5.048')]
[2024-01-05 12:48:00,378][02312] Updated weights for policy 0, policy_version 280 (0.0029)
[2024-01-05 12:48:02,552][00209] Fps is (10 sec: 3687.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1155072. Throughput: 0: 880.4. Samples: 286700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 12:48:02,554][00209] Avg episode reward: [(0, '5.291')]
[2024-01-05 12:48:02,566][02299] Saving new best policy, reward=5.291!
[2024-01-05 12:48:07,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1175552. Throughput: 0: 903.4. Samples: 293076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:48:07,558][00209] Avg episode reward: [(0, '5.651')]
[2024-01-05 12:48:07,561][02299] Saving new best policy, reward=5.651!
[2024-01-05 12:48:10,890][02312] Updated weights for policy 0, policy_version 290 (0.0013)
[2024-01-05 12:48:12,554][00209] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3526.7). Total num frames: 1191936. Throughput: 0: 872.4. Samples: 298132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:48:12,557][00209] Avg episode reward: [(0, '5.420')]
[2024-01-05 12:48:17,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1204224. Throughput: 0: 866.2. Samples: 300242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:48:17,560][00209] Avg episode reward: [(0, '5.335')]
[2024-01-05 12:48:22,554][00209] Fps is (10 sec: 3277.1, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 1224704. Throughput: 0: 885.4. Samples: 305292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 12:48:22,560][00209] Avg episode reward: [(0, '5.596')]
[2024-01-05 12:48:23,217][02312] Updated weights for policy 0, policy_version 300 (0.0017)
[2024-01-05 12:48:27,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1245184. Throughput: 0: 908.5. Samples: 311960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 12:48:27,554][00209] Avg episode reward: [(0, '6.022')]
[2024-01-05 12:48:27,558][02299] Saving new best policy, reward=6.022!
[2024-01-05 12:48:32,554][00209] Fps is (10 sec: 3686.2, 60 sec: 3549.7, 300 sec: 3512.8). Total num frames: 1261568. Throughput: 0: 898.2. Samples: 314792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 12:48:32,557][00209] Avg episode reward: [(0, '6.042')]
[2024-01-05 12:48:32,572][02299] Saving new best policy, reward=6.042!
[2024-01-05 12:48:34,519][02312] Updated weights for policy 0, policy_version 310 (0.0029) [2024-01-05 12:48:37,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1273856. Throughput: 0: 867.9. Samples: 318932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:48:37,554][00209] Avg episode reward: [(0, '6.071')] [2024-01-05 12:48:37,571][02299] Saving new best policy, reward=6.071! [2024-01-05 12:48:42,552][00209] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1294336. Throughput: 0: 890.2. Samples: 324114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:48:42,555][00209] Avg episode reward: [(0, '6.512')] [2024-01-05 12:48:42,564][02299] Saving new best policy, reward=6.512! [2024-01-05 12:48:45,687][02312] Updated weights for policy 0, policy_version 320 (0.0024) [2024-01-05 12:48:47,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1314816. Throughput: 0: 904.2. Samples: 327388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:48:47,557][00209] Avg episode reward: [(0, '7.038')] [2024-01-05 12:48:47,616][02299] Saving new best policy, reward=7.038! [2024-01-05 12:48:52,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3512.8). Total num frames: 1331200. Throughput: 0: 894.3. Samples: 333320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:48:52,554][00209] Avg episode reward: [(0, '7.457')] [2024-01-05 12:48:52,562][02299] Saving new best policy, reward=7.457! [2024-01-05 12:48:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1347584. Throughput: 0: 873.1. Samples: 337418. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:48:57,559][00209] Avg episode reward: [(0, '7.158')] [2024-01-05 12:48:58,720][02312] Updated weights for policy 0, policy_version 330 (0.0029) [2024-01-05 12:49:02,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1363968. Throughput: 0: 875.3. Samples: 339632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:49:02,560][00209] Avg episode reward: [(0, '7.331')] [2024-01-05 12:49:07,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1388544. Throughput: 0: 905.9. Samples: 346054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:49:07,554][00209] Avg episode reward: [(0, '6.805')] [2024-01-05 12:49:08,433][02312] Updated weights for policy 0, policy_version 340 (0.0016) [2024-01-05 12:49:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 1404928. Throughput: 0: 882.0. Samples: 351652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:49:12,558][00209] Avg episode reward: [(0, '6.611')] [2024-01-05 12:49:17,553][00209] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 1417216. Throughput: 0: 864.0. Samples: 353672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 12:49:17,557][00209] Avg episode reward: [(0, '6.423')] [2024-01-05 12:49:21,632][02312] Updated weights for policy 0, policy_version 350 (0.0041) [2024-01-05 12:49:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 1437696. Throughput: 0: 875.0. Samples: 358306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:49:22,560][00209] Avg episode reward: [(0, '6.883')] [2024-01-05 12:49:27,552][00209] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1458176. Throughput: 0: 907.8. Samples: 364964. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:49:27,559][00209] Avg episode reward: [(0, '6.820')] [2024-01-05 12:49:31,525][02312] Updated weights for policy 0, policy_version 360 (0.0042) [2024-01-05 12:49:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 1474560. Throughput: 0: 908.0. Samples: 368250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:49:32,554][00209] Avg episode reward: [(0, '6.970')] [2024-01-05 12:49:37,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1490944. Throughput: 0: 870.0. Samples: 372470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:49:37,554][00209] Avg episode reward: [(0, '7.341')] [2024-01-05 12:49:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1507328. Throughput: 0: 887.6. Samples: 377358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:49:42,554][00209] Avg episode reward: [(0, '8.245')] [2024-01-05 12:49:42,571][02299] Saving new best policy, reward=8.245! [2024-01-05 12:49:43,985][02312] Updated weights for policy 0, policy_version 370 (0.0020) [2024-01-05 12:49:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1527808. Throughput: 0: 907.9. Samples: 380488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:49:47,558][00209] Avg episode reward: [(0, '8.835')] [2024-01-05 12:49:47,564][02299] Saving new best policy, reward=8.835! [2024-01-05 12:49:52,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1548288. Throughput: 0: 904.5. Samples: 386756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:49:52,556][00209] Avg episode reward: [(0, '8.767')] [2024-01-05 12:49:52,576][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000378_1548288.pth... 
[2024-01-05 12:49:52,729][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000170_696320.pth [2024-01-05 12:49:55,321][02312] Updated weights for policy 0, policy_version 380 (0.0018) [2024-01-05 12:49:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1560576. Throughput: 0: 873.2. Samples: 390946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:49:57,557][00209] Avg episode reward: [(0, '9.184')] [2024-01-05 12:49:57,559][02299] Saving new best policy, reward=9.184! [2024-01-05 12:50:02,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1576960. Throughput: 0: 872.1. Samples: 392914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:50:02,554][00209] Avg episode reward: [(0, '9.319')] [2024-01-05 12:50:02,571][02299] Saving new best policy, reward=9.319! [2024-01-05 12:50:06,925][02312] Updated weights for policy 0, policy_version 390 (0.0019) [2024-01-05 12:50:07,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1597440. Throughput: 0: 902.3. Samples: 398910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:50:07,554][00209] Avg episode reward: [(0, '9.845')] [2024-01-05 12:50:07,557][02299] Saving new best policy, reward=9.845! [2024-01-05 12:50:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1617920. Throughput: 0: 887.9. Samples: 404920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:50:12,555][00209] Avg episode reward: [(0, '10.227')] [2024-01-05 12:50:12,565][02299] Saving new best policy, reward=10.227! [2024-01-05 12:50:17,560][00209] Fps is (10 sec: 3274.3, 60 sec: 3549.5, 300 sec: 3512.7). Total num frames: 1630208. Throughput: 0: 859.0. Samples: 406912. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:50:17,567][00209] Avg episode reward: [(0, '9.861')] [2024-01-05 12:50:19,697][02312] Updated weights for policy 0, policy_version 400 (0.0013) [2024-01-05 12:50:22,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1646592. Throughput: 0: 857.9. Samples: 411076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:50:22,557][00209] Avg episode reward: [(0, '9.335')] [2024-01-05 12:50:27,552][00209] Fps is (10 sec: 3689.3, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1667072. Throughput: 0: 890.9. Samples: 417450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:50:27,559][00209] Avg episode reward: [(0, '8.744')] [2024-01-05 12:50:29,647][02312] Updated weights for policy 0, policy_version 410 (0.0026) [2024-01-05 12:50:32,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1687552. Throughput: 0: 896.1. Samples: 420812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:50:32,556][00209] Avg episode reward: [(0, '8.956')] [2024-01-05 12:50:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1703936. Throughput: 0: 861.4. Samples: 425520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:50:37,557][00209] Avg episode reward: [(0, '9.717')] [2024-01-05 12:50:42,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1716224. Throughput: 0: 865.2. Samples: 429878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:50:42,560][00209] Avg episode reward: [(0, '10.251')] [2024-01-05 12:50:42,571][02299] Saving new best policy, reward=10.251! [2024-01-05 12:50:42,892][02312] Updated weights for policy 0, policy_version 420 (0.0021) [2024-01-05 12:50:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1740800. Throughput: 0: 892.1. Samples: 433060. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:50:47,558][00209] Avg episode reward: [(0, '10.287')] [2024-01-05 12:50:47,563][02299] Saving new best policy, reward=10.287! [2024-01-05 12:50:52,251][02312] Updated weights for policy 0, policy_version 430 (0.0028) [2024-01-05 12:50:52,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1761280. Throughput: 0: 905.0. Samples: 439636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 12:50:52,561][00209] Avg episode reward: [(0, '10.086')] [2024-01-05 12:50:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1773568. Throughput: 0: 869.6. Samples: 444054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:50:57,555][00209] Avg episode reward: [(0, '10.773')] [2024-01-05 12:50:57,558][02299] Saving new best policy, reward=10.773! [2024-01-05 12:51:02,552][00209] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1785856. Throughput: 0: 871.5. Samples: 446122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:51:02,561][00209] Avg episode reward: [(0, '10.206')] [2024-01-05 12:51:05,525][02312] Updated weights for policy 0, policy_version 440 (0.0023) [2024-01-05 12:51:07,552][00209] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1810432. Throughput: 0: 902.1. Samples: 451670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:51:07,554][00209] Avg episode reward: [(0, '10.449')] [2024-01-05 12:51:12,554][00209] Fps is (10 sec: 4504.4, 60 sec: 3549.7, 300 sec: 3554.5). Total num frames: 1830912. Throughput: 0: 906.7. Samples: 458256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:51:12,557][00209] Avg episode reward: [(0, '11.309')] [2024-01-05 12:51:12,569][02299] Saving new best policy, reward=11.309! 
[2024-01-05 12:51:16,443][02312] Updated weights for policy 0, policy_version 450 (0.0041) [2024-01-05 12:51:17,552][00209] Fps is (10 sec: 3276.9, 60 sec: 3550.3, 300 sec: 3512.8). Total num frames: 1843200. Throughput: 0: 877.6. Samples: 460302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:51:17,560][00209] Avg episode reward: [(0, '11.712')] [2024-01-05 12:51:17,562][02299] Saving new best policy, reward=11.712! [2024-01-05 12:51:22,552][00209] Fps is (10 sec: 2867.9, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1859584. Throughput: 0: 865.1. Samples: 464448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:51:22,559][00209] Avg episode reward: [(0, '11.904')] [2024-01-05 12:51:22,574][02299] Saving new best policy, reward=11.904! [2024-01-05 12:51:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1880064. Throughput: 0: 896.6. Samples: 470226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:51:27,555][00209] Avg episode reward: [(0, '12.790')] [2024-01-05 12:51:27,562][02299] Saving new best policy, reward=12.790! [2024-01-05 12:51:28,284][02312] Updated weights for policy 0, policy_version 460 (0.0020) [2024-01-05 12:51:32,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1900544. Throughput: 0: 897.5. Samples: 473448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:51:32,554][00209] Avg episode reward: [(0, '12.236')] [2024-01-05 12:51:37,554][00209] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 1916928. Throughput: 0: 869.7. Samples: 478772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:51:37,560][00209] Avg episode reward: [(0, '12.444')] [2024-01-05 12:51:40,181][02312] Updated weights for policy 0, policy_version 470 (0.0020) [2024-01-05 12:51:42,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 1929216. Throughput: 0: 863.9. 
Samples: 482928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:51:42,555][00209] Avg episode reward: [(0, '12.731')] [2024-01-05 12:51:47,554][00209] Fps is (10 sec: 3276.6, 60 sec: 3481.4, 300 sec: 3526.7). Total num frames: 1949696. Throughput: 0: 881.1. Samples: 485772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:51:47,561][00209] Avg episode reward: [(0, '12.546')] [2024-01-05 12:51:50,779][02312] Updated weights for policy 0, policy_version 480 (0.0022) [2024-01-05 12:51:52,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 1974272. Throughput: 0: 903.7. Samples: 492338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:51:52,560][00209] Avg episode reward: [(0, '12.782')] [2024-01-05 12:51:52,573][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000482_1974272.pth... [2024-01-05 12:51:52,703][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth [2024-01-05 12:51:57,555][00209] Fps is (10 sec: 3686.0, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 1986560. Throughput: 0: 869.2. Samples: 497370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:51:57,565][00209] Avg episode reward: [(0, '12.369')] [2024-01-05 12:52:02,553][00209] Fps is (10 sec: 2457.2, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 1998848. Throughput: 0: 869.7. Samples: 499442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:52:02,561][00209] Avg episode reward: [(0, '13.784')] [2024-01-05 12:52:02,576][02299] Saving new best policy, reward=13.784! [2024-01-05 12:52:04,056][02312] Updated weights for policy 0, policy_version 490 (0.0018) [2024-01-05 12:52:07,552][00209] Fps is (10 sec: 3278.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2019328. Throughput: 0: 886.7. Samples: 504348. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:52:07,560][00209] Avg episode reward: [(0, '13.363')] [2024-01-05 12:52:12,552][00209] Fps is (10 sec: 4096.7, 60 sec: 3481.7, 300 sec: 3554.5). Total num frames: 2039808. Throughput: 0: 903.0. Samples: 510860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:52:12,557][00209] Avg episode reward: [(0, '14.139')] [2024-01-05 12:52:12,603][02299] Saving new best policy, reward=14.139! [2024-01-05 12:52:13,657][02312] Updated weights for policy 0, policy_version 500 (0.0024) [2024-01-05 12:52:17,560][00209] Fps is (10 sec: 3683.6, 60 sec: 3549.4, 300 sec: 3540.5). Total num frames: 2056192. Throughput: 0: 891.4. Samples: 513566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:52:17,566][00209] Avg episode reward: [(0, '14.182')] [2024-01-05 12:52:17,569][02299] Saving new best policy, reward=14.182! [2024-01-05 12:52:22,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 2068480. Throughput: 0: 863.3. Samples: 517620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-01-05 12:52:22,556][00209] Avg episode reward: [(0, '13.768')] [2024-01-05 12:52:26,856][02312] Updated weights for policy 0, policy_version 510 (0.0031) [2024-01-05 12:52:27,552][00209] Fps is (10 sec: 3279.3, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2088960. Throughput: 0: 888.7. Samples: 522920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:52:27,559][00209] Avg episode reward: [(0, '13.735')] [2024-01-05 12:52:32,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2113536. Throughput: 0: 899.8. Samples: 526262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:52:32,558][00209] Avg episode reward: [(0, '13.619')] [2024-01-05 12:52:37,324][02312] Updated weights for policy 0, policy_version 520 (0.0025) [2024-01-05 12:52:37,552][00209] Fps is (10 sec: 4095.9, 60 sec: 3550.0, 300 sec: 3554.5). 
Total num frames: 2129920. Throughput: 0: 884.0. Samples: 532118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:52:37,560][00209] Avg episode reward: [(0, '14.789')] [2024-01-05 12:52:37,563][02299] Saving new best policy, reward=14.789! [2024-01-05 12:52:42,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2142208. Throughput: 0: 862.9. Samples: 536196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:52:42,554][00209] Avg episode reward: [(0, '15.742')] [2024-01-05 12:52:42,572][02299] Saving new best policy, reward=15.742! [2024-01-05 12:52:47,552][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.8, 300 sec: 3526.8). Total num frames: 2158592. Throughput: 0: 862.7. Samples: 538264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:52:47,556][00209] Avg episode reward: [(0, '15.491')] [2024-01-05 12:52:49,430][02312] Updated weights for policy 0, policy_version 530 (0.0012) [2024-01-05 12:52:52,552][00209] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 2183168. Throughput: 0: 901.5. Samples: 544914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:52:52,559][00209] Avg episode reward: [(0, '16.978')] [2024-01-05 12:52:52,575][02299] Saving new best policy, reward=16.978! [2024-01-05 12:52:57,554][00209] Fps is (10 sec: 4095.2, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 2199552. Throughput: 0: 879.9. Samples: 550458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:52:57,561][00209] Avg episode reward: [(0, '17.538')] [2024-01-05 12:52:57,562][02299] Saving new best policy, reward=17.538! [2024-01-05 12:53:01,469][02312] Updated weights for policy 0, policy_version 540 (0.0015) [2024-01-05 12:53:02,552][00209] Fps is (10 sec: 2867.3, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 2211840. Throughput: 0: 864.2. Samples: 552450. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:53:02,554][00209] Avg episode reward: [(0, '16.989')] [2024-01-05 12:53:07,552][00209] Fps is (10 sec: 2867.8, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 2228224. Throughput: 0: 871.3. Samples: 556828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:53:07,554][00209] Avg episode reward: [(0, '15.753')] [2024-01-05 12:53:12,121][02312] Updated weights for policy 0, policy_version 550 (0.0020) [2024-01-05 12:53:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2252800. Throughput: 0: 901.7. Samples: 563498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:53:12,558][00209] Avg episode reward: [(0, '16.705')] [2024-01-05 12:53:17,552][00209] Fps is (10 sec: 4095.9, 60 sec: 3550.3, 300 sec: 3540.6). Total num frames: 2269184. Throughput: 0: 901.7. Samples: 566838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:53:17,560][00209] Avg episode reward: [(0, '16.232')] [2024-01-05 12:53:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2285568. Throughput: 0: 865.1. Samples: 571048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:53:22,554][00209] Avg episode reward: [(0, '14.448')] [2024-01-05 12:53:25,255][02312] Updated weights for policy 0, policy_version 560 (0.0053) [2024-01-05 12:53:27,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2301952. Throughput: 0: 881.7. Samples: 575874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:53:27,559][00209] Avg episode reward: [(0, '15.314')] [2024-01-05 12:53:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 2322432. Throughput: 0: 908.5. Samples: 579146. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:53:32,559][00209] Avg episode reward: [(0, '16.130')] [2024-01-05 12:53:34,633][02312] Updated weights for policy 0, policy_version 570 (0.0022) [2024-01-05 12:53:37,552][00209] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2342912. Throughput: 0: 904.3. Samples: 585606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:53:37,554][00209] Avg episode reward: [(0, '15.751')] [2024-01-05 12:53:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2355200. Throughput: 0: 874.6. Samples: 589812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:53:42,557][00209] Avg episode reward: [(0, '15.364')] [2024-01-05 12:53:47,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2371584. Throughput: 0: 875.6. Samples: 591852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:53:47,554][00209] Avg episode reward: [(0, '16.452')] [2024-01-05 12:53:47,619][02312] Updated weights for policy 0, policy_version 580 (0.0015) [2024-01-05 12:53:52,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2396160. Throughput: 0: 917.5. Samples: 598116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:53:52,558][00209] Avg episode reward: [(0, '17.508')] [2024-01-05 12:53:52,570][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000585_2396160.pth... [2024-01-05 12:53:52,699][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000378_1548288.pth [2024-01-05 12:53:57,504][02312] Updated weights for policy 0, policy_version 590 (0.0017) [2024-01-05 12:53:57,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 2416640. Throughput: 0: 909.5. Samples: 604426. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:53:57,555][00209] Avg episode reward: [(0, '17.409')] [2024-01-05 12:54:02,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2428928. Throughput: 0: 882.1. Samples: 606534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:54:02,559][00209] Avg episode reward: [(0, '17.727')] [2024-01-05 12:54:02,570][02299] Saving new best policy, reward=17.727! [2024-01-05 12:54:07,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2445312. Throughput: 0: 880.8. Samples: 610684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:54:07,555][00209] Avg episode reward: [(0, '17.674')] [2024-01-05 12:54:10,113][02312] Updated weights for policy 0, policy_version 600 (0.0035) [2024-01-05 12:54:12,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2465792. Throughput: 0: 916.2. Samples: 617102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:54:12,555][00209] Avg episode reward: [(0, '17.017')] [2024-01-05 12:54:17,553][00209] Fps is (10 sec: 4095.3, 60 sec: 3618.0, 300 sec: 3554.5). Total num frames: 2486272. Throughput: 0: 913.9. Samples: 620274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:54:17,558][00209] Avg episode reward: [(0, '17.076')] [2024-01-05 12:54:20,995][02312] Updated weights for policy 0, policy_version 610 (0.0017) [2024-01-05 12:54:22,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2502656. Throughput: 0: 879.3. Samples: 625176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:54:22,554][00209] Avg episode reward: [(0, '16.082')] [2024-01-05 12:54:27,552][00209] Fps is (10 sec: 2867.7, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2514944. Throughput: 0: 879.3. Samples: 629382. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:54:27,559][00209] Avg episode reward: [(0, '15.640')] [2024-01-05 12:54:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2539520. Throughput: 0: 908.0. Samples: 632712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:54:32,554][00209] Avg episode reward: [(0, '16.795')] [2024-01-05 12:54:32,563][02312] Updated weights for policy 0, policy_version 620 (0.0022) [2024-01-05 12:54:37,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2560000. Throughput: 0: 917.0. Samples: 639382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:54:37,554][00209] Avg episode reward: [(0, '16.851')] [2024-01-05 12:54:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2572288. Throughput: 0: 878.2. Samples: 643944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:54:42,556][00209] Avg episode reward: [(0, '16.554')] [2024-01-05 12:54:44,460][02312] Updated weights for policy 0, policy_version 630 (0.0018) [2024-01-05 12:54:47,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2588672. Throughput: 0: 877.9. Samples: 646038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:54:47,556][00209] Avg episode reward: [(0, '16.179')] [2024-01-05 12:54:52,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2609152. Throughput: 0: 910.9. Samples: 651676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:54:52,555][00209] Avg episode reward: [(0, '15.454')] [2024-01-05 12:54:54,879][02312] Updated weights for policy 0, policy_version 640 (0.0016) [2024-01-05 12:54:57,553][00209] Fps is (10 sec: 4095.6, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 2629632. Throughput: 0: 917.3. Samples: 658382. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:54:57,558][00209] Avg episode reward: [(0, '14.719')] [2024-01-05 12:55:02,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2646016. Throughput: 0: 899.2. Samples: 660738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:55:02,558][00209] Avg episode reward: [(0, '14.901')] [2024-01-05 12:55:07,552][00209] Fps is (10 sec: 2867.5, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2658304. Throughput: 0: 882.9. Samples: 664906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 12:55:07,554][00209] Avg episode reward: [(0, '16.470')] [2024-01-05 12:55:07,729][02312] Updated weights for policy 0, policy_version 650 (0.0026) [2024-01-05 12:55:12,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.5). Total num frames: 2682880. Throughput: 0: 918.4. Samples: 670710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:55:12,554][00209] Avg episode reward: [(0, '18.084')] [2024-01-05 12:55:12,567][02299] Saving new best policy, reward=18.084! [2024-01-05 12:55:17,255][02312] Updated weights for policy 0, policy_version 660 (0.0015) [2024-01-05 12:55:17,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 2703360. Throughput: 0: 914.8. Samples: 673880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:55:17,555][00209] Avg episode reward: [(0, '18.243')] [2024-01-05 12:55:17,558][02299] Saving new best policy, reward=18.243! [2024-01-05 12:55:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2715648. Throughput: 0: 884.3. Samples: 679174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:55:22,554][00209] Avg episode reward: [(0, '19.213')] [2024-01-05 12:55:22,561][02299] Saving new best policy, reward=19.213! [2024-01-05 12:55:27,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2732032. 
Throughput: 0: 874.4. Samples: 683294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:55:27,555][00209] Avg episode reward: [(0, '19.397')] [2024-01-05 12:55:27,558][02299] Saving new best policy, reward=19.397! [2024-01-05 12:55:30,581][02312] Updated weights for policy 0, policy_version 670 (0.0014) [2024-01-05 12:55:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2752512. Throughput: 0: 887.3. Samples: 685968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:55:32,558][00209] Avg episode reward: [(0, '17.714')] [2024-01-05 12:55:37,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 2772992. Throughput: 0: 910.0. Samples: 692624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:55:37,554][00209] Avg episode reward: [(0, '19.332')] [2024-01-05 12:55:40,629][02312] Updated weights for policy 0, policy_version 680 (0.0013) [2024-01-05 12:55:42,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2789376. Throughput: 0: 877.6. Samples: 697874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:55:42,556][00209] Avg episode reward: [(0, '20.254')] [2024-01-05 12:55:42,568][02299] Saving new best policy, reward=20.254! [2024-01-05 12:55:47,553][00209] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 2801664. Throughput: 0: 869.4. Samples: 699860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:55:47,555][00209] Avg episode reward: [(0, '19.355')] [2024-01-05 12:55:52,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2822144. Throughput: 0: 890.5. Samples: 704980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:55:52,561][00209] Avg episode reward: [(0, '19.853')] [2024-01-05 12:55:52,572][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000689_2822144.pth... 
[2024-01-05 12:55:52,699][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000482_1974272.pth [2024-01-05 12:55:53,072][02312] Updated weights for policy 0, policy_version 690 (0.0012) [2024-01-05 12:55:57,552][00209] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 2842624. Throughput: 0: 906.8. Samples: 711518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:55:57,555][00209] Avg episode reward: [(0, '19.898')] [2024-01-05 12:56:02,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2859008. Throughput: 0: 899.7. Samples: 714368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:56:02,558][00209] Avg episode reward: [(0, '18.175')] [2024-01-05 12:56:04,455][02312] Updated weights for policy 0, policy_version 700 (0.0029) [2024-01-05 12:56:07,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2875392. Throughput: 0: 875.8. Samples: 718586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:56:07,554][00209] Avg episode reward: [(0, '18.681')] [2024-01-05 12:56:12,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 2891776. Throughput: 0: 901.0. Samples: 723840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:56:12,559][00209] Avg episode reward: [(0, '19.061')] [2024-01-05 12:56:15,566][02312] Updated weights for policy 0, policy_version 710 (0.0024) [2024-01-05 12:56:17,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 2916352. Throughput: 0: 911.3. Samples: 726976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:56:17,556][00209] Avg episode reward: [(0, '19.552')] [2024-01-05 12:56:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2932736. Throughput: 0: 894.4. Samples: 732872. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:56:22,554][00209] Avg episode reward: [(0, '20.411')] [2024-01-05 12:56:22,564][02299] Saving new best policy, reward=20.411! [2024-01-05 12:56:27,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2945024. Throughput: 0: 868.9. Samples: 736976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:56:27,561][00209] Avg episode reward: [(0, '20.707')] [2024-01-05 12:56:27,563][02299] Saving new best policy, reward=20.707! [2024-01-05 12:56:28,243][02312] Updated weights for policy 0, policy_version 720 (0.0014) [2024-01-05 12:56:32,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2965504. Throughput: 0: 874.6. Samples: 739216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:56:32,560][00209] Avg episode reward: [(0, '23.157')] [2024-01-05 12:56:32,573][02299] Saving new best policy, reward=23.157! [2024-01-05 12:56:37,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 2985984. Throughput: 0: 907.6. Samples: 745824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:56:37,560][00209] Avg episode reward: [(0, '21.831')] [2024-01-05 12:56:38,064][02312] Updated weights for policy 0, policy_version 730 (0.0019) [2024-01-05 12:56:42,554][00209] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3568.4). Total num frames: 3002368. Throughput: 0: 887.7. Samples: 751466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:56:42,557][00209] Avg episode reward: [(0, '21.659')] [2024-01-05 12:56:47,554][00209] Fps is (10 sec: 2866.7, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 3014656. Throughput: 0: 870.5. Samples: 753544. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:56:47,558][00209] Avg episode reward: [(0, '21.043')] [2024-01-05 12:56:51,230][02312] Updated weights for policy 0, policy_version 740 (0.0012) [2024-01-05 12:56:52,552][00209] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3035136. Throughput: 0: 879.7. Samples: 758174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:56:52,554][00209] Avg episode reward: [(0, '21.002')] [2024-01-05 12:56:57,552][00209] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3055616. Throughput: 0: 909.3. Samples: 764758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:56:57,554][00209] Avg episode reward: [(0, '19.934')] [2024-01-05 12:57:00,702][02312] Updated weights for policy 0, policy_version 750 (0.0013) [2024-01-05 12:57:02,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3076096. Throughput: 0: 914.8. Samples: 768142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:57:02,555][00209] Avg episode reward: [(0, '20.911')] [2024-01-05 12:57:07,554][00209] Fps is (10 sec: 3276.0, 60 sec: 3549.7, 300 sec: 3554.5). Total num frames: 3088384. Throughput: 0: 874.6. Samples: 772230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:57:07,557][00209] Avg episode reward: [(0, '21.421')] [2024-01-05 12:57:12,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.6). Total num frames: 3104768. Throughput: 0: 888.9. Samples: 776978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:57:12,559][00209] Avg episode reward: [(0, '21.395')] [2024-01-05 12:57:13,658][02312] Updated weights for policy 0, policy_version 760 (0.0014) [2024-01-05 12:57:17,552][00209] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3129344. Throughput: 0: 910.6. Samples: 780192. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:57:17,554][00209] Avg episode reward: [(0, '21.829')] [2024-01-05 12:57:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3145728. Throughput: 0: 905.2. Samples: 786556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:57:22,555][00209] Avg episode reward: [(0, '22.368')] [2024-01-05 12:57:24,463][02312] Updated weights for policy 0, policy_version 770 (0.0029) [2024-01-05 12:57:27,557][00209] Fps is (10 sec: 3275.0, 60 sec: 3617.8, 300 sec: 3554.4). Total num frames: 3162112. Throughput: 0: 875.1. Samples: 790850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:57:27,560][00209] Avg episode reward: [(0, '21.899')] [2024-01-05 12:57:32,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3178496. Throughput: 0: 876.7. Samples: 792996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:57:32,554][00209] Avg episode reward: [(0, '21.866')] [2024-01-05 12:57:36,003][02312] Updated weights for policy 0, policy_version 780 (0.0027) [2024-01-05 12:57:37,552][00209] Fps is (10 sec: 3688.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3198976. Throughput: 0: 913.5. Samples: 799280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:57:37,558][00209] Avg episode reward: [(0, '22.289')] [2024-01-05 12:57:42,552][00209] Fps is (10 sec: 4095.9, 60 sec: 3618.3, 300 sec: 3596.1). Total num frames: 3219456. Throughput: 0: 905.5. Samples: 805508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:57:42,558][00209] Avg episode reward: [(0, '22.201')] [2024-01-05 12:57:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3231744. Throughput: 0: 875.7. Samples: 807550. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:57:47,554][00209] Avg episode reward: [(0, '21.986')] [2024-01-05 12:57:47,863][02312] Updated weights for policy 0, policy_version 790 (0.0025) [2024-01-05 12:57:52,552][00209] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3248128. Throughput: 0: 878.1. Samples: 811744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:57:52,554][00209] Avg episode reward: [(0, '22.346')] [2024-01-05 12:57:52,561][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000793_3248128.pth... [2024-01-05 12:57:52,691][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000585_2396160.pth [2024-01-05 12:57:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3268608. Throughput: 0: 911.5. Samples: 817996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:57:57,558][00209] Avg episode reward: [(0, '23.955')] [2024-01-05 12:57:57,561][02299] Saving new best policy, reward=23.955! [2024-01-05 12:57:58,550][02312] Updated weights for policy 0, policy_version 800 (0.0021) [2024-01-05 12:58:02,554][00209] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3596.1). Total num frames: 3289088. Throughput: 0: 911.9. Samples: 821228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 12:58:02,556][00209] Avg episode reward: [(0, '23.270')] [2024-01-05 12:58:07,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 3305472. Throughput: 0: 878.2. Samples: 826076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 12:58:07,559][00209] Avg episode reward: [(0, '21.897')] [2024-01-05 12:58:11,768][02312] Updated weights for policy 0, policy_version 810 (0.0044) [2024-01-05 12:58:12,552][00209] Fps is (10 sec: 2867.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3317760. Throughput: 0: 876.7. Samples: 830298. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:58:12,554][00209] Avg episode reward: [(0, '22.245')] [2024-01-05 12:58:17,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3342336. Throughput: 0: 902.5. Samples: 833608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:58:17,560][00209] Avg episode reward: [(0, '22.824')] [2024-01-05 12:58:21,089][02312] Updated weights for policy 0, policy_version 820 (0.0017) [2024-01-05 12:58:22,558][00209] Fps is (10 sec: 4502.6, 60 sec: 3617.7, 300 sec: 3596.1). Total num frames: 3362816. Throughput: 0: 907.2. Samples: 840112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:58:22,561][00209] Avg episode reward: [(0, '21.926')] [2024-01-05 12:58:27,556][00209] Fps is (10 sec: 3275.6, 60 sec: 3550.0, 300 sec: 3568.3). Total num frames: 3375104. Throughput: 0: 870.1. Samples: 844664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 12:58:27,558][00209] Avg episode reward: [(0, '21.282')] [2024-01-05 12:58:32,552][00209] Fps is (10 sec: 2869.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3391488. Throughput: 0: 871.7. Samples: 846778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:58:32,554][00209] Avg episode reward: [(0, '22.090')] [2024-01-05 12:58:34,145][02312] Updated weights for policy 0, policy_version 830 (0.0032) [2024-01-05 12:58:37,552][00209] Fps is (10 sec: 3687.7, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3411968. Throughput: 0: 906.1. Samples: 852518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:58:37,554][00209] Avg episode reward: [(0, '21.555')] [2024-01-05 12:58:42,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3610.0). Total num frames: 3436544. Throughput: 0: 914.8. Samples: 859160. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:58:42,556][00209] Avg episode reward: [(0, '20.673')] [2024-01-05 12:58:44,077][02312] Updated weights for policy 0, policy_version 840 (0.0034) [2024-01-05 12:58:47,558][00209] Fps is (10 sec: 3684.0, 60 sec: 3617.7, 300 sec: 3568.3). Total num frames: 3448832. Throughput: 0: 894.3. Samples: 861478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:58:47,561][00209] Avg episode reward: [(0, '20.994')] [2024-01-05 12:58:52,552][00209] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3461120. Throughput: 0: 879.2. Samples: 865640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:58:52,560][00209] Avg episode reward: [(0, '21.375')] [2024-01-05 12:58:56,519][02312] Updated weights for policy 0, policy_version 850 (0.0023) [2024-01-05 12:58:57,552][00209] Fps is (10 sec: 3688.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3485696. Throughput: 0: 915.5. Samples: 871496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:58:57,554][00209] Avg episode reward: [(0, '21.892')] [2024-01-05 12:59:02,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3596.1). Total num frames: 3506176. Throughput: 0: 915.5. Samples: 874804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:59:02,559][00209] Avg episode reward: [(0, '22.775')] [2024-01-05 12:59:07,315][02312] Updated weights for policy 0, policy_version 860 (0.0024) [2024-01-05 12:59:07,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3522560. Throughput: 0: 890.4. Samples: 880176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:59:07,555][00209] Avg episode reward: [(0, '22.739')] [2024-01-05 12:59:12,558][00209] Fps is (10 sec: 2865.6, 60 sec: 3617.8, 300 sec: 3554.4). Total num frames: 3534848. Throughput: 0: 881.3. Samples: 884326. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 12:59:12,560][00209] Avg episode reward: [(0, '22.946')] [2024-01-05 12:59:17,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3555328. Throughput: 0: 897.2. Samples: 887150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 12:59:17,554][00209] Avg episode reward: [(0, '22.866')] [2024-01-05 12:59:18,964][02312] Updated weights for policy 0, policy_version 870 (0.0021) [2024-01-05 12:59:22,552][00209] Fps is (10 sec: 4098.3, 60 sec: 3550.3, 300 sec: 3596.1). Total num frames: 3575808. Throughput: 0: 914.4. Samples: 893664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:59:22,554][00209] Avg episode reward: [(0, '21.686')] [2024-01-05 12:59:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3568.4). Total num frames: 3592192. Throughput: 0: 880.8. Samples: 898796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:59:27,554][00209] Avg episode reward: [(0, '23.075')] [2024-01-05 12:59:30,902][02312] Updated weights for policy 0, policy_version 880 (0.0026) [2024-01-05 12:59:32,552][00209] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3608576. Throughput: 0: 876.1. Samples: 900896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:59:32,556][00209] Avg episode reward: [(0, '22.494')] [2024-01-05 12:59:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3629056. Throughput: 0: 899.4. Samples: 906114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 12:59:37,554][00209] Avg episode reward: [(0, '21.836')] [2024-01-05 12:59:41,235][02312] Updated weights for policy 0, policy_version 890 (0.0019) [2024-01-05 12:59:42,552][00209] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3649536. Throughput: 0: 916.8. Samples: 912750. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:59:42,554][00209] Avg episode reward: [(0, '20.704')] [2024-01-05 12:59:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3550.3, 300 sec: 3568.4). Total num frames: 3661824. Throughput: 0: 904.0. Samples: 915486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:59:47,563][00209] Avg episode reward: [(0, '21.344')] [2024-01-05 12:59:52,558][00209] Fps is (10 sec: 2865.3, 60 sec: 3617.7, 300 sec: 3554.4). Total num frames: 3678208. Throughput: 0: 874.9. Samples: 919552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 12:59:52,566][00209] Avg episode reward: [(0, '20.264')] [2024-01-05 12:59:52,576][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000898_3678208.pth... [2024-01-05 12:59:52,776][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000689_2822144.pth [2024-01-05 12:59:54,505][02312] Updated weights for policy 0, policy_version 900 (0.0012) [2024-01-05 12:59:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3698688. Throughput: 0: 901.8. Samples: 924904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 12:59:57,554][00209] Avg episode reward: [(0, '19.052')] [2024-01-05 13:00:02,552][00209] Fps is (10 sec: 4098.7, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3719168. Throughput: 0: 912.7. Samples: 928222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:00:02,559][00209] Avg episode reward: [(0, '19.685')] [2024-01-05 13:00:04,051][02312] Updated weights for policy 0, policy_version 910 (0.0024) [2024-01-05 13:00:07,559][00209] Fps is (10 sec: 3683.6, 60 sec: 3549.4, 300 sec: 3568.3). Total num frames: 3735552. Throughput: 0: 895.7. Samples: 933978. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:00:07,562][00209] Avg episode reward: [(0, '20.256')] [2024-01-05 13:00:12,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3550.2, 300 sec: 3540.6). Total num frames: 3747840. Throughput: 0: 869.6. Samples: 937930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:00:12,556][00209] Avg episode reward: [(0, '21.712')] [2024-01-05 13:00:17,476][02312] Updated weights for policy 0, policy_version 920 (0.0018) [2024-01-05 13:00:17,552][00209] Fps is (10 sec: 3279.3, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3768320. Throughput: 0: 869.5. Samples: 940024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:00:17,554][00209] Avg episode reward: [(0, '22.290')] [2024-01-05 13:00:22,552][00209] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3788800. Throughput: 0: 895.4. Samples: 946406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:00:22,555][00209] Avg episode reward: [(0, '22.325')] [2024-01-05 13:00:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3805184. Throughput: 0: 878.3. Samples: 952272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:00:27,556][00209] Avg episode reward: [(0, '21.819')] [2024-01-05 13:00:27,688][02312] Updated weights for policy 0, policy_version 930 (0.0021) [2024-01-05 13:00:32,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3821568. Throughput: 0: 864.1. Samples: 954372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:00:32,554][00209] Avg episode reward: [(0, '21.475')] [2024-01-05 13:00:37,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 3837952. Throughput: 0: 871.2. Samples: 958750. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-01-05 13:00:37,554][00209] Avg episode reward: [(0, '21.917')] [2024-01-05 13:00:39,898][02312] Updated weights for policy 0, policy_version 940 (0.0023) [2024-01-05 13:00:42,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 3858432. Throughput: 0: 898.6. Samples: 965342. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-01-05 13:00:42,555][00209] Avg episode reward: [(0, '20.129')] [2024-01-05 13:00:47,553][00209] Fps is (10 sec: 4095.4, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 3878912. Throughput: 0: 899.0. Samples: 968680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:00:47,556][00209] Avg episode reward: [(0, '19.728')] [2024-01-05 13:00:51,389][02312] Updated weights for policy 0, policy_version 950 (0.0028) [2024-01-05 13:00:52,554][00209] Fps is (10 sec: 3276.0, 60 sec: 3550.1, 300 sec: 3554.5). Total num frames: 3891200. Throughput: 0: 869.9. Samples: 973118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:00:52,558][00209] Avg episode reward: [(0, '20.549')] [2024-01-05 13:00:57,552][00209] Fps is (10 sec: 2867.6, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 3907584. Throughput: 0: 878.7. Samples: 977472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:00:57,554][00209] Avg episode reward: [(0, '22.780')] [2024-01-05 13:01:02,552][00209] Fps is (10 sec: 3687.4, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 3928064. Throughput: 0: 906.1. Samples: 980800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:01:02,555][00209] Avg episode reward: [(0, '23.331')] [2024-01-05 13:01:02,720][02312] Updated weights for policy 0, policy_version 960 (0.0019) [2024-01-05 13:01:07,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3550.3, 300 sec: 3582.3). Total num frames: 3948544. Throughput: 0: 906.1. Samples: 987180. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:01:07,558][00209] Avg episode reward: [(0, '24.599')] [2024-01-05 13:01:07,571][02299] Saving new best policy, reward=24.599! [2024-01-05 13:01:12,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3964928. Throughput: 0: 868.3. Samples: 991344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-01-05 13:01:12,556][00209] Avg episode reward: [(0, '24.408')] [2024-01-05 13:01:15,664][02312] Updated weights for policy 0, policy_version 970 (0.0021) [2024-01-05 13:01:17,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 3977216. Throughput: 0: 866.8. Samples: 993376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:01:17,555][00209] Avg episode reward: [(0, '25.690')] [2024-01-05 13:01:17,561][02299] Saving new best policy, reward=25.690! [2024-01-05 13:01:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 3997696. Throughput: 0: 896.4. Samples: 999088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:01:22,557][00209] Avg episode reward: [(0, '25.379')] [2024-01-05 13:01:25,389][02312] Updated weights for policy 0, policy_version 980 (0.0029) [2024-01-05 13:01:27,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4018176. Throughput: 0: 894.2. Samples: 1005580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:01:27,562][00209] Avg episode reward: [(0, '23.419')] [2024-01-05 13:01:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 4034560. Throughput: 0: 866.4. Samples: 1007668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:01:32,556][00209] Avg episode reward: [(0, '22.253')] [2024-01-05 13:01:37,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 4046848. Throughput: 0: 862.5. Samples: 1011930. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:01:37,554][00209] Avg episode reward: [(0, '20.903')] [2024-01-05 13:01:38,488][02312] Updated weights for policy 0, policy_version 990 (0.0025) [2024-01-05 13:01:42,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4071424. Throughput: 0: 903.2. Samples: 1018114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:01:42,554][00209] Avg episode reward: [(0, '21.645')] [2024-01-05 13:01:47,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 4091904. Throughput: 0: 902.4. Samples: 1021410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:01:47,554][00209] Avg episode reward: [(0, '20.772')] [2024-01-05 13:01:48,177][02312] Updated weights for policy 0, policy_version 1000 (0.0033) [2024-01-05 13:01:52,552][00209] Fps is (10 sec: 3686.3, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 4108288. Throughput: 0: 872.2. Samples: 1026428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:01:52,559][00209] Avg episode reward: [(0, '21.014')] [2024-01-05 13:01:52,575][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001003_4108288.pth... [2024-01-05 13:01:52,710][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000793_3248128.pth [2024-01-05 13:01:57,557][00209] Fps is (10 sec: 2865.9, 60 sec: 3549.6, 300 sec: 3540.6). Total num frames: 4120576. Throughput: 0: 872.2. Samples: 1030598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:01:57,559][00209] Avg episode reward: [(0, '20.106')] [2024-01-05 13:02:01,054][02312] Updated weights for policy 0, policy_version 1010 (0.0023) [2024-01-05 13:02:02,552][00209] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4141056. Throughput: 0: 895.5. Samples: 1033674. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:02:02,555][00209] Avg episode reward: [(0, '20.091')] [2024-01-05 13:02:07,552][00209] Fps is (10 sec: 4097.9, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4161536. Throughput: 0: 911.2. Samples: 1040092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:02:07,561][00209] Avg episode reward: [(0, '20.070')] [2024-01-05 13:02:12,255][02312] Updated weights for policy 0, policy_version 1020 (0.0015) [2024-01-05 13:02:12,553][00209] Fps is (10 sec: 3686.0, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 4177920. Throughput: 0: 873.6. Samples: 1044892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:02:12,558][00209] Avg episode reward: [(0, '20.778')] [2024-01-05 13:02:17,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 4190208. Throughput: 0: 874.1. Samples: 1047002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:02:17,554][00209] Avg episode reward: [(0, '19.591')] [2024-01-05 13:02:22,552][00209] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3554.6). Total num frames: 4210688. Throughput: 0: 896.7. Samples: 1052280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:02:22,554][00209] Avg episode reward: [(0, '20.737')] [2024-01-05 13:02:23,730][02312] Updated weights for policy 0, policy_version 1030 (0.0014) [2024-01-05 13:02:27,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 4235264. Throughput: 0: 907.1. Samples: 1058934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:02:27,559][00209] Avg episode reward: [(0, '21.220')] [2024-01-05 13:02:32,554][00209] Fps is (10 sec: 3685.6, 60 sec: 3549.7, 300 sec: 3554.5). Total num frames: 4247552. Throughput: 0: 890.2. Samples: 1061472. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:02:32,556][00209] Avg episode reward: [(0, '21.489')] [2024-01-05 13:02:35,912][02312] Updated weights for policy 0, policy_version 1040 (0.0017) [2024-01-05 13:02:37,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 4263936. Throughput: 0: 871.2. Samples: 1065634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:02:37,556][00209] Avg episode reward: [(0, '22.071')] [2024-01-05 13:02:42,552][00209] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4284416. Throughput: 0: 902.6. Samples: 1071210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:02:42,554][00209] Avg episode reward: [(0, '20.972')] [2024-01-05 13:02:46,171][02312] Updated weights for policy 0, policy_version 1050 (0.0024) [2024-01-05 13:02:47,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4304896. Throughput: 0: 907.8. Samples: 1074524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:02:47,558][00209] Avg episode reward: [(0, '22.315')] [2024-01-05 13:02:52,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4321280. Throughput: 0: 886.4. Samples: 1079980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:02:52,558][00209] Avg episode reward: [(0, '21.577')] [2024-01-05 13:02:57,552][00209] Fps is (10 sec: 2867.1, 60 sec: 3550.1, 300 sec: 3540.6). Total num frames: 4333568. Throughput: 0: 872.9. Samples: 1084170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:02:57,560][00209] Avg episode reward: [(0, '21.405')] [2024-01-05 13:02:59,381][02312] Updated weights for policy 0, policy_version 1060 (0.0017) [2024-01-05 13:03:02,552][00209] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 4354048. Throughput: 0: 881.0. Samples: 1086648. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-01-05 13:03:02,560][00209] Avg episode reward: [(0, '20.711')] [2024-01-05 13:03:07,552][00209] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4374528. Throughput: 0: 908.8. Samples: 1093174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:03:07,559][00209] Avg episode reward: [(0, '22.274')] [2024-01-05 13:03:08,805][02312] Updated weights for policy 0, policy_version 1070 (0.0033) [2024-01-05 13:03:12,552][00209] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 4390912. Throughput: 0: 879.1. Samples: 1098492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:03:12,558][00209] Avg episode reward: [(0, '21.758')] [2024-01-05 13:03:17,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.7). Total num frames: 4407296. Throughput: 0: 869.0. Samples: 1100576. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-01-05 13:03:17,555][00209] Avg episode reward: [(0, '22.526')] [2024-01-05 13:03:21,753][02312] Updated weights for policy 0, policy_version 1080 (0.0022) [2024-01-05 13:03:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 4423680. Throughput: 0: 884.7. Samples: 1105444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:03:22,559][00209] Avg episode reward: [(0, '23.284')] [2024-01-05 13:03:27,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4448256. Throughput: 0: 907.8. Samples: 1112060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:03:27,554][00209] Avg episode reward: [(0, '23.028')] [2024-01-05 13:03:32,268][02312] Updated weights for policy 0, policy_version 1090 (0.0015) [2024-01-05 13:03:32,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 4464640. Throughput: 0: 901.0. Samples: 1115070. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:03:32,556][00209] Avg episode reward: [(0, '22.853')] [2024-01-05 13:03:37,554][00209] Fps is (10 sec: 2866.5, 60 sec: 3549.7, 300 sec: 3526.7). Total num frames: 4476928. Throughput: 0: 872.9. Samples: 1119262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:03:37,561][00209] Avg episode reward: [(0, '22.477')] [2024-01-05 13:03:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.6). Total num frames: 4497408. Throughput: 0: 893.6. Samples: 1124382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:03:42,554][00209] Avg episode reward: [(0, '22.384')] [2024-01-05 13:03:44,400][02312] Updated weights for policy 0, policy_version 1100 (0.0012) [2024-01-05 13:03:47,552][00209] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 4517888. Throughput: 0: 911.8. Samples: 1127680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:03:47,557][00209] Avg episode reward: [(0, '23.681')] [2024-01-05 13:03:52,554][00209] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 4534272. Throughput: 0: 902.7. Samples: 1133798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:03:52,562][00209] Avg episode reward: [(0, '23.172')] [2024-01-05 13:03:52,585][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001107_4534272.pth... [2024-01-05 13:03:52,778][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000898_3678208.pth [2024-01-05 13:03:56,003][02312] Updated weights for policy 0, policy_version 1110 (0.0020) [2024-01-05 13:03:57,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 4546560. Throughput: 0: 875.2. Samples: 1137878. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:03:57,558][00209] Avg episode reward: [(0, '22.915')] [2024-01-05 13:04:02,552][00209] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 4567040. Throughput: 0: 875.3. Samples: 1139964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:04:02,557][00209] Avg episode reward: [(0, '22.304')] [2024-01-05 13:04:06,772][02312] Updated weights for policy 0, policy_version 1120 (0.0025) [2024-01-05 13:04:07,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4587520. Throughput: 0: 912.4. Samples: 1146500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:04:07,559][00209] Avg episode reward: [(0, '23.152')] [2024-01-05 13:04:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 4608000. Throughput: 0: 897.0. Samples: 1152424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:04:12,556][00209] Avg episode reward: [(0, '22.365')] [2024-01-05 13:04:17,556][00209] Fps is (10 sec: 3275.3, 60 sec: 3549.6, 300 sec: 3540.6). Total num frames: 4620288. Throughput: 0: 876.7. Samples: 1154524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:04:17,559][00209] Avg episode reward: [(0, '22.822')] [2024-01-05 13:04:19,529][02312] Updated weights for policy 0, policy_version 1130 (0.0012) [2024-01-05 13:04:22,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 4636672. Throughput: 0: 882.4. Samples: 1158966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:04:22,557][00209] Avg episode reward: [(0, '22.935')] [2024-01-05 13:04:27,552][00209] Fps is (10 sec: 4097.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4661248. Throughput: 0: 916.2. Samples: 1165610. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:04:27,554][00209] Avg episode reward: [(0, '24.707')] [2024-01-05 13:04:29,097][02312] Updated weights for policy 0, policy_version 1140 (0.0020) [2024-01-05 13:04:32,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 4677632. Throughput: 0: 916.2. Samples: 1168908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:04:32,557][00209] Avg episode reward: [(0, '23.913')] [2024-01-05 13:04:37,552][00209] Fps is (10 sec: 3276.7, 60 sec: 3618.3, 300 sec: 3540.6). Total num frames: 4694016. Throughput: 0: 877.3. Samples: 1173274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:04:37,559][00209] Avg episode reward: [(0, '22.624')] [2024-01-05 13:04:42,195][02312] Updated weights for policy 0, policy_version 1150 (0.0018) [2024-01-05 13:04:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 4710400. Throughput: 0: 893.2. Samples: 1178072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:04:42,554][00209] Avg episode reward: [(0, '22.003')] [2024-01-05 13:04:47,552][00209] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3568.5). Total num frames: 4730880. Throughput: 0: 920.3. Samples: 1181378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:04:47,554][00209] Avg episode reward: [(0, '21.509')] [2024-01-05 13:04:51,947][02312] Updated weights for policy 0, policy_version 1160 (0.0024) [2024-01-05 13:04:52,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 4751360. Throughput: 0: 919.2. Samples: 1187862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:04:52,554][00209] Avg episode reward: [(0, '21.151')] [2024-01-05 13:04:57,556][00209] Fps is (10 sec: 3275.3, 60 sec: 3617.9, 300 sec: 3540.6). Total num frames: 4763648. Throughput: 0: 879.6. Samples: 1192008. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:04:57,565][00209] Avg episode reward: [(0, '21.644')] [2024-01-05 13:05:02,555][00209] Fps is (10 sec: 2866.2, 60 sec: 3549.7, 300 sec: 3540.7). Total num frames: 4780032. Throughput: 0: 879.1. Samples: 1194082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:05:02,559][00209] Avg episode reward: [(0, '22.328')] [2024-01-05 13:05:04,774][02312] Updated weights for policy 0, policy_version 1170 (0.0033) [2024-01-05 13:05:07,552][00209] Fps is (10 sec: 4097.9, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 4804608. Throughput: 0: 914.9. Samples: 1200136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:05:07,560][00209] Avg episode reward: [(0, '23.698')] [2024-01-05 13:05:12,552][00209] Fps is (10 sec: 4097.5, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4820992. Throughput: 0: 907.6. Samples: 1206450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:05:12,561][00209] Avg episode reward: [(0, '25.029')] [2024-01-05 13:05:15,541][02312] Updated weights for policy 0, policy_version 1180 (0.0030) [2024-01-05 13:05:17,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 4837376. Throughput: 0: 880.7. Samples: 1208538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:05:17,557][00209] Avg episode reward: [(0, '25.487')] [2024-01-05 13:05:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 4853760. Throughput: 0: 877.4. Samples: 1212756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:05:22,555][00209] Avg episode reward: [(0, '25.078')] [2024-01-05 13:05:26,992][02312] Updated weights for policy 0, policy_version 1190 (0.0014) [2024-01-05 13:05:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 4874240. Throughput: 0: 910.3. Samples: 1219036. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:05:27,554][00209] Avg episode reward: [(0, '24.975')] [2024-01-05 13:05:32,555][00209] Fps is (10 sec: 4094.5, 60 sec: 3617.9, 300 sec: 3582.2). Total num frames: 4894720. Throughput: 0: 910.8. Samples: 1222368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:05:32,558][00209] Avg episode reward: [(0, '25.756')] [2024-01-05 13:05:32,570][02299] Saving new best policy, reward=25.756! [2024-01-05 13:05:37,555][00209] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3554.5). Total num frames: 4907008. Throughput: 0: 872.7. Samples: 1227134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:05:37,560][00209] Avg episode reward: [(0, '24.873')] [2024-01-05 13:05:39,377][02312] Updated weights for policy 0, policy_version 1200 (0.0024) [2024-01-05 13:05:42,552][00209] Fps is (10 sec: 2868.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 4923392. Throughput: 0: 876.4. Samples: 1231440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:05:42,559][00209] Avg episode reward: [(0, '24.944')] [2024-01-05 13:05:47,552][00209] Fps is (10 sec: 4097.2, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 4947968. Throughput: 0: 903.6. Samples: 1234742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:05:47,554][00209] Avg episode reward: [(0, '25.035')] [2024-01-05 13:05:49,330][02312] Updated weights for policy 0, policy_version 1210 (0.0026) [2024-01-05 13:05:52,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 4968448. Throughput: 0: 917.6. Samples: 1241426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:05:52,554][00209] Avg episode reward: [(0, '25.485')] [2024-01-05 13:05:52,570][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001213_4968448.pth... 
[2024-01-05 13:05:52,732][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001003_4108288.pth [2024-01-05 13:05:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3568.4). Total num frames: 4980736. Throughput: 0: 877.6. Samples: 1245940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:05:57,557][00209] Avg episode reward: [(0, '26.126')] [2024-01-05 13:05:57,562][02299] Saving new best policy, reward=26.126! [2024-01-05 13:06:02,552][00209] Fps is (10 sec: 2457.6, 60 sec: 3550.1, 300 sec: 3540.6). Total num frames: 4993024. Throughput: 0: 874.8. Samples: 1247906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:06:02,554][00209] Avg episode reward: [(0, '26.908')] [2024-01-05 13:06:02,572][02299] Saving new best policy, reward=26.908! [2024-01-05 13:06:02,826][02312] Updated weights for policy 0, policy_version 1220 (0.0026) [2024-01-05 13:06:07,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5017600. Throughput: 0: 905.5. Samples: 1253502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:06:07,557][00209] Avg episode reward: [(0, '26.471')] [2024-01-05 13:06:12,095][02312] Updated weights for policy 0, policy_version 1230 (0.0018) [2024-01-05 13:06:12,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 5038080. Throughput: 0: 914.0. Samples: 1260168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:06:12,559][00209] Avg episode reward: [(0, '27.567')] [2024-01-05 13:06:12,575][02299] Saving new best policy, reward=27.567! [2024-01-05 13:06:17,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5050368. Throughput: 0: 884.2. Samples: 1262156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:06:17,555][00209] Avg episode reward: [(0, '27.104')] [2024-01-05 13:06:22,554][00209] Fps is (10 sec: 2866.8, 60 sec: 3549.8, 300 sec: 3554.5). 
Total num frames: 5066752. Throughput: 0: 871.1. Samples: 1266332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:06:22,560][00209] Avg episode reward: [(0, '27.165')] [2024-01-05 13:06:25,293][02312] Updated weights for policy 0, policy_version 1240 (0.0022) [2024-01-05 13:06:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5087232. Throughput: 0: 908.4. Samples: 1272316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:06:27,557][00209] Avg episode reward: [(0, '26.012')] [2024-01-05 13:06:32,552][00209] Fps is (10 sec: 4096.7, 60 sec: 3550.1, 300 sec: 3596.1). Total num frames: 5107712. Throughput: 0: 908.1. Samples: 1275608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:06:32,556][00209] Avg episode reward: [(0, '26.406')] [2024-01-05 13:06:35,702][02312] Updated weights for policy 0, policy_version 1250 (0.0026) [2024-01-05 13:06:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 5124096. Throughput: 0: 872.4. Samples: 1280686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:06:37,559][00209] Avg episode reward: [(0, '24.979')] [2024-01-05 13:06:42,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 5136384. Throughput: 0: 865.5. Samples: 1284886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-01-05 13:06:42,554][00209] Avg episode reward: [(0, '25.482')] [2024-01-05 13:06:47,389][02312] Updated weights for policy 0, policy_version 1260 (0.0022) [2024-01-05 13:06:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5160960. Throughput: 0: 889.0. Samples: 1287910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:06:47,554][00209] Avg episode reward: [(0, '25.445')] [2024-01-05 13:06:52,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 5181440. Throughput: 0: 913.9. Samples: 1294626. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:06:52,557][00209] Avg episode reward: [(0, '25.665')] [2024-01-05 13:06:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 5197824. Throughput: 0: 873.6. Samples: 1299482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:06:57,554][00209] Avg episode reward: [(0, '25.236')] [2024-01-05 13:06:58,917][02312] Updated weights for policy 0, policy_version 1270 (0.0039) [2024-01-05 13:07:02,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 5210112. Throughput: 0: 876.7. Samples: 1301606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:07:02,557][00209] Avg episode reward: [(0, '25.735')] [2024-01-05 13:07:07,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5230592. Throughput: 0: 898.6. Samples: 1306766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:07:07,560][00209] Avg episode reward: [(0, '26.389')] [2024-01-05 13:07:09,999][02312] Updated weights for policy 0, policy_version 1280 (0.0018) [2024-01-05 13:07:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 5251072. Throughput: 0: 911.8. Samples: 1313346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:07:12,554][00209] Avg episode reward: [(0, '26.221')] [2024-01-05 13:07:17,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 5267456. Throughput: 0: 896.3. Samples: 1315942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:07:17,560][00209] Avg episode reward: [(0, '25.729')] [2024-01-05 13:07:22,556][00209] Fps is (10 sec: 2865.9, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 5279744. Throughput: 0: 875.7. Samples: 1320098. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:07:22,559][00209] Avg episode reward: [(0, '24.176')] [2024-01-05 13:07:23,013][02312] Updated weights for policy 0, policy_version 1290 (0.0041) [2024-01-05 13:07:27,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5300224. Throughput: 0: 903.2. Samples: 1325532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:07:27,561][00209] Avg episode reward: [(0, '22.284')] [2024-01-05 13:07:32,513][02312] Updated weights for policy 0, policy_version 1300 (0.0020) [2024-01-05 13:07:32,552][00209] Fps is (10 sec: 4507.7, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 5324800. Throughput: 0: 910.0. Samples: 1328860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:07:32,560][00209] Avg episode reward: [(0, '20.017')] [2024-01-05 13:07:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5337088. Throughput: 0: 889.6. Samples: 1334658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:07:37,557][00209] Avg episode reward: [(0, '19.305')] [2024-01-05 13:07:42,554][00209] Fps is (10 sec: 2866.5, 60 sec: 3618.0, 300 sec: 3554.5). Total num frames: 5353472. Throughput: 0: 874.6. Samples: 1338842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:07:42,560][00209] Avg episode reward: [(0, '19.229')] [2024-01-05 13:07:45,559][02312] Updated weights for policy 0, policy_version 1310 (0.0031) [2024-01-05 13:07:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5373952. Throughput: 0: 881.0. Samples: 1341250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:07:47,557][00209] Avg episode reward: [(0, '21.126')] [2024-01-05 13:07:52,552][00209] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 5394432. Throughput: 0: 914.5. Samples: 1347920. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:07:52,555][00209] Avg episode reward: [(0, '22.787')] [2024-01-05 13:07:52,573][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001317_5394432.pth... [2024-01-05 13:07:52,695][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001107_4534272.pth [2024-01-05 13:07:55,373][02312] Updated weights for policy 0, policy_version 1320 (0.0025) [2024-01-05 13:07:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 5410816. Throughput: 0: 887.2. Samples: 1353272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:07:57,560][00209] Avg episode reward: [(0, '24.091')] [2024-01-05 13:08:02,552][00209] Fps is (10 sec: 2867.1, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 5423104. Throughput: 0: 875.5. Samples: 1355340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:08:02,558][00209] Avg episode reward: [(0, '23.826')] [2024-01-05 13:08:07,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5443584. Throughput: 0: 889.8. Samples: 1360136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:08:07,554][00209] Avg episode reward: [(0, '26.056')] [2024-01-05 13:08:08,178][02312] Updated weights for policy 0, policy_version 1330 (0.0015) [2024-01-05 13:08:12,552][00209] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 5464064. Throughput: 0: 916.1. Samples: 1366756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:08:12,554][00209] Avg episode reward: [(0, '25.278')] [2024-01-05 13:08:17,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 5480448. Throughput: 0: 909.5. Samples: 1369788. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:08:17,557][00209] Avg episode reward: [(0, '24.804')] [2024-01-05 13:08:19,199][02312] Updated weights for policy 0, policy_version 1340 (0.0020) [2024-01-05 13:08:22,553][00209] Fps is (10 sec: 3276.5, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 5496832. Throughput: 0: 873.2. Samples: 1373954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:08:22,555][00209] Avg episode reward: [(0, '24.890')] [2024-01-05 13:08:27,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5513216. Throughput: 0: 890.4. Samples: 1378906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:08:27,554][00209] Avg episode reward: [(0, '25.646')] [2024-01-05 13:08:30,615][02312] Updated weights for policy 0, policy_version 1350 (0.0024) [2024-01-05 13:08:32,552][00209] Fps is (10 sec: 4096.4, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 5537792. Throughput: 0: 911.0. Samples: 1382244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:08:32,554][00209] Avg episode reward: [(0, '26.124')] [2024-01-05 13:08:37,554][00209] Fps is (10 sec: 4095.0, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 5554176. Throughput: 0: 900.4. Samples: 1388442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:08:37,565][00209] Avg episode reward: [(0, '25.041')] [2024-01-05 13:08:42,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 5566464. Throughput: 0: 875.9. Samples: 1392686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:08:42,554][00209] Avg episode reward: [(0, '25.785')] [2024-01-05 13:08:42,863][02312] Updated weights for policy 0, policy_version 1360 (0.0023) [2024-01-05 13:08:47,552][00209] Fps is (10 sec: 3277.7, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5586944. Throughput: 0: 876.7. Samples: 1394792. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:08:47,563][00209] Avg episode reward: [(0, '26.292')] [2024-01-05 13:08:52,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 5607424. Throughput: 0: 913.2. Samples: 1401230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:08:52,557][00209] Avg episode reward: [(0, '26.522')] [2024-01-05 13:08:52,956][02312] Updated weights for policy 0, policy_version 1370 (0.0033) [2024-01-05 13:08:57,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 5627904. Throughput: 0: 896.5. Samples: 1407100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:08:57,554][00209] Avg episode reward: [(0, '25.160')] [2024-01-05 13:09:02,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 5640192. Throughput: 0: 875.5. Samples: 1409186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:09:02,555][00209] Avg episode reward: [(0, '24.946')] [2024-01-05 13:09:06,181][02312] Updated weights for policy 0, policy_version 1380 (0.0018) [2024-01-05 13:09:07,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5656576. Throughput: 0: 878.1. Samples: 1413470. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:09:07,554][00209] Avg episode reward: [(0, '25.255')] [2024-01-05 13:09:12,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 5677056. Throughput: 0: 916.2. Samples: 1420136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:09:12,561][00209] Avg episode reward: [(0, '23.504')] [2024-01-05 13:09:15,382][02312] Updated weights for policy 0, policy_version 1390 (0.0014) [2024-01-05 13:09:17,553][00209] Fps is (10 sec: 4095.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 5697536. Throughput: 0: 915.3. Samples: 1423432. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:09:17,560][00209] Avg episode reward: [(0, '24.291')] [2024-01-05 13:09:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5709824. Throughput: 0: 875.2. Samples: 1427826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:09:22,554][00209] Avg episode reward: [(0, '23.705')] [2024-01-05 13:09:27,552][00209] Fps is (10 sec: 2867.5, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5726208. Throughput: 0: 878.2. Samples: 1432206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:09:27,554][00209] Avg episode reward: [(0, '23.592')] [2024-01-05 13:09:28,746][02312] Updated weights for policy 0, policy_version 1400 (0.0039) [2024-01-05 13:09:32,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 5750784. Throughput: 0: 904.9. Samples: 1435514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:09:32,560][00209] Avg episode reward: [(0, '24.746')] [2024-01-05 13:09:37,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3596.2). Total num frames: 5771264. Throughput: 0: 908.4. Samples: 1442108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:09:37,559][00209] Avg episode reward: [(0, '24.476')] [2024-01-05 13:09:38,903][02312] Updated weights for policy 0, policy_version 1410 (0.0028) [2024-01-05 13:09:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 5783552. Throughput: 0: 873.3. Samples: 1446398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:09:42,554][00209] Avg episode reward: [(0, '24.044')] [2024-01-05 13:09:47,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5799936. Throughput: 0: 873.0. Samples: 1448472. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:09:47,562][00209] Avg episode reward: [(0, '23.368')] [2024-01-05 13:09:51,163][02312] Updated weights for policy 0, policy_version 1420 (0.0020) [2024-01-05 13:09:52,555][00209] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3582.3). Total num frames: 5820416. Throughput: 0: 910.3. Samples: 1454438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:09:52,557][00209] Avg episode reward: [(0, '24.668')] [2024-01-05 13:09:52,578][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001421_5820416.pth... [2024-01-05 13:09:52,721][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001213_4968448.pth [2024-01-05 13:09:57,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 5840896. Throughput: 0: 901.9. Samples: 1460722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:09:57,558][00209] Avg episode reward: [(0, '24.562')] [2024-01-05 13:10:02,552][00209] Fps is (10 sec: 3277.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5853184. Throughput: 0: 876.1. Samples: 1462854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:10:02,556][00209] Avg episode reward: [(0, '25.458')] [2024-01-05 13:10:02,874][02312] Updated weights for policy 0, policy_version 1430 (0.0017) [2024-01-05 13:10:07,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5869568. Throughput: 0: 870.9. Samples: 1467016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:10:07,559][00209] Avg episode reward: [(0, '25.918')] [2024-01-05 13:10:12,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 5890048. Throughput: 0: 908.9. Samples: 1473108. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:10:12,554][00209] Avg episode reward: [(0, '26.112')] [2024-01-05 13:10:13,798][02312] Updated weights for policy 0, policy_version 1440 (0.0019) [2024-01-05 13:10:17,553][00209] Fps is (10 sec: 4095.7, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 5910528. Throughput: 0: 908.7. Samples: 1476406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:10:17,558][00209] Avg episode reward: [(0, '25.978')] [2024-01-05 13:10:22,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 5926912. Throughput: 0: 872.4. Samples: 1481368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:10:22,556][00209] Avg episode reward: [(0, '25.536')] [2024-01-05 13:10:26,648][02312] Updated weights for policy 0, policy_version 1450 (0.0038) [2024-01-05 13:10:27,552][00209] Fps is (10 sec: 2867.4, 60 sec: 3549.9, 300 sec: 3540.7). Total num frames: 5939200. Throughput: 0: 869.0. Samples: 1485502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:10:27,554][00209] Avg episode reward: [(0, '25.695')] [2024-01-05 13:10:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 5963776. Throughput: 0: 892.8. Samples: 1488646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:10:32,554][00209] Avg episode reward: [(0, '26.729')] [2024-01-05 13:10:36,195][02312] Updated weights for policy 0, policy_version 1460 (0.0020) [2024-01-05 13:10:37,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 5984256. Throughput: 0: 908.3. Samples: 1495310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:10:37,555][00209] Avg episode reward: [(0, '24.880')] [2024-01-05 13:10:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 5996544. Throughput: 0: 875.4. Samples: 1500114. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:10:42,556][00209] Avg episode reward: [(0, '24.876')] [2024-01-05 13:10:47,554][00209] Fps is (10 sec: 2866.6, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 6012928. Throughput: 0: 874.8. Samples: 1502220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:10:47,560][00209] Avg episode reward: [(0, '25.451')] [2024-01-05 13:10:49,310][02312] Updated weights for policy 0, policy_version 1470 (0.0018) [2024-01-05 13:10:52,552][00209] Fps is (10 sec: 3686.3, 60 sec: 3550.0, 300 sec: 3568.4). Total num frames: 6033408. Throughput: 0: 903.8. Samples: 1507686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:10:52,560][00209] Avg episode reward: [(0, '27.049')] [2024-01-05 13:10:57,552][00209] Fps is (10 sec: 4096.9, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 6053888. Throughput: 0: 914.8. Samples: 1514272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:10:57,554][00209] Avg episode reward: [(0, '26.043')] [2024-01-05 13:10:59,156][02312] Updated weights for policy 0, policy_version 1480 (0.0025) [2024-01-05 13:11:02,552][00209] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 6070272. Throughput: 0: 895.3. Samples: 1516696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:11:02,554][00209] Avg episode reward: [(0, '26.454')] [2024-01-05 13:11:07,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 6082560. Throughput: 0: 877.1. Samples: 1520836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:11:07,554][00209] Avg episode reward: [(0, '26.063')] [2024-01-05 13:11:11,686][02312] Updated weights for policy 0, policy_version 1490 (0.0035) [2024-01-05 13:11:12,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6103040. Throughput: 0: 912.9. Samples: 1526584. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:11:12,561][00209] Avg episode reward: [(0, '24.903')] [2024-01-05 13:11:17,552][00209] Fps is (10 sec: 4505.4, 60 sec: 3618.2, 300 sec: 3596.2). Total num frames: 6127616. Throughput: 0: 917.3. Samples: 1529926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:11:17,555][00209] Avg episode reward: [(0, '24.265')] [2024-01-05 13:11:22,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6139904. Throughput: 0: 890.8. Samples: 1535396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:11:22,554][00209] Avg episode reward: [(0, '22.943')] [2024-01-05 13:11:22,736][02312] Updated weights for policy 0, policy_version 1500 (0.0019) [2024-01-05 13:11:27,556][00209] Fps is (10 sec: 2866.3, 60 sec: 3617.9, 300 sec: 3554.5). Total num frames: 6156288. Throughput: 0: 876.3. Samples: 1539550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:11:27,560][00209] Avg episode reward: [(0, '23.836')] [2024-01-05 13:11:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6176768. Throughput: 0: 887.9. Samples: 1542172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:11:32,557][00209] Avg episode reward: [(0, '24.109')] [2024-01-05 13:11:34,077][02312] Updated weights for policy 0, policy_version 1510 (0.0024) [2024-01-05 13:11:37,552][00209] Fps is (10 sec: 4097.5, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 6197248. Throughput: 0: 914.0. Samples: 1548816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:11:37,562][00209] Avg episode reward: [(0, '25.255')] [2024-01-05 13:11:42,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 6213632. Throughput: 0: 886.0. Samples: 1554144. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:11:42,557][00209] Avg episode reward: [(0, '25.634')] [2024-01-05 13:11:46,199][02312] Updated weights for policy 0, policy_version 1520 (0.0024) [2024-01-05 13:11:47,552][00209] Fps is (10 sec: 2867.0, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 6225920. Throughput: 0: 878.2. Samples: 1556214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:11:47,569][00209] Avg episode reward: [(0, '25.248')] [2024-01-05 13:11:52,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 6246400. Throughput: 0: 897.2. Samples: 1561210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:11:52,554][00209] Avg episode reward: [(0, '25.338')] [2024-01-05 13:11:52,572][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001525_6246400.pth... [2024-01-05 13:11:52,707][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001317_5394432.pth [2024-01-05 13:11:56,590][02312] Updated weights for policy 0, policy_version 1530 (0.0030) [2024-01-05 13:11:57,552][00209] Fps is (10 sec: 4505.9, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 6270976. Throughput: 0: 915.3. Samples: 1567772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:11:57,554][00209] Avg episode reward: [(0, '25.187')] [2024-01-05 13:12:02,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6283264. Throughput: 0: 903.2. Samples: 1570568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:12:02,558][00209] Avg episode reward: [(0, '25.549')] [2024-01-05 13:12:07,556][00209] Fps is (10 sec: 2865.9, 60 sec: 3617.9, 300 sec: 3554.4). Total num frames: 6299648. Throughput: 0: 872.8. Samples: 1574676. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:12:07,559][00209] Avg episode reward: [(0, '25.642')] [2024-01-05 13:12:09,611][02312] Updated weights for policy 0, policy_version 1540 (0.0019) [2024-01-05 13:12:12,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 6320128. Throughput: 0: 900.3. Samples: 1580060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:12:12,557][00209] Avg episode reward: [(0, '25.430')] [2024-01-05 13:12:17,552][00209] Fps is (10 sec: 4097.9, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 6340608. Throughput: 0: 916.2. Samples: 1583402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:12:17,560][00209] Avg episode reward: [(0, '26.701')] [2024-01-05 13:12:19,060][02312] Updated weights for policy 0, policy_version 1550 (0.0018) [2024-01-05 13:12:22,555][00209] Fps is (10 sec: 3685.1, 60 sec: 3617.9, 300 sec: 3582.2). Total num frames: 6356992. Throughput: 0: 896.9. Samples: 1589178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:12:22,558][00209] Avg episode reward: [(0, '26.349')] [2024-01-05 13:12:27,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3550.1, 300 sec: 3540.6). Total num frames: 6369280. Throughput: 0: 873.3. Samples: 1593444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:12:27,556][00209] Avg episode reward: [(0, '24.461')] [2024-01-05 13:12:32,201][02312] Updated weights for policy 0, policy_version 1560 (0.0032) [2024-01-05 13:12:32,552][00209] Fps is (10 sec: 3278.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6389760. Throughput: 0: 874.6. Samples: 1595572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:12:32,555][00209] Avg episode reward: [(0, '23.766')] [2024-01-05 13:12:37,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 6410240. Throughput: 0: 909.9. Samples: 1602154. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:12:37,554][00209] Avg episode reward: [(0, '23.106')]
[2024-01-05 13:12:42,300][02312] Updated weights for policy 0, policy_version 1570 (0.0022)
[2024-01-05 13:12:42,554][00209] Fps is (10 sec: 4095.0, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 6430720. Throughput: 0: 894.3. Samples: 1608016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:12:42,557][00209] Avg episode reward: [(0, '23.798')]
[2024-01-05 13:12:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 6443008. Throughput: 0: 878.1. Samples: 1610084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:12:47,557][00209] Avg episode reward: [(0, '22.682')]
[2024-01-05 13:12:52,552][00209] Fps is (10 sec: 2867.9, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 6459392. Throughput: 0: 884.5. Samples: 1614476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:12:52,554][00209] Avg episode reward: [(0, '22.254')]
[2024-01-05 13:12:54,659][02312] Updated weights for policy 0, policy_version 1580 (0.0012)
[2024-01-05 13:12:57,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 6483968. Throughput: 0: 912.4. Samples: 1621118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:12:57,554][00209] Avg episode reward: [(0, '23.266')]
[2024-01-05 13:13:02,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 6500352. Throughput: 0: 911.6. Samples: 1624422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:13:02,561][00209] Avg episode reward: [(0, '22.905')]
[2024-01-05 13:13:05,998][02312] Updated weights for policy 0, policy_version 1590 (0.0013)
[2024-01-05 13:13:07,561][00209] Fps is (10 sec: 3273.7, 60 sec: 3617.8, 300 sec: 3568.3). Total num frames: 6516736. Throughput: 0: 876.3. Samples: 1628616.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:13:07,573][00209] Avg episode reward: [(0, '23.712')]
[2024-01-05 13:13:12,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 6529024. Throughput: 0: 883.5. Samples: 1633202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:13:12,555][00209] Avg episode reward: [(0, '23.783')]
[2024-01-05 13:13:17,357][02312] Updated weights for policy 0, policy_version 1600 (0.0030)
[2024-01-05 13:13:17,552][00209] Fps is (10 sec: 3689.9, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 6553600. Throughput: 0: 909.2. Samples: 1636484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:13:17,561][00209] Avg episode reward: [(0, '22.583')]
[2024-01-05 13:13:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3550.1, 300 sec: 3582.3). Total num frames: 6569984. Throughput: 0: 900.9. Samples: 1642694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:13:22,554][00209] Avg episode reward: [(0, '21.880')]
[2024-01-05 13:13:27,556][00209] Fps is (10 sec: 3275.5, 60 sec: 3617.9, 300 sec: 3554.4). Total num frames: 6586368. Throughput: 0: 863.5. Samples: 1646874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:13:27,559][00209] Avg episode reward: [(0, '21.431')]
[2024-01-05 13:13:30,399][02312] Updated weights for policy 0, policy_version 1610 (0.0028)
[2024-01-05 13:13:32,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 6598656. Throughput: 0: 863.2. Samples: 1648930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:13:32,554][00209] Avg episode reward: [(0, '22.307')]
[2024-01-05 13:13:37,552][00209] Fps is (10 sec: 3687.9, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 6623232. Throughput: 0: 901.5. Samples: 1655042.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:13:37,557][00209] Avg episode reward: [(0, '22.118')]
[2024-01-05 13:13:40,023][02312] Updated weights for policy 0, policy_version 1620 (0.0022)
[2024-01-05 13:13:42,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 6643712. Throughput: 0: 894.3. Samples: 1661362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:13:42,557][00209] Avg episode reward: [(0, '21.711')]
[2024-01-05 13:13:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 6656000. Throughput: 0: 868.4. Samples: 1663500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:13:47,559][00209] Avg episode reward: [(0, '21.838')]
[2024-01-05 13:13:52,552][00209] Fps is (10 sec: 2867.1, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 6672384. Throughput: 0: 867.6. Samples: 1667650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:13:52,559][00209] Avg episode reward: [(0, '22.106')]
[2024-01-05 13:13:52,573][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001629_6672384.pth...
[2024-01-05 13:13:52,721][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001421_5820416.pth
[2024-01-05 13:13:52,971][02312] Updated weights for policy 0, policy_version 1630 (0.0030)
[2024-01-05 13:13:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 6692864. Throughput: 0: 907.0. Samples: 1674018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:13:57,561][00209] Avg episode reward: [(0, '24.524')]
[2024-01-05 13:14:02,554][00209] Fps is (10 sec: 4095.5, 60 sec: 3549.8, 300 sec: 3582.2). Total num frames: 6713344. Throughput: 0: 906.0. Samples: 1677256.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:14:02,555][00209] Avg episode reward: [(0, '24.490')]
[2024-01-05 13:14:03,070][02312] Updated weights for policy 0, policy_version 1640 (0.0029)
[2024-01-05 13:14:07,552][00209] Fps is (10 sec: 3686.3, 60 sec: 3550.4, 300 sec: 3568.4). Total num frames: 6729728. Throughput: 0: 871.6. Samples: 1681918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:14:07,554][00209] Avg episode reward: [(0, '24.869')]
[2024-01-05 13:14:12,552][00209] Fps is (10 sec: 2867.6, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 6742016. Throughput: 0: 873.7. Samples: 1686186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:14:12,557][00209] Avg episode reward: [(0, '24.642')]
[2024-01-05 13:14:15,608][02312] Updated weights for policy 0, policy_version 1650 (0.0028)
[2024-01-05 13:14:17,552][00209] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 6766592. Throughput: 0: 901.4. Samples: 1689494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:14:17,554][00209] Avg episode reward: [(0, '24.299')]
[2024-01-05 13:14:22,552][00209] Fps is (10 sec: 4505.7, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 6787072. Throughput: 0: 910.4. Samples: 1696012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:14:22,554][00209] Avg episode reward: [(0, '24.816')]
[2024-01-05 13:14:26,745][02312] Updated weights for policy 0, policy_version 1660 (0.0018)
[2024-01-05 13:14:27,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3554.5). Total num frames: 6799360. Throughput: 0: 871.0. Samples: 1700556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:14:27,558][00209] Avg episode reward: [(0, '25.283')]
[2024-01-05 13:14:32,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 6815744. Throughput: 0: 869.8. Samples: 1702642.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:14:32,562][00209] Avg episode reward: [(0, '25.791')]
[2024-01-05 13:14:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6836224. Throughput: 0: 901.5. Samples: 1708218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:14:37,555][00209] Avg episode reward: [(0, '25.761')]
[2024-01-05 13:14:38,114][02312] Updated weights for policy 0, policy_version 1670 (0.0017)
[2024-01-05 13:14:42,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 6856704. Throughput: 0: 905.6. Samples: 1714768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:14:42,554][00209] Avg episode reward: [(0, '27.634')]
[2024-01-05 13:14:42,565][02299] Saving new best policy, reward=27.634!
[2024-01-05 13:14:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 6873088. Throughput: 0: 881.8. Samples: 1716934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:14:47,555][00209] Avg episode reward: [(0, '27.994')]
[2024-01-05 13:14:47,558][02299] Saving new best policy, reward=27.994!
[2024-01-05 13:14:50,541][02312] Updated weights for policy 0, policy_version 1680 (0.0042)
[2024-01-05 13:14:52,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 6885376. Throughput: 0: 870.0. Samples: 1721068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:14:52,554][00209] Avg episode reward: [(0, '28.315')]
[2024-01-05 13:14:52,564][02299] Saving new best policy, reward=28.315!
[2024-01-05 13:14:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6905856. Throughput: 0: 905.7. Samples: 1726944.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:14:57,559][00209] Avg episode reward: [(0, '27.526')]
[2024-01-05 13:15:00,725][02312] Updated weights for policy 0, policy_version 1690 (0.0036)
[2024-01-05 13:15:02,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 6926336. Throughput: 0: 905.5. Samples: 1730240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:15:02,554][00209] Avg episode reward: [(0, '26.787')]
[2024-01-05 13:15:07,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 6942720. Throughput: 0: 874.4. Samples: 1735360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:15:07,554][00209] Avg episode reward: [(0, '26.113')]
[2024-01-05 13:15:12,554][00209] Fps is (10 sec: 2866.7, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 6955008. Throughput: 0: 866.8. Samples: 1739564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:15:12,557][00209] Avg episode reward: [(0, '26.952')]
[2024-01-05 13:15:14,122][02312] Updated weights for policy 0, policy_version 1700 (0.0015)
[2024-01-05 13:15:17,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 6975488. Throughput: 0: 882.8. Samples: 1742368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:15:17,554][00209] Avg episode reward: [(0, '26.350')]
[2024-01-05 13:15:22,552][00209] Fps is (10 sec: 4506.4, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 7000064. Throughput: 0: 903.6. Samples: 1748882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:15:22,560][00209] Avg episode reward: [(0, '28.291')]
[2024-01-05 13:15:23,444][02312] Updated weights for policy 0, policy_version 1710 (0.0014)
[2024-01-05 13:15:27,558][00209] Fps is (10 sec: 3684.0, 60 sec: 3549.5, 300 sec: 3554.4). Total num frames: 7012352. Throughput: 0: 870.9. Samples: 1753966.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:15:27,561][00209] Avg episode reward: [(0, '27.422')]
[2024-01-05 13:15:32,552][00209] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 7024640. Throughput: 0: 869.2. Samples: 1756050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:15:32,554][00209] Avg episode reward: [(0, '25.957')]
[2024-01-05 13:15:36,666][02312] Updated weights for policy 0, policy_version 1720 (0.0012)
[2024-01-05 13:15:37,552][00209] Fps is (10 sec: 3688.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7049216. Throughput: 0: 890.3. Samples: 1761132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:15:37,553][00209] Avg episode reward: [(0, '24.994')]
[2024-01-05 13:15:42,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 7069696. Throughput: 0: 907.5. Samples: 1767780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:15:42,555][00209] Avg episode reward: [(0, '24.644')]
[2024-01-05 13:15:47,113][02312] Updated weights for policy 0, policy_version 1730 (0.0016)
[2024-01-05 13:15:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7086080. Throughput: 0: 895.6. Samples: 1770540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:15:47,557][00209] Avg episode reward: [(0, '24.339')]
[2024-01-05 13:15:52,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 7098368. Throughput: 0: 874.8. Samples: 1774728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:15:52,559][00209] Avg episode reward: [(0, '23.450')]
[2024-01-05 13:15:52,570][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001733_7098368.pth...
[2024-01-05 13:15:52,739][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001525_6246400.pth
[2024-01-05 13:15:57,553][00209] Fps is (10 sec: 3276.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 7118848. Throughput: 0: 899.2. Samples: 1780026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:15:57,555][00209] Avg episode reward: [(0, '24.379')]
[2024-01-05 13:15:59,101][02312] Updated weights for policy 0, policy_version 1740 (0.0022)
[2024-01-05 13:16:02,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 7139328. Throughput: 0: 910.4. Samples: 1783334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:16:02,559][00209] Avg episode reward: [(0, '25.680')]
[2024-01-05 13:16:07,552][00209] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7155712. Throughput: 0: 891.5. Samples: 1789000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:16:07,554][00209] Avg episode reward: [(0, '26.330')]
[2024-01-05 13:16:11,209][02312] Updated weights for policy 0, policy_version 1750 (0.0029)
[2024-01-05 13:16:12,552][00209] Fps is (10 sec: 2867.1, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 7168000. Throughput: 0: 871.1. Samples: 1793160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:16:12,555][00209] Avg episode reward: [(0, '26.311')]
[2024-01-05 13:16:17,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 7188480. Throughput: 0: 874.7. Samples: 1795412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:16:17,560][00209] Avg episode reward: [(0, '25.422')]
[2024-01-05 13:16:21,698][02312] Updated weights for policy 0, policy_version 1760 (0.0014)
[2024-01-05 13:16:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 7208960. Throughput: 0: 908.1. Samples: 1801996.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:16:22,555][00209] Avg episode reward: [(0, '25.518')]
[2024-01-05 13:16:27,554][00209] Fps is (10 sec: 4094.9, 60 sec: 3618.4, 300 sec: 3568.3). Total num frames: 7229440. Throughput: 0: 887.3. Samples: 1807712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:16:27,559][00209] Avg episode reward: [(0, '25.547')]
[2024-01-05 13:16:32,556][00209] Fps is (10 sec: 3275.7, 60 sec: 3617.9, 300 sec: 3540.6). Total num frames: 7241728. Throughput: 0: 873.0. Samples: 1809828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:16:32,563][00209] Avg episode reward: [(0, '24.956')]
[2024-01-05 13:16:34,633][02312] Updated weights for policy 0, policy_version 1770 (0.0020)
[2024-01-05 13:16:37,552][00209] Fps is (10 sec: 2867.9, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 7258112. Throughput: 0: 879.6. Samples: 1814312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:16:37,554][00209] Avg episode reward: [(0, '25.516')]
[2024-01-05 13:16:42,552][00209] Fps is (10 sec: 4097.5, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 7282688. Throughput: 0: 912.4. Samples: 1821082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:16:42,554][00209] Avg episode reward: [(0, '25.867')]
[2024-01-05 13:16:44,073][02312] Updated weights for policy 0, policy_version 1780 (0.0012)
[2024-01-05 13:16:47,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7299072. Throughput: 0: 912.2. Samples: 1824382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:16:47,555][00209] Avg episode reward: [(0, '26.710')]
[2024-01-05 13:16:52,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 7315456. Throughput: 0: 882.5. Samples: 1828714.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:16:52,554][00209] Avg episode reward: [(0, '26.238')]
[2024-01-05 13:16:56,970][02312] Updated weights for policy 0, policy_version 1790 (0.0015)
[2024-01-05 13:16:57,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 7331840. Throughput: 0: 896.1. Samples: 1833484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:16:57,556][00209] Avg episode reward: [(0, '26.673')]
[2024-01-05 13:17:02,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 7356416. Throughput: 0: 920.7. Samples: 1836844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:17:02,558][00209] Avg episode reward: [(0, '27.008')]
[2024-01-05 13:17:06,767][02312] Updated weights for policy 0, policy_version 1800 (0.0027)
[2024-01-05 13:17:07,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 7372800. Throughput: 0: 913.6. Samples: 1843108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:17:07,555][00209] Avg episode reward: [(0, '27.628')]
[2024-01-05 13:17:12,557][00209] Fps is (10 sec: 2865.9, 60 sec: 3617.9, 300 sec: 3540.6). Total num frames: 7385088. Throughput: 0: 880.4. Samples: 1847334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:17:12,558][00209] Avg episode reward: [(0, '27.688')]
[2024-01-05 13:17:17,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.7). Total num frames: 7401472. Throughput: 0: 881.0. Samples: 1849470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:17:17,554][00209] Avg episode reward: [(0, '26.772')]
[2024-01-05 13:17:19,459][02312] Updated weights for policy 0, policy_version 1810 (0.0037)
[2024-01-05 13:17:22,552][00209] Fps is (10 sec: 4097.9, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 7426048. Throughput: 0: 920.0. Samples: 1855710.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:17:22,555][00209] Avg episode reward: [(0, '26.474')]
[2024-01-05 13:17:27,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3582.3). Total num frames: 7446528. Throughput: 0: 905.2. Samples: 1861814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:17:27,555][00209] Avg episode reward: [(0, '24.743')]
[2024-01-05 13:17:30,162][02312] Updated weights for policy 0, policy_version 1820 (0.0022)
[2024-01-05 13:17:32,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 7458816. Throughput: 0: 880.3. Samples: 1863994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:17:32,555][00209] Avg episode reward: [(0, '24.646')]
[2024-01-05 13:17:37,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 7475200. Throughput: 0: 876.0. Samples: 1868132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:17:37,555][00209] Avg episode reward: [(0, '23.621')]
[2024-01-05 13:17:41,800][02312] Updated weights for policy 0, policy_version 1830 (0.0015)
[2024-01-05 13:17:42,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7495680. Throughput: 0: 914.0. Samples: 1874614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:17:42,557][00209] Avg episode reward: [(0, '23.151')]
[2024-01-05 13:17:47,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 7516160. Throughput: 0: 913.2. Samples: 1877938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:17:47,557][00209] Avg episode reward: [(0, '23.842')]
[2024-01-05 13:17:52,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 7532544. Throughput: 0: 880.4. Samples: 1882728.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:17:52,555][00209] Avg episode reward: [(0, '23.774')]
[2024-01-05 13:17:52,563][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001839_7532544.pth...
[2024-01-05 13:17:52,738][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001629_6672384.pth
[2024-01-05 13:17:53,950][02312] Updated weights for policy 0, policy_version 1840 (0.0025)
[2024-01-05 13:17:57,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 7544832. Throughput: 0: 879.8. Samples: 1886922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:17:57,557][00209] Avg episode reward: [(0, '23.641')]
[2024-01-05 13:18:02,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.5). Total num frames: 7569408. Throughput: 0: 906.3. Samples: 1890252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:18:02,554][00209] Avg episode reward: [(0, '25.489')]
[2024-01-05 13:18:04,386][02312] Updated weights for policy 0, policy_version 1850 (0.0019)
[2024-01-05 13:18:07,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 7589888. Throughput: 0: 910.1. Samples: 1896664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:18:07,557][00209] Avg episode reward: [(0, '25.739')]
[2024-01-05 13:18:12,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 7602176. Throughput: 0: 875.4. Samples: 1901208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:18:12,554][00209] Avg episode reward: [(0, '25.408')]
[2024-01-05 13:18:17,552][00209] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 7614464. Throughput: 0: 873.8. Samples: 1903314.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:18:17,556][00209] Avg episode reward: [(0, '24.199')]
[2024-01-05 13:18:17,571][02312] Updated weights for policy 0, policy_version 1860 (0.0012)
[2024-01-05 13:18:22,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7639040. Throughput: 0: 907.2. Samples: 1908958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:18:22,561][00209] Avg episode reward: [(0, '23.739')]
[2024-01-05 13:18:26,983][02312] Updated weights for policy 0, policy_version 1870 (0.0025)
[2024-01-05 13:18:27,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 7659520. Throughput: 0: 908.9. Samples: 1915516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:18:27,565][00209] Avg episode reward: [(0, '23.823')]
[2024-01-05 13:18:32,558][00209] Fps is (10 sec: 3274.6, 60 sec: 3549.5, 300 sec: 3554.4). Total num frames: 7671808. Throughput: 0: 884.1. Samples: 1917730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:18:32,560][00209] Avg episode reward: [(0, '23.677')]
[2024-01-05 13:18:37,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 7688192. Throughput: 0: 868.6. Samples: 1921814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:18:37,559][00209] Avg episode reward: [(0, '23.583')]
[2024-01-05 13:18:40,223][02312] Updated weights for policy 0, policy_version 1880 (0.0036)
[2024-01-05 13:18:42,552][00209] Fps is (10 sec: 3688.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7708672. Throughput: 0: 904.2. Samples: 1927610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:18:42,563][00209] Avg episode reward: [(0, '24.908')]
[2024-01-05 13:18:47,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 7729152. Throughput: 0: 903.2. Samples: 1930894.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:18:47,562][00209] Avg episode reward: [(0, '25.836')]
[2024-01-05 13:18:50,336][02312] Updated weights for policy 0, policy_version 1890 (0.0018)
[2024-01-05 13:18:52,559][00209] Fps is (10 sec: 3683.6, 60 sec: 3549.4, 300 sec: 3568.3). Total num frames: 7745536. Throughput: 0: 880.7. Samples: 1936302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:18:52,561][00209] Avg episode reward: [(0, '25.601')]
[2024-01-05 13:18:57,557][00209] Fps is (10 sec: 2865.6, 60 sec: 3549.5, 300 sec: 3540.6). Total num frames: 7757824. Throughput: 0: 873.4. Samples: 1940518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:18:57,561][00209] Avg episode reward: [(0, '23.987')]
[2024-01-05 13:19:02,552][00209] Fps is (10 sec: 3279.3, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 7778304. Throughput: 0: 887.6. Samples: 1943254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:19:02,560][00209] Avg episode reward: [(0, '24.362')]
[2024-01-05 13:19:02,689][02312] Updated weights for policy 0, policy_version 1900 (0.0013)
[2024-01-05 13:19:07,552][00209] Fps is (10 sec: 4508.1, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 7802880. Throughput: 0: 905.9. Samples: 1949724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:19:07,562][00209] Avg episode reward: [(0, '24.420')]
[2024-01-05 13:19:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 7819264. Throughput: 0: 876.4. Samples: 1954954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:19:12,554][00209] Avg episode reward: [(0, '24.330')]
[2024-01-05 13:19:14,008][02312] Updated weights for policy 0, policy_version 1910 (0.0019)
[2024-01-05 13:19:17,556][00209] Fps is (10 sec: 2865.9, 60 sec: 3617.9, 300 sec: 3540.6). Total num frames: 7831552. Throughput: 0: 874.9. Samples: 1957098.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:19:17,559][00209] Avg episode reward: [(0, '23.780')]
[2024-01-05 13:19:22,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 7852032. Throughput: 0: 898.0. Samples: 1962222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:19:22,563][00209] Avg episode reward: [(0, '25.144')]
[2024-01-05 13:19:25,044][02312] Updated weights for policy 0, policy_version 1920 (0.0017)
[2024-01-05 13:19:27,552][00209] Fps is (10 sec: 4097.9, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 7872512. Throughput: 0: 915.6. Samples: 1968812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:19:27,561][00209] Avg episode reward: [(0, '25.185')]
[2024-01-05 13:19:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.5, 300 sec: 3568.4). Total num frames: 7888896. Throughput: 0: 906.4. Samples: 1971682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:19:32,555][00209] Avg episode reward: [(0, '25.188')]
[2024-01-05 13:19:37,353][02312] Updated weights for policy 0, policy_version 1930 (0.0025)
[2024-01-05 13:19:37,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 7905280. Throughput: 0: 878.2. Samples: 1975814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:19:37,557][00209] Avg episode reward: [(0, '26.095')]
[2024-01-05 13:19:42,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 7921664. Throughput: 0: 905.2. Samples: 1981248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:19:42,554][00209] Avg episode reward: [(0, '25.609')]
[2024-01-05 13:19:47,293][02312] Updated weights for policy 0, policy_version 1940 (0.0013)
[2024-01-05 13:19:47,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 7946240. Throughput: 0: 918.7. Samples: 1984594.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:19:47,554][00209] Avg episode reward: [(0, '24.261')]
[2024-01-05 13:19:52,555][00209] Fps is (10 sec: 4094.9, 60 sec: 3618.4, 300 sec: 3582.2). Total num frames: 7962624. Throughput: 0: 904.8. Samples: 1990444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:19:52,557][00209] Avg episode reward: [(0, '25.511')]
[2024-01-05 13:19:52,568][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001944_7962624.pth...
[2024-01-05 13:19:52,767][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001733_7098368.pth
[2024-01-05 13:19:57,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.5, 300 sec: 3554.5). Total num frames: 7974912. Throughput: 0: 880.9. Samples: 1994594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:19:57,557][00209] Avg episode reward: [(0, '25.797')]
[2024-01-05 13:20:00,440][02312] Updated weights for policy 0, policy_version 1950 (0.0022)
[2024-01-05 13:20:02,552][00209] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 7995392. Throughput: 0: 885.4. Samples: 1996938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:20:02,554][00209] Avg episode reward: [(0, '27.724')]
[2024-01-05 13:20:07,556][00209] Fps is (10 sec: 4094.3, 60 sec: 3549.6, 300 sec: 3596.1). Total num frames: 8015872. Throughput: 0: 915.8. Samples: 2003436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:20:07,561][00209] Avg episode reward: [(0, '28.088')]
[2024-01-05 13:20:09,812][02312] Updated weights for policy 0, policy_version 1960 (0.0033)
[2024-01-05 13:20:12,555][00209] Fps is (10 sec: 3685.4, 60 sec: 3549.7, 300 sec: 3582.2). Total num frames: 8032256. Throughput: 0: 890.8. Samples: 2008902.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:20:12,561][00209] Avg episode reward: [(0, '27.964')]
[2024-01-05 13:20:17,552][00209] Fps is (10 sec: 3278.1, 60 sec: 3618.4, 300 sec: 3554.5). Total num frames: 8048640. Throughput: 0: 873.2. Samples: 2010974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:20:17,553][00209] Avg episode reward: [(0, '28.931')]
[2024-01-05 13:20:17,559][02299] Saving new best policy, reward=28.931!
[2024-01-05 13:20:22,552][00209] Fps is (10 sec: 3277.7, 60 sec: 3549.9, 300 sec: 3568.5). Total num frames: 8065024. Throughput: 0: 887.0. Samples: 2015730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:20:22,557][00209] Avg episode reward: [(0, '29.356')]
[2024-01-05 13:20:22,568][02299] Saving new best policy, reward=29.356!
[2024-01-05 13:20:23,100][02312] Updated weights for policy 0, policy_version 1970 (0.0035)
[2024-01-05 13:20:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 8085504. Throughput: 0: 911.3. Samples: 2022258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:20:27,554][00209] Avg episode reward: [(0, '27.460')]
[2024-01-05 13:20:32,555][00209] Fps is (10 sec: 4094.7, 60 sec: 3617.9, 300 sec: 3582.2). Total num frames: 8105984. Throughput: 0: 906.6. Samples: 2025394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:20:32,558][00209] Avg episode reward: [(0, '26.874')]
[2024-01-05 13:20:33,745][02312] Updated weights for policy 0, policy_version 1980 (0.0019)
[2024-01-05 13:20:37,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 8118272. Throughput: 0: 872.1. Samples: 2029688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:20:37,558][00209] Avg episode reward: [(0, '26.251')]
[2024-01-05 13:20:42,552][00209] Fps is (10 sec: 2868.1, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 8134656. Throughput: 0: 889.8. Samples: 2034634.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:20:42,560][00209] Avg episode reward: [(0, '25.373')]
[2024-01-05 13:20:45,255][02312] Updated weights for policy 0, policy_version 1990 (0.0024)
[2024-01-05 13:20:47,552][00209] Fps is (10 sec: 4095.8, 60 sec: 3549.8, 300 sec: 3596.1). Total num frames: 8159232. Throughput: 0: 911.5. Samples: 2037958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:20:47,555][00209] Avg episode reward: [(0, '24.544')]
[2024-01-05 13:20:52,560][00209] Fps is (10 sec: 4092.5, 60 sec: 3549.5, 300 sec: 3582.2). Total num frames: 8175616. Throughput: 0: 908.1. Samples: 2044306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:20:52,563][00209] Avg episode reward: [(0, '25.177')]
[2024-01-05 13:20:57,152][02312] Updated weights for policy 0, policy_version 2000 (0.0037)
[2024-01-05 13:20:57,558][00209] Fps is (10 sec: 3275.1, 60 sec: 3617.8, 300 sec: 3568.3). Total num frames: 8192000. Throughput: 0: 879.1. Samples: 2048462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:20:57,563][00209] Avg episode reward: [(0, '25.650')]
[2024-01-05 13:21:02,554][00209] Fps is (10 sec: 3278.8, 60 sec: 3549.7, 300 sec: 3568.3). Total num frames: 8208384. Throughput: 0: 879.1. Samples: 2050534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:21:02,557][00209] Avg episode reward: [(0, '26.323')]
[2024-01-05 13:21:07,552][00209] Fps is (10 sec: 3688.5, 60 sec: 3550.1, 300 sec: 3596.2). Total num frames: 8228864. Throughput: 0: 911.4. Samples: 2056742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:21:07,554][00209] Avg episode reward: [(0, '27.003')]
[2024-01-05 13:21:07,877][02312] Updated weights for policy 0, policy_version 2010 (0.0014)
[2024-01-05 13:21:12,552][00209] Fps is (10 sec: 4097.0, 60 sec: 3618.3, 300 sec: 3596.1). Total num frames: 8249344. Throughput: 0: 901.6. Samples: 2062832.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:21:12,557][00209] Avg episode reward: [(0, '26.933')] [2024-01-05 13:21:17,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 8261632. Throughput: 0: 877.3. Samples: 2064870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:21:17,555][00209] Avg episode reward: [(0, '26.778')] [2024-01-05 13:21:20,739][02312] Updated weights for policy 0, policy_version 2020 (0.0034) [2024-01-05 13:21:22,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 8278016. Throughput: 0: 875.4. Samples: 2069082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:21:22,562][00209] Avg episode reward: [(0, '27.805')] [2024-01-05 13:21:27,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 8302592. Throughput: 0: 909.4. Samples: 2075558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:21:27,554][00209] Avg episode reward: [(0, '26.363')] [2024-01-05 13:21:30,319][02312] Updated weights for policy 0, policy_version 2030 (0.0022) [2024-01-05 13:21:32,552][00209] Fps is (10 sec: 4095.9, 60 sec: 3550.0, 300 sec: 3596.1). Total num frames: 8318976. Throughput: 0: 910.7. Samples: 2078938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:21:32,560][00209] Avg episode reward: [(0, '26.186')] [2024-01-05 13:21:37,552][00209] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 8335360. Throughput: 0: 872.2. Samples: 2083546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:21:37,557][00209] Avg episode reward: [(0, '26.236')] [2024-01-05 13:21:42,552][00209] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 8351744. Throughput: 0: 876.1. Samples: 2087880. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:21:42,554][00209] Avg episode reward: [(0, '26.499')] [2024-01-05 13:21:43,408][02312] Updated weights for policy 0, policy_version 2040 (0.0012) [2024-01-05 13:21:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 8372224. Throughput: 0: 904.2. Samples: 2091222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:21:47,554][00209] Avg episode reward: [(0, '27.225')] [2024-01-05 13:21:52,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.6, 300 sec: 3596.1). Total num frames: 8392704. Throughput: 0: 914.9. Samples: 2097914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:21:52,557][00209] Avg episode reward: [(0, '27.821')] [2024-01-05 13:21:52,581][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002049_8392704.pth... [2024-01-05 13:21:52,723][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001839_7532544.pth [2024-01-05 13:21:53,381][02312] Updated weights for policy 0, policy_version 2050 (0.0015) [2024-01-05 13:21:57,553][00209] Fps is (10 sec: 3276.4, 60 sec: 3550.1, 300 sec: 3554.5). Total num frames: 8404992. Throughput: 0: 875.7. Samples: 2102238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:21:57,555][00209] Avg episode reward: [(0, '27.646')] [2024-01-05 13:22:02,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 8421376. Throughput: 0: 877.6. Samples: 2104364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:22:02,554][00209] Avg episode reward: [(0, '27.533')] [2024-01-05 13:22:05,983][02312] Updated weights for policy 0, policy_version 2060 (0.0025) [2024-01-05 13:22:07,552][00209] Fps is (10 sec: 3686.8, 60 sec: 3549.8, 300 sec: 3582.3). Total num frames: 8441856. Throughput: 0: 909.9. Samples: 2110030. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:22:07,555][00209] Avg episode reward: [(0, '27.522')] [2024-01-05 13:22:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 8462336. Throughput: 0: 912.9. Samples: 2116640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:22:12,556][00209] Avg episode reward: [(0, '27.043')] [2024-01-05 13:22:17,147][02312] Updated weights for policy 0, policy_version 2070 (0.0022) [2024-01-05 13:22:17,552][00209] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 8478720. Throughput: 0: 883.2. Samples: 2118684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:22:17,560][00209] Avg episode reward: [(0, '27.287')] [2024-01-05 13:22:22,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 8491008. Throughput: 0: 874.6. Samples: 2122902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:22:22,556][00209] Avg episode reward: [(0, '26.868')] [2024-01-05 13:22:27,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 8515584. Throughput: 0: 913.8. Samples: 2129002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:22:27,556][00209] Avg episode reward: [(0, '27.211')] [2024-01-05 13:22:28,421][02312] Updated weights for policy 0, policy_version 2080 (0.0029) [2024-01-05 13:22:32,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 8536064. Throughput: 0: 912.1. Samples: 2132266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:22:32,554][00209] Avg episode reward: [(0, '27.707')] [2024-01-05 13:22:37,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 8548352. Throughput: 0: 873.1. Samples: 2137204. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:22:37,554][00209] Avg episode reward: [(0, '27.699')] [2024-01-05 13:22:40,980][02312] Updated weights for policy 0, policy_version 2090 (0.0026) [2024-01-05 13:22:42,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 8564736. Throughput: 0: 872.4. Samples: 2141496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:22:42,554][00209] Avg episode reward: [(0, '27.177')] [2024-01-05 13:22:47,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 8585216. Throughput: 0: 892.0. Samples: 2144502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:22:47,554][00209] Avg episode reward: [(0, '27.751')] [2024-01-05 13:22:50,913][02312] Updated weights for policy 0, policy_version 2100 (0.0020) [2024-01-05 13:22:52,552][00209] Fps is (10 sec: 4095.8, 60 sec: 3549.8, 300 sec: 3596.1). Total num frames: 8605696. Throughput: 0: 911.4. Samples: 2151042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:22:52,555][00209] Avg episode reward: [(0, '27.407')] [2024-01-05 13:22:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 8622080. Throughput: 0: 872.9. Samples: 2155920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:22:57,559][00209] Avg episode reward: [(0, '26.970')] [2024-01-05 13:23:02,552][00209] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 8634368. Throughput: 0: 874.5. Samples: 2158038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:23:02,560][00209] Avg episode reward: [(0, '25.845')] [2024-01-05 13:23:04,043][02312] Updated weights for policy 0, policy_version 2110 (0.0037) [2024-01-05 13:23:07,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 8654848. Throughput: 0: 899.0. Samples: 2163356. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:23:07,554][00209] Avg episode reward: [(0, '25.982')] [2024-01-05 13:23:12,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 8679424. Throughput: 0: 912.4. Samples: 2170058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:23:12,555][00209] Avg episode reward: [(0, '26.497')] [2024-01-05 13:23:13,397][02312] Updated weights for policy 0, policy_version 2120 (0.0020) [2024-01-05 13:23:17,553][00209] Fps is (10 sec: 3685.9, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 8691712. Throughput: 0: 894.2. Samples: 2172508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:23:17,556][00209] Avg episode reward: [(0, '25.985')] [2024-01-05 13:23:22,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 8708096. Throughput: 0: 878.4. Samples: 2176732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:23:22,555][00209] Avg episode reward: [(0, '25.848')] [2024-01-05 13:23:26,512][02312] Updated weights for policy 0, policy_version 2130 (0.0019) [2024-01-05 13:23:27,552][00209] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 8728576. Throughput: 0: 906.4. Samples: 2182286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:23:27,562][00209] Avg episode reward: [(0, '25.683')] [2024-01-05 13:23:32,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 8749056. Throughput: 0: 913.8. Samples: 2185624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:23:32,554][00209] Avg episode reward: [(0, '25.195')] [2024-01-05 13:23:37,012][02312] Updated weights for policy 0, policy_version 2140 (0.0026) [2024-01-05 13:23:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 8765440. Throughput: 0: 893.6. Samples: 2191254. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:23:37,562][00209] Avg episode reward: [(0, '25.745')] [2024-01-05 13:23:42,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 8777728. Throughput: 0: 876.4. Samples: 2195360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:23:42,555][00209] Avg episode reward: [(0, '26.712')] [2024-01-05 13:23:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.5). Total num frames: 8798208. Throughput: 0: 883.4. Samples: 2197792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:23:47,554][00209] Avg episode reward: [(0, '26.879')] [2024-01-05 13:23:49,036][02312] Updated weights for policy 0, policy_version 2150 (0.0014) [2024-01-05 13:23:52,554][00209] Fps is (10 sec: 4095.1, 60 sec: 3549.8, 300 sec: 3596.2). Total num frames: 8818688. Throughput: 0: 912.5. Samples: 2204422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:23:52,557][00209] Avg episode reward: [(0, '27.647')] [2024-01-05 13:23:52,567][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002153_8818688.pth... [2024-01-05 13:23:52,697][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001944_7962624.pth [2024-01-05 13:23:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 8835072. Throughput: 0: 882.2. Samples: 2209758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:23:57,558][00209] Avg episode reward: [(0, '28.888')] [2024-01-05 13:24:00,923][02312] Updated weights for policy 0, policy_version 2160 (0.0029) [2024-01-05 13:24:02,552][00209] Fps is (10 sec: 3277.5, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 8851456. Throughput: 0: 874.3. Samples: 2211848. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:24:02,560][00209] Avg episode reward: [(0, '27.992')] [2024-01-05 13:24:07,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 8867840. Throughput: 0: 884.4. Samples: 2216528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:24:07,562][00209] Avg episode reward: [(0, '28.661')] [2024-01-05 13:24:11,508][02312] Updated weights for policy 0, policy_version 2170 (0.0031) [2024-01-05 13:24:12,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 8892416. Throughput: 0: 907.8. Samples: 2223138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:24:12,554][00209] Avg episode reward: [(0, '28.995')] [2024-01-05 13:24:17,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 8908800. Throughput: 0: 904.1. Samples: 2226308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:24:17,559][00209] Avg episode reward: [(0, '28.501')] [2024-01-05 13:24:22,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 8921088. Throughput: 0: 871.4. Samples: 2230468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:24:22,560][00209] Avg episode reward: [(0, '27.834')] [2024-01-05 13:24:24,468][02312] Updated weights for policy 0, policy_version 2180 (0.0021) [2024-01-05 13:24:27,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 8941568. Throughput: 0: 889.5. Samples: 2235386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:24:27,554][00209] Avg episode reward: [(0, '28.160')] [2024-01-05 13:24:32,555][00209] Fps is (10 sec: 4094.6, 60 sec: 3549.7, 300 sec: 3582.2). Total num frames: 8962048. Throughput: 0: 910.0. Samples: 2238744. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:24:32,565][00209] Avg episode reward: [(0, '25.820')] [2024-01-05 13:24:33,998][02312] Updated weights for policy 0, policy_version 2190 (0.0025) [2024-01-05 13:24:37,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 8978432. Throughput: 0: 901.9. Samples: 2245004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:24:37,559][00209] Avg episode reward: [(0, '27.332')] [2024-01-05 13:24:42,552][00209] Fps is (10 sec: 3277.9, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 8994816. Throughput: 0: 875.4. Samples: 2249150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:24:42,558][00209] Avg episode reward: [(0, '27.292')] [2024-01-05 13:24:47,083][02312] Updated weights for policy 0, policy_version 2200 (0.0016) [2024-01-05 13:24:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 9011200. Throughput: 0: 875.3. Samples: 2251236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:24:47,554][00209] Avg episode reward: [(0, '27.797')] [2024-01-05 13:24:52,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 9031680. Throughput: 0: 915.2. Samples: 2257712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:24:52,554][00209] Avg episode reward: [(0, '28.518')] [2024-01-05 13:24:56,972][02312] Updated weights for policy 0, policy_version 2210 (0.0027) [2024-01-05 13:24:57,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 9052160. Throughput: 0: 899.7. Samples: 2263626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:24:57,561][00209] Avg episode reward: [(0, '28.528')] [2024-01-05 13:25:02,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 9068544. Throughput: 0: 877.3. Samples: 2265788. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:25:02,562][00209] Avg episode reward: [(0, '28.157')] [2024-01-05 13:25:07,554][00209] Fps is (10 sec: 2866.6, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 9080832. Throughput: 0: 872.6. Samples: 2269738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:25:07,556][00209] Avg episode reward: [(0, '29.964')] [2024-01-05 13:25:07,558][02299] Saving new best policy, reward=29.964! [2024-01-05 13:25:10,010][02312] Updated weights for policy 0, policy_version 2220 (0.0018) [2024-01-05 13:25:12,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 9101312. Throughput: 0: 904.0. Samples: 2276068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:25:12,557][00209] Avg episode reward: [(0, '28.948')] [2024-01-05 13:25:17,552][00209] Fps is (10 sec: 4096.8, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 9121792. Throughput: 0: 904.1. Samples: 2279426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:25:17,554][00209] Avg episode reward: [(0, '28.896')] [2024-01-05 13:25:21,046][02312] Updated weights for policy 0, policy_version 2230 (0.0039) [2024-01-05 13:25:22,552][00209] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 9138176. Throughput: 0: 869.8. Samples: 2284146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:25:22,555][00209] Avg episode reward: [(0, '28.841')] [2024-01-05 13:25:27,553][00209] Fps is (10 sec: 2867.0, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 9150464. Throughput: 0: 871.2. Samples: 2288356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:25:27,555][00209] Avg episode reward: [(0, '27.894')] [2024-01-05 13:25:32,396][02312] Updated weights for policy 0, policy_version 2240 (0.0030) [2024-01-05 13:25:32,552][00209] Fps is (10 sec: 3686.5, 60 sec: 3550.1, 300 sec: 3582.3). Total num frames: 9175040. Throughput: 0: 898.0. Samples: 2291648. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:25:32,554][00209] Avg episode reward: [(0, '25.527')] [2024-01-05 13:25:37,552][00209] Fps is (10 sec: 4505.9, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 9195520. Throughput: 0: 904.0. Samples: 2298390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:25:37,554][00209] Avg episode reward: [(0, '25.118')] [2024-01-05 13:25:42,554][00209] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 9207808. Throughput: 0: 873.3. Samples: 2302926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:25:42,556][00209] Avg episode reward: [(0, '25.433')] [2024-01-05 13:25:44,163][02312] Updated weights for policy 0, policy_version 2250 (0.0013) [2024-01-05 13:25:47,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.6). Total num frames: 9224192. Throughput: 0: 873.7. Samples: 2305104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:25:47,562][00209] Avg episode reward: [(0, '25.226')] [2024-01-05 13:25:52,552][00209] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 9244672. Throughput: 0: 911.0. Samples: 2310730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:25:52,560][00209] Avg episode reward: [(0, '23.915')] [2024-01-05 13:25:52,574][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002257_9244672.pth... [2024-01-05 13:25:52,716][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002049_8392704.pth [2024-01-05 13:25:54,645][02312] Updated weights for policy 0, policy_version 2260 (0.0025) [2024-01-05 13:25:57,554][00209] Fps is (10 sec: 4094.9, 60 sec: 3549.7, 300 sec: 3582.3). Total num frames: 9265152. Throughput: 0: 914.1. Samples: 2317206. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:25:57,557][00209] Avg episode reward: [(0, '24.234')] [2024-01-05 13:26:02,552][00209] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 9281536. Throughput: 0: 888.8. Samples: 2319424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:26:02,555][00209] Avg episode reward: [(0, '24.500')] [2024-01-05 13:26:07,552][00209] Fps is (10 sec: 2867.9, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 9293824. Throughput: 0: 876.6. Samples: 2323592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:26:07,557][00209] Avg episode reward: [(0, '24.351')] [2024-01-05 13:26:07,981][02312] Updated weights for policy 0, policy_version 2270 (0.0024) [2024-01-05 13:26:12,552][00209] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 9314304. Throughput: 0: 912.3. Samples: 2329410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:26:12,554][00209] Avg episode reward: [(0, '23.841')] [2024-01-05 13:26:17,342][02312] Updated weights for policy 0, policy_version 2280 (0.0013) [2024-01-05 13:26:17,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 9338880. Throughput: 0: 912.7. Samples: 2332720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:26:17,560][00209] Avg episode reward: [(0, '25.025')] [2024-01-05 13:26:22,556][00209] Fps is (10 sec: 3684.7, 60 sec: 3549.6, 300 sec: 3554.4). Total num frames: 9351168. Throughput: 0: 881.1. Samples: 2338042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:26:22,567][00209] Avg episode reward: [(0, '26.177')] [2024-01-05 13:26:27,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 9367552. Throughput: 0: 871.6. Samples: 2342146. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:26:27,554][00209] Avg episode reward: [(0, '26.467')] [2024-01-05 13:26:30,530][02312] Updated weights for policy 0, policy_version 2290 (0.0026) [2024-01-05 13:26:32,552][00209] Fps is (10 sec: 3688.1, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 9388032. Throughput: 0: 885.4. Samples: 2344946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:26:32,554][00209] Avg episode reward: [(0, '26.139')] [2024-01-05 13:26:37,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 9408512. Throughput: 0: 909.0. Samples: 2351636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:26:37,560][00209] Avg episode reward: [(0, '27.198')] [2024-01-05 13:26:40,772][02312] Updated weights for policy 0, policy_version 2300 (0.0037) [2024-01-05 13:26:42,555][00209] Fps is (10 sec: 3685.2, 60 sec: 3618.0, 300 sec: 3568.3). Total num frames: 9424896. Throughput: 0: 875.7. Samples: 2356614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:26:42,558][00209] Avg episode reward: [(0, '28.063')] [2024-01-05 13:26:47,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 9437184. Throughput: 0: 873.8. Samples: 2358746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:26:47,560][00209] Avg episode reward: [(0, '26.168')] [2024-01-05 13:26:52,552][00209] Fps is (10 sec: 3277.9, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 9457664. Throughput: 0: 898.8. Samples: 2364038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:26:52,560][00209] Avg episode reward: [(0, '26.691')] [2024-01-05 13:26:52,698][02312] Updated weights for policy 0, policy_version 2310 (0.0015) [2024-01-05 13:26:57,552][00209] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3596.1). Total num frames: 9482240. Throughput: 0: 915.1. Samples: 2370590. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:26:57,557][00209] Avg episode reward: [(0, '26.737')] [2024-01-05 13:27:02,553][00209] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 9494528. Throughput: 0: 900.0. Samples: 2373222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:27:02,557][00209] Avg episode reward: [(0, '27.745')] [2024-01-05 13:27:04,535][02312] Updated weights for policy 0, policy_version 2320 (0.0018) [2024-01-05 13:27:07,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 9510912. Throughput: 0: 870.8. Samples: 2377226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:27:07,558][00209] Avg episode reward: [(0, '30.036')] [2024-01-05 13:27:07,566][02299] Saving new best policy, reward=30.036! [2024-01-05 13:27:12,552][00209] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 9527296. Throughput: 0: 901.1. Samples: 2382696. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:27:12,554][00209] Avg episode reward: [(0, '29.436')] [2024-01-05 13:27:15,388][02312] Updated weights for policy 0, policy_version 2330 (0.0020) [2024-01-05 13:27:17,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 9551872. Throughput: 0: 913.4. Samples: 2386050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:27:17,560][00209] Avg episode reward: [(0, '29.808')] [2024-01-05 13:27:22,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3554.5). Total num frames: 9564160. Throughput: 0: 889.7. Samples: 2391674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:27:22,555][00209] Avg episode reward: [(0, '28.781')] [2024-01-05 13:27:27,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 9580544. Throughput: 0: 871.3. Samples: 2395820. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:27:27,557][00209] Avg episode reward: [(0, '29.374')] [2024-01-05 13:27:28,348][02312] Updated weights for policy 0, policy_version 2340 (0.0027) [2024-01-05 13:27:32,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 9601024. Throughput: 0: 879.0. Samples: 2398300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:27:32,559][00209] Avg episode reward: [(0, '29.517')] [2024-01-05 13:27:37,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 9621504. Throughput: 0: 906.6. Samples: 2404834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:27:37,561][00209] Avg episode reward: [(0, '28.778')] [2024-01-05 13:27:37,769][02312] Updated weights for policy 0, policy_version 2350 (0.0032) [2024-01-05 13:27:42,555][00209] Fps is (10 sec: 3685.4, 60 sec: 3549.9, 300 sec: 3568.3). Total num frames: 9637888. Throughput: 0: 881.2. Samples: 2410248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:27:42,559][00209] Avg episode reward: [(0, '28.656')] [2024-01-05 13:27:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 9654272. Throughput: 0: 869.6. Samples: 2412352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:27:47,554][00209] Avg episode reward: [(0, '28.287')] [2024-01-05 13:27:51,008][02312] Updated weights for policy 0, policy_version 2360 (0.0021) [2024-01-05 13:27:52,552][00209] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 9670656. Throughput: 0: 886.0. Samples: 2417094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:27:52,561][00209] Avg episode reward: [(0, '28.567')] [2024-01-05 13:27:52,575][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002361_9670656.pth... 
[2024-01-05 13:27:52,715][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002153_8818688.pth [2024-01-05 13:27:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 9691136. Throughput: 0: 908.8. Samples: 2423594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:27:57,557][00209] Avg episode reward: [(0, '30.400')] [2024-01-05 13:27:57,560][02299] Saving new best policy, reward=30.400! [2024-01-05 13:28:01,046][02312] Updated weights for policy 0, policy_version 2370 (0.0021) [2024-01-05 13:28:02,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3568.4). Total num frames: 9707520. Throughput: 0: 903.7. Samples: 2426716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:28:02,557][00209] Avg episode reward: [(0, '30.124')] [2024-01-05 13:28:07,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 9723904. Throughput: 0: 871.0. Samples: 2430870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:28:07,557][00209] Avg episode reward: [(0, '31.566')] [2024-01-05 13:28:07,571][02299] Saving new best policy, reward=31.566! [2024-01-05 13:28:12,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 9740288. Throughput: 0: 885.7. Samples: 2435676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:28:12,554][00209] Avg episode reward: [(0, '30.616')] [2024-01-05 13:28:13,716][02312] Updated weights for policy 0, policy_version 2380 (0.0014) [2024-01-05 13:28:17,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 9764864. Throughput: 0: 904.8. Samples: 2439018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:28:17,554][00209] Avg episode reward: [(0, '30.279')] [2024-01-05 13:28:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 9781248. Throughput: 0: 898.8. Samples: 2445282. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:28:22,558][00209] Avg episode reward: [(0, '30.881')]
[2024-01-05 13:28:24,916][02312] Updated weights for policy 0, policy_version 2390 (0.0016)
[2024-01-05 13:28:27,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 9793536. Throughput: 0: 870.4. Samples: 2449412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:28:27,554][00209] Avg episode reward: [(0, '31.243')]
[2024-01-05 13:28:32,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 9809920. Throughput: 0: 869.2. Samples: 2451464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:28:32,555][00209] Avg episode reward: [(0, '30.651')]
[2024-01-05 13:28:36,338][02312] Updated weights for policy 0, policy_version 2400 (0.0021)
[2024-01-05 13:28:37,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 9834496. Throughput: 0: 905.0. Samples: 2457818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:28:37,558][00209] Avg episode reward: [(0, '31.523')]
[2024-01-05 13:28:42,556][00209] Fps is (10 sec: 4094.1, 60 sec: 3549.8, 300 sec: 3568.3). Total num frames: 9850880. Throughput: 0: 890.8. Samples: 2463686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:28:42,563][00209] Avg episode reward: [(0, '28.861')]
[2024-01-05 13:28:47,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 9867264. Throughput: 0: 868.5. Samples: 2465800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:28:47,555][00209] Avg episode reward: [(0, '29.401')]
[2024-01-05 13:28:48,718][02312] Updated weights for policy 0, policy_version 2410 (0.0018)
[2024-01-05 13:28:52,552][00209] Fps is (10 sec: 3278.2, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 9883648. Throughput: 0: 869.3. Samples: 2469988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:28:52,555][00209] Avg episode reward: [(0, '29.318')]
[2024-01-05 13:28:57,552][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 9904128. Throughput: 0: 907.2. Samples: 2476500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:28:57,561][00209] Avg episode reward: [(0, '27.895')]
[2024-01-05 13:28:59,102][02312] Updated weights for policy 0, policy_version 2420 (0.0019)
[2024-01-05 13:29:02,552][00209] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 9924608. Throughput: 0: 906.6. Samples: 2479816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-01-05 13:29:02,557][00209] Avg episode reward: [(0, '27.925')]
[2024-01-05 13:29:07,552][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 9936896. Throughput: 0: 869.6. Samples: 2484412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:29:07,560][00209] Avg episode reward: [(0, '26.357')]
[2024-01-05 13:29:12,151][02312] Updated weights for policy 0, policy_version 2430 (0.0030)
[2024-01-05 13:29:12,552][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 9953280. Throughput: 0: 874.7. Samples: 2488774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-01-05 13:29:12,559][00209] Avg episode reward: [(0, '25.640')]
[2024-01-05 13:29:17,552][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 9973760. Throughput: 0: 902.3. Samples: 2492066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:29:17,564][00209] Avg episode reward: [(0, '27.282')]
[2024-01-05 13:29:21,385][02312] Updated weights for policy 0, policy_version 2440 (0.0023)
[2024-01-05 13:29:22,552][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 9994240. Throughput: 0: 908.8. Samples: 2498714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:29:22,559][00209] Avg episode reward: [(0, '27.515')]
[2024-01-05 13:29:25,699][02299] Stopping Batcher_0...
[2024-01-05 13:29:25,699][02299] Loop batcher_evt_loop terminating...
[2024-01-05 13:29:25,700][00209] Component Batcher_0 stopped!
[2024-01-05 13:29:25,703][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-01-05 13:29:25,817][02312] Weights refcount: 2 0
[2024-01-05 13:29:25,821][00209] Component RolloutWorker_w1 stopped!
[2024-01-05 13:29:25,825][00209] Component InferenceWorker_p0-w0 stopped!
[2024-01-05 13:29:25,822][02312] Stopping InferenceWorker_p0-w0...
[2024-01-05 13:29:25,829][00209] Component RolloutWorker_w7 stopped!
[2024-01-05 13:29:25,830][02312] Loop inference_proc0-0_evt_loop terminating...
[2024-01-05 13:29:25,831][02322] Stopping RolloutWorker_w7...
[2024-01-05 13:29:25,832][02322] Loop rollout_proc7_evt_loop terminating...
[2024-01-05 13:29:25,836][02314] Stopping RolloutWorker_w1...
[2024-01-05 13:29:25,837][02314] Loop rollout_proc1_evt_loop terminating...
[2024-01-05 13:29:25,842][02318] Stopping RolloutWorker_w6...
[2024-01-05 13:29:25,843][02318] Loop rollout_proc6_evt_loop terminating...
[2024-01-05 13:29:25,842][00209] Component RolloutWorker_w6 stopped!
[2024-01-05 13:29:25,860][02315] Stopping RolloutWorker_w2...
[2024-01-05 13:29:25,862][00209] Component RolloutWorker_w2 stopped!
[2024-01-05 13:29:25,864][02315] Loop rollout_proc2_evt_loop terminating...
[2024-01-05 13:29:25,866][00209] Component RolloutWorker_w5 stopped!
[2024-01-05 13:29:25,868][02316] Stopping RolloutWorker_w5...
[2024-01-05 13:29:25,873][02316] Loop rollout_proc5_evt_loop terminating...
[2024-01-05 13:29:25,881][02319] Stopping RolloutWorker_w4...
[2024-01-05 13:29:25,881][00209] Component RolloutWorker_w4 stopped!
[2024-01-05 13:29:25,888][02319] Loop rollout_proc4_evt_loop terminating...
[2024-01-05 13:29:25,888][00209] Component RolloutWorker_w0 stopped!
[2024-01-05 13:29:25,894][02313] Stopping RolloutWorker_w0...
[2024-01-05 13:29:25,895][02313] Loop rollout_proc0_evt_loop terminating...
[2024-01-05 13:29:25,906][00209] Component RolloutWorker_w3 stopped!
[2024-01-05 13:29:25,910][02317] Stopping RolloutWorker_w3...
[2024-01-05 13:29:25,911][02317] Loop rollout_proc3_evt_loop terminating...
[2024-01-05 13:29:25,970][02299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002257_9244672.pth
[2024-01-05 13:29:25,987][02299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-01-05 13:29:26,188][00209] Component LearnerWorker_p0 stopped!
[2024-01-05 13:29:26,191][00209] Waiting for process learner_proc0 to stop...
[2024-01-05 13:29:26,196][02299] Stopping LearnerWorker_p0...
[2024-01-05 13:29:26,197][02299] Loop learner_proc0_evt_loop terminating...
[2024-01-05 13:29:28,195][00209] Waiting for process inference_proc0-0 to join...
[2024-01-05 13:29:28,283][00209] Waiting for process rollout_proc0 to join...
[2024-01-05 13:29:30,271][00209] Waiting for process rollout_proc1 to join...
[2024-01-05 13:29:30,287][00209] Waiting for process rollout_proc2 to join...
[2024-01-05 13:29:30,288][00209] Waiting for process rollout_proc3 to join...
[2024-01-05 13:29:30,291][00209] Waiting for process rollout_proc4 to join...
[2024-01-05 13:29:30,294][00209] Waiting for process rollout_proc5 to join...
[2024-01-05 13:29:30,295][00209] Waiting for process rollout_proc6 to join...
[2024-01-05 13:29:30,297][00209] Waiting for process rollout_proc7 to join...
[2024-01-05 13:29:30,298][00209] Batcher 0 profile tree view:
batching: 64.8399, releasing_batches: 0.0641
[2024-01-05 13:29:30,299][00209] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 1325.0912
update_model: 20.6028
  weight_update: 0.0024
one_step: 0.0100
  handle_policy_step: 1376.8333
    deserialize: 38.2616, stack: 7.5973, obs_to_device_normalize: 283.6255, forward: 727.6740, send_messages: 64.0252
    prepare_outputs: 184.9698
      to_cpu: 107.0634
[2024-01-05 13:29:30,301][00209] Learner 0 profile tree view:
misc: 0.0171, prepare_batch: 29.8637
train: 178.4249
  epoch_init: 0.0185, minibatch_init: 0.0183, losses_postprocess: 1.5210, kl_divergence: 1.5928, after_optimizer: 85.1454
  calculate_losses: 62.0584
    losses_init: 0.0235, forward_head: 2.6536, bptt_initial: 40.9256, tail: 2.6849, advantages_returns: 0.7174, losses: 9.4451
    bptt: 4.8108
      bptt_forward_core: 4.5946
  update: 26.5323
    clip: 2.0724
[2024-01-05 13:29:30,303][00209] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.9405, enqueue_policy_requests: 380.7158, env_step: 2137.4755, overhead: 53.8287, complete_rollouts: 17.7451
save_policy_outputs: 49.0525
  split_output_tensors: 23.0266
[2024-01-05 13:29:30,304][00209] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.8231, enqueue_policy_requests: 378.8671, env_step: 2137.7235, overhead: 54.5815, complete_rollouts: 17.5393
save_policy_outputs: 49.5894
  split_output_tensors: 24.0411
[2024-01-05 13:29:30,305][00209] Loop Runner_EvtLoop terminating...
[2024-01-05 13:29:30,307][00209] Runner profile tree view:
main_loop: 2851.6888
[2024-01-05 13:29:30,308][00209] Collected {0: 10006528}, FPS: 3509.0
[2024-01-05 13:29:30,541][00209] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-01-05 13:29:30,542][00209] Overriding arg 'num_workers' with value 1 passed from command line
[2024-01-05 13:29:30,545][00209] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-01-05 13:29:30,547][00209] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-01-05 13:29:30,550][00209] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-01-05 13:29:30,551][00209] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-01-05 13:29:30,552][00209] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-01-05 13:29:30,553][00209] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-01-05 13:29:30,554][00209] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-01-05 13:29:30,555][00209] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-01-05 13:29:30,557][00209] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-01-05 13:29:30,558][00209] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-01-05 13:29:30,559][00209] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-01-05 13:29:30,560][00209] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-01-05 13:29:30,561][00209] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-01-05 13:29:30,603][00209] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:29:30,607][00209] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 13:29:30,609][00209] RunningMeanStd input shape: (1,)
[2024-01-05 13:29:30,625][00209] ConvEncoder: input_channels=3
[2024-01-05 13:29:30,729][00209] Conv encoder output size: 512
[2024-01-05 13:29:30,731][00209] Policy head output size: 512
[2024-01-05 13:29:31,030][00209] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-01-05 13:29:31,770][00209] Num frames 100...
[2024-01-05 13:29:31,901][00209] Num frames 200...
[2024-01-05 13:29:32,034][00209] Num frames 300...
[2024-01-05 13:29:32,161][00209] Num frames 400...
[2024-01-05 13:29:32,288][00209] Num frames 500...
[2024-01-05 13:29:32,411][00209] Num frames 600...
[2024-01-05 13:29:32,552][00209] Num frames 700...
[2024-01-05 13:29:32,678][00209] Num frames 800...
[2024-01-05 13:29:32,790][00209] Avg episode rewards: #0: 18.420, true rewards: #0: 8.420
[2024-01-05 13:29:32,791][00209] Avg episode reward: 18.420, avg true_objective: 8.420
[2024-01-05 13:29:32,865][00209] Num frames 900...
[2024-01-05 13:29:32,994][00209] Num frames 1000...
[2024-01-05 13:29:33,121][00209] Num frames 1100...
[2024-01-05 13:29:33,247][00209] Num frames 1200...
[2024-01-05 13:29:33,371][00209] Num frames 1300...
[2024-01-05 13:29:33,508][00209] Num frames 1400...
[2024-01-05 13:29:33,637][00209] Num frames 1500...
[2024-01-05 13:29:33,769][00209] Num frames 1600...
[2024-01-05 13:29:33,838][00209] Avg episode rewards: #0: 16.550, true rewards: #0: 8.050
[2024-01-05 13:29:33,840][00209] Avg episode reward: 16.550, avg true_objective: 8.050
[2024-01-05 13:29:33,960][00209] Num frames 1700...
[2024-01-05 13:29:34,087][00209] Num frames 1800...
[2024-01-05 13:29:34,214][00209] Num frames 1900...
[2024-01-05 13:29:34,338][00209] Num frames 2000...
[2024-01-05 13:29:34,468][00209] Num frames 2100...
[2024-01-05 13:29:34,595][00209] Avg episode rewards: #0: 14.500, true rewards: #0: 7.167
[2024-01-05 13:29:34,597][00209] Avg episode reward: 14.500, avg true_objective: 7.167
[2024-01-05 13:29:34,662][00209] Num frames 2200...
[2024-01-05 13:29:34,790][00209] Num frames 2300...
[2024-01-05 13:29:34,920][00209] Num frames 2400...
[2024-01-05 13:29:35,048][00209] Num frames 2500...
[2024-01-05 13:29:35,184][00209] Num frames 2600...
[2024-01-05 13:29:35,312][00209] Num frames 2700...
[2024-01-05 13:29:35,438][00209] Num frames 2800...
[2024-01-05 13:29:35,572][00209] Num frames 2900...
[2024-01-05 13:29:35,699][00209] Num frames 3000...
[2024-01-05 13:29:35,828][00209] Num frames 3100...
[2024-01-05 13:29:35,957][00209] Num frames 3200...
[2024-01-05 13:29:36,088][00209] Num frames 3300...
[2024-01-05 13:29:36,187][00209] Avg episode rewards: #0: 18.085, true rewards: #0: 8.335
[2024-01-05 13:29:36,189][00209] Avg episode reward: 18.085, avg true_objective: 8.335
[2024-01-05 13:29:36,276][00209] Num frames 3400...
[2024-01-05 13:29:36,407][00209] Num frames 3500...
[2024-01-05 13:29:36,544][00209] Num frames 3600...
[2024-01-05 13:29:36,676][00209] Num frames 3700...
[2024-01-05 13:29:36,803][00209] Num frames 3800...
[2024-01-05 13:29:36,936][00209] Num frames 3900...
[2024-01-05 13:29:37,069][00209] Num frames 4000...
[2024-01-05 13:29:37,194][00209] Num frames 4100...
[2024-01-05 13:29:37,319][00209] Num frames 4200...
[2024-01-05 13:29:37,447][00209] Num frames 4300...
[2024-01-05 13:29:37,588][00209] Num frames 4400...
[2024-01-05 13:29:37,718][00209] Num frames 4500...
[2024-01-05 13:29:37,830][00209] Avg episode rewards: #0: 20.484, true rewards: #0: 9.084
[2024-01-05 13:29:37,832][00209] Avg episode reward: 20.484, avg true_objective: 9.084
[2024-01-05 13:29:37,906][00209] Num frames 4600...
[2024-01-05 13:29:38,077][00209] Num frames 4700...
[2024-01-05 13:29:38,255][00209] Num frames 4800...
[2024-01-05 13:29:38,429][00209] Num frames 4900...
[2024-01-05 13:29:38,646][00209] Avg episode rewards: #0: 18.650, true rewards: #0: 8.317
[2024-01-05 13:29:38,649][00209] Avg episode reward: 18.650, avg true_objective: 8.317
[2024-01-05 13:29:38,670][00209] Num frames 5000...
[2024-01-05 13:29:38,851][00209] Num frames 5100...
[2024-01-05 13:29:39,040][00209] Num frames 5200...
[2024-01-05 13:29:39,220][00209] Num frames 5300...
[2024-01-05 13:29:39,401][00209] Num frames 5400...
[2024-01-05 13:29:39,576][00209] Num frames 5500...
[2024-01-05 13:29:39,764][00209] Num frames 5600...
[2024-01-05 13:29:39,947][00209] Num frames 5700...
[2024-01-05 13:29:40,137][00209] Num frames 5800...
[2024-01-05 13:29:40,320][00209] Num frames 5900...
[2024-01-05 13:29:40,502][00209] Num frames 6000...
[2024-01-05 13:29:40,588][00209] Avg episode rewards: #0: 18.734, true rewards: #0: 8.591
[2024-01-05 13:29:40,590][00209] Avg episode reward: 18.734, avg true_objective: 8.591
[2024-01-05 13:29:40,758][00209] Num frames 6100...
[2024-01-05 13:29:40,945][00209] Num frames 6200...
[2024-01-05 13:29:41,088][00209] Num frames 6300...
[2024-01-05 13:29:41,225][00209] Num frames 6400...
[2024-01-05 13:29:41,364][00209] Num frames 6500...
[2024-01-05 13:29:41,507][00209] Num frames 6600...
[2024-01-05 13:29:41,637][00209] Num frames 6700...
[2024-01-05 13:29:41,782][00209] Num frames 6800...
[2024-01-05 13:29:41,918][00209] Num frames 6900...
[2024-01-05 13:29:42,051][00209] Num frames 7000...
[2024-01-05 13:29:42,184][00209] Num frames 7100...
[2024-01-05 13:29:42,316][00209] Num frames 7200...
[2024-01-05 13:29:42,445][00209] Num frames 7300...
[2024-01-05 13:29:42,576][00209] Num frames 7400...
[2024-01-05 13:29:42,704][00209] Num frames 7500...
[2024-01-05 13:29:42,841][00209] Num frames 7600...
[2024-01-05 13:29:42,968][00209] Num frames 7700...
[2024-01-05 13:29:43,103][00209] Num frames 7800...
[2024-01-05 13:29:43,232][00209] Num frames 7900...
[2024-01-05 13:29:43,363][00209] Num frames 8000...
[2024-01-05 13:29:43,538][00209] Avg episode rewards: #0: 22.994, true rewards: #0: 10.119
[2024-01-05 13:29:43,539][00209] Avg episode reward: 22.994, avg true_objective: 10.119
[2024-01-05 13:29:43,550][00209] Num frames 8100...
[2024-01-05 13:29:43,674][00209] Num frames 8200...
[2024-01-05 13:29:43,821][00209] Num frames 8300...
[2024-01-05 13:29:43,947][00209] Num frames 8400...
[2024-01-05 13:29:44,079][00209] Num frames 8500...
[2024-01-05 13:29:44,209][00209] Num frames 8600...
[2024-01-05 13:29:44,344][00209] Num frames 8700...
[2024-01-05 13:29:44,405][00209] Avg episode rewards: #0: 21.448, true rewards: #0: 9.670
[2024-01-05 13:29:44,407][00209] Avg episode reward: 21.448, avg true_objective: 9.670
[2024-01-05 13:29:44,529][00209] Num frames 8800...
[2024-01-05 13:29:44,664][00209] Num frames 8900...
[2024-01-05 13:29:44,790][00209] Num frames 9000...
[2024-01-05 13:29:44,924][00209] Num frames 9100...
[2024-01-05 13:29:45,049][00209] Num frames 9200...
[2024-01-05 13:29:45,164][00209] Avg episode rewards: #0: 20.447, true rewards: #0: 9.247
[2024-01-05 13:29:45,166][00209] Avg episode reward: 20.447, avg true_objective: 9.247
[2024-01-05 13:30:40,215][00209] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-01-05 13:30:40,392][00209] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-01-05 13:30:40,395][00209] Overriding arg 'num_workers' with value 1 passed from command line
[2024-01-05 13:30:40,397][00209] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-01-05 13:30:40,399][00209] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-01-05 13:30:40,400][00209] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-01-05 13:30:40,401][00209] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-01-05 13:30:40,402][00209] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-01-05 13:30:40,403][00209] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-01-05 13:30:40,404][00209] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-01-05 13:30:40,405][00209] Adding new argument 'hf_repository'='gchindemi/appo-vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-01-05 13:30:40,406][00209] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-01-05 13:30:40,407][00209] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-01-05 13:30:40,408][00209] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-01-05 13:30:40,410][00209] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-01-05 13:30:40,411][00209] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-01-05 13:30:40,449][00209] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 13:30:40,450][00209] RunningMeanStd input shape: (1,)
[2024-01-05 13:30:40,462][00209] ConvEncoder: input_channels=3
[2024-01-05 13:30:40,500][00209] Conv encoder output size: 512
[2024-01-05 13:30:40,501][00209] Policy head output size: 512
[2024-01-05 13:30:40,521][00209] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-01-05 13:30:40,944][00209] Num frames 100...
[2024-01-05 13:30:41,078][00209] Num frames 200...
[2024-01-05 13:30:41,208][00209] Num frames 300...
[2024-01-05 13:30:41,337][00209] Num frames 400...
[2024-01-05 13:30:41,463][00209] Num frames 500...
[2024-01-05 13:30:41,592][00209] Num frames 600...
[2024-01-05 13:30:41,732][00209] Num frames 700...
[2024-01-05 13:30:41,870][00209] Num frames 800...
[2024-01-05 13:30:42,013][00209] Num frames 900...
[2024-01-05 13:30:42,148][00209] Num frames 1000...
[2024-01-05 13:30:42,280][00209] Num frames 1100...
[2024-01-05 13:30:42,407][00209] Num frames 1200...
[2024-01-05 13:30:42,532][00209] Num frames 1300...
[2024-01-05 13:30:42,660][00209] Num frames 1400...
[2024-01-05 13:30:42,801][00209] Num frames 1500...
[2024-01-05 13:30:42,931][00209] Num frames 1600...
[2024-01-05 13:30:43,054][00209] Num frames 1700...
[2024-01-05 13:30:43,146][00209] Avg episode rewards: #0: 47.279, true rewards: #0: 17.280
[2024-01-05 13:30:43,148][00209] Avg episode reward: 47.279, avg true_objective: 17.280
[2024-01-05 13:30:43,240][00209] Num frames 1800...
[2024-01-05 13:30:43,365][00209] Num frames 1900...
[2024-01-05 13:30:43,492][00209] Num frames 2000...
[2024-01-05 13:30:43,617][00209] Num frames 2100...
[2024-01-05 13:30:43,746][00209] Num frames 2200...
[2024-01-05 13:30:43,883][00209] Num frames 2300...
[2024-01-05 13:30:44,009][00209] Num frames 2400...
[2024-01-05 13:30:44,141][00209] Num frames 2500...
[2024-01-05 13:30:44,268][00209] Num frames 2600...
[2024-01-05 13:30:44,394][00209] Num frames 2700...
[2024-01-05 13:30:44,519][00209] Num frames 2800...
[2024-01-05 13:30:44,598][00209] Avg episode rewards: #0: 37.580, true rewards: #0: 14.080
[2024-01-05 13:30:44,599][00209] Avg episode reward: 37.580, avg true_objective: 14.080
[2024-01-05 13:30:44,711][00209] Num frames 2900...
[2024-01-05 13:30:44,845][00209] Num frames 3000...
[2024-01-05 13:30:44,967][00209] Num frames 3100...
[2024-01-05 13:30:45,093][00209] Num frames 3200...
[2024-01-05 13:30:45,260][00209] Num frames 3300...
[2024-01-05 13:30:45,449][00209] Num frames 3400...
[2024-01-05 13:30:45,629][00209] Num frames 3500...
[2024-01-05 13:30:45,802][00209] Num frames 3600...
[2024-01-05 13:30:45,979][00209] Num frames 3700...
[2024-01-05 13:30:46,162][00209] Num frames 3800...
[2024-01-05 13:30:46,296][00209] Avg episode rewards: #0: 32.800, true rewards: #0: 12.800
[2024-01-05 13:30:46,298][00209] Avg episode reward: 32.800, avg true_objective: 12.800
[2024-01-05 13:30:46,408][00209] Num frames 3900...
[2024-01-05 13:30:46,583][00209] Num frames 4000...
[2024-01-05 13:30:46,763][00209] Num frames 4100...
[2024-01-05 13:30:46,954][00209] Num frames 4200...
[2024-01-05 13:30:47,140][00209] Num frames 4300...
[2024-01-05 13:30:47,317][00209] Num frames 4400...
[2024-01-05 13:30:47,498][00209] Num frames 4500...
[2024-01-05 13:30:47,681][00209] Num frames 4600...
[2024-01-05 13:30:47,867][00209] Avg episode rewards: #0: 30.180, true rewards: #0: 11.680
[2024-01-05 13:30:47,869][00209] Avg episode reward: 30.180, avg true_objective: 11.680
[2024-01-05 13:30:47,933][00209] Num frames 4700...
[2024-01-05 13:30:48,114][00209] Num frames 4800...
[2024-01-05 13:30:48,266][00209] Num frames 4900...
[2024-01-05 13:30:48,389][00209] Num frames 5000...
[2024-01-05 13:30:48,516][00209] Num frames 5100...
[2024-01-05 13:30:48,642][00209] Num frames 5200...
[2024-01-05 13:30:48,769][00209] Num frames 5300...
[2024-01-05 13:30:48,894][00209] Num frames 5400...
[2024-01-05 13:30:49,029][00209] Num frames 5500...
[2024-01-05 13:30:49,154][00209] Num frames 5600...
[2024-01-05 13:30:49,285][00209] Num frames 5700...
[2024-01-05 13:30:49,410][00209] Num frames 5800...
[2024-01-05 13:30:49,537][00209] Num frames 5900...
[2024-01-05 13:30:49,666][00209] Num frames 6000...
[2024-01-05 13:30:49,794][00209] Num frames 6100...
[2024-01-05 13:30:49,846][00209] Avg episode rewards: #0: 31.600, true rewards: #0: 12.200
[2024-01-05 13:30:49,848][00209] Avg episode reward: 31.600, avg true_objective: 12.200
[2024-01-05 13:30:49,982][00209] Num frames 6200...
[2024-01-05 13:30:50,108][00209] Num frames 6300...
[2024-01-05 13:30:50,235][00209] Num frames 6400...
[2024-01-05 13:30:50,363][00209] Num frames 6500...
[2024-01-05 13:30:50,487][00209] Num frames 6600...
[2024-01-05 13:30:50,612][00209] Num frames 6700...
[2024-01-05 13:30:50,742][00209] Num frames 6800...
[2024-01-05 13:30:50,870][00209] Num frames 6900...
[2024-01-05 13:30:50,967][00209] Avg episode rewards: #0: 29.053, true rewards: #0: 11.553
[2024-01-05 13:30:50,968][00209] Avg episode reward: 29.053, avg true_objective: 11.553
[2024-01-05 13:30:51,068][00209] Num frames 7000...
[2024-01-05 13:30:51,195][00209] Num frames 7100...
[2024-01-05 13:30:51,322][00209] Num frames 7200...
[2024-01-05 13:30:51,446][00209] Num frames 7300...
[2024-01-05 13:30:51,577][00209] Num frames 7400...
[2024-01-05 13:30:51,723][00209] Num frames 7500...
[2024-01-05 13:30:51,864][00209] Num frames 7600...
[2024-01-05 13:30:51,999][00209] Num frames 7700...
[2024-01-05 13:30:52,133][00209] Num frames 7800...
[2024-01-05 13:30:52,263][00209] Num frames 7900...
[2024-01-05 13:30:52,390][00209] Num frames 8000...
[2024-01-05 13:30:52,467][00209] Avg episode rewards: #0: 29.163, true rewards: #0: 11.449
[2024-01-05 13:30:52,469][00209] Avg episode reward: 29.163, avg true_objective: 11.449
[2024-01-05 13:30:52,580][00209] Num frames 8100...
[2024-01-05 13:30:52,708][00209] Num frames 8200...
[2024-01-05 13:30:52,841][00209] Num frames 8300...
[2024-01-05 13:30:52,972][00209] Num frames 8400...
[2024-01-05 13:30:53,160][00209] Avg episode rewards: #0: 26.368, true rewards: #0: 10.617
[2024-01-05 13:30:53,161][00209] Avg episode reward: 26.368, avg true_objective: 10.617
[2024-01-05 13:30:53,175][00209] Num frames 8500...
[2024-01-05 13:30:53,304][00209] Num frames 8600...
[2024-01-05 13:30:53,432][00209] Num frames 8700...
[2024-01-05 13:30:53,564][00209] Num frames 8800...
[2024-01-05 13:30:53,694][00209] Num frames 8900...
[2024-01-05 13:30:53,826][00209] Num frames 9000...
[2024-01-05 13:30:53,955][00209] Num frames 9100...
[2024-01-05 13:30:54,094][00209] Num frames 9200...
[2024-01-05 13:30:54,219][00209] Num frames 9300...
[2024-01-05 13:30:54,346][00209] Num frames 9400...
[2024-01-05 13:30:54,475][00209] Num frames 9500...
[2024-01-05 13:30:54,607][00209] Num frames 9600...
[2024-01-05 13:30:54,746][00209] Num frames 9700...
[2024-01-05 13:30:54,874][00209] Num frames 9800...
[2024-01-05 13:30:55,005][00209] Num frames 9900...
[2024-01-05 13:30:55,139][00209] Num frames 10000...
[2024-01-05 13:30:55,273][00209] Num frames 10100...
[2024-01-05 13:30:55,407][00209] Num frames 10200...
[2024-01-05 13:30:55,538][00209] Num frames 10300...
[2024-01-05 13:30:55,664][00209] Num frames 10400...
[2024-01-05 13:30:55,794][00209] Num frames 10500...
[2024-01-05 13:30:55,973][00209] Avg episode rewards: #0: 29.549, true rewards: #0: 11.771
[2024-01-05 13:30:55,974][00209] Avg episode reward: 29.549, avg true_objective: 11.771
[2024-01-05 13:30:55,986][00209] Num frames 10600...
[2024-01-05 13:30:56,117][00209] Num frames 10700...
[2024-01-05 13:30:56,279][00209] Num frames 10800...
[2024-01-05 13:30:56,407][00209] Num frames 10900...
[2024-01-05 13:30:56,534][00209] Num frames 11000...
[2024-01-05 13:30:56,661][00209] Num frames 11100...
[2024-01-05 13:30:56,789][00209] Num frames 11200...
[2024-01-05 13:30:56,917][00209] Num frames 11300...
[2024-01-05 13:30:57,050][00209] Num frames 11400...
[2024-01-05 13:30:57,185][00209] Num frames 11500...
[2024-01-05 13:30:57,324][00209] Num frames 11600...
[2024-01-05 13:30:57,487][00209] Avg episode rewards: #0: 29.282, true rewards: #0: 11.682
[2024-01-05 13:30:57,489][00209] Avg episode reward: 29.282, avg true_objective: 11.682
[2024-01-05 13:32:06,433][00209] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-01-05 13:35:18,115][00209] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-01-05 13:35:18,118][00209] Overriding arg 'num_workers' with value 1 passed from command line
[2024-01-05 13:35:18,120][00209] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-01-05 13:35:18,121][00209] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-01-05 13:35:18,123][00209] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-01-05 13:35:18,125][00209] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-01-05 13:35:18,127][00209] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-01-05 13:35:18,128][00209] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-01-05 13:35:18,129][00209] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-01-05 13:35:18,130][00209] Adding new argument 'hf_repository'='gchindemi/appo-vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-01-05 13:35:18,132][00209] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-01-05 13:35:18,133][00209] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-01-05 13:35:18,134][00209] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-01-05 13:35:18,135][00209] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-01-05 13:35:18,136][00209] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-01-05 13:35:18,175][00209] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 13:35:18,179][00209] RunningMeanStd input shape: (1,)
[2024-01-05 13:35:18,191][00209] ConvEncoder: input_channels=3
[2024-01-05 13:35:18,226][00209] Conv encoder output size: 512
[2024-01-05 13:35:18,227][00209] Policy head output size: 512
[2024-01-05 13:35:18,246][00209] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-01-05 13:35:18,663][00209] Num frames 100...
[2024-01-05 13:35:18,805][00209] Num frames 200...
[2024-01-05 13:35:18,937][00209] Num frames 300...
[2024-01-05 13:35:19,065][00209] Num frames 400...
[2024-01-05 13:35:19,225][00209] Avg episode rewards: #0: 8.800, true rewards: #0: 4.800
[2024-01-05 13:35:19,227][00209] Avg episode reward: 8.800, avg true_objective: 4.800
[2024-01-05 13:35:19,256][00209] Num frames 500...
[2024-01-05 13:35:19,382][00209] Num frames 600...
[2024-01-05 13:35:19,512][00209] Num frames 700...
[2024-01-05 13:35:19,641][00209] Num frames 800...
[2024-01-05 13:35:19,782][00209] Num frames 900...
[2024-01-05 13:35:19,919][00209] Avg episode rewards: #0: 8.800, true rewards: #0: 4.800
[2024-01-05 13:35:19,921][00209] Avg episode reward: 8.800, avg true_objective: 4.800
[2024-01-05 13:35:19,974][00209] Num frames 1000...
[2024-01-05 13:35:20,108][00209] Num frames 1100...
[2024-01-05 13:35:20,234][00209] Num frames 1200...
[2024-01-05 13:35:20,360][00209] Num frames 1300...
[2024-01-05 13:35:20,489][00209] Num frames 1400...
[2024-01-05 13:35:20,556][00209] Avg episode rewards: #0: 8.360, true rewards: #0: 4.693
[2024-01-05 13:35:20,557][00209] Avg episode reward: 8.360, avg true_objective: 4.693
[2024-01-05 13:35:20,680][00209] Num frames 1500...
[2024-01-05 13:35:20,823][00209] Num frames 1600...
[2024-01-05 13:35:20,955][00209] Num frames 1700...
[2024-01-05 13:35:21,091][00209] Num frames 1800...
[2024-01-05 13:35:21,222][00209] Num frames 1900...
[2024-01-05 13:35:21,354][00209] Num frames 2000...
[2024-01-05 13:35:21,493][00209] Num frames 2100...
[2024-01-05 13:35:21,621][00209] Num frames 2200...
[2024-01-05 13:35:21,752][00209] Num frames 2300...
[2024-01-05 13:35:21,891][00209] Num frames 2400...
[2024-01-05 13:35:21,944][00209] Avg episode rewards: #0: 12.250, true rewards: #0: 6.000
[2024-01-05 13:35:21,946][00209] Avg episode reward: 12.250, avg true_objective: 6.000
[2024-01-05 13:35:22,077][00209] Num frames 2500...
[2024-01-05 13:35:22,202][00209] Num frames 2600...
[2024-01-05 13:35:22,334][00209] Num frames 2700...
[2024-01-05 13:35:22,461][00209] Num frames 2800...
[2024-01-05 13:35:22,589][00209] Num frames 2900...
[2024-01-05 13:35:22,714][00209] Num frames 3000...
[2024-01-05 13:35:22,846][00209] Num frames 3100...
[2024-01-05 13:35:22,981][00209] Num frames 3200...
[2024-01-05 13:35:23,116][00209] Num frames 3300...
[2024-01-05 13:35:23,241][00209] Num frames 3400...
[2024-01-05 13:35:23,370][00209] Num frames 3500...
[2024-01-05 13:35:23,497][00209] Num frames 3600...
[2024-01-05 13:35:23,633][00209] Num frames 3700...
[2024-01-05 13:35:23,763][00209] Num frames 3800...
[2024-01-05 13:35:23,903][00209] Num frames 3900...
[2024-01-05 13:35:24,034][00209] Num frames 4000...
[2024-01-05 13:35:24,218][00209] Avg episode rewards: #0: 17.992, true rewards: #0: 8.192
[2024-01-05 13:35:24,220][00209] Avg episode reward: 17.992, avg true_objective: 8.192
[2024-01-05 13:35:24,229][00209] Num frames 4100...
[2024-01-05 13:35:24,359][00209] Num frames 4200...
[2024-01-05 13:35:24,484][00209] Num frames 4300...
[2024-01-05 13:35:24,612][00209] Num frames 4400...
[2024-01-05 13:35:24,740][00209] Num frames 4500...
[2024-01-05 13:35:24,871][00209] Num frames 4600...
[2024-01-05 13:35:25,009][00209] Num frames 4700...
[2024-01-05 13:35:25,140][00209] Num frames 4800...
[2024-01-05 13:35:25,280][00209] Avg episode rewards: #0: 17.778, true rewards: #0: 8.112
[2024-01-05 13:35:25,281][00209] Avg episode reward: 17.778, avg true_objective: 8.112
[2024-01-05 13:35:25,326][00209] Num frames 4900...
[2024-01-05 13:35:25,455][00209] Num frames 5000...
[2024-01-05 13:35:25,581][00209] Num frames 5100...
[2024-01-05 13:35:25,707][00209] Num frames 5200...
[2024-01-05 13:35:25,837][00209] Num frames 5300...
[2024-01-05 13:35:25,977][00209] Num frames 5400...
[2024-01-05 13:35:26,106][00209] Num frames 5500...
[2024-01-05 13:35:26,241][00209] Num frames 5600...
[2024-01-05 13:35:26,372][00209] Num frames 5700...
[2024-01-05 13:35:26,500][00209] Num frames 5800...
[2024-01-05 13:35:26,627][00209] Num frames 5900...
[2024-01-05 13:35:26,764][00209] Num frames 6000...
[2024-01-05 13:35:26,958][00209] Num frames 6100...
[2024-01-05 13:35:27,181][00209] Num frames 6200...
[2024-01-05 13:35:27,372][00209] Num frames 6300...
[2024-01-05 13:35:27,551][00209] Num frames 6400...
[2024-01-05 13:35:27,736][00209] Num frames 6500...
[2024-01-05 13:35:27,918][00209] Num frames 6600...
[2024-01-05 13:35:28,116][00209] Num frames 6700...
[2024-01-05 13:35:28,300][00209] Num frames 6800...
[2024-01-05 13:35:28,477][00209] Num frames 6900...
[2024-01-05 13:35:28,654][00209] Avg episode rewards: #0: 23.238, true rewards: #0: 9.953
[2024-01-05 13:35:28,658][00209] Avg episode reward: 23.238, avg true_objective: 9.953
[2024-01-05 13:35:28,725][00209] Num frames 7000...
[2024-01-05 13:35:28,907][00209] Num frames 7100...
[2024-01-05 13:35:29,095][00209] Num frames 7200...
[2024-01-05 13:35:29,278][00209] Num frames 7300...
[2024-01-05 13:35:29,458][00209] Num frames 7400...
[2024-01-05 13:35:29,642][00209] Num frames 7500...
[2024-01-05 13:35:29,827][00209] Num frames 7600...
[2024-01-05 13:35:29,958][00209] Num frames 7700...
[2024-01-05 13:35:30,088][00209] Num frames 7800...
[2024-01-05 13:35:30,172][00209] Avg episode rewards: #0: 22.394, true rewards: #0: 9.769
[2024-01-05 13:35:30,174][00209] Avg episode reward: 22.394, avg true_objective: 9.769
[2024-01-05 13:35:30,283][00209] Num frames 7900...
[2024-01-05 13:35:30,416][00209] Num frames 8000...
[2024-01-05 13:35:30,543][00209] Num frames 8100...
[2024-01-05 13:35:30,672][00209] Num frames 8200...
[2024-01-05 13:35:30,800][00209] Num frames 8300...
[2024-01-05 13:35:30,933][00209] Num frames 8400...
[2024-01-05 13:35:31,093][00209] Avg episode rewards: #0: 21.422, true rewards: #0: 9.422
[2024-01-05 13:35:31,096][00209] Avg episode reward: 21.422, avg true_objective: 9.422
[2024-01-05 13:35:31,128][00209] Num frames 8500...
[2024-01-05 13:35:31,260][00209] Num frames 8600...
[2024-01-05 13:35:31,399][00209] Num frames 8700... [2024-01-05 13:35:31,529][00209] Num frames 8800... [2024-01-05 13:35:31,656][00209] Num frames 8900... [2024-01-05 13:35:31,785][00209] Num frames 9000... [2024-01-05 13:35:31,913][00209] Num frames 9100... [2024-01-05 13:35:32,047][00209] Num frames 9200... [2024-01-05 13:35:32,187][00209] Num frames 9300... [2024-01-05 13:35:32,317][00209] Num frames 9400... [2024-01-05 13:35:32,448][00209] Num frames 9500... [2024-01-05 13:35:32,583][00209] Num frames 9600... [2024-01-05 13:35:32,719][00209] Num frames 9700... [2024-01-05 13:35:32,851][00209] Num frames 9800... [2024-01-05 13:35:32,981][00209] Num frames 9900... [2024-01-05 13:35:33,113][00209] Num frames 10000... [2024-01-05 13:35:33,239][00209] Avg episode rewards: #0: 23.448, true rewards: #0: 10.048 [2024-01-05 13:35:33,241][00209] Avg episode reward: 23.448, avg true_objective: 10.048 [2024-01-05 13:36:33,198][00209] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-01-05 13:36:41,122][00209] The model has been pushed to https://huggingface.co/gchindemi/appo-vizdoom_health_gathering_supreme [2024-01-05 13:39:15,380][00209] Environment doom_basic already registered, overwriting... [2024-01-05 13:39:15,383][00209] Environment doom_two_colors_easy already registered, overwriting... [2024-01-05 13:39:15,385][00209] Environment doom_two_colors_hard already registered, overwriting... [2024-01-05 13:39:15,387][00209] Environment doom_dm already registered, overwriting... [2024-01-05 13:39:15,389][00209] Environment doom_dwango5 already registered, overwriting... [2024-01-05 13:39:15,394][00209] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-01-05 13:39:15,395][00209] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-01-05 13:39:15,396][00209] Environment doom_my_way_home already registered, overwriting... 
[2024-01-05 13:39:15,397][00209] Environment doom_deadly_corridor already registered, overwriting... [2024-01-05 13:39:15,399][00209] Environment doom_defend_the_center already registered, overwriting... [2024-01-05 13:39:15,400][00209] Environment doom_defend_the_line already registered, overwriting... [2024-01-05 13:39:15,401][00209] Environment doom_health_gathering already registered, overwriting... [2024-01-05 13:39:15,402][00209] Environment doom_health_gathering_supreme already registered, overwriting... [2024-01-05 13:39:15,403][00209] Environment doom_battle already registered, overwriting... [2024-01-05 13:39:15,404][00209] Environment doom_battle2 already registered, overwriting... [2024-01-05 13:39:15,405][00209] Environment doom_duel_bots already registered, overwriting... [2024-01-05 13:39:15,407][00209] Environment doom_deathmatch_bots already registered, overwriting... [2024-01-05 13:39:15,408][00209] Environment doom_duel already registered, overwriting... [2024-01-05 13:39:15,409][00209] Environment doom_deathmatch_full already registered, overwriting... [2024-01-05 13:39:15,410][00209] Environment doom_benchmark already registered, overwriting... [2024-01-05 13:39:15,412][00209] register_encoder_factory: [2024-01-05 13:39:15,440][00209] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-01-05 13:39:15,446][00209] Experiment dir /content/train_dir/default_experiment already exists! [2024-01-05 13:39:15,447][00209] Resuming existing experiment from /content/train_dir/default_experiment... 
[2024-01-05 13:39:15,449][00209] Weights and Biases integration disabled [2024-01-05 13:39:15,454][00209] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-01-05 13:39:17,534][00209] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=10000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru 
rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10_000_000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000} git_hash=unknown git_repo_name=not a git repository [2024-01-05 13:39:17,536][00209] Saving configuration to /content/train_dir/default_experiment/config.json... 
[2024-01-05 13:39:17,540][00209] Rollout worker 0 uses device cpu [2024-01-05 13:39:17,542][00209] Rollout worker 1 uses device cpu [2024-01-05 13:39:17,544][00209] Rollout worker 2 uses device cpu [2024-01-05 13:39:17,545][00209] Rollout worker 3 uses device cpu [2024-01-05 13:39:17,547][00209] Rollout worker 4 uses device cpu [2024-01-05 13:39:17,548][00209] Rollout worker 5 uses device cpu [2024-01-05 13:39:17,549][00209] Rollout worker 6 uses device cpu [2024-01-05 13:39:17,550][00209] Rollout worker 7 uses device cpu [2024-01-05 13:39:17,630][00209] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-01-05 13:39:17,632][00209] InferenceWorker_p0-w0: min num requests: 2 [2024-01-05 13:39:17,666][00209] Starting all processes... [2024-01-05 13:39:17,667][00209] Starting process learner_proc0 [2024-01-05 13:39:17,716][00209] Starting all processes... [2024-01-05 13:39:17,721][00209] Starting process inference_proc0-0 [2024-01-05 13:39:17,722][00209] Starting process rollout_proc0 [2024-01-05 13:39:17,722][00209] Starting process rollout_proc1 [2024-01-05 13:39:17,722][00209] Starting process rollout_proc2 [2024-01-05 13:39:17,722][00209] Starting process rollout_proc3 [2024-01-05 13:39:17,722][00209] Starting process rollout_proc4 [2024-01-05 13:39:17,728][00209] Starting process rollout_proc6 [2024-01-05 13:39:17,728][00209] Starting process rollout_proc7 [2024-01-05 13:39:17,728][00209] Starting process rollout_proc5 [2024-01-05 13:39:34,349][20181] Worker 7 uses CPU cores [1] [2024-01-05 13:39:35,059][20174] Worker 3 uses CPU cores [1] [2024-01-05 13:39:35,226][20180] Worker 6 uses CPU cores [0] [2024-01-05 13:39:35,285][20182] Worker 5 uses CPU cores [1] [2024-01-05 13:39:35,298][20171] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-01-05 13:39:35,300][20171] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-01-05 13:39:35,338][20173] Worker 2 uses CPU cores [0] [2024-01-05 
13:39:35,369][20157] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 13:39:35,372][20157] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-01-05 13:39:35,394][20171] Num visible devices: 1
[2024-01-05 13:39:35,431][20157] Num visible devices: 1
[2024-01-05 13:39:35,453][20175] Worker 4 uses CPU cores [0]
[2024-01-05 13:39:35,456][20157] Starting seed is not provided
[2024-01-05 13:39:35,457][20157] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 13:39:35,458][20157] Initializing actor-critic model on device cuda:0
[2024-01-05 13:39:35,459][20157] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 13:39:35,460][20157] RunningMeanStd input shape: (1,)
[2024-01-05 13:39:35,501][20170] Worker 0 uses CPU cores [0]
[2024-01-05 13:39:35,507][20172] Worker 1 uses CPU cores [1]
[2024-01-05 13:39:35,502][20157] ConvEncoder: input_channels=3
[2024-01-05 13:39:35,677][20157] Conv encoder output size: 512
[2024-01-05 13:39:35,678][20157] Policy head output size: 512
[2024-01-05 13:39:35,703][20157] Created Actor Critic model with architecture:
[2024-01-05 13:39:35,704][20157] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-01-05 13:39:36,020][20157] Using optimizer
[2024-01-05 13:39:37,542][20157] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-01-05 13:39:37,591][20157] Loading model from checkpoint
[2024-01-05 13:39:37,594][20157] Loaded experiment state at self.train_step=2443, self.env_steps=10006528
[2024-01-05 13:39:37,595][20157] Initialized policy 0 weights for model version 2443
[2024-01-05 13:39:37,599][20157] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 13:39:37,614][20157] LearnerWorker_p0 finished initialization!
[2024-01-05 13:39:37,626][00209] Heartbeat connected on LearnerWorker_p0 [2024-01-05 13:39:37,642][00209] Heartbeat connected on RolloutWorker_w0 [2024-01-05 13:39:37,651][00209] Heartbeat connected on RolloutWorker_w2 [2024-01-05 13:39:37,656][00209] Heartbeat connected on RolloutWorker_w3 [2024-01-05 13:39:37,660][00209] Heartbeat connected on RolloutWorker_w4 [2024-01-05 13:39:37,663][00209] Heartbeat connected on RolloutWorker_w5 [2024-01-05 13:39:37,666][00209] Heartbeat connected on RolloutWorker_w6 [2024-01-05 13:39:37,671][00209] Heartbeat connected on RolloutWorker_w7 [2024-01-05 13:39:37,673][00209] Heartbeat connected on RolloutWorker_w1 [2024-01-05 13:39:37,872][00209] Heartbeat connected on Batcher_0 [2024-01-05 13:39:37,914][20171] RunningMeanStd input shape: (3, 72, 128) [2024-01-05 13:39:37,915][20171] RunningMeanStd input shape: (1,) [2024-01-05 13:39:37,935][20171] ConvEncoder: input_channels=3 [2024-01-05 13:39:38,054][20171] Conv encoder output size: 512 [2024-01-05 13:39:38,055][20171] Policy head output size: 512 [2024-01-05 13:39:38,122][00209] Inference worker 0-0 is ready! [2024-01-05 13:39:38,124][00209] All inference workers are ready! Signal rollout workers to start! 
[2024-01-05 13:39:38,126][00209] Heartbeat connected on InferenceWorker_p0-w0 [2024-01-05 13:39:38,398][20174] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:38,443][20175] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:38,440][20181] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:38,447][20182] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:38,456][20180] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:38,455][20170] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:38,452][20172] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:38,458][20173] Doom resolution: 160x120, resize resolution: (128, 72) [2024-01-05 13:39:39,449][20180] Decorrelating experience for 0 frames... [2024-01-05 13:39:39,446][20175] Decorrelating experience for 0 frames... [2024-01-05 13:39:40,135][20174] Decorrelating experience for 0 frames... [2024-01-05 13:39:40,157][20181] Decorrelating experience for 0 frames... [2024-01-05 13:39:40,161][20182] Decorrelating experience for 0 frames... [2024-01-05 13:39:40,176][20172] Decorrelating experience for 0 frames... [2024-01-05 13:39:40,212][20175] Decorrelating experience for 32 frames... [2024-01-05 13:39:40,337][20173] Decorrelating experience for 0 frames... [2024-01-05 13:39:40,455][00209] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-01-05 13:39:41,059][20180] Decorrelating experience for 32 frames... [2024-01-05 13:39:41,296][20175] Decorrelating experience for 64 frames... [2024-01-05 13:39:41,343][20174] Decorrelating experience for 32 frames... [2024-01-05 13:39:41,368][20181] Decorrelating experience for 32 frames... [2024-01-05 13:39:41,375][20182] Decorrelating experience for 32 frames... 
[2024-01-05 13:39:42,207][20173] Decorrelating experience for 32 frames... [2024-01-05 13:39:42,216][20175] Decorrelating experience for 96 frames... [2024-01-05 13:39:42,572][20172] Decorrelating experience for 32 frames... [2024-01-05 13:39:43,436][20174] Decorrelating experience for 64 frames... [2024-01-05 13:39:43,441][20180] Decorrelating experience for 64 frames... [2024-01-05 13:39:43,455][20181] Decorrelating experience for 64 frames... [2024-01-05 13:39:43,458][20182] Decorrelating experience for 64 frames... [2024-01-05 13:39:44,687][20172] Decorrelating experience for 64 frames... [2024-01-05 13:39:44,812][20181] Decorrelating experience for 96 frames... [2024-01-05 13:39:44,821][20182] Decorrelating experience for 96 frames... [2024-01-05 13:39:45,226][20170] Decorrelating experience for 0 frames... [2024-01-05 13:39:45,455][00209] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 50.4. Samples: 252. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-01-05 13:39:45,456][00209] Avg episode reward: [(0, '2.920')] [2024-01-05 13:39:45,465][20180] Decorrelating experience for 96 frames... [2024-01-05 13:39:46,299][20172] Decorrelating experience for 96 frames... [2024-01-05 13:39:48,709][20173] Decorrelating experience for 64 frames... [2024-01-05 13:39:48,726][20170] Decorrelating experience for 32 frames... [2024-01-05 13:39:49,489][20157] Signal inference workers to stop experience collection... [2024-01-05 13:39:49,529][20171] InferenceWorker_p0-w0: stopping experience collection [2024-01-05 13:39:50,455][00209] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 237.4. Samples: 2374. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-01-05 13:39:50,457][00209] Avg episode reward: [(0, '5.030')] [2024-01-05 13:39:50,560][20174] Decorrelating experience for 96 frames... [2024-01-05 13:39:51,436][20157] Signal inference workers to resume experience collection... 
[2024-01-05 13:39:51,436][20171] InferenceWorker_p0-w0: resuming experience collection [2024-01-05 13:39:51,440][20157] Stopping Batcher_0... [2024-01-05 13:39:51,441][20157] Loop batcher_evt_loop terminating... [2024-01-05 13:39:51,462][20175] Stopping RolloutWorker_w4... [2024-01-05 13:39:51,472][00209] Component Batcher_0 stopped! [2024-01-05 13:39:51,480][00209] Component RolloutWorker_w4 stopped! [2024-01-05 13:39:51,472][20175] Loop rollout_proc4_evt_loop terminating... [2024-01-05 13:39:51,515][20171] Weights refcount: 2 0 [2024-01-05 13:39:51,521][20171] Stopping InferenceWorker_p0-w0... [2024-01-05 13:39:51,523][20171] Loop inference_proc0-0_evt_loop terminating... [2024-01-05 13:39:51,524][00209] Component InferenceWorker_p0-w0 stopped! [2024-01-05 13:39:51,533][20180] Stopping RolloutWorker_w6... [2024-01-05 13:39:51,541][00209] Component RolloutWorker_w6 stopped! [2024-01-05 13:39:51,545][20180] Loop rollout_proc6_evt_loop terminating... [2024-01-05 13:39:51,559][00209] Component RolloutWorker_w5 stopped! [2024-01-05 13:39:51,567][20182] Stopping RolloutWorker_w5... [2024-01-05 13:39:51,571][00209] Component RolloutWorker_w7 stopped! [2024-01-05 13:39:51,577][20181] Stopping RolloutWorker_w7... [2024-01-05 13:39:51,580][20182] Loop rollout_proc5_evt_loop terminating... [2024-01-05 13:39:51,577][20181] Loop rollout_proc7_evt_loop terminating... [2024-01-05 13:39:51,630][00209] Component RolloutWorker_w1 stopped! [2024-01-05 13:39:51,642][20172] Stopping RolloutWorker_w1... [2024-01-05 13:39:51,643][20172] Loop rollout_proc1_evt_loop terminating... [2024-01-05 13:39:51,676][00209] Component RolloutWorker_w3 stopped! [2024-01-05 13:39:51,681][20174] Stopping RolloutWorker_w3... [2024-01-05 13:39:51,692][20174] Loop rollout_proc3_evt_loop terminating... [2024-01-05 13:39:53,102][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... 
[2024-01-05 13:39:53,311][20157] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002361_9670656.pth [2024-01-05 13:39:53,328][20157] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... [2024-01-05 13:39:53,582][20157] Stopping LearnerWorker_p0... [2024-01-05 13:39:53,582][00209] Component LearnerWorker_p0 stopped! [2024-01-05 13:39:53,584][20157] Loop learner_proc0_evt_loop terminating... [2024-01-05 13:39:53,710][20173] Decorrelating experience for 96 frames... [2024-01-05 13:39:54,425][00209] Component RolloutWorker_w2 stopped! [2024-01-05 13:39:54,427][20173] Stopping RolloutWorker_w2... [2024-01-05 13:39:54,430][20173] Loop rollout_proc2_evt_loop terminating... [2024-01-05 13:39:54,553][20170] Decorrelating experience for 64 frames... [2024-01-05 13:39:56,324][20170] Decorrelating experience for 96 frames... [2024-01-05 13:39:56,687][00209] Component RolloutWorker_w0 stopped! [2024-01-05 13:39:56,695][00209] Waiting for process learner_proc0 to stop... [2024-01-05 13:39:56,698][00209] Waiting for process inference_proc0-0 to join... [2024-01-05 13:39:56,702][00209] Waiting for process rollout_proc0 to join... [2024-01-05 13:39:56,710][20170] Stopping RolloutWorker_w0... [2024-01-05 13:39:56,713][20170] Loop rollout_proc0_evt_loop terminating... [2024-01-05 13:39:57,404][00209] Waiting for process rollout_proc1 to join... [2024-01-05 13:39:57,407][00209] Waiting for process rollout_proc2 to join... [2024-01-05 13:39:57,409][00209] Waiting for process rollout_proc3 to join... [2024-01-05 13:39:57,411][00209] Waiting for process rollout_proc4 to join... [2024-01-05 13:39:57,414][00209] Waiting for process rollout_proc5 to join... [2024-01-05 13:39:57,416][00209] Waiting for process rollout_proc6 to join... [2024-01-05 13:39:57,418][00209] Waiting for process rollout_proc7 to join... 
[2024-01-05 13:39:57,421][00209] Batcher 0 profile tree view:
batching: 0.2616, releasing_batches: 0.0005
[2024-01-05 13:39:57,423][00209] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0184
wait_policy: 0.0000
  wait_policy_total: 8.0429
one_step: 0.0049
  handle_policy_step: 3.1533
    deserialize: 0.0470, stack: 0.0088, obs_to_device_normalize: 0.5304, forward: 2.1518, send_messages: 0.0765
    prepare_outputs: 0.2449
      to_cpu: 0.1289
[2024-01-05 13:39:57,424][00209] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 2.9352
train: 4.0767
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0203, after_optimizer: 0.1153
  calculate_losses: 2.2828
    losses_init: 0.0000, forward_head: 0.3585, bptt_initial: 1.7083, tail: 0.0964, advantages_returns: 0.0056, losses: 0.0879
    bptt: 0.0254
      bptt_forward_core: 0.0253
  update: 1.6569
    clip: 0.0991
[2024-01-05 13:39:57,426][00209] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0005
[2024-01-05 13:39:57,427][00209] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.6089, env_step: 3.1115, overhead: 0.0698, complete_rollouts: 0.0070
save_policy_outputs: 0.0829
  split_output_tensors: 0.0263
[2024-01-05 13:39:57,429][00209] Loop Runner_EvtLoop terminating...
[2024-01-05 13:39:57,431][00209] Runner profile tree view:
main_loop: 39.7657
[2024-01-05 13:39:57,432][00209] Collected {0: 10014720}, FPS: 206.0
[2024-01-05 13:40:09,766][00209] Environment doom_basic already registered, overwriting...
[2024-01-05 13:40:09,768][00209] Environment doom_two_colors_easy already registered, overwriting...
[2024-01-05 13:40:09,769][00209] Environment doom_two_colors_hard already registered, overwriting...
[2024-01-05 13:40:09,771][00209] Environment doom_dm already registered, overwriting...
[2024-01-05 13:40:09,776][00209] Environment doom_dwango5 already registered, overwriting...
[2024-01-05 13:40:09,778][00209] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-01-05 13:40:09,780][00209] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-01-05 13:40:09,782][00209] Environment doom_my_way_home already registered, overwriting... [2024-01-05 13:40:09,784][00209] Environment doom_deadly_corridor already registered, overwriting... [2024-01-05 13:40:09,786][00209] Environment doom_defend_the_center already registered, overwriting... [2024-01-05 13:40:09,788][00209] Environment doom_defend_the_line already registered, overwriting... [2024-01-05 13:40:09,789][00209] Environment doom_health_gathering already registered, overwriting... [2024-01-05 13:40:09,791][00209] Environment doom_health_gathering_supreme already registered, overwriting... [2024-01-05 13:40:09,796][00209] Environment doom_battle already registered, overwriting... [2024-01-05 13:40:09,797][00209] Environment doom_battle2 already registered, overwriting... [2024-01-05 13:40:09,798][00209] Environment doom_duel_bots already registered, overwriting... [2024-01-05 13:40:09,799][00209] Environment doom_deathmatch_bots already registered, overwriting... [2024-01-05 13:40:09,800][00209] Environment doom_duel already registered, overwriting... [2024-01-05 13:40:09,801][00209] Environment doom_deathmatch_full already registered, overwriting... [2024-01-05 13:40:09,802][00209] Environment doom_benchmark already registered, overwriting... [2024-01-05 13:40:09,804][00209] register_encoder_factory: [2024-01-05 13:40:09,832][00209] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-01-05 13:40:09,834][00209] Overriding arg 'train_for_env_steps' with value 20000000 passed from command line [2024-01-05 13:40:09,840][00209] Experiment dir /content/train_dir/default_experiment already exists! 
[2024-01-05 13:40:09,842][00209] Resuming existing experiment from /content/train_dir/default_experiment... [2024-01-05 13:40:09,844][00209] Weights and Biases integration disabled [2024-01-05 13:40:09,847][00209] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-01-05 13:40:12,059][00209] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=20000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] 
encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10_000_000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000} git_hash=unknown git_repo_name=not a git repository [2024-01-05 13:40:12,060][00209] Saving configuration to /content/train_dir/default_experiment/config.json... 
[2024-01-05 13:40:12,064][00209] Rollout worker 0 uses device cpu [2024-01-05 13:40:12,065][00209] Rollout worker 1 uses device cpu [2024-01-05 13:40:12,067][00209] Rollout worker 2 uses device cpu [2024-01-05 13:40:12,068][00209] Rollout worker 3 uses device cpu [2024-01-05 13:40:12,069][00209] Rollout worker 4 uses device cpu [2024-01-05 13:40:12,071][00209] Rollout worker 5 uses device cpu [2024-01-05 13:40:12,072][00209] Rollout worker 6 uses device cpu [2024-01-05 13:40:12,073][00209] Rollout worker 7 uses device cpu [2024-01-05 13:40:12,158][00209] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-01-05 13:40:12,160][00209] InferenceWorker_p0-w0: min num requests: 2 [2024-01-05 13:40:12,194][00209] Starting all processes... [2024-01-05 13:40:12,195][00209] Starting process learner_proc0 [2024-01-05 13:40:12,244][00209] Starting all processes... [2024-01-05 13:40:12,250][00209] Starting process inference_proc0-0 [2024-01-05 13:40:12,250][00209] Starting process rollout_proc0 [2024-01-05 13:40:12,263][00209] Starting process rollout_proc1 [2024-01-05 13:40:12,263][00209] Starting process rollout_proc2 [2024-01-05 13:40:12,264][00209] Starting process rollout_proc3 [2024-01-05 13:40:12,264][00209] Starting process rollout_proc4 [2024-01-05 13:40:12,264][00209] Starting process rollout_proc5 [2024-01-05 13:40:12,264][00209] Starting process rollout_proc6 [2024-01-05 13:40:12,264][00209] Starting process rollout_proc7 [2024-01-05 13:40:28,622][24422] Worker 0 uses CPU cores [0] [2024-01-05 13:40:28,644][24408] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-01-05 13:40:28,646][24408] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-01-05 13:40:28,708][24408] Num visible devices: 1 [2024-01-05 13:40:28,743][24408] Starting seed is not provided [2024-01-05 13:40:28,744][24408] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-01-05 13:40:28,745][24408] Initializing actor-critic 
model on device cuda:0
[2024-01-05 13:40:28,746][24408] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 13:40:28,747][24408] RunningMeanStd input shape: (1,)
[2024-01-05 13:40:28,815][24426] Worker 5 uses CPU cores [1]
[2024-01-05 13:40:28,824][24408] ConvEncoder: input_channels=3
[2024-01-05 13:40:28,872][24421] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 13:40:28,873][24421] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-01-05 13:40:28,958][24425] Worker 2 uses CPU cores [0]
[2024-01-05 13:40:28,965][24421] Num visible devices: 1
[2024-01-05 13:40:29,013][24423] Worker 1 uses CPU cores [1]
[2024-01-05 13:40:29,073][24424] Worker 3 uses CPU cores [1]
[2024-01-05 13:40:29,091][24427] Worker 4 uses CPU cores [0]
[2024-01-05 13:40:29,128][24429] Worker 7 uses CPU cores [1]
[2024-01-05 13:40:29,165][24428] Worker 6 uses CPU cores [0]
[2024-01-05 13:40:29,228][24408] Conv encoder output size: 512
[2024-01-05 13:40:29,229][24408] Policy head output size: 512
[2024-01-05 13:40:29,254][24408] Created Actor Critic model with architecture:
[2024-01-05 13:40:29,255][24408] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-01-05 13:40:29,523][24408] Using optimizer
[2024-01-05 13:40:30,518][24408] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2024-01-05 13:40:30,556][24408] Loading model from checkpoint
[2024-01-05 13:40:30,558][24408] Loaded experiment state at self.train_step=2445, self.env_steps=10014720
[2024-01-05 13:40:30,559][24408] Initialized policy 0 weights for model version 2445
[2024-01-05 13:40:30,562][24408] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-01-05 13:40:30,569][24408] LearnerWorker_p0 finished initialization!
[2024-01-05 13:40:30,760][24421] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 13:40:30,762][24421] RunningMeanStd input shape: (1,)
[2024-01-05 13:40:30,774][24421] ConvEncoder: input_channels=3
[2024-01-05 13:40:30,876][24421] Conv encoder output size: 512
[2024-01-05 13:40:30,876][24421] Policy head output size: 512
[2024-01-05 13:40:30,961][00209] Inference worker 0-0 is ready!
[2024-01-05 13:40:30,963][00209] All inference workers are ready! Signal rollout workers to start!
[2024-01-05 13:40:31,264][24423] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:31,262][24429] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:31,267][24426] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:31,268][24424] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:31,279][24422] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:31,297][24427] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:31,287][24425] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:31,289][24428] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-01-05 13:40:32,150][00209] Heartbeat connected on Batcher_0
[2024-01-05 13:40:32,154][00209] Heartbeat connected on LearnerWorker_p0
[2024-01-05 13:40:32,200][00209] Heartbeat connected on InferenceWorker_p0-w0
[2024-01-05 13:40:32,911][24425] Decorrelating experience for 0 frames...
[2024-01-05 13:40:32,914][24428] Decorrelating experience for 0 frames...
[2024-01-05 13:40:33,098][24429] Decorrelating experience for 0 frames...
[2024-01-05 13:40:33,103][24423] Decorrelating experience for 0 frames...
[2024-01-05 13:40:33,104][24426] Decorrelating experience for 0 frames...
[2024-01-05 13:40:34,003][24425] Decorrelating experience for 32 frames...
[2024-01-05 13:40:34,504][24426] Decorrelating experience for 32 frames...
[2024-01-05 13:40:34,506][24423] Decorrelating experience for 32 frames...
[2024-01-05 13:40:34,541][24424] Decorrelating experience for 0 frames...
[2024-01-05 13:40:34,848][00209] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10014720. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-01-05 13:40:34,873][24422] Decorrelating experience for 0 frames...
[2024-01-05 13:40:35,846][24429] Decorrelating experience for 32 frames...
[2024-01-05 13:40:36,382][24424] Decorrelating experience for 32 frames...
[2024-01-05 13:40:36,783][24428] Decorrelating experience for 32 frames...
[2024-01-05 13:40:36,785][24427] Decorrelating experience for 0 frames...
[2024-01-05 13:40:36,886][24425] Decorrelating experience for 64 frames...
[2024-01-05 13:40:37,066][24423] Decorrelating experience for 64 frames...
[2024-01-05 13:40:37,202][24422] Decorrelating experience for 32 frames...
[2024-01-05 13:40:38,228][24427] Decorrelating experience for 32 frames...
[2024-01-05 13:40:38,309][24429] Decorrelating experience for 64 frames...
[2024-01-05 13:40:38,398][24426] Decorrelating experience for 64 frames...
[2024-01-05 13:40:38,698][24428] Decorrelating experience for 64 frames...
[2024-01-05 13:40:38,806][24424] Decorrelating experience for 64 frames...
[2024-01-05 13:40:38,890][24423] Decorrelating experience for 96 frames...
[2024-01-05 13:40:38,964][24422] Decorrelating experience for 64 frames...
[2024-01-05 13:40:39,124][00209] Heartbeat connected on RolloutWorker_w1
[2024-01-05 13:40:39,848][00209] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10014720. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-01-05 13:40:40,038][24425] Decorrelating experience for 96 frames...
[2024-01-05 13:40:40,317][00209] Heartbeat connected on RolloutWorker_w2
[2024-01-05 13:40:40,341][24426] Decorrelating experience for 96 frames...
[2024-01-05 13:40:40,591][24427] Decorrelating experience for 64 frames...
[2024-01-05 13:40:40,725][24428] Decorrelating experience for 96 frames...
[2024-01-05 13:40:40,757][00209] Heartbeat connected on RolloutWorker_w5
[2024-01-05 13:40:41,069][24424] Decorrelating experience for 96 frames...
[2024-01-05 13:40:41,128][24429] Decorrelating experience for 96 frames...
[2024-01-05 13:40:41,123][00209] Heartbeat connected on RolloutWorker_w6
[2024-01-05 13:40:41,184][24422] Decorrelating experience for 96 frames...
[2024-01-05 13:40:41,368][00209] Heartbeat connected on RolloutWorker_w3
[2024-01-05 13:40:41,397][00209] Heartbeat connected on RolloutWorker_w7
[2024-01-05 13:40:41,512][00209] Heartbeat connected on RolloutWorker_w0
[2024-01-05 13:40:43,670][24427] Decorrelating experience for 96 frames...
[2024-01-05 13:40:44,204][00209] Heartbeat connected on RolloutWorker_w4
[2024-01-05 13:40:44,243][24408] Signal inference workers to stop experience collection...
[2024-01-05 13:40:44,273][24421] InferenceWorker_p0-w0: stopping experience collection
[2024-01-05 13:40:44,848][00209] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10014720. Throughput: 0: 183.6. Samples: 1836. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-01-05 13:40:44,849][00209] Avg episode reward: [(0, '3.820')]
[2024-01-05 13:40:46,061][24408] Signal inference workers to resume experience collection...
[2024-01-05 13:40:46,063][24421] InferenceWorker_p0-w0: resuming experience collection
[2024-01-05 13:40:49,850][00209] Fps is (10 sec: 1638.1, 60 sec: 1092.1, 300 sec: 1092.1). Total num frames: 10031104. Throughput: 0: 313.3. Samples: 4700. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-01-05 13:40:49,852][00209] Avg episode reward: [(0, '6.873')]
[2024-01-05 13:40:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 10043392. Throughput: 0: 333.2. Samples: 6664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:40:54,851][00209] Avg episode reward: [(0, '11.273')]
[2024-01-05 13:40:58,183][24421] Updated weights for policy 0, policy_version 2455 (0.0195)
[2024-01-05 13:40:59,849][00209] Fps is (10 sec: 2867.5, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 10059776. Throughput: 0: 427.6. Samples: 10690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:40:59,856][00209] Avg episode reward: [(0, '15.095')]
[2024-01-05 13:41:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 10080256. Throughput: 0: 549.9. Samples: 16496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:41:04,850][00209] Avg episode reward: [(0, '19.584')]
[2024-01-05 13:41:08,080][24421] Updated weights for policy 0, policy_version 2465 (0.0016)
[2024-01-05 13:41:09,848][00209] Fps is (10 sec: 4096.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 10100736. Throughput: 0: 564.0. Samples: 19740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:41:09,853][00209] Avg episode reward: [(0, '20.864')]
[2024-01-05 13:41:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 10113024. Throughput: 0: 604.5. Samples: 24180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:41:14,854][00209] Avg episode reward: [(0, '21.874')]
[2024-01-05 13:41:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 10129408. Throughput: 0: 634.3. Samples: 28542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:41:19,851][00209] Avg episode reward: [(0, '21.952')]
[2024-01-05 13:41:21,418][24421] Updated weights for policy 0, policy_version 2475 (0.0019)
[2024-01-05 13:41:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 10149888. Throughput: 0: 705.2. Samples: 31736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:41:24,850][00209] Avg episode reward: [(0, '25.239')]
[2024-01-05 13:41:29,849][00209] Fps is (10 sec: 4095.7, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 10170368. Throughput: 0: 810.3. Samples: 38300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:41:29,851][00209] Avg episode reward: [(0, '26.821')]
[2024-01-05 13:41:32,659][24421] Updated weights for policy 0, policy_version 2485 (0.0026)
[2024-01-05 13:41:34,848][00209] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 10182656. Throughput: 0: 839.0. Samples: 42454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:41:34,852][00209] Avg episode reward: [(0, '26.896')]
[2024-01-05 13:41:39,848][00209] Fps is (10 sec: 2867.5, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 10199040. Throughput: 0: 839.8. Samples: 44454. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:41:39,850][00209] Avg episode reward: [(0, '26.586')]
[2024-01-05 13:41:44,808][24421] Updated weights for policy 0, policy_version 2495 (0.0018)
[2024-01-05 13:41:44,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2925.7). Total num frames: 10219520. Throughput: 0: 868.1. Samples: 49756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:41:44,850][00209] Avg episode reward: [(0, '27.718')]
[2024-01-05 13:41:49,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3345.2, 300 sec: 2894.5). Total num frames: 10231808. Throughput: 0: 856.6. Samples: 55042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:41:49,851][00209] Avg episode reward: [(0, '28.014')]
[2024-01-05 13:41:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 2918.4). Total num frames: 10248192. Throughput: 0: 829.5. Samples: 57066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:41:54,853][00209] Avg episode reward: [(0, '28.519')]
[2024-01-05 13:41:58,927][24421] Updated weights for policy 0, policy_version 2505 (0.0017)
[2024-01-05 13:41:59,850][00209] Fps is (10 sec: 2866.5, 60 sec: 3345.0, 300 sec: 2891.2). Total num frames: 10260480. Throughput: 0: 821.8. Samples: 61164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:41:59,852][00209] Avg episode reward: [(0, '29.418')]
[2024-01-05 13:42:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3003.7). Total num frames: 10285056. Throughput: 0: 859.9. Samples: 67238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:42:04,853][00209] Avg episode reward: [(0, '28.455')]
[2024-01-05 13:42:08,478][24421] Updated weights for policy 0, policy_version 2515 (0.0014)
[2024-01-05 13:42:09,848][00209] Fps is (10 sec: 4506.8, 60 sec: 3413.3, 300 sec: 3061.2). Total num frames: 10305536. Throughput: 0: 861.7. Samples: 70512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:42:09,852][00209] Avg episode reward: [(0, '28.757')]
[2024-01-05 13:42:09,869][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002516_10305536.pth...
[2024-01-05 13:42:10,028][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth
[2024-01-05 13:42:14,858][00209] Fps is (10 sec: 3273.4, 60 sec: 3412.7, 300 sec: 3030.7). Total num frames: 10317824. Throughput: 0: 821.9. Samples: 75292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:42:14,862][00209] Avg episode reward: [(0, '28.111')]
[2024-01-05 13:42:19,848][00209] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3003.7). Total num frames: 10330112. Throughput: 0: 821.7. Samples: 79430. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:42:19,850][00209] Avg episode reward: [(0, '28.296')]
[2024-01-05 13:42:21,764][24421] Updated weights for policy 0, policy_version 2525 (0.0029)
[2024-01-05 13:42:24,848][00209] Fps is (10 sec: 3690.2, 60 sec: 3413.3, 300 sec: 3090.6). Total num frames: 10354688. Throughput: 0: 847.5. Samples: 82590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:42:24,850][00209] Avg episode reward: [(0, '27.884')]
[2024-01-05 13:42:29,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3413.4, 300 sec: 3134.3). Total num frames: 10375168. Throughput: 0: 874.7. Samples: 89118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:42:29,853][00209] Avg episode reward: [(0, '26.953')]
[2024-01-05 13:42:32,819][24421] Updated weights for policy 0, policy_version 2535 (0.0019)
[2024-01-05 13:42:34,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3106.1). Total num frames: 10387456. Throughput: 0: 859.1. Samples: 93702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:42:34,850][00209] Avg episode reward: [(0, '28.822')]
[2024-01-05 13:42:39,848][00209] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3080.2). Total num frames: 10399744. Throughput: 0: 853.2. Samples: 95460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:42:39,855][00209] Avg episode reward: [(0, '28.598')]
[2024-01-05 13:42:44,751][24421] Updated weights for policy 0, policy_version 2545 (0.0032)
[2024-01-05 13:42:44,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 10424320. Throughput: 0: 885.3. Samples: 101002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:42:44,851][00209] Avg episode reward: [(0, '28.663')]
[2024-01-05 13:42:49,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3185.8). Total num frames: 10444800. Throughput: 0: 895.6. Samples: 107542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:42:49,853][00209] Avg episode reward: [(0, '29.593')]
[2024-01-05 13:42:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3159.8). Total num frames: 10457088. Throughput: 0: 873.5. Samples: 109820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:42:54,854][00209] Avg episode reward: [(0, '29.034')]
[2024-01-05 13:42:57,053][24421] Updated weights for policy 0, policy_version 2555 (0.0024)
[2024-01-05 13:42:59,848][00209] Fps is (10 sec: 2457.5, 60 sec: 3481.7, 300 sec: 3135.6). Total num frames: 10469376. Throughput: 0: 857.9. Samples: 113890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:42:59,852][00209] Avg episode reward: [(0, '27.917')]
[2024-01-05 13:43:04,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3167.6). Total num frames: 10489856. Throughput: 0: 888.0. Samples: 119388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:43:04,850][00209] Avg episode reward: [(0, '28.048')]
[2024-01-05 13:43:07,835][24421] Updated weights for policy 0, policy_version 2565 (0.0021)
[2024-01-05 13:43:09,848][00209] Fps is (10 sec: 4505.7, 60 sec: 3481.6, 300 sec: 3223.9). Total num frames: 10514432. Throughput: 0: 891.2. Samples: 122696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-01-05 13:43:09,853][00209] Avg episode reward: [(0, '26.420')]
[2024-01-05 13:43:14,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3482.2, 300 sec: 3200.0). Total num frames: 10526720. Throughput: 0: 863.9. Samples: 127992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:43:14,857][00209] Avg episode reward: [(0, '26.227')]
[2024-01-05 13:43:19,848][00209] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3202.3). Total num frames: 10543104. Throughput: 0: 853.4. Samples: 132106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:43:19,855][00209] Avg episode reward: [(0, '25.700')]
[2024-01-05 13:43:20,967][24421] Updated weights for policy 0, policy_version 2575 (0.0031)
[2024-01-05 13:43:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3228.6). Total num frames: 10563584. Throughput: 0: 875.9. Samples: 134874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:43:24,850][00209] Avg episode reward: [(0, '23.035')]
[2024-01-05 13:43:29,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3253.4). Total num frames: 10584064. Throughput: 0: 898.9. Samples: 141454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:43:29,850][00209] Avg episode reward: [(0, '24.440')]
[2024-01-05 13:43:30,498][24421] Updated weights for policy 0, policy_version 2585 (0.0019)
[2024-01-05 13:43:34,851][00209] Fps is (10 sec: 3275.6, 60 sec: 3481.4, 300 sec: 3231.2). Total num frames: 10596352. Throughput: 0: 861.2. Samples: 146298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:43:34,855][00209] Avg episode reward: [(0, '23.558')]
[2024-01-05 13:43:39,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3232.5). Total num frames: 10612736. Throughput: 0: 856.0. Samples: 148338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:43:39,850][00209] Avg episode reward: [(0, '22.831')]
[2024-01-05 13:43:43,753][24421] Updated weights for policy 0, policy_version 2595 (0.0031)
[2024-01-05 13:43:44,848][00209] Fps is (10 sec: 3687.8, 60 sec: 3481.6, 300 sec: 3255.2). Total num frames: 10633216. Throughput: 0: 878.6. Samples: 153426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-01-05 13:43:44,850][00209] Avg episode reward: [(0, '23.944')]
[2024-01-05 13:43:49,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 10653696. Throughput: 0: 902.0. Samples: 159976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:43:49,854][00209] Avg episode reward: [(0, '25.218')]
[2024-01-05 13:43:54,554][24421] Updated weights for policy 0, policy_version 2605 (0.0022)
[2024-01-05 13:43:54,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 10670080. Throughput: 0: 884.9. Samples: 162518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:43:54,850][00209] Avg episode reward: [(0, '25.145')]
[2024-01-05 13:43:59,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3256.8). Total num frames: 10682368. Throughput: 0: 859.0. Samples: 166648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:43:59,853][00209] Avg episode reward: [(0, '24.526')]
[2024-01-05 13:44:04,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 10702848. Throughput: 0: 883.6. Samples: 171870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:44:04,854][00209] Avg episode reward: [(0, '24.494')]
[2024-01-05 13:44:06,461][24421] Updated weights for policy 0, policy_version 2615 (0.0013)
[2024-01-05 13:44:09,848][00209] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3295.8). Total num frames: 10723328. Throughput: 0: 894.6. Samples: 175132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-01-05 13:44:09,853][00209] Avg episode reward: [(0, '25.195')]
[2024-01-05 13:44:09,864][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002618_10723328.pth...
[2024-01-05 13:44:10,004][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth
[2024-01-05 13:44:14,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3295.4). Total num frames: 10739712. Throughput: 0: 872.8. Samples: 180730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:44:14,855][00209] Avg episode reward: [(0, '22.935')]
[2024-01-05 13:44:19,010][24421] Updated weights for policy 0, policy_version 2625 (0.0017)
[2024-01-05 13:44:19,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 10752000. Throughput: 0: 855.5. Samples: 184792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:44:19,850][00209] Avg episode reward: [(0, '21.701')]
[2024-01-05 13:44:24,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3294.6). Total num frames: 10772480. Throughput: 0: 858.4. Samples: 186968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:44:24,850][00209] Avg episode reward: [(0, '22.039')]
[2024-01-05 13:44:29,621][24421] Updated weights for policy 0, policy_version 2635 (0.0020)
[2024-01-05 13:44:29,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3311.7). Total num frames: 10792960. Throughput: 0: 888.8. Samples: 193422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:44:29,855][00209] Avg episode reward: [(0, '23.437')]
[2024-01-05 13:44:34,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3310.9). Total num frames: 10809344. Throughput: 0: 865.3. Samples: 198914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:44:34,852][00209] Avg episode reward: [(0, '22.280')]
[2024-01-05 13:44:39,853][00209] Fps is (10 sec: 2865.8, 60 sec: 3481.3, 300 sec: 3293.5). Total num frames: 10821632. Throughput: 0: 853.8. Samples: 200942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:44:39,855][00209] Avg episode reward: [(0, '22.934')]
[2024-01-05 13:44:42,878][24421] Updated weights for policy 0, policy_version 2645 (0.0012)
[2024-01-05 13:44:44,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3309.6). Total num frames: 10842112. Throughput: 0: 861.3. Samples: 205408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:44:44,850][00209] Avg episode reward: [(0, '23.544')]
[2024-01-05 13:44:49,848][00209] Fps is (10 sec: 4098.0, 60 sec: 3481.6, 300 sec: 3325.0). Total num frames: 10862592. Throughput: 0: 889.3. Samples: 211888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:44:49,850][00209] Avg episode reward: [(0, '25.699')]
[2024-01-05 13:44:52,818][24421] Updated weights for policy 0, policy_version 2655 (0.0017)
[2024-01-05 13:44:54,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3324.1). Total num frames: 10878976. Throughput: 0: 885.8. Samples: 214992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:44:54,852][00209] Avg episode reward: [(0, '26.165')]
[2024-01-05 13:44:59,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3307.7). Total num frames: 10891264. Throughput: 0: 851.5. Samples: 219046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:44:59,854][00209] Avg episode reward: [(0, '26.535')]
[2024-01-05 13:45:04,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3307.1). Total num frames: 10907648. Throughput: 0: 864.1. Samples: 223678. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:45:04,855][00209] Avg episode reward: [(0, '25.713')]
[2024-01-05 13:45:05,838][24421] Updated weights for policy 0, policy_version 2665 (0.0021)
[2024-01-05 13:45:09,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3336.4). Total num frames: 10932224. Throughput: 0: 887.2. Samples: 226894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:45:09,850][00209] Avg episode reward: [(0, '26.560')]
[2024-01-05 13:45:14,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3335.3). Total num frames: 10948608. Throughput: 0: 882.8. Samples: 233150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:45:14,850][00209] Avg episode reward: [(0, '26.591')]
[2024-01-05 13:45:17,148][24421] Updated weights for policy 0, policy_version 2675 (0.0028)
[2024-01-05 13:45:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3319.9). Total num frames: 10960896. Throughput: 0: 851.9. Samples: 237248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:45:19,854][00209] Avg episode reward: [(0, '26.936')]
[2024-01-05 13:45:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3319.2). Total num frames: 10977280. Throughput: 0: 851.6. Samples: 239258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:45:24,855][00209] Avg episode reward: [(0, '25.938')]
[2024-01-05 13:45:28,704][24421] Updated weights for policy 0, policy_version 2685 (0.0013)
[2024-01-05 13:45:29,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3346.2). Total num frames: 11001856. Throughput: 0: 888.4. Samples: 245386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:45:29,856][00209] Avg episode reward: [(0, '24.853')]
[2024-01-05 13:45:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 11018240. Throughput: 0: 879.1. Samples: 251446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:45:34,853][00209] Avg episode reward: [(0, '24.907')]
[2024-01-05 13:45:39,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.9, 300 sec: 3443.4). Total num frames: 11030528. Throughput: 0: 855.2. Samples: 253476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:45:39,850][00209] Avg episode reward: [(0, '26.272')]
[2024-01-05 13:45:41,448][24421] Updated weights for policy 0, policy_version 2695 (0.0037)
[2024-01-05 13:45:44,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 11046912. Throughput: 0: 854.6. Samples: 257502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:45:44,851][00209] Avg episode reward: [(0, '25.737')]
[2024-01-05 13:45:49,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 11071488. Throughput: 0: 892.2. Samples: 263826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:45:49,850][00209] Avg episode reward: [(0, '25.365')]
[2024-01-05 13:45:51,827][24421] Updated weights for policy 0, policy_version 2705 (0.0024)
[2024-01-05 13:45:54,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 11087872. Throughput: 0: 891.1. Samples: 266994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:45:54,854][00209] Avg episode reward: [(0, '25.643')]
[2024-01-05 13:45:59,849][00209] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3471.2). Total num frames: 11104256. Throughput: 0: 856.5. Samples: 271694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:45:59,852][00209] Avg episode reward: [(0, '26.505')]
[2024-01-05 13:46:04,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 11116544. Throughput: 0: 855.9. Samples: 275764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:46:04,856][00209] Avg episode reward: [(0, '27.086')]
[2024-01-05 13:46:05,281][24421] Updated weights for policy 0, policy_version 2715 (0.0023)
[2024-01-05 13:46:09,848][00209] Fps is (10 sec: 3277.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 11137024. Throughput: 0: 883.3. Samples: 279006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:46:09,850][00209] Avg episode reward: [(0, '28.161')]
[2024-01-05 13:46:09,915][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002720_11141120.pth...
[2024-01-05 13:46:10,057][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002516_10305536.pth
[2024-01-05 13:46:14,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 11157504. Throughput: 0: 888.1. Samples: 285352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:46:14,852][00209] Avg episode reward: [(0, '27.336')]
[2024-01-05 13:46:15,213][24421] Updated weights for policy 0, policy_version 2725 (0.0017)
[2024-01-05 13:46:19,851][00209] Fps is (10 sec: 3685.0, 60 sec: 3549.6, 300 sec: 3471.1). Total num frames: 11173888. Throughput: 0: 853.5. Samples: 289858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:46:19,859][00209] Avg episode reward: [(0, '27.873')]
[2024-01-05 13:46:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 11186176. Throughput: 0: 854.5. Samples: 291930. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:46:24,856][00209] Avg episode reward: [(0, '28.501')]
[2024-01-05 13:46:28,030][24421] Updated weights for policy 0, policy_version 2735 (0.0024)
[2024-01-05 13:46:29,848][00209] Fps is (10 sec: 3278.0, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 11206656. Throughput: 0: 888.8. Samples: 297496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:46:29,855][00209] Avg episode reward: [(0, '28.193')]
[2024-01-05 13:46:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 11227136. Throughput: 0: 892.0. Samples: 303964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:46:34,855][00209] Avg episode reward: [(0, '28.537')]
[2024-01-05 13:46:39,348][24421] Updated weights for policy 0, policy_version 2745 (0.0030)
[2024-01-05 13:46:39,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 11243520. Throughput: 0: 870.0. Samples: 306142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:46:39,850][00209] Avg episode reward: [(0, '30.428')]
[2024-01-05 13:46:44,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 11255808. Throughput: 0: 855.2. Samples: 310178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:46:44,853][00209] Avg episode reward: [(0, '29.307')]
[2024-01-05 13:46:49,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11280384. Throughput: 0: 894.4. Samples: 316010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:46:49,850][00209] Avg episode reward: [(0, '30.416')]
[2024-01-05 13:46:50,767][24421] Updated weights for policy 0, policy_version 2755 (0.0020)
[2024-01-05 13:46:54,855][00209] Fps is (10 sec: 4502.5, 60 sec: 3549.5, 300 sec: 3526.7). Total num frames: 11300864. Throughput: 0: 894.6. Samples: 319268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:46:54,857][00209] Avg episode reward: [(0, '28.974')]
[2024-01-05 13:46:59,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 11313152. Throughput: 0: 868.3. Samples: 324426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:46:59,853][00209] Avg episode reward: [(0, '29.626')]
[2024-01-05 13:47:03,581][24421] Updated weights for policy 0, policy_version 2765 (0.0030)
[2024-01-05 13:47:04,848][00209] Fps is (10 sec: 2459.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 11325440. Throughput: 0: 858.8. Samples: 328500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:47:04,850][00209] Avg episode reward: [(0, '30.483')]
[2024-01-05 13:47:09,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.1). Total num frames: 11350016. Throughput: 0: 873.6. Samples: 331242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:47:09,853][00209] Avg episode reward: [(0, '29.438')]
[2024-01-05 13:47:13,608][24421] Updated weights for policy 0, policy_version 2775 (0.0020)
[2024-01-05 13:47:14,850][00209] Fps is (10 sec: 4504.7, 60 sec: 3549.7, 300 sec: 3526.7). Total num frames: 11370496. Throughput: 0: 895.0. Samples: 337772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:47:14,852][00209] Avg episode reward: [(0, '28.583')]
[2024-01-05 13:47:19,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3481.8, 300 sec: 3485.1). Total num frames: 11382784. Throughput: 0: 863.6. Samples: 342828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:47:19,851][00209] Avg episode reward: [(0, '28.200')]
[2024-01-05 13:47:24,848][00209] Fps is (10 sec: 2867.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 11399168. Throughput: 0: 861.2. Samples: 344894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:47:24,856][00209] Avg episode reward: [(0, '28.006')]
[2024-01-05 13:47:26,858][24421] Updated weights for policy 0, policy_version 2785 (0.0015)
[2024-01-05 13:47:29,848][00209] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 11419648. Throughput: 0: 881.5. Samples: 349846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:47:29,850][00209] Avg episode reward: [(0, '29.358')]
[2024-01-05 13:47:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 11440128. Throughput: 0: 897.9. Samples: 356416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:47:34,851][00209] Avg episode reward: [(0, '30.518')]
[2024-01-05 13:47:36,564][24421] Updated weights for policy 0, policy_version 2795 (0.0013)
[2024-01-05 13:47:39,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 11456512. Throughput: 0: 886.5. Samples: 359154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:47:39,854][00209] Avg episode reward: [(0, '29.272')]
[2024-01-05 13:47:44,850][00209] Fps is (10 sec: 2866.4, 60 sec: 3549.7, 300 sec: 3471.2). Total num frames: 11468800. Throughput: 0: 861.6. Samples: 363200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:47:44,861][00209] Avg episode reward: [(0, '29.316')]
[2024-01-05 13:47:49,638][24421] Updated weights for policy 0, policy_version 2805 (0.0034)
[2024-01-05 13:47:49,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11489280. Throughput: 0: 887.4. Samples: 368434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:47:49,850][00209] Avg episode reward: [(0, '30.107')]
[2024-01-05 13:47:54,848][00209] Fps is (10 sec: 4097.1, 60 sec: 3482.0, 300 sec: 3526.7). Total num frames: 11509760. Throughput: 0: 898.8. Samples: 371690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:47:54,850][00209] Avg episode reward: [(0, '30.392')]
[2024-01-05 13:47:59,850][00209] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 11526144. Throughput: 0: 881.7. Samples: 377448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:47:59,852][00209] Avg episode reward: [(0, '29.565')]
[2024-01-05 13:48:00,783][24421] Updated weights for policy 0, policy_version 2815 (0.0014)
[2024-01-05 13:48:04,854][00209] Fps is (10 sec: 2865.3, 60 sec: 3549.5, 300 sec: 3471.1). Total num frames: 11538432. Throughput: 0: 858.0. Samples: 381444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:48:04,858][00209] Avg episode reward: [(0, '28.973')]
[2024-01-05 13:48:09,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11558912. Throughput: 0: 860.1. Samples: 383598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:48:09,856][00209] Avg episode reward: [(0, '28.056')]
[2024-01-05 13:48:09,869][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002822_11558912.pth...
[2024-01-05 13:48:10,006][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002618_10723328.pth
[2024-01-05 13:48:12,516][24421] Updated weights for policy 0, policy_version 2825 (0.0016)
[2024-01-05 13:48:14,848][00209] Fps is (10 sec: 4098.8, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 11579392. Throughput: 0: 893.8. Samples: 390068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:48:14,853][00209] Avg episode reward: [(0, '28.301')]
[2024-01-05 13:48:19,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 11595776. Throughput: 0: 871.6. Samples: 395640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:48:19,854][00209] Avg episode reward: [(0, '28.085')]
[2024-01-05 13:48:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 11608064. Throughput: 0: 855.7. Samples: 397662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:48:24,853][00209] Avg episode reward: [(0, '27.933')]
[2024-01-05 13:48:25,607][24421] Updated weights for policy 0, policy_version 2835 (0.0014)
[2024-01-05 13:48:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11628544. Throughput: 0: 863.9. Samples: 402072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:48:29,850][00209] Avg episode reward: [(0, '27.800')]
[2024-01-05 13:48:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 11649024. Throughput: 0: 890.0. Samples: 408484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:48:34,853][00209] Avg episode reward: [(0, '27.861')]
[2024-01-05 13:48:35,487][24421] Updated weights for policy 0, policy_version 2845 (0.0021)
[2024-01-05 13:48:39,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11665408. Throughput: 0: 890.7. Samples: 411770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:48:39,855][00209] Avg episode reward: [(0, '27.405')]
[2024-01-05 13:48:44,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3471.2). Total num frames: 11677696. Throughput: 0: 852.8. Samples: 415824.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:48:44,850][00209] Avg episode reward: [(0, '26.757')] [2024-01-05 13:48:48,995][24421] Updated weights for policy 0, policy_version 2855 (0.0030) [2024-01-05 13:48:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 11694080. Throughput: 0: 868.2. Samples: 420508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-01-05 13:48:49,850][00209] Avg episode reward: [(0, '26.179')] [2024-01-05 13:48:54,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 11718656. Throughput: 0: 892.1. Samples: 423744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:48:54,850][00209] Avg episode reward: [(0, '27.112')] [2024-01-05 13:48:58,763][24421] Updated weights for policy 0, policy_version 2865 (0.0034) [2024-01-05 13:48:59,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3499.0). Total num frames: 11735040. Throughput: 0: 888.4. Samples: 430046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:48:59,852][00209] Avg episode reward: [(0, '26.801')] [2024-01-05 13:49:04,849][00209] Fps is (10 sec: 2867.0, 60 sec: 3481.9, 300 sec: 3471.2). Total num frames: 11747328. Throughput: 0: 855.0. Samples: 434116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:49:04,853][00209] Avg episode reward: [(0, '27.455')] [2024-01-05 13:49:09,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 11767808. Throughput: 0: 855.2. Samples: 436148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-01-05 13:49:09,851][00209] Avg episode reward: [(0, '28.507')] [2024-01-05 13:49:11,701][24421] Updated weights for policy 0, policy_version 2875 (0.0020) [2024-01-05 13:49:14,848][00209] Fps is (10 sec: 4096.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 11788288. Throughput: 0: 892.3. Samples: 442224. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:49:14,855][00209] Avg episode reward: [(0, '29.360')] [2024-01-05 13:49:19,848][00209] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11804672. Throughput: 0: 886.4. Samples: 448372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:49:19,850][00209] Avg episode reward: [(0, '28.686')] [2024-01-05 13:49:23,172][24421] Updated weights for policy 0, policy_version 2885 (0.0019) [2024-01-05 13:49:24,850][00209] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 11821056. Throughput: 0: 857.7. Samples: 450368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:49:24,854][00209] Avg episode reward: [(0, '29.241')] [2024-01-05 13:49:29,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 11833344. Throughput: 0: 857.5. Samples: 454410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:49:29,853][00209] Avg episode reward: [(0, '28.893')] [2024-01-05 13:49:34,737][24421] Updated weights for policy 0, policy_version 2895 (0.0024) [2024-01-05 13:49:34,848][00209] Fps is (10 sec: 3687.4, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 11857920. Throughput: 0: 891.1. Samples: 460606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:49:34,850][00209] Avg episode reward: [(0, '28.918')] [2024-01-05 13:49:39,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 11874304. Throughput: 0: 890.8. Samples: 463832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:49:39,854][00209] Avg episode reward: [(0, '27.669')] [2024-01-05 13:49:44,850][00209] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 11890688. Throughput: 0: 855.8. Samples: 468560. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:49:44,855][00209] Avg episode reward: [(0, '27.162')] [2024-01-05 13:49:47,519][24421] Updated weights for policy 0, policy_version 2905 (0.0032) [2024-01-05 13:49:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 11902976. Throughput: 0: 856.6. Samples: 472662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:49:49,854][00209] Avg episode reward: [(0, '26.037')] [2024-01-05 13:49:54,848][00209] Fps is (10 sec: 3687.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 11927552. Throughput: 0: 883.7. Samples: 475914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:49:54,854][00209] Avg episode reward: [(0, '27.814')] [2024-01-05 13:49:57,295][24421] Updated weights for policy 0, policy_version 2915 (0.0031) [2024-01-05 13:49:59,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 11948032. Throughput: 0: 893.6. Samples: 482434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:49:59,854][00209] Avg episode reward: [(0, '26.399')] [2024-01-05 13:50:04,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 11960320. Throughput: 0: 855.0. Samples: 486848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:50:04,850][00209] Avg episode reward: [(0, '27.544')] [2024-01-05 13:50:09,848][00209] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 11972608. Throughput: 0: 854.9. Samples: 488838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:50:09,851][00209] Avg episode reward: [(0, '27.239')] [2024-01-05 13:50:09,863][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002923_11972608.pth... 
[2024-01-05 13:50:09,987][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002720_11141120.pth [2024-01-05 13:50:10,975][24421] Updated weights for policy 0, policy_version 2925 (0.0033) [2024-01-05 13:50:14,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 11993088. Throughput: 0: 885.2. Samples: 494244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:50:14,855][00209] Avg episode reward: [(0, '27.184')] [2024-01-05 13:50:19,848][00209] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 12017664. Throughput: 0: 882.8. Samples: 500334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:50:19,850][00209] Avg episode reward: [(0, '28.270')] [2024-01-05 13:50:21,431][24421] Updated weights for policy 0, policy_version 2935 (0.0032) [2024-01-05 13:50:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3485.1). Total num frames: 12029952. Throughput: 0: 860.0. Samples: 502534. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-01-05 13:50:24,850][00209] Avg episode reward: [(0, '28.681')] [2024-01-05 13:50:29,848][00209] Fps is (10 sec: 2457.5, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12042240. Throughput: 0: 846.0. Samples: 506630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:50:29,850][00209] Avg episode reward: [(0, '27.461')] [2024-01-05 13:50:34,145][24421] Updated weights for policy 0, policy_version 2945 (0.0026) [2024-01-05 13:50:34,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 12062720. Throughput: 0: 884.3. Samples: 512456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:50:34,855][00209] Avg episode reward: [(0, '28.438')] [2024-01-05 13:50:39,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 12083200. Throughput: 0: 883.3. Samples: 515662. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:50:39,851][00209] Avg episode reward: [(0, '29.011')] [2024-01-05 13:50:44,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3485.1). Total num frames: 12099584. Throughput: 0: 853.4. Samples: 520836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:50:44,853][00209] Avg episode reward: [(0, '28.893')] [2024-01-05 13:50:45,742][24421] Updated weights for policy 0, policy_version 2955 (0.0020) [2024-01-05 13:50:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12111872. Throughput: 0: 844.8. Samples: 524864. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:50:49,852][00209] Avg episode reward: [(0, '29.241')] [2024-01-05 13:50:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 12132352. Throughput: 0: 862.2. Samples: 527638. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:50:54,850][00209] Avg episode reward: [(0, '30.502')] [2024-01-05 13:50:57,026][24421] Updated weights for policy 0, policy_version 2965 (0.0019) [2024-01-05 13:50:59,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 12156928. Throughput: 0: 887.0. Samples: 534160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:50:59,850][00209] Avg episode reward: [(0, '31.377')] [2024-01-05 13:51:04,855][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 12169216. Throughput: 0: 861.2. Samples: 539088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:51:04,858][00209] Avg episode reward: [(0, '31.990')] [2024-01-05 13:51:04,868][24408] Saving new best policy, reward=31.990! [2024-01-05 13:51:09,848][00209] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12181504. Throughput: 0: 854.9. Samples: 541006. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:51:09,854][00209] Avg episode reward: [(0, '31.608')] [2024-01-05 13:51:10,098][24421] Updated weights for policy 0, policy_version 2975 (0.0027) [2024-01-05 13:51:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 12201984. Throughput: 0: 871.9. Samples: 545866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:51:14,856][00209] Avg episode reward: [(0, '31.278')] [2024-01-05 13:51:19,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 12222464. Throughput: 0: 887.2. Samples: 552378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:51:19,854][00209] Avg episode reward: [(0, '31.942')] [2024-01-05 13:51:20,046][24421] Updated weights for policy 0, policy_version 2985 (0.0018) [2024-01-05 13:51:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 12238848. Throughput: 0: 876.1. Samples: 555088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:51:24,852][00209] Avg episode reward: [(0, '32.858')] [2024-01-05 13:51:24,858][24408] Saving new best policy, reward=32.858! [2024-01-05 13:51:29,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12251136. Throughput: 0: 851.6. Samples: 559158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:51:29,855][00209] Avg episode reward: [(0, '31.526')] [2024-01-05 13:51:33,328][24421] Updated weights for policy 0, policy_version 2995 (0.0031) [2024-01-05 13:51:34,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 12271616. Throughput: 0: 877.6. Samples: 564354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:51:34,850][00209] Avg episode reward: [(0, '30.504')] [2024-01-05 13:51:39,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 12292096. Throughput: 0: 887.4. Samples: 567570. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-01-05 13:51:39,850][00209] Avg episode reward: [(0, '30.667')] [2024-01-05 13:51:44,001][24421] Updated weights for policy 0, policy_version 3005 (0.0016) [2024-01-05 13:51:44,854][00209] Fps is (10 sec: 3684.0, 60 sec: 3481.2, 300 sec: 3485.0). Total num frames: 12308480. Throughput: 0: 870.0. Samples: 573314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:51:44,857][00209] Avg episode reward: [(0, '30.427')] [2024-01-05 13:51:49,852][00209] Fps is (10 sec: 2865.9, 60 sec: 3481.3, 300 sec: 3457.3). Total num frames: 12320768. Throughput: 0: 852.1. Samples: 577436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:51:49,855][00209] Avg episode reward: [(0, '29.878')] [2024-01-05 13:51:54,848][00209] Fps is (10 sec: 3279.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 12341248. Throughput: 0: 858.6. Samples: 579642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:51:54,850][00209] Avg episode reward: [(0, '29.209')] [2024-01-05 13:51:56,116][24421] Updated weights for policy 0, policy_version 3015 (0.0018) [2024-01-05 13:51:59,854][00209] Fps is (10 sec: 4095.5, 60 sec: 3413.0, 300 sec: 3512.8). Total num frames: 12361728. Throughput: 0: 894.9. Samples: 586140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:51:59,860][00209] Avg episode reward: [(0, '27.564')] [2024-01-05 13:52:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 12378112. Throughput: 0: 872.4. Samples: 591634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:52:04,850][00209] Avg episode reward: [(0, '27.047')] [2024-01-05 13:52:08,172][24421] Updated weights for policy 0, policy_version 3025 (0.0024) [2024-01-05 13:52:09,848][00209] Fps is (10 sec: 3278.7, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 12394496. Throughput: 0: 856.9. Samples: 593648. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:52:09,855][00209] Avg episode reward: [(0, '26.826')] [2024-01-05 13:52:09,865][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003026_12394496.pth... [2024-01-05 13:52:10,008][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002822_11558912.pth [2024-01-05 13:52:14,849][00209] Fps is (10 sec: 3276.4, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 12410880. Throughput: 0: 861.5. Samples: 597926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:52:14,857][00209] Avg episode reward: [(0, '26.380')] [2024-01-05 13:52:19,221][24421] Updated weights for policy 0, policy_version 3035 (0.0019) [2024-01-05 13:52:19,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 12431360. Throughput: 0: 890.4. Samples: 604420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:52:19,850][00209] Avg episode reward: [(0, '24.564')] [2024-01-05 13:52:24,848][00209] Fps is (10 sec: 3686.9, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 12447744. Throughput: 0: 890.5. Samples: 607644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:52:24,852][00209] Avg episode reward: [(0, '25.506')] [2024-01-05 13:52:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 12464128. Throughput: 0: 856.2. Samples: 611836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:52:29,857][00209] Avg episode reward: [(0, '24.259')] [2024-01-05 13:52:32,729][24421] Updated weights for policy 0, policy_version 3045 (0.0015) [2024-01-05 13:52:34,854][00209] Fps is (10 sec: 3274.9, 60 sec: 3481.3, 300 sec: 3471.1). Total num frames: 12480512. Throughput: 0: 865.7. Samples: 616392. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:52:34,856][00209] Avg episode reward: [(0, '25.417')] [2024-01-05 13:52:39,849][00209] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3499.0). Total num frames: 12500992. Throughput: 0: 888.8. Samples: 619640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:52:39,851][00209] Avg episode reward: [(0, '26.137')] [2024-01-05 13:52:42,028][24421] Updated weights for policy 0, policy_version 3055 (0.0015) [2024-01-05 13:52:44,848][00209] Fps is (10 sec: 4098.4, 60 sec: 3550.3, 300 sec: 3499.0). Total num frames: 12521472. Throughput: 0: 888.4. Samples: 626114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:52:44,850][00209] Avg episode reward: [(0, '26.152')] [2024-01-05 13:52:49,848][00209] Fps is (10 sec: 3277.1, 60 sec: 3550.1, 300 sec: 3471.2). Total num frames: 12533760. Throughput: 0: 857.0. Samples: 630200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:52:49,855][00209] Avg episode reward: [(0, '26.929')] [2024-01-05 13:52:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12550144. Throughput: 0: 856.9. Samples: 632210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:52:54,856][00209] Avg episode reward: [(0, '28.164')] [2024-01-05 13:52:55,457][24421] Updated weights for policy 0, policy_version 3065 (0.0013) [2024-01-05 13:52:59,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.9, 300 sec: 3499.0). Total num frames: 12570624. Throughput: 0: 896.9. Samples: 638286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:52:59,856][00209] Avg episode reward: [(0, '28.058')] [2024-01-05 13:53:04,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 12591104. Throughput: 0: 888.4. Samples: 644396. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:53:04,852][00209] Avg episode reward: [(0, '29.293')] [2024-01-05 13:53:05,810][24421] Updated weights for policy 0, policy_version 3075 (0.0017) [2024-01-05 13:53:09,852][00209] Fps is (10 sec: 3275.6, 60 sec: 3481.4, 300 sec: 3471.1). Total num frames: 12603392. Throughput: 0: 860.9. Samples: 646390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 13:53:09,854][00209] Avg episode reward: [(0, '30.294')] [2024-01-05 13:53:14,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 12619776. Throughput: 0: 858.5. Samples: 650468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:53:14,850][00209] Avg episode reward: [(0, '30.632')] [2024-01-05 13:53:18,283][24421] Updated weights for policy 0, policy_version 3085 (0.0020) [2024-01-05 13:53:19,848][00209] Fps is (10 sec: 3687.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 12640256. Throughput: 0: 897.1. Samples: 656758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:53:19,849][00209] Avg episode reward: [(0, '29.922')] [2024-01-05 13:53:24,853][00209] Fps is (10 sec: 4093.7, 60 sec: 3549.5, 300 sec: 3498.9). Total num frames: 12660736. Throughput: 0: 896.8. Samples: 660002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 13:53:24,856][00209] Avg episode reward: [(0, '29.639')] [2024-01-05 13:53:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12673024. Throughput: 0: 859.1. Samples: 664772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:53:29,850][00209] Avg episode reward: [(0, '29.796')] [2024-01-05 13:53:30,118][24421] Updated weights for policy 0, policy_version 3095 (0.0016) [2024-01-05 13:53:34,848][00209] Fps is (10 sec: 2868.8, 60 sec: 3481.9, 300 sec: 3471.2). Total num frames: 12689408. Throughput: 0: 858.0. Samples: 668810. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:53:34,855][00209] Avg episode reward: [(0, '29.734')] [2024-01-05 13:53:39,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3499.0). Total num frames: 12709888. Throughput: 0: 884.9. Samples: 672032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:53:39,850][00209] Avg episode reward: [(0, '30.178')] [2024-01-05 13:53:41,126][24421] Updated weights for policy 0, policy_version 3105 (0.0019) [2024-01-05 13:53:44,853][00209] Fps is (10 sec: 4093.7, 60 sec: 3481.3, 300 sec: 3512.8). Total num frames: 12730368. Throughput: 0: 893.8. Samples: 678510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:53:44,856][00209] Avg episode reward: [(0, '29.776')] [2024-01-05 13:53:49,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12742656. Throughput: 0: 856.6. Samples: 682942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:53:49,853][00209] Avg episode reward: [(0, '29.683')] [2024-01-05 13:53:54,400][24421] Updated weights for policy 0, policy_version 3115 (0.0014) [2024-01-05 13:53:54,848][00209] Fps is (10 sec: 2868.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12759040. Throughput: 0: 858.0. Samples: 684998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:53:54,852][00209] Avg episode reward: [(0, '28.401')] [2024-01-05 13:53:59,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 12779520. Throughput: 0: 889.2. Samples: 690482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:53:59,851][00209] Avg episode reward: [(0, '29.744')] [2024-01-05 13:54:03,941][24421] Updated weights for policy 0, policy_version 3125 (0.0027) [2024-01-05 13:54:04,856][00209] Fps is (10 sec: 4092.8, 60 sec: 3481.1, 300 sec: 3498.9). Total num frames: 12800000. Throughput: 0: 891.4. Samples: 696880. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 13:54:04,858][00209] Avg episode reward: [(0, '30.809')] [2024-01-05 13:54:09,853][00209] Fps is (10 sec: 3684.6, 60 sec: 3549.8, 300 sec: 3485.0). Total num frames: 12816384. Throughput: 0: 867.0. Samples: 699014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 13:54:09,855][00209] Avg episode reward: [(0, '31.232')] [2024-01-05 13:54:09,875][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003129_12816384.pth... [2024-01-05 13:54:10,069][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002923_11972608.pth [2024-01-05 13:54:14,848][00209] Fps is (10 sec: 2869.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12828672. Throughput: 0: 850.5. Samples: 703046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:54:14,853][00209] Avg episode reward: [(0, '30.926')] [2024-01-05 13:54:17,485][24421] Updated weights for policy 0, policy_version 3135 (0.0029) [2024-01-05 13:54:19,848][00209] Fps is (10 sec: 3278.3, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 12849152. Throughput: 0: 889.9. Samples: 708854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:54:19,854][00209] Avg episode reward: [(0, '30.427')] [2024-01-05 13:54:24,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.9, 300 sec: 3512.8). Total num frames: 12869632. Throughput: 0: 891.0. Samples: 712128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:54:24,855][00209] Avg episode reward: [(0, '30.727')] [2024-01-05 13:54:27,996][24421] Updated weights for policy 0, policy_version 3145 (0.0034) [2024-01-05 13:54:29,853][00209] Fps is (10 sec: 3684.3, 60 sec: 3549.5, 300 sec: 3485.0). Total num frames: 12886016. Throughput: 0: 864.3. Samples: 717404. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:54:29,856][00209] Avg episode reward: [(0, '31.905')] [2024-01-05 13:54:34,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 12898304. Throughput: 0: 854.9. Samples: 721412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:54:34,853][00209] Avg episode reward: [(0, '31.011')] [2024-01-05 13:54:39,848][00209] Fps is (10 sec: 3278.6, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 12918784. Throughput: 0: 869.4. Samples: 724122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 13:54:39,857][00209] Avg episode reward: [(0, '29.946')] [2024-01-05 13:54:40,140][24421] Updated weights for policy 0, policy_version 3155 (0.0028) [2024-01-05 13:54:44,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.9, 300 sec: 3512.8). Total num frames: 12939264. Throughput: 0: 892.4. Samples: 730640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:54:44,856][00209] Avg episode reward: [(0, '27.639')] [2024-01-05 13:54:49,848][00209] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 12955648. Throughput: 0: 858.6. Samples: 735510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:54:49,850][00209] Avg episode reward: [(0, '28.606')] [2024-01-05 13:54:52,461][24421] Updated weights for policy 0, policy_version 3165 (0.0025) [2024-01-05 13:54:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 12967936. Throughput: 0: 856.3. Samples: 737544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:54:54,850][00209] Avg episode reward: [(0, '27.739')] [2024-01-05 13:54:59,849][00209] Fps is (10 sec: 3276.5, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 12988416. Throughput: 0: 878.3. Samples: 742572. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:54:59,852][00209] Avg episode reward: [(0, '27.930')] [2024-01-05 13:55:03,114][24421] Updated weights for policy 0, policy_version 3175 (0.0034) [2024-01-05 13:55:04,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3482.1, 300 sec: 3512.8). Total num frames: 13008896. Throughput: 0: 892.8. Samples: 749030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:55:04,856][00209] Avg episode reward: [(0, '28.252')] [2024-01-05 13:55:09,850][00209] Fps is (10 sec: 3686.1, 60 sec: 3481.8, 300 sec: 3498.9). Total num frames: 13025280. Throughput: 0: 877.7. Samples: 751624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:55:09,852][00209] Avg episode reward: [(0, '29.970')] [2024-01-05 13:55:14,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 13037568. Throughput: 0: 852.8. Samples: 755776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:55:14,855][00209] Avg episode reward: [(0, '29.560')] [2024-01-05 13:55:16,497][24421] Updated weights for policy 0, policy_version 3185 (0.0023) [2024-01-05 13:55:19,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 13058048. Throughput: 0: 881.8. Samples: 761094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 13:55:19,855][00209] Avg episode reward: [(0, '29.357')] [2024-01-05 13:55:24,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 13082624. Throughput: 0: 893.3. Samples: 764318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 13:55:24,850][00209] Avg episode reward: [(0, '29.719')] [2024-01-05 13:55:25,780][24421] Updated weights for policy 0, policy_version 3195 (0.0021) [2024-01-05 13:55:29,850][00209] Fps is (10 sec: 3685.4, 60 sec: 3481.8, 300 sec: 3498.9). Total num frames: 13094912. Throughput: 0: 876.6. Samples: 770090. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:55:29,857][00209] Avg episode reward: [(0, '29.832')]
[2024-01-05 13:55:34,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 13111296. Throughput: 0: 860.3. Samples: 774222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:55:34,855][00209] Avg episode reward: [(0, '28.217')]
[2024-01-05 13:55:39,098][24421] Updated weights for policy 0, policy_version 3205 (0.0020)
[2024-01-05 13:55:39,848][00209] Fps is (10 sec: 3277.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 13127680. Throughput: 0: 862.8. Samples: 776368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:55:39,850][00209] Avg episode reward: [(0, '26.632')]
[2024-01-05 13:55:44,853][00209] Fps is (10 sec: 4093.9, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 13152256. Throughput: 0: 895.5. Samples: 782872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:55:44,855][00209] Avg episode reward: [(0, '26.987')]
[2024-01-05 13:55:49,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 13164544. Throughput: 0: 875.1. Samples: 788410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:55:49,850][00209] Avg episode reward: [(0, '27.523')]
[2024-01-05 13:55:49,914][24421] Updated weights for policy 0, policy_version 3215 (0.0026)
[2024-01-05 13:55:54,848][00209] Fps is (10 sec: 2868.7, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 13180928. Throughput: 0: 863.7. Samples: 790490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:55:54,854][00209] Avg episode reward: [(0, '28.302')]
[2024-01-05 13:55:59,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 13197312. Throughput: 0: 871.2. Samples: 794978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:55:59,853][00209] Avg episode reward: [(0, '27.604')]
[2024-01-05 13:56:02,080][24421] Updated weights for policy 0, policy_version 3225 (0.0016)
[2024-01-05 13:56:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 13217792. Throughput: 0: 896.0. Samples: 801416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:56:04,851][00209] Avg episode reward: [(0, '28.288')]
[2024-01-05 13:56:09,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 13238272. Throughput: 0: 892.8. Samples: 804494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:56:09,855][00209] Avg episode reward: [(0, '29.215')]
[2024-01-05 13:56:09,871][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003232_13238272.pth...
[2024-01-05 13:56:10,040][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003026_12394496.pth
[2024-01-05 13:56:14,724][24421] Updated weights for policy 0, policy_version 3235 (0.0024)
[2024-01-05 13:56:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 13250560. Throughput: 0: 853.2. Samples: 808480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:56:14,854][00209] Avg episode reward: [(0, '28.895')]
[2024-01-05 13:56:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 13266944. Throughput: 0: 865.2. Samples: 813158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:56:19,856][00209] Avg episode reward: [(0, '28.785')]
[2024-01-05 13:56:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 13287424. Throughput: 0: 890.2. Samples: 816426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:56:24,855][00209] Avg episode reward: [(0, '27.964')]
[2024-01-05 13:56:25,206][24421] Updated weights for policy 0, policy_version 3245 (0.0018)
[2024-01-05 13:56:29,848][00209] Fps is (10 sec: 4095.8, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 13307904. Throughput: 0: 886.9. Samples: 822778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:56:29,854][00209] Avg episode reward: [(0, '28.021')]
[2024-01-05 13:56:34,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 13320192. Throughput: 0: 857.2. Samples: 826982. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-01-05 13:56:34,856][00209] Avg episode reward: [(0, '28.466')]
[2024-01-05 13:56:38,412][24421] Updated weights for policy 0, policy_version 3255 (0.0040)
[2024-01-05 13:56:39,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3485.2). Total num frames: 13336576. Throughput: 0: 855.2. Samples: 828974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:56:39,855][00209] Avg episode reward: [(0, '28.581')]
[2024-01-05 13:56:44,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3512.9). Total num frames: 13357056. Throughput: 0: 891.4. Samples: 835090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:56:44,855][00209] Avg episode reward: [(0, '28.874')]
[2024-01-05 13:56:47,704][24421] Updated weights for policy 0, policy_version 3265 (0.0013)
[2024-01-05 13:56:49,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13377536. Throughput: 0: 884.2. Samples: 841206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:56:49,850][00209] Avg episode reward: [(0, '29.521')]
[2024-01-05 13:56:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 13389824. Throughput: 0: 860.6. Samples: 843220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:56:54,854][00209] Avg episode reward: [(0, '29.925')]
[2024-01-05 13:56:59,849][00209] Fps is (10 sec: 2866.9, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 13406208. Throughput: 0: 865.2. Samples: 847414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:56:59,853][00209] Avg episode reward: [(0, '30.162')]
[2024-01-05 13:57:00,912][24421] Updated weights for policy 0, policy_version 3275 (0.0022)
[2024-01-05 13:57:04,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13430784. Throughput: 0: 899.2. Samples: 853622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:57:04,853][00209] Avg episode reward: [(0, '30.808')]
[2024-01-05 13:57:09,848][00209] Fps is (10 sec: 4096.5, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 13447168. Throughput: 0: 896.4. Samples: 856764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:57:09,857][00209] Avg episode reward: [(0, '31.335')]
[2024-01-05 13:57:11,719][24421] Updated weights for policy 0, policy_version 3285 (0.0024)
[2024-01-05 13:57:14,849][00209] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 13463552. Throughput: 0: 859.8. Samples: 861472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:57:14,852][00209] Avg episode reward: [(0, '31.144')]
[2024-01-05 13:57:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 13475840. Throughput: 0: 860.0. Samples: 865684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:57:19,856][00209] Avg episode reward: [(0, '30.288')]
[2024-01-05 13:57:23,750][24421] Updated weights for policy 0, policy_version 3295 (0.0018)
[2024-01-05 13:57:24,848][00209] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13500416. Throughput: 0: 887.4. Samples: 868908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:57:24,851][00209] Avg episode reward: [(0, '31.429')]
[2024-01-05 13:57:29,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 13520896. Throughput: 0: 896.9. Samples: 875450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:57:29,850][00209] Avg episode reward: [(0, '32.359')]
[2024-01-05 13:57:34,848][00209] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3499.0). Total num frames: 13533184. Throughput: 0: 861.2. Samples: 879960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:57:34,852][00209] Avg episode reward: [(0, '32.488')]
[2024-01-05 13:57:35,541][24421] Updated weights for policy 0, policy_version 3305 (0.0024)
[2024-01-05 13:57:39,848][00209] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 13545472. Throughput: 0: 861.4. Samples: 881984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:57:39,853][00209] Avg episode reward: [(0, '31.445')]
[2024-01-05 13:57:44,848][00209] Fps is (10 sec: 3686.7, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13570048. Throughput: 0: 890.4. Samples: 887480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 13:57:44,850][00209] Avg episode reward: [(0, '31.359')]
[2024-01-05 13:57:46,734][24421] Updated weights for policy 0, policy_version 3315 (0.0021)
[2024-01-05 13:57:49,848][00209] Fps is (10 sec: 4505.5, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 13590528. Throughput: 0: 897.9. Samples: 894030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:57:49,852][00209] Avg episode reward: [(0, '31.225')]
[2024-01-05 13:57:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 13602816. Throughput: 0: 876.7. Samples: 896214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:57:54,851][00209] Avg episode reward: [(0, '30.568')]
[2024-01-05 13:57:59,846][24421] Updated weights for policy 0, policy_version 3325 (0.0014)
[2024-01-05 13:57:59,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 13619200. Throughput: 0: 864.3. Samples: 900364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 13:57:59,850][00209] Avg episode reward: [(0, '30.801')]
[2024-01-05 13:58:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 13639680. Throughput: 0: 897.7. Samples: 906080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:58:04,851][00209] Avg episode reward: [(0, '29.175')]
[2024-01-05 13:58:09,512][24421] Updated weights for policy 0, policy_version 3335 (0.0019)
[2024-01-05 13:58:09,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 13660160. Throughput: 0: 894.9. Samples: 909180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:58:09,852][00209] Avg episode reward: [(0, '28.887')]
[2024-01-05 13:58:09,869][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003335_13660160.pth...
[2024-01-05 13:58:10,029][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003129_12816384.pth
[2024-01-05 13:58:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3499.0). Total num frames: 13672448. Throughput: 0: 864.5. Samples: 914354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:58:14,850][00209] Avg episode reward: [(0, '28.089')]
[2024-01-05 13:58:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 13688832. Throughput: 0: 856.3. Samples: 918494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:58:19,856][00209] Avg episode reward: [(0, '27.937')]
[2024-01-05 13:58:22,734][24421] Updated weights for policy 0, policy_version 3345 (0.0022)
[2024-01-05 13:58:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 13709312. Throughput: 0: 872.4. Samples: 921242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:58:24,850][00209] Avg episode reward: [(0, '28.926')]
[2024-01-05 13:58:29,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 13729792. Throughput: 0: 896.8. Samples: 927838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:58:29,850][00209] Avg episode reward: [(0, '29.234')]
[2024-01-05 13:58:33,275][24421] Updated weights for policy 0, policy_version 3355 (0.0014)
[2024-01-05 13:58:34,848][00209] Fps is (10 sec: 3686.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13746176. Throughput: 0: 863.2. Samples: 932872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:58:34,852][00209] Avg episode reward: [(0, '28.419')]
[2024-01-05 13:58:39,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 13758464. Throughput: 0: 858.9. Samples: 934866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:58:39,850][00209] Avg episode reward: [(0, '27.440')]
[2024-01-05 13:58:44,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 13778944. Throughput: 0: 878.8. Samples: 939912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:58:44,850][00209] Avg episode reward: [(0, '28.064')]
[2024-01-05 13:58:45,497][24421] Updated weights for policy 0, policy_version 3365 (0.0014)
[2024-01-05 13:58:49,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 13799424. Throughput: 0: 896.7. Samples: 946432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:58:49,850][00209] Avg episode reward: [(0, '27.402')]
[2024-01-05 13:58:54,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13815808. Throughput: 0: 886.1. Samples: 949054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:58:54,853][00209] Avg episode reward: [(0, '28.412')]
[2024-01-05 13:58:57,544][24421] Updated weights for policy 0, policy_version 3375 (0.0033)
[2024-01-05 13:58:59,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.2). Total num frames: 13828096. Throughput: 0: 863.9. Samples: 953228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:58:59,852][00209] Avg episode reward: [(0, '28.325')]
[2024-01-05 13:59:04,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 13848576. Throughput: 0: 887.9. Samples: 958450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:59:04,858][00209] Avg episode reward: [(0, '28.686')]
[2024-01-05 13:59:08,444][24421] Updated weights for policy 0, policy_version 3385 (0.0038)
[2024-01-05 13:59:09,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 13869056. Throughput: 0: 898.6. Samples: 961680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 13:59:09,857][00209] Avg episode reward: [(0, '30.005')]
[2024-01-05 13:59:14,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 13885440. Throughput: 0: 878.0. Samples: 967350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:59:14,857][00209] Avg episode reward: [(0, '29.100')]
[2024-01-05 13:59:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 13897728. Throughput: 0: 859.4. Samples: 971546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 13:59:19,856][00209] Avg episode reward: [(0, '29.932')]
[2024-01-05 13:59:21,495][24421] Updated weights for policy 0, policy_version 3395 (0.0039)
[2024-01-05 13:59:24,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 13918208. Throughput: 0: 865.2. Samples: 973800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:59:24,850][00209] Avg episode reward: [(0, '30.641')]
[2024-01-05 13:59:29,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 13938688. Throughput: 0: 899.6. Samples: 980392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 13:59:29,850][00209] Avg episode reward: [(0, '31.821')]
[2024-01-05 13:59:31,028][24421] Updated weights for policy 0, policy_version 3405 (0.0016)
[2024-01-05 13:59:34,849][00209] Fps is (10 sec: 3686.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 13955072. Throughput: 0: 874.7. Samples: 985796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:59:34,854][00209] Avg episode reward: [(0, '31.381')]
[2024-01-05 13:59:39,855][00209] Fps is (10 sec: 3274.3, 60 sec: 3549.4, 300 sec: 3498.9). Total num frames: 13971456. Throughput: 0: 860.5. Samples: 987784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:59:39,862][00209] Avg episode reward: [(0, '30.988')]
[2024-01-05 13:59:44,389][24421] Updated weights for policy 0, policy_version 3415 (0.0018)
[2024-01-05 13:59:44,848][00209] Fps is (10 sec: 3277.1, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 13987840. Throughput: 0: 869.6. Samples: 992362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:59:44,853][00209] Avg episode reward: [(0, '32.129')]
[2024-01-05 13:59:49,852][00209] Fps is (10 sec: 3687.5, 60 sec: 3481.3, 300 sec: 3526.7). Total num frames: 14008320. Throughput: 0: 898.0. Samples: 998862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:59:49,855][00209] Avg episode reward: [(0, '32.437')]
[2024-01-05 13:59:54,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 14024704. Throughput: 0: 896.6. Samples: 1002026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 13:59:54,850][00209] Avg episode reward: [(0, '32.538')]
[2024-01-05 13:59:55,121][24421] Updated weights for policy 0, policy_version 3425 (0.0016)
[2024-01-05 13:59:59,855][00209] Fps is (10 sec: 3276.1, 60 sec: 3549.5, 300 sec: 3498.9). Total num frames: 14041088. Throughput: 0: 861.9. Samples: 1006140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 13:59:59,857][00209] Avg episode reward: [(0, '31.818')]
[2024-01-05 14:00:04,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14057472. Throughput: 0: 874.3. Samples: 1010890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:00:04,854][00209] Avg episode reward: [(0, '30.698')]
[2024-01-05 14:00:07,182][24421] Updated weights for policy 0, policy_version 3435 (0.0019)
[2024-01-05 14:00:09,848][00209] Fps is (10 sec: 3688.9, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 14077952. Throughput: 0: 897.2. Samples: 1014174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:00:09,850][00209] Avg episode reward: [(0, '30.263')]
[2024-01-05 14:00:09,861][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003437_14077952.pth...
[2024-01-05 14:00:10,000][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003232_13238272.pth
[2024-01-05 14:00:14,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14098432. Throughput: 0: 882.6. Samples: 1020108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 14:00:14,850][00209] Avg episode reward: [(0, '29.364')]
[2024-01-05 14:00:19,356][24421] Updated weights for policy 0, policy_version 3445 (0.0013)
[2024-01-05 14:00:19,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 14110720. Throughput: 0: 854.4. Samples: 1024242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:00:19,851][00209] Avg episode reward: [(0, '29.181')]
[2024-01-05 14:00:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14127104. Throughput: 0: 854.8. Samples: 1026242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:00:24,850][00209] Avg episode reward: [(0, '29.991')]
[2024-01-05 14:00:29,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 14147584. Throughput: 0: 893.2. Samples: 1032558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:00:29,850][00209] Avg episode reward: [(0, '31.582')]
[2024-01-05 14:00:30,201][24421] Updated weights for policy 0, policy_version 3455 (0.0023)
[2024-01-05 14:00:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14168064. Throughput: 0: 880.8. Samples: 1038494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:00:34,853][00209] Avg episode reward: [(0, '32.513')]
[2024-01-05 14:00:39,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3482.1, 300 sec: 3485.1). Total num frames: 14180352. Throughput: 0: 856.4. Samples: 1040566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:00:39,853][00209] Avg episode reward: [(0, '31.682')]
[2024-01-05 14:00:43,474][24421] Updated weights for policy 0, policy_version 3465 (0.0014)
[2024-01-05 14:00:44,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14196736. Throughput: 0: 855.7. Samples: 1044642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:00:44,850][00209] Avg episode reward: [(0, '31.048')]
[2024-01-05 14:00:49,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.9, 300 sec: 3512.8). Total num frames: 14217216. Throughput: 0: 891.3. Samples: 1050998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:00:49,850][00209] Avg episode reward: [(0, '31.916')]
[2024-01-05 14:00:52,870][24421] Updated weights for policy 0, policy_version 3475 (0.0023)
[2024-01-05 14:00:54,849][00209] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 14237696. Throughput: 0: 889.7. Samples: 1054210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:00:54,853][00209] Avg episode reward: [(0, '29.602')]
[2024-01-05 14:00:59,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3482.0, 300 sec: 3499.0). Total num frames: 14249984. Throughput: 0: 856.8. Samples: 1058666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:00:59,850][00209] Avg episode reward: [(0, '28.414')]
[2024-01-05 14:01:04,848][00209] Fps is (10 sec: 2867.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 14266368. Throughput: 0: 856.0. Samples: 1062762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:01:04,855][00209] Avg episode reward: [(0, '27.443')]
[2024-01-05 14:01:06,529][24421] Updated weights for policy 0, policy_version 3485 (0.0018)
[2024-01-05 14:01:09,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 14286848. Throughput: 0: 883.7. Samples: 1066010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:01:09,853][00209] Avg episode reward: [(0, '26.850')]
[2024-01-05 14:01:14,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 14307328. Throughput: 0: 884.2. Samples: 1072348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:01:14,850][00209] Avg episode reward: [(0, '26.424')]
[2024-01-05 14:01:17,409][24421] Updated weights for policy 0, policy_version 3495 (0.0015)
[2024-01-05 14:01:19,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14319616. Throughput: 0: 849.1. Samples: 1076702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:01:19,850][00209] Avg episode reward: [(0, '27.237')]
[2024-01-05 14:01:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 14336000. Throughput: 0: 848.9. Samples: 1078766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:01:24,850][00209] Avg episode reward: [(0, '27.666')]
[2024-01-05 14:01:29,306][24421] Updated weights for policy 0, policy_version 3505 (0.0022)
[2024-01-05 14:01:29,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 14356480. Throughput: 0: 888.5. Samples: 1084624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:01:29,852][00209] Avg episode reward: [(0, '29.083')]
[2024-01-05 14:01:34,855][00209] Fps is (10 sec: 4092.8, 60 sec: 3481.1, 300 sec: 3526.6). Total num frames: 14376960. Throughput: 0: 890.0. Samples: 1091054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:01:34,858][00209] Avg episode reward: [(0, '30.183')]
[2024-01-05 14:01:39,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14389248. Throughput: 0: 864.9. Samples: 1093128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:01:39,853][00209] Avg episode reward: [(0, '29.745')]
[2024-01-05 14:01:41,761][24421] Updated weights for policy 0, policy_version 3515 (0.0024)
[2024-01-05 14:01:44,848][00209] Fps is (10 sec: 2869.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 14405632. Throughput: 0: 856.6. Samples: 1097214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 14:01:44,853][00209] Avg episode reward: [(0, '29.001')]
[2024-01-05 14:01:49,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 14426112. Throughput: 0: 901.8. Samples: 1103342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:01:49,854][00209] Avg episode reward: [(0, '28.511')]
[2024-01-05 14:01:52,096][24421] Updated weights for policy 0, policy_version 3525 (0.0018)
[2024-01-05 14:01:54,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 14446592. Throughput: 0: 900.0. Samples: 1106512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:01:54,852][00209] Avg episode reward: [(0, '30.435')]
[2024-01-05 14:01:59,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 14462976. Throughput: 0: 868.6. Samples: 1111434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:01:59,855][00209] Avg episode reward: [(0, '29.743')]
[2024-01-05 14:02:04,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 14475264. Throughput: 0: 862.8. Samples: 1115530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:02:04,854][00209] Avg episode reward: [(0, '29.130')]
[2024-01-05 14:02:05,251][24421] Updated weights for policy 0, policy_version 3535 (0.0019)
[2024-01-05 14:02:09,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14495744. Throughput: 0: 885.5. Samples: 1118612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:02:09,850][00209] Avg episode reward: [(0, '28.720')]
[2024-01-05 14:02:09,866][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003539_14495744.pth...
[2024-01-05 14:02:10,005][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003335_13660160.pth
[2024-01-05 14:02:14,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 14516224. Throughput: 0: 896.0. Samples: 1124946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:02:14,850][00209] Avg episode reward: [(0, '28.773')]
[2024-01-05 14:02:15,159][24421] Updated weights for policy 0, policy_version 3545 (0.0013)
[2024-01-05 14:02:19,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 14532608. Throughput: 0: 857.6. Samples: 1129640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:02:19,850][00209] Avg episode reward: [(0, '30.218')]
[2024-01-05 14:02:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 14544896. Throughput: 0: 857.2. Samples: 1131704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:02:24,856][00209] Avg episode reward: [(0, '30.626')]
[2024-01-05 14:02:28,082][24421] Updated weights for policy 0, policy_version 3555 (0.0021)
[2024-01-05 14:02:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14565376. Throughput: 0: 886.6. Samples: 1137110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:02:29,853][00209] Avg episode reward: [(0, '28.753')]
[2024-01-05 14:02:34,848][00209] Fps is (10 sec: 4505.5, 60 sec: 3550.3, 300 sec: 3540.6). Total num frames: 14589952. Throughput: 0: 896.1. Samples: 1143668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:02:34,853][00209] Avg episode reward: [(0, '28.053')]
[2024-01-05 14:02:39,197][24421] Updated weights for policy 0, policy_version 3565 (0.0014)
[2024-01-05 14:02:39,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3499.0). Total num frames: 14602240. Throughput: 0: 879.3. Samples: 1146082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-01-05 14:02:39,851][00209] Avg episode reward: [(0, '27.697')]
[2024-01-05 14:02:44,848][00209] Fps is (10 sec: 2457.7, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 14614528. Throughput: 0: 861.7. Samples: 1150210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:02:44,850][00209] Avg episode reward: [(0, '28.244')]
[2024-01-05 14:02:49,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14635008. Throughput: 0: 893.1. Samples: 1155718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:02:49,853][00209] Avg episode reward: [(0, '27.904')]
[2024-01-05 14:02:50,765][24421] Updated weights for policy 0, policy_version 3575 (0.0016)
[2024-01-05 14:02:54,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14659584. Throughput: 0: 896.0. Samples: 1158930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:02:54,850][00209] Avg episode reward: [(0, '26.922')]
[2024-01-05 14:02:59,849][00209] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 14671872. Throughput: 0: 877.5. Samples: 1164436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:02:59,854][00209] Avg episode reward: [(0, '27.867')]
[2024-01-05 14:03:03,172][24421] Updated weights for policy 0, policy_version 3585 (0.0023)
[2024-01-05 14:03:04,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 14688256. Throughput: 0: 862.8. Samples: 1168466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:03:04,850][00209] Avg episode reward: [(0, '28.067')]
[2024-01-05 14:03:09,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14704640. Throughput: 0: 874.1. Samples: 1171040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:03:09,850][00209] Avg episode reward: [(0, '28.860')]
[2024-01-05 14:03:13,589][24421] Updated weights for policy 0, policy_version 3595 (0.0013)
[2024-01-05 14:03:14,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 14729216. Throughput: 0: 897.6. Samples: 1177502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:03:14,850][00209] Avg episode reward: [(0, '29.754')]
[2024-01-05 14:03:19,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 14745600. Throughput: 0: 868.3. Samples: 1182740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:03:19,861][00209] Avg episode reward: [(0, '30.169')]
[2024-01-05 14:03:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 14757888. Throughput: 0: 859.7. Samples: 1184766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 14:03:24,854][00209] Avg episode reward: [(0, '29.724')]
[2024-01-05 14:03:27,030][24421] Updated weights for policy 0, policy_version 3605 (0.0013)
[2024-01-05 14:03:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 14778368. Throughput: 0: 875.6. Samples: 1189610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:03:29,856][00209] Avg episode reward: [(0, '31.203')]
[2024-01-05 14:03:34,848][00209] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 14798848. Throughput: 0: 899.1. Samples: 1196176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:03:34,856][00209] Avg episode reward: [(0, '32.589')]
[2024-01-05 14:03:36,346][24421] Updated weights for policy 0, policy_version 3615 (0.0025)
[2024-01-05 14:03:39,849][00209] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 14815232. Throughput: 0: 891.8. Samples: 1199064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:03:39,852][00209] Avg episode reward: [(0, '32.955')]
[2024-01-05 14:03:39,871][24408] Saving new best policy, reward=32.955!
[2024-01-05 14:03:44,855][00209] Fps is (10 sec: 2865.0, 60 sec: 3549.4, 300 sec: 3485.0). Total num frames: 14827520. Throughput: 0: 857.8. Samples: 1203044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:03:44,859][00209] Avg episode reward: [(0, '31.996')]
[2024-01-05 14:03:49,494][24421] Updated weights for policy 0, policy_version 3625 (0.0023)
[2024-01-05 14:03:49,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 14848000. Throughput: 0: 883.2. Samples: 1208212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:03:49,857][00209] Avg episode reward: [(0, '31.171')]
[2024-01-05 14:03:54,848][00209] Fps is (10 sec: 4099.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 14868480. Throughput: 0: 897.2. Samples: 1211416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:03:54,856][00209] Avg episode reward: [(0, '30.657')]
[2024-01-05 14:03:59,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 14884864. Throughput: 0: 884.2. Samples: 1217292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:03:59,850][00209] Avg episode reward: [(0, '29.344')]
[2024-01-05 14:04:00,556][24421] Updated weights for policy 0, policy_version 3635 (0.0013)
[2024-01-05 14:04:04,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 14897152. Throughput: 0: 859.3. Samples: 1221410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:04:04,852][00209] Avg episode reward: [(0, '28.554')]
[2024-01-05 14:04:09,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 14917632. Throughput: 0: 861.3. Samples: 1223524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:04:09,856][00209] Avg episode reward: [(0, '28.221')]
[2024-01-05 14:04:09,869][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003642_14917632.pth...
[2024-01-05 14:04:10,004][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003437_14077952.pth
[2024-01-05 14:04:12,583][24421] Updated weights for policy 0, policy_version 3645 (0.0021)
[2024-01-05 14:04:14,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 14938112. Throughput: 0: 894.6. Samples: 1229866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:04:14,856][00209] Avg episode reward: [(0, '28.525')]
[2024-01-05 14:04:19,850][00209] Fps is (10 sec: 3685.5, 60 sec: 3481.5, 300 sec: 3512.8). Total num frames: 14954496. Throughput: 0: 872.9. Samples: 1235460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:04:19,854][00209] Avg episode reward: [(0, '29.418')]
[2024-01-05 14:04:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 14966784. Throughput: 0: 854.0. Samples: 1237494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:04:24,867][00209] Avg episode reward: [(0, '30.856')]
[2024-01-05 14:04:25,038][24421] Updated weights for policy 0, policy_version 3655 (0.0023)
[2024-01-05 14:04:29,848][00209] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 14987264. Throughput: 0: 864.5. Samples: 1241942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:04:29,856][00209] Avg episode reward: [(0, '29.929')]
[2024-01-05 14:04:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 15007744. Throughput: 0: 893.9. Samples: 1248438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:04:34,856][00209] Avg episode reward: [(0, '30.539')]
[2024-01-05 14:04:35,238][24421] Updated weights for policy 0, policy_version 3665 (0.0014)
[2024-01-05 14:04:39,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 15024128. Throughput: 0: 896.8. Samples: 1251772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:04:39,853][00209] Avg episode reward: [(0, '30.446')]
[2024-01-05 14:04:44,851][00209] Fps is (10 sec: 3275.6, 60 sec: 3550.1, 300 sec: 3499.0). Total num frames: 15040512. Throughput: 0: 859.0. Samples: 1255948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:04:44,854][00209] Avg episode reward: [(0, '30.059')]
[2024-01-05 14:04:48,517][24421] Updated weights for policy 0, policy_version 3675 (0.0021)
[2024-01-05 14:04:49,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 15056896. Throughput: 0: 870.4. Samples: 1260576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:04:49,857][00209] Avg episode reward: [(0, '28.847')]
[2024-01-05 14:04:54,848][00209] Fps is (10 sec: 3687.8, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 15077376. Throughput: 0: 894.7. Samples: 1263786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:04:54,857][00209] Avg episode reward: [(0, '29.646')]
[2024-01-05 14:04:58,012][24421] Updated weights for policy 0, policy_version 3685 (0.0020)
[2024-01-05 14:04:59,848][00209] Fps is (10 sec: 4095.8, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 15097856. Throughput: 0: 896.2. Samples: 1270194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:04:59,853][00209] Avg episode reward: [(0, '28.968')]
[2024-01-05 14:05:04,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 15110144. Throughput: 0: 859.5. Samples: 1274136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:05:04,850][00209] Avg episode reward: [(0, '29.561')]
[2024-01-05 14:05:09,848][00209] Fps is (10 sec: 2867.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15126528. Throughput: 0: 860.2. Samples: 1276204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 14:05:09,850][00209] Avg episode reward: [(0, '30.353')]
[2024-01-05 14:05:11,445][24421] Updated weights for policy 0, policy_version 3695 (0.0017)
[2024-01-05 14:05:14,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15147008. Throughput: 0: 894.9. Samples: 1282214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:05:14,849][00209] Avg episode reward: [(0, '30.314')]
[2024-01-05 14:05:19,849][00209] Fps is (10 sec: 4095.7, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 15167488. Throughput: 0: 888.3. Samples: 1288410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:05:19,856][00209] Avg episode reward: [(0, '32.371')]
[2024-01-05 14:05:22,529][24421] Updated weights for policy 0, policy_version 3705 (0.0018)
[2024-01-05 14:05:24,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 15179776. Throughput: 0: 859.8. Samples: 1290464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:05:24,850][00209] Avg episode reward: [(0, '32.503')]
[2024-01-05 14:05:29,848][00209] Fps is (10 sec: 2867.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15196160. Throughput: 0: 861.8. Samples: 1294728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:05:29,856][00209] Avg episode reward: [(0, '33.724')]
[2024-01-05 14:05:29,862][24408] Saving new best policy, reward=33.724!
[2024-01-05 14:05:33,946][24421] Updated weights for policy 0, policy_version 3715 (0.0025)
[2024-01-05 14:05:34,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15216640. Throughput: 0: 897.1. Samples: 1300946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:05:34,850][00209] Avg episode reward: [(0, '32.404')]
[2024-01-05 14:05:39,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 15237120. Throughput: 0: 899.0. Samples: 1304242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:05:39,851][00209] Avg episode reward: [(0, '33.235')]
[2024-01-05 14:05:44,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3512.8). Total num frames: 15253504. Throughput: 0: 863.9. Samples: 1309068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:05:44,854][00209] Avg episode reward: [(0, '33.800')]
[2024-01-05 14:05:44,860][24408] Saving new best policy, reward=33.800!
[2024-01-05 14:05:46,531][24421] Updated weights for policy 0, policy_version 3725 (0.0027)
[2024-01-05 14:05:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15265792. Throughput: 0: 865.2. Samples: 1313070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:05:49,853][00209] Avg episode reward: [(0, '32.826')]
[2024-01-05 14:05:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15286272. Throughput: 0: 890.4. Samples: 1316270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:05:54,855][00209] Avg episode reward: [(0, '31.965')]
[2024-01-05 14:05:56,796][24421] Updated weights for policy 0, policy_version 3735 (0.0031)
[2024-01-05 14:05:59,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15306752. Throughput: 0: 900.3. Samples: 1322726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:05:59,852][00209] Avg episode reward: [(0, '31.541')]
[2024-01-05 14:06:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 15323136. Throughput: 0: 860.1. Samples: 1327116.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:04,850][00209] Avg episode reward: [(0, '31.811')] [2024-01-05 14:06:09,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15335424. Throughput: 0: 859.5. Samples: 1329142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:09,851][00209] Avg episode reward: [(0, '31.764')] [2024-01-05 14:06:09,862][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003744_15335424.pth... [2024-01-05 14:06:09,999][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003539_14495744.pth [2024-01-05 14:06:10,374][24421] Updated weights for policy 0, policy_version 3745 (0.0018) [2024-01-05 14:06:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15355904. Throughput: 0: 883.3. Samples: 1334478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:06:14,855][00209] Avg episode reward: [(0, '30.525')] [2024-01-05 14:06:19,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15376384. Throughput: 0: 886.8. Samples: 1340852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:19,854][00209] Avg episode reward: [(0, '29.525')] [2024-01-05 14:06:20,333][24421] Updated weights for policy 0, policy_version 3755 (0.0017) [2024-01-05 14:06:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 15392768. Throughput: 0: 862.8. Samples: 1343066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:24,854][00209] Avg episode reward: [(0, '29.366')] [2024-01-05 14:06:29,848][00209] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3485.2). Total num frames: 15405056. Throughput: 0: 846.8. Samples: 1347174. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:29,851][00209] Avg episode reward: [(0, '28.192')] [2024-01-05 14:06:33,360][24421] Updated weights for policy 0, policy_version 3765 (0.0027) [2024-01-05 14:06:34,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15425536. Throughput: 0: 886.1. Samples: 1352946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:34,850][00209] Avg episode reward: [(0, '26.794')] [2024-01-05 14:06:39,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15446016. Throughput: 0: 888.1. Samples: 1356236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:39,851][00209] Avg episode reward: [(0, '27.303')] [2024-01-05 14:06:44,472][24421] Updated weights for policy 0, policy_version 3775 (0.0042) [2024-01-05 14:06:44,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15462400. Throughput: 0: 862.5. Samples: 1361538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:44,850][00209] Avg episode reward: [(0, '26.957')] [2024-01-05 14:06:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15474688. Throughput: 0: 854.7. Samples: 1365576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:06:49,852][00209] Avg episode reward: [(0, '29.258')] [2024-01-05 14:06:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 15495168. Throughput: 0: 868.7. Samples: 1368232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:06:54,853][00209] Avg episode reward: [(0, '28.403')] [2024-01-05 14:06:56,259][24421] Updated weights for policy 0, policy_version 3785 (0.0013) [2024-01-05 14:06:59,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15515648. Throughput: 0: 894.2. Samples: 1374716. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:06:59,856][00209] Avg episode reward: [(0, '29.691')] [2024-01-05 14:07:04,848][00209] Fps is (10 sec: 3686.1, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15532032. Throughput: 0: 863.6. Samples: 1379714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:07:04,851][00209] Avg episode reward: [(0, '31.044')] [2024-01-05 14:07:08,608][24421] Updated weights for policy 0, policy_version 3795 (0.0012) [2024-01-05 14:07:09,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15544320. Throughput: 0: 859.2. Samples: 1381732. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:07:09,853][00209] Avg episode reward: [(0, '29.632')] [2024-01-05 14:07:14,848][00209] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 15564800. Throughput: 0: 876.9. Samples: 1386636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:07:14,855][00209] Avg episode reward: [(0, '29.701')] [2024-01-05 14:07:19,195][24421] Updated weights for policy 0, policy_version 3805 (0.0014) [2024-01-05 14:07:19,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15585280. Throughput: 0: 891.3. Samples: 1393056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:07:19,857][00209] Avg episode reward: [(0, '28.871')] [2024-01-05 14:07:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15601664. Throughput: 0: 879.0. Samples: 1395790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:07:24,850][00209] Avg episode reward: [(0, '29.315')] [2024-01-05 14:07:29,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 15613952. Throughput: 0: 851.7. Samples: 1399864. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:07:29,857][00209] Avg episode reward: [(0, '29.737')] [2024-01-05 14:07:32,459][24421] Updated weights for policy 0, policy_version 3815 (0.0036) [2024-01-05 14:07:34,853][00209] Fps is (10 sec: 3275.2, 60 sec: 3481.3, 300 sec: 3498.9). Total num frames: 15634432. Throughput: 0: 879.7. Samples: 1405166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:07:34,855][00209] Avg episode reward: [(0, '29.208')] [2024-01-05 14:07:39,850][00209] Fps is (10 sec: 4504.7, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 15659008. Throughput: 0: 894.5. Samples: 1408486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:07:39,854][00209] Avg episode reward: [(0, '28.314')] [2024-01-05 14:07:42,149][24421] Updated weights for policy 0, policy_version 3825 (0.0021) [2024-01-05 14:07:44,848][00209] Fps is (10 sec: 3687.9, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15671296. Throughput: 0: 877.0. Samples: 1414180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:07:44,851][00209] Avg episode reward: [(0, '29.803')] [2024-01-05 14:07:49,848][00209] Fps is (10 sec: 2458.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 15683584. Throughput: 0: 856.0. Samples: 1418234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:07:49,854][00209] Avg episode reward: [(0, '29.721')] [2024-01-05 14:07:54,848][00209] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 15704064. Throughput: 0: 861.9. Samples: 1420518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:07:54,851][00209] Avg episode reward: [(0, '30.821')] [2024-01-05 14:07:55,138][24421] Updated weights for policy 0, policy_version 3835 (0.0018) [2024-01-05 14:07:59,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 15728640. Throughput: 0: 897.0. Samples: 1427002. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:07:59,851][00209] Avg episode reward: [(0, '29.183')] [2024-01-05 14:08:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15740928. Throughput: 0: 873.1. Samples: 1432344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:08:04,854][00209] Avg episode reward: [(0, '28.177')] [2024-01-05 14:08:06,619][24421] Updated weights for policy 0, policy_version 3845 (0.0033) [2024-01-05 14:08:09,849][00209] Fps is (10 sec: 2866.9, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 15757312. Throughput: 0: 858.2. Samples: 1434410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:08:09,851][00209] Avg episode reward: [(0, '28.253')] [2024-01-05 14:08:09,869][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003847_15757312.pth... [2024-01-05 14:08:10,034][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003642_14917632.pth [2024-01-05 14:08:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15773696. Throughput: 0: 869.5. Samples: 1438992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:08:14,851][00209] Avg episode reward: [(0, '29.270')] [2024-01-05 14:08:17,993][24421] Updated weights for policy 0, policy_version 3855 (0.0027) [2024-01-05 14:08:19,848][00209] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 15794176. Throughput: 0: 895.6. Samples: 1445464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:08:19,854][00209] Avg episode reward: [(0, '27.913')] [2024-01-05 14:08:24,850][00209] Fps is (10 sec: 4094.9, 60 sec: 3549.7, 300 sec: 3512.8). Total num frames: 15814656. Throughput: 0: 890.7. Samples: 1448570. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:08:24,853][00209] Avg episode reward: [(0, '28.904')] [2024-01-05 14:08:29,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 15826944. Throughput: 0: 857.5. Samples: 1452766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:08:29,858][00209] Avg episode reward: [(0, '28.240')] [2024-01-05 14:08:30,598][24421] Updated weights for policy 0, policy_version 3865 (0.0027) [2024-01-05 14:08:34,848][00209] Fps is (10 sec: 2868.0, 60 sec: 3481.9, 300 sec: 3485.1). Total num frames: 15843328. Throughput: 0: 877.6. Samples: 1457724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:08:34,850][00209] Avg episode reward: [(0, '29.191')] [2024-01-05 14:08:39,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3526.8). Total num frames: 15867904. Throughput: 0: 897.4. Samples: 1460900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:08:39,850][00209] Avg episode reward: [(0, '27.923')] [2024-01-05 14:08:40,728][24421] Updated weights for policy 0, policy_version 3875 (0.0018) [2024-01-05 14:08:44,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 15884288. Throughput: 0: 889.4. Samples: 1467026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:08:44,855][00209] Avg episode reward: [(0, '27.848')] [2024-01-05 14:08:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 15896576. Throughput: 0: 860.4. Samples: 1471060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:08:49,856][00209] Avg episode reward: [(0, '27.553')] [2024-01-05 14:08:54,114][24421] Updated weights for policy 0, policy_version 3885 (0.0020) [2024-01-05 14:08:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15912960. Throughput: 0: 859.8. Samples: 1473100. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:08:54,850][00209] Avg episode reward: [(0, '28.402')] [2024-01-05 14:08:59,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 15937536. Throughput: 0: 898.1. Samples: 1479406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:08:59,850][00209] Avg episode reward: [(0, '27.409')] [2024-01-05 14:09:04,352][24421] Updated weights for policy 0, policy_version 3895 (0.0023) [2024-01-05 14:09:04,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 15953920. Throughput: 0: 882.8. Samples: 1485190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:09:04,850][00209] Avg episode reward: [(0, '28.665')] [2024-01-05 14:09:09,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 15966208. Throughput: 0: 859.5. Samples: 1487246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:09:09,856][00209] Avg episode reward: [(0, '29.663')] [2024-01-05 14:09:14,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 15982592. Throughput: 0: 859.3. Samples: 1491434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:09:14,850][00209] Avg episode reward: [(0, '30.928')] [2024-01-05 14:09:16,997][24421] Updated weights for policy 0, policy_version 3905 (0.0016) [2024-01-05 14:09:19,850][00209] Fps is (10 sec: 4094.9, 60 sec: 3549.7, 300 sec: 3526.7). Total num frames: 16007168. Throughput: 0: 892.1. Samples: 1497870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:09:19,858][00209] Avg episode reward: [(0, '31.464')] [2024-01-05 14:09:24,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3512.8). Total num frames: 16023552. Throughput: 0: 894.8. Samples: 1501168. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-01-05 14:09:24,855][00209] Avg episode reward: [(0, '33.470')] [2024-01-05 14:09:28,401][24421] Updated weights for policy 0, policy_version 3915 (0.0020) [2024-01-05 14:09:29,848][00209] Fps is (10 sec: 3277.6, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 16039936. Throughput: 0: 857.4. Samples: 1505610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:09:29,857][00209] Avg episode reward: [(0, '32.978')] [2024-01-05 14:09:34,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 16052224. Throughput: 0: 867.7. Samples: 1510106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:09:34,856][00209] Avg episode reward: [(0, '31.315')] [2024-01-05 14:09:39,545][24421] Updated weights for policy 0, policy_version 3925 (0.0022) [2024-01-05 14:09:39,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 16076800. Throughput: 0: 895.0. Samples: 1513376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:09:39,851][00209] Avg episode reward: [(0, '31.494')] [2024-01-05 14:09:44,852][00209] Fps is (10 sec: 4503.5, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 16097280. Throughput: 0: 899.4. Samples: 1519884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:09:44,855][00209] Avg episode reward: [(0, '30.377')] [2024-01-05 14:09:49,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 16109568. Throughput: 0: 864.4. Samples: 1524088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:09:49,856][00209] Avg episode reward: [(0, '29.428')] [2024-01-05 14:09:52,061][24421] Updated weights for policy 0, policy_version 3935 (0.0033) [2024-01-05 14:09:54,848][00209] Fps is (10 sec: 2868.6, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 16125952. Throughput: 0: 864.5. Samples: 1526148. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:09:54,850][00209] Avg episode reward: [(0, '28.321')] [2024-01-05 14:09:59,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 16146432. Throughput: 0: 899.5. Samples: 1531914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:09:59,856][00209] Avg episode reward: [(0, '26.654')] [2024-01-05 14:10:02,202][24421] Updated weights for policy 0, policy_version 3945 (0.0014) [2024-01-05 14:10:04,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 16166912. Throughput: 0: 896.1. Samples: 1538190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:10:04,851][00209] Avg episode reward: [(0, '26.932')] [2024-01-05 14:10:09,851][00209] Fps is (10 sec: 3275.8, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 16179200. Throughput: 0: 867.7. Samples: 1540216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:10:09,853][00209] Avg episode reward: [(0, '25.488')] [2024-01-05 14:10:09,866][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003950_16179200.pth... [2024-01-05 14:10:10,049][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003744_15335424.pth [2024-01-05 14:10:14,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 16195584. Throughput: 0: 858.4. Samples: 1544240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:10:14,850][00209] Avg episode reward: [(0, '26.501')] [2024-01-05 14:10:15,919][24421] Updated weights for policy 0, policy_version 3955 (0.0023) [2024-01-05 14:10:19,849][00209] Fps is (10 sec: 3687.2, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 16216064. Throughput: 0: 890.6. Samples: 1550184. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:10:19,853][00209] Avg episode reward: [(0, '26.078')] [2024-01-05 14:10:24,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 16236544. Throughput: 0: 889.2. Samples: 1553390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:10:24,855][00209] Avg episode reward: [(0, '27.167')] [2024-01-05 14:10:26,092][24421] Updated weights for policy 0, policy_version 3965 (0.0023) [2024-01-05 14:10:29,848][00209] Fps is (10 sec: 3277.1, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16248832. Throughput: 0: 854.4. Samples: 1558326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:10:29,855][00209] Avg episode reward: [(0, '27.755')] [2024-01-05 14:10:34,849][00209] Fps is (10 sec: 2866.8, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 16265216. Throughput: 0: 855.4. Samples: 1562582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:10:34,856][00209] Avg episode reward: [(0, '28.021')] [2024-01-05 14:10:38,555][24421] Updated weights for policy 0, policy_version 3975 (0.0027) [2024-01-05 14:10:39,848][00209] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16285696. Throughput: 0: 878.8. Samples: 1565694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:10:39,850][00209] Avg episode reward: [(0, '29.779')] [2024-01-05 14:10:44,848][00209] Fps is (10 sec: 4096.6, 60 sec: 3481.9, 300 sec: 3526.7). Total num frames: 16306176. Throughput: 0: 896.3. Samples: 1572248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:10:44,852][00209] Avg episode reward: [(0, '30.286')] [2024-01-05 14:10:49,730][24421] Updated weights for policy 0, policy_version 3985 (0.0025) [2024-01-05 14:10:49,850][00209] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 16322560. Throughput: 0: 860.5. Samples: 1576916. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:10:49,852][00209] Avg episode reward: [(0, '29.861')] [2024-01-05 14:10:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 16334848. Throughput: 0: 861.1. Samples: 1578962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:10:54,857][00209] Avg episode reward: [(0, '30.133')] [2024-01-05 14:10:59,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16355328. Throughput: 0: 890.0. Samples: 1584292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:10:59,856][00209] Avg episode reward: [(0, '31.237')] [2024-01-05 14:11:01,346][24421] Updated weights for policy 0, policy_version 3995 (0.0016) [2024-01-05 14:11:04,853][00209] Fps is (10 sec: 4093.9, 60 sec: 3481.3, 300 sec: 3526.7). Total num frames: 16375808. Throughput: 0: 901.7. Samples: 1590762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:11:04,856][00209] Avg episode reward: [(0, '29.580')] [2024-01-05 14:11:09,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3512.8). Total num frames: 16392192. Throughput: 0: 881.9. Samples: 1593074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:11:09,855][00209] Avg episode reward: [(0, '30.060')] [2024-01-05 14:11:14,144][24421] Updated weights for policy 0, policy_version 4005 (0.0021) [2024-01-05 14:11:14,848][00209] Fps is (10 sec: 2868.7, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 16404480. Throughput: 0: 863.7. Samples: 1597190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:11:14,850][00209] Avg episode reward: [(0, '29.205')] [2024-01-05 14:11:19,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3499.0). Total num frames: 16424960. Throughput: 0: 892.2. Samples: 1602732. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:11:19,854][00209] Avg episode reward: [(0, '29.840')] [2024-01-05 14:11:24,035][24421] Updated weights for policy 0, policy_version 4015 (0.0013) [2024-01-05 14:11:24,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 16445440. Throughput: 0: 894.8. Samples: 1605962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:11:24,853][00209] Avg episode reward: [(0, '30.202')] [2024-01-05 14:11:29,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 16461824. Throughput: 0: 869.8. Samples: 1611388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:11:29,850][00209] Avg episode reward: [(0, '29.870')] [2024-01-05 14:11:34,848][00209] Fps is (10 sec: 2867.1, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 16474112. Throughput: 0: 857.8. Samples: 1615514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:11:34,853][00209] Avg episode reward: [(0, '29.237')] [2024-01-05 14:11:37,183][24421] Updated weights for policy 0, policy_version 4025 (0.0019) [2024-01-05 14:11:39,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16494592. Throughput: 0: 871.0. Samples: 1618158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:11:39,857][00209] Avg episode reward: [(0, '31.352')] [2024-01-05 14:11:44,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 16515072. Throughput: 0: 895.8. Samples: 1624602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:11:44,853][00209] Avg episode reward: [(0, '30.614')] [2024-01-05 14:11:47,768][24421] Updated weights for policy 0, policy_version 4035 (0.0024) [2024-01-05 14:11:49,848][00209] Fps is (10 sec: 3686.5, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 16531456. Throughput: 0: 866.8. Samples: 1629764. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:11:49,852][00209] Avg episode reward: [(0, '30.697')] [2024-01-05 14:11:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 16547840. Throughput: 0: 860.2. Samples: 1631782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:11:54,860][00209] Avg episode reward: [(0, '29.445')] [2024-01-05 14:11:59,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16564224. Throughput: 0: 877.8. Samples: 1636692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:11:59,850][00209] Avg episode reward: [(0, '29.064')] [2024-01-05 14:12:00,239][24421] Updated weights for policy 0, policy_version 4045 (0.0029) [2024-01-05 14:12:04,848][00209] Fps is (10 sec: 4095.9, 60 sec: 3550.2, 300 sec: 3540.6). Total num frames: 16588800. Throughput: 0: 897.5. Samples: 1643118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:12:04,851][00209] Avg episode reward: [(0, '29.851')] [2024-01-05 14:12:09,850][00209] Fps is (10 sec: 3685.7, 60 sec: 3481.5, 300 sec: 3512.8). Total num frames: 16601088. Throughput: 0: 889.5. Samples: 1645992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:12:09,852][00209] Avg episode reward: [(0, '28.089')] [2024-01-05 14:12:09,863][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004053_16601088.pth... [2024-01-05 14:12:10,020][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003847_15757312.pth [2024-01-05 14:12:11,871][24421] Updated weights for policy 0, policy_version 4055 (0.0018) [2024-01-05 14:12:14,853][00209] Fps is (10 sec: 2865.6, 60 sec: 3549.5, 300 sec: 3498.9). Total num frames: 16617472. Throughput: 0: 859.8. Samples: 1650082. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:12:14,862][00209] Avg episode reward: [(0, '27.572')] [2024-01-05 14:12:19,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16633856. Throughput: 0: 882.5. Samples: 1655228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:12:19,850][00209] Avg episode reward: [(0, '29.244')] [2024-01-05 14:12:23,079][24421] Updated weights for policy 0, policy_version 4065 (0.0031) [2024-01-05 14:12:24,848][00209] Fps is (10 sec: 4098.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 16658432. Throughput: 0: 894.3. Samples: 1658402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-01-05 14:12:24,850][00209] Avg episode reward: [(0, '30.028')] [2024-01-05 14:12:29,852][00209] Fps is (10 sec: 4094.1, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 16674816. Throughput: 0: 882.5. Samples: 1664318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:12:29,862][00209] Avg episode reward: [(0, '30.810')] [2024-01-05 14:12:34,852][00209] Fps is (10 sec: 2865.8, 60 sec: 3549.6, 300 sec: 3485.0). Total num frames: 16687104. Throughput: 0: 856.7. Samples: 1668318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:12:34,855][00209] Avg episode reward: [(0, '31.579')] [2024-01-05 14:12:36,100][24421] Updated weights for policy 0, policy_version 4075 (0.0017) [2024-01-05 14:12:39,848][00209] Fps is (10 sec: 2868.6, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16703488. Throughput: 0: 860.3. Samples: 1670496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:12:39,850][00209] Avg episode reward: [(0, '30.834')] [2024-01-05 14:12:44,848][00209] Fps is (10 sec: 4098.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 16728064. Throughput: 0: 893.1. Samples: 1676882. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:12:44,856][00209] Avg episode reward: [(0, '31.363')] [2024-01-05 14:12:45,697][24421] Updated weights for policy 0, policy_version 4085 (0.0021) [2024-01-05 14:12:49,852][00209] Fps is (10 sec: 4094.4, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 16744448. Throughput: 0: 874.6. Samples: 1682478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:12:49,854][00209] Avg episode reward: [(0, '30.840')] [2024-01-05 14:12:54,850][00209] Fps is (10 sec: 2866.7, 60 sec: 3481.5, 300 sec: 3485.0). Total num frames: 16756736. Throughput: 0: 856.4. Samples: 1684532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:12:54,857][00209] Avg episode reward: [(0, '30.745')] [2024-01-05 14:12:59,067][24421] Updated weights for policy 0, policy_version 4095 (0.0032) [2024-01-05 14:12:59,848][00209] Fps is (10 sec: 2868.3, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 16773120. Throughput: 0: 863.7. Samples: 1688944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:12:59,850][00209] Avg episode reward: [(0, '30.595')] [2024-01-05 14:13:04,848][00209] Fps is (10 sec: 4096.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 16797696. Throughput: 0: 890.6. Samples: 1695304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:13:04,851][00209] Avg episode reward: [(0, '32.370')] [2024-01-05 14:13:09,320][24421] Updated weights for policy 0, policy_version 4105 (0.0014) [2024-01-05 14:13:09,849][00209] Fps is (10 sec: 4095.5, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 16814080. Throughput: 0: 892.1. Samples: 1698546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:13:09,853][00209] Avg episode reward: [(0, '30.341')] [2024-01-05 14:13:14,850][00209] Fps is (10 sec: 2866.4, 60 sec: 3481.8, 300 sec: 3498.9). Total num frames: 16826368. Throughput: 0: 855.0. Samples: 1702792. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:13:14,853][00209] Avg episode reward: [(0, '30.457')] [2024-01-05 14:13:19,848][00209] Fps is (10 sec: 2867.5, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 16842752. Throughput: 0: 871.2. Samples: 1707518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:13:19,851][00209] Avg episode reward: [(0, '30.976')] [2024-01-05 14:13:21,919][24421] Updated weights for policy 0, policy_version 4115 (0.0013) [2024-01-05 14:13:24,848][00209] Fps is (10 sec: 4097.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 16867328. Throughput: 0: 892.6. Samples: 1710662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:13:24,853][00209] Avg episode reward: [(0, '32.704')] [2024-01-05 14:13:29,850][00209] Fps is (10 sec: 4095.2, 60 sec: 3481.8, 300 sec: 3526.7). Total num frames: 16883712. Throughput: 0: 892.8. Samples: 1717060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:13:29,853][00209] Avg episode reward: [(0, '33.424')] [2024-01-05 14:13:33,521][24421] Updated weights for policy 0, policy_version 4125 (0.0014) [2024-01-05 14:13:34,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.9, 300 sec: 3485.1). Total num frames: 16896000. Throughput: 0: 860.7. Samples: 1721206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:13:34,850][00209] Avg episode reward: [(0, '32.449')] [2024-01-05 14:13:39,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 16916480. Throughput: 0: 861.6. Samples: 1723304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:13:39,853][00209] Avg episode reward: [(0, '32.281')] [2024-01-05 14:13:44,600][24421] Updated weights for policy 0, policy_version 4135 (0.0026) [2024-01-05 14:13:44,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 16936960. Throughput: 0: 898.3. Samples: 1729368. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:13:44,850][00209] Avg episode reward: [(0, '30.778')] [2024-01-05 14:13:49,854][00209] Fps is (10 sec: 3684.2, 60 sec: 3481.5, 300 sec: 3526.7). Total num frames: 16953344. Throughput: 0: 891.7. Samples: 1735436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:13:49,856][00209] Avg episode reward: [(0, '30.067')] [2024-01-05 14:13:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3499.0). Total num frames: 16969728. Throughput: 0: 864.9. Samples: 1737464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:13:54,850][00209] Avg episode reward: [(0, '29.507')] [2024-01-05 14:13:57,731][24421] Updated weights for policy 0, policy_version 4145 (0.0013) [2024-01-05 14:13:59,848][00209] Fps is (10 sec: 3278.7, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 16986112. Throughput: 0: 861.4. Samples: 1741554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:13:59,850][00209] Avg episode reward: [(0, '29.492')] [2024-01-05 14:14:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 17006592. Throughput: 0: 895.2. Samples: 1747802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:14:04,850][00209] Avg episode reward: [(0, '29.732')] [2024-01-05 14:14:07,433][24421] Updated weights for policy 0, policy_version 4155 (0.0020) [2024-01-05 14:14:09,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 17022976. Throughput: 0: 898.8. Samples: 1751106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:14:09,854][00209] Avg episode reward: [(0, '30.618')] [2024-01-05 14:14:09,867][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004156_17022976.pth... 
[2024-01-05 14:14:10,014][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003950_16179200.pth [2024-01-05 14:14:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3499.0). Total num frames: 17039360. Throughput: 0: 857.6. Samples: 1755652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:14:14,854][00209] Avg episode reward: [(0, '30.174')] [2024-01-05 14:14:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 17051648. Throughput: 0: 859.3. Samples: 1759874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:14:19,850][00209] Avg episode reward: [(0, '30.297')] [2024-01-05 14:14:20,926][24421] Updated weights for policy 0, policy_version 4165 (0.0020) [2024-01-05 14:14:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 17076224. Throughput: 0: 883.5. Samples: 1763062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:14:24,850][00209] Avg episode reward: [(0, '31.686')] [2024-01-05 14:14:29,848][00209] Fps is (10 sec: 4505.5, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 17096704. Throughput: 0: 895.5. Samples: 1769666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:14:29,855][00209] Avg episode reward: [(0, '31.479')] [2024-01-05 14:14:31,079][24421] Updated weights for policy 0, policy_version 4175 (0.0024) [2024-01-05 14:14:34,850][00209] Fps is (10 sec: 3276.1, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 17108992. Throughput: 0: 856.7. Samples: 1773986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:14:34,852][00209] Avg episode reward: [(0, '31.363')] [2024-01-05 14:14:39,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 17125376. Throughput: 0: 858.0. Samples: 1776072. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:14:39,850][00209] Avg episode reward: [(0, '31.819')] [2024-01-05 14:14:43,547][24421] Updated weights for policy 0, policy_version 4185 (0.0020) [2024-01-05 14:14:44,848][00209] Fps is (10 sec: 3687.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 17145856. Throughput: 0: 892.3. Samples: 1781706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:14:44,850][00209] Avg episode reward: [(0, '30.462')] [2024-01-05 14:14:49,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3550.2, 300 sec: 3526.7). Total num frames: 17166336. Throughput: 0: 898.4. Samples: 1788230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:14:49,853][00209] Avg episode reward: [(0, '30.049')] [2024-01-05 14:14:54,853][00209] Fps is (10 sec: 3274.9, 60 sec: 3481.3, 300 sec: 3498.9). Total num frames: 17178624. Throughput: 0: 868.7. Samples: 1790204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:14:54,856][00209] Avg episode reward: [(0, '30.144')] [2024-01-05 14:14:55,413][24421] Updated weights for policy 0, policy_version 4195 (0.0016) [2024-01-05 14:14:59,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 17195008. Throughput: 0: 857.9. Samples: 1794256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:14:59,850][00209] Avg episode reward: [(0, '30.261')] [2024-01-05 14:15:04,848][00209] Fps is (10 sec: 3688.5, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 17215488. Throughput: 0: 893.3. Samples: 1800072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:04,850][00209] Avg episode reward: [(0, '31.024')] [2024-01-05 14:15:06,544][24421] Updated weights for policy 0, policy_version 4205 (0.0018) [2024-01-05 14:15:09,850][00209] Fps is (10 sec: 4094.9, 60 sec: 3549.7, 300 sec: 3526.7). Total num frames: 17235968. Throughput: 0: 893.8. Samples: 1803286. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:09,853][00209] Avg episode reward: [(0, '31.441')] [2024-01-05 14:15:14,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 17248256. Throughput: 0: 857.8. Samples: 1808268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:15:14,856][00209] Avg episode reward: [(0, '32.055')] [2024-01-05 14:15:19,848][00209] Fps is (10 sec: 2458.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 17260544. Throughput: 0: 852.7. Samples: 1812354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:15:19,855][00209] Avg episode reward: [(0, '32.018')] [2024-01-05 14:15:19,896][24421] Updated weights for policy 0, policy_version 4215 (0.0013) [2024-01-05 14:15:24,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 17281024. Throughput: 0: 866.8. Samples: 1815078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:15:24,853][00209] Avg episode reward: [(0, '31.990')] [2024-01-05 14:15:29,563][24421] Updated weights for policy 0, policy_version 4225 (0.0024) [2024-01-05 14:15:29,848][00209] Fps is (10 sec: 4505.5, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 17305600. Throughput: 0: 884.8. Samples: 1821524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:29,850][00209] Avg episode reward: [(0, '31.628')] [2024-01-05 14:15:34,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3499.0). Total num frames: 17317888. Throughput: 0: 847.7. Samples: 1826378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:34,855][00209] Avg episode reward: [(0, '31.641')] [2024-01-05 14:15:39,851][00209] Fps is (10 sec: 2866.5, 60 sec: 3481.4, 300 sec: 3485.0). Total num frames: 17334272. Throughput: 0: 851.1. Samples: 1828500. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:39,852][00209] Avg episode reward: [(0, '30.955')] [2024-01-05 14:15:42,911][24421] Updated weights for policy 0, policy_version 4235 (0.0021) [2024-01-05 14:15:44,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 17354752. Throughput: 0: 874.0. Samples: 1833584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:44,856][00209] Avg episode reward: [(0, '30.507')] [2024-01-05 14:15:49,848][00209] Fps is (10 sec: 4097.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 17375232. Throughput: 0: 889.2. Samples: 1840086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:49,850][00209] Avg episode reward: [(0, '30.346')] [2024-01-05 14:15:53,595][24421] Updated weights for policy 0, policy_version 4245 (0.0026) [2024-01-05 14:15:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3499.0). Total num frames: 17387520. Throughput: 0: 875.4. Samples: 1842676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:54,855][00209] Avg episode reward: [(0, '31.294')] [2024-01-05 14:15:59,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 17403904. Throughput: 0: 855.0. Samples: 1846744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:15:59,853][00209] Avg episode reward: [(0, '29.990')] [2024-01-05 14:16:04,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 17420288. Throughput: 0: 880.1. Samples: 1851960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:16:04,850][00209] Avg episode reward: [(0, '29.404')] [2024-01-05 14:16:05,937][24421] Updated weights for policy 0, policy_version 4255 (0.0023) [2024-01-05 14:16:09,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3526.7). Total num frames: 17444864. Throughput: 0: 889.9. Samples: 1855122. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:16:09,856][00209] Avg episode reward: [(0, '29.404')] [2024-01-05 14:16:09,867][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004259_17444864.pth... [2024-01-05 14:16:09,994][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004053_16601088.pth [2024-01-05 14:16:14,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 17457152. Throughput: 0: 870.5. Samples: 1860696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:16:14,850][00209] Avg episode reward: [(0, '28.243')] [2024-01-05 14:16:18,275][24421] Updated weights for policy 0, policy_version 4265 (0.0013) [2024-01-05 14:16:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 17473536. Throughput: 0: 854.4. Samples: 1864828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:16:19,854][00209] Avg episode reward: [(0, '29.324')] [2024-01-05 14:16:24,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 17489920. Throughput: 0: 858.5. Samples: 1867128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:16:24,852][00209] Avg episode reward: [(0, '29.768')] [2024-01-05 14:16:28,750][24421] Updated weights for policy 0, policy_version 4275 (0.0016) [2024-01-05 14:16:29,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 17514496. Throughput: 0: 889.6. Samples: 1873616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:16:29,850][00209] Avg episode reward: [(0, '28.617')] [2024-01-05 14:16:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 17530880. Throughput: 0: 864.3. Samples: 1878980. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:16:34,850][00209] Avg episode reward: [(0, '29.484')] [2024-01-05 14:16:39,853][00209] Fps is (10 sec: 2865.8, 60 sec: 3481.5, 300 sec: 3485.0). Total num frames: 17543168. Throughput: 0: 853.2. Samples: 1881076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:16:39,862][00209] Avg episode reward: [(0, '29.451')] [2024-01-05 14:16:42,137][24421] Updated weights for policy 0, policy_version 4285 (0.0019) [2024-01-05 14:16:44,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 17559552. Throughput: 0: 865.6. Samples: 1885698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:16:44,856][00209] Avg episode reward: [(0, '29.753')] [2024-01-05 14:16:49,853][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.3, 300 sec: 3512.8). Total num frames: 17584128. Throughput: 0: 893.1. Samples: 1892152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:16:49,854][00209] Avg episode reward: [(0, '31.493')] [2024-01-05 14:16:51,581][24421] Updated weights for policy 0, policy_version 4295 (0.0019) [2024-01-05 14:16:54,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 17600512. Throughput: 0: 890.9. Samples: 1895212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:16:54,852][00209] Avg episode reward: [(0, '29.927')] [2024-01-05 14:16:59,851][00209] Fps is (10 sec: 2867.7, 60 sec: 3481.4, 300 sec: 3471.2). Total num frames: 17612800. Throughput: 0: 858.2. Samples: 1899318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:16:59,854][00209] Avg episode reward: [(0, '29.668')] [2024-01-05 14:17:04,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 17629184. Throughput: 0: 870.4. Samples: 1903996. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:17:04,850][00209] Avg episode reward: [(0, '31.458')] [2024-01-05 14:17:05,132][24421] Updated weights for policy 0, policy_version 4305 (0.0018) [2024-01-05 14:17:09,848][00209] Fps is (10 sec: 4097.3, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 17653760. Throughput: 0: 891.4. Samples: 1907242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:17:09,850][00209] Avg episode reward: [(0, '31.630')] [2024-01-05 14:17:14,848][00209] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 17670144. Throughput: 0: 883.5. Samples: 1913374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:17:14,851][00209] Avg episode reward: [(0, '31.121')] [2024-01-05 14:17:16,352][24421] Updated weights for policy 0, policy_version 4315 (0.0017) [2024-01-05 14:17:19,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 17682432. Throughput: 0: 851.6. Samples: 1917302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:17:19,852][00209] Avg episode reward: [(0, '31.286')] [2024-01-05 14:17:24,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 17698816. Throughput: 0: 850.9. Samples: 1919364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:17:24,850][00209] Avg episode reward: [(0, '31.748')] [2024-01-05 14:17:28,221][24421] Updated weights for policy 0, policy_version 4325 (0.0013) [2024-01-05 14:17:29,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 17719296. Throughput: 0: 882.3. Samples: 1925402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:17:29,851][00209] Avg episode reward: [(0, '32.002')] [2024-01-05 14:17:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 17739776. Throughput: 0: 876.4. Samples: 1931586. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:17:34,856][00209] Avg episode reward: [(0, '31.673')] [2024-01-05 14:17:39,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3471.2). Total num frames: 17752064. Throughput: 0: 853.5. Samples: 1933620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:17:39,850][00209] Avg episode reward: [(0, '30.601')] [2024-01-05 14:17:40,273][24421] Updated weights for policy 0, policy_version 4335 (0.0015) [2024-01-05 14:17:44,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 17768448. Throughput: 0: 854.6. Samples: 1937772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:17:44,850][00209] Avg episode reward: [(0, '29.866')] [2024-01-05 14:17:49,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3499.0). Total num frames: 17788928. Throughput: 0: 888.3. Samples: 1943968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:17:49,850][00209] Avg episode reward: [(0, '30.439')] [2024-01-05 14:17:51,093][24421] Updated weights for policy 0, policy_version 4345 (0.0016) [2024-01-05 14:17:54,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 17809408. Throughput: 0: 887.8. Samples: 1947194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:17:54,851][00209] Avg episode reward: [(0, '30.324')] [2024-01-05 14:17:59,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3471.2). Total num frames: 17821696. Throughput: 0: 856.7. Samples: 1951926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:17:59,857][00209] Avg episode reward: [(0, '30.306')] [2024-01-05 14:18:04,569][24421] Updated weights for policy 0, policy_version 4355 (0.0027) [2024-01-05 14:18:04,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 17838080. Throughput: 0: 856.6. Samples: 1955850. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:18:04,853][00209] Avg episode reward: [(0, '29.958')] [2024-01-05 14:18:09,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 17858560. Throughput: 0: 881.9. Samples: 1959048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:18:09,855][00209] Avg episode reward: [(0, '28.336')] [2024-01-05 14:18:09,865][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004360_17858560.pth... [2024-01-05 14:18:09,990][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004156_17022976.pth [2024-01-05 14:18:14,025][24421] Updated weights for policy 0, policy_version 4365 (0.0025) [2024-01-05 14:18:14,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 17879040. Throughput: 0: 892.8. Samples: 1965580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:18:14,856][00209] Avg episode reward: [(0, '28.509')] [2024-01-05 14:18:19,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 17891328. Throughput: 0: 857.0. Samples: 1970150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:18:19,856][00209] Avg episode reward: [(0, '27.679')] [2024-01-05 14:18:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 17907712. Throughput: 0: 856.9. Samples: 1972180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:18:24,850][00209] Avg episode reward: [(0, '28.393')] [2024-01-05 14:18:27,189][24421] Updated weights for policy 0, policy_version 4375 (0.0025) [2024-01-05 14:18:29,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 17928192. Throughput: 0: 887.2. Samples: 1977698. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:18:29,850][00209] Avg episode reward: [(0, '27.767')] [2024-01-05 14:18:34,856][00209] Fps is (10 sec: 4092.4, 60 sec: 3481.1, 300 sec: 3498.9). Total num frames: 17948672. Throughput: 0: 894.9. Samples: 1984246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:18:34,859][00209] Avg episode reward: [(0, '28.909')] [2024-01-05 14:18:37,907][24421] Updated weights for policy 0, policy_version 4385 (0.0029) [2024-01-05 14:18:39,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 17965056. Throughput: 0: 874.4. Samples: 1986540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:18:39,850][00209] Avg episode reward: [(0, '28.934')] [2024-01-05 14:18:44,848][00209] Fps is (10 sec: 2869.7, 60 sec: 3481.6, 300 sec: 3471.3). Total num frames: 17977344. Throughput: 0: 862.6. Samples: 1990742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:18:44,853][00209] Avg episode reward: [(0, '29.773')] [2024-01-05 14:18:49,787][24421] Updated weights for policy 0, policy_version 4395 (0.0027) [2024-01-05 14:18:49,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 18001920. Throughput: 0: 902.0. Samples: 1996440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:18:49,850][00209] Avg episode reward: [(0, '30.434')] [2024-01-05 14:18:54,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 18022400. Throughput: 0: 902.3. Samples: 1999652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:18:54,850][00209] Avg episode reward: [(0, '32.743')] [2024-01-05 14:18:59,852][00209] Fps is (10 sec: 3275.2, 60 sec: 3549.6, 300 sec: 3485.0). Total num frames: 18034688. Throughput: 0: 874.1. Samples: 2004918. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:18:59,855][00209] Avg episode reward: [(0, '33.513')] [2024-01-05 14:19:01,919][24421] Updated weights for policy 0, policy_version 4405 (0.0013) [2024-01-05 14:19:04,850][00209] Fps is (10 sec: 2456.9, 60 sec: 3481.4, 300 sec: 3471.2). Total num frames: 18046976. Throughput: 0: 859.6. Samples: 2008834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:19:04,854][00209] Avg episode reward: [(0, '33.448')] [2024-01-05 14:19:09,848][00209] Fps is (10 sec: 3278.3, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 18067456. Throughput: 0: 875.4. Samples: 2011572. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-01-05 14:19:09,857][00209] Avg episode reward: [(0, '32.596')] [2024-01-05 14:19:12,874][24421] Updated weights for policy 0, policy_version 4415 (0.0019) [2024-01-05 14:19:14,848][00209] Fps is (10 sec: 4506.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 18092032. Throughput: 0: 895.0. Samples: 2017974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:19:14,851][00209] Avg episode reward: [(0, '33.788')] [2024-01-05 14:19:19,852][00209] Fps is (10 sec: 3684.8, 60 sec: 3549.6, 300 sec: 3485.0). Total num frames: 18104320. Throughput: 0: 858.8. Samples: 2022888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:19:19,857][00209] Avg episode reward: [(0, '32.438')] [2024-01-05 14:19:24,848][00209] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 18116608. Throughput: 0: 853.4. Samples: 2024944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:19:24,854][00209] Avg episode reward: [(0, '31.417')] [2024-01-05 14:19:26,265][24421] Updated weights for policy 0, policy_version 4425 (0.0022) [2024-01-05 14:19:29,848][00209] Fps is (10 sec: 3278.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 18137088. Throughput: 0: 872.1. Samples: 2029986. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:19:29,851][00209] Avg episode reward: [(0, '29.722')] [2024-01-05 14:19:34,848][00209] Fps is (10 sec: 4505.6, 60 sec: 3550.4, 300 sec: 3512.8). Total num frames: 18161664. Throughput: 0: 890.7. Samples: 2036522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:19:34,850][00209] Avg episode reward: [(0, '29.154')] [2024-01-05 14:19:35,751][24421] Updated weights for policy 0, policy_version 4435 (0.0017) [2024-01-05 14:19:39,849][00209] Fps is (10 sec: 3686.2, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 18173952. Throughput: 0: 881.7. Samples: 2039330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:19:39,851][00209] Avg episode reward: [(0, '29.695')] [2024-01-05 14:19:44,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 18190336. Throughput: 0: 857.2. Samples: 2043490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:19:44,852][00209] Avg episode reward: [(0, '30.360')] [2024-01-05 14:19:48,703][24421] Updated weights for policy 0, policy_version 4445 (0.0014) [2024-01-05 14:19:49,848][00209] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 18210816. Throughput: 0: 887.2. Samples: 2048756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:19:49,850][00209] Avg episode reward: [(0, '28.890')] [2024-01-05 14:19:54,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 18231296. Throughput: 0: 899.1. Samples: 2052032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:19:54,850][00209] Avg episode reward: [(0, '29.276')] [2024-01-05 14:19:59,572][24421] Updated weights for policy 0, policy_version 4455 (0.0026) [2024-01-05 14:19:59,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3550.1, 300 sec: 3499.0). Total num frames: 18247680. Throughput: 0: 884.0. Samples: 2057754. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:19:59,851][00209] Avg episode reward: [(0, '30.974')] [2024-01-05 14:20:04,850][00209] Fps is (10 sec: 2866.5, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 18259968. Throughput: 0: 861.8. Samples: 2061668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:20:04,854][00209] Avg episode reward: [(0, '30.844')] [2024-01-05 14:20:09,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 18280448. Throughput: 0: 867.2. Samples: 2063968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:20:09,856][00209] Avg episode reward: [(0, '31.984')] [2024-01-05 14:20:09,871][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004463_18280448.pth... [2024-01-05 14:20:10,003][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004259_17444864.pth [2024-01-05 14:20:11,669][24421] Updated weights for policy 0, policy_version 4465 (0.0014) [2024-01-05 14:20:14,848][00209] Fps is (10 sec: 4097.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 18300928. Throughput: 0: 898.9. Samples: 2070436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-01-05 14:20:14,854][00209] Avg episode reward: [(0, '32.153')] [2024-01-05 14:20:19,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3512.8). Total num frames: 18317312. Throughput: 0: 873.2. Samples: 2075816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:20:19,856][00209] Avg episode reward: [(0, '31.041')] [2024-01-05 14:20:24,151][24421] Updated weights for policy 0, policy_version 4475 (0.0015) [2024-01-05 14:20:24,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 18329600. Throughput: 0: 856.0. Samples: 2077848. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:20:24,851][00209] Avg episode reward: [(0, '31.757')] [2024-01-05 14:20:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 18350080. Throughput: 0: 866.3. Samples: 2082474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:20:29,857][00209] Avg episode reward: [(0, '31.685')] [2024-01-05 14:20:34,598][24421] Updated weights for policy 0, policy_version 4485 (0.0019) [2024-01-05 14:20:34,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 18370560. Throughput: 0: 893.7. Samples: 2088972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:20:34,856][00209] Avg episode reward: [(0, '30.026')] [2024-01-05 14:20:39,848][00209] Fps is (10 sec: 3686.1, 60 sec: 3549.9, 300 sec: 3498.9). Total num frames: 18386944. Throughput: 0: 888.4. Samples: 2092012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-01-05 14:20:39,851][00209] Avg episode reward: [(0, '30.338')] [2024-01-05 14:20:44,851][00209] Fps is (10 sec: 2866.1, 60 sec: 3481.4, 300 sec: 3471.1). Total num frames: 18399232. Throughput: 0: 852.7. Samples: 2096128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:20:44,854][00209] Avg episode reward: [(0, '29.938')] [2024-01-05 14:20:47,675][24421] Updated weights for policy 0, policy_version 4495 (0.0024) [2024-01-05 14:20:49,848][00209] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 18419712. Throughput: 0: 875.8. Samples: 2101076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-01-05 14:20:49,857][00209] Avg episode reward: [(0, '30.387')] [2024-01-05 14:20:54,848][00209] Fps is (10 sec: 4097.6, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 18440192. Throughput: 0: 895.5. Samples: 2104266. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:20:54,857][00209] Avg episode reward: [(0, '32.235')] [2024-01-05 14:20:57,703][24421] Updated weights for policy 0, policy_version 4505 (0.0020) [2024-01-05 14:20:59,848][00209] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 18456576. Throughput: 0: 882.5. Samples: 2110148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-01-05 14:20:59,854][00209] Avg episode reward: [(0, '34.080')] [2024-01-05 14:20:59,863][24408] Saving new best policy, reward=34.080! [2024-01-05 14:21:04,853][00209] Fps is (10 sec: 2865.8, 60 sec: 3481.5, 300 sec: 3471.1). Total num frames: 18468864. Throughput: 0: 850.4. Samples: 2114086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:21:04,855][00209] Avg episode reward: [(0, '33.275')] [2024-01-05 14:21:09,848][00209] Fps is (10 sec: 2867.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 18485248. Throughput: 0: 850.8. Samples: 2116136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:21:09,850][00209] Avg episode reward: [(0, '32.876')] [2024-01-05 14:21:11,038][24421] Updated weights for policy 0, policy_version 4515 (0.0014) [2024-01-05 14:21:14,850][00209] Fps is (10 sec: 4096.9, 60 sec: 3481.4, 300 sec: 3512.8). Total num frames: 18509824. Throughput: 0: 885.9. Samples: 2122342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:21:14,856][00209] Avg episode reward: [(0, '33.134')] [2024-01-05 14:21:19,852][00209] Fps is (10 sec: 4094.4, 60 sec: 3481.4, 300 sec: 3512.8). Total num frames: 18526208. Throughput: 0: 871.7. Samples: 2128200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:21:19,856][00209] Avg episode reward: [(0, '31.932')] [2024-01-05 14:21:22,406][24421] Updated weights for policy 0, policy_version 4525 (0.0036) [2024-01-05 14:21:24,848][00209] Fps is (10 sec: 2867.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 18538496. Throughput: 0: 849.4. Samples: 2130234. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:21:24,850][00209] Avg episode reward: [(0, '32.543')] [2024-01-05 14:21:29,848][00209] Fps is (10 sec: 2868.3, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 18554880. Throughput: 0: 848.1. Samples: 2134290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:21:29,857][00209] Avg episode reward: [(0, '31.894')] [2024-01-05 14:21:34,084][24421] Updated weights for policy 0, policy_version 4535 (0.0030) [2024-01-05 14:21:34,848][00209] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 18575360. Throughput: 0: 880.8. Samples: 2140710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:21:34,850][00209] Avg episode reward: [(0, '30.919')] [2024-01-05 14:21:39,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 18595840. Throughput: 0: 881.4. Samples: 2143928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:21:39,859][00209] Avg episode reward: [(0, '31.733')] [2024-01-05 14:21:44,848][00209] Fps is (10 sec: 3276.6, 60 sec: 3481.8, 300 sec: 3471.2). Total num frames: 18608128. Throughput: 0: 854.1. Samples: 2148582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:21:44,860][00209] Avg episode reward: [(0, '32.287')] [2024-01-05 14:21:46,504][24421] Updated weights for policy 0, policy_version 4545 (0.0028) [2024-01-05 14:21:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 18624512. Throughput: 0: 862.3. Samples: 2152884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:21:49,856][00209] Avg episode reward: [(0, '32.775')] [2024-01-05 14:21:54,848][00209] Fps is (10 sec: 4096.3, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 18649088. Throughput: 0: 888.2. Samples: 2156106. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:21:54,850][00209] Avg episode reward: [(0, '32.533')] [2024-01-05 14:21:56,705][24421] Updated weights for policy 0, policy_version 4555 (0.0019) [2024-01-05 14:21:59,848][00209] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 18665472. Throughput: 0: 892.8. Samples: 2162516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-01-05 14:21:59,862][00209] Avg episode reward: [(0, '33.544')] [2024-01-05 14:22:04,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3485.1). Total num frames: 18681856. Throughput: 0: 857.9. Samples: 2166800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:22:04,850][00209] Avg episode reward: [(0, '32.239')] [2024-01-05 14:22:09,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 18694144. Throughput: 0: 857.6. Samples: 2168826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-01-05 14:22:09,858][00209] Avg episode reward: [(0, '31.637')] [2024-01-05 14:22:09,870][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004564_18694144.pth... [2024-01-05 14:22:09,997][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004360_17858560.pth [2024-01-05 14:22:10,265][24421] Updated weights for policy 0, policy_version 4565 (0.0017) [2024-01-05 14:22:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3499.0). Total num frames: 18714624. Throughput: 0: 892.6. Samples: 2174456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-01-05 14:22:14,856][00209] Avg episode reward: [(0, '29.091')] [2024-01-05 14:22:19,850][00209] Fps is (10 sec: 4094.9, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 18735104. Throughput: 0: 894.2. Samples: 2180952. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:22:19,853][00209] Avg episode reward: [(0, '28.948')]
[2024-01-05 14:22:19,921][24421] Updated weights for policy 0, policy_version 4575 (0.0029)
[2024-01-05 14:22:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 18751488. Throughput: 0: 869.6. Samples: 2183058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:22:24,854][00209] Avg episode reward: [(0, '29.082')]
[2024-01-05 14:22:29,848][00209] Fps is (10 sec: 2867.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 18763776. Throughput: 0: 857.3. Samples: 2187160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:22:29,851][00209] Avg episode reward: [(0, '29.586')]
[2024-01-05 14:22:32,912][24421] Updated weights for policy 0, policy_version 4585 (0.0022)
[2024-01-05 14:22:34,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 18788352. Throughput: 0: 892.6. Samples: 2193052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:22:34,856][00209] Avg episode reward: [(0, '29.019')]
[2024-01-05 14:22:39,848][00209] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 18808832. Throughput: 0: 894.8. Samples: 2196370. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 14:22:39,850][00209] Avg episode reward: [(0, '30.421')]
[2024-01-05 14:22:44,018][24421] Updated weights for policy 0, policy_version 4595 (0.0022)
[2024-01-05 14:22:44,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 18821120. Throughput: 0: 867.1. Samples: 2201534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-01-05 14:22:44,850][00209] Avg episode reward: [(0, '31.356')]
[2024-01-05 14:22:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 18837504. Throughput: 0: 863.1. Samples: 2205638.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:22:49,852][00209] Avg episode reward: [(0, '31.540')]
[2024-01-05 14:22:54,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 18857984. Throughput: 0: 882.0. Samples: 2208518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:22:54,854][00209] Avg episode reward: [(0, '30.899')]
[2024-01-05 14:22:55,619][24421] Updated weights for policy 0, policy_version 4605 (0.0037)
[2024-01-05 14:22:59,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 18878464. Throughput: 0: 901.1. Samples: 2215004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:22:59,850][00209] Avg episode reward: [(0, '29.469')]
[2024-01-05 14:23:04,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 18890752. Throughput: 0: 862.7. Samples: 2219772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:23:04,856][00209] Avg episode reward: [(0, '28.286')]
[2024-01-05 14:23:08,159][24421] Updated weights for policy 0, policy_version 4615 (0.0035)
[2024-01-05 14:23:09,850][00209] Fps is (10 sec: 2866.6, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 18907136. Throughput: 0: 862.5. Samples: 2221872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:23:09,852][00209] Avg episode reward: [(0, '28.504')]
[2024-01-05 14:23:14,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 18927616. Throughput: 0: 884.8. Samples: 2226978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:23:14,856][00209] Avg episode reward: [(0, '27.743')]
[2024-01-05 14:23:18,640][24421] Updated weights for policy 0, policy_version 4625 (0.0017)
[2024-01-05 14:23:19,848][00209] Fps is (10 sec: 4096.9, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 18948096. Throughput: 0: 898.1. Samples: 2233468.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:23:19,850][00209] Avg episode reward: [(0, '27.385')]
[2024-01-05 14:23:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 18964480. Throughput: 0: 881.5. Samples: 2236038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:23:24,852][00209] Avg episode reward: [(0, '28.021')]
[2024-01-05 14:23:29,851][00209] Fps is (10 sec: 2866.1, 60 sec: 3549.7, 300 sec: 3485.1). Total num frames: 18976768. Throughput: 0: 858.0. Samples: 2240148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:23:29,854][00209] Avg episode reward: [(0, '29.413')]
[2024-01-05 14:23:31,854][24421] Updated weights for policy 0, policy_version 4635 (0.0037)
[2024-01-05 14:23:34,848][00209] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 18997248. Throughput: 0: 886.0. Samples: 2245506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:23:34,851][00209] Avg episode reward: [(0, '29.385')]
[2024-01-05 14:23:39,848][00209] Fps is (10 sec: 4097.5, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 19017728. Throughput: 0: 895.1. Samples: 2248796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:23:39,858][00209] Avg episode reward: [(0, '30.121')]
[2024-01-05 14:23:41,465][24421] Updated weights for policy 0, policy_version 4645 (0.0027)
[2024-01-05 14:23:44,848][00209] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 19034112. Throughput: 0: 876.7. Samples: 2254456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:23:44,857][00209] Avg episode reward: [(0, '30.332')]
[2024-01-05 14:23:49,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 19046400. Throughput: 0: 861.6. Samples: 2258546.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:23:49,853][00209] Avg episode reward: [(0, '30.327')]
[2024-01-05 14:23:54,482][24421] Updated weights for policy 0, policy_version 4655 (0.0013)
[2024-01-05 14:23:54,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 19066880. Throughput: 0: 868.4. Samples: 2260950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:23:54,850][00209] Avg episode reward: [(0, '30.422')]
[2024-01-05 14:23:59,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.8). Total num frames: 19087360. Throughput: 0: 901.3. Samples: 2267538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:23:59,857][00209] Avg episode reward: [(0, '30.141')]
[2024-01-05 14:24:04,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 19103744. Throughput: 0: 872.8. Samples: 2272746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:24:04,852][00209] Avg episode reward: [(0, '30.853')]
[2024-01-05 14:24:05,588][24421] Updated weights for policy 0, policy_version 4665 (0.0022)
[2024-01-05 14:24:09,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 19116032. Throughput: 0: 860.6. Samples: 2274764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:24:09,854][00209] Avg episode reward: [(0, '30.695')]
[2024-01-05 14:24:09,872][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004667_19116032.pth...
[2024-01-05 14:24:10,027][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004463_18280448.pth
[2024-01-05 14:24:14,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 19136512. Throughput: 0: 870.3. Samples: 2279310.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:24:14,850][00209] Avg episode reward: [(0, '30.997')]
[2024-01-05 14:24:17,485][24421] Updated weights for policy 0, policy_version 4675 (0.0028)
[2024-01-05 14:24:19,850][00209] Fps is (10 sec: 4095.1, 60 sec: 3481.5, 300 sec: 3526.7). Total num frames: 19156992. Throughput: 0: 894.0. Samples: 2285740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:24:19,855][00209] Avg episode reward: [(0, '30.448')]
[2024-01-05 14:24:24,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 19173376. Throughput: 0: 889.0. Samples: 2288802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:24:24,851][00209] Avg episode reward: [(0, '31.638')]
[2024-01-05 14:24:29,848][00209] Fps is (10 sec: 2867.9, 60 sec: 3481.8, 300 sec: 3471.2). Total num frames: 19185664. Throughput: 0: 854.6. Samples: 2292914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-01-05 14:24:29,850][00209] Avg episode reward: [(0, '32.306')]
[2024-01-05 14:24:29,923][24421] Updated weights for policy 0, policy_version 4685 (0.0026)
[2024-01-05 14:24:34,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 19206144. Throughput: 0: 871.2. Samples: 2297752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:24:34,859][00209] Avg episode reward: [(0, '32.584')]
[2024-01-05 14:24:39,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 19226624. Throughput: 0: 889.7. Samples: 2300988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 14:24:39,850][00209] Avg episode reward: [(0, '32.065')]
[2024-01-05 14:24:40,279][24421] Updated weights for policy 0, policy_version 4695 (0.0019)
[2024-01-05 14:24:44,850][00209] Fps is (10 sec: 3685.5, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 19243008. Throughput: 0: 880.4. Samples: 2307160.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 14:24:44,858][00209] Avg episode reward: [(0, '32.032')]
[2024-01-05 14:24:49,849][00209] Fps is (10 sec: 3276.3, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 19259392. Throughput: 0: 855.9. Samples: 2311264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:24:49,852][00209] Avg episode reward: [(0, '32.505')]
[2024-01-05 14:24:53,434][24421] Updated weights for policy 0, policy_version 4705 (0.0015)
[2024-01-05 14:24:54,848][00209] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19275776. Throughput: 0: 857.2. Samples: 2313340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:24:54,851][00209] Avg episode reward: [(0, '32.505')]
[2024-01-05 14:24:59,848][00209] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 19296256. Throughput: 0: 897.5. Samples: 2319698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:24:59,857][00209] Avg episode reward: [(0, '32.887')]
[2024-01-05 14:25:03,469][24421] Updated weights for policy 0, policy_version 4715 (0.0013)
[2024-01-05 14:25:04,849][00209] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 19312640. Throughput: 0: 879.2. Samples: 2325304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:25:04,851][00209] Avg episode reward: [(0, '32.452')]
[2024-01-05 14:25:09,850][00209] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 19329024. Throughput: 0: 856.8. Samples: 2327360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:25:09,854][00209] Avg episode reward: [(0, '32.010')]
[2024-01-05 14:25:14,848][00209] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19345408. Throughput: 0: 859.6. Samples: 2331598.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:25:14,850][00209] Avg episode reward: [(0, '30.410')]
[2024-01-05 14:25:16,413][24421] Updated weights for policy 0, policy_version 4725 (0.0018)
[2024-01-05 14:25:19,848][00209] Fps is (10 sec: 3687.4, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 19365888. Throughput: 0: 895.1. Samples: 2338030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-01-05 14:25:19,850][00209] Avg episode reward: [(0, '28.898')]
[2024-01-05 14:25:24,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 19386368. Throughput: 0: 896.1. Samples: 2341312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:25:24,850][00209] Avg episode reward: [(0, '28.753')]
[2024-01-05 14:25:27,875][24421] Updated weights for policy 0, policy_version 4735 (0.0023)
[2024-01-05 14:25:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 19398656. Throughput: 0: 855.1. Samples: 2345636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:25:29,852][00209] Avg episode reward: [(0, '26.993')]
[2024-01-05 14:25:34,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19415040. Throughput: 0: 864.2. Samples: 2350150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:25:34,850][00209] Avg episode reward: [(0, '28.248')]
[2024-01-05 14:25:39,152][24421] Updated weights for policy 0, policy_version 4745 (0.0017)
[2024-01-05 14:25:39,848][00209] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 19435520. Throughput: 0: 891.0. Samples: 2353434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:25:39,857][00209] Avg episode reward: [(0, '26.716')]
[2024-01-05 14:25:44,848][00209] Fps is (10 sec: 4095.9, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 19456000. Throughput: 0: 894.0. Samples: 2359928.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:25:44,856][00209] Avg episode reward: [(0, '27.833')]
[2024-01-05 14:25:49,848][00209] Fps is (10 sec: 3276.9, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 19468288. Throughput: 0: 860.4. Samples: 2364022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:25:49,850][00209] Avg episode reward: [(0, '28.878')]
[2024-01-05 14:25:51,724][24421] Updated weights for policy 0, policy_version 4755 (0.0029)
[2024-01-05 14:25:54,848][00209] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19484672. Throughput: 0: 859.3. Samples: 2366026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:25:54,853][00209] Avg episode reward: [(0, '29.698')]
[2024-01-05 14:25:59,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 19505152. Throughput: 0: 897.1. Samples: 2371966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:25:59,853][00209] Avg episode reward: [(0, '29.981')]
[2024-01-05 14:26:02,044][24421] Updated weights for policy 0, policy_version 4765 (0.0017)
[2024-01-05 14:26:04,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 19525632. Throughput: 0: 890.3. Samples: 2378094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:26:04,858][00209] Avg episode reward: [(0, '31.200')]
[2024-01-05 14:26:09,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3485.1). Total num frames: 19537920. Throughput: 0: 862.3. Samples: 2380116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 14:26:09,856][00209] Avg episode reward: [(0, '29.242')]
[2024-01-05 14:26:09,870][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004770_19537920.pth...
[2024-01-05 14:26:10,038][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004564_18694144.pth
[2024-01-05 14:26:14,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19554304. Throughput: 0: 855.7. Samples: 2384144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-01-05 14:26:14,850][00209] Avg episode reward: [(0, '30.223')]
[2024-01-05 14:26:15,618][24421] Updated weights for policy 0, policy_version 4775 (0.0014)
[2024-01-05 14:26:19,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 19574784. Throughput: 0: 890.1. Samples: 2390206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:26:19,853][00209] Avg episode reward: [(0, '29.594')]
[2024-01-05 14:26:24,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 19595264. Throughput: 0: 888.4. Samples: 2393414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:26:24,850][00209] Avg episode reward: [(0, '29.489')]
[2024-01-05 14:26:25,563][24421] Updated weights for policy 0, policy_version 4785 (0.0018)
[2024-01-05 14:26:29,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 19607552. Throughput: 0: 852.9. Samples: 2398308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:26:29,850][00209] Avg episode reward: [(0, '30.682')]
[2024-01-05 14:26:34,848][00209] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19623936. Throughput: 0: 852.1. Samples: 2402368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:26:34,851][00209] Avg episode reward: [(0, '29.917')]
[2024-01-05 14:26:38,329][24421] Updated weights for policy 0, policy_version 4795 (0.0035)
[2024-01-05 14:26:39,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 19644416. Throughput: 0: 876.3. Samples: 2405458.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:26:39,856][00209] Avg episode reward: [(0, '32.681')]
[2024-01-05 14:26:44,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 19664896. Throughput: 0: 889.3. Samples: 2411986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:26:44,852][00209] Avg episode reward: [(0, '32.687')]
[2024-01-05 14:26:49,824][24421] Updated weights for policy 0, policy_version 4805 (0.0014)
[2024-01-05 14:26:49,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 19681280. Throughput: 0: 856.9. Samples: 2416656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:26:49,851][00209] Avg episode reward: [(0, '32.500')]
[2024-01-05 14:26:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19693568. Throughput: 0: 857.7. Samples: 2418714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:26:54,850][00209] Avg episode reward: [(0, '31.710')]
[2024-01-05 14:26:59,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 19714048. Throughput: 0: 887.6. Samples: 2424084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:26:59,850][00209] Avg episode reward: [(0, '32.867')]
[2024-01-05 14:27:01,208][24421] Updated weights for policy 0, policy_version 4815 (0.0025)
[2024-01-05 14:27:04,848][00209] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 19734528. Throughput: 0: 894.0. Samples: 2430434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:27:04,855][00209] Avg episode reward: [(0, '31.847')]
[2024-01-05 14:27:09,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 19750912. Throughput: 0: 875.1. Samples: 2432794.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:27:09,852][00209] Avg episode reward: [(0, '30.232')]
[2024-01-05 14:27:14,137][24421] Updated weights for policy 0, policy_version 4825 (0.0028)
[2024-01-05 14:27:14,848][00209] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19763200. Throughput: 0: 857.7. Samples: 2436906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:27:14,852][00209] Avg episode reward: [(0, '29.600')]
[2024-01-05 14:27:19,848][00209] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 19783680. Throughput: 0: 890.1. Samples: 2442422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:27:19,850][00209] Avg episode reward: [(0, '30.718')]
[2024-01-05 14:27:24,147][24421] Updated weights for policy 0, policy_version 4835 (0.0038)
[2024-01-05 14:27:24,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 19804160. Throughput: 0: 893.5. Samples: 2445664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:27:24,850][00209] Avg episode reward: [(0, '30.003')]
[2024-01-05 14:27:29,848][00209] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 19820544. Throughput: 0: 867.9. Samples: 2451042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:27:29,863][00209] Avg episode reward: [(0, '29.074')]
[2024-01-05 14:27:34,851][00209] Fps is (10 sec: 2866.3, 60 sec: 3481.4, 300 sec: 3471.2). Total num frames: 19832832. Throughput: 0: 853.9. Samples: 2455084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:27:34,856][00209] Avg episode reward: [(0, '29.831')]
[2024-01-05 14:27:37,408][24421] Updated weights for policy 0, policy_version 4845 (0.0012)
[2024-01-05 14:27:39,850][00209] Fps is (10 sec: 3276.2, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 19853312. Throughput: 0: 867.3. Samples: 2457746.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:27:39,857][00209] Avg episode reward: [(0, '29.725')]
[2024-01-05 14:27:44,848][00209] Fps is (10 sec: 4097.1, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 19873792. Throughput: 0: 892.7. Samples: 2464254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-01-05 14:27:44,851][00209] Avg episode reward: [(0, '30.988')]
[2024-01-05 14:27:47,444][24421] Updated weights for policy 0, policy_version 4855 (0.0015)
[2024-01-05 14:27:49,850][00209] Fps is (10 sec: 3686.4, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 19890176. Throughput: 0: 866.4. Samples: 2469424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:27:49,852][00209] Avg episode reward: [(0, '30.648')]
[2024-01-05 14:27:54,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 19902464. Throughput: 0: 859.2. Samples: 2471456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:27:54,862][00209] Avg episode reward: [(0, '30.146')]
[2024-01-05 14:27:59,848][00209] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 19922944. Throughput: 0: 876.4. Samples: 2476342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:27:59,853][00209] Avg episode reward: [(0, '31.065')]
[2024-01-05 14:28:00,139][24421] Updated weights for policy 0, policy_version 4865 (0.0020)
[2024-01-05 14:28:04,848][00209] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 19943424. Throughput: 0: 896.1. Samples: 2482746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:28:04,850][00209] Avg episode reward: [(0, '31.621')]
[2024-01-05 14:28:09,849][00209] Fps is (10 sec: 3685.9, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 19959808. Throughput: 0: 887.2. Samples: 2485588.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-01-05 14:28:09,853][00209] Avg episode reward: [(0, '32.286')]
[2024-01-05 14:28:09,868][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004873_19959808.pth...
[2024-01-05 14:28:10,062][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004667_19116032.pth
[2024-01-05 14:28:11,864][24421] Updated weights for policy 0, policy_version 4875 (0.0025)
[2024-01-05 14:28:14,848][00209] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 19972096. Throughput: 0: 858.1. Samples: 2489658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-01-05 14:28:14,850][00209] Avg episode reward: [(0, '33.080')]
[2024-01-05 14:28:19,848][00209] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 19992576. Throughput: 0: 880.0. Samples: 2494680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-01-05 14:28:19,853][00209] Avg episode reward: [(0, '33.267')]
[2024-01-05 14:28:22,157][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
[2024-01-05 14:28:22,163][00209] Component Batcher_0 stopped!
[2024-01-05 14:28:22,158][24408] Stopping Batcher_0...
[2024-01-05 14:28:22,178][24408] Loop batcher_evt_loop terminating...
[2024-01-05 14:28:22,243][24421] Weights refcount: 2 0
[2024-01-05 14:28:22,247][00209] Component InferenceWorker_p0-w0 stopped!
[2024-01-05 14:28:22,250][00209] Component RolloutWorker_w4 stopped!
[2024-01-05 14:28:22,250][24427] Stopping RolloutWorker_w4...
[2024-01-05 14:28:22,253][24421] Stopping InferenceWorker_p0-w0...
[2024-01-05 14:28:22,254][24421] Loop inference_proc0-0_evt_loop terminating...
[2024-01-05 14:28:22,263][00209] Component RolloutWorker_w2 stopped!
[2024-01-05 14:28:22,253][24427] Loop rollout_proc4_evt_loop terminating...
[2024-01-05 14:28:22,265][24425] Stopping RolloutWorker_w2...
[2024-01-05 14:28:22,270][00209] Component RolloutWorker_w6 stopped!
[2024-01-05 14:28:22,272][24428] Stopping RolloutWorker_w6...
[2024-01-05 14:28:22,267][24425] Loop rollout_proc2_evt_loop terminating...
[2024-01-05 14:28:22,273][24428] Loop rollout_proc6_evt_loop terminating...
[2024-01-05 14:28:22,284][24429] Stopping RolloutWorker_w7...
[2024-01-05 14:28:22,285][24429] Loop rollout_proc7_evt_loop terminating...
[2024-01-05 14:28:22,284][00209] Component RolloutWorker_w7 stopped!
[2024-01-05 14:28:22,290][00209] Component RolloutWorker_w0 stopped!
[2024-01-05 14:28:22,292][24422] Stopping RolloutWorker_w0...
[2024-01-05 14:28:22,293][24422] Loop rollout_proc0_evt_loop terminating...
[2024-01-05 14:28:22,300][24426] Stopping RolloutWorker_w5...
[2024-01-05 14:28:22,301][24426] Loop rollout_proc5_evt_loop terminating...
[2024-01-05 14:28:22,300][00209] Component RolloutWorker_w5 stopped!
[2024-01-05 14:28:22,320][24424] Stopping RolloutWorker_w3...
[2024-01-05 14:28:22,320][00209] Component RolloutWorker_w3 stopped!
[2024-01-05 14:28:22,332][24424] Loop rollout_proc3_evt_loop terminating...
[2024-01-05 14:28:22,341][24423] Stopping RolloutWorker_w1...
[2024-01-05 14:28:22,341][00209] Component RolloutWorker_w1 stopped!
[2024-01-05 14:28:22,341][24423] Loop rollout_proc1_evt_loop terminating...
[2024-01-05 14:28:22,412][24408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004770_19537920.pth
[2024-01-05 14:28:22,429][24408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
[2024-01-05 14:28:22,585][00209] Component LearnerWorker_p0 stopped!
[2024-01-05 14:28:22,588][00209] Waiting for process learner_proc0 to stop...
[2024-01-05 14:28:22,590][24408] Stopping LearnerWorker_p0...
[2024-01-05 14:28:22,591][24408] Loop learner_proc0_evt_loop terminating...
[2024-01-05 14:28:24,170][00209] Waiting for process inference_proc0-0 to join...
[2024-01-05 14:28:24,173][00209] Waiting for process rollout_proc0 to join...
[2024-01-05 14:28:25,975][00209] Waiting for process rollout_proc1 to join...
[2024-01-05 14:28:26,078][00209] Waiting for process rollout_proc2 to join...
[2024-01-05 14:28:26,080][00209] Waiting for process rollout_proc3 to join...
[2024-01-05 14:28:26,082][00209] Waiting for process rollout_proc4 to join...
[2024-01-05 14:28:26,089][00209] Waiting for process rollout_proc5 to join...
[2024-01-05 14:28:26,091][00209] Waiting for process rollout_proc6 to join...
[2024-01-05 14:28:26,093][00209] Waiting for process rollout_proc7 to join...
[2024-01-05 14:28:26,095][00209] Batcher 0 profile tree view:
batching: 67.7613, releasing_batches: 0.0692
[2024-01-05 14:28:26,097][00209] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 1321.2918
update_model: 20.7745
  weight_update: 0.0023
one_step: 0.0052
  handle_policy_step: 1419.1148
    deserialize: 39.7168, stack: 7.7226, obs_to_device_normalize: 290.5124, forward: 745.2117, send_messages: 68.5268
    prepare_outputs: 192.2016
      to_cpu: 110.0362
[2024-01-05 14:28:26,098][00209] Learner 0 profile tree view:
misc: 0.0121, prepare_batch: 29.0729
train: 177.6899
  epoch_init: 0.0212, minibatch_init: 0.0160, losses_postprocess: 1.6526, kl_divergence: 1.6569, after_optimizer: 6.6476
  calculate_losses: 61.6585
    losses_init: 0.0271, forward_head: 2.7140, bptt_initial: 40.7501, tail: 2.5765, advantages_returns: 0.6003, losses: 9.2797
    bptt: 4.8529
      bptt_forward_core: 4.5305
  update: 104.4450
    clip: 2.1280
[2024-01-05 14:28:26,101][00209] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.7682, enqueue_policy_requests: 380.4173, env_step: 2169.1289, overhead: 56.3799, complete_rollouts: 18.7018
save_policy_outputs: 51.2718
  split_output_tensors: 24.4754
[2024-01-05 14:28:26,102][00209] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.8776, enqueue_policy_requests: 381.9052, env_step: 2170.8044, overhead: 54.8182,
complete_rollouts: 18.6596
save_policy_outputs: 49.6995
  split_output_tensors: 23.3962
[2024-01-05 14:28:26,104][00209] Loop Runner_EvtLoop terminating...
[2024-01-05 14:28:26,105][00209] Runner profile tree view:
main_loop: 2893.9114
[2024-01-05 14:28:26,107][00209] Collected {0: 20004864}, FPS: 3452.1
[2024-01-05 14:28:26,159][00209] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-01-05 14:28:26,160][00209] Overriding arg 'num_workers' with value 1 passed from command line
[2024-01-05 14:28:26,162][00209] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-01-05 14:28:26,163][00209] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-01-05 14:28:26,163][00209] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-01-05 14:28:26,164][00209] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-01-05 14:28:26,165][00209] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-01-05 14:28:26,166][00209] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-01-05 14:28:26,167][00209] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-01-05 14:28:26,170][00209] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-01-05 14:28:26,171][00209] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-01-05 14:28:26,172][00209] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-01-05 14:28:26,173][00209] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-01-05 14:28:26,174][00209] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-01-05 14:28:26,175][00209] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-01-05 14:28:26,220][00209] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 14:28:26,222][00209] RunningMeanStd input shape: (1,)
[2024-01-05 14:28:26,243][00209] ConvEncoder: input_channels=3
[2024-01-05 14:28:26,286][00209] Conv encoder output size: 512
[2024-01-05 14:28:26,288][00209] Policy head output size: 512
[2024-01-05 14:28:26,307][00209] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
[2024-01-05 14:28:26,883][00209] Num frames 100...
[2024-01-05 14:28:27,073][00209] Num frames 200...
[2024-01-05 14:28:27,262][00209] Num frames 300...
[2024-01-05 14:28:27,458][00209] Num frames 400...
[2024-01-05 14:28:27,638][00209] Num frames 500...
[2024-01-05 14:28:27,819][00209] Num frames 600...
[2024-01-05 14:28:28,005][00209] Num frames 700...
[2024-01-05 14:28:28,186][00209] Num frames 800...
[2024-01-05 14:28:28,372][00209] Num frames 900...
[2024-01-05 14:28:28,551][00209] Num frames 1000...
[2024-01-05 14:28:28,740][00209] Num frames 1100...
[2024-01-05 14:28:28,921][00209] Num frames 1200...
[2024-01-05 14:28:29,107][00209] Num frames 1300...
[2024-01-05 14:28:29,292][00209] Num frames 1400...
[2024-01-05 14:28:29,481][00209] Num frames 1500...
[2024-01-05 14:28:29,657][00209] Num frames 1600...
[2024-01-05 14:28:29,790][00209] Num frames 1700...
[2024-01-05 14:28:29,922][00209] Num frames 1800...
[2024-01-05 14:28:30,056][00209] Num frames 1900...
[2024-01-05 14:28:30,184][00209] Num frames 2000...
[2024-01-05 14:28:30,319][00209] Num frames 2100...
[2024-01-05 14:28:30,371][00209] Avg episode rewards: #0: 55.999, true rewards: #0: 21.000
[2024-01-05 14:28:30,373][00209] Avg episode reward: 55.999, avg true_objective: 21.000
[2024-01-05 14:28:30,513][00209] Num frames 2200...
[2024-01-05 14:28:30,653][00209] Num frames 2300...
[2024-01-05 14:28:30,791][00209] Num frames 2400...
[2024-01-05 14:28:30,929][00209] Num frames 2500...
[2024-01-05 14:28:31,006][00209] Avg episode rewards: #0: 32.579, true rewards: #0: 12.580
[2024-01-05 14:28:31,009][00209] Avg episode reward: 32.579, avg true_objective: 12.580
[2024-01-05 14:28:31,120][00209] Num frames 2600...
[2024-01-05 14:28:31,252][00209] Num frames 2700...
[2024-01-05 14:28:31,377][00209] Num frames 2800...
[2024-01-05 14:28:31,514][00209] Num frames 2900...
[2024-01-05 14:28:31,645][00209] Num frames 3000...
[2024-01-05 14:28:31,775][00209] Num frames 3100...
[2024-01-05 14:28:31,902][00209] Num frames 3200...
[2024-01-05 14:28:32,034][00209] Num frames 3300...
[2024-01-05 14:28:32,168][00209] Num frames 3400...
[2024-01-05 14:28:32,298][00209] Num frames 3500...
[2024-01-05 14:28:32,428][00209] Num frames 3600...
[2024-01-05 14:28:32,568][00209] Num frames 3700...
[2024-01-05 14:28:32,702][00209] Num frames 3800...
[2024-01-05 14:28:32,829][00209] Num frames 3900...
[2024-01-05 14:28:32,962][00209] Num frames 4000...
[2024-01-05 14:28:33,098][00209] Num frames 4100...
[2024-01-05 14:28:33,232][00209] Num frames 4200...
[2024-01-05 14:28:33,365][00209] Num frames 4300...
[2024-01-05 14:28:33,503][00209] Num frames 4400...
[2024-01-05 14:28:33,633][00209] Num frames 4500...
[2024-01-05 14:28:33,765][00209] Num frames 4600...
[2024-01-05 14:28:33,842][00209] Avg episode rewards: #0: 40.386, true rewards: #0: 15.387
[2024-01-05 14:28:33,843][00209] Avg episode reward: 40.386, avg true_objective: 15.387
[2024-01-05 14:28:33,955][00209] Num frames 4700...
[2024-01-05 14:28:34,089][00209] Num frames 4800...
[2024-01-05 14:28:34,220][00209] Num frames 4900...
[2024-01-05 14:28:34,348][00209] Num frames 5000...
[2024-01-05 14:28:34,479][00209] Num frames 5100...
[2024-01-05 14:28:34,615][00209] Num frames 5200...
[2024-01-05 14:28:34,707][00209] Avg episode rewards: #0: 33.799, true rewards: #0: 13.050
[2024-01-05 14:28:34,708][00209] Avg episode reward: 33.799, avg true_objective: 13.050
[2024-01-05 14:28:34,813][00209] Num frames 5300...
[2024-01-05 14:28:34,945][00209] Num frames 5400...
[2024-01-05 14:28:35,055][00209] Avg episode rewards: #0: 28.088, true rewards: #0: 10.888
[2024-01-05 14:28:35,057][00209] Avg episode reward: 28.088, avg true_objective: 10.888
[2024-01-05 14:28:35,131][00209] Num frames 5500...
[2024-01-05 14:28:35,262][00209] Num frames 5600...
[2024-01-05 14:28:35,389][00209] Num frames 5700...
[2024-01-05 14:28:35,520][00209] Num frames 5800...
[2024-01-05 14:28:35,659][00209] Num frames 5900...
[2024-01-05 14:28:35,788][00209] Num frames 6000...
[2024-01-05 14:28:35,917][00209] Num frames 6100...
[2024-01-05 14:28:36,050][00209] Num frames 6200...
[2024-01-05 14:28:36,181][00209] Num frames 6300...
[2024-01-05 14:28:36,309][00209] Num frames 6400...
[2024-01-05 14:28:36,441][00209] Num frames 6500...
[2024-01-05 14:28:36,574][00209] Num frames 6600...
[2024-01-05 14:28:36,705][00209] Num frames 6700...
[2024-01-05 14:28:36,841][00209] Num frames 6800...
[2024-01-05 14:28:36,992][00209] Avg episode rewards: #0: 30.288, true rewards: #0: 11.455
[2024-01-05 14:28:36,993][00209] Avg episode reward: 30.288, avg true_objective: 11.455
[2024-01-05 14:28:37,032][00209] Num frames 6900...
[2024-01-05 14:28:37,161][00209] Num frames 7000...
[2024-01-05 14:28:37,290][00209] Num frames 7100...
[2024-01-05 14:28:37,420][00209] Num frames 7200...
[2024-01-05 14:28:37,554][00209] Num frames 7300...
[2024-01-05 14:28:37,694][00209] Num frames 7400...
[2024-01-05 14:28:37,826][00209] Num frames 7500...
[2024-01-05 14:28:37,955][00209] Num frames 7600...
[2024-01-05 14:28:38,084][00209] Num frames 7700...
[2024-01-05 14:28:38,212][00209] Num frames 7800...
[2024-01-05 14:28:38,341][00209] Num frames 7900...
[2024-01-05 14:28:38,471][00209] Avg episode rewards: #0: 30.084, true rewards: #0: 11.370
[2024-01-05 14:28:38,472][00209] Avg episode reward: 30.084, avg true_objective: 11.370
[2024-01-05 14:28:38,528][00209] Num frames 8000...
[2024-01-05 14:28:38,661][00209] Num frames 8100...
[2024-01-05 14:28:38,789][00209] Num frames 8200...
[2024-01-05 14:28:38,913][00209] Num frames 8300...
[2024-01-05 14:28:39,044][00209] Num frames 8400...
[2024-01-05 14:28:39,171][00209] Num frames 8500...
[2024-01-05 14:28:39,294][00209] Num frames 8600...
[2024-01-05 14:28:39,421][00209] Num frames 8700...
[2024-01-05 14:28:39,546][00209] Num frames 8800...
[2024-01-05 14:28:39,732][00209] Num frames 8900...
[2024-01-05 14:28:39,915][00209] Num frames 9000...
[2024-01-05 14:28:40,102][00209] Num frames 9100...
[2024-01-05 14:28:40,177][00209] Avg episode rewards: #0: 29.759, true rewards: #0: 11.384
[2024-01-05 14:28:40,180][00209] Avg episode reward: 29.759, avg true_objective: 11.384
[2024-01-05 14:28:40,345][00209] Num frames 9200...
[2024-01-05 14:28:40,524][00209] Num frames 9300...
[2024-01-05 14:28:40,722][00209] Num frames 9400...
[2024-01-05 14:28:40,907][00209] Num frames 9500...
[2024-01-05 14:28:41,096][00209] Num frames 9600...
[2024-01-05 14:28:41,269][00209] Num frames 9700...
[2024-01-05 14:28:41,444][00209] Num frames 9800...
[2024-01-05 14:28:41,627][00209] Num frames 9900...
[2024-01-05 14:28:41,820][00209] Num frames 10000...
[2024-01-05 14:28:42,011][00209] Num frames 10100...
[2024-01-05 14:28:42,192][00209] Num frames 10200...
[2024-01-05 14:28:42,291][00209] Avg episode rewards: #0: 29.024, true rewards: #0: 11.358
[2024-01-05 14:28:42,293][00209] Avg episode reward: 29.024, avg true_objective: 11.358
[2024-01-05 14:28:42,435][00209] Num frames 10300...
[2024-01-05 14:28:42,605][00209] Num frames 10400...
[2024-01-05 14:28:42,736][00209] Num frames 10500...
[2024-01-05 14:28:42,876][00209] Num frames 10600...
[2024-01-05 14:28:43,013][00209] Num frames 10700...
[2024-01-05 14:28:43,144][00209] Num frames 10800...
[2024-01-05 14:28:43,280][00209] Num frames 10900...
[2024-01-05 14:28:43,414][00209] Num frames 11000...
[2024-01-05 14:28:43,541][00209] Num frames 11100...
[2024-01-05 14:28:43,676][00209] Num frames 11200...
[2024-01-05 14:28:43,813][00209] Num frames 11300...
[2024-01-05 14:28:43,958][00209] Num frames 11400...
[2024-01-05 14:28:44,088][00209] Num frames 11500...
[2024-01-05 14:28:44,217][00209] Num frames 11600...
[2024-01-05 14:28:44,344][00209] Num frames 11700...
[2024-01-05 14:28:44,478][00209] Num frames 11800...
[2024-01-05 14:28:44,610][00209] Num frames 11900...
[2024-01-05 14:28:44,734][00209] Avg episode rewards: #0: 30.850, true rewards: #0: 11.950
[2024-01-05 14:28:44,735][00209] Avg episode reward: 30.850, avg true_objective: 11.950
[2024-01-05 14:29:57,409][00209] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-01-05 14:29:57,449][00209] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-01-05 14:29:57,451][00209] Overriding arg 'num_workers' with value 1 passed from command line
[2024-01-05 14:29:57,453][00209] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-01-05 14:29:57,458][00209] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-01-05 14:29:57,459][00209] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-01-05 14:29:57,462][00209] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-01-05 14:29:57,463][00209] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-01-05 14:29:57,464][00209] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-01-05 14:29:57,465][00209] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-01-05 14:29:57,466][00209] Adding new argument 'hf_repository'='gchindemi/appo-vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-01-05 14:29:57,468][00209] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-01-05 14:29:57,469][00209] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-01-05 14:29:57,470][00209] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-01-05 14:29:57,471][00209] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-01-05 14:29:57,472][00209] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-01-05 14:29:57,506][00209] RunningMeanStd input shape: (3, 72, 128)
[2024-01-05 14:29:57,507][00209] RunningMeanStd input shape: (1,)
[2024-01-05 14:29:57,521][00209] ConvEncoder: input_channels=3
[2024-01-05 14:29:57,559][00209] Conv encoder output size: 512
[2024-01-05 14:29:57,561][00209] Policy head output size: 512
[2024-01-05 14:29:57,580][00209] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
[2024-01-05 14:29:58,019][00209] Num frames 100...
[2024-01-05 14:29:58,151][00209] Num frames 200...
[2024-01-05 14:29:58,280][00209] Num frames 300...
[2024-01-05 14:29:58,421][00209] Num frames 400...
[2024-01-05 14:29:58,575][00209] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
[2024-01-05 14:29:58,577][00209] Avg episode reward: 6.800, avg true_objective: 4.800
[2024-01-05 14:29:58,604][00209] Num frames 500...
[2024-01-05 14:29:58,733][00209] Num frames 600...
[2024-01-05 14:29:58,862][00209] Num frames 700...
[2024-01-05 14:29:58,991][00209] Num frames 800...
[2024-01-05 14:29:59,116][00209] Num frames 900...
[2024-01-05 14:29:59,243][00209] Num frames 1000...
[2024-01-05 14:29:59,368][00209] Num frames 1100...
[2024-01-05 14:29:59,503][00209] Num frames 1200...
[2024-01-05 14:29:59,629][00209] Num frames 1300...
[2024-01-05 14:29:59,787][00209] Avg episode rewards: #0: 14.390, true rewards: #0: 6.890
[2024-01-05 14:29:59,788][00209] Avg episode reward: 14.390, avg true_objective: 6.890
[2024-01-05 14:29:59,820][00209] Num frames 1400...
[2024-01-05 14:29:59,947][00209] Num frames 1500...
[2024-01-05 14:30:00,077][00209] Num frames 1600...
[2024-01-05 14:30:00,214][00209] Num frames 1700...
[2024-01-05 14:30:00,342][00209] Num frames 1800...
[2024-01-05 14:30:00,445][00209] Avg episode rewards: #0: 13.113, true rewards: #0: 6.113
[2024-01-05 14:30:00,448][00209] Avg episode reward: 13.113, avg true_objective: 6.113
[2024-01-05 14:30:00,532][00209] Num frames 1900...
[2024-01-05 14:30:00,663][00209] Num frames 2000...
[2024-01-05 14:30:00,816][00209] Num frames 2100...
[2024-01-05 14:30:01,022][00209] Num frames 2200...
[2024-01-05 14:30:01,203][00209] Num frames 2300...
[2024-01-05 14:30:01,390][00209] Num frames 2400...
[2024-01-05 14:30:01,594][00209] Num frames 2500...
[2024-01-05 14:30:01,664][00209] Avg episode rewards: #0: 13.015, true rewards: #0: 6.265
[2024-01-05 14:30:01,666][00209] Avg episode reward: 13.015, avg true_objective: 6.265
[2024-01-05 14:30:01,838][00209] Num frames 2600...
[2024-01-05 14:30:02,023][00209] Num frames 2700...
[2024-01-05 14:30:02,208][00209] Num frames 2800...
[2024-01-05 14:30:02,394][00209] Num frames 2900...
[2024-01-05 14:30:02,586][00209] Num frames 3000...
[2024-01-05 14:30:02,772][00209] Num frames 3100...
[2024-01-05 14:30:02,957][00209] Avg episode rewards: #0: 13.338, true rewards: #0: 6.338
[2024-01-05 14:30:02,959][00209] Avg episode reward: 13.338, avg true_objective: 6.338
[2024-01-05 14:30:03,023][00209] Num frames 3200...
[2024-01-05 14:30:03,205][00209] Num frames 3300...
[2024-01-05 14:30:03,383][00209] Num frames 3400...
[2024-01-05 14:30:03,571][00209] Num frames 3500...
[2024-01-05 14:30:03,766][00209] Num frames 3600...
[2024-01-05 14:30:03,947][00209] Num frames 3700...
[2024-01-05 14:30:04,081][00209] Num frames 3800...
[2024-01-05 14:30:04,213][00209] Num frames 3900...
[2024-01-05 14:30:04,336][00209] Num frames 4000...
[2024-01-05 14:30:04,469][00209] Num frames 4100...
[2024-01-05 14:30:04,600][00209] Num frames 4200...
[2024-01-05 14:30:04,739][00209] Num frames 4300...
[2024-01-05 14:30:04,868][00209] Num frames 4400...
[2024-01-05 14:30:04,994][00209] Num frames 4500...
[2024-01-05 14:30:05,122][00209] Num frames 4600...
[2024-01-05 14:30:05,245][00209] Num frames 4700...
[2024-01-05 14:30:05,371][00209] Num frames 4800...
[2024-01-05 14:30:05,496][00209] Num frames 4900...
[2024-01-05 14:30:05,624][00209] Num frames 5000...
[2024-01-05 14:30:05,806][00209] Avg episode rewards: #0: 20.648, true rewards: #0: 8.482
[2024-01-05 14:30:05,808][00209] Avg episode reward: 20.648, avg true_objective: 8.482
[2024-01-05 14:30:05,825][00209] Num frames 5100...
[2024-01-05 14:30:05,956][00209] Num frames 5200...
[2024-01-05 14:30:06,084][00209] Num frames 5300...
[2024-01-05 14:30:06,209][00209] Num frames 5400...
[2024-01-05 14:30:06,334][00209] Num frames 5500...
[2024-01-05 14:30:06,468][00209] Num frames 5600...
[2024-01-05 14:30:06,595][00209] Num frames 5700...
[2024-01-05 14:30:06,772][00209] Avg episode rewards: #0: 19.554, true rewards: #0: 8.269
[2024-01-05 14:30:06,774][00209] Avg episode reward: 19.554, avg true_objective: 8.269
[2024-01-05 14:30:06,795][00209] Num frames 5800...
[2024-01-05 14:30:06,931][00209] Num frames 5900...
[2024-01-05 14:30:07,058][00209] Num frames 6000...
[2024-01-05 14:30:07,186][00209] Num frames 6100...
[2024-01-05 14:30:07,314][00209] Num frames 6200...
[2024-01-05 14:30:07,443][00209] Num frames 6300...
[2024-01-05 14:30:07,572][00209] Num frames 6400...
[2024-01-05 14:30:07,699][00209] Num frames 6500...
[2024-01-05 14:30:07,835][00209] Num frames 6600...
[2024-01-05 14:30:07,962][00209] Num frames 6700...
[2024-01-05 14:30:08,094][00209] Num frames 6800...
[2024-01-05 14:30:08,221][00209] Num frames 6900...
[2024-01-05 14:30:08,347][00209] Num frames 7000...
[2024-01-05 14:30:08,486][00209] Num frames 7100...
[2024-01-05 14:30:08,620][00209] Num frames 7200...
[2024-01-05 14:30:08,755][00209] Num frames 7300...
[2024-01-05 14:30:08,890][00209] Num frames 7400...
[2024-01-05 14:30:09,021][00209] Num frames 7500...
[2024-01-05 14:30:09,150][00209] Num frames 7600...
[2024-01-05 14:30:09,275][00209] Num frames 7700...
[2024-01-05 14:30:09,403][00209] Num frames 7800...
[2024-01-05 14:30:09,571][00209] Avg episode rewards: #0: 24.235, true rewards: #0: 9.860
[2024-01-05 14:30:09,572][00209] Avg episode reward: 24.235, avg true_objective: 9.860
[2024-01-05 14:30:09,591][00209] Num frames 7900...
[2024-01-05 14:30:09,722][00209] Num frames 8000...
[2024-01-05 14:30:09,863][00209] Num frames 8100...
[2024-01-05 14:30:09,993][00209] Num frames 8200...
[2024-01-05 14:30:10,124][00209] Num frames 8300...
[2024-01-05 14:30:10,255][00209] Num frames 8400...
[2024-01-05 14:30:10,384][00209] Num frames 8500...
[2024-01-05 14:30:10,518][00209] Num frames 8600...
[2024-01-05 14:30:10,646][00209] Num frames 8700...
[2024-01-05 14:30:10,779][00209] Num frames 8800...
[2024-01-05 14:30:10,915][00209] Num frames 8900...
[2024-01-05 14:30:11,049][00209] Num frames 9000...
[2024-01-05 14:30:11,181][00209] Num frames 9100...
[2024-01-05 14:30:11,310][00209] Num frames 9200...
[2024-01-05 14:30:11,442][00209] Num frames 9300...
[2024-01-05 14:30:11,571][00209] Num frames 9400...
[2024-01-05 14:30:11,703][00209] Num frames 9500...
[2024-01-05 14:30:11,837][00209] Num frames 9600...
[2024-01-05 14:30:11,977][00209] Num frames 9700...
[2024-01-05 14:30:12,107][00209] Num frames 9800...
[2024-01-05 14:30:12,240][00209] Num frames 9900...
[2024-01-05 14:30:12,412][00209] Avg episode rewards: #0: 28.097, true rewards: #0: 11.098
[2024-01-05 14:30:12,414][00209] Avg episode reward: 28.097, avg true_objective: 11.098
[2024-01-05 14:30:12,435][00209] Num frames 10000...
[2024-01-05 14:30:12,563][00209] Num frames 10100...
[2024-01-05 14:30:12,694][00209] Num frames 10200...
[2024-01-05 14:30:12,823][00209] Num frames 10300...
[2024-01-05 14:30:12,970][00209] Num frames 10400...
[2024-01-05 14:30:13,105][00209] Num frames 10500...
[2024-01-05 14:30:13,240][00209] Num frames 10600...
[2024-01-05 14:30:13,371][00209] Num frames 10700...
[2024-01-05 14:30:13,499][00209] Num frames 10800...
[2024-01-05 14:30:13,625][00209] Num frames 10900...
[2024-01-05 14:30:13,756][00209] Num frames 11000...
[2024-01-05 14:30:13,886][00209] Num frames 11100...
[2024-01-05 14:30:14,078][00209] Num frames 11200...
[2024-01-05 14:30:14,263][00209] Num frames 11300...
[2024-01-05 14:30:14,450][00209] Num frames 11400...
[2024-01-05 14:30:14,627][00209] Num frames 11500...
[2024-01-05 14:30:14,814][00209] Num frames 11600...
[2024-01-05 14:30:15,039][00209] Avg episode rewards: #0: 29.784, true rewards: #0: 11.684
[2024-01-05 14:30:15,041][00209] Avg episode reward: 29.784, avg true_objective: 11.684
[2024-01-05 14:31:26,654][00209] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
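Both evaluation runs restore the same checkpoint, checkpoint_000004884_20004864.pth. The filename encodes the learner iteration (4884) and the cumulative environment step count (20004864), the latter matching the "Collected {0: 20004864}" total reported when training stopped. A small parser for that naming scheme (an illustrative sketch, assuming the two-number pattern seen in this log; `parse_checkpoint_name` is not part of Sample Factory's API):

```python
import re

def parse_checkpoint_name(filename: str) -> tuple[int, int]:
    """Extract (training iteration, cumulative env steps) from a
    checkpoint filename like 'checkpoint_000004884_20004864.pth'."""
    m = re.fullmatch(r"checkpoint_(\d+)_(\d+)\.pth", filename)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {filename}")
    return int(m.group(1)), int(m.group(2))

iteration, env_steps = parse_checkpoint_name("checkpoint_000004884_20004864.pth")
```

This is handy when sorting or pruning checkpoints in checkpoint_p0/, since lexicographic order of the zero-padded iteration field matches training order.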