[2023-02-26 18:45:22,683][08036] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-02-26 18:45:22,686][08036] Rollout worker 0 uses device cpu [2023-02-26 18:45:22,689][08036] Rollout worker 1 uses device cpu [2023-02-26 18:45:22,693][08036] Rollout worker 2 uses device cpu [2023-02-26 18:45:22,695][08036] Rollout worker 3 uses device cpu [2023-02-26 18:45:22,696][08036] Rollout worker 4 uses device cpu [2023-02-26 18:45:22,698][08036] Rollout worker 5 uses device cpu [2023-02-26 18:45:22,699][08036] Rollout worker 6 uses device cpu [2023-02-26 18:45:22,700][08036] Rollout worker 7 uses device cpu [2023-02-26 18:45:22,895][08036] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-26 18:45:22,897][08036] InferenceWorker_p0-w0: min num requests: 2 [2023-02-26 18:45:22,929][08036] Starting all processes... [2023-02-26 18:45:22,931][08036] Starting process learner_proc0 [2023-02-26 18:45:22,992][08036] Starting all processes... [2023-02-26 18:45:23,004][08036] Starting process inference_proc0-0 [2023-02-26 18:45:23,005][08036] Starting process rollout_proc0 [2023-02-26 18:45:23,007][08036] Starting process rollout_proc1 [2023-02-26 18:45:23,008][08036] Starting process rollout_proc2 [2023-02-26 18:45:23,008][08036] Starting process rollout_proc3 [2023-02-26 18:45:23,008][08036] Starting process rollout_proc4 [2023-02-26 18:45:23,008][08036] Starting process rollout_proc5 [2023-02-26 18:45:23,008][08036] Starting process rollout_proc6 [2023-02-26 18:45:23,008][08036] Starting process rollout_proc7 [2023-02-26 18:45:33,281][13517] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-26 18:45:33,288][13517] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-26 18:45:34,031][13531] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-26 18:45:34,042][13531] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 
[2023-02-26 18:45:34,901][13536] Worker 3 uses CPU cores [1] [2023-02-26 18:45:35,028][13531] Num visible devices: 1 [2023-02-26 18:45:35,036][13517] Num visible devices: 1 [2023-02-26 18:45:35,070][13517] Starting seed is not provided [2023-02-26 18:45:35,070][13517] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-26 18:45:35,071][13517] Initializing actor-critic model on device cuda:0 [2023-02-26 18:45:35,077][13517] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 18:45:35,082][13517] RunningMeanStd input shape: (1,) [2023-02-26 18:45:35,277][13517] ConvEncoder: input_channels=3 [2023-02-26 18:45:35,345][13538] Worker 6 uses CPU cores [0] [2023-02-26 18:45:35,359][13537] Worker 5 uses CPU cores [1] [2023-02-26 18:45:35,387][13535] Worker 4 uses CPU cores [0] [2023-02-26 18:45:35,461][13532] Worker 0 uses CPU cores [0] [2023-02-26 18:45:35,486][13539] Worker 7 uses CPU cores [1] [2023-02-26 18:45:35,496][13533] Worker 2 uses CPU cores [0] [2023-02-26 18:45:35,680][13534] Worker 1 uses CPU cores [1] [2023-02-26 18:45:35,908][13517] Conv encoder output size: 512 [2023-02-26 18:45:35,909][13517] Policy head output size: 512 [2023-02-26 18:45:35,994][13517] Created Actor Critic model with architecture:
[2023-02-26 18:45:35,994][13517] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-26 18:45:42,888][08036] Heartbeat connected on Batcher_0 [2023-02-26 18:45:42,896][08036] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-26 18:45:42,906][08036] Heartbeat connected on RolloutWorker_w0 [2023-02-26 18:45:42,910][08036] Heartbeat connected on RolloutWorker_w1 [2023-02-26 18:45:42,913][08036] Heartbeat connected on RolloutWorker_w2 [2023-02-26 18:45:42,918][08036] Heartbeat connected on RolloutWorker_w3 [2023-02-26 18:45:42,919][08036] Heartbeat connected on RolloutWorker_w4 [2023-02-26 18:45:42,923][08036] Heartbeat connected on RolloutWorker_w5 [2023-02-26 18:45:42,927][08036] Heartbeat connected on RolloutWorker_w6 [2023-02-26 18:45:42,931][08036] Heartbeat connected on RolloutWorker_w7 [2023-02-26 18:45:44,357][13517] Using optimizer [2023-02-26 18:45:44,358][13517] No checkpoints found [2023-02-26 18:45:44,359][13517] Did not load from checkpoint, starting from scratch! [2023-02-26 18:45:44,359][13517] Initialized policy 0 weights for model version 0 [2023-02-26 18:45:44,363][13517] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-26 18:45:44,371][13517] LearnerWorker_p0 finished initialization!
[2023-02-26 18:45:44,372][08036] Heartbeat connected on LearnerWorker_p0 [2023-02-26 18:45:44,470][13531] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 18:45:44,472][13531] RunningMeanStd input shape: (1,) [2023-02-26 18:45:44,490][13531] ConvEncoder: input_channels=3 [2023-02-26 18:45:44,594][13531] Conv encoder output size: 512 [2023-02-26 18:45:44,594][13531] Policy head output size: 512 [2023-02-26 18:45:46,865][08036] Inference worker 0-0 is ready! [2023-02-26 18:45:46,868][08036] All inference workers are ready! Signal rollout workers to start! [2023-02-26 18:45:47,028][13539] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,056][13536] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,072][13537] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,090][13534] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,130][13533] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,124][13538] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,127][13535] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,141][13532] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 18:45:47,690][08036] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 18:45:48,808][13532] Decorrelating experience for 0 frames... [2023-02-26 18:45:48,813][13533] Decorrelating experience for 0 frames... [2023-02-26 18:45:48,930][13534] Decorrelating experience for 0 frames... [2023-02-26 18:45:48,934][13537] Decorrelating experience for 0 frames... [2023-02-26 18:45:48,935][13536] Decorrelating experience for 0 frames... [2023-02-26 18:45:48,938][13539] Decorrelating experience for 0 frames... [2023-02-26 18:45:50,068][13539] Decorrelating experience for 32 frames... 
[2023-02-26 18:45:50,070][13536] Decorrelating experience for 32 frames... [2023-02-26 18:45:50,091][13534] Decorrelating experience for 32 frames... [2023-02-26 18:45:50,671][13533] Decorrelating experience for 32 frames... [2023-02-26 18:45:50,773][13538] Decorrelating experience for 0 frames... [2023-02-26 18:45:50,772][13535] Decorrelating experience for 0 frames... [2023-02-26 18:45:51,515][13539] Decorrelating experience for 64 frames... [2023-02-26 18:45:51,750][13535] Decorrelating experience for 32 frames... [2023-02-26 18:45:51,922][13533] Decorrelating experience for 64 frames... [2023-02-26 18:45:51,986][13536] Decorrelating experience for 64 frames... [2023-02-26 18:45:52,576][13538] Decorrelating experience for 32 frames... [2023-02-26 18:45:52,686][08036] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 18:45:52,866][13532] Decorrelating experience for 32 frames... [2023-02-26 18:45:53,282][13539] Decorrelating experience for 96 frames... [2023-02-26 18:45:53,567][13534] Decorrelating experience for 64 frames... [2023-02-26 18:45:53,862][13536] Decorrelating experience for 96 frames... [2023-02-26 18:45:53,996][13535] Decorrelating experience for 64 frames... [2023-02-26 18:45:54,236][13538] Decorrelating experience for 64 frames... [2023-02-26 18:45:54,554][13532] Decorrelating experience for 64 frames... [2023-02-26 18:45:55,175][13537] Decorrelating experience for 32 frames... [2023-02-26 18:45:55,292][13534] Decorrelating experience for 96 frames... [2023-02-26 18:45:55,595][13533] Decorrelating experience for 96 frames... [2023-02-26 18:45:55,686][13535] Decorrelating experience for 96 frames... [2023-02-26 18:45:56,019][13537] Decorrelating experience for 64 frames... [2023-02-26 18:45:56,198][13538] Decorrelating experience for 96 frames... [2023-02-26 18:45:56,283][13532] Decorrelating experience for 96 frames... 
[2023-02-26 18:45:56,459][13537] Decorrelating experience for 96 frames... [2023-02-26 18:45:57,686][08036] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.6. Samples: 36. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 18:46:00,268][13517] Signal inference workers to stop experience collection... [2023-02-26 18:46:00,275][13531] InferenceWorker_p0-w0: stopping experience collection [2023-02-26 18:46:02,686][08036] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 147.6. Samples: 2214. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 18:46:02,688][08036] Avg episode reward: [(0, '1.837')] [2023-02-26 18:46:02,782][13517] Signal inference workers to resume experience collection... [2023-02-26 18:46:02,783][13531] InferenceWorker_p0-w0: resuming experience collection [2023-02-26 18:46:07,691][08036] Fps is (10 sec: 1637.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 164.2. Samples: 3284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-26 18:46:07,694][08036] Avg episode reward: [(0, '3.349')] [2023-02-26 18:46:12,687][08036] Fps is (10 sec: 3276.3, 60 sec: 1310.9, 300 sec: 1310.9). Total num frames: 32768. Throughput: 0: 296.4. Samples: 7408. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-26 18:46:12,690][08036] Avg episode reward: [(0, '3.845')] [2023-02-26 18:46:15,973][13531] Updated weights for policy 0, policy_version 10 (0.0369) [2023-02-26 18:46:17,692][08036] Fps is (10 sec: 2457.4, 60 sec: 1365.2, 300 sec: 1365.2). Total num frames: 40960. Throughput: 0: 373.6. Samples: 11208. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-26 18:46:17,695][08036] Avg episode reward: [(0, '4.189')] [2023-02-26 18:46:22,688][08036] Fps is (10 sec: 2457.5, 60 sec: 1638.5, 300 sec: 1638.5). Total num frames: 57344. Throughput: 0: 380.0. Samples: 13300. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:46:22,695][08036] Avg episode reward: [(0, '4.365')] [2023-02-26 18:46:27,687][08036] Fps is (10 sec: 3278.6, 60 sec: 1843.4, 300 sec: 1843.4). Total num frames: 73728. Throughput: 0: 449.4. Samples: 17976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:46:27,696][08036] Avg episode reward: [(0, '4.325')] [2023-02-26 18:46:30,538][13531] Updated weights for policy 0, policy_version 20 (0.0019) [2023-02-26 18:46:32,686][08036] Fps is (10 sec: 2867.7, 60 sec: 1911.6, 300 sec: 1911.6). Total num frames: 86016. Throughput: 0: 489.2. Samples: 22010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:46:32,694][08036] Avg episode reward: [(0, '4.433')] [2023-02-26 18:46:37,686][08036] Fps is (10 sec: 2867.4, 60 sec: 2048.2, 300 sec: 2048.2). Total num frames: 102400. Throughput: 0: 536.5. Samples: 24142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:46:37,689][08036] Avg episode reward: [(0, '4.377')] [2023-02-26 18:46:37,696][13517] Saving new best policy, reward=4.377! [2023-02-26 18:46:42,051][13531] Updated weights for policy 0, policy_version 30 (0.0013) [2023-02-26 18:46:42,686][08036] Fps is (10 sec: 3686.4, 60 sec: 2234.3, 300 sec: 2234.3). Total num frames: 122880. Throughput: 0: 668.6. Samples: 30122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:46:42,689][08036] Avg episode reward: [(0, '4.444')] [2023-02-26 18:46:42,696][13517] Saving new best policy, reward=4.444! [2023-02-26 18:46:47,687][08036] Fps is (10 sec: 3686.0, 60 sec: 2321.2, 300 sec: 2321.2). Total num frames: 139264. Throughput: 0: 736.4. Samples: 35354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:46:47,692][08036] Avg episode reward: [(0, '4.458')] [2023-02-26 18:46:47,712][13517] Saving new best policy, reward=4.458! [2023-02-26 18:46:52,686][08036] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2331.7). Total num frames: 151552. Throughput: 0: 756.7. 
Samples: 37330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:46:52,694][08036] Avg episode reward: [(0, '4.450')] [2023-02-26 18:46:56,018][13531] Updated weights for policy 0, policy_version 40 (0.0029) [2023-02-26 18:46:57,686][08036] Fps is (10 sec: 2867.5, 60 sec: 2798.9, 300 sec: 2399.2). Total num frames: 167936. Throughput: 0: 755.4. Samples: 41400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:46:57,691][08036] Avg episode reward: [(0, '4.381')] [2023-02-26 18:47:02,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2512.4). Total num frames: 188416. Throughput: 0: 808.2. Samples: 47572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:47:02,694][08036] Avg episode reward: [(0, '4.256')] [2023-02-26 18:47:05,791][13531] Updated weights for policy 0, policy_version 50 (0.0013) [2023-02-26 18:47:07,689][08036] Fps is (10 sec: 4094.7, 60 sec: 3208.6, 300 sec: 2611.2). Total num frames: 208896. Throughput: 0: 832.6. Samples: 50768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:47:07,692][08036] Avg episode reward: [(0, '4.490')] [2023-02-26 18:47:07,712][13517] Saving new best policy, reward=4.490! [2023-02-26 18:47:12,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2602.3). Total num frames: 221184. Throughput: 0: 832.5. Samples: 55436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:47:12,693][08036] Avg episode reward: [(0, '4.608')] [2023-02-26 18:47:12,700][13517] Saving new best policy, reward=4.608! [2023-02-26 18:47:17,686][08036] Fps is (10 sec: 2868.1, 60 sec: 3277.1, 300 sec: 2639.8). Total num frames: 237568. Throughput: 0: 835.5. Samples: 59608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:47:17,688][08036] Avg episode reward: [(0, '4.606')] [2023-02-26 18:47:17,701][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth... 
[2023-02-26 18:47:19,018][13531] Updated weights for policy 0, policy_version 60 (0.0020) [2023-02-26 18:47:22,686][08036] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 2716.4). Total num frames: 258048. Throughput: 0: 862.6. Samples: 62960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:47:22,688][08036] Avg episode reward: [(0, '4.553')] [2023-02-26 18:47:27,688][08036] Fps is (10 sec: 4095.3, 60 sec: 3413.3, 300 sec: 2785.3). Total num frames: 278528. Throughput: 0: 873.0. Samples: 69408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:47:27,690][08036] Avg episode reward: [(0, '4.497')] [2023-02-26 18:47:29,540][13531] Updated weights for policy 0, policy_version 70 (0.0014) [2023-02-26 18:47:32,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2808.8). Total num frames: 294912. Throughput: 0: 853.8. Samples: 73772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:47:32,696][08036] Avg episode reward: [(0, '4.364')] [2023-02-26 18:47:37,686][08036] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 2792.8). Total num frames: 307200. Throughput: 0: 855.8. Samples: 75840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:47:37,689][08036] Avg episode reward: [(0, '4.315')] [2023-02-26 18:47:41,991][13531] Updated weights for policy 0, policy_version 80 (0.0025) [2023-02-26 18:47:42,686][08036] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 2849.5). Total num frames: 327680. Throughput: 0: 886.4. Samples: 81288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:47:42,693][08036] Avg episode reward: [(0, '4.330')] [2023-02-26 18:47:47,689][08036] Fps is (10 sec: 4094.8, 60 sec: 3481.5, 300 sec: 2901.4). Total num frames: 348160. Throughput: 0: 892.7. Samples: 87746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:47:47,695][08036] Avg episode reward: [(0, '4.294')] [2023-02-26 18:47:52,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 2916.4). 
Total num frames: 364544. Throughput: 0: 870.7. Samples: 89946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:47:52,692][08036] Avg episode reward: [(0, '4.324')] [2023-02-26 18:47:53,996][13531] Updated weights for policy 0, policy_version 90 (0.0033) [2023-02-26 18:47:57,688][08036] Fps is (10 sec: 2867.7, 60 sec: 3481.5, 300 sec: 2898.8). Total num frames: 376832. Throughput: 0: 857.3. Samples: 94018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:47:57,690][08036] Avg episode reward: [(0, '4.572')] [2023-02-26 18:48:02,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2943.1). Total num frames: 397312. Throughput: 0: 892.2. Samples: 99756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:48:02,692][08036] Avg episode reward: [(0, '4.430')] [2023-02-26 18:48:04,809][13531] Updated weights for policy 0, policy_version 100 (0.0029) [2023-02-26 18:48:07,686][08036] Fps is (10 sec: 4506.3, 60 sec: 3550.1, 300 sec: 3013.6). Total num frames: 421888. Throughput: 0: 888.5. Samples: 102944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:48:07,691][08036] Avg episode reward: [(0, '4.499')] [2023-02-26 18:48:12,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 2994.4). Total num frames: 434176. Throughput: 0: 860.5. Samples: 108128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:48:12,689][08036] Avg episode reward: [(0, '4.589')] [2023-02-26 18:48:17,686][08036] Fps is (10 sec: 2457.5, 60 sec: 3481.6, 300 sec: 2976.5). Total num frames: 446464. Throughput: 0: 854.0. Samples: 112204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:48:17,689][08036] Avg episode reward: [(0, '4.513')] [2023-02-26 18:48:18,204][13531] Updated weights for policy 0, policy_version 110 (0.0012) [2023-02-26 18:48:22,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3012.6). Total num frames: 466944. Throughput: 0: 868.4. Samples: 114918. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:48:22,688][08036] Avg episode reward: [(0, '4.436')] [2023-02-26 18:48:27,686][08036] Fps is (10 sec: 4096.1, 60 sec: 3481.7, 300 sec: 3046.5). Total num frames: 487424. Throughput: 0: 892.0. Samples: 121428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:48:27,695][08036] Avg episode reward: [(0, '4.369')] [2023-02-26 18:48:27,713][13531] Updated weights for policy 0, policy_version 120 (0.0013) [2023-02-26 18:48:32,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3053.5). Total num frames: 503808. Throughput: 0: 858.9. Samples: 126392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:48:32,689][08036] Avg episode reward: [(0, '4.237')] [2023-02-26 18:48:37,686][08036] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3035.9). Total num frames: 516096. Throughput: 0: 854.7. Samples: 128410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:48:37,689][08036] Avg episode reward: [(0, '4.571')] [2023-02-26 18:48:42,009][13531] Updated weights for policy 0, policy_version 130 (0.0032) [2023-02-26 18:48:42,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3042.8). Total num frames: 532480. Throughput: 0: 855.3. Samples: 132506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:48:42,690][08036] Avg episode reward: [(0, '4.692')] [2023-02-26 18:48:42,694][13517] Saving new best policy, reward=4.692! [2023-02-26 18:48:47,686][08036] Fps is (10 sec: 3686.5, 60 sec: 3413.5, 300 sec: 3072.1). Total num frames: 552960. Throughput: 0: 862.5. Samples: 138570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:48:47,689][08036] Avg episode reward: [(0, '4.464')] [2023-02-26 18:48:52,689][08036] Fps is (10 sec: 3685.3, 60 sec: 3413.2, 300 sec: 3077.6). Total num frames: 569344. Throughput: 0: 859.0. Samples: 141604. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 18:48:52,695][08036] Avg episode reward: [(0, '4.469')] [2023-02-26 18:48:53,616][13531] Updated weights for policy 0, policy_version 140 (0.0021) [2023-02-26 18:48:57,687][08036] Fps is (10 sec: 2867.0, 60 sec: 3413.4, 300 sec: 3061.3). Total num frames: 581632. Throughput: 0: 826.5. Samples: 145322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:48:57,693][08036] Avg episode reward: [(0, '4.433')] [2023-02-26 18:49:02,686][08036] Fps is (10 sec: 2868.1, 60 sec: 3345.1, 300 sec: 3066.8). Total num frames: 598016. Throughput: 0: 826.9. Samples: 149416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:49:02,689][08036] Avg episode reward: [(0, '4.322')] [2023-02-26 18:49:06,555][13531] Updated weights for policy 0, policy_version 150 (0.0035) [2023-02-26 18:49:07,686][08036] Fps is (10 sec: 3686.6, 60 sec: 3276.8, 300 sec: 3092.5). Total num frames: 618496. Throughput: 0: 833.9. Samples: 152444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:49:07,693][08036] Avg episode reward: [(0, '4.262')] [2023-02-26 18:49:12,689][08036] Fps is (10 sec: 3685.2, 60 sec: 3344.9, 300 sec: 3097.0). Total num frames: 634880. Throughput: 0: 821.5. Samples: 158400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:49:12,696][08036] Avg episode reward: [(0, '4.416')] [2023-02-26 18:49:17,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3081.8). Total num frames: 647168. Throughput: 0: 798.0. Samples: 162302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:49:17,695][08036] Avg episode reward: [(0, '4.610')] [2023-02-26 18:49:17,711][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000158_647168.pth... [2023-02-26 18:49:20,277][13531] Updated weights for policy 0, policy_version 160 (0.0026) [2023-02-26 18:49:22,686][08036] Fps is (10 sec: 2458.4, 60 sec: 3208.5, 300 sec: 3067.3). Total num frames: 659456. 
Throughput: 0: 793.0. Samples: 164094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:49:22,691][08036] Avg episode reward: [(0, '4.513')] [2023-02-26 18:49:27,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3109.3). Total num frames: 684032. Throughput: 0: 831.8. Samples: 169936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:49:27,694][08036] Avg episode reward: [(0, '4.534')] [2023-02-26 18:49:30,260][13531] Updated weights for policy 0, policy_version 170 (0.0012) [2023-02-26 18:49:32,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 3131.2). Total num frames: 704512. Throughput: 0: 840.8. Samples: 176404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:49:32,693][08036] Avg episode reward: [(0, '4.635')] [2023-02-26 18:49:37,688][08036] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3116.6). Total num frames: 716800. Throughput: 0: 819.2. Samples: 178466. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:49:37,694][08036] Avg episode reward: [(0, '4.678')] [2023-02-26 18:49:42,686][08036] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3102.6). Total num frames: 729088. Throughput: 0: 826.0. Samples: 182490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:49:42,693][08036] Avg episode reward: [(0, '4.757')] [2023-02-26 18:49:42,697][13517] Saving new best policy, reward=4.757! [2023-02-26 18:49:43,959][13531] Updated weights for policy 0, policy_version 180 (0.0025) [2023-02-26 18:49:47,686][08036] Fps is (10 sec: 3277.3, 60 sec: 3276.8, 300 sec: 3123.3). Total num frames: 749568. Throughput: 0: 855.7. Samples: 187922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:49:47,689][08036] Avg episode reward: [(0, '4.726')] [2023-02-26 18:49:52,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3413.5, 300 sec: 3159.8). Total num frames: 774144. Throughput: 0: 859.2. Samples: 191110. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:49:52,692][08036] Avg episode reward: [(0, '4.660')] [2023-02-26 18:49:54,135][13531] Updated weights for policy 0, policy_version 190 (0.0016) [2023-02-26 18:49:57,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3145.8). Total num frames: 786432. Throughput: 0: 841.1. Samples: 196248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:49:57,691][08036] Avg episode reward: [(0, '4.756')] [2023-02-26 18:50:02,686][08036] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3132.3). Total num frames: 798720. Throughput: 0: 842.3. Samples: 200204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:50:02,691][08036] Avg episode reward: [(0, '4.674')] [2023-02-26 18:50:07,178][13531] Updated weights for policy 0, policy_version 200 (0.0024) [2023-02-26 18:50:07,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3150.8). Total num frames: 819200. Throughput: 0: 862.0. Samples: 202886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:50:07,689][08036] Avg episode reward: [(0, '4.465')] [2023-02-26 18:50:12,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3168.7). Total num frames: 839680. Throughput: 0: 874.4. Samples: 209284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:50:12,693][08036] Avg episode reward: [(0, '4.486')] [2023-02-26 18:50:17,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3170.7). Total num frames: 856064. Throughput: 0: 840.7. Samples: 214236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:50:17,689][08036] Avg episode reward: [(0, '4.451')] [2023-02-26 18:50:18,761][13531] Updated weights for policy 0, policy_version 210 (0.0026) [2023-02-26 18:50:22,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3157.7). Total num frames: 868352. Throughput: 0: 841.0. Samples: 216308. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:50:22,694][08036] Avg episode reward: [(0, '4.321')] [2023-02-26 18:50:27,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3174.4). Total num frames: 888832. Throughput: 0: 863.8. Samples: 221362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:50:27,692][08036] Avg episode reward: [(0, '4.359')] [2023-02-26 18:50:30,211][13531] Updated weights for policy 0, policy_version 220 (0.0023) [2023-02-26 18:50:32,686][08036] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3190.6). Total num frames: 909312. Throughput: 0: 884.7. Samples: 227732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:50:32,692][08036] Avg episode reward: [(0, '4.650')] [2023-02-26 18:50:37,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3192.1). Total num frames: 925696. Throughput: 0: 871.5. Samples: 230326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:50:37,691][08036] Avg episode reward: [(0, '4.867')] [2023-02-26 18:50:37,708][13517] Saving new best policy, reward=4.867! [2023-02-26 18:50:42,686][08036] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3179.6). Total num frames: 937984. Throughput: 0: 845.4. Samples: 234292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:50:42,690][08036] Avg episode reward: [(0, '4.730')] [2023-02-26 18:50:43,597][13531] Updated weights for policy 0, policy_version 230 (0.0017) [2023-02-26 18:50:47,686][08036] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3249.0). Total num frames: 958464. Throughput: 0: 870.4. Samples: 239374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:50:47,694][08036] Avg episode reward: [(0, '4.712')] [2023-02-26 18:50:52,686][08036] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 978944. Throughput: 0: 882.2. Samples: 242586. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:50:52,689][08036] Avg episode reward: [(0, '4.733')] [2023-02-26 18:50:53,419][13531] Updated weights for policy 0, policy_version 240 (0.0021) [2023-02-26 18:50:57,686][08036] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 995328. Throughput: 0: 870.1. Samples: 248438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:50:57,692][08036] Avg episode reward: [(0, '4.724')] [2023-02-26 18:51:02,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3360.2). Total num frames: 1007616. Throughput: 0: 850.9. Samples: 252528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:51:02,694][08036] Avg episode reward: [(0, '4.632')] [2023-02-26 18:51:06,373][13531] Updated weights for policy 0, policy_version 250 (0.0020) [2023-02-26 18:51:07,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 1028096. Throughput: 0: 856.9. Samples: 254870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:51:07,688][08036] Avg episode reward: [(0, '4.802')] [2023-02-26 18:51:12,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 1048576. Throughput: 0: 893.6. Samples: 261576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:51:12,692][08036] Avg episode reward: [(0, '4.757')] [2023-02-26 18:51:16,643][13531] Updated weights for policy 0, policy_version 260 (0.0016) [2023-02-26 18:51:17,688][08036] Fps is (10 sec: 3685.5, 60 sec: 3481.5, 300 sec: 3415.6). Total num frames: 1064960. Throughput: 0: 872.4. Samples: 266992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:51:17,693][08036] Avg episode reward: [(0, '4.544')] [2023-02-26 18:51:17,709][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000260_1064960.pth... 
[2023-02-26 18:51:17,851][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth [2023-02-26 18:51:22,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3415.7). Total num frames: 1081344. Throughput: 0: 860.4. Samples: 269044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:51:22,689][08036] Avg episode reward: [(0, '4.659')] [2023-02-26 18:51:27,686][08036] Fps is (10 sec: 3277.6, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 1097728. Throughput: 0: 879.7. Samples: 273880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:51:27,689][08036] Avg episode reward: [(0, '4.800')] [2023-02-26 18:51:28,883][13531] Updated weights for policy 0, policy_version 270 (0.0017) [2023-02-26 18:51:32,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1118208. Throughput: 0: 896.8. Samples: 279728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:51:32,689][08036] Avg episode reward: [(0, '4.870')] [2023-02-26 18:51:32,691][13517] Saving new best policy, reward=4.870! [2023-02-26 18:51:37,687][08036] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 3401.7). Total num frames: 1126400. Throughput: 0: 869.1. Samples: 281698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:51:37,690][08036] Avg episode reward: [(0, '5.060')] [2023-02-26 18:51:37,755][13517] Saving new best policy, reward=5.060! [2023-02-26 18:51:42,689][08036] Fps is (10 sec: 2047.3, 60 sec: 3344.9, 300 sec: 3387.9). Total num frames: 1138688. Throughput: 0: 810.8. Samples: 284926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:51:42,695][08036] Avg episode reward: [(0, '5.117')] [2023-02-26 18:51:42,700][13517] Saving new best policy, reward=5.117! [2023-02-26 18:51:45,006][13531] Updated weights for policy 0, policy_version 280 (0.0037) [2023-02-26 18:51:47,686][08036] Fps is (10 sec: 2457.9, 60 sec: 3208.5, 300 sec: 3387.9). Total num frames: 1150976. 
Throughput: 0: 801.8. Samples: 288610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:51:47,697][08036] Avg episode reward: [(0, '5.462')] [2023-02-26 18:51:47,711][13517] Saving new best policy, reward=5.462! [2023-02-26 18:51:52,686][08036] Fps is (10 sec: 3687.6, 60 sec: 3276.8, 300 sec: 3415.6). Total num frames: 1175552. Throughput: 0: 812.6. Samples: 291436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:51:52,688][08036] Avg episode reward: [(0, '5.454')] [2023-02-26 18:51:55,490][13531] Updated weights for policy 0, policy_version 290 (0.0018) [2023-02-26 18:51:57,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 1196032. Throughput: 0: 809.3. Samples: 297994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:51:57,689][08036] Avg episode reward: [(0, '5.092')] [2023-02-26 18:52:02,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 1208320. Throughput: 0: 799.7. Samples: 302978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:52:02,691][08036] Avg episode reward: [(0, '5.000')] [2023-02-26 18:52:07,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3401.8). Total num frames: 1224704. Throughput: 0: 801.8. Samples: 305126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:52:07,695][08036] Avg episode reward: [(0, '5.385')] [2023-02-26 18:52:08,433][13531] Updated weights for policy 0, policy_version 300 (0.0017) [2023-02-26 18:52:12,686][08036] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3415.6). Total num frames: 1245184. Throughput: 0: 816.7. Samples: 310630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 18:52:12,694][08036] Avg episode reward: [(0, '5.373')] [2023-02-26 18:52:17,686][08036] Fps is (10 sec: 4095.9, 60 sec: 3345.2, 300 sec: 3415.6). Total num frames: 1265664. Throughput: 0: 831.2. Samples: 317130. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:52:17,691][08036] Avg episode reward: [(0, '5.296')] [2023-02-26 18:52:17,833][13531] Updated weights for policy 0, policy_version 310 (0.0017) [2023-02-26 18:52:22,689][08036] Fps is (10 sec: 3685.3, 60 sec: 3344.9, 300 sec: 3401.7). Total num frames: 1282048. Throughput: 0: 848.1. Samples: 319864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 18:52:22,693][08036] Avg episode reward: [(0, '5.418')] [2023-02-26 18:52:27,686][08036] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 1298432. Throughput: 0: 872.4. Samples: 324180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:52:27,691][08036] Avg episode reward: [(0, '5.901')] [2023-02-26 18:52:27,705][13517] Saving new best policy, reward=5.901! [2023-02-26 18:52:30,625][13531] Updated weights for policy 0, policy_version 320 (0.0020) [2023-02-26 18:52:32,686][08036] Fps is (10 sec: 3687.5, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 1318912. Throughput: 0: 911.2. Samples: 329614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:52:32,688][08036] Avg episode reward: [(0, '6.263')] [2023-02-26 18:52:32,692][13517] Saving new best policy, reward=6.263! [2023-02-26 18:52:37,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 1339392. Throughput: 0: 921.4. Samples: 332900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:52:37,692][08036] Avg episode reward: [(0, '6.544')] [2023-02-26 18:52:37,702][13517] Saving new best policy, reward=6.544! [2023-02-26 18:52:40,899][13531] Updated weights for policy 0, policy_version 330 (0.0014) [2023-02-26 18:52:42,686][08036] Fps is (10 sec: 3686.3, 60 sec: 3618.3, 300 sec: 3415.7). Total num frames: 1355776. Throughput: 0: 900.3. Samples: 338506. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:52:42,689][08036] Avg episode reward: [(0, '6.285')] [2023-02-26 18:52:47,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3401.8). Total num frames: 1368064. Throughput: 0: 880.8. Samples: 342612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:52:47,694][08036] Avg episode reward: [(0, '6.313')] [2023-02-26 18:52:52,686][08036] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 1388544. Throughput: 0: 889.6. Samples: 345158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:52:52,688][08036] Avg episode reward: [(0, '6.257')] [2023-02-26 18:52:53,070][13531] Updated weights for policy 0, policy_version 340 (0.0018) [2023-02-26 18:52:57,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 1409024. Throughput: 0: 915.0. Samples: 351804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:52:57,690][08036] Avg episode reward: [(0, '6.290')] [2023-02-26 18:53:02,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3401.8). Total num frames: 1425408. Throughput: 0: 890.0. Samples: 357182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:53:02,694][08036] Avg episode reward: [(0, '6.397')] [2023-02-26 18:53:04,458][13531] Updated weights for policy 0, policy_version 350 (0.0021) [2023-02-26 18:53:07,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3415.6). Total num frames: 1441792. Throughput: 0: 874.5. Samples: 359214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:53:07,690][08036] Avg episode reward: [(0, '6.835')] [2023-02-26 18:53:07,703][13517] Saving new best policy, reward=6.835! [2023-02-26 18:53:12,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 1458176. Throughput: 0: 887.1. Samples: 364100. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:53:12,689][08036] Avg episode reward: [(0, '6.889')] [2023-02-26 18:53:12,692][13517] Saving new best policy, reward=6.889! [2023-02-26 18:53:15,659][13531] Updated weights for policy 0, policy_version 360 (0.0013) [2023-02-26 18:53:17,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3443.4). Total num frames: 1482752. Throughput: 0: 913.9. Samples: 370740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:53:17,689][08036] Avg episode reward: [(0, '7.363')] [2023-02-26 18:53:17,698][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000362_1482752.pth... [2023-02-26 18:53:17,815][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000158_647168.pth [2023-02-26 18:53:17,828][13517] Saving new best policy, reward=7.363! [2023-02-26 18:53:22,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3429.5). Total num frames: 1499136. Throughput: 0: 908.1. Samples: 373766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:53:22,693][08036] Avg episode reward: [(0, '7.276')] [2023-02-26 18:53:27,688][08036] Fps is (10 sec: 2866.7, 60 sec: 3549.8, 300 sec: 3415.6). Total num frames: 1511424. Throughput: 0: 879.7. Samples: 378092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:53:27,692][08036] Avg episode reward: [(0, '7.694')] [2023-02-26 18:53:27,713][13517] Saving new best policy, reward=7.694! [2023-02-26 18:53:28,052][13531] Updated weights for policy 0, policy_version 370 (0.0012) [2023-02-26 18:53:32,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3443.4). Total num frames: 1531904. Throughput: 0: 902.8. Samples: 383240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:53:32,689][08036] Avg episode reward: [(0, '7.774')] [2023-02-26 18:53:32,692][13517] Saving new best policy, reward=7.774! 
[2023-02-26 18:53:37,521][13531] Updated weights for policy 0, policy_version 380 (0.0030) [2023-02-26 18:53:37,686][08036] Fps is (10 sec: 4506.4, 60 sec: 3618.1, 300 sec: 3471.2). Total num frames: 1556480. Throughput: 0: 919.6. Samples: 386538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:53:37,689][08036] Avg episode reward: [(0, '8.058')] [2023-02-26 18:53:37,699][13517] Saving new best policy, reward=8.058! [2023-02-26 18:53:42,687][08036] Fps is (10 sec: 4095.7, 60 sec: 3618.1, 300 sec: 3457.3). Total num frames: 1572864. Throughput: 0: 911.1. Samples: 392806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:53:42,694][08036] Avg episode reward: [(0, '7.751')] [2023-02-26 18:53:47,688][08036] Fps is (10 sec: 2866.6, 60 sec: 3618.0, 300 sec: 3443.4). Total num frames: 1585152. Throughput: 0: 881.7. Samples: 396860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:53:47,695][08036] Avg episode reward: [(0, '7.778')] [2023-02-26 18:53:50,748][13531] Updated weights for policy 0, policy_version 390 (0.0014) [2023-02-26 18:53:52,686][08036] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3471.2). Total num frames: 1605632. Throughput: 0: 884.1. Samples: 398998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:53:52,693][08036] Avg episode reward: [(0, '7.865')] [2023-02-26 18:53:57,686][08036] Fps is (10 sec: 4096.9, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 1626112. Throughput: 0: 916.1. Samples: 405326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:53:57,696][08036] Avg episode reward: [(0, '9.068')] [2023-02-26 18:53:57,706][13517] Saving new best policy, reward=9.068! [2023-02-26 18:54:00,587][13531] Updated weights for policy 0, policy_version 400 (0.0015) [2023-02-26 18:54:02,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3471.2). Total num frames: 1642496. Throughput: 0: 892.9. Samples: 410922. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:54:02,692][08036] Avg episode reward: [(0, '9.259')] [2023-02-26 18:54:02,698][13517] Saving new best policy, reward=9.259! [2023-02-26 18:54:07,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 1654784. Throughput: 0: 870.7. Samples: 412946. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-26 18:54:07,695][08036] Avg episode reward: [(0, '10.213')] [2023-02-26 18:54:07,712][13517] Saving new best policy, reward=10.213! [2023-02-26 18:54:12,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 1671168. Throughput: 0: 870.9. Samples: 417282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:54:12,691][08036] Avg episode reward: [(0, '9.714')] [2023-02-26 18:54:13,923][13531] Updated weights for policy 0, policy_version 410 (0.0047) [2023-02-26 18:54:17,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1695744. Throughput: 0: 899.5. Samples: 423718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:54:17,691][08036] Avg episode reward: [(0, '9.703')] [2023-02-26 18:54:22,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1712128. Throughput: 0: 898.6. Samples: 426976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:54:22,691][08036] Avg episode reward: [(0, '9.651')] [2023-02-26 18:54:24,804][13531] Updated weights for policy 0, policy_version 420 (0.0019) [2023-02-26 18:54:27,693][08036] Fps is (10 sec: 2865.1, 60 sec: 3549.5, 300 sec: 3457.2). Total num frames: 1724416. Throughput: 0: 856.8. Samples: 431368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:54:27,703][08036] Avg episode reward: [(0, '10.104')] [2023-02-26 18:54:32,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1740800. Throughput: 0: 865.8. Samples: 435818. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:54:32,689][08036] Avg episode reward: [(0, '10.441')] [2023-02-26 18:54:32,694][13517] Saving new best policy, reward=10.441! [2023-02-26 18:54:36,498][13531] Updated weights for policy 0, policy_version 430 (0.0027) [2023-02-26 18:54:37,686][08036] Fps is (10 sec: 4098.9, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1765376. Throughput: 0: 890.2. Samples: 439058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:54:37,693][08036] Avg episode reward: [(0, '10.727')] [2023-02-26 18:54:37,703][13517] Saving new best policy, reward=10.727! [2023-02-26 18:54:42,687][08036] Fps is (10 sec: 4095.7, 60 sec: 3481.6, 300 sec: 3498.9). Total num frames: 1781760. Throughput: 0: 894.5. Samples: 445580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:54:42,689][08036] Avg episode reward: [(0, '10.252')] [2023-02-26 18:54:47,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3471.2). Total num frames: 1798144. Throughput: 0: 858.8. Samples: 449566. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-26 18:54:47,690][08036] Avg episode reward: [(0, '10.698')] [2023-02-26 18:54:48,940][13531] Updated weights for policy 0, policy_version 440 (0.0028) [2023-02-26 18:54:52,686][08036] Fps is (10 sec: 2867.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1810432. Throughput: 0: 858.4. Samples: 451574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:54:52,692][08036] Avg episode reward: [(0, '10.978')] [2023-02-26 18:54:52,783][13517] Saving new best policy, reward=10.978! [2023-02-26 18:54:57,690][08036] Fps is (10 sec: 3685.1, 60 sec: 3481.4, 300 sec: 3512.8). Total num frames: 1835008. Throughput: 0: 888.1. Samples: 457248. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:54:57,698][08036] Avg episode reward: [(0, '10.460')] [2023-02-26 18:54:59,679][13531] Updated weights for policy 0, policy_version 450 (0.0024) [2023-02-26 18:55:02,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1851392. Throughput: 0: 885.6. Samples: 463572. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:55:02,690][08036] Avg episode reward: [(0, '10.105')] [2023-02-26 18:55:07,686][08036] Fps is (10 sec: 3277.9, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1867776. Throughput: 0: 858.8. Samples: 465624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:55:07,693][08036] Avg episode reward: [(0, '11.044')] [2023-02-26 18:55:07,711][13517] Saving new best policy, reward=11.044! [2023-02-26 18:55:12,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1880064. Throughput: 0: 853.8. Samples: 469782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:55:12,689][08036] Avg episode reward: [(0, '11.769')] [2023-02-26 18:55:12,697][13517] Saving new best policy, reward=11.769! [2023-02-26 18:55:13,141][13531] Updated weights for policy 0, policy_version 460 (0.0031) [2023-02-26 18:55:17,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1900544. Throughput: 0: 885.7. Samples: 475674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:55:17,689][08036] Avg episode reward: [(0, '12.650')] [2023-02-26 18:55:17,785][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000465_1904640.pth... [2023-02-26 18:55:17,930][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000260_1064960.pth [2023-02-26 18:55:17,938][13517] Saving new best policy, reward=12.650! 
[2023-02-26 18:55:22,553][13531] Updated weights for policy 0, policy_version 470 (0.0012) [2023-02-26 18:55:22,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1925120. Throughput: 0: 883.7. Samples: 478824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:55:22,688][08036] Avg episode reward: [(0, '12.793')] [2023-02-26 18:55:22,691][13517] Saving new best policy, reward=12.793! [2023-02-26 18:55:27,689][08036] Fps is (10 sec: 3685.3, 60 sec: 3550.1, 300 sec: 3485.0). Total num frames: 1937408. Throughput: 0: 854.7. Samples: 484042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:55:27,692][08036] Avg episode reward: [(0, '12.379')] [2023-02-26 18:55:32,686][08036] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1949696. Throughput: 0: 859.7. Samples: 488254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:55:32,689][08036] Avg episode reward: [(0, '12.029')] [2023-02-26 18:55:35,533][13531] Updated weights for policy 0, policy_version 480 (0.0029) [2023-02-26 18:55:37,686][08036] Fps is (10 sec: 3687.6, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1974272. Throughput: 0: 884.5. Samples: 491378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 18:55:37,689][08036] Avg episode reward: [(0, '12.135')] [2023-02-26 18:55:42,686][08036] Fps is (10 sec: 4505.5, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1994752. Throughput: 0: 905.3. Samples: 497982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:55:42,688][08036] Avg episode reward: [(0, '11.757')] [2023-02-26 18:55:46,148][13531] Updated weights for policy 0, policy_version 490 (0.0012) [2023-02-26 18:55:47,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2011136. Throughput: 0: 871.3. Samples: 502782. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:55:47,689][08036] Avg episode reward: [(0, '13.027')] [2023-02-26 18:55:47,697][13517] Saving new best policy, reward=13.027! [2023-02-26 18:55:52,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2023424. Throughput: 0: 869.3. Samples: 504744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:55:52,692][08036] Avg episode reward: [(0, '12.636')] [2023-02-26 18:55:57,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3512.8). Total num frames: 2043904. Throughput: 0: 890.7. Samples: 509862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:55:57,688][08036] Avg episode reward: [(0, '11.558')] [2023-02-26 18:55:58,270][13531] Updated weights for policy 0, policy_version 500 (0.0015) [2023-02-26 18:56:02,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2064384. Throughput: 0: 907.6. Samples: 516518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:56:02,689][08036] Avg episode reward: [(0, '12.337')] [2023-02-26 18:56:07,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2080768. Throughput: 0: 898.1. Samples: 519238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:56:07,692][08036] Avg episode reward: [(0, '12.015')] [2023-02-26 18:56:09,844][13531] Updated weights for policy 0, policy_version 510 (0.0019) [2023-02-26 18:56:12,687][08036] Fps is (10 sec: 2866.9, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 2093056. Throughput: 0: 876.7. Samples: 523492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:56:12,703][08036] Avg episode reward: [(0, '12.591')] [2023-02-26 18:56:17,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2117632. Throughput: 0: 907.2. Samples: 529080. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:56:17,689][08036] Avg episode reward: [(0, '13.438')] [2023-02-26 18:56:17,699][13517] Saving new best policy, reward=13.438! [2023-02-26 18:56:20,550][13531] Updated weights for policy 0, policy_version 520 (0.0012) [2023-02-26 18:56:22,686][08036] Fps is (10 sec: 4506.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2138112. Throughput: 0: 909.2. Samples: 532292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:56:22,688][08036] Avg episode reward: [(0, '14.455')] [2023-02-26 18:56:22,695][13517] Saving new best policy, reward=14.455! [2023-02-26 18:56:27,687][08036] Fps is (10 sec: 3685.9, 60 sec: 3618.2, 300 sec: 3512.8). Total num frames: 2154496. Throughput: 0: 891.5. Samples: 538100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:56:27,689][08036] Avg episode reward: [(0, '15.015')] [2023-02-26 18:56:27,711][13517] Saving new best policy, reward=15.015! [2023-02-26 18:56:32,687][08036] Fps is (10 sec: 2867.0, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2166784. Throughput: 0: 875.5. Samples: 542178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:56:32,690][08036] Avg episode reward: [(0, '15.375')] [2023-02-26 18:56:32,692][13517] Saving new best policy, reward=15.375! [2023-02-26 18:56:33,397][13531] Updated weights for policy 0, policy_version 530 (0.0022) [2023-02-26 18:56:37,686][08036] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2187264. Throughput: 0: 884.8. Samples: 544560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:56:37,689][08036] Avg episode reward: [(0, '14.929')] [2023-02-26 18:56:42,686][08036] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 2207744. Throughput: 0: 917.5. Samples: 551148. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:56:42,693][08036] Avg episode reward: [(0, '14.941')] [2023-02-26 18:56:43,085][13531] Updated weights for policy 0, policy_version 540 (0.0014) [2023-02-26 18:56:47,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2224128. Throughput: 0: 889.1. Samples: 556526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:56:47,694][08036] Avg episode reward: [(0, '16.391')] [2023-02-26 18:56:47,709][13517] Saving new best policy, reward=16.391! [2023-02-26 18:56:52,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2236416. Throughput: 0: 868.9. Samples: 558340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:56:52,691][08036] Avg episode reward: [(0, '16.054')] [2023-02-26 18:56:57,686][08036] Fps is (10 sec: 2048.0, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 2244608. Throughput: 0: 847.7. Samples: 561638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:56:57,691][08036] Avg episode reward: [(0, '16.406')] [2023-02-26 18:56:57,711][13517] Saving new best policy, reward=16.406! [2023-02-26 18:56:59,455][13531] Updated weights for policy 0, policy_version 550 (0.0021) [2023-02-26 18:57:02,686][08036] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 2260992. Throughput: 0: 810.5. Samples: 565554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:57:02,689][08036] Avg episode reward: [(0, '16.615')] [2023-02-26 18:57:02,697][13517] Saving new best policy, reward=16.615! [2023-02-26 18:57:07,690][08036] Fps is (10 sec: 3684.8, 60 sec: 3344.8, 300 sec: 3512.8). Total num frames: 2281472. Throughput: 0: 812.2. Samples: 568846. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:57:07,693][08036] Avg episode reward: [(0, '16.317')] [2023-02-26 18:57:10,874][13531] Updated weights for policy 0, policy_version 560 (0.0041) [2023-02-26 18:57:12,690][08036] Fps is (10 sec: 3684.8, 60 sec: 3413.1, 300 sec: 3498.9). Total num frames: 2297856. Throughput: 0: 796.7. Samples: 573956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:57:12,698][08036] Avg episode reward: [(0, '15.125')] [2023-02-26 18:57:17,686][08036] Fps is (10 sec: 2868.4, 60 sec: 3208.5, 300 sec: 3485.1). Total num frames: 2310144. Throughput: 0: 802.6. Samples: 578296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:57:17,693][08036] Avg episode reward: [(0, '15.128')] [2023-02-26 18:57:17,702][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000564_2310144.pth... [2023-02-26 18:57:17,846][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000362_1482752.pth [2023-02-26 18:57:22,428][13531] Updated weights for policy 0, policy_version 570 (0.0018) [2023-02-26 18:57:22,686][08036] Fps is (10 sec: 3688.0, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 2334720. Throughput: 0: 818.0. Samples: 581370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:57:22,694][08036] Avg episode reward: [(0, '15.931')] [2023-02-26 18:57:27,689][08036] Fps is (10 sec: 4504.2, 60 sec: 3345.0, 300 sec: 3512.8). Total num frames: 2355200. Throughput: 0: 821.1. Samples: 588100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:57:27,692][08036] Avg episode reward: [(0, '15.757')] [2023-02-26 18:57:32,687][08036] Fps is (10 sec: 3686.2, 60 sec: 3413.3, 300 sec: 3498.9). Total num frames: 2371584. Throughput: 0: 808.6. Samples: 592914. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:57:32,690][08036] Avg episode reward: [(0, '16.122')] [2023-02-26 18:57:34,251][13531] Updated weights for policy 0, policy_version 580 (0.0020) [2023-02-26 18:57:37,686][08036] Fps is (10 sec: 2868.1, 60 sec: 3276.8, 300 sec: 3485.1). Total num frames: 2383872. Throughput: 0: 813.7. Samples: 594958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:57:37,689][08036] Avg episode reward: [(0, '17.157')] [2023-02-26 18:57:37,699][13517] Saving new best policy, reward=17.157! [2023-02-26 18:57:42,686][08036] Fps is (10 sec: 3277.0, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 2404352. Throughput: 0: 864.5. Samples: 600542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:57:42,695][08036] Avg episode reward: [(0, '17.770')] [2023-02-26 18:57:42,698][13517] Saving new best policy, reward=17.770! [2023-02-26 18:57:44,668][13531] Updated weights for policy 0, policy_version 590 (0.0024) [2023-02-26 18:57:47,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 2428928. Throughput: 0: 922.8. Samples: 607078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:57:47,690][08036] Avg episode reward: [(0, '17.560')] [2023-02-26 18:57:52,688][08036] Fps is (10 sec: 3685.6, 60 sec: 3413.2, 300 sec: 3498.9). Total num frames: 2441216. Throughput: 0: 902.8. Samples: 609470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:57:52,693][08036] Avg episode reward: [(0, '18.135')] [2023-02-26 18:57:52,701][13517] Saving new best policy, reward=18.135! [2023-02-26 18:57:57,686][08036] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2453504. Throughput: 0: 878.0. Samples: 613464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:57:57,696][08036] Avg episode reward: [(0, '18.201')] [2023-02-26 18:57:57,706][13517] Saving new best policy, reward=18.201! 
[2023-02-26 18:57:58,132][13531] Updated weights for policy 0, policy_version 600 (0.0020) [2023-02-26 18:58:02,686][08036] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2473984. Throughput: 0: 903.9. Samples: 618972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:58:02,695][08036] Avg episode reward: [(0, '17.500')] [2023-02-26 18:58:07,432][13531] Updated weights for policy 0, policy_version 610 (0.0018) [2023-02-26 18:58:07,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3618.4, 300 sec: 3526.7). Total num frames: 2498560. Throughput: 0: 911.0. Samples: 622366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:58:07,689][08036] Avg episode reward: [(0, '16.451')] [2023-02-26 18:58:12,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3485.1). Total num frames: 2510848. Throughput: 0: 886.0. Samples: 627968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:58:12,692][08036] Avg episode reward: [(0, '17.283')] [2023-02-26 18:58:17,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2527232. Throughput: 0: 874.1. Samples: 632250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-26 18:58:17,690][08036] Avg episode reward: [(0, '17.861')] [2023-02-26 18:58:20,311][13531] Updated weights for policy 0, policy_version 620 (0.0022) [2023-02-26 18:58:22,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 2547712. Throughput: 0: 893.1. Samples: 635146. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-26 18:58:22,693][08036] Avg episode reward: [(0, '18.430')] [2023-02-26 18:58:22,697][13517] Saving new best policy, reward=18.430! [2023-02-26 18:58:27,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3526.7). Total num frames: 2572288. Throughput: 0: 914.8. Samples: 641706. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:58:27,688][08036] Avg episode reward: [(0, '18.742')] [2023-02-26 18:58:27,698][13517] Saving new best policy, reward=18.742! [2023-02-26 18:58:30,403][13531] Updated weights for policy 0, policy_version 630 (0.0017) [2023-02-26 18:58:32,686][08036] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2584576. Throughput: 0: 880.8. Samples: 646712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:58:32,689][08036] Avg episode reward: [(0, '19.340')] [2023-02-26 18:58:32,700][13517] Saving new best policy, reward=19.340! [2023-02-26 18:58:37,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2600960. Throughput: 0: 873.6. Samples: 648782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:58:37,689][08036] Avg episode reward: [(0, '19.940')] [2023-02-26 18:58:37,703][13517] Saving new best policy, reward=19.940! [2023-02-26 18:58:42,406][13531] Updated weights for policy 0, policy_version 640 (0.0030) [2023-02-26 18:58:42,686][08036] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3512.9). Total num frames: 2621440. Throughput: 0: 906.0. Samples: 654234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:58:42,693][08036] Avg episode reward: [(0, '18.603')] [2023-02-26 18:58:47,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2641920. Throughput: 0: 926.3. Samples: 660654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:58:47,691][08036] Avg episode reward: [(0, '18.682')] [2023-02-26 18:58:52,688][08036] Fps is (10 sec: 3276.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2654208. Throughput: 0: 898.9. Samples: 662818. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 18:58:52,698][08036] Avg episode reward: [(0, '18.027')] [2023-02-26 18:58:54,754][13531] Updated weights for policy 0, policy_version 650 (0.0037) [2023-02-26 18:58:57,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2670592. Throughput: 0: 864.5. Samples: 666870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:58:57,693][08036] Avg episode reward: [(0, '17.766')] [2023-02-26 18:59:02,686][08036] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2691072. Throughput: 0: 905.4. Samples: 672994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:59:02,689][08036] Avg episode reward: [(0, '16.425')] [2023-02-26 18:59:04,890][13531] Updated weights for policy 0, policy_version 660 (0.0014) [2023-02-26 18:59:07,686][08036] Fps is (10 sec: 4095.9, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 2711552. Throughput: 0: 915.8. Samples: 676358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 18:59:07,694][08036] Avg episode reward: [(0, '16.934')] [2023-02-26 18:59:12,686][08036] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2727936. Throughput: 0: 882.6. Samples: 681422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 18:59:12,692][08036] Avg episode reward: [(0, '16.952')] [2023-02-26 18:59:17,616][13531] Updated weights for policy 0, policy_version 670 (0.0024) [2023-02-26 18:59:17,686][08036] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2744320. Throughput: 0: 868.6. Samples: 685798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:59:17,689][08036] Avg episode reward: [(0, '17.546')] [2023-02-26 18:59:17,703][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000670_2744320.pth... 
[2023-02-26 18:59:17,825][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000465_1904640.pth [2023-02-26 18:59:22,686][08036] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3526.8). Total num frames: 2764800. Throughput: 0: 892.5. Samples: 688946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:59:22,694][08036] Avg episode reward: [(0, '19.011')] [2023-02-26 18:59:27,325][13531] Updated weights for policy 0, policy_version 680 (0.0023) [2023-02-26 18:59:27,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2785280. Throughput: 0: 920.4. Samples: 695654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 18:59:27,690][08036] Avg episode reward: [(0, '20.386')] [2023-02-26 18:59:27,704][13517] Saving new best policy, reward=20.386! [2023-02-26 18:59:32,687][08036] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 2797568. Throughput: 0: 871.8. Samples: 699886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:59:32,690][08036] Avg episode reward: [(0, '21.081')] [2023-02-26 18:59:32,696][13517] Saving new best policy, reward=21.081! [2023-02-26 18:59:37,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2813952. Throughput: 0: 869.2. Samples: 701932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 18:59:37,689][08036] Avg episode reward: [(0, '20.791')] [2023-02-26 18:59:40,185][13531] Updated weights for policy 0, policy_version 690 (0.0021) [2023-02-26 18:59:42,686][08036] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2834432. Throughput: 0: 915.2. Samples: 708056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 18:59:42,689][08036] Avg episode reward: [(0, '19.780')] [2023-02-26 18:59:47,688][08036] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 2854912. Throughput: 0: 911.5. Samples: 714012. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:59:47,691][08036] Avg episode reward: [(0, '19.335')] [2023-02-26 18:59:52,148][13531] Updated weights for policy 0, policy_version 700 (0.0012) [2023-02-26 18:59:52,686][08036] Fps is (10 sec: 3276.7, 60 sec: 3550.0, 300 sec: 3499.0). Total num frames: 2867200. Throughput: 0: 879.7. Samples: 715946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:59:52,694][08036] Avg episode reward: [(0, '19.404')] [2023-02-26 18:59:57,686][08036] Fps is (10 sec: 2867.7, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2883584. Throughput: 0: 867.6. Samples: 720466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 18:59:57,690][08036] Avg episode reward: [(0, '19.614')] [2023-02-26 19:00:02,140][13531] Updated weights for policy 0, policy_version 710 (0.0025) [2023-02-26 19:00:02,686][08036] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2908160. Throughput: 0: 920.4. Samples: 727214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 19:00:02,689][08036] Avg episode reward: [(0, '20.655')] [2023-02-26 19:00:07,686][08036] Fps is (10 sec: 4095.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2924544. Throughput: 0: 924.5. Samples: 730548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 19:00:07,694][08036] Avg episode reward: [(0, '20.472')] [2023-02-26 19:00:12,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2940928. Throughput: 0: 868.0. Samples: 734714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:00:12,695][08036] Avg episode reward: [(0, '20.899')] [2023-02-26 19:00:14,772][13531] Updated weights for policy 0, policy_version 720 (0.0012) [2023-02-26 19:00:17,686][08036] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2961408. Throughput: 0: 897.5. Samples: 740272. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 19:00:17,689][08036] Avg episode reward: [(0, '21.527')] [2023-02-26 19:00:17,704][13517] Saving new best policy, reward=21.527! [2023-02-26 19:00:22,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2981888. Throughput: 0: 924.9. Samples: 743552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 19:00:22,693][08036] Avg episode reward: [(0, '22.398')] [2023-02-26 19:00:22,698][13517] Saving new best policy, reward=22.398! [2023-02-26 19:00:24,440][13531] Updated weights for policy 0, policy_version 730 (0.0017) [2023-02-26 19:00:27,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2998272. Throughput: 0: 913.0. Samples: 749142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:00:27,696][08036] Avg episode reward: [(0, '22.336')] [2023-02-26 19:00:32,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 3010560. Throughput: 0: 871.3. Samples: 753218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 19:00:32,689][08036] Avg episode reward: [(0, '23.082')] [2023-02-26 19:00:32,692][13517] Saving new best policy, reward=23.082! [2023-02-26 19:00:37,017][13531] Updated weights for policy 0, policy_version 740 (0.0019) [2023-02-26 19:00:37,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 3031040. Throughput: 0: 887.8. Samples: 755896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:00:37,689][08036] Avg episode reward: [(0, '23.114')] [2023-02-26 19:00:37,705][13517] Saving new best policy, reward=23.114! [2023-02-26 19:00:42,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3051520. Throughput: 0: 932.4. Samples: 762424. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:00:42,691][08036] Avg episode reward: [(0, '23.426')] [2023-02-26 19:00:42,745][13517] Saving new best policy, reward=23.426! [2023-02-26 19:00:47,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 3067904. Throughput: 0: 895.5. Samples: 767512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:00:47,692][08036] Avg episode reward: [(0, '22.639')] [2023-02-26 19:00:48,689][13531] Updated weights for policy 0, policy_version 750 (0.0017) [2023-02-26 19:00:52,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 3080192. Throughput: 0: 866.6. Samples: 769546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:00:52,689][08036] Avg episode reward: [(0, '21.242')] [2023-02-26 19:00:57,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 3104768. Throughput: 0: 896.2. Samples: 775042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 19:00:57,689][08036] Avg episode reward: [(0, '20.447')] [2023-02-26 19:00:59,204][13531] Updated weights for policy 0, policy_version 760 (0.0016) [2023-02-26 19:01:02,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3125248. Throughput: 0: 921.4. Samples: 781734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 19:01:02,694][08036] Avg episode reward: [(0, '18.618')] [2023-02-26 19:01:07,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3137536. Throughput: 0: 899.2. Samples: 784018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:01:07,690][08036] Avg episode reward: [(0, '17.932')] [2023-02-26 19:01:12,021][13531] Updated weights for policy 0, policy_version 770 (0.0012) [2023-02-26 19:01:12,686][08036] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 3153920. Throughput: 0: 867.9. Samples: 788198. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 19:01:12,692][08036] Avg episode reward: [(0, '18.091')] [2023-02-26 19:01:17,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3178496. Throughput: 0: 920.6. Samples: 794644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:01:17,689][08036] Avg episode reward: [(0, '18.295')] [2023-02-26 19:01:17,698][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000776_3178496.pth... [2023-02-26 19:01:17,843][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000564_2310144.pth [2023-02-26 19:01:21,153][13531] Updated weights for policy 0, policy_version 780 (0.0015) [2023-02-26 19:01:22,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3198976. Throughput: 0: 934.9. Samples: 797968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:01:22,689][08036] Avg episode reward: [(0, '18.237')] [2023-02-26 19:01:27,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3211264. Throughput: 0: 894.8. Samples: 802690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:01:27,689][08036] Avg episode reward: [(0, '19.386')] [2023-02-26 19:01:32,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3227648. Throughput: 0: 889.1. Samples: 807520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:01:32,695][08036] Avg episode reward: [(0, '20.331')] [2023-02-26 19:01:33,740][13531] Updated weights for policy 0, policy_version 790 (0.0021) [2023-02-26 19:01:37,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 3252224. Throughput: 0: 917.8. Samples: 810846. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:01:37,696][08036] Avg episode reward: [(0, '20.242')] [2023-02-26 19:01:42,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3268608. Throughput: 0: 937.7. Samples: 817238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:01:42,689][08036] Avg episode reward: [(0, '19.466')] [2023-02-26 19:01:44,493][13531] Updated weights for policy 0, policy_version 800 (0.0031) [2023-02-26 19:01:47,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3284992. Throughput: 0: 882.2. Samples: 821432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 19:01:47,692][08036] Avg episode reward: [(0, '18.505')] [2023-02-26 19:01:52,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3301376. Throughput: 0: 879.5. Samples: 823596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 19:01:52,693][08036] Avg episode reward: [(0, '17.771')] [2023-02-26 19:01:55,891][13531] Updated weights for policy 0, policy_version 810 (0.0022) [2023-02-26 19:01:57,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3321856. Throughput: 0: 930.9. Samples: 830090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:01:57,688][08036] Avg episode reward: [(0, '16.628')] [2023-02-26 19:02:02,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3342336. Throughput: 0: 914.7. Samples: 835804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:02:02,690][08036] Avg episode reward: [(0, '18.353')] [2023-02-26 19:02:07,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3354624. Throughput: 0: 883.8. Samples: 837738. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 19:02:07,689][08036] Avg episode reward: [(0, '18.986')] [2023-02-26 19:02:09,293][13531] Updated weights for policy 0, policy_version 820 (0.0012) [2023-02-26 19:02:12,686][08036] Fps is (10 sec: 2048.0, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 3362816. Throughput: 0: 853.6. Samples: 841102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:02:12,694][08036] Avg episode reward: [(0, '19.325')] [2023-02-26 19:02:17,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3554.5). Total num frames: 3383296. Throughput: 0: 848.4. Samples: 845696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:02:17,688][08036] Avg episode reward: [(0, '20.565')] [2023-02-26 19:02:21,135][13531] Updated weights for policy 0, policy_version 830 (0.0020) [2023-02-26 19:02:22,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3554.5). Total num frames: 3403776. Throughput: 0: 852.0. Samples: 849186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:02:22,688][08036] Avg episode reward: [(0, '22.033')] [2023-02-26 19:02:27,686][08036] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 3416064. Throughput: 0: 814.6. Samples: 853896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:02:27,689][08036] Avg episode reward: [(0, '22.107')] [2023-02-26 19:02:32,694][08036] Fps is (10 sec: 2864.8, 60 sec: 3412.9, 300 sec: 3554.4). Total num frames: 3432448. Throughput: 0: 823.5. Samples: 858496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 19:02:32,697][08036] Avg episode reward: [(0, '23.235')] [2023-02-26 19:02:33,893][13531] Updated weights for policy 0, policy_version 840 (0.0020) [2023-02-26 19:02:37,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 3457024. Throughput: 0: 851.4. Samples: 861910. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:02:37,688][08036] Avg episode reward: [(0, '23.046')] [2023-02-26 19:02:42,686][08036] Fps is (10 sec: 4099.4, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 3473408. Throughput: 0: 853.5. Samples: 868498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:02:42,695][08036] Avg episode reward: [(0, '23.404')] [2023-02-26 19:02:44,472][13531] Updated weights for policy 0, policy_version 850 (0.0011) [2023-02-26 19:02:47,686][08036] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3554.5). Total num frames: 3489792. Throughput: 0: 819.9. Samples: 872698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 19:02:47,689][08036] Avg episode reward: [(0, '23.337')] [2023-02-26 19:02:52,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 3506176. Throughput: 0: 825.1. Samples: 874868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:02:52,689][08036] Avg episode reward: [(0, '22.727')] [2023-02-26 19:02:55,776][13531] Updated weights for policy 0, policy_version 860 (0.0012) [2023-02-26 19:02:57,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 3530752. Throughput: 0: 892.6. Samples: 881270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:02:57,689][08036] Avg episode reward: [(0, '23.407')] [2023-02-26 19:03:02,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3554.5). Total num frames: 3547136. Throughput: 0: 922.8. Samples: 887220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:03:02,689][08036] Avg episode reward: [(0, '23.217')] [2023-02-26 19:03:07,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3554.5). Total num frames: 3559424. Throughput: 0: 892.1. Samples: 889330. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 19:03:07,688][08036] Avg episode reward: [(0, '23.545')] [2023-02-26 19:03:07,706][13517] Saving new best policy, reward=23.545! [2023-02-26 19:03:07,990][13531] Updated weights for policy 0, policy_version 870 (0.0028) [2023-02-26 19:03:12,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3579904. Throughput: 0: 899.4. Samples: 894370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:03:12,695][08036] Avg episode reward: [(0, '22.682')] [2023-02-26 19:03:17,432][13531] Updated weights for policy 0, policy_version 880 (0.0016) [2023-02-26 19:03:17,686][08036] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3604480. Throughput: 0: 949.6. Samples: 901220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 19:03:17,691][08036] Avg episode reward: [(0, '22.768')] [2023-02-26 19:03:17,703][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000880_3604480.pth... [2023-02-26 19:03:17,813][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000670_2744320.pth [2023-02-26 19:03:22,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3620864. Throughput: 0: 935.3. Samples: 904000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:03:22,689][08036] Avg episode reward: [(0, '21.290')] [2023-02-26 19:03:27,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3633152. Throughput: 0: 883.6. Samples: 908262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:03:27,691][08036] Avg episode reward: [(0, '21.808')] [2023-02-26 19:03:29,876][13531] Updated weights for policy 0, policy_version 890 (0.0030) [2023-02-26 19:03:32,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3686.9, 300 sec: 3568.4). Total num frames: 3653632. Throughput: 0: 916.8. Samples: 913952. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:03:32,695][08036] Avg episode reward: [(0, '22.291')] [2023-02-26 19:03:37,686][08036] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3678208. Throughput: 0: 942.9. Samples: 917298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:03:37,694][08036] Avg episode reward: [(0, '23.402')] [2023-02-26 19:03:39,662][13531] Updated weights for policy 0, policy_version 900 (0.0026) [2023-02-26 19:03:42,691][08036] Fps is (10 sec: 4093.9, 60 sec: 3686.1, 300 sec: 3568.3). Total num frames: 3694592. Throughput: 0: 924.9. Samples: 922894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 19:03:42,701][08036] Avg episode reward: [(0, '24.139')] [2023-02-26 19:03:42,704][13517] Saving new best policy, reward=24.139! [2023-02-26 19:03:47,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3706880. Throughput: 0: 886.6. Samples: 927116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 19:03:47,695][08036] Avg episode reward: [(0, '23.235')] [2023-02-26 19:03:51,792][13531] Updated weights for policy 0, policy_version 910 (0.0016) [2023-02-26 19:03:52,686][08036] Fps is (10 sec: 3278.5, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3727360. Throughput: 0: 909.6. Samples: 930264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:03:52,695][08036] Avg episode reward: [(0, '24.067')] [2023-02-26 19:03:57,689][08036] Fps is (10 sec: 4504.2, 60 sec: 3686.2, 300 sec: 3596.1). Total num frames: 3751936. Throughput: 0: 942.1. Samples: 936768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:03:57,694][08036] Avg episode reward: [(0, '22.995')] [2023-02-26 19:04:02,687][08036] Fps is (10 sec: 3686.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3764224. Throughput: 0: 899.6. Samples: 941704. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 19:04:02,694][08036] Avg episode reward: [(0, '23.268')] [2023-02-26 19:04:03,148][13531] Updated weights for policy 0, policy_version 920 (0.0022) [2023-02-26 19:04:07,686][08036] Fps is (10 sec: 2868.2, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3780608. Throughput: 0: 885.6. Samples: 943852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:04:07,693][08036] Avg episode reward: [(0, '22.121')] [2023-02-26 19:04:12,686][08036] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3596.2). Total num frames: 3805184. Throughput: 0: 922.2. Samples: 949760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:04:12,693][08036] Avg episode reward: [(0, '21.086')] [2023-02-26 19:04:13,620][13531] Updated weights for policy 0, policy_version 930 (0.0020) [2023-02-26 19:04:17,686][08036] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3825664. Throughput: 0: 947.6. Samples: 956596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:04:17,689][08036] Avg episode reward: [(0, '21.890')] [2023-02-26 19:04:22,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3837952. Throughput: 0: 919.5. Samples: 958674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:04:22,693][08036] Avg episode reward: [(0, '22.446')] [2023-02-26 19:04:26,255][13531] Updated weights for policy 0, policy_version 940 (0.0014) [2023-02-26 19:04:27,686][08036] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3854336. Throughput: 0: 888.1. Samples: 962854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:04:27,688][08036] Avg episode reward: [(0, '21.747')] [2023-02-26 19:04:32,686][08036] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 3874816. Throughput: 0: 939.9. Samples: 969412. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:04:32,694][08036] Avg episode reward: [(0, '21.246')] [2023-02-26 19:04:35,590][13531] Updated weights for policy 0, policy_version 950 (0.0029) [2023-02-26 19:04:37,686][08036] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3895296. Throughput: 0: 942.7. Samples: 972686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 19:04:37,691][08036] Avg episode reward: [(0, '21.569')] [2023-02-26 19:04:42,689][08036] Fps is (10 sec: 3685.2, 60 sec: 3618.3, 300 sec: 3582.2). Total num frames: 3911680. Throughput: 0: 902.0. Samples: 977356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:04:42,692][08036] Avg episode reward: [(0, '21.506')] [2023-02-26 19:04:47,686][08036] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 3928064. Throughput: 0: 899.0. Samples: 982160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 19:04:47,689][08036] Avg episode reward: [(0, '21.189')] [2023-02-26 19:04:48,248][13531] Updated weights for policy 0, policy_version 960 (0.0018) [2023-02-26 19:04:52,686][08036] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3948544. Throughput: 0: 925.7. Samples: 985510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 19:04:52,696][08036] Avg episode reward: [(0, '21.353')] [2023-02-26 19:04:57,687][08036] Fps is (10 sec: 4095.4, 60 sec: 3618.2, 300 sec: 3596.1). Total num frames: 3969024. Throughput: 0: 933.2. Samples: 991754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:04:57,692][08036] Avg episode reward: [(0, '22.501')] [2023-02-26 19:04:59,044][13531] Updated weights for policy 0, policy_version 970 (0.0023) [2023-02-26 19:05:02,686][08036] Fps is (10 sec: 3276.9, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 3981312. Throughput: 0: 870.4. Samples: 995766. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:05:02,695][08036] Avg episode reward: [(0, '22.518')] [2023-02-26 19:05:07,686][08036] Fps is (10 sec: 2867.6, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3997696. Throughput: 0: 872.4. Samples: 997930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 19:05:07,693][08036] Avg episode reward: [(0, '23.215')] [2023-02-26 19:05:08,555][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 19:05:08,556][08036] Component Batcher_0 stopped! [2023-02-26 19:05:08,556][13517] Stopping Batcher_0... [2023-02-26 19:05:08,575][13517] Loop batcher_evt_loop terminating... [2023-02-26 19:05:08,617][08036] Component RolloutWorker_w5 stopped! [2023-02-26 19:05:08,620][13537] Stopping RolloutWorker_w5... [2023-02-26 19:05:08,626][08036] Component RolloutWorker_w7 stopped! [2023-02-26 19:05:08,629][13539] Stopping RolloutWorker_w7... [2023-02-26 19:05:08,623][13537] Loop rollout_proc5_evt_loop terminating... [2023-02-26 19:05:08,638][08036] Component RolloutWorker_w1 stopped! [2023-02-26 19:05:08,640][13534] Stopping RolloutWorker_w1... [2023-02-26 19:05:08,632][13539] Loop rollout_proc7_evt_loop terminating... [2023-02-26 19:05:08,644][13531] Weights refcount: 2 0 [2023-02-26 19:05:08,643][13534] Loop rollout_proc1_evt_loop terminating... [2023-02-26 19:05:08,661][08036] Component InferenceWorker_p0-w0 stopped! [2023-02-26 19:05:08,665][13531] Stopping InferenceWorker_p0-w0... [2023-02-26 19:05:08,665][13531] Loop inference_proc0-0_evt_loop terminating... [2023-02-26 19:05:08,676][08036] Component RolloutWorker_w6 stopped! [2023-02-26 19:05:08,678][08036] Component RolloutWorker_w0 stopped! [2023-02-26 19:05:08,688][08036] Component RolloutWorker_w3 stopped! [2023-02-26 19:05:08,690][13536] Stopping RolloutWorker_w3... [2023-02-26 19:05:08,676][13538] Stopping RolloutWorker_w6... 
[2023-02-26 19:05:08,696][13538] Loop rollout_proc6_evt_loop terminating... [2023-02-26 19:05:08,676][13532] Stopping RolloutWorker_w0... [2023-02-26 19:05:08,697][13532] Loop rollout_proc0_evt_loop terminating... [2023-02-26 19:05:08,692][13536] Loop rollout_proc3_evt_loop terminating... [2023-02-26 19:05:08,707][08036] Component RolloutWorker_w4 stopped! [2023-02-26 19:05:08,713][13535] Stopping RolloutWorker_w4... [2023-02-26 19:05:08,713][13535] Loop rollout_proc4_evt_loop terminating... [2023-02-26 19:05:08,722][08036] Component RolloutWorker_w2 stopped! [2023-02-26 19:05:08,728][13533] Stopping RolloutWorker_w2... [2023-02-26 19:05:08,733][13533] Loop rollout_proc2_evt_loop terminating... [2023-02-26 19:05:08,789][13517] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000776_3178496.pth [2023-02-26 19:05:08,810][13517] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 19:05:09,006][08036] Component LearnerWorker_p0 stopped! [2023-02-26 19:05:09,015][08036] Waiting for process learner_proc0 to stop... [2023-02-26 19:05:09,020][13517] Stopping LearnerWorker_p0... [2023-02-26 19:05:09,021][13517] Loop learner_proc0_evt_loop terminating... [2023-02-26 19:05:10,775][08036] Waiting for process inference_proc0-0 to join... [2023-02-26 19:05:11,179][08036] Waiting for process rollout_proc0 to join... [2023-02-26 19:05:11,581][08036] Waiting for process rollout_proc1 to join... [2023-02-26 19:05:11,583][08036] Waiting for process rollout_proc2 to join... [2023-02-26 19:05:11,595][08036] Waiting for process rollout_proc3 to join... [2023-02-26 19:05:11,596][08036] Waiting for process rollout_proc4 to join... [2023-02-26 19:05:11,597][08036] Waiting for process rollout_proc5 to join... [2023-02-26 19:05:11,598][08036] Waiting for process rollout_proc6 to join... [2023-02-26 19:05:11,600][08036] Waiting for process rollout_proc7 to join... 
[2023-02-26 19:05:11,601][08036] Batcher 0 profile tree view: batching: 25.6492, releasing_batches: 0.0269 [2023-02-26 19:05:11,602][08036] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 566.6409 update_model: 8.2139 weight_update: 0.0016 one_step: 0.0093 handle_policy_step: 542.2757 deserialize: 15.6264, stack: 2.9339, obs_to_device_normalize: 117.7517, forward: 264.9191, send_messages: 26.9115 prepare_outputs: 87.0667 to_cpu: 53.6686 [2023-02-26 19:05:11,604][08036] Learner 0 profile tree view: misc: 0.0062, prepare_batch: 16.1614 train: 76.6645 epoch_init: 0.0138, minibatch_init: 0.0161, losses_postprocess: 0.5848, kl_divergence: 0.6097, after_optimizer: 33.5494 calculate_losses: 26.9609 losses_init: 0.0069, forward_head: 1.8711, bptt_initial: 17.7361, tail: 1.1704, advantages_returns: 0.3351, losses: 3.3693 bptt: 2.0937 bptt_forward_core: 2.0124 update: 14.2326 clip: 1.4152 [2023-02-26 19:05:11,606][08036] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2896, enqueue_policy_requests: 156.8880, env_step: 867.9924, overhead: 22.4518, complete_rollouts: 7.5711 save_policy_outputs: 22.0078 split_output_tensors: 10.1874 [2023-02-26 19:05:11,608][08036] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3937, enqueue_policy_requests: 158.0133, env_step: 865.0519, overhead: 24.2229, complete_rollouts: 7.5344 save_policy_outputs: 22.3104 split_output_tensors: 10.8550 [2023-02-26 19:05:11,611][08036] Loop Runner_EvtLoop terminating... [2023-02-26 19:05:11,613][08036] Runner profile tree view: main_loop: 1188.6849 [2023-02-26 19:05:11,615][08036] Collected {0: 4005888}, FPS: 3370.0 [2023-02-26 19:24:46,797][08036] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 19:24:46,809][08036] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 19:24:46,811][08036] Adding new argument 'no_render'=True that is not in the saved config file! 
[2023-02-26 19:24:46,820][08036] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 19:24:46,835][08036] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 19:24:46,842][08036] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 19:24:46,849][08036] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 19:24:46,851][08036] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 19:24:46,857][08036] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-26 19:24:46,870][08036] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-26 19:24:46,874][08036] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 19:24:46,879][08036] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 19:24:46,882][08036] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 19:24:46,891][08036] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 19:24:46,898][08036] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 19:24:46,955][08036] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 19:24:46,964][08036] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 19:24:46,976][08036] RunningMeanStd input shape: (1,) [2023-02-26 19:24:47,032][08036] ConvEncoder: input_channels=3 [2023-02-26 19:24:47,888][08036] Conv encoder output size: 512 [2023-02-26 19:24:47,893][08036] Policy head output size: 512 [2023-02-26 19:24:50,661][08036] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 19:24:52,326][08036] Num frames 100... 
[2023-02-26 19:24:52,438][08036] Num frames 200... [2023-02-26 19:24:52,549][08036] Num frames 300... [2023-02-26 19:24:52,667][08036] Num frames 400... [2023-02-26 19:24:52,784][08036] Num frames 500... [2023-02-26 19:24:52,906][08036] Num frames 600... [2023-02-26 19:24:53,019][08036] Num frames 700... [2023-02-26 19:24:53,138][08036] Num frames 800... [2023-02-26 19:24:53,249][08036] Num frames 900... [2023-02-26 19:24:53,360][08036] Num frames 1000... [2023-02-26 19:24:53,480][08036] Num frames 1100... [2023-02-26 19:24:53,609][08036] Num frames 1200... [2023-02-26 19:24:53,728][08036] Num frames 1300... [2023-02-26 19:24:53,843][08036] Num frames 1400... [2023-02-26 19:24:53,961][08036] Num frames 1500... [2023-02-26 19:24:54,077][08036] Num frames 1600... [2023-02-26 19:24:54,169][08036] Avg episode rewards: #0: 40.310, true rewards: #0: 16.310 [2023-02-26 19:24:54,170][08036] Avg episode reward: 40.310, avg true_objective: 16.310 [2023-02-26 19:24:54,255][08036] Num frames 1700... [2023-02-26 19:24:54,381][08036] Num frames 1800... [2023-02-26 19:24:54,504][08036] Num frames 1900... [2023-02-26 19:24:54,616][08036] Num frames 2000... [2023-02-26 19:24:54,730][08036] Num frames 2100... [2023-02-26 19:24:54,842][08036] Num frames 2200... [2023-02-26 19:24:54,960][08036] Num frames 2300... [2023-02-26 19:24:55,071][08036] Num frames 2400... [2023-02-26 19:24:55,182][08036] Num frames 2500... [2023-02-26 19:24:55,290][08036] Num frames 2600... [2023-02-26 19:24:55,409][08036] Num frames 2700... [2023-02-26 19:24:55,518][08036] Num frames 2800... [2023-02-26 19:24:55,635][08036] Num frames 2900... [2023-02-26 19:24:55,751][08036] Num frames 3000... [2023-02-26 19:24:55,863][08036] Num frames 3100... [2023-02-26 19:24:55,981][08036] Num frames 3200... 
[2023-02-26 19:24:56,072][08036] Avg episode rewards: #0: 38.155, true rewards: #0: 16.155 [2023-02-26 19:24:56,073][08036] Avg episode reward: 38.155, avg true_objective: 16.155 [2023-02-26 19:24:56,157][08036] Num frames 3300... [2023-02-26 19:24:56,273][08036] Num frames 3400... [2023-02-26 19:24:56,388][08036] Num frames 3500... [2023-02-26 19:24:56,504][08036] Num frames 3600... [2023-02-26 19:24:56,624][08036] Num frames 3700... [2023-02-26 19:24:56,739][08036] Num frames 3800... [2023-02-26 19:24:56,850][08036] Num frames 3900... [2023-02-26 19:24:56,975][08036] Num frames 4000... [2023-02-26 19:24:57,087][08036] Num frames 4100... [2023-02-26 19:24:57,200][08036] Num frames 4200... [2023-02-26 19:24:57,364][08036] Avg episode rewards: #0: 33.290, true rewards: #0: 14.290 [2023-02-26 19:24:57,367][08036] Avg episode reward: 33.290, avg true_objective: 14.290 [2023-02-26 19:24:57,387][08036] Num frames 4300... [2023-02-26 19:24:57,503][08036] Num frames 4400... [2023-02-26 19:24:57,628][08036] Num frames 4500... [2023-02-26 19:24:57,752][08036] Num frames 4600... [2023-02-26 19:24:57,865][08036] Num frames 4700... [2023-02-26 19:24:57,990][08036] Num frames 4800... [2023-02-26 19:24:58,066][08036] Avg episode rewards: #0: 26.792, true rewards: #0: 12.042 [2023-02-26 19:24:58,068][08036] Avg episode reward: 26.792, avg true_objective: 12.042 [2023-02-26 19:24:58,170][08036] Num frames 4900... [2023-02-26 19:24:58,293][08036] Num frames 5000... [2023-02-26 19:24:58,410][08036] Num frames 5100... [2023-02-26 19:24:58,533][08036] Num frames 5200... [2023-02-26 19:24:58,663][08036] Num frames 5300... [2023-02-26 19:24:58,788][08036] Num frames 5400... [2023-02-26 19:24:58,897][08036] Num frames 5500... [2023-02-26 19:24:59,003][08036] Avg episode rewards: #0: 24.686, true rewards: #0: 11.086 [2023-02-26 19:24:59,005][08036] Avg episode reward: 24.686, avg true_objective: 11.086 [2023-02-26 19:24:59,080][08036] Num frames 5600... 
[2023-02-26 19:24:59,192][08036] Num frames 5700... [2023-02-26 19:24:59,303][08036] Num frames 5800... [2023-02-26 19:24:59,428][08036] Num frames 5900... [2023-02-26 19:24:59,544][08036] Num frames 6000... [2023-02-26 19:24:59,659][08036] Num frames 6100... [2023-02-26 19:24:59,772][08036] Num frames 6200... [2023-02-26 19:24:59,880][08036] Num frames 6300... [2023-02-26 19:24:59,982][08036] Avg episode rewards: #0: 23.572, true rewards: #0: 10.572 [2023-02-26 19:24:59,983][08036] Avg episode reward: 23.572, avg true_objective: 10.572 [2023-02-26 19:25:00,052][08036] Num frames 6400... [2023-02-26 19:25:00,168][08036] Num frames 6500... [2023-02-26 19:25:00,283][08036] Num frames 6600... [2023-02-26 19:25:00,406][08036] Num frames 6700... [2023-02-26 19:25:00,520][08036] Num frames 6800... [2023-02-26 19:25:00,636][08036] Num frames 6900... [2023-02-26 19:25:00,749][08036] Num frames 7000... [2023-02-26 19:25:00,860][08036] Num frames 7100... [2023-02-26 19:25:00,971][08036] Num frames 7200... [2023-02-26 19:25:01,089][08036] Num frames 7300... [2023-02-26 19:25:01,184][08036] Avg episode rewards: #0: 23.336, true rewards: #0: 10.479 [2023-02-26 19:25:01,186][08036] Avg episode reward: 23.336, avg true_objective: 10.479 [2023-02-26 19:25:01,263][08036] Num frames 7400... [2023-02-26 19:25:01,389][08036] Num frames 7500... [2023-02-26 19:25:01,504][08036] Num frames 7600... [2023-02-26 19:25:01,619][08036] Num frames 7700... [2023-02-26 19:25:01,740][08036] Num frames 7800... [2023-02-26 19:25:01,884][08036] Avg episode rewards: #0: 21.599, true rewards: #0: 9.849 [2023-02-26 19:25:01,886][08036] Avg episode reward: 21.599, avg true_objective: 9.849 [2023-02-26 19:25:01,914][08036] Num frames 7900... [2023-02-26 19:25:02,035][08036] Num frames 8000... [2023-02-26 19:25:02,154][08036] Num frames 8100... [2023-02-26 19:25:02,291][08036] Num frames 8200... [2023-02-26 19:25:02,442][08036] Num frames 8300... [2023-02-26 19:25:02,596][08036] Num frames 8400... 
[2023-02-26 19:25:02,755][08036] Num frames 8500... [2023-02-26 19:25:02,906][08036] Num frames 8600... [2023-02-26 19:25:03,076][08036] Num frames 8700... [2023-02-26 19:25:03,230][08036] Num frames 8800... [2023-02-26 19:25:03,386][08036] Num frames 8900... [2023-02-26 19:25:03,451][08036] Avg episode rewards: #0: 21.559, true rewards: #0: 9.892 [2023-02-26 19:25:03,453][08036] Avg episode reward: 21.559, avg true_objective: 9.892 [2023-02-26 19:25:03,614][08036] Num frames 9000... [2023-02-26 19:25:03,767][08036] Num frames 9100... [2023-02-26 19:25:03,920][08036] Num frames 9200... [2023-02-26 19:25:04,083][08036] Num frames 9300... [2023-02-26 19:25:04,239][08036] Num frames 9400... [2023-02-26 19:25:04,393][08036] Num frames 9500... [2023-02-26 19:25:04,548][08036] Num frames 9600... [2023-02-26 19:25:04,709][08036] Num frames 9700... [2023-02-26 19:25:04,826][08036] Avg episode rewards: #0: 21.235, true rewards: #0: 9.735 [2023-02-26 19:25:04,828][08036] Avg episode reward: 21.235, avg true_objective: 9.735 [2023-02-26 19:26:05,077][08036] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-26 19:28:30,483][08036] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 19:28:30,489][08036] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 19:28:30,494][08036] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 19:28:30,496][08036] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 19:28:30,502][08036] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 19:28:30,504][08036] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 19:28:30,506][08036] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2023-02-26 19:28:30,507][08036] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 19:28:30,509][08036] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-26 19:28:30,510][08036] Adding new argument 'hf_repository'='EdenYav/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-26 19:28:30,511][08036] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 19:28:30,512][08036] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 19:28:30,514][08036] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 19:28:30,515][08036] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 19:28:30,516][08036] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 19:28:30,544][08036] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 19:28:30,547][08036] RunningMeanStd input shape: (1,) [2023-02-26 19:28:30,568][08036] ConvEncoder: input_channels=3 [2023-02-26 19:28:30,635][08036] Conv encoder output size: 512 [2023-02-26 19:28:30,640][08036] Policy head output size: 512 [2023-02-26 19:28:30,694][08036] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 19:28:31,398][08036] Num frames 100... [2023-02-26 19:28:31,569][08036] Num frames 200... [2023-02-26 19:28:31,800][08036] Num frames 300... [2023-02-26 19:28:31,979][08036] Num frames 400... [2023-02-26 19:28:32,174][08036] Num frames 500... [2023-02-26 19:28:32,368][08036] Num frames 600... [2023-02-26 19:28:32,684][08036] Num frames 700... [2023-02-26 19:28:32,869][08036] Num frames 800... [2023-02-26 19:28:33,036][08036] Num frames 900... [2023-02-26 19:28:33,220][08036] Num frames 1000... [2023-02-26 19:28:33,384][08036] Num frames 1100... 
[2023-02-26 19:28:33,556][08036] Avg episode rewards: #0: 25.720, true rewards: #0: 11.720 [2023-02-26 19:28:33,563][08036] Avg episode reward: 25.720, avg true_objective: 11.720 [2023-02-26 19:28:33,625][08036] Num frames 1200... [2023-02-26 19:28:33,841][08036] Num frames 1300... [2023-02-26 19:28:34,060][08036] Num frames 1400... [2023-02-26 19:28:34,234][08036] Num frames 1500... [2023-02-26 19:28:34,407][08036] Num frames 1600... [2023-02-26 19:28:34,658][08036] Num frames 1700... [2023-02-26 19:28:34,876][08036] Num frames 1800... [2023-02-26 19:28:35,001][08036] Avg episode rewards: #0: 19.705, true rewards: #0: 9.205 [2023-02-26 19:28:35,003][08036] Avg episode reward: 19.705, avg true_objective: 9.205 [2023-02-26 19:28:35,105][08036] Num frames 1900... [2023-02-26 19:28:35,413][08036] Num frames 2000... [2023-02-26 19:28:35,671][08036] Num frames 2100... [2023-02-26 19:28:36,044][08036] Num frames 2200... [2023-02-26 19:28:36,250][08036] Num frames 2300... [2023-02-26 19:28:36,567][08036] Num frames 2400... [2023-02-26 19:28:36,853][08036] Num frames 2500... [2023-02-26 19:28:37,039][08036] Num frames 2600... [2023-02-26 19:28:37,211][08036] Num frames 2700... [2023-02-26 19:28:37,335][08036] Num frames 2800... [2023-02-26 19:28:37,447][08036] Num frames 2900... [2023-02-26 19:28:37,538][08036] Avg episode rewards: #0: 21.097, true rewards: #0: 9.763 [2023-02-26 19:28:37,540][08036] Avg episode reward: 21.097, avg true_objective: 9.763 [2023-02-26 19:28:37,620][08036] Num frames 3000... [2023-02-26 19:28:37,738][08036] Num frames 3100... [2023-02-26 19:28:37,858][08036] Num frames 3200... [2023-02-26 19:28:37,975][08036] Num frames 3300... [2023-02-26 19:28:38,094][08036] Num frames 3400... [2023-02-26 19:28:38,215][08036] Num frames 3500... [2023-02-26 19:28:38,331][08036] Num frames 3600... [2023-02-26 19:28:38,450][08036] Num frames 3700... [2023-02-26 19:28:38,565][08036] Num frames 3800... 
[2023-02-26 19:28:38,627][08036] Avg episode rewards: #0: 21.005, true rewards: #0: 9.505 [2023-02-26 19:28:38,629][08036] Avg episode reward: 21.005, avg true_objective: 9.505 [2023-02-26 19:28:38,752][08036] Num frames 3900... [2023-02-26 19:28:38,875][08036] Num frames 4000... [2023-02-26 19:28:39,033][08036] Num frames 4100... [2023-02-26 19:28:39,197][08036] Num frames 4200... [2023-02-26 19:28:39,352][08036] Num frames 4300... [2023-02-26 19:28:39,509][08036] Num frames 4400... [2023-02-26 19:28:39,668][08036] Num frames 4500... [2023-02-26 19:28:39,827][08036] Num frames 4600... [2023-02-26 19:28:39,925][08036] Avg episode rewards: #0: 20.446, true rewards: #0: 9.246 [2023-02-26 19:28:39,929][08036] Avg episode reward: 20.446, avg true_objective: 9.246 [2023-02-26 19:28:40,066][08036] Num frames 4700... [2023-02-26 19:28:40,220][08036] Num frames 4800... [2023-02-26 19:28:40,376][08036] Num frames 4900... [2023-02-26 19:28:40,534][08036] Num frames 5000... [2023-02-26 19:28:40,685][08036] Num frames 5100... [2023-02-26 19:28:40,848][08036] Num frames 5200... [2023-02-26 19:28:41,055][08036] Avg episode rewards: #0: 18.825, true rewards: #0: 8.825 [2023-02-26 19:28:41,058][08036] Avg episode reward: 18.825, avg true_objective: 8.825 [2023-02-26 19:28:41,071][08036] Num frames 5300... [2023-02-26 19:28:41,230][08036] Num frames 5400... [2023-02-26 19:28:41,393][08036] Num frames 5500... [2023-02-26 19:28:41,553][08036] Num frames 5600... [2023-02-26 19:28:41,714][08036] Num frames 5700... [2023-02-26 19:28:41,875][08036] Num frames 5800... [2023-02-26 19:28:42,042][08036] Num frames 5900... [2023-02-26 19:28:42,104][08036] Avg episode rewards: #0: 17.719, true rewards: #0: 8.433 [2023-02-26 19:28:42,107][08036] Avg episode reward: 17.719, avg true_objective: 8.433 [2023-02-26 19:28:42,259][08036] Num frames 6000... [2023-02-26 19:28:42,419][08036] Num frames 6100... [2023-02-26 19:28:42,570][08036] Num frames 6200... 
[2023-02-26 19:28:42,682][08036] Num frames 6300... [2023-02-26 19:28:42,797][08036] Num frames 6400... [2023-02-26 19:28:42,908][08036] Num frames 6500... [2023-02-26 19:28:43,029][08036] Num frames 6600... [2023-02-26 19:28:43,148][08036] Num frames 6700... [2023-02-26 19:28:43,279][08036] Avg episode rewards: #0: 17.959, true rewards: #0: 8.459 [2023-02-26 19:28:43,281][08036] Avg episode reward: 17.959, avg true_objective: 8.459 [2023-02-26 19:28:43,323][08036] Num frames 6800... [2023-02-26 19:28:43,440][08036] Num frames 6900... [2023-02-26 19:28:43,565][08036] Num frames 7000... [2023-02-26 19:28:43,676][08036] Num frames 7100... [2023-02-26 19:28:43,788][08036] Num frames 7200... [2023-02-26 19:28:43,902][08036] Num frames 7300... [2023-02-26 19:28:44,022][08036] Num frames 7400... [2023-02-26 19:28:44,135][08036] Num frames 7500... [2023-02-26 19:28:44,250][08036] Num frames 7600... [2023-02-26 19:28:44,408][08036] Avg episode rewards: #0: 18.106, true rewards: #0: 8.550 [2023-02-26 19:28:44,410][08036] Avg episode reward: 18.106, avg true_objective: 8.550 [2023-02-26 19:28:44,421][08036] Num frames 7700... [2023-02-26 19:28:44,541][08036] Num frames 7800... [2023-02-26 19:28:44,651][08036] Num frames 7900... [2023-02-26 19:28:44,764][08036] Num frames 8000... [2023-02-26 19:28:44,874][08036] Num frames 8100... [2023-02-26 19:28:44,991][08036] Num frames 8200... [2023-02-26 19:28:45,105][08036] Num frames 8300... [2023-02-26 19:28:45,228][08036] Num frames 8400... [2023-02-26 19:28:45,338][08036] Num frames 8500... [2023-02-26 19:28:45,452][08036] Num frames 8600... [2023-02-26 19:28:45,563][08036] Num frames 8700... [2023-02-26 19:28:45,704][08036] Avg episode rewards: #0: 18.371, true rewards: #0: 8.771 [2023-02-26 19:28:45,706][08036] Avg episode reward: 18.371, avg true_objective: 8.771 [2023-02-26 19:29:39,207][08036] Replay video saved to /content/train_dir/default_experiment/replay.mp4!