[2023-02-26 20:45:25,420][00236] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-26 20:45:25,424][00236] Rollout worker 0 uses device cpu
[2023-02-26 20:45:25,425][00236] Rollout worker 1 uses device cpu
[2023-02-26 20:45:25,429][00236] Rollout worker 2 uses device cpu
[2023-02-26 20:45:25,430][00236] Rollout worker 3 uses device cpu
[2023-02-26 20:45:25,431][00236] Rollout worker 4 uses device cpu
[2023-02-26 20:45:25,433][00236] Rollout worker 5 uses device cpu
[2023-02-26 20:45:25,434][00236] Rollout worker 6 uses device cpu
[2023-02-26 20:45:25,439][00236] Rollout worker 7 uses device cpu
[2023-02-26 20:45:25,660][00236] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 20:45:25,662][00236] InferenceWorker_p0-w0: min num requests: 2
[2023-02-26 20:45:25,703][00236] Starting all processes...
[2023-02-26 20:45:25,705][00236] Starting process learner_proc0
[2023-02-26 20:45:25,781][00236] Starting all processes...
[2023-02-26 20:45:25,794][00236] Starting process inference_proc0-0
[2023-02-26 20:45:25,807][00236] Starting process rollout_proc0
[2023-02-26 20:45:25,812][00236] Starting process rollout_proc1
[2023-02-26 20:45:25,812][00236] Starting process rollout_proc2
[2023-02-26 20:45:25,812][00236] Starting process rollout_proc3
[2023-02-26 20:45:25,812][00236] Starting process rollout_proc4
[2023-02-26 20:45:25,812][00236] Starting process rollout_proc5
[2023-02-26 20:45:25,812][00236] Starting process rollout_proc6
[2023-02-26 20:45:25,812][00236] Starting process rollout_proc7
[2023-02-26 20:45:37,224][10274] Worker 1 uses CPU cores [1]
[2023-02-26 20:45:37,374][10259] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 20:45:37,376][10259] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-26 20:45:37,739][10283] Worker 5 uses CPU cores [1]
[2023-02-26 20:45:37,751][10279] Worker 0 uses CPU cores [0]
[2023-02-26 20:45:37,763][10285] Worker 6 uses CPU cores [0]
[2023-02-26 20:45:37,813][10272] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 20:45:37,813][10272] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-26 20:45:37,860][10280] Worker 2 uses CPU cores [0]
[2023-02-26 20:45:37,925][10281] Worker 3 uses CPU cores [1]
[2023-02-26 20:45:38,033][10282] Worker 4 uses CPU cores [0]
[2023-02-26 20:45:38,129][10284] Worker 7 uses CPU cores [1]
[2023-02-26 20:45:38,360][10272] Num visible devices: 1
[2023-02-26 20:45:38,365][10259] Num visible devices: 1
[2023-02-26 20:45:38,375][10259] Starting seed is not provided
[2023-02-26 20:45:38,376][10259] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 20:45:38,377][10259] Initializing actor-critic model on device cuda:0
[2023-02-26 20:45:38,378][10259] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 20:45:38,380][10259] RunningMeanStd input shape: (1,)
[2023-02-26 20:45:38,399][10259] ConvEncoder: input_channels=3
[2023-02-26 20:45:38,906][10259] Conv encoder output size: 512
[2023-02-26 20:45:38,906][10259] Policy head output size: 512
[2023-02-26 20:45:39,045][10259] Created Actor Critic model with architecture:
[2023-02-26 20:45:39,045][10259] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-26 20:45:45,650][00236] Heartbeat connected on Batcher_0
[2023-02-26 20:45:45,660][00236] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-26 20:45:45,677][00236] Heartbeat connected on RolloutWorker_w0
[2023-02-26 20:45:45,679][00236] Heartbeat connected on RolloutWorker_w1
[2023-02-26 20:45:45,683][00236] Heartbeat connected on RolloutWorker_w2
[2023-02-26 20:45:45,687][00236] Heartbeat connected on RolloutWorker_w3
[2023-02-26 20:45:45,692][00236] Heartbeat connected on RolloutWorker_w4
[2023-02-26 20:45:45,695][00236] Heartbeat connected on RolloutWorker_w5
[2023-02-26 20:45:45,699][00236] Heartbeat connected on RolloutWorker_w6
[2023-02-26 20:45:45,703][00236] Heartbeat connected on RolloutWorker_w7
[2023-02-26 20:45:48,827][10259] Using optimizer
[2023-02-26 20:45:48,828][10259] No checkpoints found
[2023-02-26 20:45:48,828][10259] Did not load from checkpoint, starting from scratch!
[2023-02-26 20:45:48,829][10259] Initialized policy 0 weights for model version 0
[2023-02-26 20:45:48,836][10259] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 20:45:48,842][10259] LearnerWorker_p0 finished initialization!
[2023-02-26 20:45:48,843][00236] Heartbeat connected on LearnerWorker_p0
[2023-02-26 20:45:48,944][10272] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 20:45:48,946][10272] RunningMeanStd input shape: (1,)
[2023-02-26 20:45:48,960][10272] ConvEncoder: input_channels=3
[2023-02-26 20:45:49,060][10272] Conv encoder output size: 512
[2023-02-26 20:45:49,061][10272] Policy head output size: 512
[2023-02-26 20:45:50,568][00236] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 20:45:51,329][00236] Inference worker 0-0 is ready!
[2023-02-26 20:45:51,332][00236] All inference workers are ready! Signal rollout workers to start!
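The architecture dump above is Sample Factory's printout of the shared-weights actor-critic: a three-block Conv2d+ELU head, a Linear+ELU projection to 512 features, a GRU core, and 512-to-1 value / 512-to-5 action-logit heads. As a reading aid, here is a minimal PyTorch sketch of the same pipeline; the kernel sizes, strides, and channel counts are assumptions (the log prints only layer types and the 512-dim output size), chosen to map the logged 3x72x128 observation down to 512 features.

```python
import torch
from torch import nn

class ConvEncoderSketch(nn.Module):
    """Rough reconstruction of the logged encoder: three Conv2d+ELU blocks,
    then Linear+ELU to 512 features. Conv hyperparameters are assumptions;
    the log only shows layer types and the 512 output size."""
    def __init__(self, in_channels: int = 3, out_features: int = 512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer flattened conv output size from a dummy obs
            n_flat = self.conv_head(torch.zeros(1, in_channels, 72, 128)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, out_features), nn.ELU())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.mlp_layers(self.conv_head(obs).flatten(1))

class ActorCriticSketch(nn.Module):
    """Shared-weights actor-critic matching the dump: encoder -> GRU core ->
    identity decoder -> value head (512->1) and action logits (512->5)."""
    def __init__(self, num_actions: int = 5):
        super().__init__()
        self.encoder = ConvEncoderSketch()
        self.core = nn.GRU(512, 512)            # (core): GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)  # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        feats = self.encoder(obs).unsqueeze(0)        # add seq dim for the GRU
        core_out, new_state = self.core(feats, rnn_state)
        core_out = core_out.squeeze(0)
        return self.distribution_linear(core_out), self.critic_linear(core_out), new_state
```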
[2023-02-26 20:45:51,442][10284] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:51,463][10283] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:51,474][10281] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:51,478][10274] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:51,488][10282] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:51,494][10285] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:51,494][10279] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:51,513][10280] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 20:45:52,637][10279] Decorrelating experience for 0 frames...
[2023-02-26 20:45:52,638][10285] Decorrelating experience for 0 frames...
[2023-02-26 20:45:52,639][10282] Decorrelating experience for 0 frames...
[2023-02-26 20:45:52,636][10281] Decorrelating experience for 0 frames...
[2023-02-26 20:45:52,638][10284] Decorrelating experience for 0 frames...
[2023-02-26 20:45:52,639][10283] Decorrelating experience for 0 frames...
[2023-02-26 20:45:53,659][10280] Decorrelating experience for 0 frames...
[2023-02-26 20:45:53,662][10279] Decorrelating experience for 32 frames...
[2023-02-26 20:45:53,667][10285] Decorrelating experience for 32 frames...
[2023-02-26 20:45:53,682][10283] Decorrelating experience for 32 frames...
[2023-02-26 20:45:53,690][10281] Decorrelating experience for 32 frames...
[2023-02-26 20:45:53,694][10284] Decorrelating experience for 32 frames...
[2023-02-26 20:45:54,873][10274] Decorrelating experience for 0 frames...
[2023-02-26 20:45:55,105][10282] Decorrelating experience for 32 frames...
[2023-02-26 20:45:55,115][10281] Decorrelating experience for 64 frames...
[2023-02-26 20:45:55,127][10280] Decorrelating experience for 32 frames...
[2023-02-26 20:45:55,369][10285] Decorrelating experience for 64 frames...
[2023-02-26 20:45:55,568][00236] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 20:45:56,061][10282] Decorrelating experience for 64 frames...
[2023-02-26 20:45:56,244][10274] Decorrelating experience for 32 frames...
[2023-02-26 20:45:56,544][10284] Decorrelating experience for 64 frames...
[2023-02-26 20:45:56,685][10283] Decorrelating experience for 64 frames...
[2023-02-26 20:45:57,087][10280] Decorrelating experience for 64 frames...
[2023-02-26 20:45:57,481][10282] Decorrelating experience for 96 frames...
[2023-02-26 20:45:57,860][10274] Decorrelating experience for 64 frames...
[2023-02-26 20:45:58,168][10283] Decorrelating experience for 96 frames...
[2023-02-26 20:45:58,469][10284] Decorrelating experience for 96 frames...
[2023-02-26 20:45:58,774][10280] Decorrelating experience for 96 frames...
[2023-02-26 20:45:58,814][10274] Decorrelating experience for 96 frames...
[2023-02-26 20:45:59,201][10285] Decorrelating experience for 96 frames...
[2023-02-26 20:45:59,802][10279] Decorrelating experience for 64 frames...
[2023-02-26 20:46:00,254][10281] Decorrelating experience for 96 frames...
[2023-02-26 20:46:00,568][00236] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 20:46:00,738][10279] Decorrelating experience for 96 frames...
[2023-02-26 20:46:04,754][10259] Signal inference workers to stop experience collection...
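The "Decorrelating experience for N frames" entries show each rollout worker stepping its environments for a different warm-up length (0, 32, 64, or 96 frames here) before regular collection, so the parallel episodes start out of phase instead of all resetting in lockstep. A minimal sketch of the idea, assuming a Gym-style env API (reset/step/action_space are assumptions, not Sample Factory's internals):

```python
def decorrelate(envs, max_offset_frames=96, frames_per_chunk=32):
    """Step each parallel env a different number of warm-up frames so their
    episodes are out of phase before training starts (sketch of the idea)."""
    n_offsets = max_offset_frames // frames_per_chunk + 1  # 0/32/64/96 -> 4 offsets
    for worker_idx, env in enumerate(envs):
        offset = (worker_idx % n_offsets) * frames_per_chunk
        env.reset()
        for _ in range(offset):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                env.reset()
```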
[2023-02-26 20:46:04,770][10272] InferenceWorker_p0-w0: stopping experience collection
[2023-02-26 20:46:05,568][00236] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 76.4. Samples: 1146. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 20:46:05,570][00236] Avg episode reward: [(0, '1.951')]
[2023-02-26 20:46:07,417][10259] Signal inference workers to resume experience collection...
[2023-02-26 20:46:07,421][10272] InferenceWorker_p0-w0: resuming experience collection
[2023-02-26 20:46:10,568][00236] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 169.7. Samples: 3394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-26 20:46:10,571][00236] Avg episode reward: [(0, '3.283')]
[2023-02-26 20:46:15,568][00236] Fps is (10 sec: 3276.8, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 347.0. Samples: 8676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:46:15,576][00236] Avg episode reward: [(0, '3.698')]
[2023-02-26 20:46:18,394][10272] Updated weights for policy 0, policy_version 10 (0.0369)
[2023-02-26 20:46:20,568][00236] Fps is (10 sec: 2867.2, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 45056. Throughput: 0: 357.7. Samples: 10732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:46:20,571][00236] Avg episode reward: [(0, '4.312')]
[2023-02-26 20:46:25,568][00236] Fps is (10 sec: 3276.7, 60 sec: 1872.4, 300 sec: 1872.4). Total num frames: 65536. Throughput: 0: 443.3. Samples: 15514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:46:25,576][00236] Avg episode reward: [(0, '4.452')]
[2023-02-26 20:46:29,396][10272] Updated weights for policy 0, policy_version 20 (0.0013)
[2023-02-26 20:46:30,568][00236] Fps is (10 sec: 4096.0, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 549.7. Samples: 21988. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 20:46:30,571][00236] Avg episode reward: [(0, '4.402')]
[2023-02-26 20:46:35,568][00236] Fps is (10 sec: 3686.5, 60 sec: 2275.6, 300 sec: 2275.6). Total num frames: 102400. Throughput: 0: 557.0. Samples: 25064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:46:35,575][00236] Avg episode reward: [(0, '4.245')]
[2023-02-26 20:46:35,585][10259] Saving new best policy, reward=4.245!
[2023-02-26 20:46:40,568][00236] Fps is (10 sec: 2867.2, 60 sec: 2293.8, 300 sec: 2293.8). Total num frames: 114688. Throughput: 0: 648.4. Samples: 29178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:46:40,574][00236] Avg episode reward: [(0, '4.360')]
[2023-02-26 20:46:40,581][10259] Saving new best policy, reward=4.360!
[2023-02-26 20:46:42,490][10272] Updated weights for policy 0, policy_version 30 (0.0026)
[2023-02-26 20:46:45,568][00236] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 135168. Throughput: 0: 759.1. Samples: 34158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:46:45,574][00236] Avg episode reward: [(0, '4.464')]
[2023-02-26 20:46:45,577][10259] Saving new best policy, reward=4.464!
[2023-02-26 20:46:50,568][00236] Fps is (10 sec: 4095.9, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 805.3. Samples: 37384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-26 20:46:50,570][00236] Avg episode reward: [(0, '4.528')]
[2023-02-26 20:46:50,580][10259] Saving new best policy, reward=4.528!
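Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports frame throughput averaged over three trailing windows, which is why the very first report is nan (no history yet) and why the 60-sec and 300-sec values coincide early on (both fall back to the full elapsed history). A hedged sketch of how such windowed rates can be computed from (timestamp, total_frames) samples; the class and its names are illustrative, not Sample Factory's implementation:

```python
from collections import deque
import time

class FpsWindows:
    """Track (timestamp, total_frames) samples and report average FPS over
    trailing windows such as 10/60/300 seconds (illustrative sketch)."""
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (t, total_frames), oldest first

    def record(self, total_frames):
        now = time.time()
        self.samples.append((now, total_frames))
        # keep just enough history for the largest window
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def rates(self):
        now, frames = self.samples[-1]
        out = {}
        for w in self.windows:
            older = [s for s in self.samples if now - s[0] >= w]
            # fall back to the full available history, like the early log lines
            t0, f0 = older[-1] if older else self.samples[0]
            out[w] = (frames - f0) / (now - t0) if now > t0 else float("nan")
        return out
```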
[2023-02-26 20:46:52,049][10272] Updated weights for policy 0, policy_version 40 (0.0018)
[2023-02-26 20:46:55,568][00236] Fps is (10 sec: 4096.0, 60 sec: 2935.5, 300 sec: 2709.7). Total num frames: 176128. Throughput: 0: 889.5. Samples: 43422. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-26 20:46:55,575][00236] Avg episode reward: [(0, '4.411')]
[2023-02-26 20:47:00,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2691.7). Total num frames: 188416. Throughput: 0: 867.8. Samples: 47726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-26 20:47:00,574][00236] Avg episode reward: [(0, '4.475')]
[2023-02-26 20:47:05,042][10272] Updated weights for policy 0, policy_version 50 (0.0016)
[2023-02-26 20:47:05,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 2730.7). Total num frames: 204800. Throughput: 0: 868.9. Samples: 49834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:47:05,571][00236] Avg episode reward: [(0, '4.409')]
[2023-02-26 20:47:10,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 2867.2). Total num frames: 229376. Throughput: 0: 911.7. Samples: 56538. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 20:47:10,576][00236] Avg episode reward: [(0, '4.396')]
[2023-02-26 20:47:14,535][10272] Updated weights for policy 0, policy_version 60 (0.0023)
[2023-02-26 20:47:15,574][00236] Fps is (10 sec: 4093.3, 60 sec: 3549.5, 300 sec: 2891.1). Total num frames: 245760. Throughput: 0: 896.5. Samples: 62338. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 20:47:15,578][00236] Avg episode reward: [(0, '4.549')]
[2023-02-26 20:47:15,580][10259] Saving new best policy, reward=4.549!
[2023-02-26 20:47:20,571][00236] Fps is (10 sec: 3275.8, 60 sec: 3618.0, 300 sec: 2912.6). Total num frames: 262144. Throughput: 0: 875.1. Samples: 64448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-26 20:47:20,574][00236] Avg episode reward: [(0, '4.617')]
[2023-02-26 20:47:20,588][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth...
[2023-02-26 20:47:20,747][10259] Saving new best policy, reward=4.617!
[2023-02-26 20:47:25,568][00236] Fps is (10 sec: 2869.1, 60 sec: 3481.6, 300 sec: 2888.8). Total num frames: 274432. Throughput: 0: 876.8. Samples: 68632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 20:47:25,576][00236] Avg episode reward: [(0, '4.465')]
[2023-02-26 20:47:27,536][10272] Updated weights for policy 0, policy_version 70 (0.0018)
[2023-02-26 20:47:30,568][00236] Fps is (10 sec: 3277.8, 60 sec: 3481.6, 300 sec: 2949.1). Total num frames: 294912. Throughput: 0: 902.5. Samples: 74770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:47:30,577][00236] Avg episode reward: [(0, '4.495')]
[2023-02-26 20:47:35,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3003.7). Total num frames: 315392. Throughput: 0: 900.2. Samples: 77892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:47:35,571][00236] Avg episode reward: [(0, '4.521')]
[2023-02-26 20:47:39,471][10272] Updated weights for policy 0, policy_version 80 (0.0015)
[2023-02-26 20:47:40,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 2978.9). Total num frames: 327680. Throughput: 0: 865.9. Samples: 82388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:47:40,574][00236] Avg episode reward: [(0, '4.596')]
[2023-02-26 20:47:45,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 2991.9). Total num frames: 344064. Throughput: 0: 868.6. Samples: 86812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:47:45,570][00236] Avg episode reward: [(0, '4.485')]
[2023-02-26 20:47:50,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3037.9). Total num frames: 364544. Throughput: 0: 896.0. Samples: 90156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:47:50,575][00236] Avg episode reward: [(0, '4.349')]
[2023-02-26 20:47:50,827][10272] Updated weights for policy 0, policy_version 90 (0.0037)
[2023-02-26 20:47:55,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3080.2). Total num frames: 385024. Throughput: 0: 875.7. Samples: 95944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:47:55,574][00236] Avg episode reward: [(0, '4.555')]
[2023-02-26 20:48:00,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3056.2). Total num frames: 397312. Throughput: 0: 843.4. Samples: 100284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:48:00,572][00236] Avg episode reward: [(0, '4.656')]
[2023-02-26 20:48:00,590][10259] Saving new best policy, reward=4.656!
[2023-02-26 20:48:04,347][10272] Updated weights for policy 0, policy_version 100 (0.0013)
[2023-02-26 20:48:05,568][00236] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3034.1). Total num frames: 409600. Throughput: 0: 840.0. Samples: 102246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:48:05,570][00236] Avg episode reward: [(0, '4.661')]
[2023-02-26 20:48:05,590][10259] Saving new best policy, reward=4.661!
[2023-02-26 20:48:10,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3101.3). Total num frames: 434176. Throughput: 0: 876.9. Samples: 108092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:48:10,575][00236] Avg episode reward: [(0, '4.730')]
[2023-02-26 20:48:10,590][10259] Saving new best policy, reward=4.730!
[2023-02-26 20:48:14,295][10272] Updated weights for policy 0, policy_version 110 (0.0019)
[2023-02-26 20:48:15,575][00236] Fps is (10 sec: 4502.2, 60 sec: 3481.5, 300 sec: 3135.4). Total num frames: 454656. Throughput: 0: 876.4. Samples: 114216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:48:15,578][00236] Avg episode reward: [(0, '4.699')]
[2023-02-26 20:48:20,570][00236] Fps is (10 sec: 3276.0, 60 sec: 3413.4, 300 sec: 3112.9). Total num frames: 466944. Throughput: 0: 850.8. Samples: 116178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:48:20,577][00236] Avg episode reward: [(0, '4.756')]
[2023-02-26 20:48:20,589][10259] Saving new best policy, reward=4.756!
[2023-02-26 20:48:25,568][00236] Fps is (10 sec: 2459.3, 60 sec: 3413.3, 300 sec: 3091.8). Total num frames: 479232. Throughput: 0: 833.5. Samples: 119894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:48:25,573][00236] Avg episode reward: [(0, '4.807')]
[2023-02-26 20:48:25,581][10259] Saving new best policy, reward=4.807!
[2023-02-26 20:48:28,508][10272] Updated weights for policy 0, policy_version 120 (0.0022)
[2023-02-26 20:48:30,568][00236] Fps is (10 sec: 3277.6, 60 sec: 3413.3, 300 sec: 3123.2). Total num frames: 499712. Throughput: 0: 856.8. Samples: 125370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:48:30,571][00236] Avg episode reward: [(0, '4.677')]
[2023-02-26 20:48:35,568][00236] Fps is (10 sec: 4096.2, 60 sec: 3413.3, 300 sec: 3152.7). Total num frames: 520192. Throughput: 0: 850.1. Samples: 128412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:48:35,570][00236] Avg episode reward: [(0, '4.738')]
[2023-02-26 20:48:40,561][10272] Updated weights for policy 0, policy_version 130 (0.0015)
[2023-02-26 20:48:40,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3132.2). Total num frames: 532480. Throughput: 0: 823.9. Samples: 133018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:48:40,574][00236] Avg episode reward: [(0, '4.908')]
[2023-02-26 20:48:40,585][10259] Saving new best policy, reward=4.908!
[2023-02-26 20:48:45,568][00236] Fps is (10 sec: 2457.4, 60 sec: 3345.0, 300 sec: 3112.9). Total num frames: 544768. Throughput: 0: 809.3. Samples: 136702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:48:45,576][00236] Avg episode reward: [(0, '5.019')]
[2023-02-26 20:48:45,581][10259] Saving new best policy, reward=5.019!
[2023-02-26 20:48:50,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3117.5). Total num frames: 561152. Throughput: 0: 823.2. Samples: 139288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:48:50,576][00236] Avg episode reward: [(0, '4.926')]
[2023-02-26 20:48:53,167][10272] Updated weights for policy 0, policy_version 140 (0.0028)
[2023-02-26 20:48:55,568][00236] Fps is (10 sec: 3686.6, 60 sec: 3276.8, 300 sec: 3144.0). Total num frames: 581632. Throughput: 0: 824.0. Samples: 145170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:48:55,577][00236] Avg episode reward: [(0, '4.990')]
[2023-02-26 20:49:00,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3147.5). Total num frames: 598016. Throughput: 0: 789.4. Samples: 149732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:49:00,575][00236] Avg episode reward: [(0, '5.016')]
[2023-02-26 20:49:05,568][00236] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3108.8). Total num frames: 606208. Throughput: 0: 787.0. Samples: 151592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-26 20:49:05,576][00236] Avg episode reward: [(0, '4.855')]
[2023-02-26 20:49:07,262][10272] Updated weights for policy 0, policy_version 150 (0.0030)
[2023-02-26 20:49:10,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3133.4). Total num frames: 626688. Throughput: 0: 808.3. Samples: 156268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:49:10,570][00236] Avg episode reward: [(0, '5.003')]
[2023-02-26 20:49:15,568][00236] Fps is (10 sec: 4095.9, 60 sec: 3208.9, 300 sec: 3156.9). Total num frames: 647168. Throughput: 0: 822.4. Samples: 162380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 20:49:15,576][00236] Avg episode reward: [(0, '5.146')]
[2023-02-26 20:49:15,582][10259] Saving new best policy, reward=5.146!
[2023-02-26 20:49:17,522][10272] Updated weights for policy 0, policy_version 160 (0.0016)
[2023-02-26 20:49:20,568][00236] Fps is (10 sec: 3276.7, 60 sec: 3208.6, 300 sec: 3140.3). Total num frames: 659456. Throughput: 0: 808.7. Samples: 164802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:49:20,571][00236] Avg episode reward: [(0, '5.182')]
[2023-02-26 20:49:20,589][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000161_659456.pth...
[2023-02-26 20:49:20,755][10259] Saving new best policy, reward=5.182!
[2023-02-26 20:49:25,568][00236] Fps is (10 sec: 2457.6, 60 sec: 3208.6, 300 sec: 3124.4). Total num frames: 671744. Throughput: 0: 786.8. Samples: 168426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:49:25,575][00236] Avg episode reward: [(0, '5.391')]
[2023-02-26 20:49:25,578][10259] Saving new best policy, reward=5.391!
[2023-02-26 20:49:30,568][00236] Fps is (10 sec: 2867.4, 60 sec: 3140.3, 300 sec: 3127.9). Total num frames: 688128. Throughput: 0: 815.0. Samples: 173378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:49:30,571][00236] Avg episode reward: [(0, '5.464')]
[2023-02-26 20:49:30,580][10259] Saving new best policy, reward=5.464!
[2023-02-26 20:49:31,680][10272] Updated weights for policy 0, policy_version 170 (0.0022)
[2023-02-26 20:49:35,568][00236] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 3149.4). Total num frames: 708608. Throughput: 0: 823.1. Samples: 176326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:49:35,571][00236] Avg episode reward: [(0, '5.225')]
[2023-02-26 20:49:40,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3152.1). Total num frames: 724992. Throughput: 0: 810.8. Samples: 181654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:49:40,571][00236] Avg episode reward: [(0, '5.262')]
[2023-02-26 20:49:44,103][10272] Updated weights for policy 0, policy_version 180 (0.0017)
[2023-02-26 20:49:45,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3137.4). Total num frames: 737280. Throughput: 0: 793.6. Samples: 185444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:49:45,571][00236] Avg episode reward: [(0, '5.510')]
[2023-02-26 20:49:45,574][10259] Saving new best policy, reward=5.510!
[2023-02-26 20:49:50,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3140.3). Total num frames: 753664. Throughput: 0: 793.7. Samples: 187308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:49:50,577][00236] Avg episode reward: [(0, '5.983')]
[2023-02-26 20:49:50,590][10259] Saving new best policy, reward=5.983!
[2023-02-26 20:49:55,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3159.8). Total num frames: 774144. Throughput: 0: 821.5. Samples: 193234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:49:55,571][00236] Avg episode reward: [(0, '6.247')]
[2023-02-26 20:49:55,575][10259] Saving new best policy, reward=6.247!
[2023-02-26 20:49:56,174][10272] Updated weights for policy 0, policy_version 190 (0.0016)
[2023-02-26 20:50:00,568][00236] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3162.1). Total num frames: 790528. Throughput: 0: 810.8. Samples: 198866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:50:00,575][00236] Avg episode reward: [(0, '6.410')]
[2023-02-26 20:50:00,588][10259] Saving new best policy, reward=6.410!
[2023-02-26 20:50:05,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3164.4). Total num frames: 806912. Throughput: 0: 802.5. Samples: 200916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:50:05,576][00236] Avg episode reward: [(0, '6.207')]
[2023-02-26 20:50:09,370][10272] Updated weights for policy 0, policy_version 200 (0.0013)
[2023-02-26 20:50:10,568][00236] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3166.5). Total num frames: 823296. Throughput: 0: 816.2. Samples: 205154. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:50:10,570][00236] Avg episode reward: [(0, '6.381')]
[2023-02-26 20:50:15,568][00236] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3168.6). Total num frames: 839680. Throughput: 0: 828.2. Samples: 210648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 20:50:15,571][00236] Avg episode reward: [(0, '6.367')]
[2023-02-26 20:50:20,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3155.4). Total num frames: 851968. Throughput: 0: 809.0. Samples: 212730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:50:20,571][00236] Avg episode reward: [(0, '6.339')]
[2023-02-26 20:50:23,479][10272] Updated weights for policy 0, policy_version 210 (0.0033)
[2023-02-26 20:50:25,572][00236] Fps is (10 sec: 2456.6, 60 sec: 3208.3, 300 sec: 3142.7). Total num frames: 864256. Throughput: 0: 764.7. Samples: 216070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:50:25,578][00236] Avg episode reward: [(0, '6.587')]
[2023-02-26 20:50:25,580][10259] Saving new best policy, reward=6.587!
[2023-02-26 20:50:30,568][00236] Fps is (10 sec: 2457.5, 60 sec: 3140.2, 300 sec: 3130.5). Total num frames: 876544. Throughput: 0: 767.5. Samples: 219982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:50:30,574][00236] Avg episode reward: [(0, '6.527')]
[2023-02-26 20:50:35,568][00236] Fps is (10 sec: 3278.2, 60 sec: 3140.3, 300 sec: 3147.5). Total num frames: 897024. Throughput: 0: 785.9. Samples: 222674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 20:50:35,571][00236] Avg episode reward: [(0, '6.645')]
[2023-02-26 20:50:35,576][10259] Saving new best policy, reward=6.645!
[2023-02-26 20:50:36,298][10272] Updated weights for policy 0, policy_version 220 (0.0034)
[2023-02-26 20:50:40,568][00236] Fps is (10 sec: 4095.9, 60 sec: 3208.5, 300 sec: 3163.8). Total num frames: 917504. Throughput: 0: 798.3. Samples: 229158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:50:40,574][00236] Avg episode reward: [(0, '6.848')]
[2023-02-26 20:50:40,589][10259] Saving new best policy, reward=6.848!
[2023-02-26 20:50:45,568][00236] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3165.7). Total num frames: 933888. Throughput: 0: 789.4. Samples: 234390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:50:45,576][00236] Avg episode reward: [(0, '6.992')]
[2023-02-26 20:50:45,584][10259] Saving new best policy, reward=6.992!
[2023-02-26 20:50:47,938][10272] Updated weights for policy 0, policy_version 230 (0.0035)
[2023-02-26 20:50:50,568][00236] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 946176. Throughput: 0: 787.6. Samples: 236360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:50:50,570][00236] Avg episode reward: [(0, '6.836')]
[2023-02-26 20:50:55,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 966656. Throughput: 0: 799.6. Samples: 241134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:50:55,575][00236] Avg episode reward: [(0, '7.031')]
[2023-02-26 20:50:55,581][10259] Saving new best policy, reward=7.031!
[2023-02-26 20:50:59,069][10272] Updated weights for policy 0, policy_version 240 (0.0038)
[2023-02-26 20:51:00,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 987136. Throughput: 0: 820.3. Samples: 247562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:51:00,571][00236] Avg episode reward: [(0, '7.357')]
[2023-02-26 20:51:00,585][10259] Saving new best policy, reward=7.357!
[2023-02-26 20:51:05,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1003520. Throughput: 0: 844.9. Samples: 250752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:51:05,571][00236] Avg episode reward: [(0, '8.053')]
[2023-02-26 20:51:05,580][10259] Saving new best policy, reward=8.053!
[2023-02-26 20:51:10,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1019904. Throughput: 0: 863.0. Samples: 254902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:51:10,574][00236] Avg episode reward: [(0, '8.127')]
[2023-02-26 20:51:10,594][10259] Saving new best policy, reward=8.127!
[2023-02-26 20:51:11,757][10272] Updated weights for policy 0, policy_version 250 (0.0012)
[2023-02-26 20:51:15,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 1036288. Throughput: 0: 881.1. Samples: 259630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:51:15,576][00236] Avg episode reward: [(0, '7.888')]
[2023-02-26 20:51:20,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 1056768. Throughput: 0: 893.2. Samples: 262870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 20:51:20,571][00236] Avg episode reward: [(0, '7.560')]
[2023-02-26 20:51:20,581][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000258_1056768.pth...
[2023-02-26 20:51:20,699][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth
[2023-02-26 20:51:22,143][10272] Updated weights for policy 0, policy_version 260 (0.0029)
[2023-02-26 20:51:25,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3550.1, 300 sec: 3360.1). Total num frames: 1077248. Throughput: 0: 890.8. Samples: 269244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:51:25,574][00236] Avg episode reward: [(0, '8.356')]
[2023-02-26 20:51:25,580][10259] Saving new best policy, reward=8.356!
[2023-02-26 20:51:30,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3346.2). Total num frames: 1089536. Throughput: 0: 867.5. Samples: 273428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:51:30,574][00236] Avg episode reward: [(0, '8.373')]
[2023-02-26 20:51:30,590][10259] Saving new best policy, reward=8.373!
[2023-02-26 20:51:35,275][10272] Updated weights for policy 0, policy_version 270 (0.0022)
[2023-02-26 20:51:35,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 1105920. Throughput: 0: 870.3. Samples: 275522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:51:35,571][00236] Avg episode reward: [(0, '8.004')]
[2023-02-26 20:51:40,568][00236] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 1130496. Throughput: 0: 901.6. Samples: 281706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:51:40,570][00236] Avg episode reward: [(0, '8.056')]
[2023-02-26 20:51:44,246][10272] Updated weights for policy 0, policy_version 280 (0.0020)
[2023-02-26 20:51:45,574][00236] Fps is (10 sec: 4093.3, 60 sec: 3549.5, 300 sec: 3360.0). Total num frames: 1146880. Throughput: 0: 902.2. Samples: 288168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:51:45,577][00236] Avg episode reward: [(0, '8.363')]
[2023-02-26 20:51:50,568][00236] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3346.2). Total num frames: 1163264. Throughput: 0: 878.0. Samples: 290262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 20:51:50,574][00236] Avg episode reward: [(0, '8.466')]
[2023-02-26 20:51:50,585][10259] Saving new best policy, reward=8.466!
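The paired "Saving .../checkpoint_<version>_<frames>.pth" and "Removing .../checkpoint_<older>.pth" entries above show a keep-last-N checkpoint rotation (N=2 in this run), with a separate best-policy checkpoint written whenever the average episode reward improves. A minimal sketch of that bookkeeping; the function names and the best-checkpoint filename are assumptions, not Sample Factory's actual code:

```python
import os
import torch

def save_with_rotation(state_dict, ckpt_dir, policy_version, env_frames,
                       existing, keep_last=2):
    """Save checkpoint_<version>_<frames>.pth and delete the oldest one
    once more than keep_last are on disk (sketch of the logged behavior)."""
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_frames}.pth")
    torch.save(state_dict, path)
    existing.append(path)
    while len(existing) > keep_last:
        os.remove(existing.pop(0))  # e.g. "Removing .../checkpoint_000000064_262144.pth"
    return path

def maybe_save_best(state_dict, ckpt_dir, reward, best_reward):
    """Write a separate best-policy checkpoint when reward improves
    ("Saving new best policy, reward=...!"); filename is an assumption."""
    if reward > best_reward:
        torch.save(state_dict, os.path.join(ckpt_dir, "best_policy.pth"))
        return reward
    return best_reward
```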
[2023-02-26 20:51:55,568][00236] Fps is (10 sec: 3279.0, 60 sec: 3549.9, 300 sec: 3360.1). Total num frames: 1179648. Throughput: 0: 877.6. Samples: 294394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:51:55,570][00236] Avg episode reward: [(0, '7.971')]
[2023-02-26 20:51:57,433][10272] Updated weights for policy 0, policy_version 290 (0.0017)
[2023-02-26 20:52:00,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 1200128. Throughput: 0: 911.4. Samples: 300642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:52:00,570][00236] Avg episode reward: [(0, '7.794')]
[2023-02-26 20:52:05,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3360.1). Total num frames: 1220608. Throughput: 0: 913.7. Samples: 303986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:52:05,571][00236] Avg episode reward: [(0, '8.112')]
[2023-02-26 20:52:07,458][10272] Updated weights for policy 0, policy_version 300 (0.0015)
[2023-02-26 20:52:10,570][00236] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3360.2). Total num frames: 1236992. Throughput: 0: 886.4. Samples: 309132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:52:10,574][00236] Avg episode reward: [(0, '8.770')]
[2023-02-26 20:52:10,590][10259] Saving new best policy, reward=8.770!
[2023-02-26 20:52:15,568][00236] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3346.3). Total num frames: 1249280. Throughput: 0: 883.7. Samples: 313194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:52:15,573][00236] Avg episode reward: [(0, '8.602')]
[2023-02-26 20:52:20,088][10272] Updated weights for policy 0, policy_version 310 (0.0022)
[2023-02-26 20:52:20,568][00236] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 1269760. Throughput: 0: 902.0. Samples: 316114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:52:20,576][00236] Avg episode reward: [(0, '9.389')]
[2023-02-26 20:52:20,588][10259] Saving new best policy, reward=9.389!
[2023-02-26 20:52:25,568][00236] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 1290240. Throughput: 0: 909.6. Samples: 322638. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:52:25,571][00236] Avg episode reward: [(0, '9.863')]
[2023-02-26 20:52:25,614][10259] Saving new best policy, reward=9.863!
[2023-02-26 20:52:30,571][00236] Fps is (10 sec: 3685.4, 60 sec: 3618.0, 300 sec: 3360.1). Total num frames: 1306624. Throughput: 0: 874.8. Samples: 327532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:52:30,576][00236] Avg episode reward: [(0, '9.666')]
[2023-02-26 20:52:31,287][10272] Updated weights for policy 0, policy_version 320 (0.0017)
[2023-02-26 20:52:35,568][00236] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3360.1). Total num frames: 1318912. Throughput: 0: 875.1. Samples: 329640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:52:35,569][00236] Avg episode reward: [(0, '9.877')]
[2023-02-26 20:52:35,581][10259] Saving new best policy, reward=9.877!
[2023-02-26 20:52:40,568][00236] Fps is (10 sec: 3277.7, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 1339392. Throughput: 0: 899.4. Samples: 334868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:52:40,576][00236] Avg episode reward: [(0, '10.233')]
[2023-02-26 20:52:40,587][10259] Saving new best policy, reward=10.233!
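The "Updated weights for policy 0, policy_version N" entries come from the inference worker pulling fresh learner weights (roughly every 10 versions in this run), and the "Policy #0 lag" triple in each FPS line summarizes how many versions behind the current weights the collected samples were; -1.0 appears before any samples exist. A toy illustration of that statistic (the function is hypothetical, for reading the log only):

```python
def policy_lag(current_version, versions_used):
    """Lag = how many policy versions old the collected samples are,
    relative to the learner's current weights (illustrative)."""
    if not versions_used:
        return {"min": -1.0, "avg": -1.0, "max": -1.0}  # matches the startup lines
    lags = [current_version - v for v in versions_used]
    return {"min": min(lags), "avg": sum(lags) / len(lags), "max": max(lags)}
```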
[2023-02-26 20:52:42,737][10272] Updated weights for policy 0, policy_version 330 (0.0027) [2023-02-26 20:52:45,568][00236] Fps is (10 sec: 4505.6, 60 sec: 3618.5, 300 sec: 3387.9). Total num frames: 1363968. Throughput: 0: 906.7. Samples: 341444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:52:45,575][00236] Avg episode reward: [(0, '10.815')] [2023-02-26 20:52:45,578][10259] Saving new best policy, reward=10.815! [2023-02-26 20:52:50,568][00236] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3374.0). Total num frames: 1380352. Throughput: 0: 890.2. Samples: 344044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:52:50,575][00236] Avg episode reward: [(0, '10.939')] [2023-02-26 20:52:50,586][10259] Saving new best policy, reward=10.939! [2023-02-26 20:52:55,176][10272] Updated weights for policy 0, policy_version 340 (0.0050) [2023-02-26 20:52:55,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 1392640. Throughput: 0: 866.5. Samples: 348122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:52:55,575][00236] Avg episode reward: [(0, '11.469')] [2023-02-26 20:52:55,578][10259] Saving new best policy, reward=11.469! [2023-02-26 20:53:00,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1413120. Throughput: 0: 898.4. Samples: 353622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 20:53:00,571][00236] Avg episode reward: [(0, '12.253')] [2023-02-26 20:53:00,582][10259] Saving new best policy, reward=12.253! [2023-02-26 20:53:05,298][10272] Updated weights for policy 0, policy_version 350 (0.0016) [2023-02-26 20:53:05,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 1433600. Throughput: 0: 903.6. Samples: 356776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:53:05,570][00236] Avg episode reward: [(0, '13.134')] [2023-02-26 20:53:05,577][10259] Saving new best policy, reward=13.134! [2023-02-26 20:53:10,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3374.1). Total num frames: 1449984. Throughput: 0: 888.9. Samples: 362638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:53:10,575][00236] Avg episode reward: [(0, '14.473')] [2023-02-26 20:53:10,588][10259] Saving new best policy, reward=14.473! [2023-02-26 20:53:15,570][00236] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3387.9). Total num frames: 1466368. Throughput: 0: 873.8. Samples: 366854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 20:53:15,575][00236] Avg episode reward: [(0, '14.705')] [2023-02-26 20:53:15,578][10259] Saving new best policy, reward=14.705! [2023-02-26 20:53:18,444][10272] Updated weights for policy 0, policy_version 360 (0.0043) [2023-02-26 20:53:20,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1482752. Throughput: 0: 874.2. Samples: 368980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 20:53:20,573][00236] Avg episode reward: [(0, '14.123')] [2023-02-26 20:53:20,588][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000362_1482752.pth... [2023-02-26 20:53:20,750][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000161_659456.pth [2023-02-26 20:53:25,568][00236] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1503232. Throughput: 0: 900.0. Samples: 375366. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 20:53:25,574][00236] Avg episode reward: [(0, '14.295')] [2023-02-26 20:53:27,982][10272] Updated weights for policy 0, policy_version 370 (0.0023) [2023-02-26 20:53:30,570][00236] Fps is (10 sec: 4095.3, 60 sec: 3618.2, 300 sec: 3401.7). Total num frames: 1523712. Throughput: 0: 885.3. Samples: 381284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 20:53:30,577][00236] Avg episode reward: [(0, '14.213')] [2023-02-26 20:53:35,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3401.8). Total num frames: 1536000. Throughput: 0: 874.8. Samples: 383412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 20:53:35,571][00236] Avg episode reward: [(0, '14.110')] [2023-02-26 20:53:40,568][00236] Fps is (10 sec: 2867.7, 60 sec: 3549.9, 300 sec: 3415.7). Total num frames: 1552384. Throughput: 0: 879.9. Samples: 387716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:53:40,574][00236] Avg episode reward: [(0, '15.201')] [2023-02-26 20:53:40,586][10259] Saving new best policy, reward=15.201! [2023-02-26 20:53:41,287][10272] Updated weights for policy 0, policy_version 380 (0.0025) [2023-02-26 20:53:45,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 1572864. Throughput: 0: 901.2. Samples: 394174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 20:53:45,577][00236] Avg episode reward: [(0, '14.564')] [2023-02-26 20:53:50,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 1593344. Throughput: 0: 902.8. Samples: 397400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:53:50,572][00236] Avg episode reward: [(0, '14.843')] [2023-02-26 20:53:51,697][10272] Updated weights for policy 0, policy_version 390 (0.0019) [2023-02-26 20:53:55,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3415.6). Total num frames: 1605632. Throughput: 0: 872.0. Samples: 401876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 20:53:55,572][00236] Avg episode reward: [(0, '14.842')] [2023-02-26 20:54:00,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1622016. Throughput: 0: 872.1. Samples: 406098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:54:00,571][00236] Avg episode reward: [(0, '14.793')] [2023-02-26 20:54:04,102][10272] Updated weights for policy 0, policy_version 400 (0.0014) [2023-02-26 20:54:05,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1642496. Throughput: 0: 896.4. Samples: 409320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:54:05,571][00236] Avg episode reward: [(0, '14.875')] [2023-02-26 20:54:10,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3443.4). Total num frames: 1662976. Throughput: 0: 899.6. Samples: 415848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:54:10,571][00236] Avg episode reward: [(0, '16.254')] [2023-02-26 20:54:10,584][10259] Saving new best policy, reward=16.254! [2023-02-26 20:54:15,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3443.4). Total num frames: 1675264. Throughput: 0: 865.7. Samples: 420238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 20:54:15,573][00236] Avg episode reward: [(0, '16.083')] [2023-02-26 20:54:15,599][10272] Updated weights for policy 0, policy_version 410 (0.0016) [2023-02-26 20:54:20,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1691648. 
Throughput: 0: 866.3. Samples: 422396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 20:54:20,575][00236] Avg episode reward: [(0, '16.784')] [2023-02-26 20:54:20,588][10259] Saving new best policy, reward=16.784! [2023-02-26 20:54:25,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1712128. Throughput: 0: 887.3. Samples: 427646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:54:25,571][00236] Avg episode reward: [(0, '17.262')] [2023-02-26 20:54:25,576][10259] Saving new best policy, reward=17.262! [2023-02-26 20:54:27,123][10272] Updated weights for policy 0, policy_version 420 (0.0024) [2023-02-26 20:54:30,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 1732608. Throughput: 0: 888.0. Samples: 434132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:54:30,574][00236] Avg episode reward: [(0, '18.238')] [2023-02-26 20:54:30,585][10259] Saving new best policy, reward=18.238! [2023-02-26 20:54:35,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 1748992. Throughput: 0: 867.2. Samples: 436426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 20:54:35,573][00236] Avg episode reward: [(0, '17.820')] [2023-02-26 20:54:39,579][10272] Updated weights for policy 0, policy_version 430 (0.0021) [2023-02-26 20:54:40,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1761280. Throughput: 0: 863.2. Samples: 440720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:54:40,571][00236] Avg episode reward: [(0, '18.479')] [2023-02-26 20:54:40,579][10259] Saving new best policy, reward=18.479! [2023-02-26 20:54:45,568][00236] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3457.3). Total num frames: 1773568. Throughput: 0: 858.4. Samples: 444728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 20:54:45,571][00236] Avg episode reward: [(0, '18.144')] [2023-02-26 20:54:50,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3443.4). Total num frames: 1789952. Throughput: 0: 832.6. Samples: 446786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:54:50,571][00236] Avg episode reward: [(0, '18.691')] [2023-02-26 20:54:50,588][10259] Saving new best policy, reward=18.691! [2023-02-26 20:54:54,708][10272] Updated weights for policy 0, policy_version 440 (0.0015) [2023-02-26 20:54:55,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3429.5). Total num frames: 1802240. Throughput: 0: 777.8. Samples: 450848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:54:55,575][00236] Avg episode reward: [(0, '18.166')] [2023-02-26 20:55:00,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3429.5). Total num frames: 1818624. Throughput: 0: 774.6. Samples: 455096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:55:00,575][00236] Avg episode reward: [(0, '17.571')] [2023-02-26 20:55:05,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3429.5). Total num frames: 1835008. Throughput: 0: 780.4. Samples: 457514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:55:05,571][00236] Avg episode reward: [(0, '16.932')] [2023-02-26 20:55:06,788][10272] Updated weights for policy 0, policy_version 450 (0.0022) [2023-02-26 20:55:10,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3443.4). Total num frames: 1855488. Throughput: 0: 809.8. Samples: 464086. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:55:10,571][00236] Avg episode reward: [(0, '16.848')] [2023-02-26 20:55:15,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3457.3). Total num frames: 1871872. Throughput: 0: 787.2. Samples: 469556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:55:15,570][00236] Avg episode reward: [(0, '16.292')] [2023-02-26 20:55:18,362][10272] Updated weights for policy 0, policy_version 460 (0.0022) [2023-02-26 20:55:20,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3471.2). Total num frames: 1888256. Throughput: 0: 783.1. Samples: 471666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 20:55:20,571][00236] Avg episode reward: [(0, '16.221')] [2023-02-26 20:55:20,586][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000461_1888256.pth... [2023-02-26 20:55:20,791][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000258_1056768.pth [2023-02-26 20:55:25,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3485.1). Total num frames: 1904640. Throughput: 0: 781.6. Samples: 475894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:55:25,570][00236] Avg episode reward: [(0, '16.321')] [2023-02-26 20:55:29,729][10272] Updated weights for policy 0, policy_version 470 (0.0027) [2023-02-26 20:55:30,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3485.1). Total num frames: 1925120. Throughput: 0: 838.3. Samples: 482452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:55:30,571][00236] Avg episode reward: [(0, '17.754')] [2023-02-26 20:55:35,568][00236] Fps is (10 sec: 4095.9, 60 sec: 3276.8, 300 sec: 3485.1). Total num frames: 1945600. Throughput: 0: 866.0. Samples: 485756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 20:55:35,578][00236] Avg episode reward: [(0, '17.579')] [2023-02-26 20:55:40,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3471.2). Total num frames: 1957888. Throughput: 0: 872.1. Samples: 490094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 20:55:40,576][00236] Avg episode reward: [(0, '17.438')] [2023-02-26 20:55:42,097][10272] Updated weights for policy 0, policy_version 480 (0.0025) [2023-02-26 20:55:45,568][00236] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 1974272. Throughput: 0: 882.2. Samples: 494794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:55:45,571][00236] Avg episode reward: [(0, '17.951')] [2023-02-26 20:55:50,568][00236] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1998848. Throughput: 0: 904.1. Samples: 498200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:55:50,574][00236] Avg episode reward: [(0, '18.586')] [2023-02-26 20:55:52,359][10272] Updated weights for policy 0, policy_version 490 (0.0024) [2023-02-26 20:55:55,568][00236] Fps is (10 sec: 4505.4, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2019328. Throughput: 0: 903.1. Samples: 504724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:55:55,571][00236] Avg episode reward: [(0, '19.415')] [2023-02-26 20:55:55,574][10259] Saving new best policy, reward=19.415! [2023-02-26 20:56:00,568][00236] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2031616. Throughput: 0: 872.4. Samples: 508814. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:56:00,576][00236] Avg episode reward: [(0, '19.542')] [2023-02-26 20:56:00,590][10259] Saving new best policy, reward=19.542! [2023-02-26 20:56:05,350][10272] Updated weights for policy 0, policy_version 500 (0.0012) [2023-02-26 20:56:05,568][00236] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2048000. Throughput: 0: 870.2. Samples: 510824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:56:05,571][00236] Avg episode reward: [(0, '20.442')] [2023-02-26 20:56:05,578][10259] Saving new best policy, reward=20.442! [2023-02-26 20:56:10,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2068480. Throughput: 0: 911.1. Samples: 516892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:56:10,575][00236] Avg episode reward: [(0, '21.135')] [2023-02-26 20:56:10,585][10259] Saving new best policy, reward=21.135! [2023-02-26 20:56:14,709][10272] Updated weights for policy 0, policy_version 510 (0.0014) [2023-02-26 20:56:15,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2088960. Throughput: 0: 910.0. Samples: 523400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:56:15,570][00236] Avg episode reward: [(0, '20.183')] [2023-02-26 20:56:20,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2105344. Throughput: 0: 884.3. Samples: 525550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 20:56:20,575][00236] Avg episode reward: [(0, '19.605')] [2023-02-26 20:56:25,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2117632. Throughput: 0: 880.4. Samples: 529710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 20:56:25,571][00236] Avg episode reward: [(0, '19.513')] [2023-02-26 20:56:27,900][10272] Updated weights for policy 0, policy_version 520 (0.0028) [2023-02-26 20:56:30,570][00236] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 2138112. Throughput: 0: 915.0. Samples: 535970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:56:30,572][00236] Avg episode reward: [(0, '18.419')] [2023-02-26 20:56:35,568][00236] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3499.0). Total num frames: 2162688. Throughput: 0: 913.1. Samples: 539288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-26 20:56:35,573][00236] Avg episode reward: [(0, '17.554')] [2023-02-26 20:56:38,070][10272] Updated weights for policy 0, policy_version 530 (0.0044) [2023-02-26 20:56:40,568][00236] Fps is (10 sec: 3687.0, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2174976. Throughput: 0: 880.5. Samples: 544346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:56:40,579][00236] Avg episode reward: [(0, '17.954')] [2023-02-26 20:56:45,571][00236] Fps is (10 sec: 2866.4, 60 sec: 3618.0, 300 sec: 3485.0). Total num frames: 2191360. Throughput: 0: 883.9. Samples: 548592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-26 20:56:45,575][00236] Avg episode reward: [(0, '16.880')] [2023-02-26 20:56:50,229][10272] Updated weights for policy 0, policy_version 540 (0.0012) [2023-02-26 20:56:50,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2211840. Throughput: 0: 907.0. Samples: 551638. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:56:50,574][00236] Avg episode reward: [(0, '16.164')] [2023-02-26 20:56:55,568][00236] Fps is (10 sec: 4097.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2232320. Throughput: 0: 918.9. Samples: 558242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 20:56:55,573][00236] Avg episode reward: [(0, '16.366')] [2023-02-26 20:57:00,570][00236] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3485.1). Total num frames: 2248704. Throughput: 0: 886.0. Samples: 563270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 20:57:00,576][00236] Avg episode reward: [(0, '17.449')] [2023-02-26 20:57:01,071][10272] Updated weights for policy 0, policy_version 550 (0.0018) [2023-02-26 20:57:05,568][00236] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2265088. Throughput: 0: 883.2. Samples: 565294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 20:57:05,578][00236] Avg episode reward: [(0, '18.093')] [2023-02-26 20:57:10,568][00236] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2281472. Throughput: 0: 911.2. Samples: 570712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:57:10,574][00236] Avg episode reward: [(0, '18.939')] [2023-02-26 20:57:12,526][10272] Updated weights for policy 0, policy_version 560 (0.0017) [2023-02-26 20:57:15,568][00236] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2306048. Throughput: 0: 917.9. Samples: 577276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:57:15,571][00236] Avg episode reward: [(0, '18.814')] [2023-02-26 20:57:20,570][00236] Fps is (10 sec: 4095.1, 60 sec: 3618.0, 300 sec: 3498.9). Total num frames: 2322432. Throughput: 0: 904.3. Samples: 579982. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:57:20,572][00236] Avg episode reward: [(0, '19.080')] [2023-02-26 20:57:20,595][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000567_2322432.pth... [2023-02-26 20:57:20,742][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000362_1482752.pth [2023-02-26 20:57:24,986][10272] Updated weights for policy 0, policy_version 570 (0.0014) [2023-02-26 20:57:25,569][00236] Fps is (10 sec: 2866.8, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2334720. Throughput: 0: 881.1. Samples: 583998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 20:57:25,579][00236] Avg episode reward: [(0, '17.924')] [2023-02-26 20:57:30,568][00236] Fps is (10 sec: 3277.5, 60 sec: 3618.2, 300 sec: 3512.8). Total num frames: 2355200. Throughput: 0: 908.3. Samples: 589464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 20:57:30,571][00236] Avg episode reward: [(0, '16.963')] [2023-02-26 20:57:34,729][10272] Updated weights for policy 0, policy_version 580 (0.0022) [2023-02-26 20:57:35,568][00236] Fps is (10 sec: 4506.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2379776. Throughput: 0: 917.5. Samples: 592924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 20:57:35,576][00236] Avg episode reward: [(0, '16.907')] [2023-02-26 20:57:40,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 2396160. Throughput: 0: 901.2. Samples: 598798. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 20:57:40,570][00236] Avg episode reward: [(0, '18.078')] [2023-02-26 20:57:45,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3485.1). 
[2023-02-26 20:57:45,570][00236] Avg episode reward: [(0, '17.919')]
[2023-02-26 20:57:47,690][10272] Updated weights for policy 0, policy_version 590 (0.0025)
[2023-02-26 20:57:50,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2424832. Throughput: 0: 886.7. Samples: 605196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:57:50,574][00236] Avg episode reward: [(0, '19.131')]
[2023-02-26 20:57:55,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2449408. Throughput: 0: 910.3. Samples: 611674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:57:55,575][00236] Avg episode reward: [(0, '19.690')]
[2023-02-26 20:57:57,251][10272] Updated weights for policy 0, policy_version 600 (0.0016)
[2023-02-26 20:58:00,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3499.0). Total num frames: 2465792. Throughput: 0: 891.0. Samples: 617370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:58:00,576][00236] Avg episode reward: [(0, '21.125')]
[2023-02-26 20:58:05,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2478080. Throughput: 0: 878.4. Samples: 619506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:58:05,572][00236] Avg episode reward: [(0, '21.184')]
[2023-02-26 20:58:05,582][10259] Saving new best policy, reward=21.184!
[2023-02-26 20:58:10,355][10272] Updated weights for policy 0, policy_version 610 (0.0030)
[2023-02-26 20:58:10,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2498560. Throughput: 0: 890.0. Samples: 624048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:58:10,570][00236] Avg episode reward: [(0, '20.831')]
[2023-02-26 20:58:15,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2519040. Throughput: 0: 915.6. Samples: 630664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:58:15,571][00236] Avg episode reward: [(0, '20.895')]
[2023-02-26 20:58:20,463][10272] Updated weights for policy 0, policy_version 620 (0.0035)
[2023-02-26 20:58:20,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 2539520. Throughput: 0: 914.2. Samples: 634062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:58:20,573][00236] Avg episode reward: [(0, '20.981')]
[2023-02-26 20:58:25,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3485.1). Total num frames: 2551808. Throughput: 0: 874.8. Samples: 638162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:58:25,571][00236] Avg episode reward: [(0, '20.620')]
[2023-02-26 20:58:30,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2568192. Throughput: 0: 882.8. Samples: 642756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 20:58:30,571][00236] Avg episode reward: [(0, '21.772')]
[2023-02-26 20:58:30,581][10259] Saving new best policy, reward=21.772!
[2023-02-26 20:58:33,094][10272] Updated weights for policy 0, policy_version 630 (0.0020)
[2023-02-26 20:58:35,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 2588672. Throughput: 0: 908.2. Samples: 646066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:58:35,571][00236] Avg episode reward: [(0, '21.301')]
[2023-02-26 20:58:40,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2609152. Throughput: 0: 910.5. Samples: 652646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:58:40,572][00236] Avg episode reward: [(0, '21.636')]
[2023-02-26 20:58:44,100][10272] Updated weights for policy 0, policy_version 640 (0.0022)
[2023-02-26 20:58:45,568][00236] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2625536. Throughput: 0: 878.0. Samples: 656880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:58:45,575][00236] Avg episode reward: [(0, '21.544')]
[2023-02-26 20:58:50,573][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2637824. Throughput: 0: 877.0. Samples: 658970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:58:50,576][00236] Avg episode reward: [(0, '21.487')]
[2023-02-26 20:58:55,329][10272] Updated weights for policy 0, policy_version 650 (0.0013)
[2023-02-26 20:58:55,568][00236] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2662400. Throughput: 0: 913.2. Samples: 665144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:58:55,570][00236] Avg episode reward: [(0, '19.797')]
[2023-02-26 20:59:00,568][00236] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2682880. Throughput: 0: 909.2. Samples: 671578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:00,575][00236] Avg episode reward: [(0, '20.143')]
[2023-02-26 20:59:05,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2695168. Throughput: 0: 880.8. Samples: 673700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:05,577][00236] Avg episode reward: [(0, '19.841')]
[2023-02-26 20:59:07,302][10272] Updated weights for policy 0, policy_version 660 (0.0026)
[2023-02-26 20:59:10,568][00236] Fps is (10 sec: 2867.1, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 2711552. Throughput: 0: 881.9. Samples: 677846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:10,571][00236] Avg episode reward: [(0, '20.013')]
[2023-02-26 20:59:15,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 2723840. Throughput: 0: 869.3. Samples: 681876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:15,576][00236] Avg episode reward: [(0, '20.510')]
[2023-02-26 20:59:20,568][00236] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 2740224. Throughput: 0: 843.2. Samples: 684008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:20,574][00236] Avg episode reward: [(0, '20.477')]
[2023-02-26 20:59:20,587][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000669_2740224.pth...
[2023-02-26 20:59:20,725][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000461_1888256.pth
[2023-02-26 20:59:21,760][10272] Updated weights for policy 0, policy_version 670 (0.0030)
[2023-02-26 20:59:25,575][00236] Fps is (10 sec: 2865.2, 60 sec: 3344.7, 300 sec: 3457.2). Total num frames: 2752512. Throughput: 0: 798.2. Samples: 688572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 20:59:25,585][00236] Avg episode reward: [(0, '20.754')]
[2023-02-26 20:59:30,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3457.3). Total num frames: 2768896. Throughput: 0: 796.7. Samples: 692732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 20:59:30,572][00236] Avg episode reward: [(0, '21.243')]
[2023-02-26 20:59:34,299][10272] Updated weights for policy 0, policy_version 680 (0.0032)
[2023-02-26 20:59:35,568][00236] Fps is (10 sec: 3689.0, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 2789376. Throughput: 0: 816.3. Samples: 695704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 20:59:35,570][00236] Avg episode reward: [(0, '21.240')]
[2023-02-26 20:59:40,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 2809856. Throughput: 0: 826.8. Samples: 702348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:40,570][00236] Avg episode reward: [(0, '22.431')]
[2023-02-26 20:59:40,585][10259] Saving new best policy, reward=22.431!
[2023-02-26 20:59:45,167][10272] Updated weights for policy 0, policy_version 690 (0.0024)
[2023-02-26 20:59:45,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 2826240. Throughput: 0: 794.0. Samples: 707310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:45,575][00236] Avg episode reward: [(0, '23.375')]
[2023-02-26 20:59:45,579][10259] Saving new best policy, reward=23.375!
[2023-02-26 20:59:50,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 2838528. Throughput: 0: 792.6. Samples: 709366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:50,576][00236] Avg episode reward: [(0, '23.964')]
[2023-02-26 20:59:50,588][10259] Saving new best policy, reward=23.964!
[2023-02-26 20:59:55,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3526.7). Total num frames: 2859008. Throughput: 0: 810.5. Samples: 714318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 20:59:55,570][00236] Avg episode reward: [(0, '24.255')]
[2023-02-26 20:59:55,573][10259] Saving new best policy, reward=24.255!
[2023-02-26 20:59:57,166][10272] Updated weights for policy 0, policy_version 700 (0.0028)
[2023-02-26 21:00:00,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3540.6). Total num frames: 2879488. Throughput: 0: 864.0. Samples: 720756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 21:00:00,576][00236] Avg episode reward: [(0, '24.028')]
[2023-02-26 21:00:05,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 2895872. Throughput: 0: 879.8. Samples: 723600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:00:05,575][00236] Avg episode reward: [(0, '22.183')]
[2023-02-26 21:00:08,812][10272] Updated weights for policy 0, policy_version 710 (0.0026)
[2023-02-26 21:00:10,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 2912256. Throughput: 0: 874.0. Samples: 727898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 21:00:10,574][00236] Avg episode reward: [(0, '21.276')]
[2023-02-26 21:00:15,569][00236] Fps is (10 sec: 3276.5, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 2928640. Throughput: 0: 900.7. Samples: 733264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 21:00:15,576][00236] Avg episode reward: [(0, '20.707')]
[2023-02-26 21:00:19,400][10272] Updated weights for policy 0, policy_version 720 (0.0014)
[2023-02-26 21:00:20,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2953216. Throughput: 0: 908.7. Samples: 736594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
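As I read it, "Policy #0 lag" tracks how stale the experience reaching the learner is: for each sample, the difference (in policy versions) between the learner's current version and the version that generated it, summarized as min/avg/max per report. An illustrative recomputation; the sample version list is made up, only the current version (720, just updated above) and the summary format mirror the log:

```python
# Policy versions that produced a batch of samples (illustrative values)
# versus the learner's current version when the batch is consumed.
sample_versions = [720, 720, 719, 718, 720]
current_version = 720

lags = [current_version - v for v in sample_versions]
print(min(lags), sum(lags) / len(lags), max(lags))  # 0 0.6 2
```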
[2023-02-26 21:00:20,577][00236] Avg episode reward: [(0, '20.667')]
[2023-02-26 21:00:25,569][00236] Fps is (10 sec: 4095.7, 60 sec: 3618.5, 300 sec: 3540.6). Total num frames: 2969600. Throughput: 0: 893.2. Samples: 742542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 21:00:25,577][00236] Avg episode reward: [(0, '20.574')]
[2023-02-26 21:00:30,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2981888. Throughput: 0: 874.0. Samples: 746642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:00:30,570][00236] Avg episode reward: [(0, '20.566')]
[2023-02-26 21:00:32,536][10272] Updated weights for policy 0, policy_version 730 (0.0025)
[2023-02-26 21:00:35,568][00236] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3002368. Throughput: 0: 876.0. Samples: 748788. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 21:00:35,570][00236] Avg episode reward: [(0, '20.303')]
[2023-02-26 21:00:40,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3022848. Throughput: 0: 912.9. Samples: 755398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 21:00:40,573][00236] Avg episode reward: [(0, '21.720')]
[2023-02-26 21:00:42,080][10272] Updated weights for policy 0, policy_version 740 (0.0022)
[2023-02-26 21:00:45,568][00236] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3039232. Throughput: 0: 900.4. Samples: 761274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:00:45,573][00236] Avg episode reward: [(0, '21.365')]
[2023-02-26 21:00:50,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 3055616. Throughput: 0: 884.3. Samples: 763392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 21:00:50,574][00236] Avg episode reward: [(0, '22.502')]
[2023-02-26 21:00:55,184][10272] Updated weights for policy 0, policy_version 750 (0.0034)
[2023-02-26 21:00:55,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3072000. Throughput: 0: 884.3. Samples: 767692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:00:55,576][00236] Avg episode reward: [(0, '22.933')]
[2023-02-26 21:01:00,568][00236] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 3092480. Throughput: 0: 912.4. Samples: 774320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:01:00,577][00236] Avg episode reward: [(0, '23.641')]
[2023-02-26 21:01:04,854][10272] Updated weights for policy 0, policy_version 760 (0.0018)
[2023-02-26 21:01:05,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3112960. Throughput: 0: 908.8. Samples: 777490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:01:05,571][00236] Avg episode reward: [(0, '23.970')]
[2023-02-26 21:01:10,568][00236] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 3125248. Throughput: 0: 876.1. Samples: 781966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:01:10,570][00236] Avg episode reward: [(0, '23.616')]
[2023-02-26 21:01:15,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 3141632. Throughput: 0: 886.8. Samples: 786550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:01:15,574][00236] Avg episode reward: [(0, '23.508')]
[2023-02-26 21:01:17,612][10272] Updated weights for policy 0, policy_version 770 (0.0030)
[2023-02-26 21:01:20,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3166208. Throughput: 0: 912.9. Samples: 789868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:01:20,575][00236] Avg episode reward: [(0, '22.951')]
[2023-02-26 21:01:20,588][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000773_3166208.pth...
[2023-02-26 21:01:20,715][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000567_2322432.pth
[2023-02-26 21:01:25,568][00236] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3186688. Throughput: 0: 910.8. Samples: 796386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:01:25,572][00236] Avg episode reward: [(0, '23.117')]
[2023-02-26 21:01:28,250][10272] Updated weights for policy 0, policy_version 780 (0.0013)
[2023-02-26 21:01:30,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 3198976. Throughput: 0: 879.9. Samples: 800870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:01:30,570][00236] Avg episode reward: [(0, '24.467')]
[2023-02-26 21:01:30,595][10259] Saving new best policy, reward=24.467!
[2023-02-26 21:01:35,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3215360. Throughput: 0: 875.4. Samples: 802784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 21:01:35,570][00236] Avg episode reward: [(0, '24.516')]
[2023-02-26 21:01:35,572][10259] Saving new best policy, reward=24.516!
[2023-02-26 21:01:40,179][10272] Updated weights for policy 0, policy_version 790 (0.0016)
[2023-02-26 21:01:40,568][00236] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3235840. Throughput: 0: 908.7. Samples: 808584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 21:01:40,571][00236] Avg episode reward: [(0, '25.019')]
[2023-02-26 21:01:40,579][10259] Saving new best policy, reward=25.019!
[2023-02-26 21:01:45,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3256320. Throughput: 0: 905.4. Samples: 815064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 21:01:45,572][00236] Avg episode reward: [(0, '26.147')]
[2023-02-26 21:01:45,577][10259] Saving new best policy, reward=26.147!
[2023-02-26 21:01:50,568][00236] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3272704. Throughput: 0: 883.9. Samples: 817268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:01:50,578][00236] Avg episode reward: [(0, '26.225')]
[2023-02-26 21:01:50,598][10259] Saving new best policy, reward=26.225!
[2023-02-26 21:01:52,170][10272] Updated weights for policy 0, policy_version 800 (0.0015)
[2023-02-26 21:01:55,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 3284992. Throughput: 0: 876.4. Samples: 821404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:01:55,570][00236] Avg episode reward: [(0, '25.766')]
[2023-02-26 21:02:00,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3305472. Throughput: 0: 905.9. Samples: 827316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 21:02:00,571][00236] Avg episode reward: [(0, '25.408')]
[2023-02-26 21:02:02,798][10272] Updated weights for policy 0, policy_version 810 (0.0023)
[2023-02-26 21:02:05,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3325952. Throughput: 0: 906.5. Samples: 830660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:02:05,570][00236] Avg episode reward: [(0, '25.664')]
[2023-02-26 21:02:10,568][00236] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 3342336. Throughput: 0: 882.1. Samples: 836082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:02:10,571][00236] Avg episode reward: [(0, '26.012')]
[2023-02-26 21:02:15,290][10272] Updated weights for policy 0, policy_version 820 (0.0036)
[2023-02-26 21:02:15,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.9). Total num frames: 3358720. Throughput: 0: 877.2. Samples: 840344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 21:02:15,577][00236] Avg episode reward: [(0, '25.434')]
[2023-02-26 21:02:20,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 3375104. Throughput: 0: 890.5. Samples: 842856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:02:20,570][00236] Avg episode reward: [(0, '26.090')]
[2023-02-26 21:02:25,264][10272] Updated weights for policy 0, policy_version 830 (0.0020)
[2023-02-26 21:02:25,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3399680. Throughput: 0: 910.9. Samples: 849576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:02:25,570][00236] Avg episode reward: [(0, '25.748')]
[2023-02-26 21:02:30,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 3416064. Throughput: 0: 885.3. Samples: 854904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 21:02:30,570][00236] Avg episode reward: [(0, '26.147')]
[2023-02-26 21:02:35,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 3428352. Throughput: 0: 882.0. Samples: 856956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:02:35,570][00236] Avg episode reward: [(0, '25.682')]
[2023-02-26 21:02:38,287][10272] Updated weights for policy 0, policy_version 840 (0.0031)
[2023-02-26 21:02:40,568][00236] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3448832. Throughput: 0: 900.4. Samples: 861922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:02:40,575][00236] Avg episode reward: [(0, '25.420')]
[2023-02-26 21:02:45,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3469312. Throughput: 0: 917.4. Samples: 868600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:02:45,575][00236] Avg episode reward: [(0, '24.892')]
[2023-02-26 21:02:47,567][10272] Updated weights for policy 0, policy_version 850 (0.0012)
[2023-02-26 21:02:50,569][00236] Fps is (10 sec: 4095.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3489792. Throughput: 0: 916.0. Samples: 871880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:02:50,578][00236] Avg episode reward: [(0, '23.832')]
[2023-02-26 21:02:55,568][00236] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 3502080. Throughput: 0: 889.1. Samples: 876090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:02:55,577][00236] Avg episode reward: [(0, '23.607')]
[2023-02-26 21:03:00,444][10272] Updated weights for policy 0, policy_version 860 (0.0026)
[2023-02-26 21:03:00,568][00236] Fps is (10 sec: 3277.3, 60 sec: 3618.2, 300 sec: 3540.6). Total num frames: 3522560. Throughput: 0: 904.6. Samples: 881050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:03:00,570][00236] Avg episode reward: [(0, '25.213')]
[2023-02-26 21:03:05,568][00236] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3543040. Throughput: 0: 922.1. Samples: 884352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 21:03:05,571][00236] Avg episode reward: [(0, '25.922')]
[2023-02-26 21:03:10,070][10272] Updated weights for policy 0, policy_version 870 (0.0017)
[2023-02-26 21:03:10,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 3563520. Throughput: 0: 917.2. Samples: 890852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:03:10,577][00236] Avg episode reward: [(0, '24.796')]
[2023-02-26 21:03:15,568][00236] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 3575808. Throughput: 0: 894.3. Samples: 895148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:03:15,579][00236] Avg episode reward: [(0, '25.556')]
[2023-02-26 21:03:20,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3592192. Throughput: 0: 897.3. Samples: 897336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 21:03:20,571][00236] Avg episode reward: [(0, '26.236')]
[2023-02-26 21:03:20,646][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000878_3596288.pth...
[2023-02-26 21:03:20,813][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000669_2740224.pth
[2023-02-26 21:03:20,828][10259] Saving new best policy, reward=26.236!
[2023-02-26 21:03:22,602][10272] Updated weights for policy 0, policy_version 880 (0.0023)
[2023-02-26 21:03:25,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3616768. Throughput: 0: 926.4. Samples: 903612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 21:03:25,575][00236] Avg episode reward: [(0, '26.803')]
[2023-02-26 21:03:25,582][10259] Saving new best policy, reward=26.803!
[2023-02-26 21:03:30,570][00236] Fps is (10 sec: 4504.6, 60 sec: 3686.3, 300 sec: 3554.5). Total num frames: 3637248. Throughput: 0: 912.7. Samples: 909674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 21:03:30,573][00236] Avg episode reward: [(0, '24.770')]
[2023-02-26 21:03:33,363][10272] Updated weights for policy 0, policy_version 890 (0.0013)
[2023-02-26 21:03:35,570][00236] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3526.7). Total num frames: 3649536. Throughput: 0: 886.3. Samples: 911766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 21:03:35,575][00236] Avg episode reward: [(0, '25.387')]
[2023-02-26 21:03:40,568][00236] Fps is (10 sec: 2458.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 3661824. Throughput: 0: 875.4. Samples: 915482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 21:03:40,570][00236] Avg episode reward: [(0, '25.182')]
[2023-02-26 21:03:45,572][00236] Fps is (10 sec: 2457.1, 60 sec: 3413.1, 300 sec: 3512.8). Total num frames: 3674112. Throughput: 0: 851.9. Samples: 919388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 21:03:45,580][00236] Avg episode reward: [(0, '25.716')]
[2023-02-26 21:03:48,929][10272] Updated weights for policy 0, policy_version 900 (0.0059)
[2023-02-26 21:03:50,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3345.2, 300 sec: 3485.1). Total num frames: 3690496. Throughput: 0: 827.3. Samples: 921580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 21:03:50,576][00236] Avg episode reward: [(0, '24.508')]
[2023-02-26 21:03:55,568][00236] Fps is (10 sec: 3278.1, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3706880. Throughput: 0: 801.6. Samples: 926922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:03:55,575][00236] Avg episode reward: [(0, '23.821')]
[2023-02-26 21:04:00,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 3723264. Throughput: 0: 800.0. Samples: 931148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 21:04:00,572][00236] Avg episode reward: [(0, '23.844')]
[2023-02-26 21:04:01,929][10272] Updated weights for policy 0, policy_version 910 (0.0022)
[2023-02-26 21:04:05,568][00236] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3485.1). Total num frames: 3739648. Throughput: 0: 812.0. Samples: 933878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:04:05,570][00236] Avg episode reward: [(0, '25.236')]
[2023-02-26 21:04:10,568][00236] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 3764224. Throughput: 0: 821.7. Samples: 940590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 21:04:10,573][00236] Avg episode reward: [(0, '26.368')]
[2023-02-26 21:04:11,064][10272] Updated weights for policy 0, policy_version 920 (0.0012)
[2023-02-26 21:04:15,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3780608. Throughput: 0: 806.2. Samples: 945952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 21:04:15,572][00236] Avg episode reward: [(0, '26.487')]
[2023-02-26 21:04:20,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3526.8). Total num frames: 3792896. Throughput: 0: 807.3. Samples: 948094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:04:20,572][00236] Avg episode reward: [(0, '25.400')]
[2023-02-26 21:04:24,362][10272] Updated weights for policy 0, policy_version 930 (0.0014)
[2023-02-26 21:04:25,568][00236] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3540.6). Total num frames: 3813376. Throughput: 0: 832.8. Samples: 952960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:04:25,571][00236] Avg episode reward: [(0, '24.931')]
[2023-02-26 21:04:30,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3276.9, 300 sec: 3540.6). Total num frames: 3833856. Throughput: 0: 895.2. Samples: 959670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:04:30,570][00236] Avg episode reward: [(0, '24.437')]
[2023-02-26 21:04:33,790][10272] Updated weights for policy 0, policy_version 940 (0.0021)
[2023-02-26 21:04:35,573][00236] Fps is (10 sec: 4093.8, 60 sec: 3413.1, 300 sec: 3540.5). Total num frames: 3854336. Throughput: 0: 916.9. Samples: 962846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:04:35,576][00236] Avg episode reward: [(0, '24.293')]
[2023-02-26 21:04:40,568][00236] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3866624. Throughput: 0: 893.3. Samples: 967120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:04:40,578][00236] Avg episode reward: [(0, '21.947')]
[2023-02-26 21:04:45,568][00236] Fps is (10 sec: 2868.8, 60 sec: 3481.9, 300 sec: 3540.6). Total num frames: 3883008. Throughput: 0: 913.0. Samples: 972232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 21:04:45,571][00236] Avg episode reward: [(0, '22.482')]
[2023-02-26 21:04:46,512][10272] Updated weights for policy 0, policy_version 950 (0.0039)
[2023-02-26 21:04:50,568][00236] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3907584. Throughput: 0: 924.7. Samples: 975490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 21:04:50,576][00236] Avg episode reward: [(0, '23.097')]
[2023-02-26 21:04:55,568][00236] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 3928064. Throughput: 0: 917.8. Samples: 981890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:04:55,574][00236] Avg episode reward: [(0, '25.483')]
[2023-02-26 21:04:56,743][10272] Updated weights for policy 0, policy_version 960 (0.0013)
[2023-02-26 21:05:00,568][00236] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3940352. Throughput: 0: 894.5. Samples: 986206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:05:00,575][00236] Avg episode reward: [(0, '24.707')]
[2023-02-26 21:05:05,568][00236] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3956736. Throughput: 0: 894.4. Samples: 988344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:05:05,570][00236] Avg episode reward: [(0, '25.414')]
[2023-02-26 21:05:08,498][10272] Updated weights for policy 0, policy_version 970 (0.0030)
[2023-02-26 21:05:10,568][00236] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3981312. Throughput: 0: 932.3. Samples: 994912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:05:10,571][00236] Avg episode reward: [(0, '26.548')]
[2023-02-26 21:05:15,568][00236] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 4001792. Throughput: 0: 919.2. Samples: 1001032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 21:05:15,573][00236] Avg episode reward: [(0, '26.215')]
[2023-02-26 21:05:16,811][10259] Stopping Batcher_0...
[2023-02-26 21:05:16,818][10259] Loop batcher_evt_loop terminating...
[2023-02-26 21:05:16,819][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-26 21:05:16,811][00236] Component Batcher_0 stopped!
[2023-02-26 21:05:16,898][10272] Weights refcount: 2 0
[2023-02-26 21:05:16,905][00236] Component InferenceWorker_p0-w0 stopped!
[2023-02-26 21:05:16,904][10272] Stopping InferenceWorker_p0-w0...
[2023-02-26 21:05:16,921][10272] Loop inference_proc0-0_evt_loop terminating...
[2023-02-26 21:05:16,933][00236] Component RolloutWorker_w5 stopped!
[2023-02-26 21:05:16,933][10283] Stopping RolloutWorker_w5...
[2023-02-26 21:05:16,944][10283] Loop rollout_proc5_evt_loop terminating...
[2023-02-26 21:05:16,954][10282] Stopping RolloutWorker_w4...
[2023-02-26 21:05:16,954][00236] Component RolloutWorker_w4 stopped!
[2023-02-26 21:05:16,959][10282] Loop rollout_proc4_evt_loop terminating...
[2023-02-26 21:05:16,963][10281] Stopping RolloutWorker_w3...
[2023-02-26 21:05:16,963][10281] Loop rollout_proc3_evt_loop terminating...
[2023-02-26 21:05:16,966][10274] Stopping RolloutWorker_w1...
[2023-02-26 21:05:16,967][10279] Stopping RolloutWorker_w0...
[2023-02-26 21:05:16,967][10279] Loop rollout_proc0_evt_loop terminating...
[2023-02-26 21:05:16,965][00236] Component RolloutWorker_w3 stopped!
[2023-02-26 21:05:16,973][00236] Component RolloutWorker_w1 stopped!
[2023-02-26 21:05:16,978][00236] Component RolloutWorker_w0 stopped!
[2023-02-26 21:05:16,984][10280] Stopping RolloutWorker_w2...
[2023-02-26 21:05:16,991][10280] Loop rollout_proc2_evt_loop terminating...
[2023-02-26 21:05:16,967][10274] Loop rollout_proc1_evt_loop terminating...
[2023-02-26 21:05:16,987][10284] Stopping RolloutWorker_w7...
[2023-02-26 21:05:16,995][10284] Loop rollout_proc7_evt_loop terminating...
[2023-02-26 21:05:16,983][00236] Component RolloutWorker_w2 stopped!
[2023-02-26 21:05:16,998][00236] Component RolloutWorker_w7 stopped!
[2023-02-26 21:05:17,023][10285] Stopping RolloutWorker_w6...
[2023-02-26 21:05:17,023][00236] Component RolloutWorker_w6 stopped!
[2023-02-26 21:05:17,024][10285] Loop rollout_proc6_evt_loop terminating...
[2023-02-26 21:05:17,049][10259] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000773_3166208.pth
[2023-02-26 21:05:17,061][10259] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-26 21:05:17,299][10259] Stopping LearnerWorker_p0...
[2023-02-26 21:05:17,299][00236] Component LearnerWorker_p0 stopped!
[2023-02-26 21:05:17,300][10259] Loop learner_proc0_evt_loop terminating...
[2023-02-26 21:05:17,301][00236] Waiting for process learner_proc0 to stop...
[2023-02-26 21:05:19,534][00236] Waiting for process inference_proc0-0 to join...
[2023-02-26 21:05:20,318][00236] Waiting for process rollout_proc0 to join...
[2023-02-26 21:05:20,643][00236] Waiting for process rollout_proc1 to join...
[2023-02-26 21:05:20,899][00236] Waiting for process rollout_proc2 to join...
[2023-02-26 21:05:20,900][00236] Waiting for process rollout_proc3 to join...
[2023-02-26 21:05:20,902][00236] Waiting for process rollout_proc4 to join...
[2023-02-26 21:05:20,904][00236] Waiting for process rollout_proc5 to join...
[2023-02-26 21:05:20,905][00236] Waiting for process rollout_proc6 to join...
[2023-02-26 21:05:20,906][00236] Waiting for process rollout_proc7 to join...
[2023-02-26 21:05:20,907][00236] Batcher 0 profile tree view:
batching: 27.7138, releasing_batches: 0.0302
[2023-02-26 21:05:20,908][00236] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 559.2642
update_model: 8.5466
  weight_update: 0.0031
one_step: 0.0023
  handle_policy_step: 553.1015
    deserialize: 15.3453, stack: 3.1641, obs_to_device_normalize: 119.8014, forward: 270.3688, send_messages: 27.9221
    prepare_outputs: 89.2529
      to_cpu: 55.7611
[2023-02-26 21:05:20,911][00236] Learner 0 profile tree view:
misc: 0.0061, prepare_batch: 16.0314
train: 76.3152
  epoch_init: 0.0092, minibatch_init: 0.0062, losses_postprocess: 0.5837, kl_divergence: 0.5653, after_optimizer: 32.5293
  calculate_losses: 27.0546
    losses_init: 0.0095, forward_head: 1.7624, bptt_initial: 17.6858, tail: 1.1341, advantages_returns: 0.3593, losses: 3.4038
    bptt: 2.3832
      bptt_forward_core: 2.2816
  update: 14.9581
    clip: 1.4273
[2023-02-26 21:05:20,913][00236] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3878, enqueue_policy_requests: 154.2490, env_step: 869.9473, overhead: 23.3019, complete_rollouts: 7.5363
save_policy_outputs: 22.2372
  split_output_tensors: 11.0046
[2023-02-26 21:05:20,916][00236] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3165, enqueue_policy_requests: 159.8332, env_step: 867.0020, overhead: 23.0084, complete_rollouts: 7.6313
save_policy_outputs: 21.4655
  split_output_tensors: 10.5987
[2023-02-26 21:05:20,917][00236] Loop Runner_EvtLoop terminating...
[2023-02-26 21:05:20,919][00236] Runner profile tree view:
main_loop: 1195.2169
[2023-02-26 21:05:20,922][00236] Collected {0: 4005888}, FPS: 3351.6
[2023-02-26 21:12:45,794][00236] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-26 21:12:45,798][00236] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-26 21:12:45,799][00236] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-26 21:12:45,803][00236] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-26 21:12:45,804][00236] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-26 21:12:45,807][00236] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-26 21:12:45,810][00236] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-26 21:12:45,812][00236] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-26 21:12:45,815][00236] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-26 21:12:45,819][00236] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-26 21:12:45,820][00236] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-26 21:12:45,821][00236] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-26 21:12:45,822][00236] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-26 21:12:45,823][00236] Adding new argument 'enjoy_script'=None that is not in the saved config file!
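The evaluation run above reloads the training config and layers evaluation-only arguments on top of it; anything absent from the saved config.json is reported as a new argument. A minimal sketch of that merge, assuming nothing beyond the config path shown in the log (the override dict mirrors the log messages, and the report strings are approximated from them):

```python
import json

with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

# Evaluation-time settings, mirroring the log records above.
overrides = {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10}

for key, value in overrides.items():
    if key in cfg:
        print(f"Overriding arg '{key}' with value {value} passed from command line")
    else:
        print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
    cfg[key] = value
```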
[2023-02-26 21:12:45,825][00236] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-26 21:12:45,848][00236] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 21:12:45,850][00236] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 21:12:45,852][00236] RunningMeanStd input shape: (1,)
[2023-02-26 21:12:45,869][00236] ConvEncoder: input_channels=3
[2023-02-26 21:12:46,536][00236] Conv encoder output size: 512
[2023-02-26 21:12:46,539][00236] Policy head output size: 512
[2023-02-26 21:12:48,857][00236] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-26 21:12:50,119][00236] Num frames 100...
[2023-02-26 21:12:50,269][00236] Num frames 200...
[2023-02-26 21:12:50,426][00236] Num frames 300...
[2023-02-26 21:12:50,590][00236] Num frames 400...
[2023-02-26 21:12:50,745][00236] Num frames 500...
[2023-02-26 21:12:50,897][00236] Num frames 600...
[2023-02-26 21:12:51,055][00236] Num frames 700...
[2023-02-26 21:12:51,207][00236] Num frames 800...
[2023-02-26 21:12:51,359][00236] Num frames 900...
[2023-02-26 21:12:51,508][00236] Avg episode rewards: #0: 18.600, true rewards: #0: 9.600
[2023-02-26 21:12:51,513][00236] Avg episode reward: 18.600, avg true_objective: 9.600
[2023-02-26 21:12:51,588][00236] Num frames 1000...
[2023-02-26 21:12:51,760][00236] Num frames 1100...
[2023-02-26 21:12:51,927][00236] Num frames 1200...
[2023-02-26 21:12:52,087][00236] Num frames 1300...
[2023-02-26 21:12:52,247][00236] Num frames 1400...
[2023-02-26 21:12:52,402][00236] Num frames 1500...
[2023-02-26 21:12:52,567][00236] Num frames 1600...
[2023-02-26 21:12:52,735][00236] Num frames 1700...
[2023-02-26 21:12:52,905][00236] Num frames 1800...
[2023-02-26 21:12:53,083][00236] Num frames 1900...
[2023-02-26 21:12:53,247][00236] Num frames 2000...
[2023-02-26 21:12:53,431][00236] Avg episode rewards: #0: 23.365, true rewards: #0: 10.365
[2023-02-26 21:12:53,434][00236] Avg episode reward: 23.365, avg true_objective: 10.365
[2023-02-26 21:12:53,483][00236] Num frames 2100...
[2023-02-26 21:12:53,641][00236] Num frames 2200...
[2023-02-26 21:12:53,789][00236] Num frames 2300...
[2023-02-26 21:12:53,909][00236] Num frames 2400...
[2023-02-26 21:12:54,028][00236] Num frames 2500...
[2023-02-26 21:12:54,145][00236] Num frames 2600...
[2023-02-26 21:12:54,260][00236] Num frames 2700...
[2023-02-26 21:12:54,370][00236] Num frames 2800...
[2023-02-26 21:12:54,482][00236] Num frames 2900...
[2023-02-26 21:12:54,595][00236] Num frames 3000...
[2023-02-26 21:12:54,720][00236] Num frames 3100...
[2023-02-26 21:12:54,810][00236] Avg episode rewards: #0: 22.430, true rewards: #0: 10.430
[2023-02-26 21:12:54,814][00236] Avg episode reward: 22.430, avg true_objective: 10.430
[2023-02-26 21:12:54,904][00236] Num frames 3200...
[2023-02-26 21:12:55,017][00236] Num frames 3300...
[2023-02-26 21:12:55,128][00236] Num frames 3400...
[2023-02-26 21:12:55,241][00236] Num frames 3500...
[2023-02-26 21:12:55,358][00236] Num frames 3600...
[2023-02-26 21:12:55,470][00236] Num frames 3700...
[2023-02-26 21:12:55,584][00236] Num frames 3800...
[2023-02-26 21:12:55,746][00236] Avg episode rewards: #0: 19.993, true rewards: #0: 9.742
[2023-02-26 21:12:55,748][00236] Avg episode reward: 19.993, avg true_objective: 9.742
[2023-02-26 21:12:55,754][00236] Num frames 3900...
[2023-02-26 21:12:55,867][00236] Num frames 4000...
[2023-02-26 21:12:55,978][00236] Num frames 4100...
[2023-02-26 21:12:56,091][00236] Num frames 4200...
[2023-02-26 21:12:56,201][00236] Num frames 4300...
[2023-02-26 21:12:56,322][00236] Num frames 4400...
[2023-02-26 21:12:56,435][00236] Num frames 4500...
[2023-02-26 21:12:56,504][00236] Avg episode rewards: #0: 18.022, true rewards: #0: 9.022
[2023-02-26 21:12:56,508][00236] Avg episode reward: 18.022, avg true_objective: 9.022
[2023-02-26 21:12:56,614][00236] Num frames 4600...
[2023-02-26 21:12:56,730][00236] Num frames 4700...
[2023-02-26 21:12:56,844][00236] Num frames 4800...
[2023-02-26 21:12:56,955][00236] Num frames 4900...
[2023-02-26 21:12:57,081][00236] Num frames 5000...
[2023-02-26 21:12:57,202][00236] Num frames 5100...
[2023-02-26 21:12:57,323][00236] Num frames 5200...
[2023-02-26 21:12:57,432][00236] Avg episode rewards: #0: 18.412, true rewards: #0: 8.745
[2023-02-26 21:12:57,435][00236] Avg episode reward: 18.412, avg true_objective: 8.745
[2023-02-26 21:12:57,502][00236] Num frames 5300...
[2023-02-26 21:12:57,620][00236] Num frames 5400...
[2023-02-26 21:12:57,731][00236] Num frames 5500...
[2023-02-26 21:12:57,851][00236] Num frames 5600...
[2023-02-26 21:12:57,985][00236] Avg episode rewards: #0: 16.662, true rewards: #0: 8.090
[2023-02-26 21:12:57,987][00236] Avg episode reward: 16.662, avg true_objective: 8.090
[2023-02-26 21:12:58,035][00236] Num frames 5700...
[2023-02-26 21:12:58,152][00236] Num frames 5800...
[2023-02-26 21:12:58,271][00236] Num frames 5900...
[2023-02-26 21:12:58,361][00236] Avg episode rewards: #0: 15.290, true rewards: #0: 7.415
[2023-02-26 21:12:58,365][00236] Avg episode reward: 15.290, avg true_objective: 7.415
[2023-02-26 21:12:58,456][00236] Num frames 6000...
[2023-02-26 21:12:58,575][00236] Num frames 6100...
[2023-02-26 21:12:58,686][00236] Num frames 6200...
[2023-02-26 21:12:58,810][00236] Num frames 6300...
[2023-02-26 21:12:58,922][00236] Num frames 6400...
[2023-02-26 21:12:58,998][00236] Avg episode rewards: #0: 14.569, true rewards: #0: 7.124
[2023-02-26 21:12:59,001][00236] Avg episode reward: 14.569, avg true_objective: 7.124
[2023-02-26 21:12:59,106][00236] Num frames 6500...
[2023-02-26 21:12:59,228][00236] Num frames 6600...
[2023-02-26 21:12:59,339][00236] Num frames 6700...
[2023-02-26 21:12:59,452][00236] Num frames 6800...
[2023-02-26 21:12:59,571][00236] Num frames 6900...
[2023-02-26 21:12:59,686][00236] Num frames 7000...
[2023-02-26 21:12:59,808][00236] Num frames 7100...
[2023-02-26 21:12:59,921][00236] Num frames 7200...
[2023-02-26 21:13:00,030][00236] Num frames 7300...
[2023-02-26 21:13:00,132][00236] Avg episode rewards: #0: 15.340, true rewards: #0: 7.340
[2023-02-26 21:13:00,134][00236] Avg episode reward: 15.340, avg true_objective: 7.340
[2023-02-26 21:13:48,910][00236] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-26 21:15:08,486][00236] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-26 21:15:08,489][00236] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-26 21:15:08,491][00236] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-26 21:15:08,494][00236] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-26 21:15:08,495][00236] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-26 21:15:08,498][00236] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-26 21:15:08,500][00236] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-26 21:15:08,503][00236] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-26 21:15:08,505][00236] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-26 21:15:08,506][00236] Adding new argument 'hf_repository'='Convolution/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-26 21:15:08,507][00236] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-26 21:15:08,509][00236] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-26 21:15:08,510][00236] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-26 21:15:08,511][00236] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-26 21:15:08,513][00236] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-26 21:15:08,542][00236] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 21:15:08,544][00236] RunningMeanStd input shape: (1,)
[2023-02-26 21:15:08,558][00236] ConvEncoder: input_channels=3
[2023-02-26 21:15:08,602][00236] Conv encoder output size: 512
[2023-02-26 21:15:08,604][00236] Policy head output size: 512
[2023-02-26 21:15:08,624][00236] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-26 21:15:09,059][00236] Num frames 100...
[2023-02-26 21:15:09,175][00236] Num frames 200...
[2023-02-26 21:15:09,303][00236] Num frames 300...
[2023-02-26 21:15:09,430][00236] Num frames 400...
[2023-02-26 21:15:09,546][00236] Num frames 500...
[2023-02-26 21:15:09,658][00236] Num frames 600...
[2023-02-26 21:15:09,781][00236] Num frames 700...
[2023-02-26 21:15:09,897][00236] Num frames 800...
[2023-02-26 21:15:10,011][00236] Num frames 900...
[2023-02-26 21:15:10,175][00236] Avg episode rewards: #0: 22.920, true rewards: #0: 9.920
[2023-02-26 21:15:10,177][00236] Avg episode reward: 22.920, avg true_objective: 9.920
[2023-02-26 21:15:10,192][00236] Num frames 1000...
[2023-02-26 21:15:10,308][00236] Num frames 1100...
[2023-02-26 21:15:10,432][00236] Num frames 1200...
[2023-02-26 21:15:10,542][00236] Num frames 1300...
[2023-02-26 21:15:10,651][00236] Num frames 1400...
[2023-02-26 21:15:10,763][00236] Num frames 1500...
[2023-02-26 21:15:10,877][00236] Num frames 1600...
[2023-02-26 21:15:11,085][00236] Avg episode rewards: #0: 19.980, true rewards: #0: 8.480
[2023-02-26 21:15:11,091][00236] Avg episode reward: 19.980, avg true_objective: 8.480
[2023-02-26 21:15:11,103][00236] Num frames 1700...
[2023-02-26 21:15:11,303][00236] Num frames 1800...
[2023-02-26 21:15:11,426][00236] Num frames 1900...
[2023-02-26 21:15:11,540][00236] Num frames 2000...
[2023-02-26 21:15:11,651][00236] Num frames 2100...
[2023-02-26 21:15:11,769][00236] Num frames 2200...
[2023-02-26 21:15:11,886][00236] Num frames 2300...
[2023-02-26 21:15:11,947][00236] Avg episode rewards: #0: 16.347, true rewards: #0: 7.680
[2023-02-26 21:15:11,949][00236] Avg episode reward: 16.347, avg true_objective: 7.680
[2023-02-26 21:15:12,072][00236] Num frames 2400...
[2023-02-26 21:15:12,186][00236] Num frames 2500...
[2023-02-26 21:15:12,294][00236] Num frames 2600...
[2023-02-26 21:15:12,411][00236] Num frames 2700...
[2023-02-26 21:15:12,524][00236] Num frames 2800...
[2023-02-26 21:15:12,639][00236] Num frames 2900...
[2023-02-26 21:15:12,760][00236] Num frames 3000...
[2023-02-26 21:15:12,879][00236] Num frames 3100...
[2023-02-26 21:15:12,997][00236] Num frames 3200...
[2023-02-26 21:15:13,110][00236] Num frames 3300...
[2023-02-26 21:15:13,265][00236] Avg episode rewards: #0: 18.730, true rewards: #0: 8.480
[2023-02-26 21:15:13,267][00236] Avg episode reward: 18.730, avg true_objective: 8.480
[2023-02-26 21:15:13,281][00236] Num frames 3400...
[2023-02-26 21:15:13,415][00236] Num frames 3500...
[2023-02-26 21:15:13,533][00236] Num frames 3600...
[2023-02-26 21:15:13,653][00236] Num frames 3700...
[2023-02-26 21:15:13,776][00236] Num frames 3800...
[2023-02-26 21:15:13,905][00236] Num frames 3900...
[2023-02-26 21:15:14,017][00236] Num frames 4000...
[2023-02-26 21:15:14,129][00236] Num frames 4100...
[2023-02-26 21:15:14,244][00236] Num frames 4200...
[2023-02-26 21:15:14,354][00236] Num frames 4300...
[2023-02-26 21:15:14,478][00236] Num frames 4400...
[2023-02-26 21:15:14,590][00236] Num frames 4500...
[2023-02-26 21:15:14,702][00236] Num frames 4600...
[2023-02-26 21:15:14,820][00236] Num frames 4700...
[2023-02-26 21:15:14,881][00236] Avg episode rewards: #0: 21.608, true rewards: #0: 9.408
[2023-02-26 21:15:14,883][00236] Avg episode reward: 21.608, avg true_objective: 9.408
[2023-02-26 21:15:14,995][00236] Num frames 4800...
[2023-02-26 21:15:15,122][00236] Num frames 4900...
[2023-02-26 21:15:15,247][00236] Num frames 5000...
[2023-02-26 21:15:15,357][00236] Num frames 5100...
[2023-02-26 21:15:15,475][00236] Num frames 5200...
[2023-02-26 21:15:15,616][00236] Num frames 5300...
[2023-02-26 21:15:15,775][00236] Num frames 5400...
[2023-02-26 21:15:15,933][00236] Num frames 5500...
[2023-02-26 21:15:16,095][00236] Num frames 5600...
[2023-02-26 21:15:16,263][00236] Num frames 5700...
[2023-02-26 21:15:16,416][00236] Num frames 5800...
[2023-02-26 21:15:16,573][00236] Num frames 5900...
[2023-02-26 21:15:16,742][00236] Num frames 6000...
[2023-02-26 21:15:16,912][00236] Num frames 6100...
[2023-02-26 21:15:17,069][00236] Num frames 6200...
[2023-02-26 21:15:17,232][00236] Num frames 6300...
[2023-02-26 21:15:17,387][00236] Num frames 6400...
[2023-02-26 21:15:17,560][00236] Num frames 6500...
[2023-02-26 21:15:17,724][00236] Num frames 6600...
[2023-02-26 21:15:17,890][00236] Num frames 6700...
[2023-02-26 21:15:18,055][00236] Num frames 6800...
[2023-02-26 21:15:18,119][00236] Avg episode rewards: #0: 28.007, true rewards: #0: 11.340
[2023-02-26 21:15:18,122][00236] Avg episode reward: 28.007, avg true_objective: 11.340
[2023-02-26 21:15:18,278][00236] Num frames 6900...
[2023-02-26 21:15:18,444][00236] Num frames 7000...
[2023-02-26 21:15:18,610][00236] Num frames 7100...
[2023-02-26 21:15:18,775][00236] Num frames 7200...
[2023-02-26 21:15:18,936][00236] Num frames 7300...
[2023-02-26 21:15:19,100][00236] Num frames 7400...
[2023-02-26 21:15:19,247][00236] Num frames 7500...
[2023-02-26 21:15:19,358][00236] Num frames 7600...
[2023-02-26 21:15:19,477][00236] Num frames 7700...
[2023-02-26 21:15:19,596][00236] Num frames 7800...
[2023-02-26 21:15:19,716][00236] Num frames 7900...
[2023-02-26 21:15:19,826][00236] Num frames 8000...
[2023-02-26 21:15:19,939][00236] Num frames 8100...
[2023-02-26 21:15:20,052][00236] Num frames 8200...
[2023-02-26 21:15:20,166][00236] Num frames 8300...
[2023-02-26 21:15:20,231][00236] Avg episode rewards: #0: 29.440, true rewards: #0: 11.869
[2023-02-26 21:15:20,233][00236] Avg episode reward: 29.440, avg true_objective: 11.869
[2023-02-26 21:15:20,359][00236] Num frames 8400...
[2023-02-26 21:15:20,469][00236] Num frames 8500...
[2023-02-26 21:15:20,593][00236] Num frames 8600...
[2023-02-26 21:15:20,709][00236] Num frames 8700...
[2023-02-26 21:15:20,831][00236] Num frames 8800...
[2023-02-26 21:15:20,951][00236] Num frames 8900...
[2023-02-26 21:15:21,062][00236] Num frames 9000...
[2023-02-26 21:15:21,201][00236] Avg episode rewards: #0: 28.470, true rewards: #0: 11.345
[2023-02-26 21:15:21,204][00236] Avg episode reward: 28.470, avg true_objective: 11.345
[2023-02-26 21:15:21,233][00236] Num frames 9100...
[2023-02-26 21:15:21,358][00236] Num frames 9200...
[2023-02-26 21:15:21,479][00236] Num frames 9300...
[2023-02-26 21:15:21,597][00236] Num frames 9400...
[2023-02-26 21:15:21,719][00236] Num frames 9500...
[2023-02-26 21:15:21,831][00236] Num frames 9600...
[2023-02-26 21:15:21,944][00236] Num frames 9700...
[2023-02-26 21:15:22,067][00236] Num frames 9800...
[2023-02-26 21:15:22,181][00236] Num frames 9900...
[2023-02-26 21:15:22,293][00236] Num frames 10000...
[2023-02-26 21:15:22,407][00236] Num frames 10100...
[2023-02-26 21:15:22,524][00236] Num frames 10200...
[2023-02-26 21:15:22,642][00236] Num frames 10300...
[2023-02-26 21:15:22,741][00236] Avg episode rewards: #0: 28.375, true rewards: #0: 11.487
[2023-02-26 21:15:22,743][00236] Avg episode reward: 28.375, avg true_objective: 11.487
[2023-02-26 21:15:22,827][00236] Num frames 10400...
[2023-02-26 21:15:22,950][00236] Num frames 10500...
[2023-02-26 21:15:23,063][00236] Num frames 10600...
[2023-02-26 21:15:23,177][00236] Num frames 10700...
[2023-02-26 21:15:23,289][00236] Num frames 10800...
[2023-02-26 21:15:23,405][00236] Num frames 10900...
[2023-02-26 21:15:23,518][00236] Num frames 11000...
[2023-02-26 21:15:23,656][00236] Avg episode rewards: #0: 27.174, true rewards: #0: 11.074
[2023-02-26 21:15:23,658][00236] Avg episode reward: 27.174, avg true_objective: 11.074
[2023-02-26 21:16:35,114][00236] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
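The "Avg episode rewards" lines during evaluation are running means over the episodes finished so far: the shaped training reward first, then "true rewards" (the scenario's unshaped objective). Per-episode values can be recovered by inverting the running mean; a small sketch using the true-reward averages after the first three episodes of this final evaluation (9.920, 8.480, 7.680, taken verbatim from the log):

```python
# Running means of "true rewards" after episodes 1..3 of the final evaluation.
running_means = [9.920, 8.480, 7.680]

episode_rewards = []
prev_sum = 0.0
for n, mean in enumerate(running_means, start=1):
    total = n * mean              # sum of true rewards over the first n episodes
    episode_rewards.append(total - prev_sum)
    prev_sum = total

print(episode_rewards)  # [9.92, 7.04, 6.08] (up to float rounding)
```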