[2023-02-22 14:47:38,167][08504] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-22 14:47:38,171][08504] Rollout worker 0 uses device cpu
[2023-02-22 14:47:38,173][08504] Rollout worker 1 uses device cpu
[2023-02-22 14:47:38,179][08504] Rollout worker 2 uses device cpu
[2023-02-22 14:47:38,180][08504] Rollout worker 3 uses device cpu
[2023-02-22 14:47:38,182][08504] Rollout worker 4 uses device cpu
[2023-02-22 14:47:38,184][08504] Rollout worker 5 uses device cpu
[2023-02-22 14:47:38,185][08504] Rollout worker 6 uses device cpu
[2023-02-22 14:47:38,186][08504] Rollout worker 7 uses device cpu
[2023-02-22 14:47:38,356][08504] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 14:47:38,357][08504] InferenceWorker_p0-w0: min num requests: 2
[2023-02-22 14:47:38,390][08504] Starting all processes...
[2023-02-22 14:47:38,391][08504] Starting process learner_proc0
[2023-02-22 14:47:38,445][08504] Starting all processes...
[2023-02-22 14:47:38,459][08504] Starting process inference_proc0-0
[2023-02-22 14:47:38,460][08504] Starting process rollout_proc0
[2023-02-22 14:47:38,460][08504] Starting process rollout_proc1
[2023-02-22 14:47:38,460][08504] Starting process rollout_proc2
[2023-02-22 14:47:38,460][08504] Starting process rollout_proc3
[2023-02-22 14:47:38,460][08504] Starting process rollout_proc4
[2023-02-22 14:47:38,481][08504] Starting process rollout_proc5
[2023-02-22 14:47:38,483][08504] Starting process rollout_proc6
[2023-02-22 14:47:38,483][08504] Starting process rollout_proc7
[2023-02-22 14:47:49,695][14413] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 14:47:49,703][14413] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-22 14:47:50,073][14427] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 14:47:50,079][14427] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-22 14:47:50,133][14435] Worker 7 uses CPU cores [1]
[2023-02-22 14:47:50,199][14432] Worker 3 uses CPU cores [1]
[2023-02-22 14:47:50,206][14428] Worker 0 uses CPU cores [0]
[2023-02-22 14:47:50,264][14431] Worker 4 uses CPU cores [0]
[2023-02-22 14:47:50,307][14430] Worker 2 uses CPU cores [0]
[2023-02-22 14:47:50,341][14429] Worker 1 uses CPU cores [1]
[2023-02-22 14:47:50,431][14434] Worker 6 uses CPU cores [0]
[2023-02-22 14:47:50,480][14433] Worker 5 uses CPU cores [1]
[2023-02-22 14:47:50,792][14427] Num visible devices: 1
[2023-02-22 14:47:50,792][14413] Num visible devices: 1
[2023-02-22 14:47:50,808][14413] Starting seed is not provided
[2023-02-22 14:47:50,808][14413] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 14:47:50,808][14413] Initializing actor-critic model on device cuda:0
[2023-02-22 14:47:50,809][14413] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 14:47:50,810][14413] RunningMeanStd input shape: (1,)
[2023-02-22 14:47:50,823][14413] ConvEncoder: input_channels=3
[2023-02-22 14:47:51,089][14413] Conv encoder output size: 512
[2023-02-22 14:47:51,089][14413] Policy head output size: 512
[2023-02-22 14:47:51,138][14413] Created Actor Critic model with architecture:
[2023-02-22 14:47:51,139][14413] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-22 14:47:57,398][14413] Using optimizer
[2023-02-22 14:47:57,399][14413] No checkpoints found
[2023-02-22 14:47:57,399][14413] Did not load from checkpoint, starting from scratch!
[2023-02-22 14:47:57,400][14413] Initialized policy 0 weights for model version 0
[2023-02-22 14:47:57,403][14413] LearnerWorker_p0 finished initialization!
[2023-02-22 14:47:57,405][14413] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 14:47:57,613][14427] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 14:47:57,614][14427] RunningMeanStd input shape: (1,)
[2023-02-22 14:47:57,626][14427] ConvEncoder: input_channels=3
[2023-02-22 14:47:57,726][14427] Conv encoder output size: 512
[2023-02-22 14:47:57,726][14427] Policy head output size: 512
[2023-02-22 14:47:58,349][08504] Heartbeat connected on Batcher_0
[2023-02-22 14:47:58,357][08504] Heartbeat connected on LearnerWorker_p0
[2023-02-22 14:47:58,366][08504] Heartbeat connected on RolloutWorker_w0
[2023-02-22 14:47:58,370][08504] Heartbeat connected on RolloutWorker_w1
[2023-02-22 14:47:58,373][08504] Heartbeat connected on RolloutWorker_w2
[2023-02-22 14:47:58,376][08504] Heartbeat connected on RolloutWorker_w3
[2023-02-22 14:47:58,381][08504] Heartbeat connected on RolloutWorker_w4
[2023-02-22 14:47:58,384][08504] Heartbeat connected on RolloutWorker_w5
[2023-02-22 14:47:58,389][08504] Heartbeat connected on RolloutWorker_w6
[2023-02-22 14:47:58,393][08504] Heartbeat connected on RolloutWorker_w7
[2023-02-22 14:47:59,384][08504] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 14:48:00,190][08504] Inference worker 0-0 is ready!
[2023-02-22 14:48:00,195][08504] All inference workers are ready! Signal rollout workers to start!
[2023-02-22 14:48:00,212][08504] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-22 14:48:00,328][14432] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:00,363][14435] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:00,359][14433] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:00,377][14429] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:00,410][14431] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:00,407][14430] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:00,415][14428] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:00,419][14434] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 14:48:01,746][14429] Decorrelating experience for 0 frames...
[2023-02-22 14:48:01,747][14432] Decorrelating experience for 0 frames...
[2023-02-22 14:48:01,903][14431] Decorrelating experience for 0 frames...
[2023-02-22 14:48:01,905][14434] Decorrelating experience for 0 frames...
[2023-02-22 14:48:01,966][14430] Decorrelating experience for 0 frames...
[2023-02-22 14:48:03,244][14431] Decorrelating experience for 32 frames...
[2023-02-22 14:48:03,260][14434] Decorrelating experience for 32 frames...
[2023-02-22 14:48:03,312][14432] Decorrelating experience for 32 frames...
[2023-02-22 14:48:03,497][14433] Decorrelating experience for 0 frames...
[2023-02-22 14:48:03,501][14429] Decorrelating experience for 32 frames...
[2023-02-22 14:48:04,387][08504] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 14:48:04,626][14428] Decorrelating experience for 0 frames...
[2023-02-22 14:48:04,671][14431] Decorrelating experience for 64 frames...
[2023-02-22 14:48:04,848][14433] Decorrelating experience for 32 frames...
[2023-02-22 14:48:04,855][14435] Decorrelating experience for 0 frames...
[2023-02-22 14:48:04,910][14432] Decorrelating experience for 64 frames...
[2023-02-22 14:48:05,872][14428] Decorrelating experience for 32 frames...
[2023-02-22 14:48:05,999][14430] Decorrelating experience for 32 frames...
[2023-02-22 14:48:06,005][14434] Decorrelating experience for 64 frames...
[2023-02-22 14:48:06,668][14429] Decorrelating experience for 64 frames...
[2023-02-22 14:48:06,770][14432] Decorrelating experience for 96 frames...
[2023-02-22 14:48:06,842][14433] Decorrelating experience for 64 frames...
[2023-02-22 14:48:06,906][14428] Decorrelating experience for 64 frames...
[2023-02-22 14:48:06,920][14434] Decorrelating experience for 96 frames...
[2023-02-22 14:48:07,384][14435] Decorrelating experience for 32 frames...
[2023-02-22 14:48:07,890][14429] Decorrelating experience for 96 frames...
[2023-02-22 14:48:07,970][14433] Decorrelating experience for 96 frames...
[2023-02-22 14:48:08,250][14430] Decorrelating experience for 64 frames...
[2023-02-22 14:48:08,429][14428] Decorrelating experience for 96 frames...
[2023-02-22 14:48:08,767][14431] Decorrelating experience for 96 frames...
[2023-02-22 14:48:08,876][14435] Decorrelating experience for 64 frames...
[2023-02-22 14:48:09,082][14430] Decorrelating experience for 96 frames...
[2023-02-22 14:48:09,383][08504] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 14:48:09,422][14435] Decorrelating experience for 96 frames...
[2023-02-22 14:48:13,071][14413] Signal inference workers to stop experience collection...
[2023-02-22 14:48:13,098][14427] InferenceWorker_p0-w0: stopping experience collection
[2023-02-22 14:48:14,383][08504] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 122.8. Samples: 1842. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 14:48:14,386][08504] Avg episode reward: [(0, '1.996')]
[2023-02-22 14:48:15,555][14413] Signal inference workers to resume experience collection...
[2023-02-22 14:48:15,557][14427] InferenceWorker_p0-w0: resuming experience collection
[2023-02-22 14:48:19,383][08504] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 188.1. Samples: 3762. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-02-22 14:48:19,387][08504] Avg episode reward: [(0, '3.169')]
[2023-02-22 14:48:24,384][08504] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 325.0. Samples: 8124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:48:24,388][08504] Avg episode reward: [(0, '3.664')]
[2023-02-22 14:48:26,827][14427] Updated weights for policy 0, policy_version 10 (0.0016)
[2023-02-22 14:48:29,383][08504] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 366.0. Samples: 10980. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-22 14:48:29,390][08504] Avg episode reward: [(0, '4.220')]
[2023-02-22 14:48:34,384][08504] Fps is (10 sec: 4505.6, 60 sec: 2106.6, 300 sec: 2106.6). Total num frames: 73728. Throughput: 0: 510.7. Samples: 17874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:48:34,389][08504] Avg episode reward: [(0, '4.317')]
[2023-02-22 14:48:36,320][14427] Updated weights for policy 0, policy_version 20 (0.0020)
[2023-02-22 14:48:39,385][08504] Fps is (10 sec: 4095.3, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 90112. Throughput: 0: 577.3. Samples: 23092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:48:39,395][08504] Avg episode reward: [(0, '4.438')]
[2023-02-22 14:48:44,384][08504] Fps is (10 sec: 2867.2, 60 sec: 2275.6, 300 sec: 2275.6). Total num frames: 102400. Throughput: 0: 559.9. Samples: 25194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:48:44,387][08504] Avg episode reward: [(0, '4.405')]
[2023-02-22 14:48:44,403][14413] Saving new best policy, reward=4.405!
[2023-02-22 14:48:48,626][14427] Updated weights for policy 0, policy_version 30 (0.0016)
[2023-02-22 14:48:49,383][08504] Fps is (10 sec: 3277.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 686.2. Samples: 30878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:48:49,386][08504] Avg episode reward: [(0, '4.485')]
[2023-02-22 14:48:49,395][14413] Saving new best policy, reward=4.485!
[2023-02-22 14:48:54,386][08504] Fps is (10 sec: 4913.9, 60 sec: 2755.4, 300 sec: 2755.4). Total num frames: 151552. Throughput: 0: 848.8. Samples: 38198. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 14:48:54,392][08504] Avg episode reward: [(0, '4.475')]
[2023-02-22 14:48:57,690][14427] Updated weights for policy 0, policy_version 40 (0.0012)
[2023-02-22 14:48:59,383][08504] Fps is (10 sec: 4505.6, 60 sec: 2799.0, 300 sec: 2799.0). Total num frames: 167936. Throughput: 0: 868.7. Samples: 40932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 14:48:59,386][08504] Avg episode reward: [(0, '4.361')]
[2023-02-22 14:49:04,384][08504] Fps is (10 sec: 2867.7, 60 sec: 3003.8, 300 sec: 2772.7). Total num frames: 180224. Throughput: 0: 919.3. Samples: 45132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:49:04,387][08504] Avg episode reward: [(0, '4.382')]
[2023-02-22 14:49:09,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 956.9. Samples: 51184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:49:09,386][08504] Avg episode reward: [(0, '4.449')]
[2023-02-22 14:49:09,722][14427] Updated weights for policy 0, policy_version 50 (0.0020)
[2023-02-22 14:49:14,383][08504] Fps is (10 sec: 4506.0, 60 sec: 3754.7, 300 sec: 3003.8). Total num frames: 225280. Throughput: 0: 970.6. Samples: 54656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:49:14,390][08504] Avg episode reward: [(0, '4.412')]
[2023-02-22 14:49:19,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 947.6. Samples: 60518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:49:19,390][08504] Avg episode reward: [(0, '4.574')]
[2023-02-22 14:49:19,395][14413] Saving new best policy, reward=4.574!
[2023-02-22 14:49:20,235][14427] Updated weights for policy 0, policy_version 60 (0.0019)
[2023-02-22 14:49:24,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3035.9). Total num frames: 258048. Throughput: 0: 933.1. Samples: 65080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:49:24,387][08504] Avg episode reward: [(0, '4.554')]
[2023-02-22 14:49:29,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3094.8). Total num frames: 278528. Throughput: 0: 958.4. Samples: 68324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:49:29,386][08504] Avg episode reward: [(0, '4.502')]
[2023-02-22 14:49:30,314][14427] Updated weights for policy 0, policy_version 70 (0.0013)
[2023-02-22 14:49:34,384][08504] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 995.9. Samples: 75692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:49:34,391][08504] Avg episode reward: [(0, '4.490')]
[2023-02-22 14:49:34,401][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
[2023-02-22 14:49:39,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 951.3. Samples: 81006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:49:39,390][08504] Avg episode reward: [(0, '4.512')]
[2023-02-22 14:49:40,964][14427] Updated weights for policy 0, policy_version 80 (0.0030)
[2023-02-22 14:49:44,385][08504] Fps is (10 sec: 3276.2, 60 sec: 3891.1, 300 sec: 3198.7). Total num frames: 335872. Throughput: 0: 941.0. Samples: 83278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:49:44,388][08504] Avg episode reward: [(0, '4.519')]
[2023-02-22 14:49:49,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 983.2. Samples: 89376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:49:49,386][08504] Avg episode reward: [(0, '4.301')]
[2023-02-22 14:49:51,003][14427] Updated weights for policy 0, policy_version 90 (0.0019)
[2023-02-22 14:49:54,384][08504] Fps is (10 sec: 4506.5, 60 sec: 3823.1, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 1010.6. Samples: 96662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:49:54,389][08504] Avg episode reward: [(0, '4.470')]
[2023-02-22 14:49:59,384][08504] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3311.0). Total num frames: 397312. Throughput: 0: 992.7. Samples: 99328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:49:59,386][08504] Avg episode reward: [(0, '4.532')]
[2023-02-22 14:50:01,967][14427] Updated weights for policy 0, policy_version 100 (0.0011)
[2023-02-22 14:50:04,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 964.9. Samples: 103938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:50:04,394][08504] Avg episode reward: [(0, '4.571')]
[2023-02-22 14:50:09,383][08504] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 1006.1. Samples: 110354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:50:09,391][08504] Avg episode reward: [(0, '4.632')]
[2023-02-22 14:50:09,395][14413] Saving new best policy, reward=4.632!
[2023-02-22 14:50:11,645][14427] Updated weights for policy 0, policy_version 110 (0.0018)
[2023-02-22 14:50:14,384][08504] Fps is (10 sec: 4915.1, 60 sec: 3959.5, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 1013.5. Samples: 113932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:50:14,386][08504] Avg episode reward: [(0, '4.430')]
[2023-02-22 14:50:19,384][08504] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3423.1). Total num frames: 479232. Throughput: 0: 976.7. Samples: 119644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:50:19,386][08504] Avg episode reward: [(0, '4.332')]
[2023-02-22 14:50:23,800][14427] Updated weights for policy 0, policy_version 120 (0.0026)
[2023-02-22 14:50:24,384][08504] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 946.8. Samples: 123614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:50:24,390][08504] Avg episode reward: [(0, '4.258')]
[2023-02-22 14:50:29,383][08504] Fps is (10 sec: 2457.6, 60 sec: 3754.7, 300 sec: 3358.7). Total num frames: 503808. Throughput: 0: 936.4. Samples: 125414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:50:29,390][08504] Avg episode reward: [(0, '4.431')]
[2023-02-22 14:50:34,383][08504] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3382.5). Total num frames: 524288. Throughput: 0: 914.8. Samples: 130544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:50:34,392][08504] Avg episode reward: [(0, '4.628')]
[2023-02-22 14:50:35,827][14427] Updated weights for policy 0, policy_version 130 (0.0029)
[2023-02-22 14:50:39,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3404.8). Total num frames: 544768. Throughput: 0: 892.6. Samples: 136828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:50:39,386][08504] Avg episode reward: [(0, '4.651')]
[2023-02-22 14:50:39,389][14413] Saving new best policy, reward=4.651!
[2023-02-22 14:50:44,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3376.1). Total num frames: 557056. Throughput: 0: 882.8. Samples: 139056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:50:44,386][08504] Avg episode reward: [(0, '4.515')]
[2023-02-22 14:50:47,950][14427] Updated weights for policy 0, policy_version 140 (0.0026)
[2023-02-22 14:50:49,383][08504] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3397.3). Total num frames: 577536. Throughput: 0: 894.2. Samples: 144178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:50:49,389][08504] Avg episode reward: [(0, '4.376')]
[2023-02-22 14:50:54,384][08504] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3440.7). Total num frames: 602112. Throughput: 0: 916.7. Samples: 151606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:50:54,386][08504] Avg episode reward: [(0, '4.454')]
[2023-02-22 14:50:56,427][14427] Updated weights for policy 0, policy_version 150 (0.0027)
[2023-02-22 14:50:59,385][08504] Fps is (10 sec: 4504.8, 60 sec: 3754.6, 300 sec: 3458.8). Total num frames: 622592. Throughput: 0: 912.5. Samples: 154996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:50:59,388][08504] Avg episode reward: [(0, '4.461')]
[2023-02-22 14:51:04,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3453.9). Total num frames: 638976. Throughput: 0: 887.2. Samples: 159566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:51:04,386][08504] Avg episode reward: [(0, '4.497')]
[2023-02-22 14:51:08,502][14427] Updated weights for policy 0, policy_version 160 (0.0017)
[2023-02-22 14:51:09,383][08504] Fps is (10 sec: 3277.4, 60 sec: 3618.1, 300 sec: 3449.3). Total num frames: 655360. Throughput: 0: 924.6. Samples: 165220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:51:09,386][08504] Avg episode reward: [(0, '4.653')]
[2023-02-22 14:51:09,395][14413] Saving new best policy, reward=4.653!
[2023-02-22 14:51:14,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3486.9). Total num frames: 679936. Throughput: 0: 961.9. Samples: 168700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:51:14,389][08504] Avg episode reward: [(0, '4.635')]
[2023-02-22 14:51:17,283][14427] Updated weights for policy 0, policy_version 170 (0.0033)
[2023-02-22 14:51:19,383][08504] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3502.1). Total num frames: 700416. Throughput: 0: 995.6. Samples: 175346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:51:19,388][08504] Avg episode reward: [(0, '4.589')]
[2023-02-22 14:51:24,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3496.6). Total num frames: 716800. Throughput: 0: 957.9. Samples: 179934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:51:24,390][08504] Avg episode reward: [(0, '4.696')]
[2023-02-22 14:51:24,398][14413] Saving new best policy, reward=4.696!
[2023-02-22 14:51:29,379][14427] Updated weights for policy 0, policy_version 180 (0.0039)
[2023-02-22 14:51:29,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3510.9). Total num frames: 737280. Throughput: 0: 964.3. Samples: 182448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:51:29,386][08504] Avg episode reward: [(0, '4.718')]
[2023-02-22 14:51:29,393][14413] Saving new best policy, reward=4.718!
[2023-02-22 14:51:34,384][08504] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3543.5). Total num frames: 761856. Throughput: 0: 1010.6. Samples: 189656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 14:51:34,386][08504] Avg episode reward: [(0, '4.793')]
[2023-02-22 14:51:34,395][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth...
[2023-02-22 14:51:34,516][14413] Saving new best policy, reward=4.803!
[2023-02-22 14:51:38,390][14427] Updated weights for policy 0, policy_version 190 (0.0017)
[2023-02-22 14:51:39,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3537.5). Total num frames: 778240. Throughput: 0: 979.2. Samples: 195670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:51:39,392][08504] Avg episode reward: [(0, '4.868')]
[2023-02-22 14:51:39,404][14413] Saving new best policy, reward=4.868!
[2023-02-22 14:51:44,385][08504] Fps is (10 sec: 3276.4, 60 sec: 3959.4, 300 sec: 3531.7). Total num frames: 794624. Throughput: 0: 953.8. Samples: 197916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 14:51:44,394][08504] Avg episode reward: [(0, '4.947')]
[2023-02-22 14:51:44,406][14413] Saving new best policy, reward=4.947!
[2023-02-22 14:51:49,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3543.9). Total num frames: 815104. Throughput: 0: 969.2. Samples: 203180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:51:49,395][08504] Avg episode reward: [(0, '4.975')]
[2023-02-22 14:51:49,398][14413] Saving new best policy, reward=4.975!
[2023-02-22 14:51:50,110][14427] Updated weights for policy 0, policy_version 200 (0.0015)
[2023-02-22 14:51:54,384][08504] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 3573.1). Total num frames: 839680. Throughput: 0: 1003.4. Samples: 210372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:51:54,386][08504] Avg episode reward: [(0, '5.041')]
[2023-02-22 14:51:54,403][14413] Saving new best policy, reward=5.041!
[2023-02-22 14:51:59,384][08504] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3566.9). Total num frames: 856064. Throughput: 0: 1001.3. Samples: 213760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:51:59,386][08504] Avg episode reward: [(0, '5.426')]
[2023-02-22 14:51:59,388][14413] Saving new best policy, reward=5.426!
[2023-02-22 14:51:59,720][14427] Updated weights for policy 0, policy_version 210 (0.0020)
[2023-02-22 14:52:04,388][08504] Fps is (10 sec: 3275.5, 60 sec: 3890.9, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 955.7. Samples: 218356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:52:04,390][08504] Avg episode reward: [(0, '5.403')]
[2023-02-22 14:52:09,383][08504] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 982.1. Samples: 224130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:52:09,386][08504] Avg episode reward: [(0, '5.254')]
[2023-02-22 14:52:10,540][14427] Updated weights for policy 0, policy_version 220 (0.0032)
[2023-02-22 14:52:14,383][08504] Fps is (10 sec: 4507.5, 60 sec: 3959.5, 300 sec: 3598.1). Total num frames: 917504. Throughput: 0: 1007.1. Samples: 227766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:52:14,389][08504] Avg episode reward: [(0, '5.367')]
[2023-02-22 14:52:19,383][08504] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3607.6). Total num frames: 937984. Throughput: 0: 991.7. Samples: 234282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:52:19,387][08504] Avg episode reward: [(0, '5.499')]
[2023-02-22 14:52:19,391][14413] Saving new best policy, reward=5.499!
[2023-02-22 14:52:20,604][14427] Updated weights for policy 0, policy_version 230 (0.0017)
[2023-02-22 14:52:24,384][08504] Fps is (10 sec: 3276.5, 60 sec: 3891.1, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 958.4. Samples: 238800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:52:24,393][08504] Avg episode reward: [(0, '5.319')]
[2023-02-22 14:52:29,383][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3595.4). Total num frames: 970752. Throughput: 0: 966.3. Samples: 241398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:52:29,386][08504] Avg episode reward: [(0, '5.405')]
[2023-02-22 14:52:31,279][14427] Updated weights for policy 0, policy_version 240 (0.0018)
[2023-02-22 14:52:34,384][08504] Fps is (10 sec: 4506.0, 60 sec: 3891.2, 300 sec: 3619.4). Total num frames: 995328. Throughput: 0: 1010.3. Samples: 248644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:52:34,385][08504] Avg episode reward: [(0, '5.714')]
[2023-02-22 14:52:34,402][14413] Saving new best policy, reward=5.714!
[2023-02-22 14:52:39,383][08504] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3627.9). Total num frames: 1015808. Throughput: 0: 984.0. Samples: 254652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:52:39,386][08504] Avg episode reward: [(0, '5.368')]
[2023-02-22 14:52:41,642][14427] Updated weights for policy 0, policy_version 250 (0.0016)
[2023-02-22 14:52:44,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3621.7). Total num frames: 1032192. Throughput: 0: 961.2. Samples: 257016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:52:44,389][08504] Avg episode reward: [(0, '5.161')]
[2023-02-22 14:52:49,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3629.9). Total num frames: 1052672. Throughput: 0: 975.2. Samples: 262238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:52:49,388][08504] Avg episode reward: [(0, '5.405')]
[2023-02-22 14:52:51,934][14427] Updated weights for policy 0, policy_version 260 (0.0024)
[2023-02-22 14:52:54,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 1011.1. Samples: 269628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:52:54,386][08504] Avg episode reward: [(0, '5.704')]
[2023-02-22 14:52:59,384][08504] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3707.3). Total num frames: 1093632. Throughput: 0: 1007.2. Samples: 273090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:52:59,390][08504] Avg episode reward: [(0, '5.766')]
[2023-02-22 14:52:59,393][14413] Saving new best policy, reward=5.766!
[2023-02-22 14:53:02,608][14427] Updated weights for policy 0, policy_version 270 (0.0022)
[2023-02-22 14:53:04,386][08504] Fps is (10 sec: 3685.5, 60 sec: 3959.6, 300 sec: 3762.7). Total num frames: 1110016. Throughput: 0: 961.4. Samples: 277546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:53:04,392][08504] Avg episode reward: [(0, '5.587')]
[2023-02-22 14:53:09,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 988.9. Samples: 283298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:53:09,386][08504] Avg episode reward: [(0, '5.948')]
[2023-02-22 14:53:09,391][14413] Saving new best policy, reward=5.948!
[2023-02-22 14:53:12,642][14427] Updated weights for policy 0, policy_version 280 (0.0017)
[2023-02-22 14:53:14,383][08504] Fps is (10 sec: 4506.7, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1155072. Throughput: 0: 1010.1. Samples: 286854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:53:14,391][08504] Avg episode reward: [(0, '6.525')]
[2023-02-22 14:53:14,401][14413] Saving new best policy, reward=6.525!
[2023-02-22 14:53:19,384][08504] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1171456. Throughput: 0: 994.6. Samples: 293402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:53:19,386][08504] Avg episode reward: [(0, '6.638')]
[2023-02-22 14:53:19,395][14413] Saving new best policy, reward=6.638!
[2023-02-22 14:53:23,581][14427] Updated weights for policy 0, policy_version 290 (0.0015)
[2023-02-22 14:53:24,389][08504] Fps is (10 sec: 3274.9, 60 sec: 3959.1, 300 sec: 3859.9). Total num frames: 1187840. Throughput: 0: 961.7. Samples: 297936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:53:24,392][08504] Avg episode reward: [(0, '6.775')]
[2023-02-22 14:53:24,406][14413] Saving new best policy, reward=6.775!
[2023-02-22 14:53:29,383][08504] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1208320. Throughput: 0: 966.6. Samples: 300512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:53:29,391][08504] Avg episode reward: [(0, '6.874')]
[2023-02-22 14:53:29,395][14413] Saving new best policy, reward=6.874!
[2023-02-22 14:53:33,287][14427] Updated weights for policy 0, policy_version 300 (0.0030)
[2023-02-22 14:53:34,383][08504] Fps is (10 sec: 4508.2, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 1232896. Throughput: 0: 1010.9. Samples: 307728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:53:34,392][08504] Avg episode reward: [(0, '6.693')]
[2023-02-22 14:53:34,402][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth...
[2023-02-22 14:53:34,559][14413] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
[2023-02-22 14:53:39,391][08504] Fps is (10 sec: 4093.1, 60 sec: 3890.7, 300 sec: 3887.6). Total num frames: 1249280. Throughput: 0: 978.7. Samples: 313676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:53:39,395][08504] Avg episode reward: [(0, '6.912')]
[2023-02-22 14:53:39,401][14413] Saving new best policy, reward=6.912!
[2023-02-22 14:53:44,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1265664. Throughput: 0: 951.4. Samples: 315902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 14:53:44,387][08504] Avg episode reward: [(0, '7.227')]
[2023-02-22 14:53:44,400][14413] Saving new best policy, reward=7.227!
[2023-02-22 14:53:45,088][14427] Updated weights for policy 0, policy_version 310 (0.0016)
[2023-02-22 14:53:49,383][08504] Fps is (10 sec: 3689.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1286144. Throughput: 0: 966.5. Samples: 321034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:53:49,386][08504] Avg episode reward: [(0, '6.935')]
[2023-02-22 14:53:54,196][14427] Updated weights for policy 0, policy_version 320 (0.0014)
[2023-02-22 14:53:54,384][08504] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 1310720. Throughput: 0: 999.2. Samples: 328260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:53:54,386][08504] Avg episode reward: [(0, '7.857')]
[2023-02-22 14:53:54,395][14413] Saving new best policy, reward=7.857!
[2023-02-22 14:53:59,384][08504] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1327104. Throughput: 0: 992.8. Samples: 331532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:53:59,388][08504] Avg episode reward: [(0, '8.028')]
[2023-02-22 14:53:59,393][14413] Saving new best policy, reward=8.028!
[2023-02-22 14:54:04,383][08504] Fps is (10 sec: 3276.9, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 1343488. Throughput: 0: 944.2. Samples: 335890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:54:04,390][08504] Avg episode reward: [(0, '8.620')]
[2023-02-22 14:54:04,403][14413] Saving new best policy, reward=8.620!
[2023-02-22 14:54:06,583][14427] Updated weights for policy 0, policy_version 330 (0.0026)
[2023-02-22 14:54:09,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1363968. Throughput: 0: 968.6. Samples: 341518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:54:09,390][08504] Avg episode reward: [(0, '8.129')]
[2023-02-22 14:54:14,383][08504] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1388544. Throughput: 0: 990.9. Samples: 345104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:54:14,390][08504] Avg episode reward: [(0, '8.147')]
[2023-02-22 14:54:15,209][14427] Updated weights for policy 0, policy_version 340 (0.0015)
[2023-02-22 14:54:19,384][08504] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1404928. Throughput: 0: 973.0. Samples: 351514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:54:19,388][08504] Avg episode reward: [(0, '8.145')]
[2023-02-22 14:54:24,383][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.6, 300 sec: 3873.8). Total num frames: 1421312. Throughput: 0: 940.7. Samples: 356000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:54:24,391][08504] Avg episode reward: [(0, '8.552')]
[2023-02-22 14:54:27,368][14427] Updated weights for policy 0, policy_version 350 (0.0014)
[2023-02-22 14:54:29,383][08504] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1441792. Throughput: 0: 952.5. Samples: 358766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:54:29,391][08504] Avg episode reward: [(0, '8.226')]
[2023-02-22 14:54:34,384][08504] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1466368. Throughput: 0: 1000.6. Samples: 366060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:54:34,386][08504] Avg episode reward: [(0, '8.924')]
[2023-02-22 14:54:34,394][14413] Saving new best policy, reward=8.924!
[2023-02-22 14:54:35,818][14427] Updated weights for policy 0, policy_version 360 (0.0029)
[2023-02-22 14:54:39,385][08504] Fps is (10 sec: 4095.5, 60 sec: 3891.6, 300 sec: 3887.7). Total num frames: 1482752. Throughput: 0: 969.0. Samples: 371866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:54:39,388][08504] Avg episode reward: [(0, '9.277')]
[2023-02-22 14:54:39,392][14413] Saving new best policy, reward=9.277!
[2023-02-22 14:54:44,384][08504] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1499136. Throughput: 0: 945.9. Samples: 374096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:54:44,393][08504] Avg episode reward: [(0, '9.261')]
[2023-02-22 14:54:48,091][14427] Updated weights for policy 0, policy_version 370 (0.0027)
[2023-02-22 14:54:49,383][08504] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1519616. Throughput: 0: 969.8. Samples: 379530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:54:49,391][08504] Avg episode reward: [(0, '10.152')]
[2023-02-22 14:54:49,396][14413] Saving new best policy, reward=10.152!
[2023-02-22 14:54:54,383][08504] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1544192. Throughput: 0: 1006.7. Samples: 386820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:54:54,386][08504] Avg episode reward: [(0, '10.288')]
[2023-02-22 14:54:54,399][14413] Saving new best policy, reward=10.288!
[2023-02-22 14:54:56,942][14427] Updated weights for policy 0, policy_version 380 (0.0028)
[2023-02-22 14:54:59,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1560576. Throughput: 0: 996.4. Samples: 389942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:54:59,388][08504] Avg episode reward: [(0, '11.298')]
[2023-02-22 14:54:59,395][14413] Saving new best policy, reward=11.298!
[2023-02-22 14:55:04,383][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1576960. Throughput: 0: 953.6. Samples: 394426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:55:04,386][08504] Avg episode reward: [(0, '11.150')]
[2023-02-22 14:55:08,917][14427] Updated weights for policy 0, policy_version 390 (0.0030)
[2023-02-22 14:55:09,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1597440. Throughput: 0: 986.2. Samples: 400378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:55:09,385][08504] Avg episode reward: [(0, '10.881')]
[2023-02-22 14:55:14,384][08504] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1622016. Throughput: 0: 1001.5. Samples: 403832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:55:14,389][08504] Avg episode reward: [(0, '10.923')]
[2023-02-22 14:55:18,189][14427] Updated weights for policy 0, policy_version 400 (0.0014)
[2023-02-22 14:55:19,385][08504] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 1638400. Throughput: 0: 979.7. Samples: 410148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:55:19,393][08504] Avg episode reward: [(0, '11.031')]
[2023-02-22 14:55:24,385][08504] Fps is (10 sec: 3276.3, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 1654784. Throughput: 0: 952.1. Samples: 414712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:55:24,392][08504] Avg episode reward: [(0, '11.161')]
[2023-02-22 14:55:29,384][08504] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1675264. Throughput: 0: 962.7. Samples: 417418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:55:29,386][08504] Avg episode reward: [(0, '11.343')]
[2023-02-22 14:55:29,393][14413] Saving new best policy, reward=11.343!
[2023-02-22 14:55:29,768][14427] Updated weights for policy 0, policy_version 410 (0.0016)
[2023-02-22 14:55:34,384][08504] Fps is (10 sec: 4506.2, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1699840. Throughput: 0: 999.8. Samples: 424520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:55:34,387][08504] Avg episode reward: [(0, '12.071')]
[2023-02-22 14:55:34,401][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000415_1699840.pth...
[2023-02-22 14:55:34,534][14413] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth
[2023-02-22 14:55:34,542][14413] Saving new best policy, reward=12.071!
[2023-02-22 14:55:39,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 1716224. Throughput: 0: 969.4. Samples: 430444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:55:39,389][08504] Avg episode reward: [(0, '11.810')]
[2023-02-22 14:55:39,465][14427] Updated weights for policy 0, policy_version 420 (0.0016)
[2023-02-22 14:55:44,385][08504] Fps is (10 sec: 3276.3, 60 sec: 3891.1, 300 sec: 3915.5). Total num frames: 1732608. Throughput: 0: 948.9. Samples: 432644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:55:44,390][08504] Avg episode reward: [(0, '12.051')]
[2023-02-22 14:55:49,385][08504] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 1753088. Throughput: 0: 971.9. Samples: 438164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:55:49,392][08504] Avg episode reward: [(0, '12.107')]
[2023-02-22 14:55:49,395][14413] Saving new best policy, reward=12.107!
[2023-02-22 14:55:50,548][14427] Updated weights for policy 0, policy_version 430 (0.0038)
[2023-02-22 14:55:54,384][08504] Fps is (10 sec: 4506.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1777664. Throughput: 0: 1000.4. Samples: 445398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:55:54,390][08504] Avg episode reward: [(0, '11.446')]
[2023-02-22 14:55:59,384][08504] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1798144. Throughput: 0: 995.6. Samples: 448632. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:55:59,387][08504] Avg episode reward: [(0, '11.580')]
[2023-02-22 14:56:00,604][14427] Updated weights for policy 0, policy_version 440 (0.0021)
[2023-02-22 14:56:04,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1810432. Throughput: 0: 956.6. Samples: 453194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:56:04,391][08504] Avg episode reward: [(0, '11.747')]
[2023-02-22 14:56:09,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1835008. Throughput: 0: 990.3. Samples: 459276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:56:09,386][08504] Avg episode reward: [(0, '12.490')]
[2023-02-22 14:56:09,394][14413] Saving new best policy, reward=12.490!
[2023-02-22 14:56:11,176][14427] Updated weights for policy 0, policy_version 450 (0.0018)
[2023-02-22 14:56:14,383][08504] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1855488. Throughput: 0: 1006.9. Samples: 462730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:56:14,387][08504] Avg episode reward: [(0, '13.413')]
[2023-02-22 14:56:14,396][14413] Saving new best policy, reward=13.413!
[2023-02-22 14:56:19,391][08504] Fps is (10 sec: 4093.1, 60 sec: 3959.1, 300 sec: 3929.3). Total num frames: 1875968. Throughput: 0: 984.4. Samples: 468826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:56:19,397][08504] Avg episode reward: [(0, '14.280')]
[2023-02-22 14:56:19,400][14413] Saving new best policy, reward=14.280!
[2023-02-22 14:56:21,992][14427] Updated weights for policy 0, policy_version 460 (0.0027)
[2023-02-22 14:56:24,383][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 1888256. Throughput: 0: 953.2. Samples: 473336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:56:24,386][08504] Avg episode reward: [(0, '15.023')]
[2023-02-22 14:56:24,398][14413] Saving new best policy, reward=15.023!
[2023-02-22 14:56:29,383][08504] Fps is (10 sec: 3279.1, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1908736. Throughput: 0: 966.8. Samples: 476150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:56:29,389][08504] Avg episode reward: [(0, '15.216')]
[2023-02-22 14:56:29,425][14413] Saving new best policy, reward=15.216!
[2023-02-22 14:56:32,013][14427] Updated weights for policy 0, policy_version 470 (0.0019)
[2023-02-22 14:56:34,384][08504] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1933312. Throughput: 0: 1003.4. Samples: 483318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:56:34,386][08504] Avg episode reward: [(0, '14.834')]
[2023-02-22 14:56:39,383][08504] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1953792. Throughput: 0: 969.4. Samples: 489020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:56:39,386][08504] Avg episode reward: [(0, '14.403')]
[2023-02-22 14:56:43,234][14427] Updated weights for policy 0, policy_version 480 (0.0022)
[2023-02-22 14:56:44,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 1966080. Throughput: 0: 947.6. Samples: 491276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:56:44,395][08504] Avg episode reward: [(0, '14.930')]
[2023-02-22 14:56:49,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1990656. Throughput: 0: 974.6. Samples: 497050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:56:49,392][08504] Avg episode reward: [(0, '13.259')]
[2023-02-22 14:56:52,466][14427] Updated weights for policy 0, policy_version 490 (0.0024)
[2023-02-22 14:56:54,383][08504] Fps is (10 sec: 4915.3, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2015232. Throughput: 0: 1005.4. Samples: 504520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:56:54,390][08504] Avg episode reward: [(0, '13.892')]
[2023-02-22 14:56:59,385][08504] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3929.4). Total num frames: 2031616. Throughput: 0: 998.8. Samples: 507676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:56:59,390][08504] Avg episode reward: [(0, '15.969')]
[2023-02-22 14:56:59,394][14413] Saving new best policy, reward=15.969!
[2023-02-22 14:57:03,701][14427] Updated weights for policy 0, policy_version 500 (0.0015)
[2023-02-22 14:57:04,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2048000. Throughput: 0: 964.5. Samples: 512222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:57:04,391][08504] Avg episode reward: [(0, '15.703')]
[2023-02-22 14:57:09,383][08504] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2072576. Throughput: 0: 1003.3. Samples: 518486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:57:09,391][08504] Avg episode reward: [(0, '17.147')]
[2023-02-22 14:57:09,394][14413] Saving new best policy, reward=17.147!
[2023-02-22 14:57:12,835][14427] Updated weights for policy 0, policy_version 510 (0.0012)
[2023-02-22 14:57:14,383][08504] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2093056. Throughput: 0: 1018.1. Samples: 521964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:57:14,386][08504] Avg episode reward: [(0, '18.277')]
[2023-02-22 14:57:14,396][14413] Saving new best policy, reward=18.277!
[2023-02-22 14:57:19,389][08504] Fps is (10 sec: 4093.9, 60 sec: 3959.6, 300 sec: 3943.2). Total num frames: 2113536. Throughput: 0: 999.0. Samples: 528276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:57:19,391][08504] Avg episode reward: [(0, '16.944')]
[2023-02-22 14:57:24,384][08504] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2125824. Throughput: 0: 975.4. Samples: 532914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:57:24,390][08504] Avg episode reward: [(0, '16.934')]
[2023-02-22 14:57:24,559][14427] Updated weights for policy 0, policy_version 520 (0.0016)
[2023-02-22 14:57:29,384][08504] Fps is (10 sec: 3688.3, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2150400. Throughput: 0: 992.7. Samples: 535948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:57:29,386][08504] Avg episode reward: [(0, '15.394')]
[2023-02-22 14:57:33,261][14427] Updated weights for policy 0, policy_version 530 (0.0019)
[2023-02-22 14:57:34,383][08504] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2174976. Throughput: 0: 1028.0. Samples: 543312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:57:34,389][08504] Avg episode reward: [(0, '16.105')]
[2023-02-22 14:57:34,403][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth...
[2023-02-22 14:57:34,517][14413] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth
[2023-02-22 14:57:39,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2191360. Throughput: 0: 988.9. Samples: 549020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:57:39,395][08504] Avg episode reward: [(0, '17.429')]
[2023-02-22 14:57:44,385][08504] Fps is (10 sec: 3276.4, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2207744. Throughput: 0: 969.4. Samples: 551300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:57:44,388][08504] Avg episode reward: [(0, '17.406')]
[2023-02-22 14:57:44,812][14427] Updated weights for policy 0, policy_version 540 (0.0012)
[2023-02-22 14:57:49,383][08504] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2232320. Throughput: 0: 1001.1. Samples: 557270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:57:49,389][08504] Avg episode reward: [(0, '19.043')]
[2023-02-22 14:57:49,392][14413] Saving new best policy, reward=19.043!
[2023-02-22 14:57:53,551][14427] Updated weights for policy 0, policy_version 550 (0.0017)
[2023-02-22 14:57:54,383][08504] Fps is (10 sec: 4915.8, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2256896. Throughput: 0: 1024.3. Samples: 564580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:57:54,390][08504] Avg episode reward: [(0, '18.322')]
[2023-02-22 14:57:59,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3929.4). Total num frames: 2269184. Throughput: 0: 1010.9. Samples: 567454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:57:59,386][08504] Avg episode reward: [(0, '19.237')]
[2023-02-22 14:57:59,388][14413] Saving new best policy, reward=19.237!
[2023-02-22 14:58:04,386][08504] Fps is (10 sec: 2457.1, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 2281472. Throughput: 0: 952.5. Samples: 571136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:04,398][08504] Avg episode reward: [(0, '19.145')]
[2023-02-22 14:58:07,902][14427] Updated weights for policy 0, policy_version 560 (0.0038)
[2023-02-22 14:58:09,385][08504] Fps is (10 sec: 2457.3, 60 sec: 3686.3, 300 sec: 3859.9). Total num frames: 2293760. Throughput: 0: 933.0. Samples: 574898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:09,387][08504] Avg episode reward: [(0, '19.078')]
[2023-02-22 14:58:14,383][08504] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 2318336. Throughput: 0: 930.2. Samples: 577808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:14,389][08504] Avg episode reward: [(0, '19.826')]
[2023-02-22 14:58:14,401][14413] Saving new best policy, reward=19.826!
[2023-02-22 14:58:17,466][14427] Updated weights for policy 0, policy_version 570 (0.0025)
[2023-02-22 14:58:19,384][08504] Fps is (10 sec: 4915.6, 60 sec: 3823.2, 300 sec: 3915.6). Total num frames: 2342912. Throughput: 0: 931.2. Samples: 585218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:19,387][08504] Avg episode reward: [(0, '21.467')]
[2023-02-22 14:58:19,395][14413] Saving new best policy, reward=21.467!
[2023-02-22 14:58:24,386][08504] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 2359296. Throughput: 0: 917.4. Samples: 590306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:58:24,390][08504] Avg episode reward: [(0, '22.349')]
[2023-02-22 14:58:24,408][14413] Saving new best policy, reward=22.349!
[2023-02-22 14:58:29,275][14427] Updated weights for policy 0, policy_version 580 (0.0028)
[2023-02-22 14:58:29,383][08504] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 2375680. Throughput: 0: 916.0. Samples: 592518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:29,386][08504] Avg episode reward: [(0, '21.285')]
[2023-02-22 14:58:34,384][08504] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3901.7). Total num frames: 2400256. Throughput: 0: 931.5. Samples: 599188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:58:34,386][08504] Avg episode reward: [(0, '22.014')]
[2023-02-22 14:58:37,698][14427] Updated weights for policy 0, policy_version 590 (0.0018)
[2023-02-22 14:58:39,385][08504] Fps is (10 sec: 4505.0, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2420736. Throughput: 0: 931.8. Samples: 606514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:39,391][08504] Avg episode reward: [(0, '20.064')]
[2023-02-22 14:58:44,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 2437120. Throughput: 0: 919.3. Samples: 608824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:44,386][08504] Avg episode reward: [(0, '19.395')]
[2023-02-22 14:58:49,383][08504] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 2453504. Throughput: 0: 936.1. Samples: 613260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:58:49,387][08504] Avg episode reward: [(0, '18.820')]
[2023-02-22 14:58:49,990][14427] Updated weights for policy 0, policy_version 600 (0.0029)
[2023-02-22 14:58:54,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3887.7). Total num frames: 2473984. Throughput: 0: 993.6. Samples: 619608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:58:54,386][08504] Avg episode reward: [(0, '19.356')]
[2023-02-22 14:58:58,948][14427] Updated weights for policy 0, policy_version 610 (0.0014)
[2023-02-22 14:58:59,384][08504] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2498560. Throughput: 0: 1006.0. Samples: 623076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:58:59,386][08504] Avg episode reward: [(0, '18.861')]
[2023-02-22 14:59:04,390][08504] Fps is (10 sec: 3684.1, 60 sec: 3822.7, 300 sec: 3887.6). Total num frames: 2510848. Throughput: 0: 957.7. Samples: 628320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:59:04,392][08504] Avg episode reward: [(0, '19.151')]
[2023-02-22 14:59:09,383][08504] Fps is (10 sec: 2867.2, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 2527232. Throughput: 0: 945.8. Samples: 632866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:59:09,385][08504] Avg episode reward: [(0, '18.391')]
[2023-02-22 14:59:11,621][14427] Updated weights for policy 0, policy_version 620 (0.0023)
[2023-02-22 14:59:14,383][08504] Fps is (10 sec: 4098.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2551808. Throughput: 0: 971.2. Samples: 636224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:59:14,389][08504] Avg episode reward: [(0, '19.067')]
[2023-02-22 14:59:19,384][08504] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2572288. Throughput: 0: 975.4. Samples: 643082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:59:19,391][08504] Avg episode reward: [(0, '19.565')]
[2023-02-22 14:59:21,362][14427] Updated weights for policy 0, policy_version 630 (0.0018)
[2023-02-22 14:59:24,384][08504] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 2588672. Throughput: 0: 914.8. Samples: 647680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:59:24,392][08504] Avg episode reward: [(0, '19.458')]
[2023-02-22 14:59:29,383][08504] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2600960. Throughput: 0: 909.6. Samples: 649756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:59:29,391][08504] Avg episode reward: [(0, '20.716')]
[2023-02-22 14:59:33,227][14427] Updated weights for policy 0, policy_version 640 (0.0033)
[2023-02-22 14:59:34,383][08504] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 2625536. Throughput: 0: 947.3. Samples: 655890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:59:34,386][08504] Avg episode reward: [(0, '21.187')]
[2023-02-22 14:59:34,394][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth...
[2023-02-22 14:59:34,502][14413] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000415_1699840.pth
[2023-02-22 14:59:39,383][08504] Fps is (10 sec: 4915.2, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 2650112. Throughput: 0: 969.5. Samples: 663234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:59:39,392][08504] Avg episode reward: [(0, '22.438')]
[2023-02-22 14:59:39,396][14413] Saving new best policy, reward=22.438!
[2023-02-22 14:59:42,602][14427] Updated weights for policy 0, policy_version 650 (0.0026)
[2023-02-22 14:59:44,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2666496. Throughput: 0: 951.1. Samples: 665874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:59:44,389][08504] Avg episode reward: [(0, '20.771')]
[2023-02-22 14:59:49,383][08504] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2682880. Throughput: 0: 937.1. Samples: 670484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:59:49,389][08504] Avg episode reward: [(0, '21.227')]
[2023-02-22 14:59:53,551][14427] Updated weights for policy 0, policy_version 660 (0.0022)
[2023-02-22 14:59:54,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2703360. Throughput: 0: 987.4. Samples: 677300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 14:59:54,385][08504] Avg episode reward: [(0, '22.749')]
[2023-02-22 14:59:54,423][14413] Saving new best policy, reward=22.749!
[2023-02-22 14:59:59,384][08504] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2727936. Throughput: 0: 991.9. Samples: 680862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:59:59,393][08504] Avg episode reward: [(0, '23.865')]
[2023-02-22 14:59:59,439][14413] Saving new best policy, reward=23.865!
[2023-02-22 15:00:03,246][14427] Updated weights for policy 0, policy_version 670 (0.0015)
[2023-02-22 15:00:04,385][08504] Fps is (10 sec: 4095.6, 60 sec: 3891.5, 300 sec: 3887.7). Total num frames: 2744320. Throughput: 0: 968.2. Samples: 686654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:00:04,387][08504] Avg episode reward: [(0, '23.027')]
[2023-02-22 15:00:09,383][08504] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2760704. Throughput: 0: 970.1. Samples: 691332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:00:09,388][08504] Avg episode reward: [(0, '23.064')]
[2023-02-22 15:00:13,983][14427] Updated weights for policy 0, policy_version 680 (0.0024)
[2023-02-22 15:00:14,383][08504] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2785280. Throughput: 0: 1000.2. Samples: 694766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 15:00:14,391][08504] Avg episode reward: [(0, '24.332')]
[2023-02-22 15:00:14,401][14413] Saving new best policy, reward=24.332!
[2023-02-22 15:00:19,383][08504] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2809856. Throughput: 0: 1022.9. Samples: 701922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:00:19,385][08504] Avg episode reward: [(0, '23.504')]
[2023-02-22 15:00:24,383][08504] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2822144. Throughput: 0: 975.1. Samples: 707114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:00:24,385][08504] Avg episode reward: [(0, '23.966')]
[2023-02-22 15:00:24,419][14427] Updated weights for policy 0, policy_version 690 (0.0017)
[2023-02-22 15:00:29,389][08504] Fps is (10 sec: 2865.7, 60 sec: 3959.1, 300 sec: 3859.9). Total num frames: 2838528. Throughput: 0: 967.6. Samples: 709422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:00:29,392][08504] Avg episode reward: [(0, '23.559')]
[2023-02-22 15:00:34,384][08504] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2863104. Throughput: 0: 1003.7. Samples: 715650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:00:34,391][08504] Avg episode reward: [(0, '23.917')]
[2023-02-22 15:00:34,659][14427] Updated weights for policy 0, policy_version 700 (0.0018)
[2023-02-22 15:00:39,384][08504] Fps is (10 sec: 4917.7, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2887680. Throughput: 0: 1014.4. Samples: 722950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:00:39,386][08504] Avg episode reward: [(0, '23.420')]
[2023-02-22 15:00:44,383][08504] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2904064. Throughput: 0: 990.6. Samples: 725438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:00:44,386][08504] Avg episode reward: [(0, '23.048')]
[2023-02-22 15:00:45,095][14427] Updated weights for policy 0, policy_version 710 (0.0012)
[2023-02-22 15:00:49,386][08504] Fps is (10 sec: 3276.0, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 2920448. Throughput: 0: 964.2. Samples: 730046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:00:49,388][08504] Avg episode reward: [(0, '22.131')]
[2023-02-22 15:00:54,384][08504] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2945024. Throughput: 0: 1013.3. Samples: 736930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:00:54,394][08504] Avg episode reward: [(0, '21.494')]
[2023-02-22 15:00:54,984][14427] Updated weights for policy 0, policy_version 720 (0.0031)
[2023-02-22 15:00:59,383][08504] Fps is (10 sec: 4916.4, 60 sec: 4027.8, 300 sec: 3929.4). Total num frames: 2969600. Throughput: 0: 1019.5. Samples: 740642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:00:59,390][08504] Avg episode reward: [(0, '21.111')]
[2023-02-22 15:01:04,384][08504] Fps is (10 sec: 4095.9, 60 sec: 4027.8, 300 sec: 3901.6). Total num frames: 2985984. Throughput: 0: 985.6. Samples: 746276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:01:04,389][08504] Avg episode reward: [(0, '20.323')]
[2023-02-22 15:01:05,540][14427] Updated weights for policy 0, policy_version 730 (0.0011)
[2023-02-22 15:01:09,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3002368. Throughput: 0: 975.6. Samples: 751016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:01:09,385][08504] Avg episode reward: [(0, '20.095')]
[2023-02-22 15:01:14,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 3022848. Throughput: 0: 1003.6. Samples: 754580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 15:01:14,391][08504] Avg episode reward: [(0, '20.034')]
[2023-02-22 15:01:15,230][14427] Updated weights for policy 0, policy_version 740 (0.0018)
[2023-02-22 15:01:19,384][08504] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3047424. Throughput: 0: 1029.1. Samples: 761958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:01:19,386][08504] Avg episode reward: [(0, '20.919')]
[2023-02-22 15:01:24,383][08504] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3063808. Throughput: 0: 979.7. Samples: 767036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:01:24,387][08504] Avg episode reward: [(0, '22.566')]
[2023-02-22 15:01:26,077][14427] Updated weights for policy 0, policy_version 750 (0.0018)
[2023-02-22 15:01:29,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4028.1, 300 sec: 3887.7). Total num frames: 3080192. Throughput: 0: 975.7. Samples: 769346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 15:01:29,386][08504] Avg episode reward: [(0, '21.764')]
[2023-02-22 15:01:34,384][08504] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3104768. Throughput: 0: 1020.4. Samples: 775960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:01:34,391][08504] Avg episode reward: [(0, '23.164')]
[2023-02-22 15:01:34,404][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000758_3104768.pth...
[2023-02-22 15:01:34,539][14413] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth [2023-02-22 15:01:35,546][14427] Updated weights for policy 0, policy_version 760 (0.0041) [2023-02-22 15:01:39,385][08504] Fps is (10 sec: 4914.5, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 3129344. Throughput: 0: 1027.2. Samples: 783154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 15:01:39,391][08504] Avg episode reward: [(0, '23.922')] [2023-02-22 15:01:44,387][08504] Fps is (10 sec: 4094.7, 60 sec: 4027.5, 300 sec: 3915.5). Total num frames: 3145728. Throughput: 0: 995.8. Samples: 785454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 15:01:44,391][08504] Avg episode reward: [(0, '23.314')] [2023-02-22 15:01:46,845][14427] Updated weights for policy 0, policy_version 770 (0.0020) [2023-02-22 15:01:49,384][08504] Fps is (10 sec: 3277.2, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 3162112. Throughput: 0: 974.2. Samples: 790114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 15:01:49,386][08504] Avg episode reward: [(0, '22.931')] [2023-02-22 15:01:54,383][08504] Fps is (10 sec: 4097.3, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3186688. Throughput: 0: 1028.1. Samples: 797280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 15:01:54,387][08504] Avg episode reward: [(0, '22.314')] [2023-02-22 15:01:55,927][14427] Updated weights for policy 0, policy_version 780 (0.0020) [2023-02-22 15:01:59,393][08504] Fps is (10 sec: 4501.3, 60 sec: 3958.8, 300 sec: 3929.3). Total num frames: 3207168. Throughput: 0: 1029.5. Samples: 800916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 15:01:59,399][08504] Avg episode reward: [(0, '23.715')] [2023-02-22 15:02:04,388][08504] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 3901.6). Total num frames: 3223552. Throughput: 0: 986.3. Samples: 806344. 
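The "Saving .../checkpoint_000000758_3104768.pth" line above shows the checkpoint naming convention used throughout this log: `checkpoint_<policy_version>_<env_frames>.pth`. A small standalone sketch that decodes it (plain Python; `parse_checkpoint_name` is a hypothetical helper for illustration, not part of Sample Factory):

```python
import re

def parse_checkpoint_name(filename: str):
    """Decode names like 'checkpoint_000000758_3104768.pth'
    into (policy_version, env_frames)."""
    m = re.match(r"checkpoint_(\d+)_(\d+)\.pth$", filename)
    if m is None:
        raise ValueError(f"unexpected checkpoint name: {filename}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint_name("checkpoint_000000758_3104768.pth")
# In this particular log, env frames advance by 4096 per policy version
# (758 * 4096 == 3104768), i.e. one policy update per 4096 collected frames.
assert frames == version * 4096
```

The 4096-frames-per-version relationship is inferred from the checkpoints in this log (758→3104768, 876→3588096, 978→4005888); it depends on the experiment's batch settings, not on the naming scheme itself.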
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:02:04,390][08504] Avg episode reward: [(0, '23.633')]
[2023-02-22 15:02:07,292][14427] Updated weights for policy 0, policy_version 790 (0.0026)
[2023-02-22 15:02:09,384][08504] Fps is (10 sec: 3279.8, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 3239936. Throughput: 0: 983.9. Samples: 811310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:02:09,391][08504] Avg episode reward: [(0, '23.260')]
[2023-02-22 15:02:14,384][08504] Fps is (10 sec: 4097.7, 60 sec: 4027.7, 300 sec: 3901.7). Total num frames: 3264512. Throughput: 0: 1014.0. Samples: 814974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:02:14,386][08504] Avg episode reward: [(0, '24.074')]
[2023-02-22 15:02:16,116][14427] Updated weights for policy 0, policy_version 800 (0.0014)
[2023-02-22 15:02:19,384][08504] Fps is (10 sec: 4915.5, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3289088. Throughput: 0: 1028.6. Samples: 822248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:02:19,386][08504] Avg episode reward: [(0, '24.443')]
[2023-02-22 15:02:19,388][14413] Saving new best policy, reward=24.443!
[2023-02-22 15:02:24,386][08504] Fps is (10 sec: 4094.8, 60 sec: 4027.5, 300 sec: 3915.5). Total num frames: 3305472. Throughput: 0: 976.5. Samples: 827098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 15:02:24,394][08504] Avg episode reward: [(0, '21.568')]
[2023-02-22 15:02:27,831][14427] Updated weights for policy 0, policy_version 810 (0.0027)
[2023-02-22 15:02:29,387][08504] Fps is (10 sec: 3275.7, 60 sec: 4027.5, 300 sec: 3887.7). Total num frames: 3321856. Throughput: 0: 978.7. Samples: 829496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:02:29,394][08504] Avg episode reward: [(0, '20.443')]
[2023-02-22 15:02:34,384][08504] Fps is (10 sec: 4097.1, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3346432. Throughput: 0: 1026.8. Samples: 836322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:02:34,386][08504] Avg episode reward: [(0, '20.528')]
[2023-02-22 15:02:36,474][14427] Updated weights for policy 0, policy_version 820 (0.0021)
[2023-02-22 15:02:39,383][08504] Fps is (10 sec: 4916.9, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 3371008. Throughput: 0: 1025.1. Samples: 843410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:02:39,392][08504] Avg episode reward: [(0, '18.818')]
[2023-02-22 15:02:44,384][08504] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3901.6). Total num frames: 3383296. Throughput: 0: 996.9. Samples: 845766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 15:02:44,387][08504] Avg episode reward: [(0, '19.878')]
[2023-02-22 15:02:48,230][14427] Updated weights for policy 0, policy_version 830 (0.0025)
[2023-02-22 15:02:49,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3403776. Throughput: 0: 979.9. Samples: 850434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:02:49,392][08504] Avg episode reward: [(0, '20.931')]
[2023-02-22 15:02:54,384][08504] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3428352. Throughput: 0: 1034.4. Samples: 857856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 15:02:54,388][08504] Avg episode reward: [(0, '22.088')]
[2023-02-22 15:02:56,600][14427] Updated weights for policy 0, policy_version 840 (0.0021)
[2023-02-22 15:02:59,383][08504] Fps is (10 sec: 4915.2, 60 sec: 4096.7, 300 sec: 3971.1). Total num frames: 3452928. Throughput: 0: 1035.9. Samples: 861588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:02:59,389][08504] Avg episode reward: [(0, '22.605')]
[2023-02-22 15:03:04,384][08504] Fps is (10 sec: 3686.1, 60 sec: 4028.0, 300 sec: 3971.0). Total num frames: 3465216. Throughput: 0: 991.7. Samples: 866874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:03:04,390][08504] Avg episode reward: [(0, '22.665')]
[2023-02-22 15:03:08,416][14427] Updated weights for policy 0, policy_version 850 (0.0014)
[2023-02-22 15:03:09,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3485696. Throughput: 0: 996.6. Samples: 871940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:03:09,386][08504] Avg episode reward: [(0, '22.928')]
[2023-02-22 15:03:14,384][08504] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3510272. Throughput: 0: 1025.1. Samples: 875622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:03:14,385][08504] Avg episode reward: [(0, '22.723')]
[2023-02-22 15:03:16,791][14427] Updated weights for policy 0, policy_version 860 (0.0012)
[2023-02-22 15:03:19,385][08504] Fps is (10 sec: 4505.1, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3530752. Throughput: 0: 1035.4. Samples: 882916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:03:19,393][08504] Avg episode reward: [(0, '22.756')]
[2023-02-22 15:03:24,383][08504] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 3971.0). Total num frames: 3547136. Throughput: 0: 981.4. Samples: 887574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 15:03:24,386][08504] Avg episode reward: [(0, '23.081')]
[2023-02-22 15:03:28,854][14427] Updated weights for policy 0, policy_version 870 (0.0021)
[2023-02-22 15:03:29,384][08504] Fps is (10 sec: 3277.2, 60 sec: 4028.0, 300 sec: 3943.3). Total num frames: 3563520. Throughput: 0: 981.6. Samples: 889938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:03:29,386][08504] Avg episode reward: [(0, '22.959')]
[2023-02-22 15:03:34,384][08504] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3588096. Throughput: 0: 1031.2. Samples: 896836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 15:03:34,386][08504] Avg episode reward: [(0, '23.277')]
[2023-02-22 15:03:34,401][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000876_3588096.pth...
[2023-02-22 15:03:34,530][14413] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth
[2023-02-22 15:03:37,117][14427] Updated weights for policy 0, policy_version 880 (0.0014)
[2023-02-22 15:03:39,383][08504] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3612672. Throughput: 0: 1021.6. Samples: 903826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 15:03:39,389][08504] Avg episode reward: [(0, '22.318')]
[2023-02-22 15:03:44,383][08504] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3629056. Throughput: 0: 991.1. Samples: 906186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:03:44,394][08504] Avg episode reward: [(0, '22.973')]
[2023-02-22 15:03:49,078][14427] Updated weights for policy 0, policy_version 890 (0.0012)
[2023-02-22 15:03:49,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3645440. Throughput: 0: 979.1. Samples: 910934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 15:03:49,386][08504] Avg episode reward: [(0, '22.557')]
[2023-02-22 15:03:54,384][08504] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3670016. Throughput: 0: 1032.7. Samples: 918410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:03:54,386][08504] Avg episode reward: [(0, '24.176')]
[2023-02-22 15:03:57,217][14427] Updated weights for policy 0, policy_version 900 (0.0021)
[2023-02-22 15:03:59,384][08504] Fps is (10 sec: 4914.9, 60 sec: 4027.7, 300 sec: 4012.8). Total num frames: 3694592. Throughput: 0: 1033.1. Samples: 922112.
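Each "Saving ... / Removing ..." pair in the log is checkpoint rotation: the learner writes the newest checkpoint and prunes the oldest so only a few stay on disk. A minimal sketch of that behavior (plain Python; `rotate_checkpoints` is a hypothetical helper, and keeping the 2 newest files is an assumption inferred from this log, not a documented Sample Factory default):

```python
import os
import tempfile

def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> list:
    """Delete all but the `keep` newest checkpoints in ckpt_dir.
    The zero-padded policy_version in the name makes lexicographic
    sort equal to chronological order."""
    ckpts = sorted(f for f in os.listdir(ckpt_dir)
                   if f.startswith("checkpoint_") and f.endswith(".pth"))
    for old in ckpts[:-keep]:
        os.remove(os.path.join(ckpt_dir, old))
    return ckpts[-keep:]

# Demo with empty placeholder files named like the ones in this log:
d = tempfile.mkdtemp()
for name in ("checkpoint_000000531_2174976.pth",
             "checkpoint_000000641_2625536.pth",
             "checkpoint_000000758_3104768.pth"):
    open(os.path.join(d, name), "w").close()
kept = rotate_checkpoints(d, keep=2)
assert kept == ["checkpoint_000000641_2625536.pth",
                "checkpoint_000000758_3104768.pth"]
```

Note that "Saving new best policy" lines refer to a separate best-reward checkpoint, which is not subject to this rotation.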
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:03:59,391][08504] Avg episode reward: [(0, '24.977')]
[2023-02-22 15:03:59,393][14413] Saving new best policy, reward=24.977!
[2023-02-22 15:04:04,383][08504] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3998.8). Total num frames: 3706880. Throughput: 0: 982.0. Samples: 927106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:04:04,385][08504] Avg episode reward: [(0, '25.375')]
[2023-02-22 15:04:04,395][14413] Saving new best policy, reward=25.375!
[2023-02-22 15:04:09,384][08504] Fps is (10 sec: 2867.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3723264. Throughput: 0: 992.0. Samples: 932212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 15:04:09,389][08504] Avg episode reward: [(0, '26.294')]
[2023-02-22 15:04:09,447][14413] Saving new best policy, reward=26.294!
[2023-02-22 15:04:09,452][14427] Updated weights for policy 0, policy_version 910 (0.0013)
[2023-02-22 15:04:14,383][08504] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3747840. Throughput: 0: 1019.4. Samples: 935810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:04:14,387][08504] Avg episode reward: [(0, '26.242')]
[2023-02-22 15:04:17,708][14427] Updated weights for policy 0, policy_version 920 (0.0022)
[2023-02-22 15:04:19,383][08504] Fps is (10 sec: 4915.2, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 3772416. Throughput: 0: 1030.9. Samples: 943228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:04:19,390][08504] Avg episode reward: [(0, '26.012')]
[2023-02-22 15:04:24,384][08504] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3788800. Throughput: 0: 977.5. Samples: 947814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:04:24,385][08504] Avg episode reward: [(0, '25.084')]
[2023-02-22 15:04:29,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3805184. Throughput: 0: 976.8. Samples: 950142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:04:29,391][08504] Avg episode reward: [(0, '24.360')]
[2023-02-22 15:04:29,723][14427] Updated weights for policy 0, policy_version 930 (0.0013)
[2023-02-22 15:04:34,384][08504] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3829760. Throughput: 0: 1029.7. Samples: 957272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:04:34,390][08504] Avg episode reward: [(0, '23.961')]
[2023-02-22 15:04:38,039][14427] Updated weights for policy 0, policy_version 940 (0.0026)
[2023-02-22 15:04:39,383][08504] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3854336. Throughput: 0: 1014.7. Samples: 964072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:04:39,388][08504] Avg episode reward: [(0, '24.276')]
[2023-02-22 15:04:44,387][08504] Fps is (10 sec: 3685.3, 60 sec: 3959.3, 300 sec: 4012.6). Total num frames: 3866624. Throughput: 0: 983.0. Samples: 966350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:04:44,393][08504] Avg episode reward: [(0, '24.475')]
[2023-02-22 15:04:49,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3887104. Throughput: 0: 980.5. Samples: 971230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 15:04:49,395][08504] Avg episode reward: [(0, '23.347')]
[2023-02-22 15:04:50,118][14427] Updated weights for policy 0, policy_version 950 (0.0036)
[2023-02-22 15:04:54,383][08504] Fps is (10 sec: 4507.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3911680. Throughput: 0: 1029.2. Samples: 978528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:04:54,391][08504] Avg episode reward: [(0, '24.442')]
[2023-02-22 15:04:58,510][14427] Updated weights for policy 0, policy_version 960 (0.0021)
[2023-02-22 15:04:59,383][08504] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3932160. Throughput: 0: 1032.7. Samples: 982282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:04:59,388][08504] Avg episode reward: [(0, '24.009')]
[2023-02-22 15:05:04,384][08504] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3948544. Throughput: 0: 978.0. Samples: 987240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:05:04,390][08504] Avg episode reward: [(0, '23.151')]
[2023-02-22 15:05:09,383][08504] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3964928. Throughput: 0: 998.1. Samples: 992730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 15:05:09,390][08504] Avg episode reward: [(0, '21.877')]
[2023-02-22 15:05:10,191][14427] Updated weights for policy 0, policy_version 970 (0.0018)
[2023-02-22 15:05:14,383][08504] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 3993600. Throughput: 0: 1027.6. Samples: 996386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 15:05:14,386][08504] Avg episode reward: [(0, '23.231')]
[2023-02-22 15:05:16,750][14413] Stopping Batcher_0...
[2023-02-22 15:05:16,751][14413] Loop batcher_evt_loop terminating...
[2023-02-22 15:05:16,751][08504] Component Batcher_0 stopped!
[2023-02-22 15:05:16,759][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 15:05:16,815][14427] Weights refcount: 2 0
[2023-02-22 15:05:16,818][14427] Stopping InferenceWorker_p0-w0...
[2023-02-22 15:05:16,819][14427] Loop inference_proc0-0_evt_loop terminating...
[2023-02-22 15:05:16,814][08504] Component RolloutWorker_w3 stopped!
[2023-02-22 15:05:16,820][08504] Component InferenceWorker_p0-w0 stopped!
[2023-02-22 15:05:16,814][14432] Stopping RolloutWorker_w3...
[2023-02-22 15:05:16,823][14429] Stopping RolloutWorker_w1...
[2023-02-22 15:05:16,825][08504] Component RolloutWorker_w1 stopped!
[2023-02-22 15:05:16,827][14432] Loop rollout_proc3_evt_loop terminating...
[2023-02-22 15:05:16,828][14429] Loop rollout_proc1_evt_loop terminating...
[2023-02-22 15:05:16,841][14433] Stopping RolloutWorker_w5...
[2023-02-22 15:05:16,839][14435] Stopping RolloutWorker_w7...
[2023-02-22 15:05:16,843][14433] Loop rollout_proc5_evt_loop terminating...
[2023-02-22 15:05:16,839][08504] Component RolloutWorker_w7 stopped!
[2023-02-22 15:05:16,843][14435] Loop rollout_proc7_evt_loop terminating...
[2023-02-22 15:05:16,845][08504] Component RolloutWorker_w0 stopped!
[2023-02-22 15:05:16,839][14428] Stopping RolloutWorker_w0...
[2023-02-22 15:05:16,852][14428] Loop rollout_proc0_evt_loop terminating...
[2023-02-22 15:05:16,850][08504] Component RolloutWorker_w5 stopped!
[2023-02-22 15:05:16,863][08504] Component RolloutWorker_w4 stopped!
[2023-02-22 15:05:16,867][14431] Stopping RolloutWorker_w4...
[2023-02-22 15:05:16,869][14434] Stopping RolloutWorker_w6...
[2023-02-22 15:05:16,868][08504] Component RolloutWorker_w6 stopped!
[2023-02-22 15:05:16,872][14431] Loop rollout_proc4_evt_loop terminating...
[2023-02-22 15:05:16,873][14434] Loop rollout_proc6_evt_loop terminating...
[2023-02-22 15:05:16,876][08504] Component RolloutWorker_w2 stopped!
[2023-02-22 15:05:16,884][14430] Stopping RolloutWorker_w2...
[2023-02-22 15:05:16,884][14430] Loop rollout_proc2_evt_loop terminating...
[2023-02-22 15:05:16,968][14413] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000758_3104768.pth
[2023-02-22 15:05:16,980][14413] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 15:05:17,117][08504] Component LearnerWorker_p0 stopped!
[2023-02-22 15:05:17,123][08504] Waiting for process learner_proc0 to stop...
[2023-02-22 15:05:17,129][14413] Stopping LearnerWorker_p0...
[2023-02-22 15:05:17,130][14413] Loop learner_proc0_evt_loop terminating...
[2023-02-22 15:05:19,809][08504] Waiting for process inference_proc0-0 to join...
[2023-02-22 15:05:20,295][08504] Waiting for process rollout_proc0 to join...
[2023-02-22 15:05:21,455][08504] Waiting for process rollout_proc1 to join...
[2023-02-22 15:05:21,460][08504] Waiting for process rollout_proc2 to join...
[2023-02-22 15:05:21,487][08504] Waiting for process rollout_proc3 to join...
[2023-02-22 15:05:21,488][08504] Waiting for process rollout_proc4 to join...
[2023-02-22 15:05:21,493][08504] Waiting for process rollout_proc5 to join...
[2023-02-22 15:05:21,495][08504] Waiting for process rollout_proc6 to join...
[2023-02-22 15:05:21,503][08504] Waiting for process rollout_proc7 to join...
[2023-02-22 15:05:21,504][08504] Batcher 0 profile tree view:
batching: 26.2922, releasing_batches: 0.0213
[2023-02-22 15:05:21,506][08504] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 496.1414
update_model: 7.5273
  weight_update: 0.0028
one_step: 0.0029
  handle_policy_step: 492.1000
    deserialize: 14.5007, stack: 2.8751, obs_to_device_normalize: 110.6931, forward: 233.1025, send_messages: 25.6556
    prepare_outputs: 80.5003
      to_cpu: 50.3872
[2023-02-22 15:05:21,515][08504] Learner 0 profile tree view:
misc: 0.0052, prepare_batch: 16.0964
train: 74.2474
  epoch_init: 0.0056, minibatch_init: 0.0059, losses_postprocess: 0.5979, kl_divergence: 0.4922, after_optimizer: 32.0808
  calculate_losses: 26.3061
    losses_init: 0.0076, forward_head: 1.6850, bptt_initial: 17.5300, tail: 1.0004, advantages_returns: 0.2832, losses: 3.4062
    bptt: 2.1086
      bptt_forward_core: 2.0061
  update: 14.2043
    clip: 1.3986
[2023-02-22 15:05:21,516][08504] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3034, enqueue_policy_requests: 123.7049, env_step: 788.7622, overhead: 17.8595, complete_rollouts: 6.9083
save_policy_outputs: 19.3334
  split_output_tensors: 9.4977
[2023-02-22 15:05:21,531][08504] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3011, enqueue_policy_requests: 125.6160, env_step: 788.3447, overhead: 18.4502, complete_rollouts: 6.5766
save_policy_outputs: 19.0602
  split_output_tensors: 9.3002
[2023-02-22 15:05:21,535][08504] Loop Runner_EvtLoop terminating...
[2023-02-22 15:05:21,541][08504] Runner profile tree view:
main_loop: 1063.1519
[2023-02-22 15:05:21,552][08504] Collected {0: 4005888}, FPS: 3767.9
[2023-02-22 15:05:23,314][08504] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-22 15:05:23,316][08504] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 15:05:23,334][08504] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 15:05:23,340][08504] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 15:05:23,345][08504] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 15:05:23,348][08504] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 15:05:23,354][08504] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 15:05:23,355][08504] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 15:05:23,358][08504] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-22 15:05:23,359][08504] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-22 15:05:23,377][08504] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 15:05:23,384][08504] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 15:05:23,385][08504] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 15:05:23,402][08504] Adding new argument 'enjoy_script'=None that is not in the saved config file!
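The runner's closing summary is self-consistent: the reported FPS is just the total collected frame count divided by the main-loop wall time, both of which appear in the log:

```python
# Reproduce the "FPS: 3767.9" summary line from the two numbers
# printed just before it in the log.
total_frames = 4005888         # from "Collected {0: 4005888}"
main_loop_seconds = 1063.1519  # from "Runner profile tree view: main_loop: 1063.1519"

fps = total_frames / main_loop_seconds
assert round(fps, 1) == 3767.9  # matches "FPS: 3767.9"
```

Note this is throughput averaged over the whole run, including startup; the rolling "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines above report windowed averages instead.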
[2023-02-22 15:05:23,403][08504] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 15:05:23,502][08504] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 15:05:23,513][08504] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 15:05:23,516][08504] RunningMeanStd input shape: (1,)
[2023-02-22 15:05:23,590][08504] ConvEncoder: input_channels=3
[2023-02-22 15:05:24,941][08504] Conv encoder output size: 512
[2023-02-22 15:05:24,948][08504] Policy head output size: 512
[2023-02-22 15:05:28,226][08504] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 15:05:29,486][08504] Num frames 100...
[2023-02-22 15:05:29,609][08504] Num frames 200...
[2023-02-22 15:05:29,772][08504] Avg episode rewards: #0: 5.900, true rewards: #0: 2.900
[2023-02-22 15:05:29,774][08504] Avg episode reward: 5.900, avg true_objective: 2.900
[2023-02-22 15:05:29,789][08504] Num frames 300...
[2023-02-22 15:05:29,898][08504] Num frames 400...
[2023-02-22 15:05:30,008][08504] Num frames 500...
[2023-02-22 15:05:30,121][08504] Num frames 600...
[2023-02-22 15:05:30,232][08504] Num frames 700...
[2023-02-22 15:05:30,342][08504] Num frames 800...
[2023-02-22 15:05:30,455][08504] Num frames 900...
[2023-02-22 15:05:30,571][08504] Num frames 1000...
[2023-02-22 15:05:30,682][08504] Num frames 1100...
[2023-02-22 15:05:30,800][08504] Num frames 1200...
[2023-02-22 15:05:30,910][08504] Num frames 1300...
[2023-02-22 15:05:31,020][08504] Num frames 1400...
[2023-02-22 15:05:31,130][08504] Num frames 1500...
[2023-02-22 15:05:31,243][08504] Num frames 1600...
[2023-02-22 15:05:31,359][08504] Num frames 1700...
[2023-02-22 15:05:31,472][08504] Num frames 1800...
[2023-02-22 15:05:31,596][08504] Num frames 1900...
[2023-02-22 15:05:31,716][08504] Num frames 2000...
[2023-02-22 15:05:31,825][08504] Num frames 2100...
[2023-02-22 15:05:31,937][08504] Num frames 2200...
[2023-02-22 15:05:32,045][08504] Num frames 2300...
[2023-02-22 15:05:32,163][08504] Avg episode rewards: #0: 32.779, true rewards: #0: 11.780
[2023-02-22 15:05:32,164][08504] Avg episode reward: 32.779, avg true_objective: 11.780
[2023-02-22 15:05:32,216][08504] Num frames 2400...
[2023-02-22 15:05:32,325][08504] Num frames 2500...
[2023-02-22 15:05:32,442][08504] Num frames 2600...
[2023-02-22 15:05:32,556][08504] Num frames 2700...
[2023-02-22 15:05:32,670][08504] Num frames 2800...
[2023-02-22 15:05:32,778][08504] Num frames 2900...
[2023-02-22 15:05:32,887][08504] Num frames 3000...
[2023-02-22 15:05:32,999][08504] Num frames 3100...
[2023-02-22 15:05:33,107][08504] Num frames 3200...
[2023-02-22 15:05:33,222][08504] Avg episode rewards: #0: 28.506, true rewards: #0: 10.840
[2023-02-22 15:05:33,224][08504] Avg episode reward: 28.506, avg true_objective: 10.840
[2023-02-22 15:05:33,279][08504] Num frames 3300...
[2023-02-22 15:05:33,405][08504] Num frames 3400...
[2023-02-22 15:05:33,511][08504] Num frames 3500...
[2023-02-22 15:05:33,633][08504] Num frames 3600...
[2023-02-22 15:05:33,757][08504] Num frames 3700...
[2023-02-22 15:05:33,865][08504] Num frames 3800...
[2023-02-22 15:05:33,977][08504] Num frames 3900...
[2023-02-22 15:05:34,086][08504] Num frames 4000...
[2023-02-22 15:05:34,196][08504] Num frames 4100...
[2023-02-22 15:05:34,312][08504] Num frames 4200...
[2023-02-22 15:05:34,431][08504] Num frames 4300...
[2023-02-22 15:05:34,540][08504] Num frames 4400...
[2023-02-22 15:05:34,668][08504] Num frames 4500...
[2023-02-22 15:05:34,789][08504] Num frames 4600...
[2023-02-22 15:05:34,900][08504] Num frames 4700...
[2023-02-22 15:05:35,013][08504] Num frames 4800...
[2023-02-22 15:05:35,120][08504] Num frames 4900...
[2023-02-22 15:05:35,229][08504] Num frames 5000...
[2023-02-22 15:05:35,340][08504] Num frames 5100...
[2023-02-22 15:05:35,450][08504] Num frames 5200...
[2023-02-22 15:05:35,567][08504] Num frames 5300...
[2023-02-22 15:05:35,692][08504] Avg episode rewards: #0: 35.379, true rewards: #0: 13.380
[2023-02-22 15:05:35,694][08504] Avg episode reward: 35.379, avg true_objective: 13.380
[2023-02-22 15:05:35,753][08504] Num frames 5400...
[2023-02-22 15:05:35,865][08504] Num frames 5500...
[2023-02-22 15:05:35,981][08504] Num frames 5600...
[2023-02-22 15:05:36,089][08504] Num frames 5700...
[2023-02-22 15:05:36,199][08504] Num frames 5800...
[2023-02-22 15:05:36,317][08504] Num frames 5900...
[2023-02-22 15:05:36,434][08504] Num frames 6000...
[2023-02-22 15:05:36,543][08504] Num frames 6100...
[2023-02-22 15:05:36,656][08504] Num frames 6200...
[2023-02-22 15:05:36,773][08504] Num frames 6300...
[2023-02-22 15:05:36,886][08504] Num frames 6400...
[2023-02-22 15:05:36,997][08504] Num frames 6500...
[2023-02-22 15:05:37,106][08504] Num frames 6600...
[2023-02-22 15:05:37,262][08504] Num frames 6700...
[2023-02-22 15:05:37,419][08504] Num frames 6800...
[2023-02-22 15:05:37,567][08504] Num frames 6900...
[2023-02-22 15:05:37,736][08504] Num frames 7000...
[2023-02-22 15:05:37,890][08504] Num frames 7100...
[2023-02-22 15:05:38,045][08504] Num frames 7200...
[2023-02-22 15:05:38,192][08504] Num frames 7300...
[2023-02-22 15:05:38,346][08504] Num frames 7400...
[2023-02-22 15:05:38,478][08504] Avg episode rewards: #0: 39.903, true rewards: #0: 14.904
[2023-02-22 15:05:38,485][08504] Avg episode reward: 39.903, avg true_objective: 14.904
[2023-02-22 15:05:38,562][08504] Num frames 7500...
[2023-02-22 15:05:38,709][08504] Num frames 7600...
[2023-02-22 15:05:38,859][08504] Num frames 7700...
[2023-02-22 15:05:39,011][08504] Num frames 7800...
[2023-02-22 15:05:39,157][08504] Num frames 7900...
[2023-02-22 15:05:39,308][08504] Num frames 8000...
[2023-02-22 15:05:39,468][08504] Num frames 8100...
[2023-02-22 15:05:39,633][08504] Num frames 8200...
[2023-02-22 15:05:39,802][08504] Num frames 8300...
[2023-02-22 15:05:39,965][08504] Num frames 8400...
[2023-02-22 15:05:40,125][08504] Num frames 8500...
[2023-02-22 15:05:40,287][08504] Num frames 8600...
[2023-02-22 15:05:40,449][08504] Num frames 8700...
[2023-02-22 15:05:40,660][08504] Avg episode rewards: #0: 38.159, true rewards: #0: 14.660
[2023-02-22 15:05:40,663][08504] Avg episode reward: 38.159, avg true_objective: 14.660
[2023-02-22 15:05:40,673][08504] Num frames 8800...
[2023-02-22 15:05:40,797][08504] Num frames 8900...
[2023-02-22 15:05:40,912][08504] Num frames 9000...
[2023-02-22 15:05:41,025][08504] Num frames 9100...
[2023-02-22 15:05:41,135][08504] Num frames 9200...
[2023-02-22 15:05:41,251][08504] Num frames 9300...
[2023-02-22 15:05:41,359][08504] Num frames 9400...
[2023-02-22 15:05:41,465][08504] Num frames 9500...
[2023-02-22 15:05:41,590][08504] Avg episode rewards: #0: 35.520, true rewards: #0: 13.663
[2023-02-22 15:05:41,591][08504] Avg episode reward: 35.520, avg true_objective: 13.663
[2023-02-22 15:05:41,634][08504] Num frames 9600...
[2023-02-22 15:05:41,741][08504] Num frames 9700...
[2023-02-22 15:05:41,855][08504] Num frames 9800...
[2023-02-22 15:05:41,964][08504] Num frames 9900...
[2023-02-22 15:05:42,072][08504] Num frames 10000...
[2023-02-22 15:05:42,187][08504] Num frames 10100...
[2023-02-22 15:05:42,296][08504] Num frames 10200...
[2023-02-22 15:05:42,412][08504] Num frames 10300...
[2023-02-22 15:05:42,529][08504] Num frames 10400...
[2023-02-22 15:05:42,640][08504] Num frames 10500...
[2023-02-22 15:05:42,752][08504] Num frames 10600...
[2023-02-22 15:05:42,869][08504] Num frames 10700...
[2023-02-22 15:05:42,987][08504] Num frames 10800...
[2023-02-22 15:05:43,140][08504] Avg episode rewards: #0: 35.611, true rewards: #0: 13.611
[2023-02-22 15:05:43,142][08504] Avg episode reward: 35.611, avg true_objective: 13.611
[2023-02-22 15:05:43,157][08504] Num frames 10900...
[2023-02-22 15:05:43,265][08504] Num frames 11000...
[2023-02-22 15:05:43,377][08504] Num frames 11100...
[2023-02-22 15:05:43,494][08504] Num frames 11200...
[2023-02-22 15:05:43,600][08504] Num frames 11300...
[2023-02-22 15:05:43,710][08504] Num frames 11400...
[2023-02-22 15:05:43,815][08504] Num frames 11500...
[2023-02-22 15:05:43,931][08504] Num frames 11600...
[2023-02-22 15:05:44,039][08504] Num frames 11700...
[2023-02-22 15:05:44,149][08504] Num frames 11800...
[2023-02-22 15:05:44,260][08504] Num frames 11900...
[2023-02-22 15:05:44,371][08504] Num frames 12000...
[2023-02-22 15:05:44,479][08504] Num frames 12100...
[2023-02-22 15:05:44,591][08504] Num frames 12200...
[2023-02-22 15:05:44,682][08504] Avg episode rewards: #0: 35.703, true rewards: #0: 13.592
[2023-02-22 15:05:44,683][08504] Avg episode reward: 35.703, avg true_objective: 13.592
[2023-02-22 15:05:44,759][08504] Num frames 12300...
[2023-02-22 15:05:44,893][08504] Num frames 12400...
[2023-02-22 15:05:45,021][08504] Num frames 12500...
[2023-02-22 15:05:45,143][08504] Num frames 12600...
[2023-02-22 15:05:45,254][08504] Num frames 12700...
[2023-02-22 15:05:45,380][08504] Num frames 12800...
[2023-02-22 15:05:45,494][08504] Num frames 12900...
[2023-02-22 15:05:45,616][08504] Num frames 13000...
[2023-02-22 15:05:45,730][08504] Num frames 13100...
[2023-02-22 15:05:45,836][08504] Num frames 13200...
[2023-02-22 15:05:45,951][08504] Num frames 13300...
[2023-02-22 15:05:46,057][08504] Num frames 13400...
[2023-02-22 15:05:46,169][08504] Num frames 13500...
[2023-02-22 15:05:46,283][08504] Num frames 13600...
[2023-02-22 15:05:46,394][08504] Num frames 13700...
[2023-02-22 15:05:46,504][08504] Num frames 13800...
[2023-02-22 15:05:46,613][08504] Num frames 13900...
[2023-02-22 15:05:46,727][08504] Num frames 14000...
[2023-02-22 15:05:46,838][08504] Num frames 14100...
[2023-02-22 15:05:46,968][08504] Avg episode rewards: #0: 37.154, true rewards: #0: 14.154
[2023-02-22 15:05:46,971][08504] Avg episode reward: 37.154, avg true_objective: 14.154
[2023-02-22 15:07:11,805][08504] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-22 15:08:09,706][08504] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-22 15:08:09,708][08504] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 15:08:09,710][08504] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 15:08:09,716][08504] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 15:08:09,719][08504] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 15:08:09,720][08504] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 15:08:09,723][08504] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-22 15:08:09,725][08504] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 15:08:09,726][08504] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-22 15:08:09,728][08504] Adding new argument 'hf_repository'='happycoding/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-22 15:08:09,731][08504] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 15:08:09,732][08504] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 15:08:09,734][08504] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 15:08:09,739][08504] Adding new argument 'enjoy_script'=None that is not in the saved config file!
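The "Overriding arg ..." / "Adding new argument ..." lines show how evaluation reuses the training config: the saved config.json is loaded, command-line values override it, and arguments that post-date the saved config are added with defaults. A rough sketch of that merge logic (plain Python; `load_eval_config` and the sample values are illustrative, not Sample Factory's actual code):

```python
def load_eval_config(saved: dict, overrides: dict, new_args: dict) -> dict:
    """Start from the saved experiment config, apply command-line
    overrides, then fill in arguments missing from the saved file."""
    cfg = dict(saved)
    for key, value in overrides.items():
        print(f"Overriding arg {key!r} with value {value!r} passed from command line")
        cfg[key] = value
    for key, value in new_args.items():
        if key not in cfg:
            print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
            cfg[key] = value
    return cfg

# Illustrative values mirroring the log above:
saved = {"num_workers": 8, "env": "doom_health_gathering_supreme"}
cfg = load_eval_config(saved,
                       overrides={"num_workers": 1},
                       new_args={"no_render": True, "save_video": True})
assert cfg["num_workers"] == 1 and cfg["no_render"] is True
```

This is why the second evaluation run (with `push_to_hub=True` and an `hf_repository`) repeats the same sequence of messages: each run re-merges the saved config with that invocation's arguments.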
[2023-02-22 15:08:09,740][08504] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 15:08:09,761][08504] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 15:08:09,763][08504] RunningMeanStd input shape: (1,)
[2023-02-22 15:08:09,777][08504] ConvEncoder: input_channels=3
[2023-02-22 15:08:09,813][08504] Conv encoder output size: 512
[2023-02-22 15:08:09,815][08504] Policy head output size: 512
[2023-02-22 15:08:09,834][08504] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 15:08:10,266][08504] Num frames 100...
[2023-02-22 15:08:10,382][08504] Num frames 200...
[2023-02-22 15:08:10,504][08504] Num frames 300...
[2023-02-22 15:08:10,615][08504] Num frames 400...
[2023-02-22 15:08:10,774][08504] Num frames 500...
[2023-02-22 15:08:10,926][08504] Num frames 600...
[2023-02-22 15:08:11,044][08504] Num frames 700...
[2023-02-22 15:08:11,154][08504] Num frames 800...
[2023-02-22 15:08:11,276][08504] Num frames 900...
[2023-02-22 15:08:11,388][08504] Num frames 1000...
[2023-02-22 15:08:11,499][08504] Num frames 1100...
[2023-02-22 15:08:11,613][08504] Num frames 1200...
[2023-02-22 15:08:11,748][08504] Num frames 1300...
[2023-02-22 15:08:11,859][08504] Num frames 1400...
[2023-02-22 15:08:11,981][08504] Num frames 1500...
[2023-02-22 15:08:12,089][08504] Num frames 1600...
[2023-02-22 15:08:12,204][08504] Num frames 1700...
[2023-02-22 15:08:12,335][08504] Num frames 1800...
[2023-02-22 15:08:12,448][08504] Num frames 1900...
[2023-02-22 15:08:12,574][08504] Num frames 2000...
[2023-02-22 15:08:12,720][08504] Avg episode rewards: #0: 54.799, true rewards: #0: 20.800
[2023-02-22 15:08:12,723][08504] Avg episode reward: 54.799, avg true_objective: 20.800
[2023-02-22 15:08:12,747][08504] Num frames 2100...
[2023-02-22 15:08:12,863][08504] Num frames 2200...
[2023-02-22 15:08:12,971][08504] Num frames 2300...
[2023-02-22 15:08:13,092][08504] Num frames 2400...
[2023-02-22 15:08:13,219][08504] Num frames 2500...
[2023-02-22 15:08:13,345][08504] Num frames 2600...
[2023-02-22 15:08:13,455][08504] Num frames 2700...
[2023-02-22 15:08:13,564][08504] Num frames 2800...
[2023-02-22 15:08:13,676][08504] Num frames 2900...
[2023-02-22 15:08:13,793][08504] Num frames 3000...
[2023-02-22 15:08:13,906][08504] Num frames 3100...
[2023-02-22 15:08:14,023][08504] Num frames 3200...
[2023-02-22 15:08:14,090][08504] Avg episode rewards: #0: 42.044, true rewards: #0: 16.045
[2023-02-22 15:08:14,092][08504] Avg episode reward: 42.044, avg true_objective: 16.045
[2023-02-22 15:08:14,194][08504] Num frames 3300...
[2023-02-22 15:08:14,311][08504] Num frames 3400...
[2023-02-22 15:08:14,422][08504] Num frames 3500...
[2023-02-22 15:08:14,546][08504] Num frames 3600...
[2023-02-22 15:08:14,658][08504] Num frames 3700...
[2023-02-22 15:08:14,776][08504] Num frames 3800...
[2023-02-22 15:08:14,887][08504] Num frames 3900...
[2023-02-22 15:08:15,010][08504] Num frames 4000...
[2023-02-22 15:08:15,152][08504] Num frames 4100...
[2023-02-22 15:08:15,330][08504] Num frames 4200...
[2023-02-22 15:08:15,462][08504] Num frames 4300...
[2023-02-22 15:08:15,577][08504] Num frames 4400...
[2023-02-22 15:08:15,696][08504] Num frames 4500...
[2023-02-22 15:08:15,776][08504] Avg episode rewards: #0: 39.403, true rewards: #0: 15.070
[2023-02-22 15:08:15,778][08504] Avg episode reward: 39.403, avg true_objective: 15.070
[2023-02-22 15:08:15,865][08504] Num frames 4600...
[2023-02-22 15:08:15,984][08504] Num frames 4700...
[2023-02-22 15:08:16,096][08504] Num frames 4800...
[2023-02-22 15:08:16,207][08504] Num frames 4900...
[2023-02-22 15:08:16,321][08504] Num frames 5000...
[2023-02-22 15:08:16,431][08504] Num frames 5100...
[2023-02-22 15:08:16,543][08504] Num frames 5200...
[2023-02-22 15:08:16,679][08504] Avg episode rewards: #0: 33.675, true rewards: #0: 13.175
[2023-02-22 15:08:16,680][08504] Avg episode reward: 33.675, avg true_objective: 13.175
[2023-02-22 15:08:16,716][08504] Num frames 5300...
[2023-02-22 15:08:16,833][08504] Num frames 5400...
[2023-02-22 15:08:16,944][08504] Num frames 5500...
[2023-02-22 15:08:17,063][08504] Num frames 5600...
[2023-02-22 15:08:17,183][08504] Num frames 5700...
[2023-02-22 15:08:17,305][08504] Num frames 5800...
[2023-02-22 15:08:17,426][08504] Num frames 5900...
[2023-02-22 15:08:17,548][08504] Num frames 6000...
[2023-02-22 15:08:17,670][08504] Num frames 6100...
[2023-02-22 15:08:17,829][08504] Num frames 6200...
[2023-02-22 15:08:17,983][08504] Num frames 6300...
[2023-02-22 15:08:18,145][08504] Num frames 6400...
[2023-02-22 15:08:18,314][08504] Avg episode rewards: #0: 32.742, true rewards: #0: 12.942
[2023-02-22 15:08:18,316][08504] Avg episode reward: 32.742, avg true_objective: 12.942
[2023-02-22 15:08:18,363][08504] Num frames 6500...
[2023-02-22 15:08:18,523][08504] Num frames 6600...
[2023-02-22 15:08:18,684][08504] Num frames 6700...
[2023-02-22 15:08:18,844][08504] Num frames 6800...
[2023-02-22 15:08:18,999][08504] Num frames 6900...
[2023-02-22 15:08:19,183][08504] Num frames 7000...
[2023-02-22 15:08:19,340][08504] Num frames 7100...
[2023-02-22 15:08:19,491][08504] Num frames 7200...
[2023-02-22 15:08:19,663][08504] Num frames 7300...
[2023-02-22 15:08:19,832][08504] Num frames 7400...
[2023-02-22 15:08:19,988][08504] Num frames 7500...
[2023-02-22 15:08:20,284][08504] Num frames 7600...
[2023-02-22 15:08:20,624][08504] Num frames 7700...
[2023-02-22 15:08:20,940][08504] Num frames 7800...
[2023-02-22 15:08:21,245][08504] Num frames 7900...
[2023-02-22 15:08:21,587][08504] Num frames 8000...
[2023-02-22 15:08:21,879][08504] Num frames 8100...
[2023-02-22 15:08:22,131][08504] Num frames 8200...
[2023-02-22 15:08:22,323][08504] Num frames 8300...
[2023-02-22 15:08:22,518][08504] Num frames 8400...
[2023-02-22 15:08:22,686][08504] Num frames 8500...
[2023-02-22 15:08:22,877][08504] Avg episode rewards: #0: 35.951, true rewards: #0: 14.285
[2023-02-22 15:08:22,884][08504] Avg episode reward: 35.951, avg true_objective: 14.285
[2023-02-22 15:08:22,948][08504] Num frames 8600...
[2023-02-22 15:08:23,161][08504] Num frames 8700...
[2023-02-22 15:08:23,332][08504] Num frames 8800...
[2023-02-22 15:08:23,516][08504] Num frames 8900...
[2023-02-22 15:08:23,798][08504] Num frames 9000...
[2023-02-22 15:08:23,995][08504] Num frames 9100...
[2023-02-22 15:08:24,152][08504] Num frames 9200...
[2023-02-22 15:08:24,327][08504] Num frames 9300...
[2023-02-22 15:08:24,532][08504] Num frames 9400...
[2023-02-22 15:08:24,728][08504] Num frames 9500...
[2023-02-22 15:08:24,945][08504] Num frames 9600...
[2023-02-22 15:08:25,116][08504] Num frames 9700...
[2023-02-22 15:08:25,272][08504] Num frames 9800...
[2023-02-22 15:08:25,436][08504] Num frames 9900...
[2023-02-22 15:08:25,694][08504] Num frames 10000...
[2023-02-22 15:08:26,104][08504] Num frames 10100...
[2023-02-22 15:08:26,338][08504] Num frames 10200...
[2023-02-22 15:08:26,621][08504] Num frames 10300...
[2023-02-22 15:08:26,982][08504] Num frames 10400...
[2023-02-22 15:08:27,255][08504] Num frames 10500...
[2023-02-22 15:08:27,612][08504] Num frames 10600...
[2023-02-22 15:08:27,920][08504] Avg episode rewards: #0: 38.672, true rewards: #0: 15.244
[2023-02-22 15:08:27,925][08504] Avg episode reward: 38.672, avg true_objective: 15.244
[2023-02-22 15:08:28,012][08504] Num frames 10700...
[2023-02-22 15:08:28,241][08504] Num frames 10800...
[2023-02-22 15:08:28,401][08504] Num frames 10900...
[2023-02-22 15:08:28,575][08504] Num frames 11000...
[2023-02-22 15:08:28,762][08504] Num frames 11100...
[2023-02-22 15:08:28,978][08504] Num frames 11200...
[2023-02-22 15:08:29,167][08504] Num frames 11300...
[2023-02-22 15:08:29,367][08504] Num frames 11400...
[2023-02-22 15:08:29,556][08504] Num frames 11500...
[2023-02-22 15:08:29,729][08504] Num frames 11600...
[2023-02-22 15:08:29,905][08504] Num frames 11700...
[2023-02-22 15:08:30,110][08504] Num frames 11800...
[2023-02-22 15:08:30,282][08504] Num frames 11900...
[2023-02-22 15:08:30,515][08504] Num frames 12000...
[2023-02-22 15:08:30,711][08504] Num frames 12100...
[2023-02-22 15:08:30,919][08504] Avg episode rewards: #0: 38.590, true rewards: #0: 15.215
[2023-02-22 15:08:30,922][08504] Avg episode reward: 38.590, avg true_objective: 15.215
[2023-02-22 15:08:31,032][08504] Num frames 12200...
[2023-02-22 15:08:31,157][08504] Num frames 12300...
[2023-02-22 15:08:31,276][08504] Num frames 12400...
[2023-02-22 15:08:31,397][08504] Num frames 12500...
[2023-02-22 15:08:31,510][08504] Num frames 12600...
[2023-02-22 15:08:31,644][08504] Num frames 12700...
[2023-02-22 15:08:31,815][08504] Num frames 12800...
[2023-02-22 15:08:31,974][08504] Num frames 12900...
[2023-02-22 15:08:32,181][08504] Avg episode rewards: #0: 36.219, true rewards: #0: 14.441
[2023-02-22 15:08:32,184][08504] Avg episode reward: 36.219, avg true_objective: 14.441
[2023-02-22 15:08:32,190][08504] Num frames 13000...
[2023-02-22 15:08:32,348][08504] Num frames 13100...
[2023-02-22 15:08:32,503][08504] Num frames 13200...
[2023-02-22 15:08:32,658][08504] Num frames 13300...
[2023-02-22 15:08:32,813][08504] Num frames 13400...
[2023-02-22 15:08:32,974][08504] Num frames 13500...
[2023-02-22 15:08:33,141][08504] Num frames 13600...
[2023-02-22 15:08:33,301][08504] Num frames 13700...
[2023-02-22 15:08:33,458][08504] Num frames 13800...
[2023-02-22 15:08:33,622][08504] Num frames 13900...
[2023-02-22 15:08:33,778][08504] Num frames 14000...
[2023-02-22 15:08:33,955][08504] Num frames 14100...
[2023-02-22 15:08:34,120][08504] Num frames 14200...
[2023-02-22 15:08:34,275][08504] Num frames 14300...
[2023-02-22 15:08:34,438][08504] Num frames 14400...
[2023-02-22 15:08:34,604][08504] Num frames 14500...
[2023-02-22 15:08:34,770][08504] Num frames 14600...
[2023-02-22 15:08:34,939][08504] Num frames 14700...
[2023-02-22 15:08:35,155][08504] Avg episode rewards: #0: 37.598, true rewards: #0: 14.798
[2023-02-22 15:08:35,158][08504] Avg episode reward: 37.598, avg true_objective: 14.798
[2023-02-22 15:08:35,163][08504] Num frames 14800...
[2023-02-22 15:10:03,984][08504] Replay video saved to /content/train_dir/default_experiment/replay.mp4!