[2023-02-22 10:55:47,582][00415] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-22 10:55:47,585][00415] Rollout worker 0 uses device cpu
[2023-02-22 10:55:47,587][00415] Rollout worker 1 uses device cpu
[2023-02-22 10:55:47,588][00415] Rollout worker 2 uses device cpu
[2023-02-22 10:55:47,589][00415] Rollout worker 3 uses device cpu
[2023-02-22 10:55:47,591][00415] Rollout worker 4 uses device cpu
[2023-02-22 10:55:47,592][00415] Rollout worker 5 uses device cpu
[2023-02-22 10:55:47,593][00415] Rollout worker 6 uses device cpu
[2023-02-22 10:55:47,595][00415] Rollout worker 7 uses device cpu
[2023-02-22 10:55:47,782][00415] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 10:55:47,784][00415] InferenceWorker_p0-w0: min num requests: 2
[2023-02-22 10:55:47,818][00415] Starting all processes...
[2023-02-22 10:55:47,820][00415] Starting process learner_proc0
[2023-02-22 10:55:47,875][00415] Starting all processes...
[2023-02-22 10:55:47,888][00415] Starting process inference_proc0-0
[2023-02-22 10:55:47,890][00415] Starting process rollout_proc0
[2023-02-22 10:55:47,890][00415] Starting process rollout_proc1
[2023-02-22 10:55:47,890][00415] Starting process rollout_proc2
[2023-02-22 10:55:47,890][00415] Starting process rollout_proc3
[2023-02-22 10:55:47,890][00415] Starting process rollout_proc4
[2023-02-22 10:55:47,891][00415] Starting process rollout_proc5
[2023-02-22 10:55:47,895][00415] Starting process rollout_proc6
[2023-02-22 10:55:47,911][00415] Starting process rollout_proc7
[2023-02-22 10:55:57,528][11310] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 10:55:57,538][11310] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-22 10:55:57,611][11331] Worker 7 uses CPU cores [1]
[2023-02-22 10:55:57,718][11327] Worker 2 uses CPU cores [0]
[2023-02-22 10:55:57,849][11325] Worker 0 uses CPU cores [0]
[2023-02-22 10:55:57,871][11328] Worker 3 uses CPU cores [1]
[2023-02-22 10:55:57,890][11329] Worker 4 uses CPU cores [0]
[2023-02-22 10:55:57,929][11330] Worker 6 uses CPU cores [0]
[2023-02-22 10:55:57,977][11326] Worker 1 uses CPU cores [1]
[2023-02-22 10:55:58,019][11332] Worker 5 uses CPU cores [1]
[2023-02-22 10:55:58,086][11324] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 10:55:58,086][11324] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-22 10:55:58,609][11324] Num visible devices: 1
[2023-02-22 10:55:58,612][11310] Num visible devices: 1
[2023-02-22 10:55:58,616][11310] Starting seed is not provided
[2023-02-22 10:55:58,616][11310] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 10:55:58,616][11310] Initializing actor-critic model on device cuda:0
[2023-02-22 10:55:58,617][11310] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 10:55:58,619][11310] RunningMeanStd input shape: (1,)
[2023-02-22 10:55:58,645][11310] ConvEncoder: input_channels=3
[2023-02-22 10:55:58,986][11310] Conv encoder output size: 512
[2023-02-22 10:55:58,987][11310] Policy head output size: 512
[2023-02-22 10:55:59,048][11310] Created Actor Critic model with architecture:
[2023-02-22 10:55:59,048][11310] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-22 10:56:06,555][11310] Using optimizer
[2023-02-22 10:56:06,557][11310] No checkpoints found
[2023-02-22 10:56:06,557][11310] Did not load from checkpoint, starting from scratch!
[2023-02-22 10:56:06,557][11310] Initialized policy 0 weights for model version 0
[2023-02-22 10:56:06,560][11310] LearnerWorker_p0 finished initialization!
[2023-02-22 10:56:06,561][11310] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 10:56:06,673][11324] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 10:56:06,674][11324] RunningMeanStd input shape: (1,)
[2023-02-22 10:56:06,685][11324] ConvEncoder: input_channels=3
[2023-02-22 10:56:06,781][11324] Conv encoder output size: 512
[2023-02-22 10:56:06,782][11324] Policy head output size: 512
[2023-02-22 10:56:07,775][00415] Heartbeat connected on Batcher_0
[2023-02-22 10:56:07,782][00415] Heartbeat connected on LearnerWorker_p0
[2023-02-22 10:56:07,792][00415] Heartbeat connected on RolloutWorker_w0
[2023-02-22 10:56:07,796][00415] Heartbeat connected on RolloutWorker_w1
[2023-02-22 10:56:07,799][00415] Heartbeat connected on RolloutWorker_w2
[2023-02-22 10:56:07,802][00415] Heartbeat connected on RolloutWorker_w3
[2023-02-22 10:56:07,805][00415] Heartbeat connected on RolloutWorker_w4
[2023-02-22 10:56:07,812][00415] Heartbeat connected on RolloutWorker_w5
[2023-02-22 10:56:07,813][00415] Heartbeat connected on RolloutWorker_w6
[2023-02-22 10:56:07,817][00415] Heartbeat connected on RolloutWorker_w7
[2023-02-22 10:56:07,898][00415] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 10:56:09,030][00415] Inference worker 0-0 is ready!
[2023-02-22 10:56:09,031][00415] All inference workers are ready! Signal rollout workers to start!
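For orientation, a minimal PyTorch sketch of the network printed above. The shapes come from the log itself (observations of shape (3, 72, 128), a 512-unit encoder output and GRU core, 5 discrete actions); the conv channel counts, kernel sizes, and strides are assumptions for illustration, since the printout does not show them:

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Rough equivalent of the ActorCriticSharedWeights module in the log.

    The log only shows three Conv2d+ELU pairs followed by Linear+ELU;
    the specific conv hyperparameters below are illustrative assumptions.
    """
    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        self.conv_head = nn.Sequential(            # input (3, 72, 128), per the log
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():                      # probe the flattened conv output size
            conv_out = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(conv_out, hidden), nn.ELU())  # "Conv encoder output size: 512"
        self.core = nn.GRU(hidden, hidden)          # GRU(512, 512) in the printout
        self.critic_linear = nn.Linear(hidden, 1)   # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # 5 action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # one-step sequence
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```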
[2023-02-22 10:56:09,037][00415] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-22 10:56:09,305][11328] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:09,314][11332] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:09,320][11331] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:09,331][11325] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:09,343][11329] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:09,355][11327] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:09,355][11326] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:09,377][11330] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 10:56:11,252][11330] Decorrelating experience for 0 frames... [2023-02-22 10:56:11,253][11329] Decorrelating experience for 0 frames... [2023-02-22 10:56:11,254][11325] Decorrelating experience for 0 frames... [2023-02-22 10:56:11,257][11327] Decorrelating experience for 0 frames... [2023-02-22 10:56:11,279][11331] Decorrelating experience for 0 frames... [2023-02-22 10:56:11,278][11328] Decorrelating experience for 0 frames... [2023-02-22 10:56:11,283][11332] Decorrelating experience for 0 frames... [2023-02-22 10:56:11,284][11326] Decorrelating experience for 0 frames... [2023-02-22 10:56:12,506][11325] Decorrelating experience for 32 frames... [2023-02-22 10:56:12,505][11330] Decorrelating experience for 32 frames... [2023-02-22 10:56:12,579][11331] Decorrelating experience for 32 frames... [2023-02-22 10:56:12,582][11328] Decorrelating experience for 32 frames... [2023-02-22 10:56:12,584][11326] Decorrelating experience for 32 frames... [2023-02-22 10:56:12,897][00415] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 10:56:14,134][11332] Decorrelating experience for 32 frames... [2023-02-22 10:56:14,304][11327] Decorrelating experience for 32 frames... [2023-02-22 10:56:14,334][11329] Decorrelating experience for 32 frames... [2023-02-22 10:56:14,555][11331] Decorrelating experience for 64 frames... [2023-02-22 10:56:14,665][11325] Decorrelating experience for 64 frames... [2023-02-22 10:56:14,768][11330] Decorrelating experience for 64 frames... [2023-02-22 10:56:15,654][11326] Decorrelating experience for 64 frames... [2023-02-22 10:56:15,907][11332] Decorrelating experience for 64 frames... [2023-02-22 10:56:16,051][11331] Decorrelating experience for 96 frames... [2023-02-22 10:56:16,632][11326] Decorrelating experience for 96 frames... [2023-02-22 10:56:16,841][11327] Decorrelating experience for 64 frames... [2023-02-22 10:56:16,882][11329] Decorrelating experience for 64 frames... [2023-02-22 10:56:17,096][11325] Decorrelating experience for 96 frames... [2023-02-22 10:56:17,658][11328] Decorrelating experience for 64 frames... [2023-02-22 10:56:17,784][11332] Decorrelating experience for 96 frames... [2023-02-22 10:56:17,898][00415] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 10:56:18,496][11330] Decorrelating experience for 96 frames... [2023-02-22 10:56:18,919][11327] Decorrelating experience for 96 frames... [2023-02-22 10:56:18,922][11329] Decorrelating experience for 96 frames... [2023-02-22 10:56:19,460][11328] Decorrelating experience for 96 frames... 
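Each rollout worker renders VizDoom at 160x120 and resizes frames to (128, 72), which matches the (3, 72, 128) observation shape the model reports above; the workers then warm up in staggered 32-frame "decorrelating" steps so they do not all submit identical trajectories. A sketch of the resize step (the use of OpenCV here is an assumption; Sample Factory performs this inside its environment wrappers):

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Downscale a raw VizDoom frame and move channels first for the conv encoder."""
    # frame: (120, 160, 3) uint8, i.e. "Doom resolution: 160x120"
    resized = cv2.resize(frame, (128, 72), interpolation=cv2.INTER_AREA)  # -> (72, 128, 3)
    return np.transpose(resized, (2, 0, 1))  # HWC -> CHW: (3, 72, 128), the model's input shape
```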
[2023-02-22 10:56:22,897][00415] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 15.1. Samples: 226. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 10:56:22,900][00415] Avg episode reward: [(0, '1.585')]
[2023-02-22 10:56:23,288][11310] Signal inference workers to stop experience collection...
[2023-02-22 10:56:23,310][11324] InferenceWorker_p0-w0: stopping experience collection
[2023-02-22 10:56:25,818][11310] Signal inference workers to resume experience collection...
[2023-02-22 10:56:25,820][11324] InferenceWorker_p0-w0: resuming experience collection
[2023-02-22 10:56:27,897][00415] Fps is (10 sec: 1228.9, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 12288. Throughput: 0: 145.9. Samples: 2918. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-22 10:56:27,901][00415] Avg episode reward: [(0, '2.678')]
[2023-02-22 10:56:32,898][00415] Fps is (10 sec: 2867.0, 60 sec: 1146.8, 300 sec: 1146.8). Total num frames: 28672. Throughput: 0: 231.4. Samples: 5784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-22 10:56:32,905][00415] Avg episode reward: [(0, '3.570')]
[2023-02-22 10:56:36,882][11324] Updated weights for policy 0, policy_version 10 (0.0360)
[2023-02-22 10:56:37,898][00415] Fps is (10 sec: 2867.1, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 40960. Throughput: 0: 334.9. Samples: 10046. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 10:56:37,904][00415] Avg episode reward: [(0, '4.065')]
[2023-02-22 10:56:42,897][00415] Fps is (10 sec: 3277.0, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 441.1. Samples: 15438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 10:56:42,904][00415] Avg episode reward: [(0, '4.354')]
[2023-02-22 10:56:46,950][11324] Updated weights for policy 0, policy_version 20 (0.0030)
[2023-02-22 10:56:47,897][00415] Fps is (10 sec: 4096.2, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 81920. Throughput: 0: 470.2. Samples: 18806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 10:56:47,904][00415] Avg episode reward: [(0, '4.337')]
[2023-02-22 10:56:52,898][00415] Fps is (10 sec: 4095.8, 60 sec: 2275.5, 300 sec: 2275.5). Total num frames: 102400. Throughput: 0: 554.8. Samples: 24966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:56:52,905][00415] Avg episode reward: [(0, '4.329')]
[2023-02-22 10:56:52,919][11310] Saving new best policy, reward=4.329!
[2023-02-22 10:56:57,897][00415] Fps is (10 sec: 3276.8, 60 sec: 2293.8, 300 sec: 2293.8). Total num frames: 114688. Throughput: 0: 644.0. Samples: 28978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:56:57,901][00415] Avg episode reward: [(0, '4.404')]
[2023-02-22 10:56:57,906][11310] Saving new best policy, reward=4.404!
[2023-02-22 10:57:00,068][11324] Updated weights for policy 0, policy_version 30 (0.0016)
[2023-02-22 10:57:02,898][00415] Fps is (10 sec: 2867.3, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 131072. Throughput: 0: 691.2. Samples: 31104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 10:57:02,905][00415] Avg episode reward: [(0, '4.468')]
[2023-02-22 10:57:02,915][11310] Saving new best policy, reward=4.468!
[2023-02-22 10:57:07,897][00415] Fps is (10 sec: 4096.0, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 829.3. Samples: 37546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 10:57:07,900][00415] Avg episode reward: [(0, '4.336')]
[2023-02-22 10:57:09,681][11324] Updated weights for policy 0, policy_version 40 (0.0033)
[2023-02-22 10:57:12,898][00415] Fps is (10 sec: 4096.0, 60 sec: 2867.2, 300 sec: 2646.6). Total num frames: 172032. Throughput: 0: 897.1. Samples: 43288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:57:12,903][00415] Avg episode reward: [(0, '4.380')]
[2023-02-22 10:57:17,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2633.1). Total num frames: 184320. Throughput: 0: 880.1. Samples: 45388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:57:17,902][00415] Avg episode reward: [(0, '4.402')]
[2023-02-22 10:57:22,749][11324] Updated weights for policy 0, policy_version 50 (0.0027)
[2023-02-22 10:57:22,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2730.7). Total num frames: 204800. Throughput: 0: 881.5. Samples: 49712. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 10:57:22,906][00415] Avg episode reward: [(0, '4.418')]
[2023-02-22 10:57:27,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 2816.0). Total num frames: 225280. Throughput: 0: 914.5. Samples: 56590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 10:57:27,903][00415] Avg episode reward: [(0, '4.436')]
[2023-02-22 10:57:32,673][11324] Updated weights for policy 0, policy_version 60 (0.0016)
[2023-02-22 10:57:32,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 2891.3). Total num frames: 245760. Throughput: 0: 913.5. Samples: 59912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:57:32,903][00415] Avg episode reward: [(0, '4.354')]
[2023-02-22 10:57:37,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 2867.2). Total num frames: 258048. Throughput: 0: 872.7. Samples: 64236. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 10:57:37,903][00415] Avg episode reward: [(0, '4.313')]
[2023-02-22 10:57:42,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 2888.8). Total num frames: 274432. Throughput: 0: 889.4. Samples: 69002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 10:57:42,900][00415] Avg episode reward: [(0, '4.529')]
[2023-02-22 10:57:42,919][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth...
[2023-02-22 10:57:43,033][11310] Saving new best policy, reward=4.529!
[2023-02-22 10:57:45,121][11324] Updated weights for policy 0, policy_version 70 (0.0020)
[2023-02-22 10:57:47,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 2990.1). Total num frames: 299008. Throughput: 0: 914.1. Samples: 72238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:57:47,904][00415] Avg episode reward: [(0, '4.489')]
[2023-02-22 10:57:52,900][00415] Fps is (10 sec: 4095.2, 60 sec: 3549.8, 300 sec: 3003.7). Total num frames: 315392. Throughput: 0: 914.4. Samples: 78694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 10:57:52,905][00415] Avg episode reward: [(0, '4.482')]
[2023-02-22 10:57:56,489][11324] Updated weights for policy 0, policy_version 80 (0.0029)
[2023-02-22 10:57:57,898][00415] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 2978.9). Total num frames: 327680. Throughput: 0: 879.8. Samples: 82880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:57:57,905][00415] Avg episode reward: [(0, '4.457')]
[2023-02-22 10:58:02,898][00415] Fps is (10 sec: 2867.8, 60 sec: 3549.9, 300 sec: 2991.9). Total num frames: 344064. Throughput: 0: 879.6. Samples: 84968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:58:02,900][00415] Avg episode reward: [(0, '4.222')]
[2023-02-22 10:58:07,571][11324] Updated weights for policy 0, policy_version 90 (0.0018)
[2023-02-22 10:58:07,897][00415] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3072.0). Total num frames: 368640. Throughput: 0: 917.2. Samples: 90984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:58:07,900][00415] Avg episode reward: [(0, '4.301')]
[2023-02-22 10:58:12,898][00415] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3113.0). Total num frames: 389120. Throughput: 0: 912.9. Samples: 97670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:58:12,904][00415] Avg episode reward: [(0, '4.444')]
[2023-02-22 10:58:17,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3087.8). Total num frames: 401408. Throughput: 0: 886.2. Samples: 99792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:58:17,903][00415] Avg episode reward: [(0, '4.568')]
[2023-02-22 10:58:17,908][11310] Saving new best policy, reward=4.568!
[2023-02-22 10:58:19,848][11324] Updated weights for policy 0, policy_version 100 (0.0012)
[2023-02-22 10:58:22,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3094.8). Total num frames: 417792. Throughput: 0: 879.1. Samples: 103796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 10:58:22,904][00415] Avg episode reward: [(0, '4.541')]
[2023-02-22 10:58:27,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3130.5). Total num frames: 438272. Throughput: 0: 911.6. Samples: 110024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 10:58:27,904][00415] Avg episode reward: [(0, '4.506')]
[2023-02-22 10:58:29,860][11324] Updated weights for policy 0, policy_version 110 (0.0013)
[2023-02-22 10:58:32,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3163.8). Total num frames: 458752. Throughput: 0: 913.8. Samples: 113360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 10:58:32,902][00415] Avg episode reward: [(0, '4.600')]
[2023-02-22 10:58:32,914][11310] Saving new best policy, reward=4.600!
[2023-02-22 10:58:37,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3167.6). Total num frames: 475136. Throughput: 0: 881.5. Samples: 118358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:58:37,906][00415] Avg episode reward: [(0, '4.569')]
[2023-02-22 10:58:42,898][00415] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 3144.7). Total num frames: 487424. Throughput: 0: 875.9. Samples: 122294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 10:58:42,904][00415] Avg episode reward: [(0, '4.493')]
[2023-02-22 10:58:43,624][11324] Updated weights for policy 0, policy_version 120 (0.0033)
[2023-02-22 10:58:47,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3174.4). Total num frames: 507904. Throughput: 0: 889.2. Samples: 124984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 10:58:47,901][00415] Avg episode reward: [(0, '4.480')]
[2023-02-22 10:58:52,897][00415] Fps is (10 sec: 4096.4, 60 sec: 3550.0, 300 sec: 3202.3). Total num frames: 528384. Throughput: 0: 895.2. Samples: 131270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:58:52,906][00415] Avg episode reward: [(0, '4.694')]
[2023-02-22 10:58:52,918][11310] Saving new best policy, reward=4.694!
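The recurring status lines report throughput averaged over 10-, 60-, and 300-second windows plus the cumulative frame count; the NaN values at startup correspond to windows that have no samples yet. A sketch of how such windowed FPS figures can be computed (illustrative only, not Sample Factory's actual implementation):

```python
from collections import deque
import time

class WindowedFps:
    """Moving-window throughput, like 'Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)'."""
    def __init__(self):
        self.samples = deque(maxlen=10_000)  # (timestamp, total_frames) pairs

    def record(self, total_frames: int) -> None:
        self.samples.append((time.time(), total_frames))

    def fps(self, window_sec: float) -> float:
        now = time.time()
        in_window = [(t, f) for t, f in self.samples if now - t <= window_sec]
        if len(in_window) < 2:
            return float("nan")  # matches the nan shown before any frames arrive
        (t0, f0), (t1, f1) = in_window[0], in_window[-1]
        return (f1 - f0) / max(t1 - t0, 1e-9)

# Usage: meter.record(total_frames) on every report tick, then
# print(f"Fps is (10 sec: {meter.fps(10):.1f}, 60 sec: {meter.fps(60):.1f}, 300 sec: {meter.fps(300):.1f})")
```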
[2023-02-22 10:58:53,475][11324] Updated weights for policy 0, policy_version 130 (0.0015)
[2023-02-22 10:58:57,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3204.5). Total num frames: 544768. Throughput: 0: 855.3. Samples: 136158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 10:58:57,903][00415] Avg episode reward: [(0, '4.755')]
[2023-02-22 10:58:57,908][11310] Saving new best policy, reward=4.755!
[2023-02-22 10:59:02,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3183.2). Total num frames: 557056. Throughput: 0: 852.0. Samples: 138132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:59:02,901][00415] Avg episode reward: [(0, '4.674')]
[2023-02-22 10:59:07,113][11324] Updated weights for policy 0, policy_version 140 (0.0046)
[2023-02-22 10:59:07,899][00415] Fps is (10 sec: 2866.6, 60 sec: 3413.2, 300 sec: 3185.7). Total num frames: 573440. Throughput: 0: 868.5. Samples: 142882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 10:59:07,906][00415] Avg episode reward: [(0, '4.411')]
[2023-02-22 10:59:12,900][00415] Fps is (10 sec: 2866.6, 60 sec: 3276.7, 300 sec: 3166.1). Total num frames: 585728. Throughput: 0: 824.0. Samples: 147106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:59:12,903][00415] Avg episode reward: [(0, '4.422')]
[2023-02-22 10:59:17,897][00415] Fps is (10 sec: 2458.1, 60 sec: 3276.8, 300 sec: 3147.5). Total num frames: 598016. Throughput: 0: 790.0. Samples: 148908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 10:59:17,907][00415] Avg episode reward: [(0, '4.484')]
[2023-02-22 10:59:22,898][00415] Fps is (10 sec: 2458.1, 60 sec: 3208.5, 300 sec: 3129.8). Total num frames: 610304. Throughput: 0: 761.0. Samples: 152604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:59:22,906][00415] Avg episode reward: [(0, '4.809')]
[2023-02-22 10:59:22,918][11310] Saving new best policy, reward=4.809!
[2023-02-22 10:59:23,744][11324] Updated weights for policy 0, policy_version 150 (0.0043)
[2023-02-22 10:59:27,902][00415] Fps is (10 sec: 2865.8, 60 sec: 3140.0, 300 sec: 3133.4). Total num frames: 626688. Throughput: 0: 773.5. Samples: 157106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 10:59:27,906][00415] Avg episode reward: [(0, '4.726')]
[2023-02-22 10:59:32,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3156.9). Total num frames: 647168. Throughput: 0: 782.0. Samples: 160176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 10:59:32,900][00415] Avg episode reward: [(0, '4.570')]
[2023-02-22 10:59:34,437][11324] Updated weights for policy 0, policy_version 160 (0.0023)
[2023-02-22 10:59:37,897][00415] Fps is (10 sec: 3688.2, 60 sec: 3140.3, 300 sec: 3159.8). Total num frames: 663552. Throughput: 0: 762.7. Samples: 165592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 10:59:37,900][00415] Avg episode reward: [(0, '4.547')]
[2023-02-22 10:59:42,897][00415] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3124.4). Total num frames: 671744. Throughput: 0: 725.0. Samples: 168782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 10:59:42,903][00415] Avg episode reward: [(0, '4.574')]
[2023-02-22 10:59:42,919][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_671744.pth...
[2023-02-22 10:59:47,898][00415] Fps is (10 sec: 2047.8, 60 sec: 2935.4, 300 sec: 3109.2). Total num frames: 684032. Throughput: 0: 716.5. Samples: 170376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 10:59:47,902][00415] Avg episode reward: [(0, '4.514')]
[2023-02-22 10:59:52,897][00415] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 3076.6). Total num frames: 692224. Throughput: 0: 677.1. Samples: 173350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:59:52,901][00415] Avg episode reward: [(0, '4.525')]
[2023-02-22 10:59:53,045][11324] Updated weights for policy 0, policy_version 170 (0.0042)
[2023-02-22 10:59:57,897][00415] Fps is (10 sec: 2867.4, 60 sec: 2798.9, 300 sec: 3098.7). Total num frames: 712704. Throughput: 0: 702.3. Samples: 178708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 10:59:57,900][00415] Avg episode reward: [(0, '4.573')]
[2023-02-22 11:00:02,897][00415] Fps is (10 sec: 4096.0, 60 sec: 2935.5, 300 sec: 3119.9). Total num frames: 733184. Throughput: 0: 729.8. Samples: 181748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:00:02,902][00415] Avg episode reward: [(0, '4.597')]
[2023-02-22 11:00:04,038][11324] Updated weights for policy 0, policy_version 180 (0.0019)
[2023-02-22 11:00:07,897][00415] Fps is (10 sec: 3276.8, 60 sec: 2867.3, 300 sec: 3106.1). Total num frames: 745472. Throughput: 0: 752.0. Samples: 186446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:00:07,902][00415] Avg episode reward: [(0, '4.616')]
[2023-02-22 11:00:12,897][00415] Fps is (10 sec: 2457.6, 60 sec: 2867.3, 300 sec: 3092.9). Total num frames: 757760. Throughput: 0: 736.0. Samples: 190224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:00:12,904][00415] Avg episode reward: [(0, '4.577')]
[2023-02-22 11:00:17,130][11324] Updated weights for policy 0, policy_version 190 (0.0018)
[2023-02-22 11:00:17,898][00415] Fps is (10 sec: 3276.6, 60 sec: 3003.7, 300 sec: 3113.0). Total num frames: 778240. Throughput: 0: 728.9. Samples: 192978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:00:17,901][00415] Avg episode reward: [(0, '4.727')]
[2023-02-22 11:00:22,898][00415] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3132.2). Total num frames: 798720. Throughput: 0: 742.1. Samples: 198988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:00:22,900][00415] Avg episode reward: [(0, '4.860')]
[2023-02-22 11:00:22,910][11310] Saving new best policy, reward=4.860!
[2023-02-22 11:00:27,897][00415] Fps is (10 sec: 3277.0, 60 sec: 3072.3, 300 sec: 3119.3). Total num frames: 811008. Throughput: 0: 769.9. Samples: 203426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:00:27,903][00415] Avg episode reward: [(0, '4.683')]
[2023-02-22 11:00:30,137][11324] Updated weights for policy 0, policy_version 200 (0.0027)
[2023-02-22 11:00:32,897][00415] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 3106.8). Total num frames: 823296. Throughput: 0: 778.9. Samples: 205424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:00:32,903][00415] Avg episode reward: [(0, '4.514')]
[2023-02-22 11:00:37,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3125.1). Total num frames: 843776. Throughput: 0: 812.1. Samples: 209894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:00:37,900][00415] Avg episode reward: [(0, '4.416')]
[2023-02-22 11:00:41,636][11324] Updated weights for policy 0, policy_version 210 (0.0020)
[2023-02-22 11:00:42,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3142.7). Total num frames: 864256. Throughput: 0: 830.6. Samples: 216086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:00:42,899][00415] Avg episode reward: [(0, '4.593')]
[2023-02-22 11:00:47,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3145.1). Total num frames: 880640. Throughput: 0: 824.9. Samples: 218870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:00:47,904][00415] Avg episode reward: [(0, '4.696')]
[2023-02-22 11:00:52,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3133.1). Total num frames: 892928. Throughput: 0: 806.1. Samples: 222720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:00:52,906][00415] Avg episode reward: [(0, '4.688')]
[2023-02-22 11:00:55,620][11324] Updated weights for policy 0, policy_version 220 (0.0017)
[2023-02-22 11:00:57,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3135.6). Total num frames: 909312. Throughput: 0: 824.2. Samples: 227312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:00:57,900][00415] Avg episode reward: [(0, '4.670')]
[2023-02-22 11:01:02,897][00415] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3151.8). Total num frames: 929792. Throughput: 0: 830.9. Samples: 230368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:01:02,900][00415] Avg episode reward: [(0, '4.796')]
[2023-02-22 11:01:05,985][11324] Updated weights for policy 0, policy_version 230 (0.0038)
[2023-02-22 11:01:07,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 946176. Throughput: 0: 826.2. Samples: 236168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:01:07,902][00415] Avg episode reward: [(0, '4.671')]
[2023-02-22 11:01:12,899][00415] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 958464. Throughput: 0: 814.8. Samples: 240094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:01:12,901][00415] Avg episode reward: [(0, '4.801')]
[2023-02-22 11:01:17,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 974848. Throughput: 0: 812.5. Samples: 241986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:01:17,900][00415] Avg episode reward: [(0, '4.595')]
[2023-02-22 11:01:19,574][11324] Updated weights for policy 0, policy_version 240 (0.0027)
[2023-02-22 11:01:22,898][00415] Fps is (10 sec: 3686.8, 60 sec: 3276.8, 300 sec: 3332.3). Total num frames: 995328. Throughput: 0: 839.9. Samples: 247688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:01:22,902][00415] Avg episode reward: [(0, '4.542')]
[2023-02-22 11:01:27,898][00415] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 1011712. Throughput: 0: 833.3. Samples: 253584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:01:27,902][00415] Avg episode reward: [(0, '4.560')]
[2023-02-22 11:01:31,424][11324] Updated weights for policy 0, policy_version 250 (0.0023)
[2023-02-22 11:01:32,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 1024000. Throughput: 0: 815.5. Samples: 255566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:01:32,904][00415] Avg episode reward: [(0, '4.603')]
[2023-02-22 11:01:37,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3318.5). Total num frames: 1040384. Throughput: 0: 818.7. Samples: 259562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:01:37,901][00415] Avg episode reward: [(0, '4.552')]
[2023-02-22 11:01:42,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3318.5). Total num frames: 1060864. Throughput: 0: 847.3. Samples: 265442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:01:42,904][00415] Avg episode reward: [(0, '4.834')]
[2023-02-22 11:01:42,915][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth...
[2023-02-22 11:01:43,043][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth
[2023-02-22 11:01:43,485][11324] Updated weights for policy 0, policy_version 260 (0.0021)
[2023-02-22 11:01:47,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 1081344. Throughput: 0: 846.7. Samples: 268468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:01:47,900][00415] Avg episode reward: [(0, '4.743')]
[2023-02-22 11:01:52,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 1093632. Throughput: 0: 824.5. Samples: 273272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:01:52,900][00415] Avg episode reward: [(0, '4.774')]
[2023-02-22 11:01:56,372][11324] Updated weights for policy 0, policy_version 270 (0.0017)
[2023-02-22 11:01:57,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 1110016. Throughput: 0: 832.2. Samples: 277544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:01:57,900][00415] Avg episode reward: [(0, '4.881')]
[2023-02-22 11:01:57,908][11310] Saving new best policy, reward=4.881!
[2023-02-22 11:02:02,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 1130496. Throughput: 0: 855.6. Samples: 280488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:02:02,900][00415] Avg episode reward: [(0, '5.236')]
[2023-02-22 11:02:02,910][11310] Saving new best policy, reward=5.236!
[2023-02-22 11:02:06,400][11324] Updated weights for policy 0, policy_version 280 (0.0027)
[2023-02-22 11:02:07,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 1150976. Throughput: 0: 873.1. Samples: 286976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:02:07,900][00415] Avg episode reward: [(0, '5.440')]
[2023-02-22 11:02:07,906][11310] Saving new best policy, reward=5.440!
[2023-02-22 11:02:12,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3332.3). Total num frames: 1167360. Throughput: 0: 851.1. Samples: 291882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:02:12,902][00415] Avg episode reward: [(0, '5.155')]
[2023-02-22 11:02:17,898][00415] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1179648. Throughput: 0: 854.6. Samples: 294022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:02:17,906][00415] Avg episode reward: [(0, '5.157')]
[2023-02-22 11:02:19,318][11324] Updated weights for policy 0, policy_version 290 (0.0040)
[2023-02-22 11:02:22,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 1200128. Throughput: 0: 888.6. Samples: 299550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:02:22,905][00415] Avg episode reward: [(0, '5.137')]
[2023-02-22 11:02:27,898][00415] Fps is (10 sec: 4505.5, 60 sec: 3549.8, 300 sec: 3318.4). Total num frames: 1224704. Throughput: 0: 903.1. Samples: 306080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:02:27,906][00415] Avg episode reward: [(0, '5.323')]
[2023-02-22 11:02:28,870][11324] Updated weights for policy 0, policy_version 300 (0.0026)
[2023-02-22 11:02:32,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1236992. Throughput: 0: 890.4. Samples: 308538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:02:32,901][00415] Avg episode reward: [(0, '5.387')]
[2023-02-22 11:02:37,897][00415] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1253376. Throughput: 0: 876.0. Samples: 312690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:02:37,906][00415] Avg episode reward: [(0, '5.270')]
[2023-02-22 11:02:41,899][11324] Updated weights for policy 0, policy_version 310 (0.0021)
[2023-02-22 11:02:42,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3304.6). Total num frames: 1273856. Throughput: 0: 904.8. Samples: 318260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:02:42,900][00415] Avg episode reward: [(0, '5.160')]
[2023-02-22 11:02:47,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1294336. Throughput: 0: 914.8. Samples: 321652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:02:47,900][00415] Avg episode reward: [(0, '5.140')]
[2023-02-22 11:02:52,778][11324] Updated weights for policy 0, policy_version 320 (0.0020)
[2023-02-22 11:02:52,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3332.3). Total num frames: 1310720. Throughput: 0: 889.6. Samples: 327008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:02:52,901][00415] Avg episode reward: [(0, '5.152')]
[2023-02-22 11:02:57,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 1323008. Throughput: 0: 868.8. Samples: 330976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:02:57,902][00415] Avg episode reward: [(0, '5.044')]
[2023-02-22 11:03:02,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 1339392. Throughput: 0: 866.9. Samples: 333034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:03:02,900][00415] Avg episode reward: [(0, '5.061')]
[2023-02-22 11:03:05,476][11324] Updated weights for policy 0, policy_version 330 (0.0032)
[2023-02-22 11:03:07,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 1359872. Throughput: 0: 881.8. Samples: 339232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:03:07,906][00415] Avg episode reward: [(0, '5.223')]
[2023-02-22 11:03:12,900][00415] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3304.5). Total num frames: 1376256. Throughput: 0: 856.7. Samples: 344634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:03:12,906][00415] Avg episode reward: [(0, '5.245')]
[2023-02-22 11:03:17,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3290.7). Total num frames: 1388544. Throughput: 0: 844.2. Samples: 346526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:03:17,901][00415] Avg episode reward: [(0, '5.168')]
[2023-02-22 11:03:18,654][11324] Updated weights for policy 0, policy_version 340 (0.0012)
[2023-02-22 11:03:22,897][00415] Fps is (10 sec: 2867.8, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 1404928. Throughput: 0: 838.5. Samples: 350422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:03:22,900][00415] Avg episode reward: [(0, '5.232')]
[2023-02-22 11:03:27,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 1425408. Throughput: 0: 849.5. Samples: 356486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:03:27,900][00415] Avg episode reward: [(0, '5.194')]
[2023-02-22 11:03:29,469][11324] Updated weights for policy 0, policy_version 350 (0.0018)
[2023-02-22 11:03:32,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 1441792. Throughput: 0: 841.6. Samples: 359526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:03:32,900][00415] Avg episode reward: [(0, '5.023')]
[2023-02-22 11:03:37,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 1454080. Throughput: 0: 808.3. Samples: 363382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:03:37,900][00415] Avg episode reward: [(0, '5.136')]
[2023-02-22 11:03:42,898][00415] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1466368. Throughput: 0: 792.8. Samples: 366650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:03:42,903][00415] Avg episode reward: [(0, '5.127')]
[2023-02-22 11:03:42,917][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000358_1466368.pth...
[2023-02-22 11:03:43,056][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_671744.pth
[2023-02-22 11:03:46,366][11324] Updated weights for policy 0, policy_version 360 (0.0021)
[2023-02-22 11:03:47,897][00415] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3221.3). Total num frames: 1478656. Throughput: 0: 782.2. Samples: 368234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:03:47,905][00415] Avg episode reward: [(0, '5.149')]
[2023-02-22 11:03:52,897][00415] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 1499136. Throughput: 0: 763.9. Samples: 373606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:03:52,907][00415] Avg episode reward: [(0, '5.276')]
[2023-02-22 11:03:56,674][11324] Updated weights for policy 0, policy_version 370 (0.0024)
[2023-02-22 11:03:57,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1515520. Throughput: 0: 780.9. Samples: 379774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:03:57,903][00415] Avg episode reward: [(0, '4.965')]
[2023-02-22 11:04:02,903][00415] Fps is (10 sec: 3275.1, 60 sec: 3208.3, 300 sec: 3249.0). Total num frames: 1531904. Throughput: 0: 783.5. Samples: 381786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:04:02,905][00415] Avg episode reward: [(0, '4.837')]
[2023-02-22 11:04:07,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3249.1). Total num frames: 1544192. Throughput: 0: 787.9. Samples: 385876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:04:07,900][00415] Avg episode reward: [(0, '4.825')]
[2023-02-22 11:04:09,946][11324] Updated weights for policy 0, policy_version 380 (0.0042)
[2023-02-22 11:04:12,902][00415] Fps is (10 sec: 3686.7, 60 sec: 3208.4, 300 sec: 3290.6). Total num frames: 1568768. Throughput: 0: 787.2. Samples: 391914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:04:12,905][00415] Avg episode reward: [(0, '4.953')]
[2023-02-22 11:04:17,897][00415] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 1589248. Throughput: 0: 791.9. Samples: 395162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:04:17,902][00415] Avg episode reward: [(0, '5.152')]
[2023-02-22 11:04:20,617][11324] Updated weights for policy 0, policy_version 390 (0.0029)
[2023-02-22 11:04:22,897][00415] Fps is (10 sec: 3278.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 1601536. Throughput: 0: 818.0. Samples: 400192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:04:22,904][00415] Avg episode reward: [(0, '4.981')]
[2023-02-22 11:04:27,897][00415] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 1613824. Throughput: 0: 837.2. Samples: 404326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:04:27,900][00415] Avg episode reward: [(0, '4.849')]
[2023-02-22 11:04:32,898][00415] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3290.7). Total num frames: 1634304. Throughput: 0: 864.0. Samples: 407114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:04:32,900][00415] Avg episode reward: [(0, '4.995')]
[2023-02-22 11:04:32,962][11324] Updated weights for policy 0, policy_version 400 (0.0045)
[2023-02-22 11:04:37,897][00415] Fps is (10 sec: 4505.6, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1658880. Throughput: 0: 890.3. Samples: 413670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:04:37,900][00415] Avg episode reward: [(0, '5.353')]
[2023-02-22 11:04:42,898][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1671168. Throughput: 0: 863.9. Samples: 418650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:04:42,904][00415] Avg episode reward: [(0, '5.268')]
[2023-02-22 11:04:44,515][11324] Updated weights for policy 0, policy_version 410 (0.0024)
[2023-02-22 11:04:47,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 1687552. Throughput: 0: 865.2. Samples: 420714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:04:47,900][00415] Avg episode reward: [(0, '5.310')]
[2023-02-22 11:04:52,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 1708032. Throughput: 0: 885.7. Samples: 425734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:04:52,901][00415] Avg episode reward: [(0, '5.473')]
[2023-02-22 11:04:52,915][11310] Saving new best policy, reward=5.473!
[2023-02-22 11:04:55,727][11324] Updated weights for policy 0, policy_version 420 (0.0013)
[2023-02-22 11:04:57,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3374.0). Total num frames: 1728512. Throughput: 0: 895.0. Samples: 432186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:04:57,899][00415] Avg episode reward: [(0, '5.965')]
[2023-02-22 11:04:57,909][11310] Saving new best policy, reward=5.965!
[2023-02-22 11:05:02,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3374.0). Total num frames: 1740800. Throughput: 0: 880.8. Samples: 434796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:05:02,900][00415] Avg episode reward: [(0, '6.137')]
[2023-02-22 11:05:02,909][11310] Saving new best policy, reward=6.137!
[2023-02-22 11:05:07,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 1757184. Throughput: 0: 858.1. Samples: 438808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:05:07,899][00415] Avg episode reward: [(0, '6.355')]
[2023-02-22 11:05:07,905][11310] Saving new best policy, reward=6.355!
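The paired "Saving checkpoint_... / Removing checkpoint_..." lines above show the learner keeping only the most recent periodic checkpoints (best-policy snapshots are tracked separately via the "Saving new best policy" lines). A sketch of that rotation logic; keep_last=2 is inferred from the log, not read from the config:

```python
from pathlib import Path

def rotate_checkpoints(ckpt_dir: Path, keep_last: int = 2) -> None:
    """Delete all but the newest `keep_last` checkpoints.

    The zero-padded version numbers (e.g. checkpoint_000000358_1466368.pth)
    make lexicographic sorting equivalent to sorting by age.
    """
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for old in ckpts[:-keep_last]:
        old.unlink()  # mirrors the "Removing .../checkpoint_..." lines
```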
[2023-02-22 11:05:09,395][11324] Updated weights for policy 0, policy_version 430 (0.0031) [2023-02-22 11:05:12,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3413.6, 300 sec: 3374.0). Total num frames: 1773568. Throughput: 0: 878.2. Samples: 443844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 11:05:12,903][00415] Avg episode reward: [(0, '6.734')] [2023-02-22 11:05:12,918][11310] Saving new best policy, reward=6.734! [2023-02-22 11:05:17,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 1794048. Throughput: 0: 883.2. Samples: 446856. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 11:05:17,905][00415] Avg episode reward: [(0, '7.163')] [2023-02-22 11:05:17,906][11310] Saving new best policy, reward=7.163! [2023-02-22 11:05:19,058][11324] Updated weights for policy 0, policy_version 440 (0.0016) [2023-02-22 11:05:22,898][00415] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 1810432. Throughput: 0: 864.8. Samples: 452584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 11:05:22,902][00415] Avg episode reward: [(0, '7.113')] [2023-02-22 11:05:27,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 1822720. Throughput: 0: 841.3. Samples: 456510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 11:05:27,906][00415] Avg episode reward: [(0, '6.999')] [2023-02-22 11:05:32,670][11324] Updated weights for policy 0, policy_version 450 (0.0030) [2023-02-22 11:05:32,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 1843200. Throughput: 0: 840.2. Samples: 458524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 11:05:32,906][00415] Avg episode reward: [(0, '7.569')] [2023-02-22 11:05:32,921][11310] Saving new best policy, reward=7.569! [2023-02-22 11:05:37,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 1863680. Throughput: 0: 868.8. Samples: 464832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 11:05:37,900][00415] Avg episode reward: [(0, '7.868')] [2023-02-22 11:05:37,903][11310] Saving new best policy, reward=7.868! [2023-02-22 11:05:42,898][00415] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 1880064. Throughput: 0: 854.0. Samples: 470618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 11:05:42,905][00415] Avg episode reward: [(0, '7.606')] [2023-02-22 11:05:42,918][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_1880064.pth... [2023-02-22 11:05:43,072][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000259_1060864.pth [2023-02-22 11:05:43,761][11324] Updated weights for policy 0, policy_version 460 (0.0012) [2023-02-22 11:05:47,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 1892352. Throughput: 0: 840.8. Samples: 472632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 11:05:47,907][00415] Avg episode reward: [(0, '7.394')] [2023-02-22 11:05:52,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 1908736. Throughput: 0: 843.3. Samples: 476758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 11:05:52,902][00415] Avg episode reward: [(0, '6.954')] [2023-02-22 11:05:55,867][11324] Updated weights for policy 0, policy_version 470 (0.0029) [2023-02-22 11:05:57,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 1933312. 
Throughput: 0: 874.3. Samples: 483186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 11:05:57,900][00415] Avg episode reward: [(0, '7.484')] [2023-02-22 11:06:02,897][00415] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 1949696. Throughput: 0: 879.8. Samples: 486446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 11:06:02,900][00415] Avg episode reward: [(0, '7.670')] [2023-02-22 11:06:07,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 1961984. Throughput: 0: 851.0. Samples: 490878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 11:06:07,904][00415] Avg episode reward: [(0, '7.666')] [2023-02-22 11:06:07,924][11324] Updated weights for policy 0, policy_version 480 (0.0015) [2023-02-22 11:06:12,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 1982464. Throughput: 0: 862.4. Samples: 495318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 11:06:12,900][00415] Avg episode reward: [(0, '7.927')] [2023-02-22 11:06:12,911][11310] Saving new best policy, reward=7.927! [2023-02-22 11:06:17,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 2002944. Throughput: 0: 887.5. Samples: 498460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 11:06:17,901][00415] Avg episode reward: [(0, '7.826')] [2023-02-22 11:06:18,621][11324] Updated weights for policy 0, policy_version 490 (0.0022) [2023-02-22 11:06:22,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 2023424. Throughput: 0: 893.1. Samples: 505022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 11:06:22,900][00415] Avg episode reward: [(0, '8.324')] [2023-02-22 11:06:22,908][11310] Saving new best policy, reward=8.324! [2023-02-22 11:06:27,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 2035712. Throughput: 0: 854.9. Samples: 509090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 11:06:27,900][00415] Avg episode reward: [(0, '8.226')] [2023-02-22 11:06:32,009][11324] Updated weights for policy 0, policy_version 500 (0.0036) [2023-02-22 11:06:32,897][00415] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 2048000. Throughput: 0: 854.1. Samples: 511068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 11:06:32,905][00415] Avg episode reward: [(0, '8.593')] [2023-02-22 11:06:32,918][11310] Saving new best policy, reward=8.593! [2023-02-22 11:06:37,898][00415] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 2068480. Throughput: 0: 884.6. Samples: 516566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 11:06:37,907][00415] Avg episode reward: [(0, '7.524')] [2023-02-22 11:06:41,893][11324] Updated weights for policy 0, policy_version 510 (0.0021) [2023-02-22 11:06:42,902][00415] Fps is (10 sec: 4093.9, 60 sec: 3481.3, 300 sec: 3415.6). Total num frames: 2088960. Throughput: 0: 884.1. Samples: 522974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 11:06:42,906][00415] Avg episode reward: [(0, '7.912')] [2023-02-22 11:06:47,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 2105344. Throughput: 0: 857.8. Samples: 525046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 11:06:47,904][00415] Avg episode reward: [(0, '7.673')] [2023-02-22 11:06:52,898][00415] Fps is (10 sec: 2868.6, 60 sec: 3481.6, 300 sec: 3415.6). 
Total num frames: 2117632. Throughput: 0: 852.7. Samples: 529248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:06:52,904][00415] Avg episode reward: [(0, '8.284')]
[2023-02-22 11:06:55,197][11324] Updated weights for policy 0, policy_version 520 (0.0032)
[2023-02-22 11:06:57,898][00415] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 2138112. Throughput: 0: 887.5. Samples: 535254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:06:57,901][00415] Avg episode reward: [(0, '8.336')]
[2023-02-22 11:07:02,897][00415] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 2162688. Throughput: 0: 889.7. Samples: 538496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:07:02,905][00415] Avg episode reward: [(0, '8.795')]
[2023-02-22 11:07:02,917][11310] Saving new best policy, reward=8.795!
[2023-02-22 11:07:05,345][11324] Updated weights for policy 0, policy_version 530 (0.0013)
[2023-02-22 11:07:07,899][00415] Fps is (10 sec: 3685.8, 60 sec: 3549.7, 300 sec: 3415.6). Total num frames: 2174976. Throughput: 0: 859.8. Samples: 543716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:07:07,906][00415] Avg episode reward: [(0, '8.962')]
[2023-02-22 11:07:07,912][11310] Saving new best policy, reward=8.962!
[2023-02-22 11:07:12,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 2191360. Throughput: 0: 862.6. Samples: 547906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:07:12,904][00415] Avg episode reward: [(0, '8.861')]
[2023-02-22 11:07:17,469][11324] Updated weights for policy 0, policy_version 540 (0.0034)
[2023-02-22 11:07:17,897][00415] Fps is (10 sec: 3687.1, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 2211840. Throughput: 0: 881.6. Samples: 550740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:07:17,904][00415] Avg episode reward: [(0, '9.487')]
[2023-02-22 11:07:17,908][11310] Saving new best policy, reward=9.487!
[2023-02-22 11:07:22,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 2232320. Throughput: 0: 908.7. Samples: 557458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:07:22,908][00415] Avg episode reward: [(0, '8.726')]
[2023-02-22 11:07:27,900][00415] Fps is (10 sec: 3275.8, 60 sec: 3481.4, 300 sec: 3415.6). Total num frames: 2244608. Throughput: 0: 868.6. Samples: 562058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:07:27,903][00415] Avg episode reward: [(0, '8.056')]
[2023-02-22 11:07:30,459][11324] Updated weights for policy 0, policy_version 550 (0.0032)
[2023-02-22 11:07:32,897][00415] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 2256896. Throughput: 0: 859.2. Samples: 563712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 11:07:32,900][00415] Avg episode reward: [(0, '7.437')]
[2023-02-22 11:07:37,898][00415] Fps is (10 sec: 2458.1, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 2269184. Throughput: 0: 841.7. Samples: 567124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:07:37,904][00415] Avg episode reward: [(0, '7.252')]
[2023-02-22 11:07:42,898][00415] Fps is (10 sec: 3276.7, 60 sec: 3345.3, 300 sec: 3374.0). Total num frames: 2289664. Throughput: 0: 821.2. Samples: 572206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:07:42,900][00415] Avg episode reward: [(0, '8.009')]
[2023-02-22 11:07:42,915][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000559_2289664.pth...
[2023-02-22 11:07:43,058][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000358_1466368.pth
[2023-02-22 11:07:43,862][11324] Updated weights for policy 0, policy_version 560 (0.0022)
[2023-02-22 11:07:47,898][00415] Fps is (10 sec: 4096.3, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 2310144. Throughput: 0: 822.4. Samples: 575502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:07:47,904][00415] Avg episode reward: [(0, '8.610')]
[2023-02-22 11:07:52,897][00415] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 2326528. Throughput: 0: 840.7. Samples: 581546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:07:52,903][00415] Avg episode reward: [(0, '9.290')]
[2023-02-22 11:07:54,730][11324] Updated weights for policy 0, policy_version 570 (0.0019)
[2023-02-22 11:07:57,898][00415] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 2342912. Throughput: 0: 839.8. Samples: 585698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:07:57,906][00415] Avg episode reward: [(0, '9.734')]
[2023-02-22 11:07:57,918][11310] Saving new best policy, reward=9.734!
[2023-02-22 11:08:02,898][00415] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 2359296. Throughput: 0: 824.6. Samples: 587846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:08:02,901][00415] Avg episode reward: [(0, '10.250')]
[2023-02-22 11:08:02,921][11310] Saving new best policy, reward=10.250!
[2023-02-22 11:08:06,246][11324] Updated weights for policy 0, policy_version 580 (0.0017)
[2023-02-22 11:08:07,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3401.8). Total num frames: 2379776. Throughput: 0: 820.7. Samples: 594390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:08:07,904][00415] Avg episode reward: [(0, '11.416')]
[2023-02-22 11:08:08,012][11310] Saving new best policy, reward=11.416!
[2023-02-22 11:08:12,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 2396160. Throughput: 0: 844.7. Samples: 600068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:08:12,907][00415] Avg episode reward: [(0, '11.948')]
[2023-02-22 11:08:12,927][11310] Saving new best policy, reward=11.948!
[2023-02-22 11:08:17,898][00415] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 2412544. Throughput: 0: 853.7. Samples: 602128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:08:17,908][00415] Avg episode reward: [(0, '11.957')]
[2023-02-22 11:08:17,910][11310] Saving new best policy, reward=11.957!
[2023-02-22 11:08:18,762][11324] Updated weights for policy 0, policy_version 590 (0.0028)
[2023-02-22 11:08:22,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 2433024. Throughput: 0: 880.5. Samples: 606744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:08:22,900][00415] Avg episode reward: [(0, '12.246')]
[2023-02-22 11:08:22,915][11310] Saving new best policy, reward=12.246!
[2023-02-22 11:08:27,897][00415] Fps is (10 sec: 4096.1, 60 sec: 3481.8, 300 sec: 3429.5). Total num frames: 2453504. Throughput: 0: 918.8. Samples: 613552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:08:27,900][00415] Avg episode reward: [(0, '11.660')]
[2023-02-22 11:08:28,289][11324] Updated weights for policy 0, policy_version 600 (0.0019)
[2023-02-22 11:08:32,898][00415] Fps is (10 sec: 3686.2, 60 sec: 3549.8, 300 sec: 3443.4). Total num frames: 2469888. Throughput: 0: 917.3. Samples: 616780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:08:32,901][00415] Avg episode reward: [(0, '12.759')]
[2023-02-22 11:08:32,915][11310] Saving new best policy, reward=12.759!
[2023-02-22 11:08:37,898][00415] Fps is (10 sec: 3276.7, 60 sec: 3618.2, 300 sec: 3457.3). Total num frames: 2486272. Throughput: 0: 876.6. Samples: 620994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:08:37,901][00415] Avg episode reward: [(0, '12.068')]
[2023-02-22 11:08:41,505][11324] Updated weights for policy 0, policy_version 610 (0.0018)
[2023-02-22 11:08:42,898][00415] Fps is (10 sec: 3277.0, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2502656. Throughput: 0: 894.1. Samples: 625934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:08:42,900][00415] Avg episode reward: [(0, '12.559')]
[2023-02-22 11:08:47,897][00415] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2527232. Throughput: 0: 920.4. Samples: 629266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:08:47,900][00415] Avg episode reward: [(0, '12.378')]
[2023-02-22 11:08:50,595][11324] Updated weights for policy 0, policy_version 620 (0.0020)
[2023-02-22 11:08:52,899][00415] Fps is (10 sec: 4095.5, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2543616. Throughput: 0: 921.6. Samples: 635864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:08:52,904][00415] Avg episode reward: [(0, '12.490')]
[2023-02-22 11:08:57,898][00415] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2560000. Throughput: 0: 888.3. Samples: 640042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:08:57,902][00415] Avg episode reward: [(0, '12.347')]
[2023-02-22 11:09:02,898][00415] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2576384. Throughput: 0: 890.4. Samples: 642194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:09:02,900][00415] Avg episode reward: [(0, '12.754')]
[2023-02-22 11:09:03,384][11324] Updated weights for policy 0, policy_version 630 (0.0035)
[2023-02-22 11:09:07,898][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2596864. Throughput: 0: 929.7. Samples: 648582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:09:07,900][00415] Avg episode reward: [(0, '12.820')]
[2023-02-22 11:09:07,903][11310] Saving new best policy, reward=12.820!
[2023-02-22 11:09:12,901][00415] Fps is (10 sec: 4094.7, 60 sec: 3686.2, 300 sec: 3485.0). Total num frames: 2617344. Throughput: 0: 918.8. Samples: 654902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:09:12,904][00415] Avg episode reward: [(0, '13.051')]
[2023-02-22 11:09:12,919][11310] Saving new best policy, reward=13.051!
[2023-02-22 11:09:13,747][11324] Updated weights for policy 0, policy_version 640 (0.0012)
[2023-02-22 11:09:17,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 2633728. Throughput: 0: 892.1. Samples: 656924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:09:17,901][00415] Avg episode reward: [(0, '12.974')]
[2023-02-22 11:09:22,898][00415] Fps is (10 sec: 2868.1, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2646016. Throughput: 0: 890.2. Samples: 661054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:09:22,906][00415] Avg episode reward: [(0, '13.282')]
[2023-02-22 11:09:22,921][11310] Saving new best policy, reward=13.282!
[2023-02-22 11:09:25,738][11324] Updated weights for policy 0, policy_version 650 (0.0018)
[2023-02-22 11:09:27,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2670592. Throughput: 0: 926.2. Samples: 667612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:09:27,903][00415] Avg episode reward: [(0, '13.124')]
[2023-02-22 11:09:32,898][00415] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 2691072. Throughput: 0: 928.6. Samples: 671054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:09:32,902][00415] Avg episode reward: [(0, '13.242')]
[2023-02-22 11:09:37,055][11324] Updated weights for policy 0, policy_version 660 (0.0019)
[2023-02-22 11:09:37,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3499.0). Total num frames: 2703360. Throughput: 0: 887.0. Samples: 675778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 11:09:37,904][00415] Avg episode reward: [(0, '13.095')]
[2023-02-22 11:09:42,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2719744. Throughput: 0: 893.3. Samples: 680240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:09:42,901][00415] Avg episode reward: [(0, '12.577')]
[2023-02-22 11:09:42,912][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth...
[2023-02-22 11:09:43,042][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_1880064.pth
[2023-02-22 11:09:47,719][11324] Updated weights for policy 0, policy_version 670 (0.0051)
[2023-02-22 11:09:47,898][00415] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2744320. Throughput: 0: 918.7. Samples: 683534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:09:47,909][00415] Avg episode reward: [(0, '13.681')]
[2023-02-22 11:09:47,913][11310] Saving new best policy, reward=13.681!
[2023-02-22 11:09:52,897][00415] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3512.8). Total num frames: 2764800. Throughput: 0: 925.5. Samples: 690228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:09:52,902][00415] Avg episode reward: [(0, '14.356')]
[2023-02-22 11:09:52,921][11310] Saving new best policy, reward=14.356!
[2023-02-22 11:09:57,899][00415] Fps is (10 sec: 3276.2, 60 sec: 3618.0, 300 sec: 3512.8). Total num frames: 2777088. Throughput: 0: 887.2. Samples: 694826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:09:57,904][00415] Avg episode reward: [(0, '14.599')]
[2023-02-22 11:09:57,911][11310] Saving new best policy, reward=14.599!
[2023-02-22 11:10:00,184][11324] Updated weights for policy 0, policy_version 680 (0.0013)
[2023-02-22 11:10:02,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2793472. Throughput: 0: 887.0. Samples: 696840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:10:02,906][00415] Avg episode reward: [(0, '16.555')]
[2023-02-22 11:10:02,920][11310] Saving new best policy, reward=16.555!
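The repeated "Saving new best policy, reward=R!" entries above are the learner reacting to a new high in the average episode reward: whenever the current average beats every value seen so far, an extra "best" checkpoint is written alongside the regular ones. A minimal sketch of that bookkeeping, assuming a PyTorch model and a hypothetical checkpoint directory (illustrative only, not Sample Factory's actual code):

    import os
    import torch

    class BestPolicyTracker:
        """Tracks the best average episode reward and saves a checkpoint on improvement."""

        def __init__(self, checkpoint_dir: str):
            self.checkpoint_dir = checkpoint_dir
            self.best_reward = float("-inf")

        def maybe_save_best(self, model: torch.nn.Module, avg_reward: float, policy_version: int) -> bool:
            # Only act when the running average strictly improves on the best so far.
            if avg_reward <= self.best_reward:
                return False
            self.best_reward = avg_reward
            path = os.path.join(
                self.checkpoint_dir, f"best_{policy_version:09d}_reward_{avg_reward:.3f}.pth"
            )
            torch.save({"model": model.state_dict(), "policy_version": policy_version}, path)
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
            return True

Note how often this fires early in the run (8.795, 8.962, 9.487, ...): while the policy is still improving quickly, almost every report sets a new best, and the frequency tapers off as the reward curve flattens.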
[2023-02-22 11:10:07,897][00415] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2813952. Throughput: 0: 916.9. Samples: 702316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:10:07,900][00415] Avg episode reward: [(0, '17.179')]
[2023-02-22 11:10:07,904][11310] Saving new best policy, reward=17.179!
[2023-02-22 11:10:10,348][11324] Updated weights for policy 0, policy_version 690 (0.0017)
[2023-02-22 11:10:12,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3526.7). Total num frames: 2834432. Throughput: 0: 920.3. Samples: 709026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:10:12,901][00415] Avg episode reward: [(0, '16.510')]
[2023-02-22 11:10:17,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2846720. Throughput: 0: 891.6. Samples: 711176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:10:17,901][00415] Avg episode reward: [(0, '17.217')]
[2023-02-22 11:10:17,909][11310] Saving new best policy, reward=17.217!
[2023-02-22 11:10:22,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2863104. Throughput: 0: 879.9. Samples: 715372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:10:22,906][00415] Avg episode reward: [(0, '16.527')]
[2023-02-22 11:10:23,477][11324] Updated weights for policy 0, policy_version 700 (0.0026)
[2023-02-22 11:10:27,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2883584. Throughput: 0: 913.6. Samples: 721352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:10:27,905][00415] Avg episode reward: [(0, '15.884')]
[2023-02-22 11:10:32,675][11324] Updated weights for policy 0, policy_version 710 (0.0023)
[2023-02-22 11:10:32,898][00415] Fps is (10 sec: 4505.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2908160. Throughput: 0: 916.2. Samples: 724762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:10:32,901][00415] Avg episode reward: [(0, '16.514')]
[2023-02-22 11:10:37,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2920448. Throughput: 0: 887.8. Samples: 730180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:10:37,902][00415] Avg episode reward: [(0, '16.956')]
[2023-02-22 11:10:42,897][00415] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2936832. Throughput: 0: 879.4. Samples: 734398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:10:42,905][00415] Avg episode reward: [(0, '16.872')]
[2023-02-22 11:10:45,572][11324] Updated weights for policy 0, policy_version 720 (0.0013)
[2023-02-22 11:10:47,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2957312. Throughput: 0: 897.6. Samples: 737234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:10:47,905][00415] Avg episode reward: [(0, '16.501')]
[2023-02-22 11:10:52,898][00415] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2981888. Throughput: 0: 927.2. Samples: 744040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:10:52,900][00415] Avg episode reward: [(0, '16.272')]
[2023-02-22 11:10:55,405][11324] Updated weights for policy 0, policy_version 730 (0.0016)
[2023-02-22 11:10:57,901][00415] Fps is (10 sec: 3685.1, 60 sec: 3618.0, 300 sec: 3540.6). Total num frames: 2994176. Throughput: 0: 896.6. Samples: 749374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:10:57,904][00415] Avg episode reward: [(0, '16.321')]
[2023-02-22 11:11:02,899][00415] Fps is (10 sec: 2866.6, 60 sec: 3618.0, 300 sec: 3554.5). Total num frames: 3010560. Throughput: 0: 894.9. Samples: 751450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:11:02,908][00415] Avg episode reward: [(0, '16.157')]
[2023-02-22 11:11:07,852][11324] Updated weights for policy 0, policy_version 740 (0.0023)
[2023-02-22 11:11:07,897][00415] Fps is (10 sec: 3687.7, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3031040. Throughput: 0: 910.4. Samples: 756338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:11:07,899][00415] Avg episode reward: [(0, '16.292')]
[2023-02-22 11:11:12,897][00415] Fps is (10 sec: 4096.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3051520. Throughput: 0: 927.9. Samples: 763108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:11:12,904][00415] Avg episode reward: [(0, '18.466')]
[2023-02-22 11:11:12,918][11310] Saving new best policy, reward=18.466!
[2023-02-22 11:11:17,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 3067904. Throughput: 0: 915.6. Samples: 765964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:11:17,903][00415] Avg episode reward: [(0, '19.224')]
[2023-02-22 11:11:17,911][11310] Saving new best policy, reward=19.224!
[2023-02-22 11:11:19,602][11324] Updated weights for policy 0, policy_version 750 (0.0015)
[2023-02-22 11:11:22,897][00415] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3076096. Throughput: 0: 870.8. Samples: 769366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:11:22,902][00415] Avg episode reward: [(0, '20.020')]
[2023-02-22 11:11:22,917][11310] Saving new best policy, reward=20.020!
[2023-02-22 11:11:27,897][00415] Fps is (10 sec: 2048.0, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3088384. Throughput: 0: 849.4. Samples: 772620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:11:27,902][00415] Avg episode reward: [(0, '19.777')]
[2023-02-22 11:11:32,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 3104768. Throughput: 0: 823.2. Samples: 774276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:11:32,900][00415] Avg episode reward: [(0, '20.064')]
[2023-02-22 11:11:32,920][11310] Saving new best policy, reward=20.064!
[2023-02-22 11:11:34,878][11324] Updated weights for policy 0, policy_version 760 (0.0031)
[2023-02-22 11:11:37,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3512.9). Total num frames: 3125248. Throughput: 0: 802.9. Samples: 780172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:11:37,905][00415] Avg episode reward: [(0, '19.823')]
[2023-02-22 11:11:42,898][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 3141632. Throughput: 0: 808.4. Samples: 785748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:11:42,901][00415] Avg episode reward: [(0, '18.439')]
[2023-02-22 11:11:42,916][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000767_3141632.pth...
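Each periodic "Saving .../checkpoint_..." entry, like the one just above, is paired with a "Removing ..." entry for the oldest regular checkpoint (the matching removal follows immediately below), so only the most recent few checkpoints plus the best-policy files stay on disk. A minimal keep-last-N sketch of that rotation, with hypothetical names (not Sample Factory's actual implementation):

    import glob
    import os

    def rotate_checkpoints(checkpoint_dir: str, keep_last: int = 2) -> None:
        """Delete the oldest checkpoint_*.pth files, keeping the newest keep_last of them."""
        pattern = os.path.join(checkpoint_dir, "checkpoint_*.pth")
        # Zero-padded version/frame counts make lexicographic order chronological.
        checkpoints = sorted(glob.glob(pattern))
        for stale in checkpoints[:-keep_last]:
            print(f"Removing {stale}")
            os.remove(stale)

The zero-padded file names (checkpoint_000000767_3141632.pth) are what make a plain lexicographic sort sufficient here.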
[2023-02-22 11:11:43,087][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000559_2289664.pth
[2023-02-22 11:11:47,115][11324] Updated weights for policy 0, policy_version 770 (0.0033)
[2023-02-22 11:11:47,900][00415] Fps is (10 sec: 2866.3, 60 sec: 3276.6, 300 sec: 3512.8). Total num frames: 3153920. Throughput: 0: 806.2. Samples: 787728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:11:47,903][00415] Avg episode reward: [(0, '17.449')]
[2023-02-22 11:11:52,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3499.0). Total num frames: 3170304. Throughput: 0: 786.0. Samples: 791710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:11:52,905][00415] Avg episode reward: [(0, '17.842')]
[2023-02-22 11:11:57,897][00415] Fps is (10 sec: 3687.5, 60 sec: 3277.0, 300 sec: 3485.1). Total num frames: 3190784. Throughput: 0: 771.8. Samples: 797840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:11:57,901][00415] Avg episode reward: [(0, '17.270')]
[2023-02-22 11:11:58,420][11324] Updated weights for policy 0, policy_version 780 (0.0038)
[2023-02-22 11:12:02,899][00415] Fps is (10 sec: 3685.7, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 3207168. Throughput: 0: 774.2. Samples: 800806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:12:02,906][00415] Avg episode reward: [(0, '16.255')]
[2023-02-22 11:12:07,898][00415] Fps is (10 sec: 2867.1, 60 sec: 3140.2, 300 sec: 3485.1). Total num frames: 3219456. Throughput: 0: 789.7. Samples: 804904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:12:07,904][00415] Avg episode reward: [(0, '15.670')]
[2023-02-22 11:12:12,683][11324] Updated weights for policy 0, policy_version 790 (0.0027)
[2023-02-22 11:12:12,898][00415] Fps is (10 sec: 2867.7, 60 sec: 3072.0, 300 sec: 3471.2). Total num frames: 3235840. Throughput: 0: 806.2. Samples: 808900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:12:12,905][00415] Avg episode reward: [(0, '16.039')]
[2023-02-22 11:12:17,897][00415] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 3471.2). Total num frames: 3256320. Throughput: 0: 842.5. Samples: 812190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:12:17,906][00415] Avg episode reward: [(0, '15.509')]
[2023-02-22 11:12:21,795][11324] Updated weights for policy 0, policy_version 800 (0.0021)
[2023-02-22 11:12:22,897][00415] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 3276800. Throughput: 0: 859.9. Samples: 818866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:12:22,900][00415] Avg episode reward: [(0, '15.144')]
[2023-02-22 11:12:27,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 3293184. Throughput: 0: 842.3. Samples: 823650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:12:27,902][00415] Avg episode reward: [(0, '15.882')]
[2023-02-22 11:12:32,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3512.9). Total num frames: 3305472. Throughput: 0: 844.7. Samples: 825738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 11:12:32,905][00415] Avg episode reward: [(0, '15.558')]
[2023-02-22 11:12:34,806][11324] Updated weights for policy 0, policy_version 810 (0.0035)
[2023-02-22 11:12:37,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3330048. Throughput: 0: 877.3. Samples: 831188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:12:37,899][00415] Avg episode reward: [(0, '16.290')]
[2023-02-22 11:12:42,897][00415] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 3350528. Throughput: 0: 895.4. Samples: 838134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:12:42,906][00415] Avg episode reward: [(0, '17.383')]
[2023-02-22 11:12:44,734][11324] Updated weights for policy 0, policy_version 820 (0.0020)
[2023-02-22 11:12:47,898][00415] Fps is (10 sec: 3686.1, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 3366912. Throughput: 0: 884.4. Samples: 840604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 11:12:47,903][00415] Avg episode reward: [(0, '17.498')]
[2023-02-22 11:12:52,898][00415] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 3379200. Throughput: 0: 887.1. Samples: 844824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:12:52,900][00415] Avg episode reward: [(0, '16.674')]
[2023-02-22 11:12:56,697][11324] Updated weights for policy 0, policy_version 830 (0.0026)
[2023-02-22 11:12:57,897][00415] Fps is (10 sec: 3686.7, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3403776. Throughput: 0: 930.1. Samples: 850754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:12:57,904][00415] Avg episode reward: [(0, '17.818')]
[2023-02-22 11:13:02,897][00415] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3540.6). Total num frames: 3424256. Throughput: 0: 929.9. Samples: 854034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:13:02,905][00415] Avg episode reward: [(0, '19.937')]
[2023-02-22 11:13:07,904][00415] Fps is (10 sec: 3274.5, 60 sec: 3617.7, 300 sec: 3526.6). Total num frames: 3436544. Throughput: 0: 901.5. Samples: 859440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 11:13:07,907][00415] Avg episode reward: [(0, '20.681')]
[2023-02-22 11:13:07,960][11310] Saving new best policy, reward=20.681!
[2023-02-22 11:13:07,981][11324] Updated weights for policy 0, policy_version 840 (0.0030)
[2023-02-22 11:13:12,902][00415] Fps is (10 sec: 2866.0, 60 sec: 3617.9, 300 sec: 3526.7). Total num frames: 3452928. Throughput: 0: 888.3. Samples: 863626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:13:12,904][00415] Avg episode reward: [(0, '20.831')]
[2023-02-22 11:13:12,921][11310] Saving new best policy, reward=20.831!
[2023-02-22 11:13:17,897][00415] Fps is (10 sec: 3689.0, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3473408. Throughput: 0: 898.6. Samples: 866176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:13:17,906][00415] Avg episode reward: [(0, '20.696')]
[2023-02-22 11:13:19,309][11324] Updated weights for policy 0, policy_version 850 (0.0028)
[2023-02-22 11:13:22,897][00415] Fps is (10 sec: 4507.5, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 3497984. Throughput: 0: 929.7. Samples: 873026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:13:22,900][00415] Avg episode reward: [(0, '18.700')]
[2023-02-22 11:13:27,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3510272. Throughput: 0: 893.9. Samples: 878358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:13:27,900][00415] Avg episode reward: [(0, '17.900')]
[2023-02-22 11:13:30,906][11324] Updated weights for policy 0, policy_version 860 (0.0022)
[2023-02-22 11:13:32,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 3526656. Throughput: 0: 887.1. Samples: 880524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:13:32,906][00415] Avg episode reward: [(0, '19.721')]
[2023-02-22 11:13:37,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3547136. Throughput: 0: 905.4. Samples: 885568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:13:37,900][00415] Avg episode reward: [(0, '18.485')]
[2023-02-22 11:13:41,271][11324] Updated weights for policy 0, policy_version 870 (0.0027)
[2023-02-22 11:13:42,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3567616. Throughput: 0: 923.0. Samples: 892290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:13:42,900][00415] Avg episode reward: [(0, '19.356')]
[2023-02-22 11:13:42,911][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000871_3567616.pth...
[2023-02-22 11:13:43,085][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth
[2023-02-22 11:13:47,898][00415] Fps is (10 sec: 3686.3, 60 sec: 3618.2, 300 sec: 3526.7). Total num frames: 3584000. Throughput: 0: 915.0. Samples: 895210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:13:47,903][00415] Avg episode reward: [(0, '20.004')]
[2023-02-22 11:13:52,898][00415] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 3600384. Throughput: 0: 889.2. Samples: 899448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:13:52,902][00415] Avg episode reward: [(0, '20.757')]
[2023-02-22 11:13:54,132][11324] Updated weights for policy 0, policy_version 880 (0.0037)
[2023-02-22 11:13:57,897][00415] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3616768. Throughput: 0: 913.4. Samples: 904724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:13:57,904][00415] Avg episode reward: [(0, '19.243')]
[2023-02-22 11:14:02,898][00415] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3641344. Throughput: 0: 932.3. Samples: 908130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:14:02,903][00415] Avg episode reward: [(0, '18.494')]
[2023-02-22 11:14:03,505][11324] Updated weights for policy 0, policy_version 890 (0.0013)
[2023-02-22 11:14:07,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3686.8, 300 sec: 3526.8). Total num frames: 3657728. Throughput: 0: 914.3. Samples: 914170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:14:07,905][00415] Avg episode reward: [(0, '18.159')]
[2023-02-22 11:14:12,899][00415] Fps is (10 sec: 2866.6, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 3670016. Throughput: 0: 890.4. Samples: 918426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:14:12,902][00415] Avg episode reward: [(0, '17.371')]
[2023-02-22 11:14:16,547][11324] Updated weights for policy 0, policy_version 900 (0.0017)
[2023-02-22 11:14:17,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3690496. Throughput: 0: 889.2. Samples: 920540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:14:17,900][00415] Avg episode reward: [(0, '16.660')]
[2023-02-22 11:14:22,897][00415] Fps is (10 sec: 4096.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 3710976. Throughput: 0: 927.2. Samples: 927292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:14:22,900][00415] Avg episode reward: [(0, '17.647')]
[2023-02-22 11:14:25,758][11324] Updated weights for policy 0, policy_version 910 (0.0030)
[2023-02-22 11:14:27,898][00415] Fps is (10 sec: 4095.6, 60 sec: 3686.3, 300 sec: 3526.7). Total num frames: 3731456. Throughput: 0: 909.4. Samples: 933214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:14:27,904][00415] Avg episode reward: [(0, '18.133')]
[2023-02-22 11:14:32,899][00415] Fps is (10 sec: 3276.2, 60 sec: 3618.0, 300 sec: 3526.7). Total num frames: 3743744. Throughput: 0: 890.0. Samples: 935262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:14:32,904][00415] Avg episode reward: [(0, '17.692')]
[2023-02-22 11:14:37,897][00415] Fps is (10 sec: 3277.1, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3764224. Throughput: 0: 892.5. Samples: 939610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:14:37,906][00415] Avg episode reward: [(0, '20.151')]
[2023-02-22 11:14:38,775][11324] Updated weights for policy 0, policy_version 920 (0.0016)
[2023-02-22 11:14:42,897][00415] Fps is (10 sec: 4096.7, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3784704. Throughput: 0: 925.3. Samples: 946364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:14:42,906][00415] Avg episode reward: [(0, '21.440')]
[2023-02-22 11:14:42,917][11310] Saving new best policy, reward=21.440!
[2023-02-22 11:14:47,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 3805184. Throughput: 0: 921.5. Samples: 949598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:14:47,901][00415] Avg episode reward: [(0, '20.899')]
[2023-02-22 11:14:49,318][11324] Updated weights for policy 0, policy_version 930 (0.0019)
[2023-02-22 11:14:52,898][00415] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3817472. Throughput: 0: 888.4. Samples: 954146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:14:52,900][00415] Avg episode reward: [(0, '20.591')]
[2023-02-22 11:14:57,897][00415] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 3833856. Throughput: 0: 897.0. Samples: 958788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 11:14:57,904][00415] Avg episode reward: [(0, '20.820')]
[2023-02-22 11:15:00,990][11324] Updated weights for policy 0, policy_version 940 (0.0018)
[2023-02-22 11:15:02,897][00415] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3858432. Throughput: 0: 925.9. Samples: 962206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:15:02,905][00415] Avg episode reward: [(0, '20.147')]
[2023-02-22 11:15:07,899][00415] Fps is (10 sec: 4095.2, 60 sec: 3618.0, 300 sec: 3526.7). Total num frames: 3874816. Throughput: 0: 926.8. Samples: 969000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:15:07,909][00415] Avg episode reward: [(0, '18.265')]
[2023-02-22 11:15:12,899][00415] Fps is (10 sec: 2866.9, 60 sec: 3618.2, 300 sec: 3526.7). Total num frames: 3887104. Throughput: 0: 874.4. Samples: 972560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:15:12,900][00415] Avg episode reward: [(0, '18.921')]
[2023-02-22 11:15:13,598][11324] Updated weights for policy 0, policy_version 950 (0.0020)
[2023-02-22 11:15:17,897][00415] Fps is (10 sec: 2458.1, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 3899392. Throughput: 0: 865.5. Samples: 974208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 11:15:17,900][00415] Avg episode reward: [(0, '18.700')]
[2023-02-22 11:15:22,898][00415] Fps is (10 sec: 2457.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 3911680. Throughput: 0: 842.1. Samples: 977504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:15:22,900][00415] Avg episode reward: [(0, '19.207')]
[2023-02-22 11:15:27,033][11324] Updated weights for policy 0, policy_version 960 (0.0033)
[2023-02-22 11:15:27,897][00415] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 3932160. Throughput: 0: 824.7. Samples: 983476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:15:27,900][00415] Avg episode reward: [(0, '19.442')]
[2023-02-22 11:15:32,897][00415] Fps is (10 sec: 4505.6, 60 sec: 3550.0, 300 sec: 3512.8). Total num frames: 3956736. Throughput: 0: 828.3. Samples: 986870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 11:15:32,900][00415] Avg episode reward: [(0, '19.622')]
[2023-02-22 11:15:37,897][00415] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 3969024. Throughput: 0: 836.4. Samples: 991782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 11:15:37,902][00415] Avg episode reward: [(0, '20.512')]
[2023-02-22 11:15:39,053][11324] Updated weights for policy 0, policy_version 970 (0.0013)
[2023-02-22 11:15:42,898][00415] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 3985408. Throughput: 0: 828.7. Samples: 996082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 11:15:42,906][00415] Avg episode reward: [(0, '20.954')]
[2023-02-22 11:15:42,917][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000973_3985408.pth...
[2023-02-22 11:15:43,036][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000767_3141632.pth
[2023-02-22 11:15:47,310][11310] Stopping Batcher_0...
[2023-02-22 11:15:47,310][11310] Loop batcher_evt_loop terminating...
[2023-02-22 11:15:47,311][00415] Component Batcher_0 stopped!
[2023-02-22 11:15:47,313][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 11:15:47,373][00415] Component RolloutWorker_w3 stopped!
[2023-02-22 11:15:47,374][11328] Stopping RolloutWorker_w3...
[2023-02-22 11:15:47,388][11324] Weights refcount: 2 0
[2023-02-22 11:15:47,392][11326] Stopping RolloutWorker_w1...
[2023-02-22 11:15:47,391][00415] Component RolloutWorker_w1 stopped!
[2023-02-22 11:15:47,400][11324] Stopping InferenceWorker_p0-w0...
[2023-02-22 11:15:47,399][00415] Component InferenceWorker_p0-w0 stopped!
[2023-02-22 11:15:47,401][11324] Loop inference_proc0-0_evt_loop terminating...
[2023-02-22 11:15:47,389][11328] Loop rollout_proc3_evt_loop terminating...
[2023-02-22 11:15:47,414][00415] Component RolloutWorker_w4 stopped!
[2023-02-22 11:15:47,416][11329] Stopping RolloutWorker_w4...
[2023-02-22 11:15:47,421][11332] Stopping RolloutWorker_w5...
[2023-02-22 11:15:47,421][11332] Loop rollout_proc5_evt_loop terminating...
[2023-02-22 11:15:47,420][00415] Component RolloutWorker_w2 stopped!
[2023-02-22 11:15:47,423][00415] Component RolloutWorker_w5 stopped!
[2023-02-22 11:15:47,419][11327] Stopping RolloutWorker_w2...
[2023-02-22 11:15:47,400][11326] Loop rollout_proc1_evt_loop terminating...
[2023-02-22 11:15:47,418][11329] Loop rollout_proc4_evt_loop terminating...
[2023-02-22 11:15:47,439][11331] Stopping RolloutWorker_w7...
[2023-02-22 11:15:47,439][00415] Component RolloutWorker_w7 stopped!
[2023-02-22 11:15:47,427][11327] Loop rollout_proc2_evt_loop terminating...
[2023-02-22 11:15:47,448][11331] Loop rollout_proc7_evt_loop terminating...
[2023-02-22 11:15:47,514][00415] Component RolloutWorker_w6 stopped!
[2023-02-22 11:15:47,516][11330] Stopping RolloutWorker_w6...
[2023-02-22 11:15:47,520][00415] Component RolloutWorker_w0 stopped!
[2023-02-22 11:15:47,522][11325] Stopping RolloutWorker_w0...
[2023-02-22 11:15:47,519][11330] Loop rollout_proc6_evt_loop terminating...
[2023-02-22 11:15:47,524][11325] Loop rollout_proc0_evt_loop terminating...
[2023-02-22 11:15:47,569][11310] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000871_3567616.pth
[2023-02-22 11:15:47,585][11310] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 11:15:47,822][00415] Component LearnerWorker_p0 stopped!
[2023-02-22 11:15:47,829][00415] Waiting for process learner_proc0 to stop...
[2023-02-22 11:15:47,834][11310] Stopping LearnerWorker_p0...
[2023-02-22 11:15:47,835][11310] Loop learner_proc0_evt_loop terminating...
[2023-02-22 11:15:49,613][00415] Waiting for process inference_proc0-0 to join...
[2023-02-22 11:15:49,993][00415] Waiting for process rollout_proc0 to join...
[2023-02-22 11:15:49,995][00415] Waiting for process rollout_proc1 to join...
[2023-02-22 11:15:50,351][00415] Waiting for process rollout_proc2 to join...
[2023-02-22 11:15:50,353][00415] Waiting for process rollout_proc3 to join...
[2023-02-22 11:15:50,364][00415] Waiting for process rollout_proc4 to join...
[2023-02-22 11:15:50,366][00415] Waiting for process rollout_proc5 to join...
[2023-02-22 11:15:50,368][00415] Waiting for process rollout_proc6 to join...
[2023-02-22 11:15:50,369][00415] Waiting for process rollout_proc7 to join...
[2023-02-22 11:15:50,370][00415] Batcher 0 profile tree view:
batching: 27.2953, releasing_batches: 0.0267
[2023-02-22 11:15:50,371][00415] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0103
  wait_policy_total: 560.3362
update_model: 8.6658
  weight_update: 0.0029
one_step: 0.0036
  handle_policy_step: 563.8125
    deserialize: 15.6908, stack: 3.0321, obs_to_device_normalize: 119.6365, forward: 278.9147, send_messages: 28.0254
    prepare_outputs: 91.3456
      to_cpu: 56.5085
[2023-02-22 11:15:50,372][00415] Learner 0 profile tree view:
misc: 0.0071, prepare_batch: 15.8586
train: 77.4167
  epoch_init: 0.0059, minibatch_init: 0.0095, losses_postprocess: 0.5332, kl_divergence: 0.6215, after_optimizer: 32.9720
  calculate_losses: 27.7337
    losses_init: 0.0109, forward_head: 1.7988, bptt_initial: 18.1007, tail: 1.2954, advantages_returns: 0.3069, losses: 3.5977
    bptt: 2.2809
      bptt_forward_core: 2.1779
  update: 14.9633
    clip: 1.4313
[2023-02-22 11:15:50,373][00415] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3618, enqueue_policy_requests: 151.9889, env_step: 884.8476, overhead: 23.3751, complete_rollouts: 7.3227
save_policy_outputs: 22.6553
  split_output_tensors: 10.9836
[2023-02-22 11:15:50,375][00415] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4027, enqueue_policy_requests: 154.7949, env_step: 882.4265, overhead: 21.9119, complete_rollouts: 7.8325
save_policy_outputs: 23.4911
  split_output_tensors: 11.2544
[2023-02-22 11:15:50,376][00415] Loop Runner_EvtLoop terminating...
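The "profile tree view" blocks above break each component's wall-clock time into nested named sections; the Runner profile just below gives the overall main_loop time, and 4005888 collected frames over main_loop: 1202.5598 seconds works out to roughly 3331 FPS, the figure reported on the "Collected" line. A minimal sketch of how such a nested timing tree can be accumulated with context managers (illustrative only, not Sample Factory's actual Timing class):

    import time
    from contextlib import contextmanager

    class TimingTree:
        def __init__(self):
            self.totals = {}  # path tuple -> accumulated seconds
            self.stack = []   # names of currently open sections

        @contextmanager
        def add_time(self, name: str):
            # Nest this section under whatever section is currently open.
            self.stack.append(name)
            path = tuple(self.stack)
            start = time.monotonic()
            try:
                yield
            finally:
                self.totals[path] = self.totals.get(path, 0.0) + time.monotonic() - start
                self.stack.pop()

        def tree_view(self) -> str:
            # Tuple sort places each parent right before its children.
            return "\n".join(
                f"{'  ' * (len(path) - 1)}{path[-1]}: {seconds:.4f}"
                for path, seconds in sorted(self.totals.items())
            )

In the rollout workers' profiles above, env_step dominates (roughly 884 s of the run), which is typical when eight CPU rollout workers feed a single GPU learner.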
[2023-02-22 11:15:50,377][00415] Runner profile tree view:
main_loop: 1202.5598
[2023-02-22 11:15:50,383][00415] Collected {0: 4005888}, FPS: 3331.1
[2023-02-22 11:15:59,552][00415] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-22 11:15:59,554][00415] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 11:15:59,559][00415] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 11:15:59,560][00415] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 11:15:59,563][00415] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 11:15:59,565][00415] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 11:15:59,568][00415] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 11:15:59,570][00415] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 11:15:59,572][00415] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-22 11:15:59,574][00415] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-22 11:15:59,576][00415] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 11:15:59,577][00415] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 11:15:59,585][00415] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 11:15:59,586][00415] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-22 11:15:59,588][00415] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 11:15:59,608][00415] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 11:15:59,610][00415] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 11:15:59,612][00415] RunningMeanStd input shape: (1,)
[2023-02-22 11:15:59,631][00415] ConvEncoder: input_channels=3
[2023-02-22 11:16:00,331][00415] Conv encoder output size: 512
[2023-02-22 11:16:00,334][00415] Policy head output size: 512
[2023-02-22 11:16:02,666][00415] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 11:16:03,964][00415] Num frames 100...
[2023-02-22 11:16:04,074][00415] Num frames 200...
[2023-02-22 11:16:04,185][00415] Num frames 300...
[2023-02-22 11:16:04,300][00415] Num frames 400...
[2023-02-22 11:16:04,408][00415] Num frames 500...
[2023-02-22 11:16:04,528][00415] Num frames 600...
[2023-02-22 11:16:04,667][00415] Num frames 700...
[2023-02-22 11:16:04,818][00415] Num frames 800...
[2023-02-22 11:16:04,975][00415] Num frames 900...
[2023-02-22 11:16:05,125][00415] Num frames 1000...
[2023-02-22 11:16:05,280][00415] Num frames 1100...
[2023-02-22 11:16:05,429][00415] Num frames 1200...
[2023-02-22 11:16:05,592][00415] Num frames 1300...
[2023-02-22 11:16:05,739][00415] Num frames 1400...
[2023-02-22 11:16:05,894][00415] Num frames 1500...
[2023-02-22 11:16:06,061][00415] Num frames 1600...
[2023-02-22 11:16:06,216][00415] Num frames 1700...
[2023-02-22 11:16:06,368][00415] Num frames 1800...
[2023-02-22 11:16:06,468][00415] Avg episode rewards: #0: 46.249, true rewards: #0: 18.250
[2023-02-22 11:16:06,470][00415] Avg episode reward: 46.249, avg true_objective: 18.250
[2023-02-22 11:16:06,602][00415] Num frames 1900...
[2023-02-22 11:16:06,758][00415] Num frames 2000...
[2023-02-22 11:16:06,913][00415] Num frames 2100...
[2023-02-22 11:16:07,070][00415] Num frames 2200...
[2023-02-22 11:16:07,231][00415] Num frames 2300...
[2023-02-22 11:16:07,391][00415] Num frames 2400...
[2023-02-22 11:16:07,551][00415] Num frames 2500...
[2023-02-22 11:16:07,712][00415] Num frames 2600...
[2023-02-22 11:16:07,917][00415] Avg episode rewards: #0: 31.945, true rewards: #0: 13.445
[2023-02-22 11:16:07,920][00415] Avg episode reward: 31.945, avg true_objective: 13.445
[2023-02-22 11:16:07,941][00415] Num frames 2700...
[2023-02-22 11:16:08,126][00415] Num frames 2800...
[2023-02-22 11:16:08,260][00415] Num frames 2900...
[2023-02-22 11:16:08,380][00415] Num frames 3000...
[2023-02-22 11:16:08,505][00415] Num frames 3100...
[2023-02-22 11:16:08,630][00415] Num frames 3200...
[2023-02-22 11:16:08,747][00415] Num frames 3300...
[2023-02-22 11:16:08,862][00415] Num frames 3400...
[2023-02-22 11:16:08,981][00415] Num frames 3500...
[2023-02-22 11:16:09,062][00415] Avg episode rewards: #0: 26.070, true rewards: #0: 11.737
[2023-02-22 11:16:09,063][00415] Avg episode reward: 26.070, avg true_objective: 11.737
[2023-02-22 11:16:09,163][00415] Num frames 3600...
[2023-02-22 11:16:09,285][00415] Num frames 3700...
[2023-02-22 11:16:09,392][00415] Num frames 3800...
[2023-02-22 11:16:09,505][00415] Num frames 3900...
[2023-02-22 11:16:09,619][00415] Num frames 4000...
[2023-02-22 11:16:09,781][00415] Avg episode rewards: #0: 21.987, true rewards: #0: 10.237
[2023-02-22 11:16:09,782][00415] Avg episode reward: 21.987, avg true_objective: 10.237
[2023-02-22 11:16:09,794][00415] Num frames 4100...
[2023-02-22 11:16:09,906][00415] Num frames 4200...
[2023-02-22 11:16:10,022][00415] Num frames 4300...
[2023-02-22 11:16:10,148][00415] Num frames 4400...
[2023-02-22 11:16:10,258][00415] Num frames 4500...
[2023-02-22 11:16:10,369][00415] Avg episode rewards: #0: 19.302, true rewards: #0: 9.102
[2023-02-22 11:16:10,370][00415] Avg episode reward: 19.302, avg true_objective: 9.102
[2023-02-22 11:16:10,428][00415] Num frames 4600...
[2023-02-22 11:16:10,537][00415] Num frames 4700...
[2023-02-22 11:16:10,644][00415] Num frames 4800...
[2023-02-22 11:16:10,758][00415] Num frames 4900...
[2023-02-22 11:16:10,865][00415] Num frames 5000...
[2023-02-22 11:16:10,972][00415] Num frames 5100...
[2023-02-22 11:16:11,085][00415] Num frames 5200...
[2023-02-22 11:16:11,199][00415] Num frames 5300...
[2023-02-22 11:16:11,337][00415] Avg episode rewards: #0: 18.960, true rewards: #0: 8.960
[2023-02-22 11:16:11,339][00415] Avg episode reward: 18.960, avg true_objective: 8.960
[2023-02-22 11:16:11,370][00415] Num frames 5400...
[2023-02-22 11:16:11,487][00415] Num frames 5500...
[2023-02-22 11:16:11,607][00415] Num frames 5600...
[2023-02-22 11:16:11,716][00415] Num frames 5700...
[2023-02-22 11:16:11,828][00415] Num frames 5800...
[2023-02-22 11:16:11,937][00415] Num frames 5900...
[2023-02-22 11:16:12,049][00415] Num frames 6000...
[2023-02-22 11:16:12,167][00415] Num frames 6100...
[2023-02-22 11:16:12,281][00415] Num frames 6200...
[2023-02-22 11:16:12,389][00415] Num frames 6300...
[2023-02-22 11:16:12,499][00415] Num frames 6400...
[2023-02-22 11:16:12,568][00415] Avg episode rewards: #0: 19.301, true rewards: #0: 9.159
[2023-02-22 11:16:12,569][00415] Avg episode reward: 19.301, avg true_objective: 9.159
[2023-02-22 11:16:12,677][00415] Num frames 6500...
[2023-02-22 11:16:12,792][00415] Num frames 6600...
[2023-02-22 11:16:12,913][00415] Num frames 6700...
[2023-02-22 11:16:13,036][00415] Num frames 6800...
[2023-02-22 11:16:13,158][00415] Num frames 6900...
[2023-02-22 11:16:13,265][00415] Num frames 7000...
[2023-02-22 11:16:13,383][00415] Num frames 7100...
[2023-02-22 11:16:13,500][00415] Num frames 7200...
[2023-02-22 11:16:13,614][00415] Num frames 7300...
[2023-02-22 11:16:13,725][00415] Num frames 7400...
[2023-02-22 11:16:13,836][00415] Num frames 7500...
[2023-02-22 11:16:13,954][00415] Num frames 7600...
[2023-02-22 11:16:14,087][00415] Avg episode rewards: #0: 20.199, true rewards: #0: 9.574
[2023-02-22 11:16:14,089][00415] Avg episode reward: 20.199, avg true_objective: 9.574
[2023-02-22 11:16:14,151][00415] Num frames 7700...
[2023-02-22 11:16:14,261][00415] Num frames 7800...
[2023-02-22 11:16:14,368][00415] Num frames 7900...
[2023-02-22 11:16:14,479][00415] Num frames 8000...
[2023-02-22 11:16:14,590][00415] Num frames 8100...
[2023-02-22 11:16:14,705][00415] Num frames 8200...
[2023-02-22 11:16:14,873][00415] Avg episode rewards: #0: 19.332, true rewards: #0: 9.221
[2023-02-22 11:16:14,875][00415] Avg episode reward: 19.332, avg true_objective: 9.221
[2023-02-22 11:16:14,882][00415] Num frames 8300...
[2023-02-22 11:16:15,005][00415] Num frames 8400...
[2023-02-22 11:16:15,120][00415] Num frames 8500...
[2023-02-22 11:16:15,237][00415] Num frames 8600...
[2023-02-22 11:16:15,347][00415] Num frames 8700...
[2023-02-22 11:16:15,458][00415] Num frames 8800...
[2023-02-22 11:16:15,574][00415] Num frames 8900...
[2023-02-22 11:16:15,683][00415] Num frames 9000...
[2023-02-22 11:16:15,788][00415] Num frames 9100...
[2023-02-22 11:16:15,918][00415] Avg episode rewards: #0: 19.263, true rewards: #0: 9.163
[2023-02-22 11:16:15,920][00415] Avg episode reward: 19.263, avg true_objective: 9.163
[2023-02-22 11:17:13,919][00415] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-22 11:23:02,493][00415] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-22 11:23:02,497][00415] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 11:23:02,501][00415] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 11:23:02,503][00415] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 11:23:02,505][00415] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 11:23:02,507][00415] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 11:23:02,509][00415] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-22 11:23:02,512][00415] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 11:23:02,513][00415] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-22 11:23:02,514][00415] Adding new argument 'hf_repository'='frangiral/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-22 11:23:02,517][00415] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 11:23:02,518][00415] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 11:23:02,519][00415] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 11:23:02,521][00415] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-22 11:23:02,522][00415] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 11:23:02,551][00415] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 11:23:02,554][00415] RunningMeanStd input shape: (1,)
[2023-02-22 11:23:02,572][00415] ConvEncoder: input_channels=3
[2023-02-22 11:23:02,643][00415] Conv encoder output size: 512
[2023-02-22 11:23:02,646][00415] Policy head output size: 512
[2023-02-22 11:23:02,674][00415] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 11:23:03,322][00415] Num frames 100...
[2023-02-22 11:23:03,474][00415] Num frames 200...
[2023-02-22 11:23:03,628][00415] Num frames 300...
[2023-02-22 11:23:03,779][00415] Num frames 400...
[2023-02-22 11:23:03,920][00415] Num frames 500...
[2023-02-22 11:23:04,087][00415] Num frames 600...
[2023-02-22 11:23:04,243][00415] Num frames 700...
[2023-02-22 11:23:04,393][00415] Num frames 800...
[2023-02-22 11:23:04,553][00415] Num frames 900...
[2023-02-22 11:23:04,708][00415] Num frames 1000...
[2023-02-22 11:23:04,863][00415] Num frames 1100...
[2023-02-22 11:23:05,024][00415] Num frames 1200...
[2023-02-22 11:23:05,140][00415] Avg episode rewards: #0: 28.350, true rewards: #0: 12.350
[2023-02-22 11:23:05,142][00415] Avg episode reward: 28.350, avg true_objective: 12.350
[2023-02-22 11:23:05,250][00415] Num frames 1300...
[2023-02-22 11:23:05,414][00415] Num frames 1400...
[2023-02-22 11:23:05,565][00415] Num frames 1500...
[2023-02-22 11:23:05,719][00415] Num frames 1600...
[2023-02-22 11:23:05,857][00415] Num frames 1700...
[2023-02-22 11:23:05,930][00415] Avg episode rewards: #0: 17.575, true rewards: #0: 8.575
[2023-02-22 11:23:05,933][00415] Avg episode reward: 17.575, avg true_objective: 8.575
[2023-02-22 11:23:06,030][00415] Num frames 1800...
[2023-02-22 11:23:06,152][00415] Num frames 1900...
[2023-02-22 11:23:06,274][00415] Num frames 2000...
[2023-02-22 11:23:06,384][00415] Num frames 2100...
[2023-02-22 11:23:06,480][00415] Avg episode rewards: #0: 14.103, true rewards: #0: 7.103
[2023-02-22 11:23:06,482][00415] Avg episode reward: 14.103, avg true_objective: 7.103
[2023-02-22 11:23:06,567][00415] Num frames 2200...
[2023-02-22 11:23:06,684][00415] Num frames 2300...
[2023-02-22 11:23:06,800][00415] Num frames 2400...
[2023-02-22 11:23:06,913][00415] Num frames 2500...
[2023-02-22 11:23:07,033][00415] Num frames 2600...
[2023-02-22 11:23:07,146][00415] Num frames 2700...
[2023-02-22 11:23:07,247][00415] Avg episode rewards: #0: 12.848, true rewards: #0: 6.847
[2023-02-22 11:23:07,250][00415] Avg episode reward: 12.848, avg true_objective: 6.847
[2023-02-22 11:23:07,328][00415] Num frames 2800...
[2023-02-22 11:23:07,450][00415] Num frames 2900...
[2023-02-22 11:23:07,567][00415] Num frames 3000...
[2023-02-22 11:23:07,678][00415] Num frames 3100...
[2023-02-22 11:23:07,791][00415] Num frames 3200...
[2023-02-22 11:23:07,905][00415] Num frames 3300...
[2023-02-22 11:23:08,020][00415] Num frames 3400...
[2023-02-22 11:23:08,128][00415] Num frames 3500...
[2023-02-22 11:23:08,239][00415] Num frames 3600...
[2023-02-22 11:23:08,335][00415] Avg episode rewards: #0: 13.670, true rewards: #0: 7.270
[2023-02-22 11:23:08,337][00415] Avg episode reward: 13.670, avg true_objective: 7.270
[2023-02-22 11:23:08,413][00415] Num frames 3700...
[2023-02-22 11:23:08,529][00415] Num frames 3800...
[2023-02-22 11:23:08,643][00415] Num frames 3900...
[2023-02-22 11:23:08,751][00415] Num frames 4000...
[2023-02-22 11:23:08,859][00415] Num frames 4100...
[2023-02-22 11:23:08,968][00415] Num frames 4200...
[2023-02-22 11:23:09,079][00415] Num frames 4300...
[2023-02-22 11:23:09,193][00415] Num frames 4400...
[2023-02-22 11:23:09,302][00415] Num frames 4500...
[2023-02-22 11:23:09,414][00415] Num frames 4600...
[2023-02-22 11:23:09,535][00415] Num frames 4700...
[2023-02-22 11:23:09,608][00415] Avg episode rewards: #0: 15.193, true rewards: #0: 7.860
[2023-02-22 11:23:09,610][00415] Avg episode reward: 15.193, avg true_objective: 7.860
[2023-02-22 11:23:09,722][00415] Num frames 4800...
[2023-02-22 11:23:09,834][00415] Num frames 4900...
[2023-02-22 11:23:09,947][00415] Num frames 5000...
[2023-02-22 11:23:10,069][00415] Num frames 5100...
[2023-02-22 11:23:10,197][00415] Num frames 5200...
[2023-02-22 11:23:10,322][00415] Num frames 5300...
[2023-02-22 11:23:10,435][00415] Num frames 5400...
[2023-02-22 11:23:10,558][00415] Num frames 5500...
[2023-02-22 11:23:10,668][00415] Num frames 5600...
[2023-02-22 11:23:10,750][00415] Avg episode rewards: #0: 15.600, true rewards: #0: 8.029
[2023-02-22 11:23:10,754][00415] Avg episode reward: 15.600, avg true_objective: 8.029
[2023-02-22 11:23:10,848][00415] Num frames 5700...
[2023-02-22 11:23:10,968][00415] Num frames 5800...
[2023-02-22 11:23:11,095][00415] Num frames 5900...
[2023-02-22 11:23:11,214][00415] Num frames 6000...
[2023-02-22 11:23:11,326][00415] Num frames 6100...
[2023-02-22 11:23:11,437][00415] Num frames 6200...
[2023-02-22 11:23:11,560][00415] Num frames 6300...
[2023-02-22 11:23:11,672][00415] Num frames 6400...
[2023-02-22 11:23:11,783][00415] Num frames 6500...
[2023-02-22 11:23:11,898][00415] Avg episode rewards: #0: 15.685, true rewards: #0: 8.185
[2023-02-22 11:23:11,899][00415] Avg episode reward: 15.685, avg true_objective: 8.185
[2023-02-22 11:23:11,964][00415] Num frames 6600...
[2023-02-22 11:23:12,082][00415] Num frames 6700...
[2023-02-22 11:23:12,210][00415] Num frames 6800...
[2023-02-22 11:23:12,334][00415] Num frames 6900...
[2023-02-22 11:23:12,457][00415] Num frames 7000...
[2023-02-22 11:23:12,581][00415] Num frames 7100...
[2023-02-22 11:23:12,713][00415] Num frames 7200...
[2023-02-22 11:23:12,835][00415] Num frames 7300...
[2023-02-22 11:23:12,911][00415] Avg episode rewards: #0: 15.462, true rewards: #0: 8.129
[2023-02-22 11:23:12,913][00415] Avg episode reward: 15.462, avg true_objective: 8.129
[2023-02-22 11:23:13,024][00415] Num frames 7400...
[2023-02-22 11:23:13,147][00415] Num frames 7500...
[2023-02-22 11:23:13,258][00415] Num frames 7600...
[2023-02-22 11:23:13,369][00415] Num frames 7700...
[2023-02-22 11:23:13,487][00415] Num frames 7800...
[2023-02-22 11:23:13,601][00415] Num frames 7900...
[2023-02-22 11:23:13,709][00415] Num frames 8000...
[2023-02-22 11:23:13,824][00415] Num frames 8100...
[2023-02-22 11:23:13,934][00415] Num frames 8200...
[2023-02-22 11:23:14,043][00415] Num frames 8300...
[2023-02-22 11:23:14,161][00415] Num frames 8400...
[2023-02-22 11:23:14,274][00415] Num frames 8500...
[2023-02-22 11:23:14,386][00415] Num frames 8600...
[2023-02-22 11:23:14,500][00415] Num frames 8700...
[2023-02-22 11:23:14,618][00415] Num frames 8800...
[2023-02-22 11:23:14,703][00415] Avg episode rewards: #0: 17.820, true rewards: #0: 8.820
[2023-02-22 11:23:14,708][00415] Avg episode reward: 17.820, avg true_objective: 8.820
[2023-02-22 11:24:10,191][00415] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
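Both evaluation runs above play max_num_episodes=10 episodes and reprint the running means after each one: "Avg episode rewards" is the mean raw episode reward so far, and "true rewards" the mean true objective. That is why the second run starts at 28.350 after a single strong episode and settles to 17.820 once all ten episodes are averaged. A minimal sketch of that bookkeeping (illustrative; the real logic lives in Sample Factory's enjoy script):

    episode_rewards: list[float] = []
    true_objectives: list[float] = []

    def on_episode_end(reward: float, true_objective: float) -> None:
        """Record one finished episode and reprint the running means."""
        episode_rewards.append(reward)
        true_objectives.append(true_objective)
        avg_r = sum(episode_rewards) / len(episode_rewards)
        avg_t = sum(true_objectives) / len(true_objectives)
        print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")

Each "Num frames 100..." tick counts another 100 frames of the episodes being recorded, and the accumulated frames are what is finally written out to replay.mp4.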