[2023-02-26 05:30:01,058][01113] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-26 05:30:01,064][01113] Rollout worker 0 uses device cpu
[2023-02-26 05:30:01,069][01113] Rollout worker 1 uses device cpu
[2023-02-26 05:30:01,072][01113] Rollout worker 2 uses device cpu
[2023-02-26 05:30:01,076][01113] Rollout worker 3 uses device cpu
[2023-02-26 05:30:01,080][01113] Rollout worker 4 uses device cpu
[2023-02-26 05:30:01,088][01113] Rollout worker 5 uses device cpu
[2023-02-26 05:30:01,090][01113] Rollout worker 6 uses device cpu
[2023-02-26 05:30:01,093][01113] Rollout worker 7 uses device cpu
[2023-02-26 05:30:01,283][01113] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:30:01,285][01113] InferenceWorker_p0-w0: min num requests: 2
[2023-02-26 05:30:01,321][01113] Starting all processes...
[2023-02-26 05:30:01,322][01113] Starting process learner_proc0
[2023-02-26 05:30:01,378][01113] Starting all processes...
[2023-02-26 05:30:01,388][01113] Starting process inference_proc0-0
[2023-02-26 05:30:01,389][01113] Starting process rollout_proc0
[2023-02-26 05:30:01,391][01113] Starting process rollout_proc1
[2023-02-26 05:30:01,392][01113] Starting process rollout_proc2
[2023-02-26 05:30:01,392][01113] Starting process rollout_proc3
[2023-02-26 05:30:01,392][01113] Starting process rollout_proc4
[2023-02-26 05:30:01,392][01113] Starting process rollout_proc5
[2023-02-26 05:30:01,392][01113] Starting process rollout_proc6
[2023-02-26 05:30:01,392][01113] Starting process rollout_proc7
[2023-02-26 05:30:11,150][12230] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:30:11,150][12230] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-26 05:30:11,574][12246] Worker 1 uses CPU cores [1]
[2023-02-26 05:30:11,754][12251] Worker 6 uses CPU cores [0]
[2023-02-26 05:30:12,023][12244] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:30:12,024][12244] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-26 05:30:12,098][12230] Num visible devices: 1
[2023-02-26 05:30:12,134][12244] Num visible devices: 1
[2023-02-26 05:30:12,148][12230] Starting seed is not provided
[2023-02-26 05:30:12,149][12230] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:30:12,149][12230] Initializing actor-critic model on device cuda:0
[2023-02-26 05:30:12,150][12230] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 05:30:12,151][12230] RunningMeanStd input shape: (1,)
[2023-02-26 05:30:12,209][12252] Worker 7 uses CPU cores [1]
[2023-02-26 05:30:12,226][12245] Worker 0 uses CPU cores [0]
[2023-02-26 05:30:12,246][12247] Worker 2 uses CPU cores [0]
[2023-02-26 05:30:12,275][12230] ConvEncoder: input_channels=3
[2023-02-26 05:30:12,336][12248] Worker 3 uses CPU cores [1]
[2023-02-26 05:30:12,386][12249] Worker 4 uses CPU cores [0]
[2023-02-26 05:30:12,683][12250] Worker 5 uses CPU cores [1]
[2023-02-26 05:30:12,822][12230] Conv encoder output size: 512
[2023-02-26 05:30:12,822][12230] Policy head output size: 512
[2023-02-26 05:30:12,878][12230] Created Actor Critic model with architecture:
[2023-02-26 05:30:12,878][12230] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-26 05:30:20,191][12230] Using optimizer
[2023-02-26 05:30:20,192][12230] No checkpoints found
[2023-02-26 05:30:20,193][12230] Did not load from checkpoint, starting from scratch!
[2023-02-26 05:30:20,193][12230] Initialized policy 0 weights for model version 0
[2023-02-26 05:30:20,197][12230] LearnerWorker_p0 finished initialization!
[2023-02-26 05:30:20,197][12230] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:30:20,297][12244] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 05:30:20,298][12244] RunningMeanStd input shape: (1,)
[2023-02-26 05:30:20,314][12244] ConvEncoder: input_channels=3
[2023-02-26 05:30:20,412][12244] Conv encoder output size: 512
[2023-02-26 05:30:20,412][12244] Policy head output size: 512
[2023-02-26 05:30:21,276][01113] Heartbeat connected on Batcher_0
[2023-02-26 05:30:21,282][01113] Heartbeat connected on LearnerWorker_p0
[2023-02-26 05:30:21,297][01113] Heartbeat connected on RolloutWorker_w0
[2023-02-26 05:30:21,303][01113] Heartbeat connected on RolloutWorker_w2
[2023-02-26 05:30:21,304][01113] Heartbeat connected on RolloutWorker_w1
[2023-02-26 05:30:21,308][01113] Heartbeat connected on RolloutWorker_w3
[2023-02-26 05:30:21,311][01113] Heartbeat connected on RolloutWorker_w4
[2023-02-26 05:30:21,315][01113] Heartbeat connected on RolloutWorker_w5
[2023-02-26 05:30:21,318][01113] Heartbeat connected on RolloutWorker_w6
[2023-02-26 05:30:21,325][01113] Heartbeat connected on RolloutWorker_w7
[2023-02-26 05:30:22,935][01113] Inference worker 0-0 is ready!
[2023-02-26 05:30:22,937][01113] All inference workers are ready! Signal rollout workers to start!
[2023-02-26 05:30:22,939][01113] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-26 05:30:23,096][12248] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:23,118][12252] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:23,159][12246] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:23,154][12250] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:23,161][12249] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:23,162][12245] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:23,197][12247] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:23,200][12251] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:30:24,862][01113] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
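For context, a run that produces a startup sequence like the one above is typically launched through Sample Factory's train entry point. A minimal sketch, assuming the sf_examples VizDoom helpers from Sample Factory v2 are available; the scenario name is an assumption, since the log itself never prints the env id (only the 8 rollout workers, the 128x72 resize, and the train_dir path are visible above).

```python
# Minimal launch sketch for a run like the one logged here (Sample Factory v2).
# The scenario name below is a placeholder -- the log never prints the env id.
# Worker count and train_dir mirror values that do appear in the log.
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg, register_vizdoom_components

from sample_factory.train import run_rl

register_vizdoom_components()
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",  # assumed scenario
        "--num_workers=8",                      # matches rollout workers 0-7 above
        "--num_envs_per_worker=4",              # assumed; not shown in the log
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ]
)
status = run_rl(cfg)  # spawns the learner, inference, and rollout processes logged above
```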
[2023-02-26 05:30:25,318][12245] Decorrelating experience for 0 frames...
[2023-02-26 05:30:25,319][12249] Decorrelating experience for 0 frames...
[2023-02-26 05:30:25,346][12248] Decorrelating experience for 0 frames...
[2023-02-26 05:30:25,351][12246] Decorrelating experience for 0 frames...
[2023-02-26 05:30:25,357][12250] Decorrelating experience for 0 frames...
[2023-02-26 05:30:25,362][12252] Decorrelating experience for 0 frames...
[2023-02-26 05:30:26,774][12252] Decorrelating experience for 32 frames...
[2023-02-26 05:30:26,783][12250] Decorrelating experience for 32 frames...
[2023-02-26 05:30:27,438][12245] Decorrelating experience for 32 frames...
[2023-02-26 05:30:27,743][12249] Decorrelating experience for 32 frames...
[2023-02-26 05:30:27,792][12247] Decorrelating experience for 0 frames...
[2023-02-26 05:30:27,811][12251] Decorrelating experience for 0 frames...
[2023-02-26 05:30:28,499][12252] Decorrelating experience for 64 frames...
[2023-02-26 05:30:29,858][01113] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 05:30:30,545][12250] Decorrelating experience for 64 frames...
[2023-02-26 05:30:30,851][12248] Decorrelating experience for 32 frames...
[2023-02-26 05:30:31,184][12247] Decorrelating experience for 32 frames...
[2023-02-26 05:30:31,186][12251] Decorrelating experience for 32 frames...
[2023-02-26 05:30:31,188][12245] Decorrelating experience for 64 frames...
[2023-02-26 05:30:31,687][12249] Decorrelating experience for 64 frames...
[2023-02-26 05:30:31,712][12252] Decorrelating experience for 96 frames...
[2023-02-26 05:30:32,546][12250] Decorrelating experience for 96 frames...
[2023-02-26 05:30:32,747][12246] Decorrelating experience for 32 frames...
[2023-02-26 05:30:32,752][12248] Decorrelating experience for 64 frames...
[2023-02-26 05:30:33,739][12248] Decorrelating experience for 96 frames...
[2023-02-26 05:30:33,869][12246] Decorrelating experience for 64 frames...
[2023-02-26 05:30:34,003][12245] Decorrelating experience for 96 frames...
[2023-02-26 05:30:34,088][12251] Decorrelating experience for 64 frames...
[2023-02-26 05:30:34,332][12249] Decorrelating experience for 96 frames...
[2023-02-26 05:30:34,706][12246] Decorrelating experience for 96 frames...
[2023-02-26 05:30:34,720][12247] Decorrelating experience for 64 frames...
[2023-02-26 05:30:34,853][01113] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 05:30:35,073][12251] Decorrelating experience for 96 frames...
[2023-02-26 05:30:35,287][12247] Decorrelating experience for 96 frames...
[2023-02-26 05:30:38,957][12230] Signal inference workers to stop experience collection...
[2023-02-26 05:30:38,971][12244] InferenceWorker_p0-w0: stopping experience collection
[2023-02-26 05:30:39,853][01113] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 105.5. Samples: 1582. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 05:30:39,854][01113] Avg episode reward: [(0, '1.949')]
[2023-02-26 05:30:41,557][12230] Signal inference workers to resume experience collection...
[2023-02-26 05:30:41,559][12244] InferenceWorker_p0-w0: resuming experience collection
[2023-02-26 05:30:44,861][01113] Fps is (10 sec: 1227.9, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 12288. Throughput: 0: 189.0. Samples: 3780. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-26 05:30:44,863][01113] Avg episode reward: [(0, '3.231')]
[2023-02-26 05:30:49,853][01113] Fps is (10 sec: 2867.1, 60 sec: 1147.3, 300 sec: 1147.3). Total num frames: 28672. Throughput: 0: 233.1. Samples: 5826. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 05:30:49,856][01113] Avg episode reward: [(0, '3.865')]
[2023-02-26 05:30:53,666][12244] Updated weights for policy 0, policy_version 10 (0.0365)
[2023-02-26 05:30:54,854][01113] Fps is (10 sec: 2869.2, 60 sec: 1365.7, 300 sec: 1365.7). Total num frames: 40960. Throughput: 0: 332.8. Samples: 9980. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2023-02-26 05:30:54,859][01113] Avg episode reward: [(0, '4.127')]
[2023-02-26 05:30:59,853][01113] Fps is (10 sec: 3686.5, 60 sec: 1872.9, 300 sec: 1872.9). Total num frames: 65536. Throughput: 0: 456.7. Samples: 15980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:30:59,856][01113] Avg episode reward: [(0, '4.504')]
[2023-02-26 05:31:03,081][12244] Updated weights for policy 0, policy_version 20 (0.0014)
[2023-02-26 05:31:04,853][01113] Fps is (10 sec: 4505.8, 60 sec: 2150.9, 300 sec: 2150.9). Total num frames: 86016. Throughput: 0: 485.7. Samples: 19424. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-26 05:31:04,856][01113] Avg episode reward: [(0, '4.485')]
[2023-02-26 05:31:09,853][01113] Fps is (10 sec: 3276.8, 60 sec: 2185.0, 300 sec: 2185.0). Total num frames: 98304. Throughput: 0: 543.5. Samples: 24454. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 05:31:09,857][01113] Avg episode reward: [(0, '4.464')]
[2023-02-26 05:31:14,853][01113] Fps is (10 sec: 2457.6, 60 sec: 2212.2, 300 sec: 2212.2). Total num frames: 110592. Throughput: 0: 635.4. Samples: 28592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:31:14,857][01113] Avg episode reward: [(0, '4.588')]
[2023-02-26 05:31:14,887][12230] Saving new best policy, reward=4.588!
[2023-02-26 05:31:16,776][12244] Updated weights for policy 0, policy_version 30 (0.0021)
[2023-02-26 05:31:19,853][01113] Fps is (10 sec: 3686.4, 60 sec: 2458.0, 300 sec: 2458.0). Total num frames: 135168. Throughput: 0: 695.7. Samples: 31306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 05:31:19,858][01113] Avg episode reward: [(0, '4.603')]
[2023-02-26 05:31:19,862][12230] Saving new best policy, reward=4.603!
[2023-02-26 05:31:24,853][01113] Fps is (10 sec: 4505.7, 60 sec: 2594.5, 300 sec: 2594.5). Total num frames: 155648. Throughput: 0: 816.0. Samples: 38302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:31:24,856][01113] Avg episode reward: [(0, '4.323')]
[2023-02-26 05:31:26,066][12244] Updated weights for policy 0, policy_version 40 (0.0029)
[2023-02-26 05:31:29,862][01113] Fps is (10 sec: 3683.2, 60 sec: 2867.0, 300 sec: 2646.6). Total num frames: 172032. Throughput: 0: 881.4. Samples: 43442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:31:29,869][01113] Avg episode reward: [(0, '4.449')]
[2023-02-26 05:31:34,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2633.5). Total num frames: 184320. Throughput: 0: 878.1. Samples: 45342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:31:34,860][01113] Avg episode reward: [(0, '4.369')]
[2023-02-26 05:31:39,398][12244] Updated weights for policy 0, policy_version 50 (0.0031)
[2023-02-26 05:31:39,853][01113] Fps is (10 sec: 3279.7, 60 sec: 3413.3, 300 sec: 2731.0). Total num frames: 204800. Throughput: 0: 888.9. Samples: 49978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:31:39,856][01113] Avg episode reward: [(0, '4.475')]
[2023-02-26 05:31:44,853][01113] Fps is (10 sec: 4505.6, 60 sec: 3618.6, 300 sec: 2867.5). Total num frames: 229376. Throughput: 0: 909.9. Samples: 56926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:31:44,855][01113] Avg episode reward: [(0, '4.458')]
[2023-02-26 05:31:49,757][12244] Updated weights for policy 0, policy_version 60 (0.0012)
[2023-02-26 05:31:49,857][01113] Fps is (10 sec: 4094.1, 60 sec: 3617.9, 300 sec: 2891.4). Total num frames: 245760. Throughput: 0: 902.5. Samples: 60042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:31:49,860][01113] Avg episode reward: [(0, '4.284')]
[2023-02-26 05:31:54,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 2867.5). Total num frames: 258048. Throughput: 0: 880.6. Samples: 64080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:31:54,861][01113] Avg episode reward: [(0, '4.350')]
[2023-02-26 05:31:54,877][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth...
[2023-02-26 05:31:59,853][01113] Fps is (10 sec: 2868.5, 60 sec: 3481.6, 300 sec: 2889.0). Total num frames: 274432. Throughput: 0: 894.5. Samples: 68846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:31:59,861][01113] Avg episode reward: [(0, '4.486')]
[2023-02-26 05:32:01,969][12244] Updated weights for policy 0, policy_version 70 (0.0026)
[2023-02-26 05:32:04,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 2990.3). Total num frames: 299008. Throughput: 0: 909.2. Samples: 72222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:32:04,856][01113] Avg episode reward: [(0, '4.333')]
[2023-02-26 05:32:09,857][01113] Fps is (10 sec: 4094.2, 60 sec: 3617.9, 300 sec: 3003.9). Total num frames: 315392. Throughput: 0: 894.0. Samples: 78536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:32:09,862][01113] Avg episode reward: [(0, '4.475')]
[2023-02-26 05:32:13,600][12244] Updated weights for policy 0, policy_version 80 (0.0020)
[2023-02-26 05:32:14,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 2979.1). Total num frames: 327680. Throughput: 0: 872.2. Samples: 82682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:32:14,863][01113] Avg episode reward: [(0, '4.390')]
[2023-02-26 05:32:19,853][01113] Fps is (10 sec: 2868.5, 60 sec: 3481.6, 300 sec: 2992.1). Total num frames: 344064. Throughput: 0: 872.9. Samples: 84622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:32:19,856][01113] Avg episode reward: [(0, '4.418')]
[2023-02-26 05:32:24,683][12244] Updated weights for policy 0, policy_version 90 (0.0027)
[2023-02-26 05:32:24,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3072.2). Total num frames: 368640. Throughput: 0: 906.5. Samples: 90770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:32:24,856][01113] Avg episode reward: [(0, '4.338')]
[2023-02-26 05:32:29,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3550.4, 300 sec: 3080.4). Total num frames: 385024. Throughput: 0: 891.9. Samples: 97062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:32:29,861][01113] Avg episode reward: [(0, '4.449')]
[2023-02-26 05:32:34,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3088.0). Total num frames: 401408. Throughput: 0: 868.4. Samples: 99116. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 05:32:34,861][01113] Avg episode reward: [(0, '4.487')]
[2023-02-26 05:32:37,362][12244] Updated weights for policy 0, policy_version 100 (0.0021)
[2023-02-26 05:32:39,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3064.6). Total num frames: 413696. Throughput: 0: 870.7. Samples: 103262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:32:39,860][01113] Avg episode reward: [(0, '4.690')]
[2023-02-26 05:32:39,925][12230] Saving new best policy, reward=4.690!
[2023-02-26 05:32:44,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3130.7). Total num frames: 438272. Throughput: 0: 898.3. Samples: 109270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:32:44,855][01113] Avg episode reward: [(0, '4.668')]
[2023-02-26 05:32:47,549][12244] Updated weights for policy 0, policy_version 110 (0.0015)
[2023-02-26 05:32:49,859][01113] Fps is (10 sec: 4502.7, 60 sec: 3549.8, 300 sec: 3163.9). Total num frames: 458752. Throughput: 0: 896.3. Samples: 112562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 05:32:49,866][01113] Avg episode reward: [(0, '4.851')]
[2023-02-26 05:32:49,876][12230] Saving new best policy, reward=4.851!
[2023-02-26 05:32:54,857][01113] Fps is (10 sec: 3275.6, 60 sec: 3549.6, 300 sec: 3140.4). Total num frames: 471040. Throughput: 0: 862.9. Samples: 117364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:32:54,859][01113] Avg episode reward: [(0, '4.946')]
[2023-02-26 05:32:54,870][12230] Saving new best policy, reward=4.946!
[2023-02-26 05:32:59,853][01113] Fps is (10 sec: 2459.2, 60 sec: 3481.6, 300 sec: 3118.4). Total num frames: 483328. Throughput: 0: 858.4. Samples: 121308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:32:59,859][01113] Avg episode reward: [(0, '5.075')]
[2023-02-26 05:32:59,862][12230] Saving new best policy, reward=5.075!
[2023-02-26 05:33:02,195][12244] Updated weights for policy 0, policy_version 120 (0.0030)
[2023-02-26 05:33:04,853][01113] Fps is (10 sec: 2868.3, 60 sec: 3345.1, 300 sec: 3123.4). Total num frames: 499712. Throughput: 0: 857.6. Samples: 123212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:33:04,858][01113] Avg episode reward: [(0, '4.885')]
[2023-02-26 05:33:09,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3277.0, 300 sec: 3103.2). Total num frames: 512000. Throughput: 0: 820.2. Samples: 127680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:33:09,860][01113] Avg episode reward: [(0, '4.913')]
[2023-02-26 05:33:14,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3108.3). Total num frames: 528384. Throughput: 0: 784.3. Samples: 132354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:33:14,856][01113] Avg episode reward: [(0, '4.938')]
[2023-02-26 05:33:15,827][12244] Updated weights for policy 0, policy_version 130 (0.0026)
[2023-02-26 05:33:19,853][01113] Fps is (10 sec: 2867.0, 60 sec: 3276.8, 300 sec: 3089.7). Total num frames: 540672. Throughput: 0: 783.2. Samples: 134362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:33:19,856][01113] Avg episode reward: [(0, '5.288')]
[2023-02-26 05:33:19,858][12230] Saving new best policy, reward=5.288!
[2023-02-26 05:33:24,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3117.7). Total num frames: 561152. Throughput: 0: 802.1. Samples: 139356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:33:24,855][01113] Avg episode reward: [(0, '5.257')]
[2023-02-26 05:33:26,866][12244] Updated weights for policy 0, policy_version 140 (0.0021)
[2023-02-26 05:33:29,853][01113] Fps is (10 sec: 4505.9, 60 sec: 3345.1, 300 sec: 3166.2). Total num frames: 585728. Throughput: 0: 820.8. Samples: 146204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:33:29,861][01113] Avg episode reward: [(0, '5.153')]
[2023-02-26 05:33:34,855][01113] Fps is (10 sec: 4095.3, 60 sec: 3345.0, 300 sec: 3169.1). Total num frames: 602112. Throughput: 0: 810.8. Samples: 149044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:33:34,858][01113] Avg episode reward: [(0, '5.088')]
[2023-02-26 05:33:39,258][12244] Updated weights for policy 0, policy_version 150 (0.0018)
[2023-02-26 05:33:39,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3150.9). Total num frames: 614400. Throughput: 0: 795.0. Samples: 153136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:33:39,863][01113] Avg episode reward: [(0, '5.053')]
[2023-02-26 05:33:44,853][01113] Fps is (10 sec: 3277.3, 60 sec: 3276.8, 300 sec: 3174.5). Total num frames: 634880. Throughput: 0: 829.7. Samples: 158646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:33:44,861][01113] Avg episode reward: [(0, '5.194')]
[2023-02-26 05:33:49,033][12244] Updated weights for policy 0, policy_version 160 (0.0020)
[2023-02-26 05:33:49,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3277.1, 300 sec: 3197.0). Total num frames: 655360. Throughput: 0: 863.5. Samples: 162070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:33:49,859][01113] Avg episode reward: [(0, '5.432')]
[2023-02-26 05:33:49,863][12230] Saving new best policy, reward=5.432!
[2023-02-26 05:33:54,854][01113] Fps is (10 sec: 3686.0, 60 sec: 3345.2, 300 sec: 3198.9). Total num frames: 671744. Throughput: 0: 899.4. Samples: 168156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:33:54,862][01113] Avg episode reward: [(0, '5.442')]
[2023-02-26 05:33:54,888][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_671744.pth...
[2023-02-26 05:33:55,052][12230] Saving new best policy, reward=5.442!
[2023-02-26 05:33:59,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3200.7). Total num frames: 688128. Throughput: 0: 887.8. Samples: 172304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:33:59,855][01113] Avg episode reward: [(0, '5.304')]
[2023-02-26 05:34:02,179][12244] Updated weights for policy 0, policy_version 170 (0.0040)
[2023-02-26 05:34:04,853][01113] Fps is (10 sec: 3277.2, 60 sec: 3413.3, 300 sec: 3202.5). Total num frames: 704512. Throughput: 0: 892.7. Samples: 174532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:34:04,855][01113] Avg episode reward: [(0, '5.462')]
[2023-02-26 05:34:04,886][12230] Saving new best policy, reward=5.462!
[2023-02-26 05:34:09,853][01113] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3240.5). Total num frames: 729088. Throughput: 0: 934.0. Samples: 181384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:34:09,856][01113] Avg episode reward: [(0, '5.469')]
[2023-02-26 05:34:09,864][12230] Saving new best policy, reward=5.469!
[2023-02-26 05:34:11,113][12244] Updated weights for policy 0, policy_version 180 (0.0016)
[2023-02-26 05:34:14,853][01113] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3259.1). Total num frames: 749568. Throughput: 0: 917.0. Samples: 187470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:34:14,861][01113] Avg episode reward: [(0, '5.530')]
[2023-02-26 05:34:14,868][12230] Saving new best policy, reward=5.530!
[2023-02-26 05:34:19,853][01113] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3242.1). Total num frames: 761856. Throughput: 0: 899.9. Samples: 189540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:34:19,856][01113] Avg episode reward: [(0, '5.959')]
[2023-02-26 05:34:19,862][12230] Saving new best policy, reward=5.959!
[2023-02-26 05:34:23,922][12244] Updated weights for policy 0, policy_version 190 (0.0030)
[2023-02-26 05:34:24,853][01113] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3259.8). Total num frames: 782336. Throughput: 0: 906.2. Samples: 193916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:34:24,856][01113] Avg episode reward: [(0, '5.873')]
[2023-02-26 05:34:29,853][01113] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3276.9). Total num frames: 802816. Throughput: 0: 942.0. Samples: 201038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:34:29,856][01113] Avg episode reward: [(0, '6.509')]
[2023-02-26 05:34:29,861][12230] Saving new best policy, reward=6.509!
[2023-02-26 05:34:32,916][12244] Updated weights for policy 0, policy_version 200 (0.0016)
[2023-02-26 05:34:34,859][01113] Fps is (10 sec: 4093.9, 60 sec: 3686.2, 300 sec: 3293.2). Total num frames: 823296. Throughput: 0: 937.0. Samples: 204240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:34:34,867][01113] Avg episode reward: [(0, '6.273')]
[2023-02-26 05:34:39,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.9). Total num frames: 835584. Throughput: 0: 903.7. Samples: 208822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:34:39,859][01113] Avg episode reward: [(0, '6.846')]
[2023-02-26 05:34:39,865][12230] Saving new best policy, reward=6.846!
[2023-02-26 05:34:44,853][01113] Fps is (10 sec: 2868.8, 60 sec: 3618.1, 300 sec: 3276.9). Total num frames: 851968. Throughput: 0: 913.4. Samples: 213406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:34:44,861][01113] Avg episode reward: [(0, '6.687')]
[2023-02-26 05:34:45,780][12244] Updated weights for policy 0, policy_version 210 (0.0030)
[2023-02-26 05:34:49,859][01113] Fps is (10 sec: 4093.6, 60 sec: 3686.0, 300 sec: 3307.7). Total num frames: 876544. Throughput: 0: 942.4. Samples: 216944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:34:49,862][01113] Avg episode reward: [(0, '6.715')]
[2023-02-26 05:34:54,853][01113] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3322.4). Total num frames: 897024. Throughput: 0: 940.0. Samples: 223682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:34:54,860][01113] Avg episode reward: [(0, '6.707')]
[2023-02-26 05:34:55,682][12244] Updated weights for policy 0, policy_version 220 (0.0021)
[2023-02-26 05:34:59,853][01113] Fps is (10 sec: 3278.7, 60 sec: 3686.4, 300 sec: 3306.7). Total num frames: 909312. Throughput: 0: 903.2. Samples: 228114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:34:59,859][01113] Avg episode reward: [(0, '6.686')]
[2023-02-26 05:35:04,853][01113] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3306.2). Total num frames: 925696. Throughput: 0: 902.4. Samples: 230148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:35:04,856][01113] Avg episode reward: [(0, '7.075')]
[2023-02-26 05:35:04,869][12230] Saving new best policy, reward=7.075!
[2023-02-26 05:35:07,882][12244] Updated weights for policy 0, policy_version 230 (0.0035)
[2023-02-26 05:35:09,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3334.4). Total num frames: 950272. Throughput: 0: 941.1. Samples: 236266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-26 05:35:09,861][01113] Avg episode reward: [(0, '7.647')]
[2023-02-26 05:35:09,866][12230] Saving new best policy, reward=7.647!
[2023-02-26 05:35:14,853][01113] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 3347.5). Total num frames: 970752. Throughput: 0: 932.4. Samples: 242996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-26 05:35:14,857][01113] Avg episode reward: [(0, '7.080')]
[2023-02-26 05:35:18,933][12244] Updated weights for policy 0, policy_version 240 (0.0022)
[2023-02-26 05:35:19,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3332.4). Total num frames: 983040. Throughput: 0: 906.6. Samples: 245034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:35:19,863][01113] Avg episode reward: [(0, '7.139')]
[2023-02-26 05:35:24,855][01113] Fps is (10 sec: 2866.7, 60 sec: 3618.1, 300 sec: 3387.9). Total num frames: 999424. Throughput: 0: 900.5. Samples: 249348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:35:24,857][01113] Avg episode reward: [(0, '6.892')]
[2023-02-26 05:35:29,626][12244] Updated weights for policy 0, policy_version 250 (0.0014)
[2023-02-26 05:35:29,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3471.2). Total num frames: 1024000. Throughput: 0: 940.1. Samples: 255710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:35:29,855][01113] Avg episode reward: [(0, '7.521')]
[2023-02-26 05:35:34,853][01113] Fps is (10 sec: 4096.7, 60 sec: 3618.5, 300 sec: 3526.7). Total num frames: 1040384. Throughput: 0: 933.5. Samples: 258946. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:35:34,859][01113] Avg episode reward: [(0, '7.880')]
[2023-02-26 05:35:34,881][12230] Saving new best policy, reward=7.880!
[2023-02-26 05:35:39,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.8). Total num frames: 1052672. Throughput: 0: 871.3. Samples: 262888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:35:39,856][01113] Avg episode reward: [(0, '8.475')]
[2023-02-26 05:35:39,864][12230] Saving new best policy, reward=8.475!
[2023-02-26 05:35:44,839][12244] Updated weights for policy 0, policy_version 260 (0.0014)
[2023-02-26 05:35:44,853][01113] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1064960. Throughput: 0: 847.8. Samples: 266266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:35:44,859][01113] Avg episode reward: [(0, '8.526')]
[2023-02-26 05:35:44,871][12230] Saving new best policy, reward=8.526!
[2023-02-26 05:35:49,853][01113] Fps is (10 sec: 2457.6, 60 sec: 3345.4, 300 sec: 3512.9). Total num frames: 1077248. Throughput: 0: 840.8. Samples: 267984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 05:35:49,860][01113] Avg episode reward: [(0, '9.096')]
[2023-02-26 05:35:49,866][12230] Saving new best policy, reward=9.096!
[2023-02-26 05:35:54,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 1097728. Throughput: 0: 825.6. Samples: 273418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:35:54,860][01113] Avg episode reward: [(0, '9.404')]
[2023-02-26 05:35:54,869][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
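The learner writes rolling checkpoints like checkpoint_000000268_1097728.pth above (policy version 268 after 1,097,728 env frames) and, separately, saves a new best policy whenever the average episode reward improves. A quick way to peek inside one of these files; this is only a sketch, since the exact dictionary keys inside a Sample Factory checkpoint are an assumption here:

```python
# Sketch: inspect one of the checkpoints saved above. The file is a regular
# torch pickle; the specific keys inside ('model', step counters, ...) are
# assumptions, so print them before relying on any of them.
import torch

ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")  # CPU is enough for inspection
print(sorted(ckpt.keys()))  # expected (assumed): model/optimizer state plus step counters
```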
[2023-02-26 05:35:55,023][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth
[2023-02-26 05:35:55,035][12230] Saving new best policy, reward=9.404!
[2023-02-26 05:35:56,550][12244] Updated weights for policy 0, policy_version 270 (0.0021)
[2023-02-26 05:35:59,857][01113] Fps is (10 sec: 4094.4, 60 sec: 3481.4, 300 sec: 3498.9). Total num frames: 1118208. Throughput: 0: 822.3. Samples: 280002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:35:59,859][01113] Avg episode reward: [(0, '9.057')]
[2023-02-26 05:36:04,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1134592. Throughput: 0: 826.5. Samples: 282226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:36:04,858][01113] Avg episode reward: [(0, '8.853')]
[2023-02-26 05:36:08,752][12244] Updated weights for policy 0, policy_version 280 (0.0028)
[2023-02-26 05:36:09,853][01113] Fps is (10 sec: 2868.3, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 1146880. Throughput: 0: 827.2. Samples: 286570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:36:09,856][01113] Avg episode reward: [(0, '8.480')]
[2023-02-26 05:36:14,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1171456. Throughput: 0: 822.7. Samples: 292732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:36:14,858][01113] Avg episode reward: [(0, '8.307')]
[2023-02-26 05:36:18,192][12244] Updated weights for policy 0, policy_version 290 (0.0035)
[2023-02-26 05:36:19,853][01113] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1191936. Throughput: 0: 826.0. Samples: 296116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:36:19,856][01113] Avg episode reward: [(0, '8.442')]
[2023-02-26 05:36:24,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3512.9). Total num frames: 1208320. Throughput: 0: 857.1. Samples: 301456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:36:24,858][01113] Avg episode reward: [(0, '9.089')]
[2023-02-26 05:36:29,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 1224704. Throughput: 0: 882.1. Samples: 305960. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-26 05:36:29,855][01113] Avg episode reward: [(0, '9.067')]
[2023-02-26 05:36:30,701][12244] Updated weights for policy 0, policy_version 300 (0.0012)
[2023-02-26 05:36:34,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1245184. Throughput: 0: 909.1. Samples: 308894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:36:34,861][01113] Avg episode reward: [(0, '9.560')]
[2023-02-26 05:36:34,872][12230] Saving new best policy, reward=9.560!
[2023-02-26 05:36:39,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1265664. Throughput: 0: 932.5. Samples: 315380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:36:39,855][01113] Avg episode reward: [(0, '10.040')]
[2023-02-26 05:36:39,863][12230] Saving new best policy, reward=10.040!
[2023-02-26 05:36:40,689][12244] Updated weights for policy 0, policy_version 310 (0.0018)
[2023-02-26 05:36:44,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3512.9). Total num frames: 1282048. Throughput: 0: 898.6. Samples: 320436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:36:44,859][01113] Avg episode reward: [(0, '10.088')]
[2023-02-26 05:36:44,874][12230] Saving new best policy, reward=10.088!
[2023-02-26 05:36:49,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1294336. Throughput: 0: 897.8. Samples: 322626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-26 05:36:49,867][01113] Avg episode reward: [(0, '11.013')]
[2023-02-26 05:36:49,872][12230] Saving new best policy, reward=11.013!
[2023-02-26 05:36:52,947][12244] Updated weights for policy 0, policy_version 320 (0.0017)
[2023-02-26 05:36:54,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1318912. Throughput: 0: 926.6. Samples: 328268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:36:54,856][01113] Avg episode reward: [(0, '11.208')]
[2023-02-26 05:36:54,876][12230] Saving new best policy, reward=11.208!
[2023-02-26 05:36:59,853][01113] Fps is (10 sec: 4505.5, 60 sec: 3686.6, 300 sec: 3526.7). Total num frames: 1339392. Throughput: 0: 936.1. Samples: 334858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:36:59,856][01113] Avg episode reward: [(0, '11.779')]
[2023-02-26 05:36:59,863][12230] Saving new best policy, reward=11.779!
[2023-02-26 05:37:03,503][12244] Updated weights for policy 0, policy_version 330 (0.0015)
[2023-02-26 05:37:04,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3526.8). Total num frames: 1355776. Throughput: 0: 915.2. Samples: 337302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:37:04,858][01113] Avg episode reward: [(0, '11.658')]
[2023-02-26 05:37:09,856][01113] Fps is (10 sec: 2866.5, 60 sec: 3686.2, 300 sec: 3526.7). Total num frames: 1368064. Throughput: 0: 897.2. Samples: 341832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:37:09,859][01113] Avg episode reward: [(0, '11.659')]
[2023-02-26 05:37:14,556][12244] Updated weights for policy 0, policy_version 340 (0.0031)
[2023-02-26 05:37:14,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1392640. Throughput: 0: 931.4. Samples: 347872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:37:14,859][01113] Avg episode reward: [(0, '11.518')]
[2023-02-26 05:37:19,853][01113] Fps is (10 sec: 4506.8, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1413120. Throughput: 0: 941.2. Samples: 351246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:37:19,859][01113] Avg episode reward: [(0, '10.878')]
[2023-02-26 05:37:24,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1429504. Throughput: 0: 920.4. Samples: 356800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:37:24,859][01113] Avg episode reward: [(0, '10.990')]
[2023-02-26 05:37:25,639][12244] Updated weights for policy 0, policy_version 350 (0.0028)
[2023-02-26 05:37:29,854][01113] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1441792. Throughput: 0: 904.9. Samples: 361158. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 05:37:29,863][01113] Avg episode reward: [(0, '12.164')]
[2023-02-26 05:37:29,892][12230] Saving new best policy, reward=12.164!
[2023-02-26 05:37:34,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1466368. Throughput: 0: 916.7. Samples: 363876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:37:34,858][01113] Avg episode reward: [(0, '11.934')]
[2023-02-26 05:37:36,604][12244] Updated weights for policy 0, policy_version 360 (0.0017)
[2023-02-26 05:37:39,855][01113] Fps is (10 sec: 4504.8, 60 sec: 3686.3, 300 sec: 3554.5). Total num frames: 1486848. Throughput: 0: 939.9. Samples: 370564. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 05:37:39,858][01113] Avg episode reward: [(0, '13.120')]
[2023-02-26 05:37:39,861][12230] Saving new best policy, reward=13.120!
[2023-02-26 05:37:44,853][01113] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3540.7). Total num frames: 1503232. Throughput: 0: 907.1. Samples: 375678. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-26 05:37:44,856][01113] Avg episode reward: [(0, '13.573')]
[2023-02-26 05:37:44,873][12230] Saving new best policy, reward=13.573!
[2023-02-26 05:37:48,806][12244] Updated weights for policy 0, policy_version 370 (0.0033)
[2023-02-26 05:37:49,853][01113] Fps is (10 sec: 2867.7, 60 sec: 3686.4, 300 sec: 3540.7). Total num frames: 1515520. Throughput: 0: 902.3. Samples: 377904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:37:49,856][01113] Avg episode reward: [(0, '13.152')]
[2023-02-26 05:37:54,853][01113] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 1536000. Throughput: 0: 919.3. Samples: 383198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:37:54,856][01113] Avg episode reward: [(0, '13.638')]
[2023-02-26 05:37:54,869][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000375_1536000.pth...
[2023-02-26 05:37:55,007][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_671744.pth
[2023-02-26 05:37:55,022][12230] Saving new best policy, reward=13.638!
[2023-02-26 05:37:58,828][12244] Updated weights for policy 0, policy_version 380 (0.0026)
[2023-02-26 05:37:59,853][01113] Fps is (10 sec: 4505.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1560576. Throughput: 0: 928.8. Samples: 389668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:37:59,855][01113] Avg episode reward: [(0, '13.645')]
[2023-02-26 05:37:59,864][12230] Saving new best policy, reward=13.645!
[2023-02-26 05:38:04,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1572864. Throughput: 0: 913.2. Samples: 392342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:38:04,855][01113] Avg episode reward: [(0, '12.912')]
[2023-02-26 05:38:09,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3686.6, 300 sec: 3596.2). Total num frames: 1589248. Throughput: 0: 887.3. Samples: 396728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:38:09,859][01113] Avg episode reward: [(0, '13.238')]
[2023-02-26 05:38:12,668][12244] Updated weights for policy 0, policy_version 390 (0.0015)
[2023-02-26 05:38:14,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3596.2). Total num frames: 1601536. Throughput: 0: 879.3. Samples: 400726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:38:14,859][01113] Avg episode reward: [(0, '13.077')]
[2023-02-26 05:38:19,853][01113] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3568.4). Total num frames: 1613824. Throughput: 0: 864.3. Samples: 402770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
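Note the rotation pattern above: each new "Saving ... checkpoint" line is eventually paired with a "Removing ..." line for the oldest file, so only the most recent checkpoints are kept on disk. The filenames also encode training progress: checkpoint_000000375_1536000.pth means policy version 375 at 1,536,000 env frames, and 1536000 / 375 = 4096, i.e. every policy version in this run corresponds to one 4096-frame training batch. A tiny parser to confirm:

```python
# Sketch: decode a checkpoint filename from the log into (policy_version, env_frames).
import re

name = "checkpoint_000000375_1536000.pth"
version, frames = map(int, re.match(r"checkpoint_(\d+)_(\d+)\.pth", name).groups())
print(version, frames, frames // version)  # -> 375 1536000 4096 (frames per policy version)
```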
[2023-02-26 05:38:19,859][01113] Avg episode reward: [(0, '13.256')]
[2023-02-26 05:38:24,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3554.5). Total num frames: 1634304. Throughput: 0: 830.6. Samples: 407938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:38:24,859][01113] Avg episode reward: [(0, '12.740')]
[2023-02-26 05:38:26,402][12244] Updated weights for policy 0, policy_version 400 (0.0029)
[2023-02-26 05:38:29,854][01113] Fps is (10 sec: 3276.6, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 1646592. Throughput: 0: 808.3. Samples: 412052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:38:29,858][01113] Avg episode reward: [(0, '12.068')]
[2023-02-26 05:38:34,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3554.5). Total num frames: 1662976. Throughput: 0: 804.8. Samples: 414118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:38:34,859][01113] Avg episode reward: [(0, '11.760')]
[2023-02-26 05:38:37,759][12244] Updated weights for policy 0, policy_version 410 (0.0027)
[2023-02-26 05:38:39,853][01113] Fps is (10 sec: 4096.3, 60 sec: 3345.2, 300 sec: 3568.4). Total num frames: 1687552. Throughput: 0: 831.3. Samples: 420608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:38:39,861][01113] Avg episode reward: [(0, '10.902')]
[2023-02-26 05:38:44,854][01113] Fps is (10 sec: 4095.7, 60 sec: 3345.0, 300 sec: 3554.5). Total num frames: 1703936. Throughput: 0: 823.4. Samples: 426720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:38:44,857][01113] Avg episode reward: [(0, '10.965')]
[2023-02-26 05:38:49,117][12244] Updated weights for policy 0, policy_version 420 (0.0028)
[2023-02-26 05:38:49,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3554.5). Total num frames: 1720320. Throughput: 0: 813.1. Samples: 428930. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:38:49,861][01113] Avg episode reward: [(0, '11.451')]
[2023-02-26 05:38:54,853][01113] Fps is (10 sec: 3686.7, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 1740800. Throughput: 0: 817.6. Samples: 433520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:38:54,855][01113] Avg episode reward: [(0, '12.258')]
[2023-02-26 05:38:59,372][12244] Updated weights for policy 0, policy_version 430 (0.0020)
[2023-02-26 05:38:59,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3582.3). Total num frames: 1761280. Throughput: 0: 880.7. Samples: 440356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 05:38:59,856][01113] Avg episode reward: [(0, '12.368')]
[2023-02-26 05:39:04,853][01113] Fps is (10 sec: 4095.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 1781760. Throughput: 0: 908.5. Samples: 443654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:39:04,857][01113] Avg episode reward: [(0, '13.552')]
[2023-02-26 05:39:09,855][01113] Fps is (10 sec: 3276.2, 60 sec: 3413.2, 300 sec: 3540.6). Total num frames: 1794048. Throughput: 0: 896.9. Samples: 448300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:39:09,856][01113] Avg episode reward: [(0, '13.680')]
[2023-02-26 05:39:09,862][12230] Saving new best policy, reward=13.680!
[2023-02-26 05:39:11,778][12244] Updated weights for policy 0, policy_version 440 (0.0013)
[2023-02-26 05:39:14,854][01113] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 1810432. Throughput: 0: 913.8. Samples: 453172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:39:14,861][01113] Avg episode reward: [(0, '14.404')]
[2023-02-26 05:39:14,957][12230] Saving new best policy, reward=14.404!
[2023-02-26 05:39:19,853][01113] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1835008. Throughput: 0: 944.9. Samples: 456638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:39:19,856][01113] Avg episode reward: [(0, '15.202')]
[2023-02-26 05:39:19,864][12230] Saving new best policy, reward=15.202!
[2023-02-26 05:39:21,206][12244] Updated weights for policy 0, policy_version 450 (0.0017)
[2023-02-26 05:39:24,853][01113] Fps is (10 sec: 4506.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1855488. Throughput: 0: 948.4. Samples: 463284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:39:24,859][01113] Avg episode reward: [(0, '14.234')]
[2023-02-26 05:39:29,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3540.7). Total num frames: 1867776. Throughput: 0: 909.5. Samples: 467648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:39:29,860][01113] Avg episode reward: [(0, '15.650')]
[2023-02-26 05:39:29,864][12230] Saving new best policy, reward=15.650!
[2023-02-26 05:39:34,064][12244] Updated weights for policy 0, policy_version 460 (0.0018)
[2023-02-26 05:39:34,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1884160. Throughput: 0: 909.2. Samples: 469842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:39:34,856][01113] Avg episode reward: [(0, '14.674')]
[2023-02-26 05:39:39,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1908736. Throughput: 0: 943.4. Samples: 475972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:39:39,856][01113] Avg episode reward: [(0, '14.546')]
[2023-02-26 05:39:43,128][12244] Updated weights for policy 0, policy_version 470 (0.0015)
[2023-02-26 05:39:44,853][01113] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 1929216. Throughput: 0: 937.8. Samples: 482558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:39:44,860][01113] Avg episode reward: [(0, '14.763')]
[2023-02-26 05:39:49,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3554.5). Total num frames: 1945600. Throughput: 0: 913.2. Samples: 484748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:39:49,859][01113] Avg episode reward: [(0, '14.997')]
[2023-02-26 05:39:54,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1961984. Throughput: 0: 910.9. Samples: 489290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:39:54,861][01113] Avg episode reward: [(0, '14.516')]
[2023-02-26 05:39:54,869][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000479_1961984.pth...
[2023-02-26 05:39:54,981][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth
[2023-02-26 05:39:55,611][12244] Updated weights for policy 0, policy_version 480 (0.0027)
[2023-02-26 05:39:59,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1982464. Throughput: 0: 952.2. Samples: 496018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:39:59,860][01113] Avg episode reward: [(0, '13.835')]
[2023-02-26 05:40:04,855][01113] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3568.4). Total num frames: 2002944. Throughput: 0: 947.2. Samples: 499262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:40:04,863][01113] Avg episode reward: [(0, '13.655')]
[2023-02-26 05:40:05,207][12244] Updated weights for policy 0, policy_version 490 (0.0023)
[2023-02-26 05:40:09,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3554.5). Total num frames: 2019328. Throughput: 0: 911.1. Samples: 504282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:40:09,861][01113] Avg episode reward: [(0, '13.210')]
[2023-02-26 05:40:14,853][01113] Fps is (10 sec: 3277.6, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 2035712. Throughput: 0: 914.9. Samples: 508820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:40:14,856][01113] Avg episode reward: [(0, '13.526')]
[2023-02-26 05:40:17,204][12244] Updated weights for policy 0, policy_version 500 (0.0020)
[2023-02-26 05:40:19,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2056192. Throughput: 0: 944.4. Samples: 512340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:40:19,856][01113] Avg episode reward: [(0, '14.869')]
[2023-02-26 05:40:24,855][01113] Fps is (10 sec: 4095.2, 60 sec: 3686.3, 300 sec: 3568.4). Total num frames: 2076672. Throughput: 0: 953.6. Samples: 518884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:40:24,864][01113] Avg episode reward: [(0, '15.609')]
[2023-02-26 05:40:28,116][12244] Updated weights for policy 0, policy_version 510 (0.0023)
[2023-02-26 05:40:29,859][01113] Fps is (10 sec: 3684.0, 60 sec: 3754.3, 300 sec: 3568.3). Total num frames: 2093056. Throughput: 0: 909.4. Samples: 523486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:40:29,867][01113] Avg episode reward: [(0, '16.031')]
[2023-02-26 05:40:29,869][12230] Saving new best policy, reward=16.031!
[2023-02-26 05:40:34,853][01113] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 2109440. Throughput: 0: 907.9. Samples: 525602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:40:34,859][01113] Avg episode reward: [(0, '16.096')]
[2023-02-26 05:40:34,870][12230] Saving new best policy, reward=16.096!
[2023-02-26 05:40:39,312][12244] Updated weights for policy 0, policy_version 520 (0.0029)
[2023-02-26 05:40:39,853][01113] Fps is (10 sec: 3688.8, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 2129920. Throughput: 0: 937.9. Samples: 531494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:40:39,856][01113] Avg episode reward: [(0, '16.785')]
[2023-02-26 05:40:39,861][12230] Saving new best policy, reward=16.785!
[2023-02-26 05:40:44,857][01113] Fps is (10 sec: 4094.5, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 2150400. Throughput: 0: 933.9. Samples: 538048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:40:44,860][01113] Avg episode reward: [(0, '17.162')]
[2023-02-26 05:40:44,872][12230] Saving new best policy, reward=17.162!
[2023-02-26 05:40:49,858][01113] Fps is (10 sec: 3275.3, 60 sec: 3617.8, 300 sec: 3610.0). Total num frames: 2162688. Throughput: 0: 910.8. Samples: 540250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:40:49,860][01113] Avg episode reward: [(0, '17.554')]
[2023-02-26 05:40:49,887][12230] Saving new best policy, reward=17.554!
[2023-02-26 05:40:51,556][12244] Updated weights for policy 0, policy_version 530 (0.0012)
[2023-02-26 05:40:54,853][01113] Fps is (10 sec: 2868.3, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 2179072. Throughput: 0: 888.3. Samples: 544254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:40:54,858][01113] Avg episode reward: [(0, '17.668')]
[2023-02-26 05:40:54,869][12230] Saving new best policy, reward=17.668!
[2023-02-26 05:40:59,853][01113] Fps is (10 sec: 2868.5, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 2191360. Throughput: 0: 872.8. Samples: 548098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:40:59,861][01113] Avg episode reward: [(0, '18.658')]
[2023-02-26 05:40:59,870][12230] Saving new best policy, reward=18.658!
[2023-02-26 05:41:04,675][12244] Updated weights for policy 0, policy_version 540 (0.0022)
[2023-02-26 05:41:04,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3610.0). Total num frames: 2211840. Throughput: 0: 844.1. Samples: 550324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:41:04,855][01113] Avg episode reward: [(0, '18.644')]
[2023-02-26 05:41:09,856][01113] Fps is (10 sec: 3685.5, 60 sec: 3481.4, 300 sec: 3582.2). Total num frames: 2228224. Throughput: 0: 839.9. Samples: 556680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:41:09,859][01113] Avg episode reward: [(0, '18.530')]
[2023-02-26 05:41:14,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 2244608. Throughput: 0: 833.9. Samples: 561004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:41:14,855][01113] Avg episode reward: [(0, '18.418')]
[2023-02-26 05:41:17,472][12244] Updated weights for policy 0, policy_version 550 (0.0030)
[2023-02-26 05:41:19,853][01113] Fps is (10 sec: 3277.7, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 2260992. Throughput: 0: 834.8. Samples: 563170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 05:41:19,858][01113] Avg episode reward: [(0, '18.791')]
[2023-02-26 05:41:19,861][12230] Saving new best policy, reward=18.791!
[2023-02-26 05:41:24,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3582.3). Total num frames: 2281472. Throughput: 0: 854.5. Samples: 569948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:41:24,859][01113] Avg episode reward: [(0, '19.012')]
[2023-02-26 05:41:24,933][12230] Saving new best policy, reward=19.012!
[2023-02-26 05:41:26,704][12244] Updated weights for policy 0, policy_version 560 (0.0020)
[2023-02-26 05:41:29,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3482.0, 300 sec: 3582.3). Total num frames: 2301952. Throughput: 0: 841.4. Samples: 575908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:41:29,857][01113] Avg episode reward: [(0, '19.733')]
[2023-02-26 05:41:29,862][12230] Saving new best policy, reward=19.733!
[2023-02-26 05:41:34,854][01113] Fps is (10 sec: 3686.1, 60 sec: 3481.5, 300 sec: 3568.4). Total num frames: 2318336. Throughput: 0: 839.9. Samples: 578044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:41:34,860][01113] Avg episode reward: [(0, '20.065')]
[2023-02-26 05:41:34,872][12230] Saving new best policy, reward=20.065!
[2023-02-26 05:41:39,237][12244] Updated weights for policy 0, policy_version 570 (0.0030)
[2023-02-26 05:41:39,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 2334720. Throughput: 0: 852.8. Samples: 582630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:41:39,862][01113] Avg episode reward: [(0, '21.671')]
[2023-02-26 05:41:39,865][12230] Saving new best policy, reward=21.671!
[2023-02-26 05:41:44,853][01113] Fps is (10 sec: 4096.3, 60 sec: 3481.8, 300 sec: 3610.0). Total num frames: 2359296. Throughput: 0: 918.5. Samples: 589430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:41:44,860][01113] Avg episode reward: [(0, '21.709')]
[2023-02-26 05:41:44,874][12230] Saving new best policy, reward=21.709!
[2023-02-26 05:41:48,920][12244] Updated weights for policy 0, policy_version 580 (0.0016)
[2023-02-26 05:41:49,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3550.1, 300 sec: 3582.3). Total num frames: 2375680. Throughput: 0: 942.8. Samples: 592750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:41:49,857][01113] Avg episode reward: [(0, '20.421')]
[2023-02-26 05:41:54,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 2392064. Throughput: 0: 907.3. Samples: 597508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:41:54,855][01113] Avg episode reward: [(0, '20.331')]
[2023-02-26 05:41:54,876][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000584_2392064.pth...
[2023-02-26 05:41:54,999][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000375_1536000.pth
[2023-02-26 05:41:59,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 2408448. Throughput: 0: 925.6. Samples: 602658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:41:59,863][01113] Avg episode reward: [(0, '19.630')]
[2023-02-26 05:42:00,761][12244] Updated weights for policy 0, policy_version 590 (0.0039)
[2023-02-26 05:42:04,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3610.1). Total num frames: 2433024. Throughput: 0: 954.2. Samples: 606110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:42:04,858][01113] Avg episode reward: [(0, '18.579')]
[2023-02-26 05:42:09,853][01113] Fps is (10 sec: 4095.9, 60 sec: 3686.6, 300 sec: 3582.3). Total num frames: 2449408. Throughput: 0: 948.7. Samples: 612640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:42:09,856][01113] Avg episode reward: [(0, '19.045')]
[2023-02-26 05:42:11,182][12244] Updated weights for policy 0, policy_version 600 (0.0015)
[2023-02-26 05:42:14,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 2465792. Throughput: 0: 914.8. Samples: 617076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:42:14,860][01113] Avg episode reward: [(0, '19.276')]
[2023-02-26 05:42:19,853][01113] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 2486272. Throughput: 0: 920.1. Samples: 619448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:42:19,856][01113] Avg episode reward: [(0, '20.216')]
[2023-02-26 05:42:22,263][12244] Updated weights for policy 0, policy_version 610 (0.0030)
[2023-02-26 05:42:24,853][01113] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 2506752. Throughput: 0: 965.1. Samples: 626058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:42:24,855][01113] Avg episode reward: [(0, '19.615')]
[2023-02-26 05:42:29,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 2527232. Throughput: 0: 955.1. Samples: 632410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:42:29,862][01113] Avg episode reward: [(0, '20.574')]
[2023-02-26 05:42:33,172][12244] Updated weights for policy 0, policy_version 620 (0.0015)
[2023-02-26 05:42:34,855][01113] Fps is (10 sec: 3685.7, 60 sec: 3754.6, 300 sec: 3582.3). Total num frames: 2543616. Throughput: 0: 929.0. Samples: 634558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:42:34,859][01113] Avg episode reward: [(0, '20.466')]
[2023-02-26 05:42:39,853][01113] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3582.3). Total num frames: 2560000. Throughput: 0: 926.2. Samples: 639186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:42:39,860][01113] Avg episode reward: [(0, '21.241')]
[2023-02-26 05:42:43,761][12244] Updated weights for policy 0, policy_version 630 (0.0044)
[2023-02-26 05:42:44,853][01113] Fps is (10 sec: 4096.7, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 2584576. Throughput: 0: 961.1. Samples: 645908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:42:44,862][01113] Avg episode reward: [(0, '20.855')]
[2023-02-26 05:42:49,859][01113] Fps is (10 sec: 4093.7, 60 sec: 3754.3, 300 sec: 3610.0). Total num frames: 2600960. Throughput: 0: 959.1. Samples: 649276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:42:49,861][01113] Avg episode reward: [(0, '22.215')]
[2023-02-26 05:42:49,868][12230] Saving new best policy, reward=22.215!
[2023-02-26 05:42:54,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 2617344. Throughput: 0: 914.5. Samples: 653794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:42:54,855][01113] Avg episode reward: [(0, '21.700')]
[2023-02-26 05:42:55,835][12244] Updated weights for policy 0, policy_version 640 (0.0017)
[2023-02-26 05:42:59,853][01113] Fps is (10 sec: 3278.7, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 2633728. Throughput: 0: 927.0. Samples: 658790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:42:59,862][01113] Avg episode reward: [(0, '21.591')]
[2023-02-26 05:43:04,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 2658304. Throughput: 0: 946.4. Samples: 662034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:43:04,859][01113] Avg episode reward: [(0, '21.382')]
[2023-02-26 05:43:05,796][12244] Updated weights for policy 0, policy_version 650 (0.0014)
[2023-02-26 05:43:09,860][01113] Fps is (10 sec: 4093.2, 60 sec: 3754.3, 300 sec: 3637.7). Total num frames: 2674688. Throughput: 0: 944.8. Samples: 668580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 05:43:09,871][01113] Avg episode reward: [(0, '20.520')]
[2023-02-26 05:43:14,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2691072. Throughput: 0: 896.8. Samples: 672768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:43:14,860][01113] Avg episode reward: [(0, '20.328')]
[2023-02-26 05:43:18,650][12244] Updated weights for policy 0, policy_version 660 (0.0017)
[2023-02-26 05:43:19,853][01113] Fps is (10 sec: 3279.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2707456. Throughput: 0: 897.7. Samples: 674954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:43:19,860][01113] Avg episode reward: [(0, '19.387')]
[2023-02-26 05:43:24,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2727936. Throughput: 0: 936.4. Samples: 681326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 05:43:24,860][01113] Avg episode reward: [(0, '20.338')]
[2023-02-26 05:43:28,857][12244] Updated weights for policy 0, policy_version 670 (0.0020)
[2023-02-26 05:43:29,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2744320. Throughput: 0: 903.8. Samples: 686580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:43:29,859][01113] Avg episode reward: [(0, '20.152')]
[2023-02-26 05:43:34,854][01113] Fps is (10 sec: 2867.0, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2756608. Throughput: 0: 863.5. Samples: 688130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:43:34,857][01113] Avg episode reward: [(0, '19.919')]
[2023-02-26 05:43:39,853][01113] Fps is (10 sec: 2047.9, 60 sec: 3413.3, 300 sec: 3596.2). Total num frames: 2764800. Throughput: 0: 839.1. Samples: 691556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 05:43:39,856][01113] Avg episode reward: [(0, '20.287')]
[2023-02-26 05:43:44,853][01113] Fps is (10 sec: 2457.8, 60 sec: 3276.8, 300 sec: 3596.1). Total num frames: 2781184. Throughput: 0: 816.0. Samples: 695510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:43:44,857][01113] Avg episode reward: [(0, '21.015')]
[2023-02-26 05:43:45,152][12244] Updated weights for policy 0, policy_version 680 (0.0016)
[2023-02-26 05:43:49,853][01113] Fps is (10 sec: 4096.2, 60 sec: 3413.7, 300 sec: 3610.0). Total num frames: 2805760. Throughput: 0: 817.8. Samples: 698836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:43:49,856][01113] Avg episode reward: [(0, '21.146')]
[2023-02-26 05:43:54,855][01113] Fps is (10 sec: 4095.0, 60 sec: 3413.2, 300 sec: 3596.1). Total num frames: 2822144. Throughput: 0: 824.3. Samples: 705670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:43:54,859][01113] Avg episode reward: [(0, '21.192')]
[2023-02-26 05:43:54,878][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000689_2822144.pth...
[2023-02-26 05:43:55,013][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000479_1961984.pth
[2023-02-26 05:43:55,174][12244] Updated weights for policy 0, policy_version 690 (0.0013)
[2023-02-26 05:43:59,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3582.3). Total num frames: 2838528. Throughput: 0: 829.3. Samples: 710088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 05:43:59,864][01113] Avg episode reward: [(0, '21.267')]
[2023-02-26 05:44:04,853][01113] Fps is (10 sec: 3277.6, 60 sec: 3276.8, 300 sec: 3596.2). Total num frames: 2854912. Throughput: 0: 829.5. Samples: 712282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:44:04,862][01113] Avg episode reward: [(0, '21.106')]
[2023-02-26 05:44:06,956][12244] Updated weights for policy 0, policy_version 700 (0.0012)
[2023-02-26 05:44:09,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3413.7, 300 sec: 3623.9). Total num frames: 2879488. Throughput: 0: 829.2. Samples: 718638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-26 05:44:09,855][01113] Avg episode reward: [(0, '20.735')]
[2023-02-26 05:44:14,854][01113] Fps is (10 sec: 4095.7, 60 sec: 3413.3, 300 sec: 3596.1). Total num frames: 2895872. Throughput: 0: 848.5. Samples: 724762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:44:14,857][01113] Avg episode reward: [(0, '20.831')]
[2023-02-26 05:44:17,800][12244] Updated weights for policy 0, policy_version 710 (0.0013)
[2023-02-26 05:44:19,856][01113] Fps is (10 sec: 3275.7, 60 sec: 3413.1, 300 sec: 3582.2). Total num frames: 2912256. Throughput: 0: 863.2. Samples: 726974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 05:44:19,863][01113] Avg episode reward: [(0, '19.530')]
[2023-02-26 05:44:24,853][01113] Fps is (10 sec: 3277.0, 60 sec: 3345.1, 300 sec: 3596.1).
Total num frames: 2928640. Throughput: 0: 888.3. Samples: 731528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 05:44:24,860][01113] Avg episode reward: [(0, '18.897')] [2023-02-26 05:44:28,803][12244] Updated weights for policy 0, policy_version 720 (0.0016) [2023-02-26 05:44:29,853][01113] Fps is (10 sec: 4097.4, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 2953216. Throughput: 0: 945.6. Samples: 738062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:44:29,855][01113] Avg episode reward: [(0, '19.454')] [2023-02-26 05:44:34,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 2969600. Throughput: 0: 943.8. Samples: 741306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:44:34,861][01113] Avg episode reward: [(0, '20.024')] [2023-02-26 05:44:39,858][01113] Fps is (10 sec: 3275.3, 60 sec: 3686.1, 300 sec: 3582.2). Total num frames: 2985984. Throughput: 0: 898.1. Samples: 746086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:44:39,862][01113] Avg episode reward: [(0, '20.284')] [2023-02-26 05:44:40,814][12244] Updated weights for policy 0, policy_version 730 (0.0022) [2023-02-26 05:44:44,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3002368. Throughput: 0: 900.8. Samples: 750622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-26 05:44:44,856][01113] Avg episode reward: [(0, '20.588')] [2023-02-26 05:44:49,855][01113] Fps is (10 sec: 3687.5, 60 sec: 3618.0, 300 sec: 3596.1). Total num frames: 3022848. Throughput: 0: 923.9. Samples: 753860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:44:49,858][01113] Avg episode reward: [(0, '21.652')] [2023-02-26 05:44:51,031][12244] Updated weights for policy 0, policy_version 740 (0.0025) [2023-02-26 05:44:54,858][01113] Fps is (10 sec: 4094.1, 60 sec: 3686.3, 300 sec: 3596.1). Total num frames: 3043328. Throughput: 0: 930.3. Samples: 760508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:44:54,863][01113] Avg episode reward: [(0, '21.513')] [2023-02-26 05:44:59,853][01113] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3059712. Throughput: 0: 896.0. Samples: 765082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:44:59,860][01113] Avg episode reward: [(0, '20.205')] [2023-02-26 05:45:03,589][12244] Updated weights for policy 0, policy_version 750 (0.0013) [2023-02-26 05:45:04,853][01113] Fps is (10 sec: 3278.3, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3076096. Throughput: 0: 897.7. Samples: 767368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:45:04,859][01113] Avg episode reward: [(0, '20.886')] [2023-02-26 05:45:09,853][01113] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3096576. Throughput: 0: 934.2. Samples: 773566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 05:45:09,860][01113] Avg episode reward: [(0, '19.979')] [2023-02-26 05:45:12,823][12244] Updated weights for policy 0, policy_version 760 (0.0014) [2023-02-26 05:45:14,854][01113] Fps is (10 sec: 4095.7, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3117056. Throughput: 0: 930.1. Samples: 779918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:45:14,859][01113] Avg episode reward: [(0, '19.469')] [2023-02-26 05:45:19,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3582.3). Total num frames: 3133440. Throughput: 0: 907.5. Samples: 782144. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:45:19,861][01113] Avg episode reward: [(0, '18.051')] [2023-02-26 05:45:24,853][01113] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3149824. Throughput: 0: 902.8. Samples: 786708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:45:24,861][01113] Avg episode reward: [(0, '17.582')] [2023-02-26 05:45:25,146][12244] Updated weights for policy 0, policy_version 770 (0.0028) [2023-02-26 05:45:29,853][01113] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3174400. Throughput: 0: 945.8. Samples: 793184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:45:29,862][01113] Avg episode reward: [(0, '18.110')] [2023-02-26 05:45:34,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3190784. Throughput: 0: 947.6. Samples: 796500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:45:34,859][01113] Avg episode reward: [(0, '18.060')] [2023-02-26 05:45:35,026][12244] Updated weights for policy 0, policy_version 780 (0.0020) [2023-02-26 05:45:39,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3582.3). Total num frames: 3207168. Throughput: 0: 908.4. Samples: 801384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:45:39,856][01113] Avg episode reward: [(0, '18.950')] [2023-02-26 05:45:44,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 3223552. Throughput: 0: 908.1. Samples: 805948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:45:44,856][01113] Avg episode reward: [(0, '19.544')] [2023-02-26 05:45:47,189][12244] Updated weights for policy 0, policy_version 790 (0.0037) [2023-02-26 05:45:49,853][01113] Fps is (10 sec: 4096.1, 60 sec: 3754.8, 300 sec: 3623.9). Total num frames: 3248128. Throughput: 0: 931.9. Samples: 809304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:45:49,856][01113] Avg episode reward: [(0, '22.305')] [2023-02-26 05:45:49,859][12230] Saving new best policy, reward=22.305! [2023-02-26 05:45:54,853][01113] Fps is (10 sec: 4505.6, 60 sec: 3755.0, 300 sec: 3651.7). Total num frames: 3268608. Throughput: 0: 940.8. Samples: 815904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:45:54,859][01113] Avg episode reward: [(0, '23.042')] [2023-02-26 05:45:54,872][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000798_3268608.pth... [2023-02-26 05:45:55,059][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000584_2392064.pth [2023-02-26 05:45:55,070][12230] Saving new best policy, reward=23.042! [2023-02-26 05:45:57,873][12244] Updated weights for policy 0, policy_version 800 (0.0019) [2023-02-26 05:45:59,855][01113] Fps is (10 sec: 3276.2, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 3280896. Throughput: 0: 903.8. Samples: 820588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:45:59,861][01113] Avg episode reward: [(0, '23.827')] [2023-02-26 05:45:59,869][12230] Saving new best policy, reward=23.827! [2023-02-26 05:46:04,856][01113] Fps is (10 sec: 2866.2, 60 sec: 3686.2, 300 sec: 3623.9). Total num frames: 3297280. Throughput: 0: 902.6. Samples: 822764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:46:04,867][01113] Avg episode reward: [(0, '24.885')] [2023-02-26 05:46:04,882][12230] Saving new best policy, reward=24.885! 
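The "Saving new best policy, reward=…" lines above come from a running-maximum check: whenever the average episode reward beats every value seen so far, the learner checkpoints the current weights. A minimal sketch of that pattern, assuming a plain PyTorch module and a hypothetical save path (not Sample Factory's actual LearnerWorker code):

import torch

class BestPolicyTracker:
    # Checkpoint the policy whenever the average episode reward sets a
    # new record, mirroring the "Saving new best policy" lines above.
    def __init__(self, save_path="best_policy.pth"):  # hypothetical path
        self.best_reward = float("-inf")
        self.save_path = save_path

    def update(self, policy, avg_episode_reward):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            torch.save(policy.state_dict(), self.save_path)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")

Called once per reporting interval, this reproduces the strictly increasing sequence of saved rewards in the log (17.668, 18.658, 18.791, …).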
[2023-02-26 05:46:09,855][01113] Fps is (10 sec: 2867.0, 60 sec: 3549.7, 300 sec: 3610.0). Total num frames: 3309568. Throughput: 0: 898.0. Samples: 827120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:46:09,866][01113] Avg episode reward: [(0, '23.728')] [2023-02-26 05:46:12,006][12244] Updated weights for policy 0, policy_version 810 (0.0014) [2023-02-26 05:46:14,853][01113] Fps is (10 sec: 2458.5, 60 sec: 3413.4, 300 sec: 3596.1). Total num frames: 3321856. Throughput: 0: 848.2. Samples: 831354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:46:14,864][01113] Avg episode reward: [(0, '23.817')] [2023-02-26 05:46:19,853][01113] Fps is (10 sec: 2867.9, 60 sec: 3413.3, 300 sec: 3582.3). Total num frames: 3338240. Throughput: 0: 820.0. Samples: 833400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:46:19,858][01113] Avg episode reward: [(0, '24.488')] [2023-02-26 05:46:24,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3554.5). Total num frames: 3350528. Throughput: 0: 804.8. Samples: 837602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:46:24,863][01113] Avg episode reward: [(0, '23.881')] [2023-02-26 05:46:25,974][12244] Updated weights for policy 0, policy_version 820 (0.0034) [2023-02-26 05:46:29,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3582.3). Total num frames: 3375104. Throughput: 0: 834.2. Samples: 843486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:46:29,855][01113] Avg episode reward: [(0, '22.298')] [2023-02-26 05:46:34,853][01113] Fps is (10 sec: 4505.6, 60 sec: 3413.3, 300 sec: 3596.1). Total num frames: 3395584. Throughput: 0: 835.0. Samples: 846878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:46:34,856][01113] Avg episode reward: [(0, '20.550')] [2023-02-26 05:46:35,545][12244] Updated weights for policy 0, policy_version 830 (0.0016) [2023-02-26 05:46:39,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3568.4). Total num frames: 3411968. Throughput: 0: 810.1. Samples: 852360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:46:39,855][01113] Avg episode reward: [(0, '20.861')] [2023-02-26 05:46:44,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3554.5). Total num frames: 3424256. Throughput: 0: 804.7. Samples: 856796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:46:44,862][01113] Avg episode reward: [(0, '20.579')] [2023-02-26 05:46:47,972][12244] Updated weights for policy 0, policy_version 840 (0.0026) [2023-02-26 05:46:49,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3582.3). Total num frames: 3448832. Throughput: 0: 821.0. Samples: 859708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 05:46:49,861][01113] Avg episode reward: [(0, '21.067')] [2023-02-26 05:46:54,853][01113] Fps is (10 sec: 4505.5, 60 sec: 3345.1, 300 sec: 3596.1). Total num frames: 3469312. Throughput: 0: 869.2. Samples: 866232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:46:54,856][01113] Avg episode reward: [(0, '21.348')] [2023-02-26 05:46:58,272][12244] Updated weights for policy 0, policy_version 850 (0.0018) [2023-02-26 05:46:59,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3568.4). Total num frames: 3485696. Throughput: 0: 890.8. Samples: 871440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:46:59,858][01113] Avg episode reward: [(0, '20.776')] [2023-02-26 05:47:04,853][01113] Fps is (10 sec: 2867.3, 60 sec: 3345.3, 300 sec: 3554.5). 
Total num frames: 3497984. Throughput: 0: 896.1. Samples: 873724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 05:47:04,861][01113] Avg episode reward: [(0, '22.043')] [2023-02-26 05:47:09,668][12244] Updated weights for policy 0, policy_version 860 (0.0019) [2023-02-26 05:47:09,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 3522560. Throughput: 0: 923.0. Samples: 879136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 05:47:09,861][01113] Avg episode reward: [(0, '21.771')] [2023-02-26 05:47:14,853][01113] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3543040. Throughput: 0: 936.1. Samples: 885610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:47:14,863][01113] Avg episode reward: [(0, '20.986')] [2023-02-26 05:47:19,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3555328. Throughput: 0: 920.6. Samples: 888304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:47:19,860][01113] Avg episode reward: [(0, '20.855')] [2023-02-26 05:47:21,321][12244] Updated weights for policy 0, policy_version 870 (0.0044) [2023-02-26 05:47:24,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 3571712. Throughput: 0: 896.7. Samples: 892710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 05:47:24,856][01113] Avg episode reward: [(0, '20.902')] [2023-02-26 05:47:29,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3592192. Throughput: 0: 924.2. Samples: 898386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:47:29,855][01113] Avg episode reward: [(0, '19.470')] [2023-02-26 05:47:32,089][12244] Updated weights for policy 0, policy_version 880 (0.0017) [2023-02-26 05:47:34,853][01113] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3616768. Throughput: 0: 931.5. Samples: 901624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:47:34,861][01113] Avg episode reward: [(0, '20.220')] [2023-02-26 05:47:39,858][01113] Fps is (10 sec: 3684.4, 60 sec: 3617.8, 300 sec: 3540.5). Total num frames: 3629056. Throughput: 0: 914.1. Samples: 907370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:47:39,869][01113] Avg episode reward: [(0, '19.955')] [2023-02-26 05:47:44,456][12244] Updated weights for policy 0, policy_version 890 (0.0013) [2023-02-26 05:47:44,856][01113] Fps is (10 sec: 2866.3, 60 sec: 3686.2, 300 sec: 3540.6). Total num frames: 3645440. Throughput: 0: 893.0. Samples: 911626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 05:47:44,859][01113] Avg episode reward: [(0, '19.944')] [2023-02-26 05:47:49,853][01113] Fps is (10 sec: 3688.3, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3665920. Throughput: 0: 896.8. Samples: 914082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 05:47:49,861][01113] Avg episode reward: [(0, '19.058')] [2023-02-26 05:47:54,466][12244] Updated weights for policy 0, policy_version 900 (0.0026) [2023-02-26 05:47:54,853][01113] Fps is (10 sec: 4097.1, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3686400. Throughput: 0: 923.1. Samples: 920674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:47:54,861][01113] Avg episode reward: [(0, '20.306')] [2023-02-26 05:47:54,875][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000900_3686400.pth... 
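The Saving/Removing pairs around checkpoints (here checkpoint_000000900_3686400.pth arrives and checkpoint_000000689_2822144.pth is dropped just below) are keep-the-newest rotation; in this run two regular checkpoints are retained at a time. A minimal sketch of that rotation under the checkpoint_<policy-version>_<env-steps>.pth naming visible in the log (a simplified stand-in, not the actual learner code):

import torch
from pathlib import Path

def save_with_rotation(state, ckpt_dir, policy_version, env_steps, keep_count=2):
    # Write checkpoint_<version>_<env_steps>.pth, then delete everything
    # older than the newest keep_count files; zero-padded versions make
    # plain lexicographic sorting chronological.
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    print(f"Saving {path}...")
    torch.save(state, path)
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_count]:
        print(f"Removing {old}")
        old.unlink()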
[2023-02-26 05:47:55,010][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000689_2822144.pth [2023-02-26 05:47:59,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3702784. Throughput: 0: 903.8. Samples: 926282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 05:47:59,857][01113] Avg episode reward: [(0, '20.037')] [2023-02-26 05:48:04,853][01113] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3540.7). Total num frames: 3719168. Throughput: 0: 892.5. Samples: 928468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:48:04,861][01113] Avg episode reward: [(0, '20.674')] [2023-02-26 05:48:06,971][12244] Updated weights for policy 0, policy_version 910 (0.0027) [2023-02-26 05:48:09,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3739648. Throughput: 0: 907.0. Samples: 933524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 05:48:09,859][01113] Avg episode reward: [(0, '21.626')] [2023-02-26 05:48:14,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3760128. Throughput: 0: 931.0. Samples: 940282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 05:48:14,863][01113] Avg episode reward: [(0, '21.789')] [2023-02-26 05:48:16,221][12244] Updated weights for policy 0, policy_version 920 (0.0014) [2023-02-26 05:48:19,859][01113] Fps is (10 sec: 3684.3, 60 sec: 3686.0, 300 sec: 3554.4). Total num frames: 3776512. Throughput: 0: 925.0. Samples: 943254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:48:19,863][01113] Avg episode reward: [(0, '22.161')] [2023-02-26 05:48:24,853][01113] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 3792896. Throughput: 0: 893.3. Samples: 947564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:48:24,857][01113] Avg episode reward: [(0, '22.838')] [2023-02-26 05:48:29,089][12244] Updated weights for policy 0, policy_version 930 (0.0024) [2023-02-26 05:48:29,853][01113] Fps is (10 sec: 3278.7, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3809280. Throughput: 0: 915.7. Samples: 952828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:48:29,856][01113] Avg episode reward: [(0, '24.128')] [2023-02-26 05:48:34,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3833856. Throughput: 0: 934.4. Samples: 956132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 05:48:34,859][01113] Avg episode reward: [(0, '24.396')] [2023-02-26 05:48:39,355][12244] Updated weights for policy 0, policy_version 940 (0.0016) [2023-02-26 05:48:39,856][01113] Fps is (10 sec: 4094.9, 60 sec: 3686.6, 300 sec: 3623.9). Total num frames: 3850240. Throughput: 0: 922.4. Samples: 962182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:48:39,858][01113] Avg episode reward: [(0, '24.409')] [2023-02-26 05:48:44,858][01113] Fps is (10 sec: 2865.8, 60 sec: 3618.0, 300 sec: 3582.2). Total num frames: 3862528. Throughput: 0: 892.7. Samples: 966456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:48:44,860][01113] Avg episode reward: [(0, '23.444')] [2023-02-26 05:48:49,856][01113] Fps is (10 sec: 2457.5, 60 sec: 3481.4, 300 sec: 3568.4). Total num frames: 3874816. Throughput: 0: 882.9. Samples: 968202. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:48:49,859][01113] Avg episode reward: [(0, '23.242')] [2023-02-26 05:48:54,657][12244] Updated weights for policy 0, policy_version 950 (0.0050) [2023-02-26 05:48:54,853][01113] Fps is (10 sec: 2868.6, 60 sec: 3413.4, 300 sec: 3568.4). Total num frames: 3891200. Throughput: 0: 858.2. Samples: 972142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:48:54,860][01113] Avg episode reward: [(0, '22.266')] [2023-02-26 05:48:59,853][01113] Fps is (10 sec: 3687.6, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 3911680. Throughput: 0: 835.6. Samples: 977884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:48:59,862][01113] Avg episode reward: [(0, '20.218')] [2023-02-26 05:49:04,853][01113] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 3923968. Throughput: 0: 818.4. Samples: 980076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-26 05:49:04,861][01113] Avg episode reward: [(0, '20.901')] [2023-02-26 05:49:06,880][12244] Updated weights for policy 0, policy_version 960 (0.0026) [2023-02-26 05:49:09,853][01113] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3540.6). Total num frames: 3940352. Throughput: 0: 817.7. Samples: 984360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:49:09,859][01113] Avg episode reward: [(0, '22.321')] [2023-02-26 05:49:14,853][01113] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3554.5). Total num frames: 3960832. Throughput: 0: 835.2. Samples: 990414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 05:49:14,856][01113] Avg episode reward: [(0, '22.934')] [2023-02-26 05:49:17,040][12244] Updated weights for policy 0, policy_version 970 (0.0023) [2023-02-26 05:49:19,853][01113] Fps is (10 sec: 4096.0, 60 sec: 3413.7, 300 sec: 3568.4). Total num frames: 3981312. Throughput: 0: 836.0. Samples: 993752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 05:49:19,860][01113] Avg episode reward: [(0, '22.710')] [2023-02-26 05:49:24,853][01113] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 3997696. Throughput: 0: 820.9. Samples: 999118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 05:49:24,855][01113] Avg episode reward: [(0, '22.295')] [2023-02-26 05:49:26,878][12230] Stopping Batcher_0... [2023-02-26 05:49:26,879][12230] Loop batcher_evt_loop terminating... [2023-02-26 05:49:26,878][01113] Component Batcher_0 stopped! [2023-02-26 05:49:26,880][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 05:49:26,948][12244] Weights refcount: 2 0 [2023-02-26 05:49:26,962][12244] Stopping InferenceWorker_p0-w0... [2023-02-26 05:49:26,966][12244] Loop inference_proc0-0_evt_loop terminating... [2023-02-26 05:49:26,965][01113] Component InferenceWorker_p0-w0 stopped! [2023-02-26 05:49:26,990][01113] Component RolloutWorker_w5 stopped! [2023-02-26 05:49:26,999][01113] Component RolloutWorker_w3 stopped! [2023-02-26 05:49:26,993][12250] Stopping RolloutWorker_w5... [2023-02-26 05:49:27,001][12248] Stopping RolloutWorker_w3... [2023-02-26 05:49:27,021][01113] Component RolloutWorker_w1 stopped! [2023-02-26 05:49:27,023][12246] Stopping RolloutWorker_w1... [2023-02-26 05:49:27,019][12250] Loop rollout_proc5_evt_loop terminating... [2023-02-26 05:49:27,014][12248] Loop rollout_proc3_evt_loop terminating... [2023-02-26 05:49:27,041][01113] Component RolloutWorker_w6 stopped! 
[2023-02-26 05:49:27,024][12246] Loop rollout_proc1_evt_loop terminating... [2023-02-26 05:49:27,050][01113] Component RolloutWorker_w7 stopped! [2023-02-26 05:49:27,052][12252] Stopping RolloutWorker_w7... [2023-02-26 05:49:27,062][12252] Loop rollout_proc7_evt_loop terminating... [2023-02-26 05:49:27,040][12251] Stopping RolloutWorker_w6... [2023-02-26 05:49:27,069][12251] Loop rollout_proc6_evt_loop terminating... [2023-02-26 05:49:27,097][12245] Stopping RolloutWorker_w0... [2023-02-26 05:49:27,097][01113] Component RolloutWorker_w0 stopped! [2023-02-26 05:49:27,097][12230] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000798_3268608.pth [2023-02-26 05:49:27,120][12230] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 05:49:27,135][12249] Stopping RolloutWorker_w4... [2023-02-26 05:49:27,135][01113] Component RolloutWorker_w4 stopped! [2023-02-26 05:49:27,139][12245] Loop rollout_proc0_evt_loop terminating... [2023-02-26 05:49:27,163][12249] Loop rollout_proc4_evt_loop terminating... [2023-02-26 05:49:27,199][12247] Stopping RolloutWorker_w2... [2023-02-26 05:49:27,202][12247] Loop rollout_proc2_evt_loop terminating... [2023-02-26 05:49:27,229][01113] Component RolloutWorker_w2 stopped! [2023-02-26 05:49:27,479][12230] Stopping LearnerWorker_p0... [2023-02-26 05:49:27,478][01113] Component LearnerWorker_p0 stopped! [2023-02-26 05:49:27,482][01113] Waiting for process learner_proc0 to stop... [2023-02-26 05:49:27,480][12230] Loop learner_proc0_evt_loop terminating... [2023-02-26 05:49:29,706][01113] Waiting for process inference_proc0-0 to join... [2023-02-26 05:49:30,216][01113] Waiting for process rollout_proc0 to join... [2023-02-26 05:49:30,759][01113] Waiting for process rollout_proc1 to join... [2023-02-26 05:49:30,760][01113] Waiting for process rollout_proc2 to join... [2023-02-26 05:49:30,774][01113] Waiting for process rollout_proc3 to join... [2023-02-26 05:49:30,775][01113] Waiting for process rollout_proc4 to join... [2023-02-26 05:49:30,780][01113] Waiting for process rollout_proc5 to join... [2023-02-26 05:49:30,780][01113] Waiting for process rollout_proc6 to join... [2023-02-26 05:49:30,788][01113] Waiting for process rollout_proc7 to join... 
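The profile tree views printed below are produced by nested timers wrapped around each stage of the event loops. A minimal sketch of collecting such hierarchical timings with a context manager (a simplified stand-in for Sample Factory's timing utilities, not its actual implementation):

import time
from collections import defaultdict
from contextlib import contextmanager

class TreeTimer:
    # Accumulates wall-clock seconds per stage, keyed by nesting path,
    # so totals can be printed as an indented tree like the views below.
    def __init__(self):
        self.totals = defaultdict(float)
        self.stack = []

    @contextmanager
    def timeit(self, name):
        self.stack.append(name)
        key = "/".join(self.stack)
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[key] += time.perf_counter() - start
            self.stack.pop()

    def report(self):
        for key, total in sorted(self.totals.items()):
            print("  " * key.count("/") + f"{key.rsplit('/', 1)[-1]}: {total:.4f}")

The report below ends with Collected {0: 4005888}, FPS: 3425.4, which is simply total frames over the main loop's wall time: 4005888 / 1169.4817 ≈ 3425.4.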
[2023-02-26 05:49:30,789][01113] Batcher 0 profile tree view:
batching: 25.9294, releasing_batches: 0.0236
[2023-02-26 05:49:30,790][01113] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 553.9719
update_model: 8.3115
  weight_update: 0.0017
one_step: 0.0044
  handle_policy_step: 538.1212
    deserialize: 15.2549, stack: 3.0637, obs_to_device_normalize: 116.8971, forward: 262.3157, send_messages: 26.8913
    prepare_outputs: 86.9594
      to_cpu: 54.2525
[2023-02-26 05:49:30,792][01113] Learner 0 profile tree view:
misc: 0.0122, prepare_batch: 16.0550
train: 76.6248
  epoch_init: 0.0157, minibatch_init: 0.0152, losses_postprocess: 0.5466, kl_divergence: 0.6049, after_optimizer: 33.1414
  calculate_losses: 27.5212
    losses_init: 0.0036, forward_head: 1.7897, bptt_initial: 18.0557, tail: 1.1856, advantages_returns: 0.3009, losses: 3.5443
    bptt: 2.3110
      bptt_forward_core: 2.2285
  update: 14.2416
    clip: 1.4024
[2023-02-26 05:49:30,793][01113] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3442, enqueue_policy_requests: 151.8710, env_step: 852.2488, overhead: 22.6761, complete_rollouts: 7.8890
save_policy_outputs: 21.1488
  split_output_tensors: 10.2396
[2023-02-26 05:49:30,798][01113] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3154, enqueue_policy_requests: 154.7612, env_step: 854.3469, overhead: 21.5790, complete_rollouts: 6.8986
save_policy_outputs: 20.5265
  split_output_tensors: 10.3071
[2023-02-26 05:49:30,801][01113] Loop Runner_EvtLoop terminating...
[2023-02-26 05:49:30,802][01113] Runner profile tree view:
main_loop: 1169.4817
[2023-02-26 05:49:30,804][01113] Collected {0: 4005888}, FPS: 3425.4
[2023-02-26 05:49:30,888][01113] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-26 05:49:30,890][01113] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-26 05:49:30,892][01113] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-26 05:49:30,894][01113] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-26 05:49:30,896][01113] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-26 05:49:30,897][01113] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-26 05:49:30,902][01113] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-26 05:49:30,907][01113] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-26 05:49:30,909][01113] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-26 05:49:30,910][01113] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-26 05:49:30,911][01113] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-26 05:49:30,912][01113] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-26 05:49:30,914][01113] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-26 05:49:30,915][01113] Adding new argument 'enjoy_script'=None that is not in the saved config file!
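The evaluation ("enjoy") pass that follows rolls the loaded policy for max_num_episodes=10, printing a frame counter every 100 frames and the running average of episode rewards after each finished episode. A minimal sketch of such a loop against a Gym-style environment (policy.act is a hypothetical helper, not the actual enjoy script):

def evaluate(env, policy, max_num_episodes=10, deterministic=False):
    # deterministic=False matches eval_deterministic=False above:
    # actions are sampled from the policy rather than taken greedily.
    episode_rewards = []
    num_frames = 0
    for _ in range(max_num_episodes):
        obs, done, episode_reward = env.reset(), False, 0.0
        while not done:
            action = policy.act(obs, deterministic=deterministic)  # hypothetical API
            obs, reward, done, _ = env.step(action)
            episode_reward += reward
            num_frames += 1
            if num_frames % 100 == 0:
                print(f"Num frames {num_frames}...")
        episode_rewards.append(episode_reward)
        avg = sum(episode_rewards) / len(episode_rewards)
        print(f"Avg episode reward: {avg:.3f}")
    return episode_rewards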
[2023-02-26 05:49:30,917][01113] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 05:49:30,944][01113] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:49:30,947][01113] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 05:49:30,949][01113] RunningMeanStd input shape: (1,) [2023-02-26 05:49:30,970][01113] ConvEncoder: input_channels=3 [2023-02-26 05:49:31,519][01113] Conv encoder output size: 512 [2023-02-26 05:49:31,521][01113] Policy head output size: 512 [2023-02-26 05:49:33,813][01113] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 05:49:35,066][01113] Num frames 100... [2023-02-26 05:49:35,177][01113] Num frames 200... [2023-02-26 05:49:35,294][01113] Num frames 300... [2023-02-26 05:49:35,407][01113] Num frames 400... [2023-02-26 05:49:35,520][01113] Num frames 500... [2023-02-26 05:49:35,639][01113] Num frames 600... [2023-02-26 05:49:35,752][01113] Num frames 700... [2023-02-26 05:49:35,871][01113] Avg episode rewards: #0: 12.490, true rewards: #0: 7.490 [2023-02-26 05:49:35,873][01113] Avg episode reward: 12.490, avg true_objective: 7.490 [2023-02-26 05:49:35,950][01113] Num frames 800... [2023-02-26 05:49:36,059][01113] Num frames 900... [2023-02-26 05:49:36,172][01113] Num frames 1000... [2023-02-26 05:49:36,287][01113] Num frames 1100... [2023-02-26 05:49:36,397][01113] Num frames 1200... [2023-02-26 05:49:36,511][01113] Num frames 1300... [2023-02-26 05:49:36,629][01113] Num frames 1400... [2023-02-26 05:49:36,743][01113] Num frames 1500... [2023-02-26 05:49:36,860][01113] Num frames 1600... [2023-02-26 05:49:36,972][01113] Num frames 1700... [2023-02-26 05:49:37,094][01113] Num frames 1800... [2023-02-26 05:49:37,204][01113] Num frames 1900... [2023-02-26 05:49:37,263][01113] Avg episode rewards: #0: 18.005, true rewards: #0: 9.505 [2023-02-26 05:49:37,264][01113] Avg episode reward: 18.005, avg true_objective: 9.505 [2023-02-26 05:49:37,383][01113] Num frames 2000... [2023-02-26 05:49:37,509][01113] Num frames 2100... [2023-02-26 05:49:37,618][01113] Num frames 2200... [2023-02-26 05:49:37,728][01113] Num frames 2300... [2023-02-26 05:49:37,844][01113] Num frames 2400... [2023-02-26 05:49:37,952][01113] Num frames 2500... [2023-02-26 05:49:38,060][01113] Num frames 2600... [2023-02-26 05:49:38,170][01113] Num frames 2700... [2023-02-26 05:49:38,279][01113] Avg episode rewards: #0: 17.497, true rewards: #0: 9.163 [2023-02-26 05:49:38,281][01113] Avg episode reward: 17.497, avg true_objective: 9.163 [2023-02-26 05:49:38,348][01113] Num frames 2800... [2023-02-26 05:49:38,462][01113] Num frames 2900... [2023-02-26 05:49:38,571][01113] Num frames 3000... [2023-02-26 05:49:38,683][01113] Num frames 3100... [2023-02-26 05:49:38,794][01113] Num frames 3200... [2023-02-26 05:49:38,914][01113] Num frames 3300... [2023-02-26 05:49:39,023][01113] Num frames 3400... [2023-02-26 05:49:39,133][01113] Num frames 3500... [2023-02-26 05:49:39,244][01113] Avg episode rewards: #0: 16.873, true rewards: #0: 8.872 [2023-02-26 05:49:39,246][01113] Avg episode reward: 16.873, avg true_objective: 8.872 [2023-02-26 05:49:39,308][01113] Num frames 3600... [2023-02-26 05:49:39,426][01113] Num frames 3700... [2023-02-26 05:49:39,550][01113] Num frames 3800... [2023-02-26 05:49:39,669][01113] Num frames 3900... [2023-02-26 05:49:39,788][01113] Num frames 4000... [2023-02-26 05:49:39,908][01113] Num frames 4100... [2023-02-26 05:49:40,045][01113] Num frames 4200... 
[2023-02-26 05:49:40,200][01113] Num frames 4300... [2023-02-26 05:49:40,355][01113] Num frames 4400... [2023-02-26 05:49:40,513][01113] Num frames 4500... [2023-02-26 05:49:40,681][01113] Avg episode rewards: #0: 17.546, true rewards: #0: 9.146 [2023-02-26 05:49:40,684][01113] Avg episode reward: 17.546, avg true_objective: 9.146 [2023-02-26 05:49:40,732][01113] Num frames 4600... [2023-02-26 05:49:40,899][01113] Num frames 4700... [2023-02-26 05:49:41,056][01113] Num frames 4800... [2023-02-26 05:49:41,227][01113] Num frames 4900... [2023-02-26 05:49:41,382][01113] Num frames 5000... [2023-02-26 05:49:41,540][01113] Num frames 5100... [2023-02-26 05:49:41,692][01113] Num frames 5200... [2023-02-26 05:49:41,861][01113] Num frames 5300... [2023-02-26 05:49:42,026][01113] Num frames 5400... [2023-02-26 05:49:42,184][01113] Num frames 5500... [2023-02-26 05:49:42,342][01113] Num frames 5600... [2023-02-26 05:49:42,501][01113] Num frames 5700... [2023-02-26 05:49:42,649][01113] Avg episode rewards: #0: 18.762, true rewards: #0: 9.595 [2023-02-26 05:49:42,651][01113] Avg episode reward: 18.762, avg true_objective: 9.595 [2023-02-26 05:49:42,720][01113] Num frames 5800... [2023-02-26 05:49:42,876][01113] Num frames 5900... [2023-02-26 05:49:43,031][01113] Num frames 6000... [2023-02-26 05:49:43,186][01113] Num frames 6100... [2023-02-26 05:49:43,340][01113] Num frames 6200... [2023-02-26 05:49:43,492][01113] Num frames 6300... [2023-02-26 05:49:43,602][01113] Num frames 6400... [2023-02-26 05:49:43,716][01113] Num frames 6500... [2023-02-26 05:49:43,845][01113] Num frames 6600... [2023-02-26 05:49:43,963][01113] Num frames 6700... [2023-02-26 05:49:44,045][01113] Avg episode rewards: #0: 18.596, true rewards: #0: 9.596 [2023-02-26 05:49:44,047][01113] Avg episode reward: 18.596, avg true_objective: 9.596 [2023-02-26 05:49:44,147][01113] Num frames 6800... [2023-02-26 05:49:44,267][01113] Num frames 6900... [2023-02-26 05:49:44,385][01113] Num frames 7000... [2023-02-26 05:49:44,495][01113] Num frames 7100... [2023-02-26 05:49:44,607][01113] Num frames 7200... [2023-02-26 05:49:44,721][01113] Num frames 7300... [2023-02-26 05:49:44,839][01113] Num frames 7400... [2023-02-26 05:49:44,950][01113] Num frames 7500... [2023-02-26 05:49:45,065][01113] Num frames 7600... [2023-02-26 05:49:45,170][01113] Avg episode rewards: #0: 18.931, true rewards: #0: 9.556 [2023-02-26 05:49:45,172][01113] Avg episode reward: 18.931, avg true_objective: 9.556 [2023-02-26 05:49:45,236][01113] Num frames 7700... [2023-02-26 05:49:45,351][01113] Num frames 7800... [2023-02-26 05:49:45,459][01113] Num frames 7900... [2023-02-26 05:49:45,574][01113] Num frames 8000... [2023-02-26 05:49:45,680][01113] Num frames 8100... [2023-02-26 05:49:45,791][01113] Num frames 8200... [2023-02-26 05:49:45,898][01113] Num frames 8300... [2023-02-26 05:49:46,013][01113] Num frames 8400... [2023-02-26 05:49:46,092][01113] Avg episode rewards: #0: 18.685, true rewards: #0: 9.351 [2023-02-26 05:49:46,094][01113] Avg episode reward: 18.685, avg true_objective: 9.351 [2023-02-26 05:49:46,191][01113] Num frames 8500... [2023-02-26 05:49:46,307][01113] Num frames 8600... [2023-02-26 05:49:46,416][01113] Num frames 8700... [2023-02-26 05:49:46,526][01113] Num frames 8800... 
[2023-02-26 05:49:46,651][01113] Avg episode rewards: #0: 17.464, true rewards: #0: 8.864 [2023-02-26 05:49:46,654][01113] Avg episode reward: 17.464, avg true_objective: 8.864 [2023-02-26 05:50:42,532][01113] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-26 05:50:42,850][01113] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 05:50:42,852][01113] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 05:50:42,855][01113] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 05:50:42,857][01113] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 05:50:42,859][01113] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 05:50:42,861][01113] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 05:50:42,862][01113] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-26 05:50:42,863][01113] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 05:50:42,865][01113] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-26 05:50:42,866][01113] Adding new argument 'hf_repository'='jxiao/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-26 05:50:42,867][01113] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 05:50:42,868][01113] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 05:50:42,869][01113] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 05:50:42,870][01113] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 05:50:42,871][01113] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 05:50:42,892][01113] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 05:50:42,894][01113] RunningMeanStd input shape: (1,) [2023-02-26 05:50:42,911][01113] ConvEncoder: input_channels=3 [2023-02-26 05:50:42,968][01113] Conv encoder output size: 512 [2023-02-26 05:50:42,970][01113] Policy head output size: 512 [2023-02-26 05:50:42,998][01113] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 05:50:43,709][01113] Num frames 100... [2023-02-26 05:50:43,865][01113] Num frames 200... [2023-02-26 05:50:44,026][01113] Num frames 300... [2023-02-26 05:50:44,194][01113] Num frames 400... [2023-02-26 05:50:44,358][01113] Num frames 500... [2023-02-26 05:50:44,517][01113] Num frames 600... [2023-02-26 05:50:44,683][01113] Num frames 700... [2023-02-26 05:50:44,847][01113] Num frames 800... [2023-02-26 05:50:45,015][01113] Num frames 900... [2023-02-26 05:50:45,214][01113] Avg episode rewards: #0: 18.920, true rewards: #0: 9.920 [2023-02-26 05:50:45,217][01113] Avg episode reward: 18.920, avg true_objective: 9.920 [2023-02-26 05:50:45,238][01113] Num frames 1000... [2023-02-26 05:50:45,409][01113] Num frames 1100... [2023-02-26 05:50:45,582][01113] Num frames 1200... [2023-02-26 05:50:45,732][01113] Num frames 1300... [2023-02-26 05:50:45,890][01113] Num frames 1400... [2023-02-26 05:50:46,075][01113] Num frames 1500... [2023-02-26 05:50:46,259][01113] Num frames 1600... [2023-02-26 05:50:46,444][01113] Num frames 1700... 
[2023-02-26 05:50:46,603][01113] Num frames 1800... [2023-02-26 05:50:46,760][01113] Num frames 1900... [2023-02-26 05:50:46,913][01113] Num frames 2000... [2023-02-26 05:50:47,063][01113] Num frames 2100... [2023-02-26 05:50:47,216][01113] Num frames 2200... [2023-02-26 05:50:47,367][01113] Num frames 2300... [2023-02-26 05:50:47,522][01113] Num frames 2400... [2023-02-26 05:50:47,656][01113] Num frames 2500... [2023-02-26 05:50:47,771][01113] Num frames 2600... [2023-02-26 05:50:47,891][01113] Num frames 2700... [2023-02-26 05:50:48,002][01113] Num frames 2800... [2023-02-26 05:50:48,111][01113] Num frames 2900... [2023-02-26 05:50:48,223][01113] Num frames 3000... [2023-02-26 05:50:48,383][01113] Avg episode rewards: #0: 35.460, true rewards: #0: 15.460 [2023-02-26 05:50:48,384][01113] Avg episode reward: 35.460, avg true_objective: 15.460 [2023-02-26 05:50:48,399][01113] Num frames 3100... [2023-02-26 05:50:48,510][01113] Num frames 3200... [2023-02-26 05:50:48,620][01113] Num frames 3300... [2023-02-26 05:50:48,730][01113] Num frames 3400... [2023-02-26 05:50:48,850][01113] Num frames 3500... [2023-02-26 05:50:48,961][01113] Num frames 3600... [2023-02-26 05:50:49,075][01113] Num frames 3700... [2023-02-26 05:50:49,129][01113] Avg episode rewards: #0: 26.666, true rewards: #0: 12.333 [2023-02-26 05:50:49,130][01113] Avg episode reward: 26.666, avg true_objective: 12.333 [2023-02-26 05:50:49,244][01113] Num frames 3800... [2023-02-26 05:50:49,356][01113] Num frames 3900... [2023-02-26 05:50:49,468][01113] Num frames 4000... [2023-02-26 05:50:49,615][01113] Avg episode rewards: #0: 20.960, true rewards: #0: 10.210 [2023-02-26 05:50:49,616][01113] Avg episode reward: 20.960, avg true_objective: 10.210 [2023-02-26 05:50:49,640][01113] Num frames 4100... [2023-02-26 05:50:49,754][01113] Num frames 4200... [2023-02-26 05:50:49,871][01113] Num frames 4300... [2023-02-26 05:50:49,980][01113] Num frames 4400... [2023-02-26 05:50:50,091][01113] Num frames 4500... [2023-02-26 05:50:50,202][01113] Num frames 4600... [2023-02-26 05:50:50,311][01113] Num frames 4700... [2023-02-26 05:50:50,423][01113] Num frames 4800... [2023-02-26 05:50:50,533][01113] Num frames 4900... [2023-02-26 05:50:50,697][01113] Avg episode rewards: #0: 20.598, true rewards: #0: 9.998 [2023-02-26 05:50:50,698][01113] Avg episode reward: 20.598, avg true_objective: 9.998 [2023-02-26 05:50:50,704][01113] Num frames 5000... [2023-02-26 05:50:50,816][01113] Num frames 5100... [2023-02-26 05:50:50,934][01113] Num frames 5200... [2023-02-26 05:50:51,044][01113] Num frames 5300... [2023-02-26 05:50:51,164][01113] Num frames 5400... [2023-02-26 05:50:51,276][01113] Num frames 5500... [2023-02-26 05:50:51,400][01113] Num frames 5600... [2023-02-26 05:50:51,556][01113] Num frames 5700... [2023-02-26 05:50:51,713][01113] Num frames 5800... [2023-02-26 05:50:51,870][01113] Num frames 5900... [2023-02-26 05:50:52,028][01113] Num frames 6000... [2023-02-26 05:50:52,182][01113] Num frames 6100... [2023-02-26 05:50:52,340][01113] Num frames 6200... [2023-02-26 05:50:52,506][01113] Num frames 6300... [2023-02-26 05:50:52,662][01113] Num frames 6400... [2023-02-26 05:50:52,823][01113] Num frames 6500... [2023-02-26 05:50:53,027][01113] Avg episode rewards: #0: 23.315, true rewards: #0: 10.982 [2023-02-26 05:50:53,029][01113] Avg episode reward: 23.315, avg true_objective: 10.982 [2023-02-26 05:50:53,058][01113] Num frames 6600... [2023-02-26 05:50:53,216][01113] Num frames 6700... [2023-02-26 05:50:53,377][01113] Num frames 6800... 
[2023-02-26 05:50:53,535][01113] Num frames 6900... [2023-02-26 05:50:53,699][01113] Num frames 7000... [2023-02-26 05:50:53,860][01113] Num frames 7100... [2023-02-26 05:50:54,024][01113] Num frames 7200... [2023-02-26 05:50:54,186][01113] Num frames 7300... [2023-02-26 05:50:54,351][01113] Num frames 7400... [2023-02-26 05:50:54,515][01113] Num frames 7500... [2023-02-26 05:50:54,682][01113] Num frames 7600... [2023-02-26 05:50:54,816][01113] Avg episode rewards: #0: 24.211, true rewards: #0: 10.926 [2023-02-26 05:50:54,818][01113] Avg episode reward: 24.211, avg true_objective: 10.926 [2023-02-26 05:50:54,881][01113] Num frames 7700... [2023-02-26 05:50:54,993][01113] Num frames 7800... [2023-02-26 05:50:55,110][01113] Num frames 7900... [2023-02-26 05:50:55,226][01113] Num frames 8000... [2023-02-26 05:50:55,337][01113] Num frames 8100... [2023-02-26 05:50:55,448][01113] Num frames 8200... [2023-02-26 05:50:55,560][01113] Num frames 8300... [2023-02-26 05:50:55,673][01113] Num frames 8400... [2023-02-26 05:50:55,787][01113] Num frames 8500... [2023-02-26 05:50:55,897][01113] Num frames 8600... [2023-02-26 05:50:56,008][01113] Num frames 8700... [2023-02-26 05:50:56,126][01113] Num frames 8800... [2023-02-26 05:50:56,220][01113] Avg episode rewards: #0: 24.290, true rewards: #0: 11.040 [2023-02-26 05:50:56,222][01113] Avg episode reward: 24.290, avg true_objective: 11.040 [2023-02-26 05:50:56,307][01113] Num frames 8900... [2023-02-26 05:50:56,431][01113] Num frames 9000... [2023-02-26 05:50:56,550][01113] Num frames 9100... [2023-02-26 05:50:56,670][01113] Num frames 9200... [2023-02-26 05:50:56,779][01113] Avg episode rewards: #0: 22.276, true rewards: #0: 10.276 [2023-02-26 05:50:56,781][01113] Avg episode reward: 22.276, avg true_objective: 10.276 [2023-02-26 05:50:56,840][01113] Num frames 9300... [2023-02-26 05:50:56,950][01113] Num frames 9400... [2023-02-26 05:50:57,067][01113] Num frames 9500... [2023-02-26 05:50:57,182][01113] Num frames 9600... [2023-02-26 05:50:57,236][01113] Avg episode rewards: #0: 20.500, true rewards: #0: 9.600 [2023-02-26 05:50:57,238][01113] Avg episode reward: 20.500, avg true_objective: 9.600 [2023-02-26 05:51:23,881][01113] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-26 05:53:27,218][22180] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-02-26 05:53:27,220][22180] Rollout worker 0 uses device cpu [2023-02-26 05:53:27,223][22180] Rollout worker 1 uses device cpu [2023-02-26 05:53:27,226][22180] Rollout worker 2 uses device cpu [2023-02-26 05:53:27,227][22180] Rollout worker 3 uses device cpu [2023-02-26 05:53:27,230][22180] Rollout worker 4 uses device cpu [2023-02-26 05:53:27,231][22180] Rollout worker 5 uses device cpu [2023-02-26 05:53:27,233][22180] Rollout worker 6 uses device cpu [2023-02-26 05:53:27,234][22180] Rollout worker 7 uses device cpu [2023-02-26 05:53:27,410][22180] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-26 05:53:27,415][22180] InferenceWorker_p0-w0: min num requests: 2 [2023-02-26 05:53:27,457][22180] Starting all processes... [2023-02-26 05:53:27,459][22180] Starting process learner_proc0 [2023-02-26 05:53:27,533][22180] Starting all processes... 
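The second evaluation above ran with push_to_hub=True and hf_repository='jxiao/rl_course_vizdoom_health_gathering_supreme': after saving the replay, the experiment directory (config.json, checkpoints, replay.mp4) is uploaded to the Hugging Face Hub. One way to do that upload with huggingface_hub, shown before the resumed training run below — an illustration, not necessarily the course's own helper:

from huggingface_hub import HfApi

def push_experiment(train_dir, repo_id):
    # Create the model repo if it doesn't exist yet, then upload the
    # whole experiment folder in one call.
    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(folder_path=train_dir, repo_id=repo_id, repo_type="model")

# push_experiment("/content/train_dir/default_experiment",
#                 "jxiao/rl_course_vizdoom_health_gathering_supreme")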
[2023-02-26 05:53:27,545][22180] Starting process inference_proc0-0
[2023-02-26 05:53:27,545][22180] Starting process rollout_proc0
[2023-02-26 05:53:27,547][22180] Starting process rollout_proc1
[2023-02-26 05:53:27,547][22180] Starting process rollout_proc2
[2023-02-26 05:53:27,547][22180] Starting process rollout_proc3
[2023-02-26 05:53:27,547][22180] Starting process rollout_proc4
[2023-02-26 05:53:27,547][22180] Starting process rollout_proc5
[2023-02-26 05:53:27,547][22180] Starting process rollout_proc6
[2023-02-26 05:53:27,547][22180] Starting process rollout_proc7
[2023-02-26 05:53:37,877][22708] Worker 1 uses CPU cores [1]
[2023-02-26 05:53:38,059][22714] Worker 7 uses CPU cores [1]
[2023-02-26 05:53:38,294][22710] Worker 4 uses CPU cores [0]
[2023-02-26 05:53:38,381][22692] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:53:38,387][22692] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-26 05:53:38,534][22705] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:53:38,535][22705] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-26 05:53:38,590][22712] Worker 5 uses CPU cores [1]
[2023-02-26 05:53:38,647][22711] Worker 3 uses CPU cores [1]
[2023-02-26 05:53:38,647][22709] Worker 2 uses CPU cores [0]
[2023-02-26 05:53:38,655][22707] Worker 0 uses CPU cores [0]
[2023-02-26 05:53:38,691][22713] Worker 6 uses CPU cores [0]
[2023-02-26 05:53:39,184][22705] Num visible devices: 1
[2023-02-26 05:53:39,187][22692] Num visible devices: 1
[2023-02-26 05:53:39,206][22692] Starting seed is not provided
[2023-02-26 05:53:39,207][22692] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 05:53:39,208][22692] Initializing actor-critic model on device cuda:0
[2023-02-26 05:53:39,210][22692] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 05:53:39,212][22692] RunningMeanStd input shape: (1,)
[2023-02-26 05:53:39,231][22692] ConvEncoder: input_channels=3
[2023-02-26 05:53:39,378][22692] Conv encoder output size: 512
[2023-02-26 05:53:39,378][22692] Policy head output size: 512
[2023-02-26 05:53:39,393][22692] Created Actor Critic model with architecture:
[2023-02-26 05:53:39,393][22692] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-26 05:53:42,216][22692] Using optimizer
[2023-02-26 05:53:42,218][22692]
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 05:53:42,257][22692] Loading model from checkpoint [2023-02-26 05:53:42,264][22692] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2023-02-26 05:53:42,266][22692] Initialized policy 0 weights for model version 978 [2023-02-26 05:53:42,269][22692] LearnerWorker_p0 finished initialization! [2023-02-26 05:53:42,274][22692] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-26 05:53:42,404][22705] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 05:53:42,405][22705] RunningMeanStd input shape: (1,) [2023-02-26 05:53:42,436][22705] ConvEncoder: input_channels=3 [2023-02-26 05:53:42,611][22705] Conv encoder output size: 512 [2023-02-26 05:53:42,612][22705] Policy head output size: 512 [2023-02-26 05:53:43,745][22180] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 05:53:45,387][22180] Inference worker 0-0 is ready! [2023-02-26 05:53:45,388][22180] All inference workers are ready! Signal rollout workers to start! [2023-02-26 05:53:45,482][22711] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:45,486][22713] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:45,487][22710] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:45,484][22707] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:45,487][22708] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:45,488][22714] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:45,491][22709] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:45,484][22712] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 05:53:46,269][22712] Decorrelating experience for 0 frames... [2023-02-26 05:53:46,277][22707] Decorrelating experience for 0 frames... [2023-02-26 05:53:46,282][22709] Decorrelating experience for 0 frames... [2023-02-26 05:53:46,281][22711] Decorrelating experience for 0 frames... [2023-02-26 05:53:47,280][22707] Decorrelating experience for 32 frames... [2023-02-26 05:53:47,285][22712] Decorrelating experience for 32 frames... [2023-02-26 05:53:47,288][22709] Decorrelating experience for 32 frames... [2023-02-26 05:53:47,294][22711] Decorrelating experience for 32 frames... [2023-02-26 05:53:47,349][22713] Decorrelating experience for 0 frames... [2023-02-26 05:53:47,349][22708] Decorrelating experience for 0 frames... [2023-02-26 05:53:47,400][22180] Heartbeat connected on Batcher_0 [2023-02-26 05:53:47,404][22180] Heartbeat connected on LearnerWorker_p0 [2023-02-26 05:53:47,462][22180] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-26 05:53:48,097][22710] Decorrelating experience for 0 frames... [2023-02-26 05:53:48,232][22709] Decorrelating experience for 64 frames... [2023-02-26 05:53:48,472][22708] Decorrelating experience for 32 frames... [2023-02-26 05:53:48,493][22714] Decorrelating experience for 0 frames... [2023-02-26 05:53:48,588][22712] Decorrelating experience for 64 frames... [2023-02-26 05:53:48,745][22180] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-26 05:53:49,235][22710] Decorrelating experience for 32 frames... 
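The "Decorrelating experience for N frames" lines here and below stagger the rollout workers: each environment split steps through a different number of warm-up frames (0, 32, 64, 96 in this run) before regular collection, so the eight workers don't produce trajectories in lockstep. A rough sketch of the idea using random warm-up actions (a simplification of the actual worker logic):

def decorrelate(envs, frames_per_stage=32, num_stages=4):
    # Each stage advances this worker's envs by another frames_per_stage
    # frames, logging the cumulative count as in the log lines around here.
    for stage in range(num_stages):
        print(f"Decorrelating experience for {stage * frames_per_stage} frames...")
        for env in envs:
            for _ in range(frames_per_stage):
                _, _, done, _ = env.step(env.action_space.sample())
                if done:
                    env.reset()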
[2023-02-26 05:53:49,299][22713] Decorrelating experience for 32 frames...
[2023-02-26 05:53:49,551][22714] Decorrelating experience for 32 frames...
[2023-02-26 05:53:49,733][22708] Decorrelating experience for 64 frames...
[2023-02-26 05:53:49,777][22712] Decorrelating experience for 96 frames...
[2023-02-26 05:53:49,897][22707] Decorrelating experience for 64 frames...
[2023-02-26 05:53:49,990][22180] Heartbeat connected on RolloutWorker_w5
[2023-02-26 05:53:50,060][22709] Decorrelating experience for 96 frames...
[2023-02-26 05:53:50,320][22180] Heartbeat connected on RolloutWorker_w2
[2023-02-26 05:53:51,001][22713] Decorrelating experience for 64 frames...
[2023-02-26 05:53:51,299][22710] Decorrelating experience for 64 frames...
[2023-02-26 05:53:51,459][22707] Decorrelating experience for 96 frames...
[2023-02-26 05:53:51,536][22714] Decorrelating experience for 64 frames...
[2023-02-26 05:53:51,668][22708] Decorrelating experience for 96 frames...
[2023-02-26 05:53:51,700][22180] Heartbeat connected on RolloutWorker_w0
[2023-02-26 05:53:51,798][22711] Decorrelating experience for 64 frames...
[2023-02-26 05:53:51,959][22180] Heartbeat connected on RolloutWorker_w1
[2023-02-26 05:53:53,499][22710] Decorrelating experience for 96 frames...
[2023-02-26 05:53:53,745][22180] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 1.6. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 05:53:53,753][22180] Avg episode reward: [(0, '2.210')]
[2023-02-26 05:53:53,873][22714] Decorrelating experience for 96 frames...
[2023-02-26 05:53:54,269][22180] Heartbeat connected on RolloutWorker_w4
[2023-02-26 05:53:54,359][22180] Heartbeat connected on RolloutWorker_w7
[2023-02-26 05:53:54,446][22711] Decorrelating experience for 96 frames...
[2023-02-26 05:53:55,097][22713] Decorrelating experience for 96 frames...
[2023-02-26 05:53:55,247][22180] Heartbeat connected on RolloutWorker_w3
[2023-02-26 05:53:56,258][22180] Heartbeat connected on RolloutWorker_w6
[2023-02-26 05:53:56,835][22692] Signal inference workers to stop experience collection...
[2023-02-26 05:53:56,861][22705] InferenceWorker_p0-w0: stopping experience collection
[2023-02-26 05:53:57,668][22692] Signal inference workers to resume experience collection...
[2023-02-26 05:53:57,670][22705] InferenceWorker_p0-w0: resuming experience collection
[2023-02-26 05:53:57,677][22692] Stopping Batcher_0...
[2023-02-26 05:53:57,678][22692] Loop batcher_evt_loop terminating...
[2023-02-26 05:53:57,677][22180] Component Batcher_0 stopped!
[2023-02-26 05:53:57,723][22180] Component RolloutWorker_w4 stopped!
[2023-02-26 05:53:57,724][22710] Stopping RolloutWorker_w4...
[2023-02-26 05:53:57,737][22713] Stopping RolloutWorker_w6...
[2023-02-26 05:53:57,741][22707] Stopping RolloutWorker_w0...
[2023-02-26 05:53:57,737][22180] Component RolloutWorker_w6 stopped!
[2023-02-26 05:53:57,744][22180] Component RolloutWorker_w0 stopped!
[2023-02-26 05:53:57,749][22709] Stopping RolloutWorker_w2...
[2023-02-26 05:53:57,751][22180] Component RolloutWorker_w2 stopped!
[2023-02-26 05:53:57,735][22710] Loop rollout_proc4_evt_loop terminating...
[2023-02-26 05:53:57,742][22713] Loop rollout_proc6_evt_loop terminating...
[2023-02-26 05:53:57,743][22707] Loop rollout_proc0_evt_loop terminating...
[2023-02-26 05:53:57,763][22180] Component RolloutWorker_w3 stopped!
[2023-02-26 05:53:57,766][22711] Stopping RolloutWorker_w3...
[2023-02-26 05:53:57,766][22711] Loop rollout_proc3_evt_loop terminating...
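
The "Decorrelating experience for N frames" records show each rollout worker stepping its environments through a different warm-up budget (0, 32, 64, 96 frames) before real collection starts, so the eight parallel trajectories begin out of phase instead of all resetting in lockstep. A hedged, gym-style illustration of the idea (every name below is hypothetical; Sample Factory's actual scheme lives inside its rollout workers and may differ):

    def decorrelate(env, worker_idx: int, chunk: int = 32, num_chunks: int = 4):
        """Step `env` through a worker-specific warm-up so trajectories desynchronize."""
        warmup_frames = (worker_idx % num_chunks) * chunk  # 0, 32, 64 or 96 as in the log
        env.reset()
        for _ in range(warmup_frames):
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()
        return env
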
[2023-02-26 05:53:57,751][22709] Loop rollout_proc2_evt_loop terminating...
[2023-02-26 05:53:57,819][22180] Component RolloutWorker_w1 stopped!
[2023-02-26 05:53:57,827][22712] Stopping RolloutWorker_w5...
[2023-02-26 05:53:57,828][22712] Loop rollout_proc5_evt_loop terminating...
[2023-02-26 05:53:57,824][22705] Weights refcount: 2 0
[2023-02-26 05:53:57,827][22180] Component RolloutWorker_w5 stopped!
[2023-02-26 05:53:57,836][22708] Stopping RolloutWorker_w1...
[2023-02-26 05:53:57,837][22708] Loop rollout_proc1_evt_loop terminating...
[2023-02-26 05:53:57,838][22705] Stopping InferenceWorker_p0-w0...
[2023-02-26 05:53:57,839][22705] Loop inference_proc0-0_evt_loop terminating...
[2023-02-26 05:53:57,842][22180] Component InferenceWorker_p0-w0 stopped!
[2023-02-26 05:53:57,849][22714] Stopping RolloutWorker_w7...
[2023-02-26 05:53:57,850][22714] Loop rollout_proc7_evt_loop terminating...
[2023-02-26 05:53:57,849][22180] Component RolloutWorker_w7 stopped!
[2023-02-26 05:54:00,257][22692] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2023-02-26 05:54:00,351][22692] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000900_3686400.pth
[2023-02-26 05:54:00,365][22692] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2023-02-26 05:54:00,569][22692] Stopping LearnerWorker_p0...
[2023-02-26 05:54:00,570][22692] Loop learner_proc0_evt_loop terminating...
[2023-02-26 05:54:00,569][22180] Component LearnerWorker_p0 stopped!
[2023-02-26 05:54:00,574][22180] Waiting for process learner_proc0 to stop...
[2023-02-26 05:54:02,363][22180] Waiting for process inference_proc0-0 to join...
[2023-02-26 05:54:02,365][22180] Waiting for process rollout_proc0 to join...
[2023-02-26 05:54:02,534][22180] Waiting for process rollout_proc1 to join...
[2023-02-26 05:54:02,537][22180] Waiting for process rollout_proc2 to join...
[2023-02-26 05:54:02,538][22180] Waiting for process rollout_proc3 to join...
[2023-02-26 05:54:02,540][22180] Waiting for process rollout_proc4 to join...
[2023-02-26 05:54:02,541][22180] Waiting for process rollout_proc5 to join...
[2023-02-26 05:54:02,545][22180] Waiting for process rollout_proc6 to join...
[2023-02-26 05:54:02,551][22180] Waiting for process rollout_proc7 to join...
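
Note the rotation at shutdown: a fresh checkpoint_000000980_4014080.pth is written and the oldest file (checkpoint_000000900_3686400.pth) is deleted, keeping only the newest few checkpoints on disk. A sketch of that policy (rotate_checkpoints is a hypothetical helper; the zero-padded names make a lexicographic sort equal to a train-step sort):

    from pathlib import Path

    def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
        """Delete all but the `keep` newest checkpoint files in `ckpt_dir`."""
        ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
        for stale in ckpts[:-keep]:
            print(f"Removing {stale}")
            stale.unlink()
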
[2023-02-26 05:54:02,552][22180] Batcher 0 profile tree view:
batching: 0.0406, releasing_batches: 0.0021
[2023-02-26 05:54:02,556][22180] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 7.5226
update_model: 0.0194
  weight_update: 0.0014
one_step: 0.1057
  handle_policy_step: 3.8753
    deserialize: 0.0635, stack: 0.0120, obs_to_device_normalize: 0.3484, forward: 2.9781, send_messages: 0.0904
    prepare_outputs: 0.2916
      to_cpu: 0.1864
[2023-02-26 05:54:02,559][22180] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 6.1408
train: 0.6247
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0003, kl_divergence: 0.0005, after_optimizer: 0.0038
  calculate_losses: 0.1501
    losses_init: 0.0000, forward_head: 0.1140, bptt_initial: 0.0205, tail: 0.0066, advantages_returns: 0.0010, losses: 0.0026
    bptt: 0.0026
      bptt_forward_core: 0.0025
  update: 0.4691
    clip: 0.0025
[2023-02-26 05:54:02,560][22180] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.9936, env_step: 3.5238, overhead: 0.1329, complete_rollouts: 0.0135
save_policy_outputs: 0.0759
  split_output_tensors: 0.0506
[2023-02-26 05:54:02,566][22180] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.5493, env_step: 1.7411, overhead: 0.0854, complete_rollouts: 0.0005
save_policy_outputs: 0.0391
  split_output_tensors: 0.0243
[2023-02-26 05:54:02,569][22180] Loop Runner_EvtLoop terminating...
[2023-02-26 05:54:02,571][22180] Runner profile tree view:
main_loop: 35.1144
[2023-02-26 05:54:02,574][22180] Collected {0: 4014080}, FPS: 233.3
[2023-02-26 05:54:02,605][22180] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-26 05:54:02,607][22180] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-26 05:54:02,608][22180] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-26 05:54:02,610][22180] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-26 05:54:02,615][22180] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-26 05:54:02,618][22180] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-26 05:54:02,622][22180] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-26 05:54:02,623][22180] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-26 05:54:02,625][22180] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-26 05:54:02,627][22180] Adding new argument 'hf_repository'='jxiao/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-26 05:54:02,631][22180] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-26 05:54:02,634][22180] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-26 05:54:02,636][22180] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-26 05:54:02,640][22180] Adding new argument 'enjoy_script'=None that is not in the saved config file!
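
The run now switches from training to evaluation: the saved config is reloaded and the enjoy-specific arguments (no_render, save_video, push_to_hub, hf_repository, ...) are layered on top. Below is a hedged reconstruction of the step that produces this, in the style of the Hugging Face Deep RL course notebook; the sample_factory/sf_examples module paths and the parse_vizdoom_cfg helper are assumptions taken from that notebook, not verified core API:

    import functools
    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy
    from sample_factory.envs.env_utils import register_env
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec

    def register_vizdoom_envs():
        # Make the Doom scenarios known to Sample Factory's env registry.
        for spec in DOOM_ENVS:
            register_env(spec.name, functools.partial(make_doom_env_from_spec, spec))

    def parse_vizdoom_cfg(argv=None, evaluation=False):
        # Course-notebook helper: base SF args + Doom-specific args and defaults.
        parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
        add_doom_env_args(parser)
        doom_override_defaults(parser)
        return parse_full_cfg(parser, argv)

    register_vizdoom_envs()
    cfg = parse_vizdoom_cfg(
        argv=[
            "--env=doom_health_gathering_supreme",
            "--train_dir=/content/train_dir",
            "--experiment=default_experiment",
            "--num_workers=1",      # the override logged above
            "--no_render",
            "--save_video",
            "--max_num_frames=100000",
            "--max_num_episodes=10",
            "--push_to_hub",
            "--hf_repository=jxiao/rl_course_vizdoom_health_gathering_supreme",
        ],
        evaluation=True,
    )
    enjoy(cfg)
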
[2023-02-26 05:54:02,642][22180] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-26 05:54:02,665][22180] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 05:54:02,668][22180] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 05:54:02,670][22180] RunningMeanStd input shape: (1,)
[2023-02-26 05:54:02,688][22180] ConvEncoder: input_channels=3
[2023-02-26 05:54:03,408][22180] Conv encoder output size: 512
[2023-02-26 05:54:03,410][22180] Policy head output size: 512
[2023-02-26 05:54:06,364][22180] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2023-02-26 05:54:07,587][22180] Num frames 100...
[2023-02-26 05:54:07,698][22180] Num frames 200...
[2023-02-26 05:54:07,808][22180] Num frames 300...
[2023-02-26 05:54:07,920][22180] Num frames 400...
[2023-02-26 05:54:08,040][22180] Num frames 500...
[2023-02-26 05:54:08,156][22180] Num frames 600...
[2023-02-26 05:54:08,275][22180] Num frames 700...
[2023-02-26 05:54:08,385][22180] Num frames 800...
[2023-02-26 05:54:08,495][22180] Num frames 900...
[2023-02-26 05:54:08,587][22180] Avg episode rewards: #0: 17.280, true rewards: #0: 9.280
[2023-02-26 05:54:08,589][22180] Avg episode reward: 17.280, avg true_objective: 9.280
[2023-02-26 05:54:08,672][22180] Num frames 1000...
[2023-02-26 05:54:08,785][22180] Num frames 1100...
[2023-02-26 05:54:08,901][22180] Num frames 1200...
[2023-02-26 05:54:09,009][22180] Num frames 1300...
[2023-02-26 05:54:09,116][22180] Num frames 1400...
[2023-02-26 05:54:09,230][22180] Num frames 1500...
[2023-02-26 05:54:09,349][22180] Num frames 1600...
[2023-02-26 05:54:09,459][22180] Num frames 1700...
[2023-02-26 05:54:09,581][22180] Num frames 1800...
[2023-02-26 05:54:09,697][22180] Num frames 1900...
[2023-02-26 05:54:09,810][22180] Num frames 2000...
[2023-02-26 05:54:09,921][22180] Num frames 2100...
[2023-02-26 05:54:10,031][22180] Num frames 2200...
[2023-02-26 05:54:10,145][22180] Num frames 2300...
[2023-02-26 05:54:10,205][22180] Avg episode rewards: #0: 23.520, true rewards: #0: 11.520
[2023-02-26 05:54:10,207][22180] Avg episode reward: 23.520, avg true_objective: 11.520
[2023-02-26 05:54:10,319][22180] Num frames 2400...
[2023-02-26 05:54:10,429][22180] Num frames 2500...
[2023-02-26 05:54:10,541][22180] Num frames 2600...
[2023-02-26 05:54:10,657][22180] Num frames 2700...
[2023-02-26 05:54:10,774][22180] Num frames 2800...
[2023-02-26 05:54:10,883][22180] Avg episode rewards: #0: 18.494, true rewards: #0: 9.493
[2023-02-26 05:54:10,885][22180] Avg episode reward: 18.494, avg true_objective: 9.493
[2023-02-26 05:54:10,947][22180] Num frames 2900...
[2023-02-26 05:54:11,063][22180] Num frames 3000...
[2023-02-26 05:54:11,181][22180] Num frames 3100...
[2023-02-26 05:54:11,294][22180] Num frames 3200...
[2023-02-26 05:54:11,407][22180] Num frames 3300...
[2023-02-26 05:54:11,517][22180] Num frames 3400...
[2023-02-26 05:54:11,637][22180] Num frames 3500...
[2023-02-26 05:54:11,781][22180] Avg episode rewards: #0: 17.645, true rewards: #0: 8.895
[2023-02-26 05:54:11,783][22180] Avg episode reward: 17.645, avg true_objective: 8.895
[2023-02-26 05:54:11,851][22180] Num frames 3600...
[2023-02-26 05:54:12,013][22180] Num frames 3700...
[2023-02-26 05:54:12,163][22180] Num frames 3800...
[2023-02-26 05:54:12,317][22180] Num frames 3900...
[2023-02-26 05:54:12,471][22180] Num frames 4000...
[2023-02-26 05:54:12,633][22180] Num frames 4100...
[2023-02-26 05:54:12,787][22180] Num frames 4200...
[2023-02-26 05:54:12,943][22180] Num frames 4300...
[2023-02-26 05:54:13,096][22180] Num frames 4400...
[2023-02-26 05:54:13,255][22180] Num frames 4500...
[2023-02-26 05:54:13,412][22180] Num frames 4600...
[2023-02-26 05:54:13,573][22180] Num frames 4700...
[2023-02-26 05:54:13,736][22180] Num frames 4800...
[2023-02-26 05:54:13,895][22180] Num frames 4900...
[2023-02-26 05:54:14,013][22180] Avg episode rewards: #0: 19.468, true rewards: #0: 9.868
[2023-02-26 05:54:14,016][22180] Avg episode reward: 19.468, avg true_objective: 9.868
[2023-02-26 05:54:14,124][22180] Num frames 5000...
[2023-02-26 05:54:14,296][22180] Num frames 5100...
[2023-02-26 05:54:14,453][22180] Num frames 5200...
[2023-02-26 05:54:14,611][22180] Num frames 5300...
[2023-02-26 05:54:14,778][22180] Num frames 5400...
[2023-02-26 05:54:14,939][22180] Num frames 5500...
[2023-02-26 05:54:15,100][22180] Num frames 5600...
[2023-02-26 05:54:15,168][22180] Avg episode rewards: #0: 18.010, true rewards: #0: 9.343
[2023-02-26 05:54:15,171][22180] Avg episode reward: 18.010, avg true_objective: 9.343
[2023-02-26 05:54:15,303][22180] Num frames 5700...
[2023-02-26 05:54:15,425][22180] Num frames 5800...
[2023-02-26 05:54:15,568][22180] Num frames 5900...
[2023-02-26 05:54:15,691][22180] Num frames 6000...
[2023-02-26 05:54:15,819][22180] Num frames 6100...
[2023-02-26 05:54:15,967][22180] Avg episode rewards: #0: 16.689, true rewards: #0: 8.831
[2023-02-26 05:54:15,969][22180] Avg episode reward: 16.689, avg true_objective: 8.831
[2023-02-26 05:54:15,996][22180] Num frames 6200...
[2023-02-26 05:54:16,109][22180] Num frames 6300...
[2023-02-26 05:54:16,232][22180] Num frames 6400...
[2023-02-26 05:54:16,345][22180] Num frames 6500...
[2023-02-26 05:54:16,454][22180] Num frames 6600...
[2023-02-26 05:54:16,564][22180] Num frames 6700...
[2023-02-26 05:54:16,679][22180] Num frames 6800...
[2023-02-26 05:54:16,793][22180] Num frames 6900...
[2023-02-26 05:54:16,906][22180] Num frames 7000...
[2023-02-26 05:54:17,022][22180] Num frames 7100...
[2023-02-26 05:54:17,132][22180] Num frames 7200...
[2023-02-26 05:54:17,268][22180] Avg episode rewards: #0: 18.088, true rewards: #0: 9.087
[2023-02-26 05:54:17,270][22180] Avg episode reward: 18.088, avg true_objective: 9.087
[2023-02-26 05:54:17,311][22180] Num frames 7300...
[2023-02-26 05:54:17,437][22180] Num frames 7400...
[2023-02-26 05:54:17,548][22180] Num frames 7500...
[2023-02-26 05:54:17,660][22180] Num frames 7600...
[2023-02-26 05:54:17,776][22180] Num frames 7700...
[2023-02-26 05:54:17,888][22180] Num frames 7800...
[2023-02-26 05:54:18,003][22180] Num frames 7900...
[2023-02-26 05:54:18,116][22180] Num frames 8000...
[2023-02-26 05:54:18,238][22180] Num frames 8100...
[2023-02-26 05:54:18,358][22180] Num frames 8200...
[2023-02-26 05:54:18,456][22180] Avg episode rewards: #0: 18.367, true rewards: #0: 9.144
[2023-02-26 05:54:18,457][22180] Avg episode reward: 18.367, avg true_objective: 9.144
[2023-02-26 05:54:18,545][22180] Num frames 8300...
[2023-02-26 05:54:18,668][22180] Num frames 8400...
[2023-02-26 05:54:18,812][22180] Num frames 8500...
[2023-02-26 05:54:18,946][22180] Num frames 8600...
[2023-02-26 05:54:19,062][22180] Num frames 8700...
[2023-02-26 05:54:19,187][22180] Num frames 8800...
[2023-02-26 05:54:19,302][22180] Num frames 8900...
[2023-02-26 05:54:19,427][22180] Num frames 9000...
[2023-02-26 05:54:19,540][22180] Num frames 9100...
[2023-02-26 05:54:19,663][22180] Num frames 9200...
[2023-02-26 05:54:19,790][22180] Num frames 9300...
[2023-02-26 05:54:19,909][22180] Num frames 9400...
[2023-02-26 05:54:20,032][22180] Num frames 9500...
[2023-02-26 05:54:20,154][22180] Num frames 9600...
[2023-02-26 05:54:20,284][22180] Num frames 9700...
[2023-02-26 05:54:20,415][22180] Num frames 9800...
[2023-02-26 05:54:20,536][22180] Num frames 9900...
[2023-02-26 05:54:20,655][22180] Num frames 10000...
[2023-02-26 05:54:20,770][22180] Num frames 10100...
[2023-02-26 05:54:20,895][22180] Num frames 10200...
[2023-02-26 05:54:21,027][22180] Num frames 10300...
[2023-02-26 05:54:21,097][22180] Avg episode rewards: #0: 22.410, true rewards: #0: 10.310
[2023-02-26 05:54:21,100][22180] Avg episode reward: 22.410, avg true_objective: 10.310
[2023-02-26 05:55:27,492][22180] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
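
Each "Avg episode rewards" record above is a running mean over the episodes finished so far, reported both for the raw (shaped) return and for the "true" objective the scenario actually measures; after the ten episodes requested by max_num_episodes, the script lands at 22.410 raw / 10.310 true and writes replay.mp4. A small sketch of that bookkeeping, with the second episode's returns back-solved from the logged averages (report is a hypothetical helper):

    episode_rewards, true_rewards = [], []

    def report(raw: float, true: float) -> str:
        """Append one finished episode and return the cumulative-average line."""
        episode_rewards.append(raw)
        true_rewards.append(true)
        avg = sum(episode_rewards) / len(episode_rewards)
        avg_true = sum(true_rewards) / len(true_rewards)
        return f"Avg episode rewards: #0: {avg:.3f}, true rewards: #0: {avg_true:.3f}"

    print(report(17.28, 9.28))   # episode 1, matches the log exactly
    print(report(29.76, 13.76))  # implies episode 2's logged averages 23.520 / 11.520
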