[2023-03-08 17:29:41,754][01803] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-03-08 17:29:41,755][01803] Rollout worker 0 uses device cpu
[2023-03-08 17:29:41,757][01803] Rollout worker 1 uses device cpu
[2023-03-08 17:29:41,758][01803] Rollout worker 2 uses device cpu
[2023-03-08 17:29:41,759][01803] Rollout worker 3 uses device cpu
[2023-03-08 17:29:41,760][01803] Rollout worker 4 uses device cpu
[2023-03-08 17:29:41,761][01803] Rollout worker 5 uses device cpu
[2023-03-08 17:29:41,763][01803] Rollout worker 6 uses device cpu
[2023-03-08 17:29:41,765][01803] Rollout worker 7 uses device cpu
[2023-03-08 17:29:42,091][01803] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-08 17:29:42,095][01803] InferenceWorker_p0-w0: min num requests: 2
[2023-03-08 17:29:42,135][01803] Starting all processes...
[2023-03-08 17:29:42,140][01803] Starting process learner_proc0
[2023-03-08 17:29:42,213][01803] Starting all processes...
[2023-03-08 17:29:42,225][01803] Starting process inference_proc0-0
[2023-03-08 17:29:42,225][01803] Starting process rollout_proc0
[2023-03-08 17:29:42,230][01803] Starting process rollout_proc1
[2023-03-08 17:29:42,230][01803] Starting process rollout_proc2
[2023-03-08 17:29:42,230][01803] Starting process rollout_proc3
[2023-03-08 17:29:42,230][01803] Starting process rollout_proc4
[2023-03-08 17:29:42,230][01803] Starting process rollout_proc5
[2023-03-08 17:29:42,230][01803] Starting process rollout_proc6
[2023-03-08 17:29:42,230][01803] Starting process rollout_proc7
[2023-03-08 17:29:51,395][14299] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-08 17:29:51,399][14299] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-03-08 17:29:51,412][14312] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-08 17:29:51,413][14312] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-03-08 17:29:51,645][14315] Worker 1 uses CPU cores [1]
[2023-03-08 17:29:51,706][14317] Worker 3 uses CPU cores [1]
[2023-03-08 17:29:51,774][14320] Worker 6 uses CPU cores [0]
[2023-03-08 17:29:51,877][14316] Worker 2 uses CPU cores [0]
[2023-03-08 17:29:51,928][14318] Worker 4 uses CPU cores [0]
[2023-03-08 17:29:51,997][14313] Worker 0 uses CPU cores [0]
[2023-03-08 17:29:52,104][14319] Worker 5 uses CPU cores [1]
[2023-03-08 17:29:52,106][14321] Worker 7 uses CPU cores [1]
[2023-03-08 17:29:52,208][14299] Num visible devices: 1
[2023-03-08 17:29:52,209][14312] Num visible devices: 1
[2023-03-08 17:29:52,214][14299] Starting seed is not provided
[2023-03-08 17:29:52,215][14299] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-08 17:29:52,215][14299] Initializing actor-critic model on device cuda:0
[2023-03-08 17:29:52,216][14299] RunningMeanStd input shape: (3, 72, 128)
[2023-03-08 17:29:52,218][14299] RunningMeanStd input shape: (1,)
[2023-03-08 17:29:52,236][14299] ConvEncoder: input_channels=3
[2023-03-08 17:29:52,560][14299] Conv encoder output size: 512
[2023-03-08 17:29:52,560][14299] Policy head output size: 512
[2023-03-08 17:29:52,617][14299] Created Actor Critic model with architecture:
[2023-03-08 17:29:52,617][14299] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-03-08 17:30:01,042][14299] Using optimizer
[2023-03-08 17:30:01,044][14299] No checkpoints found
[2023-03-08 17:30:01,044][14299] Did not load from checkpoint, starting from scratch!
[2023-03-08 17:30:01,044][14299] Initialized policy 0 weights for model version 0
[2023-03-08 17:30:01,049][14299] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-08 17:30:01,058][14299] LearnerWorker_p0 finished initialization!
[2023-03-08 17:30:01,244][14312] RunningMeanStd input shape: (3, 72, 128)
[2023-03-08 17:30:01,245][14312] RunningMeanStd input shape: (1,)
[2023-03-08 17:30:01,257][14312] ConvEncoder: input_channels=3
[2023-03-08 17:30:01,354][14312] Conv encoder output size: 512
[2023-03-08 17:30:01,355][14312] Policy head output size: 512
[2023-03-08 17:30:01,632][01803] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-08 17:30:02,081][01803] Heartbeat connected on Batcher_0
[2023-03-08 17:30:02,087][01803] Heartbeat connected on LearnerWorker_p0
[2023-03-08 17:30:02,102][01803] Heartbeat connected on RolloutWorker_w0
[2023-03-08 17:30:02,111][01803] Heartbeat connected on RolloutWorker_w1
[2023-03-08 17:30:02,114][01803] Heartbeat connected on RolloutWorker_w2
[2023-03-08 17:30:02,120][01803] Heartbeat connected on RolloutWorker_w3
[2023-03-08 17:30:02,122][01803] Heartbeat connected on RolloutWorker_w4
[2023-03-08 17:30:02,126][01803] Heartbeat connected on RolloutWorker_w5
[2023-03-08 17:30:02,132][01803] Heartbeat connected on RolloutWorker_w6
[2023-03-08 17:30:02,136][01803] Heartbeat connected on RolloutWorker_w7
[2023-03-08 17:30:03,686][01803] Inference worker 0-0 is ready!
[2023-03-08 17:30:03,688][01803] All inference workers are ready! Signal rollout workers to start!
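The architecture printed above is a shared-weights actor-critic: three Conv2d/ELU stages feeding a Linear/ELU layer with 512 outputs, a GRU(512, 512) recurrent core, a 1-unit critic head, and a 5-logit action head, with RunningMeanStd modules normalizing observations and returns in place. A minimal PyTorch sketch of the same shape, assuming the (3, 72, 128) observations and 5 discrete actions from this log; the conv filter sizes and strides are assumptions, since the repr does not record them, and the normalizers are omitted:

```python
import torch
from torch import nn

class SharedActorCritic(nn.Module):
    """Rough sketch of the logged ActorCriticSharedWeights model.

    Conv filter sizes/strides are assumptions (the printed repr hides them);
    the 512-dim encoder output, GRU(512, 512) core, 1-dim critic head and
    5-way action head match the printed architecture.
    """

    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        c = obs_shape[0]
        self.conv_head = nn.Sequential(  # (0)-(5): three Conv2d + ELU pairs
            nn.Conv2d(c, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)           # ModelCoreRNN
        self.critic_linear = nn.Linear(hidden, 1)    # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # policy head

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, new_state = self.core(x.unsqueeze(0), rnn_state)  # seq len 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), new_state
```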
[2023-03-08 17:30:03,693][01803] Heartbeat connected on InferenceWorker_p0-w0
[2023-03-08 17:30:03,816][14315] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:03,823][14317] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:03,828][14318] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:03,835][14316] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:03,843][14319] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:03,853][14321] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:03,863][14320] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:03,874][14313] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:30:04,707][14318] Decorrelating experience for 0 frames...
[2023-03-08 17:30:04,709][14316] Decorrelating experience for 0 frames...
[2023-03-08 17:30:04,983][14319] Decorrelating experience for 0 frames...
[2023-03-08 17:30:04,986][14315] Decorrelating experience for 0 frames...
[2023-03-08 17:30:04,991][14317] Decorrelating experience for 0 frames...
[2023-03-08 17:30:05,629][14316] Decorrelating experience for 32 frames...
[2023-03-08 17:30:05,646][14318] Decorrelating experience for 32 frames...
[2023-03-08 17:30:05,669][14320] Decorrelating experience for 0 frames...
[2023-03-08 17:30:06,021][14313] Decorrelating experience for 0 frames...
[2023-03-08 17:30:06,295][14315] Decorrelating experience for 32 frames...
[2023-03-08 17:30:06,427][14319] Decorrelating experience for 32 frames...
[2023-03-08 17:30:06,632][01803] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-08 17:30:06,859][14319] Decorrelating experience for 64 frames...
[2023-03-08 17:30:07,026][14320] Decorrelating experience for 32 frames...
[2023-03-08 17:30:07,298][14316] Decorrelating experience for 64 frames...
[2023-03-08 17:30:07,482][14318] Decorrelating experience for 64 frames...
[2023-03-08 17:30:07,486][14313] Decorrelating experience for 32 frames...
[2023-03-08 17:30:07,883][14319] Decorrelating experience for 96 frames...
[2023-03-08 17:30:08,307][14315] Decorrelating experience for 64 frames...
[2023-03-08 17:30:08,626][14321] Decorrelating experience for 0 frames...
[2023-03-08 17:30:09,094][14315] Decorrelating experience for 96 frames...
[2023-03-08 17:30:09,388][14317] Decorrelating experience for 32 frames...
[2023-03-08 17:30:09,751][14320] Decorrelating experience for 64 frames...
[2023-03-08 17:30:10,024][14317] Decorrelating experience for 64 frames...
[2023-03-08 17:30:10,297][14318] Decorrelating experience for 96 frames...
[2023-03-08 17:30:10,544][14313] Decorrelating experience for 64 frames...
[2023-03-08 17:30:10,945][14321] Decorrelating experience for 32 frames...
[2023-03-08 17:30:11,585][14321] Decorrelating experience for 64 frames...
[2023-03-08 17:30:11,632][01803] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-08 17:30:12,072][14321] Decorrelating experience for 96 frames...
[2023-03-08 17:30:12,339][14316] Decorrelating experience for 96 frames...
[2023-03-08 17:30:12,569][14320] Decorrelating experience for 96 frames...
[2023-03-08 17:30:12,938][14317] Decorrelating experience for 96 frames...
[2023-03-08 17:30:13,121][14313] Decorrelating experience for 96 frames...
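The "Decorrelating experience" entries show each of the eight rollout workers burning warm-up frames in 32-frame stages (0, 32, 64, 96) at staggered times before real collection starts, so the workers' episodes are not aligned in lock-step. A minimal sketch of the idea, assuming a Gymnasium-style `env`; the function and its stage counts are illustrative, not Sample Factory's actual API:

```python
import random

def decorrelate_experience(env, num_stages=4, frames_per_stage=32):
    """Warm-up loop in the spirit of the 'Decorrelating experience' entries.

    Illustrative only (not Sample Factory's real implementation): each rollout
    worker steps its environment with random actions for a few 32-frame
    stages, logging before each stage, so trajectories start out of phase.
    """
    obs, _ = env.reset(seed=random.randrange(2**31))
    for stage in range(num_stages):
        print(f"Decorrelating experience for {stage * frames_per_stage} frames...")
        for _ in range(frames_per_stage):
            obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                obs, _ = env.reset()
    return obs
```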
[2023-03-08 17:30:16,634][01803] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 82.5. Samples: 1238. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-08 17:30:16,636][01803] Avg episode reward: [(0, '1.626')]
[2023-03-08 17:30:17,441][14299] Signal inference workers to stop experience collection...
[2023-03-08 17:30:17,460][14312] InferenceWorker_p0-w0: stopping experience collection
[2023-03-08 17:30:20,117][14299] Signal inference workers to resume experience collection...
[2023-03-08 17:30:20,117][14312] InferenceWorker_p0-w0: resuming experience collection
[2023-03-08 17:30:21,632][01803] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 125.8. Samples: 2516. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-03-08 17:30:21,634][01803] Avg episode reward: [(0, '2.681')]
[2023-03-08 17:30:26,635][01803] Fps is (10 sec: 2867.0, 60 sec: 1146.8, 300 sec: 1146.8). Total num frames: 28672. Throughput: 0: 265.0. Samples: 6626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:30:26,641][01803] Avg episode reward: [(0, '3.931')]
[2023-03-08 17:30:30,336][14312] Updated weights for policy 0, policy_version 10 (0.0381)
[2023-03-08 17:30:31,632][01803] Fps is (10 sec: 4096.0, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 45056. Throughput: 0: 370.5. Samples: 11114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:30:31,639][01803] Avg episode reward: [(0, '4.428')]
[2023-03-08 17:30:36,632][01803] Fps is (10 sec: 3277.7, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 383.9. Samples: 13436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:30:36,640][01803] Avg episode reward: [(0, '4.483')]
[2023-03-08 17:30:40,591][14312] Updated weights for policy 0, policy_version 20 (0.0029)
[2023-03-08 17:30:41,632][01803] Fps is (10 sec: 4096.0, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 493.7. Samples: 19748. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-08 17:30:41,636][01803] Avg episode reward: [(0, '4.375')]
[2023-03-08 17:30:46,632][01803] Fps is (10 sec: 4505.6, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 600.5. Samples: 27022. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-03-08 17:30:46,636][01803] Avg episode reward: [(0, '4.357')]
[2023-03-08 17:30:46,646][14299] Saving new best policy, reward=4.357!
[2023-03-08 17:30:51,216][14312] Updated weights for policy 0, policy_version 30 (0.0021)
[2023-03-08 17:30:51,634][01803] Fps is (10 sec: 3685.7, 60 sec: 2457.5, 300 sec: 2457.5). Total num frames: 122880. Throughput: 0: 651.0. Samples: 29296. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-03-08 17:30:51,637][01803] Avg episode reward: [(0, '4.509')]
[2023-03-08 17:30:51,640][14299] Saving new best policy, reward=4.509!
[2023-03-08 17:30:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 2532.1, 300 sec: 2532.1). Total num frames: 139264. Throughput: 0: 748.1. Samples: 33664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:30:56,634][01803] Avg episode reward: [(0, '4.496')]
[2023-03-08 17:31:01,605][14312] Updated weights for policy 0, policy_version 40 (0.0014)
[2023-03-08 17:31:01,632][01803] Fps is (10 sec: 4096.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 163840. Throughput: 0: 873.1. Samples: 40526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:31:01,634][01803] Avg episode reward: [(0, '4.422')]
[2023-03-08 17:31:06,635][01803] Fps is (10 sec: 4504.2, 60 sec: 3071.8, 300 sec: 2835.6). Total num frames: 184320. Throughput: 0: 923.4. Samples: 44072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:31:06,639][01803] Avg episode reward: [(0, '4.442')]
[2023-03-08 17:31:11,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 947.2. Samples: 49246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:31:11,638][01803] Avg episode reward: [(0, '4.369')]
[2023-03-08 17:31:13,017][14312] Updated weights for policy 0, policy_version 50 (0.0012)
[2023-03-08 17:31:16,632][01803] Fps is (10 sec: 3277.8, 60 sec: 3618.3, 300 sec: 2894.5). Total num frames: 217088. Throughput: 0: 949.8. Samples: 53854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:31:16,639][01803] Avg episode reward: [(0, '4.362')]
[2023-03-08 17:31:21,632][01803] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 2867.2). Total num frames: 229376. Throughput: 0: 952.8. Samples: 56312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:31:21,635][01803] Avg episode reward: [(0, '4.388')]
[2023-03-08 17:31:26,174][14312] Updated weights for policy 0, policy_version 60 (0.0051)
[2023-03-08 17:31:26,632][01803] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 2891.3). Total num frames: 245760. Throughput: 0: 915.0. Samples: 60922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:31:26,636][01803] Avg episode reward: [(0, '4.365')]
[2023-03-08 17:31:31,632][01803] Fps is (10 sec: 2867.0, 60 sec: 3549.8, 300 sec: 2867.2). Total num frames: 258048. Throughput: 0: 852.5. Samples: 65384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:31:31,638][01803] Avg episode reward: [(0, '4.337')]
[2023-03-08 17:31:36,633][01803] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 2931.8). Total num frames: 278528. Throughput: 0: 851.7. Samples: 67622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:31:36,637][01803] Avg episode reward: [(0, '4.253')]
[2023-03-08 17:31:36,644][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth...
[2023-03-08 17:31:38,172][14312] Updated weights for policy 0, policy_version 70 (0.0043)
[2023-03-08 17:31:41,632][01803] Fps is (10 sec: 4505.9, 60 sec: 3618.1, 300 sec: 3031.0). Total num frames: 303104. Throughput: 0: 895.6. Samples: 73968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:31:41,640][01803] Avg episode reward: [(0, '4.351')]
[2023-03-08 17:31:46,632][01803] Fps is (10 sec: 4506.1, 60 sec: 3618.1, 300 sec: 3081.8). Total num frames: 323584. Throughput: 0: 902.1. Samples: 81120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:31:46,634][01803] Avg episode reward: [(0, '4.592')]
[2023-03-08 17:31:46,647][14299] Saving new best policy, reward=4.592!
[2023-03-08 17:31:47,685][14312] Updated weights for policy 0, policy_version 80 (0.0012)
[2023-03-08 17:31:51,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3053.4). Total num frames: 335872. Throughput: 0: 872.3. Samples: 83322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:31:51,634][01803] Avg episode reward: [(0, '4.579')]
[2023-03-08 17:31:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3098.7). Total num frames: 356352. Throughput: 0: 859.0. Samples: 87902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:31:56,634][01803] Avg episode reward: [(0, '4.413')]
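Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports frame throughput averaged over three trailing windows, which is why the 300-second column climbs only slowly toward the 10-second column as the run warms up (e.g. 409.6 vs 204.8 above). A small sketch of that bookkeeping, assuming (timestamp, total_frames) samples like the ones logged every five seconds; the class and its names are illustrative, not Sample Factory's internals:

```python
import time
from collections import deque

class FpsTracker:
    """Trailing-window FPS over 10/60/300 s, in the spirit of the log lines."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, total_frames))
        # keep only as much history as the largest window can need
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self):
        now, frames = self.samples[-1]
        out = {}
        for w in self.windows:
            # oldest recorded sample that still falls inside this window
            past = next(((t, f) for t, f in self.samples if now - t <= w),
                        self.samples[-1])
            dt = now - past[0]
            out[w] = (frames - past[1]) / dt if dt > 0 else float("nan")
        return out  # e.g. {10: 409.6, 60: 204.8, 300: 204.8}

# usage: call tracker.record(total_env_frames) on every report tick,
# then tracker.fps(); the very first tick yields nan, as in the log.
```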
[2023-03-08 17:31:59,020][14312] Updated weights for policy 0, policy_version 90 (0.0016)
[2023-03-08 17:32:01,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3174.4). Total num frames: 380928. Throughput: 0: 907.7. Samples: 94700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:32:01,634][01803] Avg episode reward: [(0, '4.504')]
[2023-03-08 17:32:06,637][01803] Fps is (10 sec: 4503.3, 60 sec: 3618.0, 300 sec: 3211.1). Total num frames: 401408. Throughput: 0: 934.4. Samples: 98366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:32:06,639][01803] Avg episode reward: [(0, '4.557')]
[2023-03-08 17:32:08,580][14312] Updated weights for policy 0, policy_version 100 (0.0011)
[2023-03-08 17:32:11,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3213.8). Total num frames: 417792. Throughput: 0: 948.8. Samples: 103620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:32:11,636][01803] Avg episode reward: [(0, '4.396')]
[2023-03-08 17:32:16,632][01803] Fps is (10 sec: 3278.5, 60 sec: 3618.1, 300 sec: 3216.1). Total num frames: 434176. Throughput: 0: 952.1. Samples: 108226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:32:16,637][01803] Avg episode reward: [(0, '4.360')]
[2023-03-08 17:32:19,871][14312] Updated weights for policy 0, policy_version 110 (0.0025)
[2023-03-08 17:32:21,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 458752. Throughput: 0: 984.8. Samples: 111938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:32:21,640][01803] Avg episode reward: [(0, '4.733')]
[2023-03-08 17:32:21,644][14299] Saving new best policy, reward=4.733!
[2023-03-08 17:32:26,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3305.0). Total num frames: 479232. Throughput: 0: 1004.7. Samples: 119180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:32:26,640][01803] Avg episode reward: [(0, '4.750')]
[2023-03-08 17:32:26,648][14299] Saving new best policy, reward=4.750!
[2023-03-08 17:32:30,389][14312] Updated weights for policy 0, policy_version 120 (0.0023)
[2023-03-08 17:32:31,633][01803] Fps is (10 sec: 3276.4, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 491520. Throughput: 0: 946.0. Samples: 123690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:32:31,643][01803] Avg episode reward: [(0, '4.567')]
[2023-03-08 17:32:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3303.2). Total num frames: 512000. Throughput: 0: 949.1. Samples: 126030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:32:36,633][01803] Avg episode reward: [(0, '4.534')]
[2023-03-08 17:32:40,570][14312] Updated weights for policy 0, policy_version 130 (0.0025)
[2023-03-08 17:32:41,632][01803] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3353.6). Total num frames: 536576. Throughput: 0: 996.3. Samples: 132736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:32:41,634][01803] Avg episode reward: [(0, '4.644')]
[2023-03-08 17:32:46,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3376.1). Total num frames: 557056. Throughput: 0: 999.5. Samples: 139678. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:32:46,634][01803] Avg episode reward: [(0, '4.497')]
[2023-03-08 17:32:51,129][14312] Updated weights for policy 0, policy_version 140 (0.0016)
[2023-03-08 17:32:51,635][01803] Fps is (10 sec: 3685.3, 60 sec: 3959.3, 300 sec: 3373.1). Total num frames: 573440. Throughput: 0: 970.3. Samples: 142026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:32:51,639][01803] Avg episode reward: [(0, '4.334')]
[2023-03-08 17:32:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3370.4). Total num frames: 589824. Throughput: 0: 957.6. Samples: 146714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:32:56,639][01803] Avg episode reward: [(0, '4.459')]
[2023-03-08 17:33:01,125][14312] Updated weights for policy 0, policy_version 150 (0.0036)
[2023-03-08 17:33:01,632][01803] Fps is (10 sec: 4097.2, 60 sec: 3891.2, 300 sec: 3413.3). Total num frames: 614400. Throughput: 0: 1009.5. Samples: 153654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:33:01,634][01803] Avg episode reward: [(0, '4.232')]
[2023-03-08 17:33:06,633][01803] Fps is (10 sec: 4505.1, 60 sec: 3891.5, 300 sec: 3431.8). Total num frames: 634880. Throughput: 0: 1006.6. Samples: 157238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:33:06,636][01803] Avg episode reward: [(0, '4.382')]
[2023-03-08 17:33:11,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3427.7). Total num frames: 651264. Throughput: 0: 962.3. Samples: 162482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:33:11,638][01803] Avg episode reward: [(0, '4.683')]
[2023-03-08 17:33:12,358][14312] Updated weights for policy 0, policy_version 160 (0.0019)
[2023-03-08 17:33:16,632][01803] Fps is (10 sec: 3277.2, 60 sec: 3891.2, 300 sec: 3423.8). Total num frames: 667648. Throughput: 0: 968.9. Samples: 167288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:33:16,639][01803] Avg episode reward: [(0, '4.703')]
[2023-03-08 17:33:21,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3461.1). Total num frames: 692224. Throughput: 0: 997.1. Samples: 170898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:33:21,639][01803] Avg episode reward: [(0, '4.537')]
[2023-03-08 17:33:21,812][14312] Updated weights for policy 0, policy_version 170 (0.0019)
[2023-03-08 17:33:26,632][01803] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3496.6). Total num frames: 716800. Throughput: 0: 1009.1. Samples: 178146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:33:26,642][01803] Avg episode reward: [(0, '4.514')]
[2023-03-08 17:33:31,638][01803] Fps is (10 sec: 3684.2, 60 sec: 3959.2, 300 sec: 3471.7). Total num frames: 729088. Throughput: 0: 959.1. Samples: 182842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:33:31,640][01803] Avg episode reward: [(0, '4.440')]
[2023-03-08 17:33:33,495][14312] Updated weights for policy 0, policy_version 180 (0.0029)
[2023-03-08 17:33:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3486.4). Total num frames: 749568. Throughput: 0: 957.8. Samples: 185122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:33:36,634][01803] Avg episode reward: [(0, '4.465')]
[2023-03-08 17:33:36,644][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth...
[2023-03-08 17:33:41,632][01803] Fps is (10 sec: 4098.4, 60 sec: 3891.2, 300 sec: 3500.2). Total num frames: 770048. Throughput: 0: 999.0. Samples: 191670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:33:41,634][01803] Avg episode reward: [(0, '4.804')]
[2023-03-08 17:33:41,640][14299] Saving new best policy, reward=4.804!
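"Updated weights for policy 0, policy_version N" marks the inference worker pulling fresh learner weights (the number in parentheses is the update duration in seconds), and "Policy #0 lag" summarizes how many versions old the policy that produced each training sample was. A toy computation of those lag statistics, assuming each sample carries the policy_version it was collected under; the function name is illustrative:

```python
def policy_lag_stats(sample_versions, current_version):
    """min/avg/max version lag, like 'Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)'.

    sample_versions: the policy_version attached to each sample in the batch.
    current_version: the learner's latest policy_version.
    """
    lags = [current_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# e.g. a batch gathered while the learner moved from version 49 to 50:
print(policy_lag_stats([50, 50, 49, 50, 49], current_version=50))  # (0, 0.4, 1)
```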
[2023-03-08 17:33:42,702][14312] Updated weights for policy 0, policy_version 190 (0.0023)
[2023-03-08 17:33:46,636][01803] Fps is (10 sec: 4503.8, 60 sec: 3959.2, 300 sec: 3531.6). Total num frames: 794624. Throughput: 0: 996.2. Samples: 198488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:33:46,638][01803] Avg episode reward: [(0, '4.798')]
[2023-03-08 17:33:51,635][01803] Fps is (10 sec: 3685.3, 60 sec: 3891.2, 300 sec: 3508.3). Total num frames: 806912. Throughput: 0: 966.9. Samples: 200750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:33:51,637][01803] Avg episode reward: [(0, '4.776')]
[2023-03-08 17:33:54,643][14312] Updated weights for policy 0, policy_version 200 (0.0026)
[2023-03-08 17:33:56,632][01803] Fps is (10 sec: 3278.1, 60 sec: 3959.5, 300 sec: 3520.8). Total num frames: 827392. Throughput: 0: 951.5. Samples: 205298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:33:56,633][01803] Avg episode reward: [(0, '4.772')]
[2023-03-08 17:34:01,632][01803] Fps is (10 sec: 4097.2, 60 sec: 3891.2, 300 sec: 3532.8). Total num frames: 847872. Throughput: 0: 1003.5. Samples: 212444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:34:01,639][01803] Avg episode reward: [(0, '4.770')]
[2023-03-08 17:34:03,500][14312] Updated weights for policy 0, policy_version 210 (0.0012)
[2023-03-08 17:34:06,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 1001.1. Samples: 215948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:34:06,636][01803] Avg episode reward: [(0, '4.648')]
[2023-03-08 17:34:11,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3538.9). Total num frames: 884736. Throughput: 0: 953.2. Samples: 221042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:34:11,637][01803] Avg episode reward: [(0, '4.618')]
[2023-03-08 17:34:15,610][14312] Updated weights for policy 0, policy_version 220 (0.0023)
[2023-03-08 17:34:16,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 905216. Throughput: 0: 957.6. Samples: 225928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:34:16,634][01803] Avg episode reward: [(0, '4.634')]
[2023-03-08 17:34:21,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3560.4). Total num frames: 925696. Throughput: 0: 985.1. Samples: 229450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:34:21,640][01803] Avg episode reward: [(0, '4.744')]
[2023-03-08 17:34:24,230][14312] Updated weights for policy 0, policy_version 230 (0.0014)
[2023-03-08 17:34:26,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 1002.8. Samples: 236794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:34:26,634][01803] Avg episode reward: [(0, '4.738')]
[2023-03-08 17:34:31,637][01803] Fps is (10 sec: 4094.0, 60 sec: 3959.5, 300 sec: 3580.1). Total num frames: 966656. Throughput: 0: 954.8. Samples: 241456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:34:31,639][01803] Avg episode reward: [(0, '4.715')]
[2023-03-08 17:34:36,360][14312] Updated weights for policy 0, policy_version 240 (0.0018)
[2023-03-08 17:34:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 983040. Throughput: 0: 954.6. Samples: 243704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:34:36,638][01803] Avg episode reward: [(0, '4.839')]
[2023-03-08 17:34:36,646][14299] Saving new best policy, reward=4.839!
[2023-03-08 17:34:41,632][01803] Fps is (10 sec: 4098.1, 60 sec: 3959.5, 300 sec: 3598.6). Total num frames: 1007616. Throughput: 0: 1000.3. Samples: 250310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:34:41,637][01803] Avg episode reward: [(0, '4.636')]
[2023-03-08 17:34:44,933][14312] Updated weights for policy 0, policy_version 250 (0.0018)
[2023-03-08 17:34:46,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3891.5, 300 sec: 3607.4). Total num frames: 1028096. Throughput: 0: 995.6. Samples: 257248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:34:46,634][01803] Avg episode reward: [(0, '4.789')]
[2023-03-08 17:34:51,640][01803] Fps is (10 sec: 3274.2, 60 sec: 3890.9, 300 sec: 3587.4). Total num frames: 1040384. Throughput: 0: 967.8. Samples: 259508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:34:51,642][01803] Avg episode reward: [(0, '4.998')]
[2023-03-08 17:34:51,714][14299] Saving new best policy, reward=4.998!
[2023-03-08 17:34:56,632][01803] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 957.4. Samples: 264124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:34:56,634][01803] Avg episode reward: [(0, '4.777')]
[2023-03-08 17:34:56,951][14312] Updated weights for policy 0, policy_version 260 (0.0025)
[2023-03-08 17:35:01,632][01803] Fps is (10 sec: 4509.2, 60 sec: 3959.5, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 1010.8. Samples: 271412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:35:01,638][01803] Avg episode reward: [(0, '4.777')]
[2023-03-08 17:35:05,725][14312] Updated weights for policy 0, policy_version 270 (0.0012)
[2023-03-08 17:35:06,632][01803] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 1012.3. Samples: 275006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:35:06,641][01803] Avg episode reward: [(0, '5.211')]
[2023-03-08 17:35:06,653][14299] Saving new best policy, reward=5.211!
[2023-03-08 17:35:11,638][01803] Fps is (10 sec: 3684.2, 60 sec: 3959.1, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 958.2. Samples: 279920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:35:11,645][01803] Avg episode reward: [(0, '5.124')]
[2023-03-08 17:35:16,632][01803] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1142784. Throughput: 0: 972.0. Samples: 285192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:35:16,636][01803] Avg episode reward: [(0, '5.201')]
[2023-03-08 17:35:17,478][14312] Updated weights for policy 0, policy_version 280 (0.0017)
[2023-03-08 17:35:21,632][01803] Fps is (10 sec: 4098.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1163264. Throughput: 0: 1001.1. Samples: 288752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:35:21,634][01803] Avg episode reward: [(0, '5.324')]
[2023-03-08 17:35:21,655][14299] Saving new best policy, reward=5.324!
[2023-03-08 17:35:26,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1183744. Throughput: 0: 1012.6. Samples: 295878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:35:26,634][01803] Avg episode reward: [(0, '5.517')]
[2023-03-08 17:35:26,651][14299] Saving new best policy, reward=5.517!
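"Saving new best policy, reward=X!" fires whenever the running average episode reward exceeds the best value seen so far, so the best snapshot is kept separately from the periodic checkpoints. A compact sketch of that rule; the class, the file name, and the torch.save payload are illustrative, not Sample Factory's exact format:

```python
import torch

class BestPolicyKeeper:
    """Save a 'best' snapshot whenever the avg episode reward improves."""

    def __init__(self, path="best_policy.pth"):
        self.best_reward = float("-inf")
        self.path = path

    def maybe_save(self, avg_reward, model):
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            torch.save({"model": model.state_dict(),
                        "reward": avg_reward}, self.path)
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
```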
[2023-03-08 17:35:27,124][14312] Updated weights for policy 0, policy_version 290 (0.0017)
[2023-03-08 17:35:31,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3860.0). Total num frames: 1200128. Throughput: 0: 959.4. Samples: 300420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:35:31,636][01803] Avg episode reward: [(0, '5.610')]
[2023-03-08 17:35:31,640][14299] Saving new best policy, reward=5.610!
[2023-03-08 17:35:36,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1220608. Throughput: 0: 958.6. Samples: 302636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:35:36,637][01803] Avg episode reward: [(0, '5.776')]
[2023-03-08 17:35:36,648][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000298_1220608.pth...
[2023-03-08 17:35:36,790][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth
[2023-03-08 17:35:36,806][14299] Saving new best policy, reward=5.776!
[2023-03-08 17:35:38,436][14312] Updated weights for policy 0, policy_version 300 (0.0019)
[2023-03-08 17:35:41,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1241088. Throughput: 0: 1006.2. Samples: 309404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:35:41,641][01803] Avg episode reward: [(0, '5.724')]
[2023-03-08 17:35:46,638][01803] Fps is (10 sec: 3684.2, 60 sec: 3822.5, 300 sec: 3846.0). Total num frames: 1257472. Throughput: 0: 957.6. Samples: 314512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:35:46,642][01803] Avg episode reward: [(0, '6.104')]
[2023-03-08 17:35:46,658][14299] Saving new best policy, reward=6.104!
[2023-03-08 17:35:51,552][14312] Updated weights for policy 0, policy_version 310 (0.0012)
[2023-03-08 17:35:51,632][01803] Fps is (10 sec: 2867.1, 60 sec: 3823.4, 300 sec: 3832.2). Total num frames: 1269760. Throughput: 0: 917.9. Samples: 316312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:35:51,638][01803] Avg episode reward: [(0, '6.229')]
[2023-03-08 17:35:51,640][14299] Saving new best policy, reward=6.229!
[2023-03-08 17:35:56,632][01803] Fps is (10 sec: 2459.1, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1282048. Throughput: 0: 889.4. Samples: 319936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:35:56,637][01803] Avg episode reward: [(0, '6.338')]
[2023-03-08 17:35:56,657][14299] Saving new best policy, reward=6.338!
[2023-03-08 17:36:01,632][01803] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3790.6). Total num frames: 1302528. Throughput: 0: 899.7. Samples: 325680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:36:01,641][01803] Avg episode reward: [(0, '5.790')]
[2023-03-08 17:36:02,911][14312] Updated weights for policy 0, policy_version 320 (0.0025)
[2023-03-08 17:36:06,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1327104. Throughput: 0: 901.9. Samples: 329336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:36:06,633][01803] Avg episode reward: [(0, '5.840')]
[2023-03-08 17:36:11,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3686.8, 300 sec: 3818.3). Total num frames: 1343488. Throughput: 0: 882.1. Samples: 335574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:36:11,639][01803] Avg episode reward: [(0, '5.671')]
[2023-03-08 17:36:13,401][14312] Updated weights for policy 0, policy_version 330 (0.0018)
[2023-03-08 17:36:16,633][01803] Fps is (10 sec: 3276.5, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 1359872. Throughput: 0: 885.7. Samples: 340278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:36:16,635][01803] Avg episode reward: [(0, '6.041')]
[2023-03-08 17:36:21,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 1380352. Throughput: 0: 899.6. Samples: 343116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:36:21,634][01803] Avg episode reward: [(0, '6.285')]
[2023-03-08 17:36:23,295][14312] Updated weights for policy 0, policy_version 340 (0.0026)
[2023-03-08 17:36:26,632][01803] Fps is (10 sec: 4506.0, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 1404928. Throughput: 0: 912.8. Samples: 350478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:36:26,636][01803] Avg episode reward: [(0, '6.267')]
[2023-03-08 17:36:31,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 1425408. Throughput: 0: 931.0. Samples: 356400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:36:31,638][01803] Avg episode reward: [(0, '6.221')]
[2023-03-08 17:36:34,175][14312] Updated weights for policy 0, policy_version 350 (0.0021)
[2023-03-08 17:36:36,632][01803] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 1437696. Throughput: 0: 941.8. Samples: 358694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:36:36,634][01803] Avg episode reward: [(0, '5.927')]
[2023-03-08 17:36:41,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1462272. Throughput: 0: 986.6. Samples: 364332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:36:41,641][01803] Avg episode reward: [(0, '6.159')]
[2023-03-08 17:36:43,707][14312] Updated weights for policy 0, policy_version 360 (0.0020)
[2023-03-08 17:36:46,632][01803] Fps is (10 sec: 4915.3, 60 sec: 3823.3, 300 sec: 3901.6). Total num frames: 1486848. Throughput: 0: 1023.2. Samples: 371722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-03-08 17:36:46,634][01803] Avg episode reward: [(0, '6.507')]
[2023-03-08 17:36:46,649][14299] Saving new best policy, reward=6.507!
[2023-03-08 17:36:51,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1503232. Throughput: 0: 1010.4. Samples: 374804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:36:51,640][01803] Avg episode reward: [(0, '6.579')]
[2023-03-08 17:36:51,642][14299] Saving new best policy, reward=6.579!
[2023-03-08 17:36:55,025][14312] Updated weights for policy 0, policy_version 370 (0.0028)
[2023-03-08 17:36:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1519616. Throughput: 0: 973.9. Samples: 379398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:36:56,635][01803] Avg episode reward: [(0, '6.739')]
[2023-03-08 17:36:56,651][14299] Saving new best policy, reward=6.739!
[2023-03-08 17:37:01,632][01803] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3873.9). Total num frames: 1544192. Throughput: 0: 1007.4. Samples: 385608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:37:01,634][01803] Avg episode reward: [(0, '6.416')]
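Note the pairing of "Saving ...checkpoint_000000298_1220608.pth" with "Removing ...checkpoint_000000068_278528.pth" above: periodic checkpoints are rotated so only the most recent few stay on disk, while the best-reward snapshot is managed separately. A sketch of such rotation, assuming the `checkpoint_{version:09d}_{frames}.pth` naming visible in the log; the keep count is an assumption:

```python
from pathlib import Path
import torch

def save_with_rotation(model, version, frames, ckpt_dir, keep_last=2):
    """Write checkpoint_{version:09d}_{frames}.pth and prune older ones.

    keep_last=2 is an assumption; the log only shows that older periodic
    checkpoints are eventually removed after a new one is written.
    """
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{version:09d}_{frames}.pth"
    print(f"Saving {path}...")
    torch.save(model.state_dict(), path)
    # zero-padded version numbers make lexicographic order chronological
    checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for old in checkpoints[:-keep_last]:
        print(f"Removing {old}")
        old.unlink()
```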
[2023-03-08 17:37:04,099][14312] Updated weights for policy 0, policy_version 380 (0.0019)
[2023-03-08 17:37:06,632][01803] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 1568768. Throughput: 0: 1026.8. Samples: 389324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:37:06,637][01803] Avg episode reward: [(0, '7.352')]
[2023-03-08 17:37:06,647][14299] Saving new best policy, reward=7.352!
[2023-03-08 17:37:11,633][01803] Fps is (10 sec: 4095.5, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 1585152. Throughput: 0: 1001.8. Samples: 395560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-03-08 17:37:11,635][01803] Avg episode reward: [(0, '8.027')]
[2023-03-08 17:37:11,641][14299] Saving new best policy, reward=8.027!
[2023-03-08 17:37:15,565][14312] Updated weights for policy 0, policy_version 390 (0.0016)
[2023-03-08 17:37:16,633][01803] Fps is (10 sec: 2867.0, 60 sec: 3959.5, 300 sec: 3859.9). Total num frames: 1597440. Throughput: 0: 972.2. Samples: 400150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-03-08 17:37:16,637][01803] Avg episode reward: [(0, '9.141')]
[2023-03-08 17:37:16,723][14299] Saving new best policy, reward=9.141!
[2023-03-08 17:37:21,632][01803] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1622016. Throughput: 0: 990.3. Samples: 403258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-03-08 17:37:21,633][01803] Avg episode reward: [(0, '8.854')]
[2023-03-08 17:37:24,455][14312] Updated weights for policy 0, policy_version 400 (0.0012)
[2023-03-08 17:37:26,632][01803] Fps is (10 sec: 4915.7, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1646592. Throughput: 0: 1029.0. Samples: 410638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:37:26,639][01803] Avg episode reward: [(0, '8.292')]
[2023-03-08 17:37:31,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1662976. Throughput: 0: 993.2. Samples: 416414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:37:31,638][01803] Avg episode reward: [(0, '8.242')]
[2023-03-08 17:37:36,060][14312] Updated weights for policy 0, policy_version 410 (0.0014)
[2023-03-08 17:37:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1679360. Throughput: 0: 974.3. Samples: 418648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:37:36,634][01803] Avg episode reward: [(0, '9.113')]
[2023-03-08 17:37:36,650][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth...
[2023-03-08 17:37:36,780][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth
[2023-03-08 17:37:41,632][01803] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 1703936. Throughput: 0: 1004.1. Samples: 424582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:37:41,634][01803] Avg episode reward: [(0, '9.291')]
[2023-03-08 17:37:41,636][14299] Saving new best policy, reward=9.291!
[2023-03-08 17:37:44,749][14312] Updated weights for policy 0, policy_version 420 (0.0012)
[2023-03-08 17:37:46,632][01803] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1728512. Throughput: 0: 1026.8. Samples: 431814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:37:46,639][01803] Avg episode reward: [(0, '9.853')]
[2023-03-08 17:37:46,647][14299] Saving new best policy, reward=9.853!
[2023-03-08 17:37:51,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1740800. Throughput: 0: 1006.2. Samples: 434602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:37:51,638][01803] Avg episode reward: [(0, '9.821')]
[2023-03-08 17:37:56,632][01803] Fps is (10 sec: 2867.3, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1757184. Throughput: 0: 967.2. Samples: 439082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:37:56,638][01803] Avg episode reward: [(0, '9.674')]
[2023-03-08 17:37:56,717][14312] Updated weights for policy 0, policy_version 430 (0.0049)
[2023-03-08 17:38:01,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1781760. Throughput: 0: 1009.3. Samples: 445566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:38:01,638][01803] Avg episode reward: [(0, '11.164')]
[2023-03-08 17:38:01,641][14299] Saving new best policy, reward=11.164!
[2023-03-08 17:38:05,360][14312] Updated weights for policy 0, policy_version 440 (0.0012)
[2023-03-08 17:38:06,632][01803] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1806336. Throughput: 0: 1019.6. Samples: 449138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:38:06,639][01803] Avg episode reward: [(0, '10.808')]
[2023-03-08 17:38:11,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1822720. Throughput: 0: 989.6. Samples: 455170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:38:11,634][01803] Avg episode reward: [(0, '11.039')]
[2023-03-08 17:38:16,632][01803] Fps is (10 sec: 2867.2, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1835008. Throughput: 0: 960.4. Samples: 459632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:38:16,636][01803] Avg episode reward: [(0, '10.798')]
[2023-03-08 17:38:17,458][14312] Updated weights for policy 0, policy_version 450 (0.0017)
[2023-03-08 17:38:21,632][01803] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 1863680. Throughput: 0: 986.4. Samples: 463034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:38:21,639][01803] Avg episode reward: [(0, '10.754')]
[2023-03-08 17:38:25,685][14312] Updated weights for policy 0, policy_version 460 (0.0015)
[2023-03-08 17:38:26,632][01803] Fps is (10 sec: 5324.8, 60 sec: 4027.7, 300 sec: 3929.5). Total num frames: 1888256. Throughput: 0: 1020.6. Samples: 470508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:38:26,634][01803] Avg episode reward: [(0, '10.554')]
[2023-03-08 17:38:31,633][01803] Fps is (10 sec: 3686.0, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 1900544. Throughput: 0: 979.3. Samples: 475882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:38:31,635][01803] Avg episode reward: [(0, '10.510')]
[2023-03-08 17:38:36,632][01803] Fps is (10 sec: 2867.2, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1916928. Throughput: 0: 968.9. Samples: 478202. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:38:36,638][01803] Avg episode reward: [(0, '11.036')]
[2023-03-08 17:38:37,666][14312] Updated weights for policy 0, policy_version 470 (0.0021)
[2023-03-08 17:38:41,632][01803] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 1941504. Throughput: 0: 1008.4. Samples: 484462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:38:41,639][01803] Avg episode reward: [(0, '11.771')]
[2023-03-08 17:38:41,644][14299] Saving new best policy, reward=11.771!
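The checkpoint names also make the learner's bookkeeping visible: total frames are always 4096 times the policy_version (e.g. 68 x 4096 = 278528 and 410 x 4096 = 1679360), i.e. each version increment corresponds to one 4096-frame training batch. A quick check against the filenames in this log:

```python
# Each policy_version advance corresponds to one 4096-frame batch,
# which the checkpoint filenames logged above confirm.
FRAMES_PER_VERSION = 4096
for version, frames in [(68, 278528), (183, 749568), (298, 1220608), (410, 1679360)]:
    assert version * FRAMES_PER_VERSION == frames
```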
[2023-03-08 17:38:45,938][14312] Updated weights for policy 0, policy_version 480 (0.0014)
[2023-03-08 17:38:46,632][01803] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1966080. Throughput: 0: 1026.2. Samples: 491746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:38:46,636][01803] Avg episode reward: [(0, '11.554')]
[2023-03-08 17:38:51,632][01803] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1982464. Throughput: 0: 1004.8. Samples: 494354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:38:51,637][01803] Avg episode reward: [(0, '11.248')]
[2023-03-08 17:38:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 1998848. Throughput: 0: 974.0. Samples: 499000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:38:56,634][01803] Avg episode reward: [(0, '11.488')]
[2023-03-08 17:38:58,051][14312] Updated weights for policy 0, policy_version 490 (0.0025)
[2023-03-08 17:39:01,632][01803] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2023424. Throughput: 0: 1016.2. Samples: 505360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:39:01,634][01803] Avg episode reward: [(0, '11.670')]
[2023-03-08 17:39:06,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2043904. Throughput: 0: 1020.3. Samples: 508948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:39:06,636][01803] Avg episode reward: [(0, '11.983')]
[2023-03-08 17:39:06,651][14299] Saving new best policy, reward=11.983!
[2023-03-08 17:39:06,907][14312] Updated weights for policy 0, policy_version 500 (0.0016)
[2023-03-08 17:39:11,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2060288. Throughput: 0: 980.5. Samples: 514632. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:39:11,639][01803] Avg episode reward: [(0, '12.894')]
[2023-03-08 17:39:11,645][14299] Saving new best policy, reward=12.894!
[2023-03-08 17:39:16,632][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2076672. Throughput: 0: 959.4. Samples: 519056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:39:16,634][01803] Avg episode reward: [(0, '13.318')]
[2023-03-08 17:39:16,647][14299] Saving new best policy, reward=13.318!
[2023-03-08 17:39:19,334][14312] Updated weights for policy 0, policy_version 510 (0.0017)
[2023-03-08 17:39:21,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2097152. Throughput: 0: 977.5. Samples: 522188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:39:21,634][01803] Avg episode reward: [(0, '14.058')]
[2023-03-08 17:39:21,636][14299] Saving new best policy, reward=14.058!
[2023-03-08 17:39:26,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.6). Total num frames: 2121728. Throughput: 0: 992.3. Samples: 529114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:39:26,634][01803] Avg episode reward: [(0, '14.750')]
[2023-03-08 17:39:26,647][14299] Saving new best policy, reward=14.750!
[2023-03-08 17:39:29,145][14312] Updated weights for policy 0, policy_version 520 (0.0019)
[2023-03-08 17:39:31,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 2134016. Throughput: 0: 942.0. Samples: 534136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:39:31,635][01803] Avg episode reward: [(0, '15.755')]
[2023-03-08 17:39:31,639][14299] Saving new best policy, reward=15.755!
[2023-03-08 17:39:36,632][01803] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2150400. Throughput: 0: 934.9. Samples: 536426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:39:36,637][01803] Avg episode reward: [(0, '16.426')]
[2023-03-08 17:39:36,649][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth...
[2023-03-08 17:39:36,753][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000298_1220608.pth
[2023-03-08 17:39:36,771][14299] Saving new best policy, reward=16.426!
[2023-03-08 17:39:40,145][14312] Updated weights for policy 0, policy_version 530 (0.0036)
[2023-03-08 17:39:41,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2174976. Throughput: 0: 971.9. Samples: 542736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:39:41,634][01803] Avg episode reward: [(0, '17.096')]
[2023-03-08 17:39:41,637][14299] Saving new best policy, reward=17.096!
[2023-03-08 17:39:46,632][01803] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3929.5). Total num frames: 2199552. Throughput: 0: 992.4. Samples: 550020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:39:46,638][01803] Avg episode reward: [(0, '17.102')]
[2023-03-08 17:39:46,648][14299] Saving new best policy, reward=17.102!
[2023-03-08 17:39:50,348][14312] Updated weights for policy 0, policy_version 540 (0.0011)
[2023-03-08 17:39:51,635][01803] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3901.6). Total num frames: 2211840. Throughput: 0: 966.5. Samples: 552446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:39:51,638][01803] Avg episode reward: [(0, '16.151')]
[2023-03-08 17:39:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2232320. Throughput: 0: 943.5. Samples: 557090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:39:56,634][01803] Avg episode reward: [(0, '15.591')]
[2023-03-08 17:40:00,545][14312] Updated weights for policy 0, policy_version 550 (0.0021)
[2023-03-08 17:40:01,632][01803] Fps is (10 sec: 4507.3, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2256896. Throughput: 0: 1000.4. Samples: 564072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:40:01,634][01803] Avg episode reward: [(0, '15.663')]
[2023-03-08 17:40:06,632][01803] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3915.6). Total num frames: 2277376. Throughput: 0: 1012.7. Samples: 567762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:40:06,635][01803] Avg episode reward: [(0, '17.483')]
[2023-03-08 17:40:06,645][14299] Saving new best policy, reward=17.483!
[2023-03-08 17:40:11,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2289664. Throughput: 0: 952.5. Samples: 571976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:40:11,639][01803] Avg episode reward: [(0, '18.162')]
[2023-03-08 17:40:11,644][14299] Saving new best policy, reward=18.162!
[2023-03-08 17:40:13,218][14312] Updated weights for policy 0, policy_version 560 (0.0024)
[2023-03-08 17:40:16,632][01803] Fps is (10 sec: 2457.7, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2301952. Throughput: 0: 919.8. Samples: 575528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:40:16,634][01803] Avg episode reward: [(0, '19.444')]
[2023-03-08 17:40:16,650][14299] Saving new best policy, reward=19.444!
[2023-03-08 17:40:21,632][01803] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 2318336. Throughput: 0: 913.7. Samples: 577542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:40:21,634][01803] Avg episode reward: [(0, '20.106')]
[2023-03-08 17:40:21,637][14299] Saving new best policy, reward=20.106!
[2023-03-08 17:40:24,689][14312] Updated weights for policy 0, policy_version 570 (0.0025)
[2023-03-08 17:40:26,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 2342912. Throughput: 0: 921.7. Samples: 584214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:40:26,634][01803] Avg episode reward: [(0, '22.384')]
[2023-03-08 17:40:26,649][14299] Saving new best policy, reward=22.384!
[2023-03-08 17:40:31,632][01803] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2363392. Throughput: 0: 911.5. Samples: 591038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:40:31,637][01803] Avg episode reward: [(0, '21.194')]
[2023-03-08 17:40:35,278][14312] Updated weights for policy 0, policy_version 580 (0.0019)
[2023-03-08 17:40:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2375680. Throughput: 0: 908.2. Samples: 593310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:40:36,639][01803] Avg episode reward: [(0, '19.972')]
[2023-03-08 17:40:41,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 2396160. Throughput: 0: 907.5. Samples: 597926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:40:41,634][01803] Avg episode reward: [(0, '18.725')]
[2023-03-08 17:40:45,439][14312] Updated weights for policy 0, policy_version 590 (0.0020)
[2023-03-08 17:40:46,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 2420736. Throughput: 0: 912.3. Samples: 605126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:40:46,637][01803] Avg episode reward: [(0, '15.983')]
[2023-03-08 17:40:51,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3823.2, 300 sec: 3929.4). Total num frames: 2441216. Throughput: 0: 910.2. Samples: 608722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:40:51,639][01803] Avg episode reward: [(0, '16.734')]
[2023-03-08 17:40:56,219][14312] Updated weights for policy 0, policy_version 600 (0.0012)
[2023-03-08 17:40:56,635][01803] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3915.5). Total num frames: 2457600. Throughput: 0: 929.4. Samples: 613802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:40:56,639][01803] Avg episode reward: [(0, '17.007')]
[2023-03-08 17:41:01,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 2478080. Throughput: 0: 968.5. Samples: 619110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:41:01,634][01803] Avg episode reward: [(0, '18.184')]
[2023-03-08 17:41:05,762][14312] Updated weights for policy 0, policy_version 610 (0.0012)
[2023-03-08 17:41:06,632][01803] Fps is (10 sec: 4507.0, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 2502656. Throughput: 0: 1004.4. Samples: 622742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:41:06,634][01803] Avg episode reward: [(0, '19.024')]
[2023-03-08 17:41:11,637][01803] Fps is (10 sec: 4503.4, 60 sec: 3890.9, 300 sec: 3943.2). Total num frames: 2523136. Throughput: 0: 1013.8. Samples: 629842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:41:11,643][01803] Avg episode reward: [(0, '18.547')]
[2023-03-08 17:41:16,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2535424. Throughput: 0: 966.3. Samples: 634522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:41:16,634][01803] Avg episode reward: [(0, '17.323')]
[2023-03-08 17:41:16,881][14312] Updated weights for policy 0, policy_version 620 (0.0021)
[2023-03-08 17:41:21,632][01803] Fps is (10 sec: 3278.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2555904. Throughput: 0: 968.7. Samples: 636900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:41:21,639][01803] Avg episode reward: [(0, '16.795')]
[2023-03-08 17:41:25,975][14312] Updated weights for policy 0, policy_version 630 (0.0016)
[2023-03-08 17:41:26,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2580480. Throughput: 0: 1025.2. Samples: 644058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:41:26,640][01803] Avg episode reward: [(0, '17.114')]
[2023-03-08 17:41:31,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2600960. Throughput: 0: 1014.7. Samples: 650786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:41:31,637][01803] Avg episode reward: [(0, '17.429')]
[2023-03-08 17:41:36,632][01803] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2617344. Throughput: 0: 987.1. Samples: 653140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:41:36,637][01803] Avg episode reward: [(0, '17.525')]
[2023-03-08 17:41:36,649][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth...
[2023-03-08 17:41:36,784][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth
[2023-03-08 17:41:37,392][14312] Updated weights for policy 0, policy_version 640 (0.0031)
[2023-03-08 17:41:41,632][01803] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2637824. Throughput: 0: 984.6. Samples: 658104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:41:41,634][01803] Avg episode reward: [(0, '17.964')]
[2023-03-08 17:41:46,194][14312] Updated weights for policy 0, policy_version 650 (0.0013)
[2023-03-08 17:41:46,632][01803] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2662400. Throughput: 0: 1030.4. Samples: 665476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:41:46,638][01803] Avg episode reward: [(0, '18.877')]
[2023-03-08 17:41:51,632][01803] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2682880. Throughput: 0: 1032.4. Samples: 669200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:41:51,634][01803] Avg episode reward: [(0, '20.195')]
[2023-03-08 17:41:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 3901.6). Total num frames: 2695168. Throughput: 0: 976.5. Samples: 673780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:41:56,643][01803] Avg episode reward: [(0, '20.242')]
[2023-03-08 17:41:58,159][14312] Updated weights for policy 0, policy_version 660 (0.0018)
[2023-03-08 17:42:01,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2715648. Throughput: 0: 996.0. Samples: 679344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:42:01,640][01803] Avg episode reward: [(0, '21.978')]
[2023-03-08 17:42:06,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2740224. Throughput: 0: 1021.8. Samples: 682880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:42:06,634][01803] Avg episode reward: [(0, '20.594')]
[2023-03-08 17:42:06,930][14312] Updated weights for policy 0, policy_version 670 (0.0011)
[2023-03-08 17:42:11,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.8, 300 sec: 3943.3). Total num frames: 2760704. Throughput: 0: 1010.0. Samples: 689506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:42:11,639][01803] Avg episode reward: [(0, '20.799')]
[2023-03-08 17:42:16,632][01803] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2777088. Throughput: 0: 961.4. Samples: 694048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:42:16,637][01803] Avg episode reward: [(0, '19.938')]
[2023-03-08 17:42:18,922][14312] Updated weights for policy 0, policy_version 680 (0.0015)
[2023-03-08 17:42:21,632][01803] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2797568. Throughput: 0: 964.4. Samples: 696536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:42:21,640][01803] Avg episode reward: [(0, '19.493')]
[2023-03-08 17:42:26,632][01803] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2822144. Throughput: 0: 1015.2. Samples: 703790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:42:26,635][01803] Avg episode reward: [(0, '20.592')]
[2023-03-08 17:42:27,449][14312] Updated weights for policy 0, policy_version 690 (0.0026)
[2023-03-08 17:42:31,632][01803] Fps is (10 sec: 4095.8, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 2838528. Throughput: 0: 989.9. Samples: 710022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:42:31,635][01803] Avg episode reward: [(0, '21.346')]
[2023-03-08 17:42:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2854912. Throughput: 0: 958.5. Samples: 712332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:42:36,636][01803] Avg episode reward: [(0, '22.788')]
[2023-03-08 17:42:36,657][14299] Saving new best policy, reward=22.788!
[2023-03-08 17:42:39,630][14312] Updated weights for policy 0, policy_version 700 (0.0036)
[2023-03-08 17:42:41,632][01803] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2875392. Throughput: 0: 969.8. Samples: 717420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:42:41,636][01803] Avg episode reward: [(0, '23.106')]
[2023-03-08 17:42:41,640][14299] Saving new best policy, reward=23.106!
[2023-03-08 17:42:46,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2899968.
Throughput: 0: 1004.2. Samples: 724534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:42:46,634][01803] Avg episode reward: [(0, '23.404')] [2023-03-08 17:42:46,643][14299] Saving new best policy, reward=23.404! [2023-03-08 17:42:48,365][14312] Updated weights for policy 0, policy_version 710 (0.0019) [2023-03-08 17:42:51,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2916352. Throughput: 0: 998.8. Samples: 727828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:42:51,638][01803] Avg episode reward: [(0, '23.990')] [2023-03-08 17:42:51,643][14299] Saving new best policy, reward=23.990! [2023-03-08 17:42:56,632][01803] Fps is (10 sec: 2867.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2928640. Throughput: 0: 948.6. Samples: 732192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:42:56,636][01803] Avg episode reward: [(0, '23.714')] [2023-03-08 17:43:00,735][14312] Updated weights for policy 0, policy_version 720 (0.0029) [2023-03-08 17:43:01,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2953216. Throughput: 0: 971.4. Samples: 737762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:43:01,634][01803] Avg episode reward: [(0, '24.615')] [2023-03-08 17:43:01,636][14299] Saving new best policy, reward=24.615! [2023-03-08 17:43:06,632][01803] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2973696. Throughput: 0: 991.6. Samples: 741158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:43:06,639][01803] Avg episode reward: [(0, '24.215')] [2023-03-08 17:43:10,778][14312] Updated weights for policy 0, policy_version 730 (0.0016) [2023-03-08 17:43:11,636][01803] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3915.4). Total num frames: 2990080. Throughput: 0: 966.0. Samples: 747264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:43:11,637][01803] Avg episode reward: [(0, '25.408')] [2023-03-08 17:43:11,644][14299] Saving new best policy, reward=25.408! [2023-03-08 17:43:16,632][01803] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3006464. Throughput: 0: 924.5. Samples: 751624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:43:16,638][01803] Avg episode reward: [(0, '25.695')] [2023-03-08 17:43:16,646][14299] Saving new best policy, reward=25.695! [2023-03-08 17:43:21,632][01803] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3026944. Throughput: 0: 930.5. Samples: 754204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:43:21,634][01803] Avg episode reward: [(0, '25.244')] [2023-03-08 17:43:22,289][14312] Updated weights for policy 0, policy_version 740 (0.0021) [2023-03-08 17:43:26,632][01803] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 3047424. Throughput: 0: 971.4. Samples: 761132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:43:26,634][01803] Avg episode reward: [(0, '25.093')] [2023-03-08 17:43:31,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 3063808. Throughput: 0: 934.6. Samples: 766592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:43:31,639][01803] Avg episode reward: [(0, '25.762')] [2023-03-08 17:43:31,646][14299] Saving new best policy, reward=25.762! 
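Each "Saving new best policy" entry above fires when the average episode reward beats the previous best; per the configuration dumped later in this log, the comparison metric is save_best_metric=reward and saving only begins after save_best_after=100000 env steps. A minimal sketch of that rule (hypothetical names, not Sample Factory internals):

```python
from typing import Optional

# Illustrative reconstruction of the "Saving new best policy, reward=..." rule;
# mirrors the save_best_metric=reward / save_best_after=100000 settings from
# the configuration dump further down this log.
def should_save_best(avg_reward: float, best_reward: Optional[float],
                     env_steps: int, save_best_after: int = 100_000) -> bool:
    """True when the tracked metric improves after the warm-up period."""
    if env_steps < save_best_after:
        return False
    return best_reward is None or avg_reward > best_reward

# At the line above: previous best 25.695, new average 25.762 -> save.
assert should_save_best(25.762, 25.695, 3_063_808)
```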
[2023-03-08 17:43:33,404][14312] Updated weights for policy 0, policy_version 750 (0.0011) [2023-03-08 17:43:36,632][01803] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3080192. Throughput: 0: 909.2. Samples: 768742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:43:36,642][01803] Avg episode reward: [(0, '26.871')] [2023-03-08 17:43:36,663][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000752_3080192.pth... [2023-03-08 17:43:36,842][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth [2023-03-08 17:43:36,857][14299] Saving new best policy, reward=26.871! [2023-03-08 17:43:41,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3100672. Throughput: 0: 929.6. Samples: 774022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:43:41,634][01803] Avg episode reward: [(0, '26.575')] [2023-03-08 17:43:43,923][14312] Updated weights for policy 0, policy_version 760 (0.0014) [2023-03-08 17:43:46,632][01803] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3125248. Throughput: 0: 957.1. Samples: 780832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:43:46,634][01803] Avg episode reward: [(0, '26.520')] [2023-03-08 17:43:51,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 3137536. Throughput: 0: 947.4. Samples: 783792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:43:51,634][01803] Avg episode reward: [(0, '27.213')] [2023-03-08 17:43:51,636][14299] Saving new best policy, reward=27.213! [2023-03-08 17:43:56,167][14312] Updated weights for policy 0, policy_version 770 (0.0029) [2023-03-08 17:43:56,632][01803] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3153920. Throughput: 0: 908.0. Samples: 788122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:43:56,641][01803] Avg episode reward: [(0, '26.713')] [2023-03-08 17:44:01,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3174400. Throughput: 0: 942.9. Samples: 794052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:44:01,634][01803] Avg episode reward: [(0, '26.957')] [2023-03-08 17:44:05,000][14312] Updated weights for policy 0, policy_version 780 (0.0013) [2023-03-08 17:44:06,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3198976. Throughput: 0: 965.6. Samples: 797658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:44:06,634][01803] Avg episode reward: [(0, '25.927')] [2023-03-08 17:44:11,635][01803] Fps is (10 sec: 4094.8, 60 sec: 3754.7, 300 sec: 3859.9). Total num frames: 3215360. Throughput: 0: 945.7. Samples: 803692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:44:11,637][01803] Avg episode reward: [(0, '25.148')] [2023-03-08 17:44:16,634][01803] Fps is (10 sec: 3276.2, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 3231744. Throughput: 0: 920.9. Samples: 808032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:44:16,637][01803] Avg episode reward: [(0, '24.827')] [2023-03-08 17:44:17,681][14312] Updated weights for policy 0, policy_version 790 (0.0018) [2023-03-08 17:44:21,632][01803] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3252224. Throughput: 0: 935.1. Samples: 810820. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-08 17:44:21,639][01803] Avg episode reward: [(0, '24.368')] [2023-03-08 17:44:26,562][14312] Updated weights for policy 0, policy_version 800 (0.0012) [2023-03-08 17:44:26,632][01803] Fps is (10 sec: 4506.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3276800. Throughput: 0: 970.3. Samples: 817684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:44:26,634][01803] Avg episode reward: [(0, '22.145')] [2023-03-08 17:44:31,634][01803] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3859.9). Total num frames: 3289088. Throughput: 0: 928.9. Samples: 822636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:44:31,641][01803] Avg episode reward: [(0, '22.096')] [2023-03-08 17:44:36,635][01803] Fps is (10 sec: 2047.4, 60 sec: 3618.0, 300 sec: 3804.4). Total num frames: 3297280. Throughput: 0: 898.8. Samples: 824242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:44:36,637][01803] Avg episode reward: [(0, '22.506')] [2023-03-08 17:44:41,633][01803] Fps is (10 sec: 2048.1, 60 sec: 3481.5, 300 sec: 3762.7). Total num frames: 3309568. Throughput: 0: 874.3. Samples: 827468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:44:41,636][01803] Avg episode reward: [(0, '21.916')] [2023-03-08 17:44:42,992][14312] Updated weights for policy 0, policy_version 810 (0.0024) [2023-03-08 17:44:46,632][01803] Fps is (10 sec: 3687.5, 60 sec: 3481.6, 300 sec: 3804.5). Total num frames: 3334144. Throughput: 0: 868.4. Samples: 833132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:44:46,634][01803] Avg episode reward: [(0, '22.337')] [2023-03-08 17:44:51,632][01803] Fps is (10 sec: 4506.1, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 3354624. Throughput: 0: 864.4. Samples: 836558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:44:51,643][01803] Avg episode reward: [(0, '22.846')] [2023-03-08 17:44:52,851][14312] Updated weights for policy 0, policy_version 820 (0.0011) [2023-03-08 17:44:56,637][01803] Fps is (10 sec: 3275.2, 60 sec: 3549.6, 300 sec: 3762.7). Total num frames: 3366912. Throughput: 0: 841.9. Samples: 841578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:44:56,639][01803] Avg episode reward: [(0, '24.090')] [2023-03-08 17:45:01,632][01803] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3748.9). Total num frames: 3383296. Throughput: 0: 852.1. Samples: 846376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:45:01,635][01803] Avg episode reward: [(0, '24.652')] [2023-03-08 17:45:04,396][14312] Updated weights for policy 0, policy_version 830 (0.0029) [2023-03-08 17:45:06,632][01803] Fps is (10 sec: 4098.0, 60 sec: 3481.6, 300 sec: 3790.5). Total num frames: 3407872. Throughput: 0: 864.6. Samples: 849728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:45:06,634][01803] Avg episode reward: [(0, '22.471')] [2023-03-08 17:45:11,632][01803] Fps is (10 sec: 4505.7, 60 sec: 3550.0, 300 sec: 3818.3). Total num frames: 3428352. Throughput: 0: 864.0. Samples: 856562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-08 17:45:11,643][01803] Avg episode reward: [(0, '21.988')] [2023-03-08 17:45:15,163][14312] Updated weights for policy 0, policy_version 840 (0.0023) [2023-03-08 17:45:16,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3818.3). Total num frames: 3444736. Throughput: 0: 853.2. Samples: 861030. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:45:16,636][01803] Avg episode reward: [(0, '20.706')] [2023-03-08 17:45:21,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3790.5). Total num frames: 3461120. Throughput: 0: 866.5. Samples: 863232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:45:21,638][01803] Avg episode reward: [(0, '21.138')] [2023-03-08 17:45:25,870][14312] Updated weights for policy 0, policy_version 850 (0.0015) [2023-03-08 17:45:26,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3790.5). Total num frames: 3481600. Throughput: 0: 936.8. Samples: 869622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:45:26,634][01803] Avg episode reward: [(0, '22.495')] [2023-03-08 17:45:31,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3818.3). Total num frames: 3502080. Throughput: 0: 954.6. Samples: 876090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:45:31,639][01803] Avg episode reward: [(0, '23.472')] [2023-03-08 17:45:36,634][01803] Fps is (10 sec: 3685.6, 60 sec: 3686.5, 300 sec: 3804.4). Total num frames: 3518464. Throughput: 0: 926.0. Samples: 878230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:45:36,636][01803] Avg episode reward: [(0, '24.015')] [2023-03-08 17:45:36,652][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000859_3518464.pth... [2023-03-08 17:45:36,806][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth [2023-03-08 17:45:37,718][14312] Updated weights for policy 0, policy_version 860 (0.0021) [2023-03-08 17:45:41,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3534848. Throughput: 0: 913.9. Samples: 882700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-08 17:45:41,634][01803] Avg episode reward: [(0, '23.321')] [2023-03-08 17:45:46,632][01803] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3559424. Throughput: 0: 956.8. Samples: 889434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-08 17:45:46,634][01803] Avg episode reward: [(0, '23.752')] [2023-03-08 17:45:47,405][14312] Updated weights for policy 0, policy_version 870 (0.0018) [2023-03-08 17:45:51,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 3575808. Throughput: 0: 961.6. Samples: 893002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-03-08 17:45:51,642][01803] Avg episode reward: [(0, '25.676')] [2023-03-08 17:45:56,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3776.6). Total num frames: 3592192. Throughput: 0: 916.8. Samples: 897818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:45:56,638][01803] Avg episode reward: [(0, '24.536')] [2023-03-08 17:45:59,579][14312] Updated weights for policy 0, policy_version 880 (0.0018) [2023-03-08 17:46:01,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 3612672. Throughput: 0: 934.9. Samples: 903100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-08 17:46:01,634][01803] Avg episode reward: [(0, '24.297')] [2023-03-08 17:46:06,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3637248. Throughput: 0: 966.3. Samples: 906714. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-08 17:46:06,634][01803] Avg episode reward: [(0, '25.430')] [2023-03-08 17:46:07,908][14312] Updated weights for policy 0, policy_version 890 (0.0015) [2023-03-08 17:46:11,633][01803] Fps is (10 sec: 4505.1, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3657728. Throughput: 0: 984.9. Samples: 913944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-03-08 17:46:11,637][01803] Avg episode reward: [(0, '26.804')] [2023-03-08 17:46:16,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3674112. Throughput: 0: 944.3. Samples: 918584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:46:16,636][01803] Avg episode reward: [(0, '26.355')] [2023-03-08 17:46:19,797][14312] Updated weights for policy 0, policy_version 900 (0.0034) [2023-03-08 17:46:21,632][01803] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3694592. Throughput: 0: 948.6. Samples: 920914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:46:21,634][01803] Avg episode reward: [(0, '25.300')] [2023-03-08 17:46:26,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 3719168. Throughput: 0: 1011.2. Samples: 928206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:46:26,638][01803] Avg episode reward: [(0, '24.645')] [2023-03-08 17:46:28,150][14312] Updated weights for policy 0, policy_version 910 (0.0019) [2023-03-08 17:46:31,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3739648. Throughput: 0: 1007.8. Samples: 934784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:46:31,634][01803] Avg episode reward: [(0, '25.152')] [2023-03-08 17:46:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3776.7). Total num frames: 3751936. Throughput: 0: 978.0. Samples: 937014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:46:36,634][01803] Avg episode reward: [(0, '23.793')] [2023-03-08 17:46:40,165][14312] Updated weights for policy 0, policy_version 920 (0.0015) [2023-03-08 17:46:41,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 3772416. Throughput: 0: 984.7. Samples: 942128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:46:41,634][01803] Avg episode reward: [(0, '24.658')] [2023-03-08 17:46:46,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 3796992. Throughput: 0: 1031.9. Samples: 949534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:46:46,634][01803] Avg episode reward: [(0, '24.406')] [2023-03-08 17:46:48,505][14312] Updated weights for policy 0, policy_version 930 (0.0013) [2023-03-08 17:46:51,633][01803] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3804.4). Total num frames: 3817472. Throughput: 0: 1032.1. Samples: 953160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:46:51,638][01803] Avg episode reward: [(0, '26.013')] [2023-03-08 17:46:56,632][01803] Fps is (10 sec: 3686.1, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 3833856. Throughput: 0: 975.3. Samples: 957834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:46:56,637][01803] Avg episode reward: [(0, '25.169')] [2023-03-08 17:47:00,474][14312] Updated weights for policy 0, policy_version 940 (0.0029) [2023-03-08 17:47:01,632][01803] Fps is (10 sec: 3687.1, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 3854336. Throughput: 0: 999.4. Samples: 963556. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:47:01,633][01803] Avg episode reward: [(0, '24.824')] [2023-03-08 17:47:06,632][01803] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 3878912. Throughput: 0: 1028.6. Samples: 967202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:47:06,634][01803] Avg episode reward: [(0, '24.769')] [2023-03-08 17:47:09,309][14312] Updated weights for policy 0, policy_version 950 (0.0038) [2023-03-08 17:47:11,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 3895296. Throughput: 0: 1012.6. Samples: 973772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:47:11,634][01803] Avg episode reward: [(0, '25.368')] [2023-03-08 17:47:16,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3776.6). Total num frames: 3911680. Throughput: 0: 969.9. Samples: 978428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:47:16,636][01803] Avg episode reward: [(0, '25.130')] [2023-03-08 17:47:20,971][14312] Updated weights for policy 0, policy_version 960 (0.0038) [2023-03-08 17:47:21,632][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 3932160. Throughput: 0: 974.9. Samples: 980886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:47:21,634][01803] Avg episode reward: [(0, '25.607')] [2023-03-08 17:47:26,632][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 3956736. Throughput: 0: 1017.2. Samples: 987902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:47:26,634][01803] Avg episode reward: [(0, '26.974')] [2023-03-08 17:47:30,642][14312] Updated weights for policy 0, policy_version 970 (0.0018) [2023-03-08 17:47:31,632][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3973120. Throughput: 0: 984.8. Samples: 993848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:47:31,634][01803] Avg episode reward: [(0, '26.559')] [2023-03-08 17:47:36,632][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3776.6). Total num frames: 3989504. Throughput: 0: 953.1. Samples: 996048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:47:36,637][01803] Avg episode reward: [(0, '27.978')] [2023-03-08 17:47:36,650][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth... [2023-03-08 17:47:36,819][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000752_3080192.pth [2023-03-08 17:47:36,867][14299] Saving new best policy, reward=27.978! [2023-03-08 17:47:40,456][14299] Stopping Batcher_0... [2023-03-08 17:47:40,456][14299] Loop batcher_evt_loop terminating... [2023-03-08 17:47:40,457][01803] Component Batcher_0 stopped! [2023-03-08 17:47:40,466][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-08 17:47:40,503][01803] Component RolloutWorker_w5 stopped! [2023-03-08 17:47:40,509][14318] Stopping RolloutWorker_w4... [2023-03-08 17:47:40,510][01803] Component RolloutWorker_w4 stopped! [2023-03-08 17:47:40,502][14319] Stopping RolloutWorker_w5... [2023-03-08 17:47:40,515][14320] Stopping RolloutWorker_w6... [2023-03-08 17:47:40,516][01803] Component RolloutWorker_w6 stopped! [2023-03-08 17:47:40,526][14320] Loop rollout_proc6_evt_loop terminating... [2023-03-08 17:47:40,510][14318] Loop rollout_proc4_evt_loop terminating... 
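A note on the checkpoint filenames above (e.g. checkpoint_000000978_4005888.pth): the two numbers are the policy version and the cumulative env-frame count, and they stay in a fixed ratio throughout this log. Assuming one learner batch per policy version, batch_size=1024 transitions times env_frameskip=4 frames per transition gives 4096 env frames per version, which every checkpoint name here satisfies:

```python
# Hedged sanity check: assumes one training batch per policy version;
# batch_size and env_frameskip are taken from the configuration dump below.
batch_size, env_frameskip = 1024, 4
frames_per_version = batch_size * env_frameskip  # 4096
for version, frames in [(639, 2617344), (752, 3080192), (978, 4005888)]:
    assert version * frames_per_version == frames  # matches checkpoint_*.pth names
```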
[2023-03-08 17:47:40,529][14312] Weights refcount: 2 0 [2023-03-08 17:47:40,515][14319] Loop rollout_proc5_evt_loop terminating... [2023-03-08 17:47:40,534][14313] Stopping RolloutWorker_w0... [2023-03-08 17:47:40,538][14315] Stopping RolloutWorker_w1... [2023-03-08 17:47:40,542][14315] Loop rollout_proc1_evt_loop terminating... [2023-03-08 17:47:40,535][01803] Component RolloutWorker_w0 stopped! [2023-03-08 17:47:40,536][14312] Stopping InferenceWorker_p0-w0... [2023-03-08 17:47:40,543][01803] Component InferenceWorker_p0-w0 stopped! [2023-03-08 17:47:40,543][14312] Loop inference_proc0-0_evt_loop terminating... [2023-03-08 17:47:40,545][01803] Component RolloutWorker_w1 stopped! [2023-03-08 17:47:40,554][14313] Loop rollout_proc0_evt_loop terminating... [2023-03-08 17:47:40,575][01803] Component RolloutWorker_w3 stopped! [2023-03-08 17:47:40,575][14317] Stopping RolloutWorker_w3... [2023-03-08 17:47:40,593][14317] Loop rollout_proc3_evt_loop terminating... [2023-03-08 17:47:40,598][01803] Component RolloutWorker_w2 stopped! [2023-03-08 17:47:40,604][14316] Stopping RolloutWorker_w2... [2023-03-08 17:47:40,609][14316] Loop rollout_proc2_evt_loop terminating... [2023-03-08 17:47:40,622][14321] Stopping RolloutWorker_w7... [2023-03-08 17:47:40,623][14321] Loop rollout_proc7_evt_loop terminating... [2023-03-08 17:47:40,622][01803] Component RolloutWorker_w7 stopped! [2023-03-08 17:47:40,693][14299] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000859_3518464.pth [2023-03-08 17:47:40,707][14299] Saving new best policy, reward=28.480! [2023-03-08 17:47:40,889][14299] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-03-08 17:47:41,076][01803] Component LearnerWorker_p0 stopped! [2023-03-08 17:47:41,084][01803] Waiting for process learner_proc0 to stop... [2023-03-08 17:47:41,091][14299] Stopping LearnerWorker_p0... [2023-03-08 17:47:41,091][14299] Loop learner_proc0_evt_loop terminating... [2023-03-08 17:47:42,891][01803] Waiting for process inference_proc0-0 to join... [2023-03-08 17:47:43,197][01803] Waiting for process rollout_proc0 to join... [2023-03-08 17:47:43,468][01803] Waiting for process rollout_proc1 to join... [2023-03-08 17:47:43,470][01803] Waiting for process rollout_proc2 to join... [2023-03-08 17:47:43,477][01803] Waiting for process rollout_proc3 to join... [2023-03-08 17:47:43,479][01803] Waiting for process rollout_proc4 to join... [2023-03-08 17:47:43,480][01803] Waiting for process rollout_proc5 to join... [2023-03-08 17:47:43,481][01803] Waiting for process rollout_proc6 to join... [2023-03-08 17:47:43,488][01803] Waiting for process rollout_proc7 to join... 
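The teardown above closes the first training run, and the profiler summary that follows accounts for it: 4,005,888 env frames over a 1081.3623 s main loop is 4005888 / 1081.3623 ≈ 3704.5, matching the reported FPS. Immediately after the summary, every doom_* environment is re-registered and the experiment restarts with train_for_env_steps raised to 8,000,000, while restart_behavior=resume reloads checkpoint_000000978_4005888.pth. A minimal sketch of that relaunch, assuming the Sample Factory 2.x entry points shipped with sf_examples (parse_vizdoom_cfg is a course-notebook-style helper, not an SF API; verify the imports against your installed version):

```python
# Minimal relaunch sketch for the resumed run recorded below.
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

def parse_vizdoom_cfg(argv=None):
    # Helper in the style of the course notebooks (assumption, not an SF API):
    # SF's base arg parser plus Doom-specific args and Doom-tuned defaults.
    parser, _ = parse_sf_args(argv=argv, evaluation=False)
    add_doom_env_args(parser)
    doom_override_defaults(parser)
    return parse_full_cfg(parser, argv)

register_vizdoom_components()  # re-registers doom_* envs, producing the
                               # "already registered, overwriting..." lines
cfg = parse_vizdoom_cfg(argv=[
    "--env=doom_health_gathering_supreme", "--num_workers=8",
    "--num_envs_per_worker=4", "--train_for_env_steps=8000000",
])  # train_for_env_steps raised from the stored 4000000
status = run_rl(cfg)  # restart_behavior=resume picks up checkpoint 978
```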
[2023-03-08 17:47:43,489][01803] Batcher 0 profile tree view:
batching: 26.4925, releasing_batches: 0.0216
[2023-03-08 17:47:43,490][01803] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0010
  wait_policy_total: 520.3305
update_model: 7.7741
  weight_update: 0.0013
one_step: 0.0211
  handle_policy_step: 488.6264
    deserialize: 14.4025, stack: 2.8695, obs_to_device_normalize: 110.6651, forward: 232.4354, send_messages: 24.5485
    prepare_outputs: 79.2720
      to_cpu: 50.2653
[2023-03-08 17:47:43,491][01803] Learner 0 profile tree view:
misc: 0.0069, prepare_batch: 15.8480
train: 75.4999
  epoch_init: 0.0058, minibatch_init: 0.0082, losses_postprocess: 0.5692, kl_divergence: 0.5984, after_optimizer: 33.1467
  calculate_losses: 26.5520
    losses_init: 0.0034, forward_head: 1.7730, bptt_initial: 17.6733, tail: 0.9787, advantages_returns: 0.2769, losses: 3.4019
    bptt: 2.1712
      bptt_forward_core: 2.0961
  update: 14.0696
    clip: 1.3618
[2023-03-08 17:47:43,492][01803] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3854, enqueue_policy_requests: 137.2475, env_step: 796.2052, overhead: 19.3249, complete_rollouts: 6.4627
save_policy_outputs: 19.2300
  split_output_tensors: 8.9500
[2023-03-08 17:47:43,493][01803] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3243, enqueue_policy_requests: 139.1183, env_step: 794.8116, overhead: 19.3452, complete_rollouts: 6.4954
save_policy_outputs: 19.1067
  split_output_tensors: 8.9948
[2023-03-08 17:47:43,496][01803] Loop Runner_EvtLoop terminating...
[2023-03-08 17:47:43,497][01803] Runner profile tree view:
main_loop: 1081.3623
[2023-03-08 17:47:43,499][01803] Collected {0: 4005888}, FPS: 3704.5
[2023-03-08 17:47:43,502][01803] Environment doom_basic already registered, overwriting...
[2023-03-08 17:47:43,503][01803] Environment doom_two_colors_easy already registered, overwriting...
[2023-03-08 17:47:43,504][01803] Environment doom_two_colors_hard already registered, overwriting...
[2023-03-08 17:47:43,505][01803] Environment doom_dm already registered, overwriting...
[2023-03-08 17:47:43,506][01803] Environment doom_dwango5 already registered, overwriting...
[2023-03-08 17:47:43,508][01803] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-03-08 17:47:43,512][01803] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-03-08 17:47:43,513][01803] Environment doom_my_way_home already registered, overwriting...
[2023-03-08 17:47:43,514][01803] Environment doom_deadly_corridor already registered, overwriting...
[2023-03-08 17:47:43,515][01803] Environment doom_defend_the_center already registered, overwriting...
[2023-03-08 17:47:43,516][01803] Environment doom_defend_the_line already registered, overwriting...
[2023-03-08 17:47:43,520][01803] Environment doom_health_gathering already registered, overwriting...
[2023-03-08 17:47:43,521][01803] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-03-08 17:47:43,521][01803] Environment doom_battle already registered, overwriting...
[2023-03-08 17:47:43,522][01803] Environment doom_battle2 already registered, overwriting...
[2023-03-08 17:47:43,524][01803] Environment doom_duel_bots already registered, overwriting...
[2023-03-08 17:47:43,525][01803] Environment doom_deathmatch_bots already registered, overwriting...
[2023-03-08 17:47:43,529][01803] Environment doom_duel already registered, overwriting...
[2023-03-08 17:47:43,530][01803] Environment doom_deathmatch_full already registered, overwriting...
[2023-03-08 17:47:43,530][01803] Environment doom_benchmark already registered, overwriting... [2023-03-08 17:47:43,534][01803] register_encoder_factory: [2023-03-08 17:47:43,622][01803] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-08 17:47:43,623][01803] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line [2023-03-08 17:47:43,630][01803] Experiment dir /content/train_dir/default_experiment already exists! [2023-03-08 17:47:43,632][01803] Resuming existing experiment from /content/train_dir/default_experiment... [2023-03-08 17:47:43,633][01803] Weights and Biases integration disabled [2023-03-08 17:47:43,638][01803] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-03-08 17:47:45,094][01803] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=8000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-03-08 17:47:45,096][01803] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-03-08 17:47:45,104][01803] Rollout worker 0 uses device cpu [2023-03-08 17:47:45,105][01803] Rollout worker 1 uses device cpu [2023-03-08 17:47:45,110][01803] Rollout worker 2 uses device cpu [2023-03-08 17:47:45,113][01803] Rollout worker 3 uses device cpu [2023-03-08 17:47:45,114][01803] Rollout worker 4 uses device cpu [2023-03-08 17:47:45,118][01803] Rollout worker 5 uses device cpu [2023-03-08 17:47:45,120][01803] Rollout worker 6 uses device cpu [2023-03-08 17:47:45,122][01803] Rollout worker 7 uses device cpu [2023-03-08 17:47:45,257][01803] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-08 17:47:45,259][01803] InferenceWorker_p0-w0: min num requests: 2 [2023-03-08 17:47:45,292][01803] Starting all processes... [2023-03-08 17:47:45,294][01803] Starting process learner_proc0 [2023-03-08 17:47:45,347][01803] Starting all processes... [2023-03-08 17:47:45,358][01803] Starting process inference_proc0-0 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc0 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc1 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc2 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc3 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc4 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc5 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc6 [2023-03-08 17:47:45,359][01803] Starting process rollout_proc7 [2023-03-08 17:47:56,662][21759] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-08 17:47:56,663][21759] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-03-08 17:47:56,710][21773] Worker 0 uses CPU cores [0] [2023-03-08 17:47:56,869][21778] Worker 5 uses CPU cores [1] [2023-03-08 17:47:57,000][21774] Worker 1 uses CPU cores [1] [2023-03-08 17:47:57,128][21785] Worker 6 uses CPU cores [0] [2023-03-08 17:47:57,134][21775] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-08 17:47:57,135][21775] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-03-08 17:47:57,161][21779] Worker 4 uses CPU cores [0] [2023-03-08 17:47:57,230][21777] Worker 3 uses CPU cores [1] [2023-03-08 17:47:57,304][21776] Worker 2 uses CPU cores [0] [2023-03-08 17:47:57,442][21782] Worker 7 uses CPU cores [1] [2023-03-08 17:47:57,544][21759] Num visible devices: 1 [2023-03-08 17:47:57,545][21775] Num visible devices: 1 [2023-03-08 17:47:57,555][21759] Starting seed is not provided [2023-03-08 17:47:57,555][21759] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-03-08 17:47:57,555][21759] Initializing actor-critic model on device cuda:0 [2023-03-08 17:47:57,556][21759] RunningMeanStd input shape: (3, 72, 128) [2023-03-08 17:47:57,558][21759] RunningMeanStd input shape: (1,) [2023-03-08 17:47:57,570][21759] ConvEncoder: input_channels=3 [2023-03-08 17:47:57,691][21759] Conv encoder output size: 512 [2023-03-08 17:47:57,692][21759] Policy head output size: 512 [2023-03-08 17:47:57,707][21759] Created Actor Critic model with architecture: [2023-03-08 
17:47:57,707][21759] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-03-08 17:48:00,067][21759] Using optimizer
[2023-03-08 17:48:00,068][21759] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-03-08 17:48:00,099][21759] Loading model from checkpoint
[2023-03-08 17:48:00,104][21759] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2023-03-08 17:48:00,104][21759] Initialized policy 0 weights for model version 978
[2023-03-08 17:48:00,111][21759] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-03-08 17:48:00,118][21759] LearnerWorker_p0 finished initialization!
[2023-03-08 17:48:00,303][21775] RunningMeanStd input shape: (3, 72, 128)
[2023-03-08 17:48:00,304][21775] RunningMeanStd input shape: (1,)
[2023-03-08 17:48:00,316][21775] ConvEncoder: input_channels=3
[2023-03-08 17:48:00,413][21775] Conv encoder output size: 512
[2023-03-08 17:48:00,413][21775] Policy head output size: 512
[2023-03-08 17:48:02,610][01803] Inference worker 0-0 is ready!
[2023-03-08 17:48:02,612][01803] All inference workers are ready! Signal rollout workers to start!
[2023-03-08 17:48:02,701][21773] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:02,702][21779] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:02,705][21776] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:02,707][21785] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:02,714][21778] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:02,716][21777] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:02,720][21774] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:02,718][21782] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-03-08 17:48:03,638][01803] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-03-08 17:48:03,897][21782] Decorrelating experience for 0 frames...
[2023-03-08 17:48:03,899][21777] Decorrelating experience for 0 frames...
[2023-03-08 17:48:03,903][21778] Decorrelating experience for 0 frames...
[2023-03-08 17:48:04,143][21779] Decorrelating experience for 0 frames...
[2023-03-08 17:48:04,147][21785] Decorrelating experience for 0 frames... [2023-03-08 17:48:04,154][21776] Decorrelating experience for 0 frames... [2023-03-08 17:48:04,160][21773] Decorrelating experience for 0 frames... [2023-03-08 17:48:04,606][21777] Decorrelating experience for 32 frames... [2023-03-08 17:48:05,233][21774] Decorrelating experience for 0 frames... [2023-03-08 17:48:05,250][01803] Heartbeat connected on Batcher_0 [2023-03-08 17:48:05,257][01803] Heartbeat connected on LearnerWorker_p0 [2023-03-08 17:48:05,253][21782] Decorrelating experience for 32 frames... [2023-03-08 17:48:05,310][01803] Heartbeat connected on InferenceWorker_p0-w0 [2023-03-08 17:48:05,879][21779] Decorrelating experience for 32 frames... [2023-03-08 17:48:05,871][21776] Decorrelating experience for 32 frames... [2023-03-08 17:48:05,887][21773] Decorrelating experience for 32 frames... [2023-03-08 17:48:07,151][21785] Decorrelating experience for 32 frames... [2023-03-08 17:48:07,160][21774] Decorrelating experience for 32 frames... [2023-03-08 17:48:07,165][21778] Decorrelating experience for 32 frames... [2023-03-08 17:48:07,592][21782] Decorrelating experience for 64 frames... [2023-03-08 17:48:07,912][21777] Decorrelating experience for 64 frames... [2023-03-08 17:48:08,128][21776] Decorrelating experience for 64 frames... [2023-03-08 17:48:08,130][21779] Decorrelating experience for 64 frames... [2023-03-08 17:48:08,638][01803] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-08 17:48:09,049][21773] Decorrelating experience for 64 frames... [2023-03-08 17:48:09,379][21785] Decorrelating experience for 64 frames... [2023-03-08 17:48:09,544][21774] Decorrelating experience for 64 frames... [2023-03-08 17:48:09,554][21778] Decorrelating experience for 64 frames... [2023-03-08 17:48:09,805][21776] Decorrelating experience for 96 frames... [2023-03-08 17:48:09,807][21779] Decorrelating experience for 96 frames... [2023-03-08 17:48:10,010][21782] Decorrelating experience for 96 frames... [2023-03-08 17:48:10,280][01803] Heartbeat connected on RolloutWorker_w7 [2023-03-08 17:48:10,336][01803] Heartbeat connected on RolloutWorker_w4 [2023-03-08 17:48:10,342][01803] Heartbeat connected on RolloutWorker_w2 [2023-03-08 17:48:11,414][21777] Decorrelating experience for 96 frames... [2023-03-08 17:48:11,542][21773] Decorrelating experience for 96 frames... [2023-03-08 17:48:11,718][01803] Heartbeat connected on RolloutWorker_w0 [2023-03-08 17:48:11,778][21774] Decorrelating experience for 96 frames... [2023-03-08 17:48:11,803][01803] Heartbeat connected on RolloutWorker_w3 [2023-03-08 17:48:11,949][01803] Heartbeat connected on RolloutWorker_w1 [2023-03-08 17:48:11,979][21778] Decorrelating experience for 96 frames... [2023-03-08 17:48:12,133][01803] Heartbeat connected on RolloutWorker_w5 [2023-03-08 17:48:13,638][01803] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 3.2. Samples: 32. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-03-08 17:48:13,641][01803] Avg episode reward: [(0, '2.387')] [2023-03-08 17:48:14,213][21785] Decorrelating experience for 96 frames... [2023-03-08 17:48:14,636][01803] Heartbeat connected on RolloutWorker_w6 [2023-03-08 17:48:15,101][21759] Signal inference workers to stop experience collection... 
[2023-03-08 17:48:15,110][21775] InferenceWorker_p0-w0: stopping experience collection [2023-03-08 17:48:16,364][21759] Signal inference workers to resume experience collection... [2023-03-08 17:48:16,367][21775] InferenceWorker_p0-w0: resuming experience collection [2023-03-08 17:48:18,638][01803] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4018176. Throughput: 0: 203.1. Samples: 3046. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-03-08 17:48:18,641][01803] Avg episode reward: [(0, '4.977')] [2023-03-08 17:48:23,642][01803] Fps is (10 sec: 3275.7, 60 sec: 1638.1, 300 sec: 1638.1). Total num frames: 4038656. Throughput: 0: 453.4. Samples: 9070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-08 17:48:23,645][01803] Avg episode reward: [(0, '12.697')] [2023-03-08 17:48:25,778][21775] Updated weights for policy 0, policy_version 988 (0.0346) [2023-03-08 17:48:28,639][01803] Fps is (10 sec: 3276.6, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 4050944. Throughput: 0: 446.2. Samples: 11156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-03-08 17:48:28,642][01803] Avg episode reward: [(0, '16.354')] [2023-03-08 17:48:33,638][01803] Fps is (10 sec: 2868.2, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4067328. Throughput: 0: 516.0. Samples: 15480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-03-08 17:48:33,641][01803] Avg episode reward: [(0, '19.541')] [2023-03-08 17:48:37,488][21775] Updated weights for policy 0, policy_version 998 (0.0012) [2023-03-08 17:48:38,638][01803] Fps is (10 sec: 4096.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4091904. Throughput: 0: 634.6. Samples: 22212. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-03-08 17:48:38,640][01803] Avg episode reward: [(0, '21.474')] [2023-03-08 17:48:43,638][01803] Fps is (10 sec: 4505.6, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 4112384. Throughput: 0: 640.0. Samples: 25598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:48:43,643][01803] Avg episode reward: [(0, '23.980')] [2023-03-08 17:48:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 4124672. Throughput: 0: 669.0. Samples: 30104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-03-08 17:48:48,646][01803] Avg episode reward: [(0, '23.709')] [2023-03-08 17:48:49,523][21775] Updated weights for policy 0, policy_version 1008 (0.0013) [2023-03-08 17:48:53,639][01803] Fps is (10 sec: 2867.1, 60 sec: 2703.3, 300 sec: 2703.3). Total num frames: 4141056. Throughput: 0: 768.0. Samples: 34560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:48:53,642][01803] Avg episode reward: [(0, '25.920')] [2023-03-08 17:48:58,638][01803] Fps is (10 sec: 2867.2, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 4153344. Throughput: 0: 817.3. Samples: 36812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:48:58,642][01803] Avg episode reward: [(0, '26.918')] [2023-03-08 17:49:02,501][21775] Updated weights for policy 0, policy_version 1018 (0.0028) [2023-03-08 17:49:03,639][01803] Fps is (10 sec: 2867.3, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 4169728. Throughput: 0: 858.1. Samples: 41660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:49:03,644][01803] Avg episode reward: [(0, '27.166')] [2023-03-08 17:49:08,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2772.7). Total num frames: 4186112. Throughput: 0: 824.9. Samples: 46188. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:49:08,641][01803] Avg episode reward: [(0, '27.999')] [2023-03-08 17:49:13,639][01803] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 2808.7). Total num frames: 4202496. Throughput: 0: 829.8. Samples: 48498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:49:13,648][01803] Avg episode reward: [(0, '27.074')] [2023-03-08 17:49:14,696][21775] Updated weights for policy 0, policy_version 1028 (0.0017) [2023-03-08 17:49:18,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2949.1). Total num frames: 4227072. Throughput: 0: 884.1. Samples: 55266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:49:18,644][01803] Avg episode reward: [(0, '25.353')] [2023-03-08 17:49:23,638][01803] Fps is (10 sec: 4505.8, 60 sec: 3481.8, 300 sec: 3020.8). Total num frames: 4247552. Throughput: 0: 877.8. Samples: 61714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:49:23,643][01803] Avg episode reward: [(0, '25.897')] [2023-03-08 17:49:24,248][21775] Updated weights for policy 0, policy_version 1038 (0.0012) [2023-03-08 17:49:28,639][01803] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3035.9). Total num frames: 4263936. Throughput: 0: 851.4. Samples: 63912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-03-08 17:49:28,642][01803] Avg episode reward: [(0, '26.082')] [2023-03-08 17:49:33,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3094.8). Total num frames: 4284416. Throughput: 0: 861.3. Samples: 68864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:49:33,641][01803] Avg episode reward: [(0, '26.782')] [2023-03-08 17:49:35,474][21775] Updated weights for policy 0, policy_version 1048 (0.0026) [2023-03-08 17:49:38,638][01803] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3147.5). Total num frames: 4304896. Throughput: 0: 920.2. Samples: 75970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:49:38,647][01803] Avg episode reward: [(0, '26.430')] [2023-03-08 17:49:43,641][01803] Fps is (10 sec: 4095.1, 60 sec: 3549.7, 300 sec: 3194.8). Total num frames: 4325376. Throughput: 0: 946.0. Samples: 79384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-03-08 17:49:43,643][01803] Avg episode reward: [(0, '28.318')] [2023-03-08 17:49:43,663][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001056_4325376.pth... [2023-03-08 17:49:43,831][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth [2023-03-08 17:49:46,054][21775] Updated weights for policy 0, policy_version 1058 (0.0023) [2023-03-08 17:49:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3159.8). Total num frames: 4337664. Throughput: 0: 929.7. Samples: 83498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-03-08 17:49:48,641][01803] Avg episode reward: [(0, '27.095')] [2023-03-08 17:49:53,638][01803] Fps is (10 sec: 3277.5, 60 sec: 3618.2, 300 sec: 3202.3). Total num frames: 4358144. Throughput: 0: 949.2. Samples: 88900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-03-08 17:49:53,641][01803] Avg episode reward: [(0, '29.642')] [2023-03-08 17:49:53,651][21759] Saving new best policy, reward=29.642! [2023-03-08 17:49:57,019][21775] Updated weights for policy 0, policy_version 1068 (0.0013) [2023-03-08 17:49:58,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3241.2). Total num frames: 4378624. Throughput: 0: 970.4. Samples: 92166. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:49:58,641][01803] Avg episode reward: [(0, '29.463')]
[2023-03-08 17:50:03,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 4399104. Throughput: 0: 961.0. Samples: 98510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:50:03,648][01803] Avg episode reward: [(0, '30.871')]
[2023-03-08 17:50:03,657][21759] Saving new best policy, reward=30.871!
[2023-03-08 17:50:08,643][01803] Fps is (10 sec: 3275.4, 60 sec: 3754.4, 300 sec: 3243.9). Total num frames: 4411392. Throughput: 0: 911.0. Samples: 102714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:50:08,645][01803] Avg episode reward: [(0, '31.317')]
[2023-03-08 17:50:08,650][21759] Saving new best policy, reward=31.317!
[2023-03-08 17:50:09,210][21775] Updated weights for policy 0, policy_version 1078 (0.0019)
[2023-03-08 17:50:13,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3276.8). Total num frames: 4431872. Throughput: 0: 915.9. Samples: 105128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:50:13,641][01803] Avg episode reward: [(0, '30.863')]
[2023-03-08 17:50:18,638][01803] Fps is (10 sec: 4097.8, 60 sec: 3754.7, 300 sec: 3307.1). Total num frames: 4452352. Throughput: 0: 956.8. Samples: 111918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:50:18,641][01803] Avg episode reward: [(0, '31.286')]
[2023-03-08 17:50:18,900][21775] Updated weights for policy 0, policy_version 1088 (0.0017)
[2023-03-08 17:50:23,642][01803] Fps is (10 sec: 4094.4, 60 sec: 3754.4, 300 sec: 3335.2). Total num frames: 4472832. Throughput: 0: 926.9. Samples: 117686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:50:23,645][01803] Avg episode reward: [(0, '31.835')]
[2023-03-08 17:50:23,658][21759] Saving new best policy, reward=31.835!
[2023-03-08 17:50:28,639][01803] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3305.0). Total num frames: 4485120. Throughput: 0: 899.0. Samples: 119836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:50:28,645][01803] Avg episode reward: [(0, '30.890')]
[2023-03-08 17:50:31,243][21775] Updated weights for policy 0, policy_version 1098 (0.0022)
[2023-03-08 17:50:33,639][01803] Fps is (10 sec: 3278.0, 60 sec: 3686.4, 300 sec: 3331.4). Total num frames: 4505600. Throughput: 0: 923.1. Samples: 125036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:50:33,643][01803] Avg episode reward: [(0, '30.749')]
[2023-03-08 17:50:38,639][01803] Fps is (10 sec: 4505.5, 60 sec: 3754.6, 300 sec: 3382.5). Total num frames: 4530176. Throughput: 0: 954.4. Samples: 131848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:50:38,641][01803] Avg episode reward: [(0, '27.895')]
[2023-03-08 17:50:40,193][21775] Updated weights for policy 0, policy_version 1108 (0.0012)
[2023-03-08 17:50:43,638][01803] Fps is (10 sec: 4096.1, 60 sec: 3686.5, 300 sec: 3379.2). Total num frames: 4546560. Throughput: 0: 950.1. Samples: 134920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:50:43,646][01803] Avg episode reward: [(0, '27.138')]
[2023-03-08 17:50:48,639][01803] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3351.3). Total num frames: 4558848. Throughput: 0: 903.1. Samples: 139152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:50:48,641][01803] Avg episode reward: [(0, '25.378')]
[2023-03-08 17:50:52,772][21775] Updated weights for policy 0, policy_version 1118 (0.0016)
[2023-03-08 17:50:53,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3373.2). Total num frames: 4579328. Throughput: 0: 938.6. Samples: 144948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:50:53,645][01803] Avg episode reward: [(0, '24.373')]
[2023-03-08 17:50:58,638][01803] Fps is (10 sec: 4505.8, 60 sec: 3754.7, 300 sec: 3417.2). Total num frames: 4603904. Throughput: 0: 964.2. Samples: 148518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:50:58,646][01803] Avg episode reward: [(0, '24.328')]
[2023-03-08 17:51:02,189][21775] Updated weights for policy 0, policy_version 1128 (0.0012)
[2023-03-08 17:51:03,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3436.1). Total num frames: 4624384. Throughput: 0: 949.5. Samples: 154646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:51:03,647][01803] Avg episode reward: [(0, '24.199')]
[2023-03-08 17:51:08,639][01803] Fps is (10 sec: 3276.5, 60 sec: 3754.9, 300 sec: 3409.6). Total num frames: 4636672. Throughput: 0: 920.7. Samples: 159114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:51:08,645][01803] Avg episode reward: [(0, '24.757')]
[2023-03-08 17:51:13,639][01803] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3427.7). Total num frames: 4657152. Throughput: 0: 939.6. Samples: 162118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:51:13,642][01803] Avg episode reward: [(0, '25.724')]
[2023-03-08 17:51:13,653][21775] Updated weights for policy 0, policy_version 1138 (0.0020)
[2023-03-08 17:51:18,639][01803] Fps is (10 sec: 4506.0, 60 sec: 3822.9, 300 sec: 3465.8). Total num frames: 4681728. Throughput: 0: 984.0. Samples: 169314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:51:18,641][01803] Avg episode reward: [(0, '26.859')]
[2023-03-08 17:51:23,315][21775] Updated weights for policy 0, policy_version 1148 (0.0019)
[2023-03-08 17:51:23,639][01803] Fps is (10 sec: 4505.5, 60 sec: 3823.2, 300 sec: 3481.6). Total num frames: 4702208. Throughput: 0: 956.8. Samples: 174904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:51:23,641][01803] Avg episode reward: [(0, '27.970')]
[2023-03-08 17:51:28,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3456.6). Total num frames: 4714496. Throughput: 0: 938.4. Samples: 177148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:51:28,647][01803] Avg episode reward: [(0, '28.221')]
[2023-03-08 17:51:33,638][01803] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3491.4). Total num frames: 4739072. Throughput: 0: 972.1. Samples: 182896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:51:33,641][01803] Avg episode reward: [(0, '29.736')]
[2023-03-08 17:51:34,214][21775] Updated weights for policy 0, policy_version 1158 (0.0023)
[2023-03-08 17:51:38,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3524.5). Total num frames: 4763648. Throughput: 0: 1001.6. Samples: 190022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:51:38,641][01803] Avg episode reward: [(0, '29.564')]
[2023-03-08 17:51:43,639][01803] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3518.8). Total num frames: 4780032. Throughput: 0: 985.4. Samples: 192862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:51:43,645][01803] Avg episode reward: [(0, '30.213')]
[2023-03-08 17:51:43,657][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001167_4780032.pth...
[2023-03-08 17:51:43,827][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2023-03-08 17:51:44,629][21775] Updated weights for policy 0, policy_version 1168 (0.0021)
[2023-03-08 17:51:48,643][01803] Fps is (10 sec: 2865.9, 60 sec: 3890.9, 300 sec: 3495.2). Total num frames: 4792320. Throughput: 0: 949.4. Samples: 197372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:51:48,648][01803] Avg episode reward: [(0, '29.651')]
[2023-03-08 17:51:53,638][01803] Fps is (10 sec: 3686.7, 60 sec: 3959.5, 300 sec: 3526.1). Total num frames: 4816896. Throughput: 0: 983.5. Samples: 203372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:51:53,641][01803] Avg episode reward: [(0, '28.530')]
[2023-03-08 17:51:55,368][21775] Updated weights for policy 0, policy_version 1178 (0.0012)
[2023-03-08 17:51:58,638][01803] Fps is (10 sec: 4507.5, 60 sec: 3891.2, 300 sec: 3538.2). Total num frames: 4837376. Throughput: 0: 993.4. Samples: 206822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:51:58,646][01803] Avg episode reward: [(0, '26.692')]
[2023-03-08 17:52:03,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3532.8). Total num frames: 4853760. Throughput: 0: 959.4. Samples: 212486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:52:03,645][01803] Avg episode reward: [(0, '25.753')]
[2023-03-08 17:52:06,999][21775] Updated weights for policy 0, policy_version 1188 (0.0026)
[2023-03-08 17:52:08,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3527.6). Total num frames: 4870144. Throughput: 0: 930.7. Samples: 216786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:52:08,645][01803] Avg episode reward: [(0, '25.219')]
[2023-03-08 17:52:13,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3538.9). Total num frames: 4890624. Throughput: 0: 954.2. Samples: 220086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:52:13,647][01803] Avg episode reward: [(0, '25.472')]
[2023-03-08 17:52:16,732][21775] Updated weights for policy 0, policy_version 1198 (0.0012)
[2023-03-08 17:52:18,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3565.9). Total num frames: 4915200. Throughput: 0: 976.3. Samples: 226828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:52:18,648][01803] Avg episode reward: [(0, '24.632')]
[2023-03-08 17:52:23,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3544.6). Total num frames: 4927488. Throughput: 0: 928.6. Samples: 231810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:52:23,643][01803] Avg episode reward: [(0, '23.894')]
[2023-03-08 17:52:28,638][01803] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3539.6). Total num frames: 4943872. Throughput: 0: 913.4. Samples: 233964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:52:28,644][01803] Avg episode reward: [(0, '22.960')]
[2023-03-08 17:52:29,211][21775] Updated weights for policy 0, policy_version 1208 (0.0024)
[2023-03-08 17:52:33,639][01803] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 4964352. Throughput: 0: 949.8. Samples: 240108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:52:33,641][01803] Avg episode reward: [(0, '22.955')]
[2023-03-08 17:52:38,094][21775] Updated weights for policy 0, policy_version 1218 (0.0013)
[2023-03-08 17:52:38,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3574.7). Total num frames: 4988928. Throughput: 0: 966.7. Samples: 246874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:52:38,647][01803] Avg episode reward: [(0, '23.513')]
[2023-03-08 17:52:43,640][01803] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3569.4). Total num frames: 5005312. Throughput: 0: 940.3. Samples: 249138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:52:43,645][01803] Avg episode reward: [(0, '22.274')]
[2023-03-08 17:52:48,638][01803] Fps is (10 sec: 2867.2, 60 sec: 3754.9, 300 sec: 3549.9). Total num frames: 5017600. Throughput: 0: 911.2. Samples: 253490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:52:48,641][01803] Avg episode reward: [(0, '24.521')]
[2023-03-08 17:52:50,680][21775] Updated weights for policy 0, policy_version 1228 (0.0017)
[2023-03-08 17:52:53,638][01803] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3573.4). Total num frames: 5042176. Throughput: 0: 962.0. Samples: 260078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:52:53,648][01803] Avg episode reward: [(0, '25.627')]
[2023-03-08 17:52:58,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3596.1). Total num frames: 5066752. Throughput: 0: 965.5. Samples: 263534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:52:58,644][01803] Avg episode reward: [(0, '26.439')]
[2023-03-08 17:52:59,867][21775] Updated weights for policy 0, policy_version 1238 (0.0016)
[2023-03-08 17:53:03,641][01803] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3637.8). Total num frames: 5079040. Throughput: 0: 934.0. Samples: 268862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:53:03,646][01803] Avg episode reward: [(0, '27.240')]
[2023-03-08 17:53:08,638][01803] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 5095424. Throughput: 0: 925.7. Samples: 273468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:53:08,645][01803] Avg episode reward: [(0, '28.514')]
[2023-03-08 17:53:11,589][21775] Updated weights for policy 0, policy_version 1248 (0.0035)
[2023-03-08 17:53:13,639][01803] Fps is (10 sec: 3687.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 5115904. Throughput: 0: 956.2. Samples: 276992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:53:13,645][01803] Avg episode reward: [(0, '28.833')]
[2023-03-08 17:53:18,641][01803] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3707.2). Total num frames: 5132288. Throughput: 0: 925.5. Samples: 281756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:53:18,643][01803] Avg episode reward: [(0, '29.201')]
[2023-03-08 17:53:23,638][01803] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 5144576. Throughput: 0: 854.4. Samples: 285322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:53:23,641][01803] Avg episode reward: [(0, '28.140')]
[2023-03-08 17:53:26,289][21775] Updated weights for policy 0, policy_version 1258 (0.0044)
[2023-03-08 17:53:28,638][01803] Fps is (10 sec: 2458.2, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 5156864. Throughput: 0: 850.3. Samples: 287398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:53:28,641][01803] Avg episode reward: [(0, '27.748')]
[2023-03-08 17:53:33,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 5177344. Throughput: 0: 883.8. Samples: 293262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:53:33,641][01803] Avg episode reward: [(0, '26.986')]
[2023-03-08 17:53:36,141][21775] Updated weights for policy 0, policy_version 1268 (0.0018)
[2023-03-08 17:53:38,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 5201920. Throughput: 0: 894.9. Samples: 300348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:53:38,641][01803] Avg episode reward: [(0, '24.646')]
[2023-03-08 17:53:43,639][01803] Fps is (10 sec: 4095.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 5218304. Throughput: 0: 878.2. Samples: 303054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:53:43,647][01803] Avg episode reward: [(0, '24.997')]
[2023-03-08 17:53:43,662][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001274_5218304.pth...
[2023-03-08 17:53:43,791][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001056_4325376.pth
[2023-03-08 17:53:48,229][21775] Updated weights for policy 0, policy_version 1278 (0.0013)
[2023-03-08 17:53:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 5234688. Throughput: 0: 855.4. Samples: 307354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:53:48,641][01803] Avg episode reward: [(0, '24.768')]
[2023-03-08 17:53:53,638][01803] Fps is (10 sec: 3686.6, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 5255168. Throughput: 0: 891.4. Samples: 313580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:53:53,641][01803] Avg episode reward: [(0, '25.631')]
[2023-03-08 17:53:57,551][21775] Updated weights for policy 0, policy_version 1288 (0.0012)
[2023-03-08 17:53:58,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 5279744. Throughput: 0: 887.6. Samples: 316934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:53:58,641][01803] Avg episode reward: [(0, '26.276')]
[2023-03-08 17:54:03,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3762.8). Total num frames: 5296128. Throughput: 0: 905.4. Samples: 322496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:54:03,647][01803] Avg episode reward: [(0, '26.960')]
[2023-03-08 17:54:08,638][01803] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 5308416. Throughput: 0: 924.1. Samples: 326906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:54:08,641][01803] Avg episode reward: [(0, '26.845')]
[2023-03-08 17:54:10,001][21775] Updated weights for policy 0, policy_version 1298 (0.0016)
[2023-03-08 17:54:13,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 5332992. Throughput: 0: 949.8. Samples: 330140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:54:13,641][01803] Avg episode reward: [(0, '26.010')]
[2023-03-08 17:54:18,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3748.9). Total num frames: 5353472. Throughput: 0: 974.4. Samples: 337108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:54:18,646][01803] Avg episode reward: [(0, '25.769')]
[2023-03-08 17:54:18,907][21775] Updated weights for policy 0, policy_version 1308 (0.0015)
[2023-03-08 17:54:23,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 5369856. Throughput: 0: 929.1. Samples: 342158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:54:23,643][01803] Avg episode reward: [(0, '24.085')]
[2023-03-08 17:54:28,639][01803] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5386240. Throughput: 0: 918.3. Samples: 344376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:54:28,647][01803] Avg episode reward: [(0, '22.980')]
[2023-03-08 17:54:31,052][21775] Updated weights for policy 0, policy_version 1318 (0.0025)
[2023-03-08 17:54:33,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 5406720. Throughput: 0: 961.7. Samples: 350630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:54:33,645][01803] Avg episode reward: [(0, '23.438')]
[2023-03-08 17:54:38,638][01803] Fps is (10 sec: 4505.9, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 5431296. Throughput: 0: 985.4. Samples: 357924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:54:38,644][01803] Avg episode reward: [(0, '23.440')]
[2023-03-08 17:54:39,803][21775] Updated weights for policy 0, policy_version 1328 (0.0013)
[2023-03-08 17:54:43,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 5447680. Throughput: 0: 964.2. Samples: 360324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:54:43,640][01803] Avg episode reward: [(0, '23.094')]
[2023-03-08 17:54:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 5464064. Throughput: 0: 944.4. Samples: 364996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:54:48,642][01803] Avg episode reward: [(0, '22.774')]
[2023-03-08 17:54:51,698][21775] Updated weights for policy 0, policy_version 1338 (0.0012)
[2023-03-08 17:54:53,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 5488640. Throughput: 0: 994.4. Samples: 371652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:54:53,641][01803] Avg episode reward: [(0, '23.910')]
[2023-03-08 17:54:58,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 5513216. Throughput: 0: 1001.8. Samples: 375222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:54:58,641][01803] Avg episode reward: [(0, '23.902')]
[2023-03-08 17:55:01,007][21775] Updated weights for policy 0, policy_version 1348 (0.0011)
[2023-03-08 17:55:03,639][01803] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 5525504. Throughput: 0: 968.9. Samples: 380708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:55:03,641][01803] Avg episode reward: [(0, '25.014')]
[2023-03-08 17:55:08,638][01803] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 5541888. Throughput: 0: 962.0. Samples: 385448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:55:08,641][01803] Avg episode reward: [(0, '25.743')]
[2023-03-08 17:55:12,207][21775] Updated weights for policy 0, policy_version 1358 (0.0025)
[2023-03-08 17:55:13,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 5566464. Throughput: 0: 992.1. Samples: 389022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:55:13,645][01803] Avg episode reward: [(0, '25.416')]
[2023-03-08 17:55:18,641][01803] Fps is (10 sec: 4914.1, 60 sec: 3959.3, 300 sec: 3790.6). Total num frames: 5591040. Throughput: 0: 1016.2. Samples: 396360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:55:18,649][01803] Avg episode reward: [(0, '26.383')]
[2023-03-08 17:55:21,892][21775] Updated weights for policy 0, policy_version 1368 (0.0013)
[2023-03-08 17:55:23,641][01803] Fps is (10 sec: 4095.0, 60 sec: 3959.3, 300 sec: 3804.4). Total num frames: 5607424. Throughput: 0: 963.9. Samples: 401300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:55:23,643][01803] Avg episode reward: [(0, '26.344')]
[2023-03-08 17:55:28,638][01803] Fps is (10 sec: 3277.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 5623808. Throughput: 0: 959.7. Samples: 403510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:55:28,645][01803] Avg episode reward: [(0, '26.681')]
[2023-03-08 17:55:32,604][21775] Updated weights for policy 0, policy_version 1378 (0.0011)
[2023-03-08 17:55:33,638][01803] Fps is (10 sec: 4097.0, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 5648384. Throughput: 0: 1003.2. Samples: 410138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:55:33,641][01803] Avg episode reward: [(0, '26.804')]
[2023-03-08 17:55:38,640][01803] Fps is (10 sec: 4914.5, 60 sec: 4027.6, 300 sec: 3818.3). Total num frames: 5672960. Throughput: 0: 1017.8. Samples: 417456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:55:38,642][01803] Avg episode reward: [(0, '27.039')]
[2023-03-08 17:55:42,452][21775] Updated weights for policy 0, policy_version 1388 (0.0011)
[2023-03-08 17:55:43,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 5685248. Throughput: 0: 990.1. Samples: 419776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:55:43,641][01803] Avg episode reward: [(0, '28.920')]
[2023-03-08 17:55:43,658][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001388_5685248.pth...
[2023-03-08 17:55:43,809][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001167_4780032.pth
[2023-03-08 17:55:48,638][01803] Fps is (10 sec: 2867.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 5701632. Throughput: 0: 970.7. Samples: 424390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:55:48,645][01803] Avg episode reward: [(0, '29.669')]
[2023-03-08 17:55:52,971][21775] Updated weights for policy 0, policy_version 1398 (0.0015)
[2023-03-08 17:55:53,640][01803] Fps is (10 sec: 4095.3, 60 sec: 3959.3, 300 sec: 3804.4). Total num frames: 5726208. Throughput: 0: 1024.8. Samples: 431564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:55:53,643][01803] Avg episode reward: [(0, '30.172')]
[2023-03-08 17:55:58,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 5750784. Throughput: 0: 1025.0. Samples: 435146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:55:58,646][01803] Avg episode reward: [(0, '30.143')]
[2023-03-08 17:56:03,457][21775] Updated weights for policy 0, policy_version 1408 (0.0014)
[2023-03-08 17:56:03,638][01803] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 5767168. Throughput: 0: 977.7. Samples: 440354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:56:03,644][01803] Avg episode reward: [(0, '29.404')]
[2023-03-08 17:56:08,638][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 5783552. Throughput: 0: 980.8. Samples: 445432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:56:08,648][01803] Avg episode reward: [(0, '29.523')]
[2023-03-08 17:56:13,433][21775] Updated weights for policy 0, policy_version 1418 (0.0021)
[2023-03-08 17:56:13,641][01803] Fps is (10 sec: 4094.8, 60 sec: 4027.5, 300 sec: 3818.3). Total num frames: 5808128. Throughput: 0: 1010.6. Samples: 448992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:56:13,650][01803] Avg episode reward: [(0, '28.393')]
[2023-03-08 17:56:18,641][01803] Fps is (10 sec: 4504.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 5828608. Throughput: 0: 1026.2. Samples: 456320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:56:18,645][01803] Avg episode reward: [(0, '26.512')]
[2023-03-08 17:56:23,639][01803] Fps is (10 sec: 3687.4, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 5844992. Throughput: 0: 964.7. Samples: 460868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:56:23,642][01803] Avg episode reward: [(0, '25.561')]
[2023-03-08 17:56:24,214][21775] Updated weights for policy 0, policy_version 1428 (0.0015)
[2023-03-08 17:56:28,638][01803] Fps is (10 sec: 3277.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 5861376. Throughput: 0: 965.1. Samples: 463204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:56:28,648][01803] Avg episode reward: [(0, '24.790')]
[2023-03-08 17:56:33,638][01803] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 5885952. Throughput: 0: 1020.3. Samples: 470304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:56:33,641][01803] Avg episode reward: [(0, '24.494')]
[2023-03-08 17:56:33,798][21775] Updated weights for policy 0, policy_version 1438 (0.0037)
[2023-03-08 17:56:38,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 5910528. Throughput: 0: 1009.3. Samples: 476980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:56:38,643][01803] Avg episode reward: [(0, '25.343')]
[2023-03-08 17:56:43,644][01803] Fps is (10 sec: 3684.4, 60 sec: 3959.1, 300 sec: 3832.2). Total num frames: 5922816. Throughput: 0: 980.3. Samples: 479266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:56:43,646][01803] Avg episode reward: [(0, '25.755')]
[2023-03-08 17:56:44,995][21775] Updated weights for policy 0, policy_version 1448 (0.0028)
[2023-03-08 17:56:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 5943296. Throughput: 0: 974.2. Samples: 484192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:56:48,644][01803] Avg episode reward: [(0, '27.173')]
[2023-03-08 17:56:53,638][01803] Fps is (10 sec: 4508.0, 60 sec: 4027.9, 300 sec: 3832.2). Total num frames: 5967872. Throughput: 0: 1019.5. Samples: 491310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:56:53,644][01803] Avg episode reward: [(0, '28.578')]
[2023-03-08 17:56:54,402][21775] Updated weights for policy 0, policy_version 1458 (0.0028)
[2023-03-08 17:56:58,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 5988352. Throughput: 0: 1019.7. Samples: 494876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:56:58,645][01803] Avg episode reward: [(0, '28.976')]
[2023-03-08 17:57:03,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 6004736. Throughput: 0: 962.8. Samples: 499642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:57:03,645][01803] Avg episode reward: [(0, '29.026')]
[2023-03-08 17:57:06,213][21775] Updated weights for policy 0, policy_version 1468 (0.0012)
[2023-03-08 17:57:08,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 6021120. Throughput: 0: 984.2. Samples: 505156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:57:08,641][01803] Avg episode reward: [(0, '29.218')]
[2023-03-08 17:57:13,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3832.2). Total num frames: 6045696. Throughput: 0: 1010.9. Samples: 508696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:57:13,641][01803] Avg episode reward: [(0, '29.386')]
[2023-03-08 17:57:14,866][21775] Updated weights for policy 0, policy_version 1478 (0.0013)
[2023-03-08 17:57:18,639][01803] Fps is (10 sec: 4505.4, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 6066176. Throughput: 0: 1009.0. Samples: 515708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:57:18,644][01803] Avg episode reward: [(0, '28.516')]
[2023-03-08 17:57:23,639][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 6082560. Throughput: 0: 961.4. Samples: 520244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:57:23,642][01803] Avg episode reward: [(0, '29.127')]
[2023-03-08 17:57:26,687][21775] Updated weights for policy 0, policy_version 1488 (0.0017)
[2023-03-08 17:57:28,639][01803] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 6103040. Throughput: 0: 964.8. Samples: 522678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:57:28,641][01803] Avg episode reward: [(0, '30.079')]
[2023-03-08 17:57:33,638][01803] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 6127616. Throughput: 0: 1015.9. Samples: 529906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:57:33,641][01803] Avg episode reward: [(0, '29.599')]
[2023-03-08 17:57:35,856][21775] Updated weights for policy 0, policy_version 1498 (0.0014)
[2023-03-08 17:57:38,642][01803] Fps is (10 sec: 3685.3, 60 sec: 3822.7, 300 sec: 3846.1). Total num frames: 6139904. Throughput: 0: 963.3. Samples: 534660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:57:38,644][01803] Avg episode reward: [(0, '28.795')]
[2023-03-08 17:57:43,642][01803] Fps is (10 sec: 2456.8, 60 sec: 3823.1, 300 sec: 3846.0). Total num frames: 6152192. Throughput: 0: 923.1. Samples: 536420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:57:43,644][01803] Avg episode reward: [(0, '28.824')]
[2023-03-08 17:57:43,659][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001502_6152192.pth...
[2023-03-08 17:57:43,870][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001274_5218304.pth
[2023-03-08 17:57:48,638][01803] Fps is (10 sec: 2458.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 6164480. Throughput: 0: 898.0. Samples: 540050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:57:48,645][01803] Avg episode reward: [(0, '28.525')]
[2023-03-08 17:57:50,926][21775] Updated weights for policy 0, policy_version 1508 (0.0021)
[2023-03-08 17:57:53,638][01803] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 6189056. Throughput: 0: 916.8. Samples: 546414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:57:53,641][01803] Avg episode reward: [(0, '28.486')]
[2023-03-08 17:57:58,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6213632. Throughput: 0: 918.9. Samples: 550046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:57:58,641][01803] Avg episode reward: [(0, '28.000')]
[2023-03-08 17:57:59,555][21775] Updated weights for policy 0, policy_version 1518 (0.0017)
[2023-03-08 17:58:03,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 6230016. Throughput: 0: 890.1. Samples: 555764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:58:03,646][01803] Avg episode reward: [(0, '27.789')]
[2023-03-08 17:58:08,639][01803] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 6242304. Throughput: 0: 889.8. Samples: 560284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 17:58:08,644][01803] Avg episode reward: [(0, '27.636')]
[2023-03-08 17:58:11,681][21775] Updated weights for policy 0, policy_version 1528 (0.0013)
[2023-03-08 17:58:13,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 6266880. Throughput: 0: 913.7. Samples: 563792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:58:13,641][01803] Avg episode reward: [(0, '28.635')]
[2023-03-08 17:58:18,639][01803] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 6291456. Throughput: 0: 915.5. Samples: 571106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:58:18,641][01803] Avg episode reward: [(0, '28.507')]
[2023-03-08 17:58:20,844][21775] Updated weights for policy 0, policy_version 1538 (0.0017)
[2023-03-08 17:58:23,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 6307840. Throughput: 0: 924.0. Samples: 576236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:58:23,641][01803] Avg episode reward: [(0, '27.510')]
[2023-03-08 17:58:28,638][01803] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 6320128. Throughput: 0: 935.2. Samples: 578500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-03-08 17:58:28,647][01803] Avg episode reward: [(0, '27.659')]
[2023-03-08 17:58:32,157][21775] Updated weights for policy 0, policy_version 1548 (0.0025)
[2023-03-08 17:58:33,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3873.8). Total num frames: 6344704. Throughput: 0: 1000.3. Samples: 585064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:58:33,641][01803] Avg episode reward: [(0, '27.597')]
[2023-03-08 17:58:38,639][01803] Fps is (10 sec: 4915.2, 60 sec: 3823.1, 300 sec: 3901.6). Total num frames: 6369280. Throughput: 0: 1022.4. Samples: 592424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:58:38,646][01803] Avg episode reward: [(0, '28.216')]
[2023-03-08 17:58:41,297][21775] Updated weights for policy 0, policy_version 1558 (0.0020)
[2023-03-08 17:58:43,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3901.6). Total num frames: 6385664. Throughput: 0: 992.4. Samples: 594704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:58:43,642][01803] Avg episode reward: [(0, '28.126')]
[2023-03-08 17:58:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6402048. Throughput: 0: 967.4. Samples: 599298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 17:58:48,641][01803] Avg episode reward: [(0, '29.102')]
[2023-03-08 17:58:52,313][21775] Updated weights for policy 0, policy_version 1568 (0.0035)
[2023-03-08 17:58:53,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6426624. Throughput: 0: 1023.6. Samples: 606346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:58:53,651][01803] Avg episode reward: [(0, '28.863')]
[2023-03-08 17:58:58,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 6451200. Throughput: 0: 1026.0. Samples: 609962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:58:58,646][01803] Avg episode reward: [(0, '29.815')]
[2023-03-08 17:59:02,306][21775] Updated weights for policy 0, policy_version 1578 (0.0011)
[2023-03-08 17:59:03,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 6463488. Throughput: 0: 982.4. Samples: 615314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 17:59:03,649][01803] Avg episode reward: [(0, '30.096')]
[2023-03-08 17:59:08,638][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 3901.6). Total num frames: 6483968. Throughput: 0: 981.8. Samples: 620418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-03-08 17:59:08,640][01803] Avg episode reward: [(0, '28.869')]
[2023-03-08 17:59:12,575][21775] Updated weights for policy 0, policy_version 1588 (0.0020)
[2023-03-08 17:59:13,638][01803] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 6508544. Throughput: 0: 1011.7. Samples: 624028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:59:13,640][01803] Avg episode reward: [(0, '28.412')]
[2023-03-08 17:59:18,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 6529024. Throughput: 0: 1030.1. Samples: 631420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:59:18,649][01803] Avg episode reward: [(0, '27.480')]
[2023-03-08 17:59:22,855][21775] Updated weights for policy 0, policy_version 1598 (0.0019)
[2023-03-08 17:59:23,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 6545408. Throughput: 0: 972.0. Samples: 636166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:59:23,644][01803] Avg episode reward: [(0, '25.929')]
[2023-03-08 17:59:28,638][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 6561792. Throughput: 0: 972.9. Samples: 638486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:59:28,642][01803] Avg episode reward: [(0, '26.334')]
[2023-03-08 17:59:32,900][21775] Updated weights for policy 0, policy_version 1608 (0.0018)
[2023-03-08 17:59:33,638][01803] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 6586368. Throughput: 0: 1026.8. Samples: 645502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 17:59:33,645][01803] Avg episode reward: [(0, '26.929')]
[2023-03-08 17:59:38,643][01803] Fps is (10 sec: 4913.0, 60 sec: 4027.4, 300 sec: 3943.2). Total num frames: 6610944. Throughput: 0: 1022.2. Samples: 652350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-03-08 17:59:38,652][01803] Avg episode reward: [(0, '30.515')]
[2023-03-08 17:59:43,085][21775] Updated weights for policy 0, policy_version 1618 (0.0011)
[2023-03-08 17:59:43,643][01803] Fps is (10 sec: 4094.1, 60 sec: 4027.4, 300 sec: 3943.2). Total num frames: 6627328. Throughput: 0: 994.1. Samples: 654702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 17:59:43,651][01803] Avg episode reward: [(0, '30.517')]
[2023-03-08 17:59:43,666][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001618_6627328.pth...
[2023-03-08 17:59:43,819][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001388_5685248.pth
[2023-03-08 17:59:48,638][01803] Fps is (10 sec: 3278.2, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 6643712. Throughput: 0: 985.9. Samples: 659680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:59:48,642][01803] Avg episode reward: [(0, '30.548')]
[2023-03-08 17:59:52,868][21775] Updated weights for policy 0, policy_version 1628 (0.0019)
[2023-03-08 17:59:53,638][01803] Fps is (10 sec: 4097.9, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 6668288. Throughput: 0: 1036.6. Samples: 667066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 17:59:53,641][01803] Avg episode reward: [(0, '31.037')]
[2023-03-08 17:59:58,638][01803] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 6692864. Throughput: 0: 1033.7. Samples: 670544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 17:59:58,642][01803] Avg episode reward: [(0, '31.036')]
[2023-03-08 18:00:03,644][01803] Fps is (10 sec: 3684.4, 60 sec: 4027.4, 300 sec: 3943.2). Total num frames: 6705152. Throughput: 0: 978.7. Samples: 675466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:00:03,648][01803] Avg episode reward: [(0, '31.512')]
[2023-03-08 18:00:03,765][21775] Updated weights for policy 0, policy_version 1638 (0.0016)
[2023-03-08 18:00:08,638][01803] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 6725632. Throughput: 0: 997.0. Samples: 681032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:00:08,647][01803] Avg episode reward: [(0, '29.662')]
[2023-03-08 18:00:13,157][21775] Updated weights for policy 0, policy_version 1648 (0.0022)
[2023-03-08 18:00:13,638][01803] Fps is (10 sec: 4508.1, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 6750208. Throughput: 0: 1027.3. Samples: 684716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 18:00:13,647][01803] Avg episode reward: [(0, '27.703')]
[2023-03-08 18:00:18,640][01803] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3943.3). Total num frames: 6770688. Throughput: 0: 1027.1. Samples: 691722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 18:00:18,642][01803] Avg episode reward: [(0, '27.327')]
[2023-03-08 18:00:23,638][01803] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 6787072. Throughput: 0: 978.7. Samples: 696386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:00:23,647][01803] Avg episode reward: [(0, '26.810')]
[2023-03-08 18:00:24,484][21775] Updated weights for policy 0, policy_version 1658 (0.0016)
[2023-03-08 18:00:28,638][01803] Fps is (10 sec: 3687.0, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 6807552. Throughput: 0: 980.5. Samples: 698822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:00:28,641][01803] Avg episode reward: [(0, '26.104')]
[2023-03-08 18:00:33,357][21775] Updated weights for policy 0, policy_version 1668 (0.0013)
[2023-03-08 18:00:33,639][01803] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 6832128. Throughput: 0: 1033.1. Samples: 706170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:00:33,641][01803] Avg episode reward: [(0, '24.997')]
[2023-03-08 18:00:38,640][01803] Fps is (10 sec: 4095.5, 60 sec: 3959.7, 300 sec: 3943.3). Total num frames: 6848512. Throughput: 0: 990.8. Samples: 711652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:00:38,644][01803] Avg episode reward: [(0, '25.272')]
[2023-03-08 18:00:43,641][01803] Fps is (10 sec: 3276.1, 60 sec: 3959.6, 300 sec: 3943.2). Total num frames: 6864896. Throughput: 0: 965.0. Samples: 713970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:00:43,643][01803] Avg episode reward: [(0, '26.690')]
[2023-03-08 18:00:45,931][21775] Updated weights for policy 0, policy_version 1678 (0.0018)
[2023-03-08 18:00:48,639][01803] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 6885376. Throughput: 0: 975.8. Samples: 719372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 18:00:48,641][01803] Avg episode reward: [(0, '27.103')]
[2023-03-08 18:00:53,638][01803] Fps is (10 sec: 4096.9, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 6905856. Throughput: 0: 1006.4. Samples: 726322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:00:53,641][01803] Avg episode reward: [(0, '28.462')]
[2023-03-08 18:00:55,839][21775] Updated weights for policy 0, policy_version 1688 (0.0024)
[2023-03-08 18:00:58,638][01803] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 6918144. Throughput: 0: 963.2. Samples: 728062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:00:58,643][01803] Avg episode reward: [(0, '28.153')]
[2023-03-08 18:01:03,641][01803] Fps is (10 sec: 2457.0, 60 sec: 3754.9, 300 sec: 3887.7). Total num frames: 6930432. Throughput: 0: 888.9. Samples: 731724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:01:03,643][01803] Avg episode reward: [(0, '29.462')]
[2023-03-08 18:01:08,638][01803] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 6946816. Throughput: 0: 888.8. Samples: 736384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:01:08,641][01803] Avg episode reward: [(0, '28.935')]
[2023-03-08 18:01:09,936][21775] Updated weights for policy 0, policy_version 1698 (0.0028)
[2023-03-08 18:01:13,638][01803] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3873.9). Total num frames: 6971392. Throughput: 0: 915.8. Samples: 740034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:01:13,645][01803] Avg episode reward: [(0, '30.477')]
[2023-03-08 18:01:18,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3887.7). Total num frames: 6991872. Throughput: 0: 898.9. Samples: 746622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:01:18,645][01803] Avg episode reward: [(0, '30.442')]
[2023-03-08 18:01:19,381][21775] Updated weights for policy 0, policy_version 1708 (0.0014)
[2023-03-08 18:01:23,639][01803] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 7008256. Throughput: 0: 879.6. Samples: 751232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:01:23,643][01803] Avg episode reward: [(0, '29.043')]
[2023-03-08 18:01:28,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 7028736. Throughput: 0: 893.8. Samples: 754190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 18:01:28,641][01803] Avg episode reward: [(0, '29.751')]
[2023-03-08 18:01:30,031][21775] Updated weights for policy 0, policy_version 1718 (0.0011)
[2023-03-08 18:01:33,638][01803] Fps is (10 sec: 4505.8, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 7053312. Throughput: 0: 935.8. Samples: 761484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:01:33,642][01803] Avg episode reward: [(0, '28.829')]
[2023-03-08 18:01:38,651][01803] Fps is (10 sec: 4090.8, 60 sec: 3685.7, 300 sec: 3887.6). Total num frames: 7069696. Throughput: 0: 914.8. Samples: 767500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:01:38,653][01803] Avg episode reward: [(0, '29.428')]
[2023-03-08 18:01:40,560][21775] Updated weights for policy 0, policy_version 1728 (0.0021)
[2023-03-08 18:01:43,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3873.8). Total num frames: 7086080. Throughput: 0: 913.0. Samples: 769146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:01:43,641][01803] Avg episode reward: [(0, '28.412')]
[2023-03-08 18:01:43,648][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001730_7086080.pth...
[2023-03-08 18:01:43,845][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001502_6152192.pth
[2023-03-08 18:01:48,638][01803] Fps is (10 sec: 3281.0, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 7102464. Throughput: 0: 961.3. Samples: 774982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:01:48,640][01803] Avg episode reward: [(0, '26.885')]
[2023-03-08 18:01:51,251][21775] Updated weights for policy 0, policy_version 1738 (0.0019)
[2023-03-08 18:01:53,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 7127040. Throughput: 0: 998.4. Samples: 781312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:01:53,641][01803] Avg episode reward: [(0, '28.634')]
[2023-03-08 18:01:58,642][01803] Fps is (10 sec: 3275.7, 60 sec: 3617.9, 300 sec: 3832.1). Total num frames: 7135232. Throughput: 0: 949.9. Samples: 782784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:01:58,644][01803] Avg episode reward: [(0, '28.123')]
[2023-03-08 18:02:03,638][01803] Fps is (10 sec: 2048.0, 60 sec: 3618.3, 300 sec: 3818.3). Total num frames: 7147520. Throughput: 0: 870.5. Samples: 785796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:02:03,645][01803] Avg episode reward: [(0, '28.347')]
[2023-03-08 18:02:08,377][21775] Updated weights for policy 0, policy_version 1748 (0.0018)
[2023-03-08 18:02:08,640][01803] Fps is (10 sec: 2458.2, 60 sec: 3549.8, 300 sec: 3776.6). Total num frames: 7159808. Throughput: 0: 844.5. Samples: 789236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:02:08,652][01803] Avg episode reward: [(0, '28.744')]
[2023-03-08 18:02:13,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3776.7). Total num frames: 7180288. Throughput: 0: 857.5. Samples: 792776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:02:13,647][01803] Avg episode reward: [(0, '28.086')]
[2023-03-08 18:02:17,097][21775] Updated weights for policy 0, policy_version 1758 (0.0018)
[2023-03-08 18:02:18,639][01803] Fps is (10 sec: 4505.8, 60 sec: 3549.8, 300 sec: 3804.4). Total num frames: 7204864. Throughput: 0: 857.3. Samples: 800064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:02:18,646][01803] Avg episode reward: [(0, '30.502')]
[2023-03-08 18:02:23,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 7221248. Throughput: 0: 835.7. Samples: 805094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 18:02:23,644][01803] Avg episode reward: [(0, '30.227')]
[2023-03-08 18:02:28,638][01803] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3762.8). Total num frames: 7237632. Throughput: 0: 849.1. Samples: 807354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:02:28,641][01803] Avg episode reward: [(0, '30.380')]
[2023-03-08 18:02:29,128][21775] Updated weights for policy 0, policy_version 1768 (0.0012)
[2023-03-08 18:02:33,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3804.5). Total num frames: 7262208. Throughput: 0: 866.8. Samples: 813990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:02:33,641][01803] Avg episode reward: [(0, '30.142')]
[2023-03-08 18:02:37,386][21775] Updated weights for policy 0, policy_version 1778 (0.0012)
[2023-03-08 18:02:38,638][01803] Fps is (10 sec: 4915.2, 60 sec: 3618.9, 300 sec: 3846.1). Total num frames: 7286784. Throughput: 0: 887.6. Samples: 821252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 18:02:38,647][01803] Avg episode reward: [(0, '28.956')]
[2023-03-08 18:02:43,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 7303168. Throughput: 0: 905.0. Samples: 823504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:02:43,641][01803] Avg episode reward: [(0, '28.979')]
[2023-03-08 18:02:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 7319552. Throughput: 0: 941.4. Samples: 828158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 18:02:48,641][01803] Avg episode reward: [(0, '29.270')]
[2023-03-08 18:02:49,474][21775] Updated weights for policy 0, policy_version 1788 (0.0033)
[2023-03-08 18:02:53,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 7340032. Throughput: 0: 1020.4. Samples: 835154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:02:53,644][01803] Avg episode reward: [(0, '27.880')]
[2023-03-08 18:02:57,986][21775] Updated weights for policy 0, policy_version 1798 (0.0016)
[2023-03-08 18:02:58,643][01803] Fps is (10 sec: 4503.6, 60 sec: 3822.9, 300 sec: 3846.0). Total num frames: 7364608. Throughput: 0: 1020.7. Samples: 838712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 18:02:58,646][01803] Avg episode reward: [(0, '28.135')]
[2023-03-08 18:03:03,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 7380992. Throughput: 0: 973.3. Samples: 843862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:03:03,649][01803] Avg episode reward: [(0, '28.391')]
[2023-03-08 18:03:08,639][01803] Fps is (10 sec: 3278.2, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 7397376. Throughput: 0: 966.8. Samples: 848598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 18:03:08,641][01803] Avg episode reward: [(0, '29.643')]
[2023-03-08 18:03:10,176][21775] Updated weights for policy 0, policy_version 1808 (0.0025)
[2023-03-08 18:03:13,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 7417856. Throughput: 0: 995.4. Samples: 852148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 18:03:13,641][01803] Avg episode reward: [(0, '28.264')]
[2023-03-08 18:03:18,638][01803] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 7442432. Throughput: 0: 1003.7. Samples: 859158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 18:03:18,641][01803] Avg episode reward: [(0, '27.628')]
[2023-03-08 18:03:19,688][21775] Updated weights for policy 0, policy_version 1818 (0.0013)
[2023-03-08 18:03:23,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 7454720. Throughput: 0: 893.1. Samples: 861440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:03:23,644][01803] Avg episode reward: [(0, '27.835')]
[2023-03-08 18:03:28,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 7475200. Throughput: 0: 943.0. Samples: 865938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:03:28,643][01803] Avg episode reward: [(0, '27.686')]
[2023-03-08 18:03:31,055][21775] Updated weights for policy 0, policy_version 1828 (0.0020)
[2023-03-08 18:03:33,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 7495680. Throughput: 0: 990.9. Samples: 872750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:03:33,641][01803] Avg episode reward: [(0, '25.822')]
[2023-03-08 18:03:38,640][01803] Fps is (10 sec: 4505.0, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 7520256. Throughput: 0: 978.2. Samples: 879174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:03:38,642][01803] Avg episode reward: [(0, '26.001')]
[2023-03-08 18:03:41,301][21775] Updated weights for policy 0, policy_version 1838 (0.0011)
[2023-03-08 18:03:43,638][01803] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 7532544. Throughput: 0: 949.4. Samples: 881430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:03:43,645][01803] Avg episode reward: [(0, '25.249')]
[2023-03-08 18:03:43,653][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001839_7532544.pth...
[2023-03-08 18:03:43,817][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001618_6627328.pth
[2023-03-08 18:03:48,639][01803] Fps is (10 sec: 2867.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 7548928. Throughput: 0: 936.6. Samples: 886010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:03:48,643][01803] Avg episode reward: [(0, '26.645')]
[2023-03-08 18:03:52,528][21775] Updated weights for policy 0, policy_version 1848 (0.0019)
[2023-03-08 18:03:53,639][01803] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 7573504. Throughput: 0: 982.7. Samples: 892820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:03:53,641][01803] Avg episode reward: [(0, '27.026')]
[2023-03-08 18:03:58,641][01803] Fps is (10 sec: 4504.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 7593984. Throughput: 0: 981.7. Samples: 896326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 18:03:58,647][01803] Avg episode reward: [(0, '29.295')]
[2023-03-08 18:04:03,543][21775] Updated weights for policy 0, policy_version 1858 (0.0011)
[2023-03-08 18:04:03,642][01803] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3818.3). Total num frames: 7610368. Throughput: 0: 927.5. Samples: 900898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:04:03,649][01803] Avg episode reward: [(0, '29.806')]
[2023-03-08 18:04:08,638][01803] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 7626752. Throughput: 0: 998.3. Samples: 906364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:04:08,640][01803] Avg episode reward: [(0, '31.359')]
[2023-03-08 18:04:13,239][21775] Updated weights for policy 0, policy_version 1868 (0.0011)
[2023-03-08 18:04:13,638][01803] Fps is (10 sec: 4097.3, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 7651328. Throughput: 0: 978.6. Samples: 909974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-03-08 18:04:13,647][01803] Avg episode reward: [(0, '31.628')]
[2023-03-08 18:04:18,639][01803] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 7671808. Throughput: 0: 976.1. Samples: 916674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 18:04:18,647][01803] Avg episode reward: [(0, '33.202')]
[2023-03-08 18:04:18,649][21759] Saving new best policy, reward=33.202!
[2023-03-08 18:04:23,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 7684096. Throughput: 0: 881.1. Samples: 918822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 18:04:23,654][01803] Avg episode reward: [(0, '33.357')]
[2023-03-08 18:04:23,674][21759] Saving new best policy, reward=33.357!
[2023-03-08 18:04:24,940][21775] Updated weights for policy 0, policy_version 1878 (0.0021)
[2023-03-08 18:04:28,638][01803] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 7704576. Throughput: 0: 933.6. Samples: 923440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 18:04:28,641][01803] Avg episode reward: [(0, '32.564')]
[2023-03-08 18:04:33,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 7729152. Throughput: 0: 985.4. Samples: 930352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:04:33,641][01803] Avg episode reward: [(0, '32.635')]
[2023-03-08 18:04:34,380][21775] Updated weights for policy 0, policy_version 1888 (0.0026)
[2023-03-08 18:04:38,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3790.6). Total num frames: 7745536. Throughput: 0: 966.2. Samples: 936300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-03-08 18:04:38,642][01803] Avg episode reward: [(0, '31.169')]
[2023-03-08 18:04:43,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 7761920. Throughput: 0: 937.1. Samples: 938492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:04:43,646][01803] Avg episode reward: [(0, '30.201')]
[2023-03-08 18:04:46,831][21775] Updated weights for policy 0, policy_version 1898 (0.0044)
[2023-03-08 18:04:48,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7778304. Throughput: 0: 949.4. Samples: 943620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:04:48,646][01803] Avg episode reward: [(0, '30.326')]
[2023-03-08 18:04:53,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7802880. Throughput: 0: 983.0. Samples: 950600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:04:53,641][01803] Avg episode reward: [(0, '30.489')]
[2023-03-08 18:04:55,584][21775] Updated weights for policy 0, policy_version 1908 (0.0012)
[2023-03-08 18:04:58,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3823.1, 300 sec: 3790.6). Total num frames: 7823360. Throughput: 0: 975.4. Samples: 953868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:04:58,641][01803] Avg episode reward: [(0, '30.995')]
[2023-03-08 18:05:03,642][01803] Fps is (10 sec: 3685.2, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 7839744. Throughput: 0: 924.0. Samples: 958256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:05:03,648][01803] Avg episode reward: [(0, '30.220')]
[2023-03-08 18:05:07,984][21775] Updated weights for policy 0, policy_version 1918 (0.0011)
[2023-03-08 18:05:08,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 7856128. Throughput: 0: 1002.2. Samples: 963920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-03-08 18:05:08,647][01803] Avg episode reward: [(0, '28.891')]
[2023-03-08 18:05:13,638][01803] Fps is (10 sec: 4097.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7880704. Throughput: 0: 976.9. Samples: 967402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:05:13,644][01803] Avg episode reward: [(0, '30.698')]
[2023-03-08 18:05:17,156][21775] Updated weights for policy 0, policy_version 1928 (0.0017)
[2023-03-08 18:05:18,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 7901184. Throughput: 0: 961.8. Samples: 973632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-03-08 18:05:18,640][01803] Avg episode reward: [(0, '30.988')]
[2023-03-08 18:05:23,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 7913472. Throughput: 0: 929.6. Samples: 978130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-03-08 18:05:23,643][01803] Avg episode reward: [(0, '28.928')]
[2023-03-08 18:05:28,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 7933952. Throughput: 0: 943.8. Samples: 980962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:05:28,641][01803] Avg episode reward: [(0, '30.395')]
[2023-03-08 18:05:28,897][21775] Updated weights for policy 0, policy_version 1938 (0.0020)
[2023-03-08 18:05:33,638][01803] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7958528. Throughput: 0: 986.9. Samples: 988030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:05:33,641][01803] Avg episode reward: [(0, '29.638')]
[2023-03-08 18:05:38,638][01803] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 7974912. Throughput: 0: 953.9. Samples: 993526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-03-08 18:05:38,647][01803] Avg episode reward: [(0, '29.928')]
[2023-03-08 18:05:39,203][21775] Updated weights for policy 0, policy_version 1948 (0.0022)
[2023-03-08 18:05:43,638][01803] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 7991296. Throughput: 0: 930.6. Samples: 995746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-03-08 18:05:43,646][01803] Avg episode reward: [(0, '29.402')]
[2023-03-08 18:05:43,656][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001951_7991296.pth...
[2023-03-08 18:05:43,920][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001730_7086080.pth
[2023-03-08 18:05:47,443][21759] Stopping Batcher_0...
[2023-03-08 18:05:47,453][21759] Loop batcher_evt_loop terminating...
[2023-03-08 18:05:47,445][01803] Component Batcher_0 stopped!
[2023-03-08 18:05:47,445][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2023-03-08 18:05:47,496][01803] Component RolloutWorker_w6 stopped!
[2023-03-08 18:05:47,506][21785] Stopping RolloutWorker_w6...
[2023-03-08 18:05:47,506][21785] Loop rollout_proc6_evt_loop terminating...
[2023-03-08 18:05:47,535][21782] Stopping RolloutWorker_w7...
[2023-03-08 18:05:47,534][01803] Component RolloutWorker_w4 stopped!
[2023-03-08 18:05:47,543][01803] Component RolloutWorker_w7 stopped!
[2023-03-08 18:05:47,549][21779] Stopping RolloutWorker_w4...
[2023-03-08 18:05:47,549][21779] Loop rollout_proc4_evt_loop terminating...
[2023-03-08 18:05:47,557][21776] Stopping RolloutWorker_w2...
[2023-03-08 18:05:47,558][21776] Loop rollout_proc2_evt_loop terminating...
[2023-03-08 18:05:47,536][21782] Loop rollout_proc7_evt_loop terminating...
[2023-03-08 18:05:47,557][01803] Component RolloutWorker_w2 stopped!
[2023-03-08 18:05:47,568][21775] Weights refcount: 2 0
[2023-03-08 18:05:47,574][01803] Component RolloutWorker_w0 stopped!
[2023-03-08 18:05:47,580][21773] Stopping RolloutWorker_w0...
[2023-03-08 18:05:47,582][21773] Loop rollout_proc0_evt_loop terminating...
[2023-03-08 18:05:47,587][21775] Stopping InferenceWorker_p0-w0...
[2023-03-08 18:05:47,588][21775] Loop inference_proc0-0_evt_loop terminating...
[2023-03-08 18:05:47,587][01803] Component InferenceWorker_p0-w0 stopped!
[2023-03-08 18:05:47,601][21778] Stopping RolloutWorker_w5...
[2023-03-08 18:05:47,605][21778] Loop rollout_proc5_evt_loop terminating...
[2023-03-08 18:05:47,601][01803] Component RolloutWorker_w5 stopped!
[2023-03-08 18:05:47,608][21774] Stopping RolloutWorker_w1...
[2023-03-08 18:05:47,609][21774] Loop rollout_proc1_evt_loop terminating...
[2023-03-08 18:05:47,611][01803] Component RolloutWorker_w1 stopped!
[2023-03-08 18:05:47,653][21777] Stopping RolloutWorker_w3...
[2023-03-08 18:05:47,653][01803] Component RolloutWorker_w3 stopped!
[2023-03-08 18:05:47,665][21777] Loop rollout_proc3_evt_loop terminating...
[2023-03-08 18:05:47,684][21759] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001839_7532544.pth
[2023-03-08 18:05:47,696][21759] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2023-03-08 18:05:47,901][01803] Component LearnerWorker_p0 stopped!
[2023-03-08 18:05:47,905][01803] Waiting for process learner_proc0 to stop...
[2023-03-08 18:05:47,913][21759] Stopping LearnerWorker_p0...
[2023-03-08 18:05:47,913][21759] Loop learner_proc0_evt_loop terminating...
[2023-03-08 18:05:49,520][01803] Waiting for process inference_proc0-0 to join...
[2023-03-08 18:05:49,838][01803] Waiting for process rollout_proc0 to join...
[2023-03-08 18:05:50,529][01803] Waiting for process rollout_proc1 to join...
[2023-03-08 18:05:50,531][01803] Waiting for process rollout_proc2 to join...
[2023-03-08 18:05:50,535][01803] Waiting for process rollout_proc3 to join...
[2023-03-08 18:05:50,536][01803] Waiting for process rollout_proc4 to join...
[2023-03-08 18:05:50,538][01803] Waiting for process rollout_proc5 to join...
[2023-03-08 18:05:50,539][01803] Waiting for process rollout_proc6 to join...
[2023-03-08 18:05:50,540][01803] Waiting for process rollout_proc7 to join...
[2023-03-08 18:05:50,541][01803] Batcher 0 profile tree view:
batching: 26.4014, releasing_batches: 0.0273
[2023-03-08 18:05:50,542][01803] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 528.5949
update_model: 7.8092
  weight_update: 0.0016
one_step: 0.0099
  handle_policy_step: 489.9170
    deserialize: 14.4931, stack: 2.8534, obs_to_device_normalize: 112.1535, forward: 229.6642, send_messages: 25.5913
    prepare_outputs: 80.1985
      to_cpu: 50.5467
[2023-03-08 18:05:50,544][01803] Learner 0 profile tree view:
misc: 0.0056, prepare_batch: 14.9854
train: 77.9006
  epoch_init: 0.0078, minibatch_init: 0.0101, losses_postprocess: 0.7609, kl_divergence: 0.5775, after_optimizer: 2.8572
  calculate_losses: 25.6166
    losses_init: 0.0233, forward_head: 1.6311, bptt_initial: 16.9620, tail: 0.8949, advantages_returns: 0.3590, losses: 3.4410
    bptt: 2.0011
      bptt_forward_core: 1.8678
  update: 47.3978
    clip: 1.3311
[2023-03-08 18:05:50,545][01803] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3465, enqueue_policy_requests: 143.2375, env_step: 800.5837, overhead: 19.7988, complete_rollouts: 7.4059
save_policy_outputs: 19.3788
  split_output_tensors: 9.3423
[2023-03-08 18:05:50,546][01803] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3593, enqueue_policy_requests: 142.6662, env_step: 801.4875, overhead: 19.9749, complete_rollouts: 6.6729
save_policy_outputs: 19.5038
  split_output_tensors: 9.8045
[2023-03-08 18:05:50,548][01803] Loop Runner_EvtLoop terminating...
[2023-03-08 18:05:50,550][01803] Runner profile tree view:
main_loop: 1085.2578
[2023-03-08 18:05:50,551][01803] Collected {0: 8007680}, FPS: 3687.4
[2023-03-08 18:05:50,841][01803] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-03-08 18:05:50,843][01803] Overriding arg 'num_workers' with value 1 passed from command line
[2023-03-08 18:05:50,845][01803] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-03-08 18:05:50,848][01803] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-03-08 18:05:50,850][01803] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-03-08 18:05:50,852][01803] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-03-08 18:05:50,854][01803] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-03-08 18:05:50,855][01803] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-03-08 18:05:50,856][01803] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-03-08 18:05:50,857][01803] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-03-08 18:05:50,858][01803] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-03-08 18:05:50,860][01803] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-03-08 18:05:50,862][01803] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-03-08 18:05:50,863][01803] Adding new argument 'enjoy_script'=None that is not in the saved config file!
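Two sanity checks fall out of the summary above. First, main_loop (1085.2578 s) times the reported average FPS (3687.4) is about 4.0M frames, roughly half of the 8,007,680 total, which suggests this session resumed from a checkpoint near the 4M-frame mark rather than from frame 0. Second, inside handle_policy_step (489.9 s) most of the time goes to forward (229.7 s) and obs_to_device_normalize (112.2 s), so inference is dominated by the policy's forward pass, while env_step (about 800 s per rollout worker) carries the simulation cost.

The Loading/Overriding/Adding block just above is the evaluation entry point re-reading the training run's config.json: command-line flags override keys the saved config already has, and evaluation-only keys are appended. A minimal sketch of that merge, assuming only what the log shows (the path and keys are taken from the lines above; this is illustrative, not Sample Factory's actual code):

import json

# Load the config saved at training time.
with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

cli_overrides = {"num_workers": 1}   # "Overriding arg ... passed from command line"
eval_only = {                        # "Adding new argument ... not in the saved config file!"
    "no_render": True,
    "save_video": True,
    "max_num_episodes": 10,
    "push_to_hub": False,
    "hf_repository": None,
}

cfg.update(cli_overrides)            # existing keys: the command line wins
for key, value in eval_only.items():
    cfg.setdefault(key, value)       # new keys: added with the evaluation defaults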
[2023-03-08 18:05:50,865][01803] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-08 18:05:50,894][01803] Doom resolution: 160x120, resize resolution: (128, 72) [2023-03-08 18:05:50,900][01803] RunningMeanStd input shape: (3, 72, 128) [2023-03-08 18:05:50,902][01803] RunningMeanStd input shape: (1,) [2023-03-08 18:05:50,918][01803] ConvEncoder: input_channels=3 [2023-03-08 18:05:51,603][01803] Conv encoder output size: 512 [2023-03-08 18:05:51,605][01803] Policy head output size: 512 [2023-03-08 18:05:54,032][01803] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-03-08 18:05:55,511][01803] Num frames 100... [2023-03-08 18:05:55,677][01803] Num frames 200... [2023-03-08 18:05:55,839][01803] Num frames 300... [2023-03-08 18:05:55,998][01803] Num frames 400... [2023-03-08 18:05:56,168][01803] Num frames 500... [2023-03-08 18:05:56,349][01803] Num frames 600... [2023-03-08 18:05:56,519][01803] Num frames 700... [2023-03-08 18:05:56,686][01803] Num frames 800... [2023-03-08 18:05:56,848][01803] Num frames 900... [2023-03-08 18:05:57,010][01803] Num frames 1000... [2023-03-08 18:05:57,187][01803] Num frames 1100... [2023-03-08 18:05:57,346][01803] Num frames 1200... [2023-03-08 18:05:57,506][01803] Num frames 1300... [2023-03-08 18:05:57,664][01803] Num frames 1400... [2023-03-08 18:05:57,833][01803] Avg episode rewards: #0: 32.720, true rewards: #0: 14.720 [2023-03-08 18:05:57,835][01803] Avg episode reward: 32.720, avg true_objective: 14.720 [2023-03-08 18:05:57,884][01803] Num frames 1500... [2023-03-08 18:05:58,038][01803] Num frames 1600... [2023-03-08 18:05:58,203][01803] Num frames 1700... [2023-03-08 18:05:58,366][01803] Num frames 1800... [2023-03-08 18:05:58,522][01803] Num frames 1900... [2023-03-08 18:05:58,671][01803] Num frames 2000... [2023-03-08 18:05:58,791][01803] Num frames 2100... [2023-03-08 18:05:58,901][01803] Num frames 2200... [2023-03-08 18:05:59,011][01803] Num frames 2300... [2023-03-08 18:05:59,140][01803] Num frames 2400... [2023-03-08 18:05:59,256][01803] Num frames 2500... [2023-03-08 18:05:59,372][01803] Num frames 2600... [2023-03-08 18:05:59,493][01803] Num frames 2700... [2023-03-08 18:05:59,604][01803] Num frames 2800... [2023-03-08 18:05:59,717][01803] Num frames 2900... [2023-03-08 18:05:59,827][01803] Num frames 3000... [2023-03-08 18:05:59,943][01803] Num frames 3100... [2023-03-08 18:06:00,062][01803] Num frames 3200... [2023-03-08 18:06:00,191][01803] Num frames 3300... [2023-03-08 18:06:00,306][01803] Num frames 3400... [2023-03-08 18:06:00,425][01803] Num frames 3500... [2023-03-08 18:06:00,562][01803] Avg episode rewards: #0: 46.359, true rewards: #0: 17.860 [2023-03-08 18:06:00,564][01803] Avg episode reward: 46.359, avg true_objective: 17.860 [2023-03-08 18:06:00,602][01803] Num frames 3600... [2023-03-08 18:06:00,729][01803] Num frames 3700... [2023-03-08 18:06:00,841][01803] Num frames 3800... [2023-03-08 18:06:00,952][01803] Num frames 3900... [2023-03-08 18:06:01,065][01803] Num frames 4000... [2023-03-08 18:06:01,212][01803] Avg episode rewards: #0: 33.280, true rewards: #0: 13.613 [2023-03-08 18:06:01,214][01803] Avg episode reward: 33.280, avg true_objective: 13.613 [2023-03-08 18:06:01,235][01803] Num frames 4100... [2023-03-08 18:06:01,345][01803] Num frames 4200... [2023-03-08 18:06:01,459][01803] Num frames 4300... [2023-03-08 18:06:01,571][01803] Num frames 4400... [2023-03-08 18:06:01,681][01803] Num frames 4500... 
[2023-03-08 18:06:01,800][01803] Num frames 4600... [2023-03-08 18:06:01,915][01803] Num frames 4700... [2023-03-08 18:06:02,032][01803] Num frames 4800... [2023-03-08 18:06:02,130][01803] Avg episode rewards: #0: 29.845, true rewards: #0: 12.095 [2023-03-08 18:06:02,132][01803] Avg episode reward: 29.845, avg true_objective: 12.095 [2023-03-08 18:06:02,208][01803] Num frames 4900... [2023-03-08 18:06:02,320][01803] Num frames 5000... [2023-03-08 18:06:02,439][01803] Num frames 5100... [2023-03-08 18:06:02,549][01803] Num frames 5200... [2023-03-08 18:06:02,645][01803] Avg episode rewards: #0: 25.274, true rewards: #0: 10.474 [2023-03-08 18:06:02,647][01803] Avg episode reward: 25.274, avg true_objective: 10.474 [2023-03-08 18:06:02,728][01803] Num frames 5300... [2023-03-08 18:06:02,836][01803] Num frames 5400... [2023-03-08 18:06:02,947][01803] Num frames 5500... [2023-03-08 18:06:03,061][01803] Num frames 5600... [2023-03-08 18:06:03,174][01803] Num frames 5700... [2023-03-08 18:06:03,289][01803] Num frames 5800... [2023-03-08 18:06:03,401][01803] Num frames 5900... [2023-03-08 18:06:03,464][01803] Avg episode rewards: #0: 23.510, true rewards: #0: 9.843 [2023-03-08 18:06:03,466][01803] Avg episode reward: 23.510, avg true_objective: 9.843 [2023-03-08 18:06:03,570][01803] Num frames 6000... [2023-03-08 18:06:03,681][01803] Num frames 6100... [2023-03-08 18:06:03,790][01803] Num frames 6200... [2023-03-08 18:06:03,903][01803] Num frames 6300... [2023-03-08 18:06:04,013][01803] Num frames 6400... [2023-03-08 18:06:04,129][01803] Num frames 6500... [2023-03-08 18:06:04,257][01803] Num frames 6600... [2023-03-08 18:06:04,403][01803] Num frames 6700... [2023-03-08 18:06:04,519][01803] Num frames 6800... [2023-03-08 18:06:04,634][01803] Num frames 6900... [2023-03-08 18:06:04,754][01803] Num frames 7000... [2023-03-08 18:06:04,871][01803] Num frames 7100... [2023-03-08 18:06:04,985][01803] Num frames 7200... [2023-03-08 18:06:05,102][01803] Num frames 7300... [2023-03-08 18:06:05,213][01803] Num frames 7400... [2023-03-08 18:06:05,331][01803] Num frames 7500... [2023-03-08 18:06:05,449][01803] Num frames 7600... [2023-03-08 18:06:05,557][01803] Num frames 7700... [2023-03-08 18:06:05,670][01803] Num frames 7800... [2023-03-08 18:06:05,783][01803] Num frames 7900... [2023-03-08 18:06:05,903][01803] Num frames 8000... [2023-03-08 18:06:05,967][01803] Avg episode rewards: #0: 28.580, true rewards: #0: 11.437 [2023-03-08 18:06:05,969][01803] Avg episode reward: 28.580, avg true_objective: 11.437 [2023-03-08 18:06:06,075][01803] Num frames 8100... [2023-03-08 18:06:06,188][01803] Num frames 8200... [2023-03-08 18:06:06,307][01803] Num frames 8300... [2023-03-08 18:06:06,424][01803] Num frames 8400... [2023-03-08 18:06:06,533][01803] Num frames 8500... [2023-03-08 18:06:06,644][01803] Num frames 8600... [2023-03-08 18:06:06,762][01803] Num frames 8700... [2023-03-08 18:06:06,888][01803] Avg episode rewards: #0: 26.575, true rewards: #0: 10.950 [2023-03-08 18:06:06,890][01803] Avg episode reward: 26.575, avg true_objective: 10.950 [2023-03-08 18:06:06,940][01803] Num frames 8800... [2023-03-08 18:06:07,057][01803] Num frames 8900... [2023-03-08 18:06:07,169][01803] Num frames 9000... [2023-03-08 18:06:07,285][01803] Num frames 9100... [2023-03-08 18:06:07,403][01803] Num frames 9200... 
[2023-03-08 18:06:07,471][01803] Avg episode rewards: #0: 24.678, true rewards: #0: 10.233 [2023-03-08 18:06:07,472][01803] Avg episode reward: 24.678, avg true_objective: 10.233 [2023-03-08 18:06:07,572][01803] Num frames 9300... [2023-03-08 18:06:07,681][01803] Num frames 9400... [2023-03-08 18:06:07,799][01803] Num frames 9500... [2023-03-08 18:06:07,934][01803] Num frames 9600... [2023-03-08 18:06:08,047][01803] Num frames 9700... [2023-03-08 18:06:08,159][01803] Num frames 9800... [2023-03-08 18:06:08,276][01803] Num frames 9900... [2023-03-08 18:06:08,397][01803] Num frames 10000... [2023-03-08 18:06:08,507][01803] Num frames 10100... [2023-03-08 18:06:08,616][01803] Num frames 10200... [2023-03-08 18:06:08,761][01803] Num frames 10300... [2023-03-08 18:06:08,927][01803] Num frames 10400... [2023-03-08 18:06:09,086][01803] Num frames 10500... [2023-03-08 18:06:09,246][01803] Num frames 10600... [2023-03-08 18:06:09,410][01803] Num frames 10700... [2023-03-08 18:06:09,568][01803] Num frames 10800... [2023-03-08 18:06:09,727][01803] Num frames 10900... [2023-03-08 18:06:09,893][01803] Avg episode rewards: #0: 26.568, true rewards: #0: 10.968 [2023-03-08 18:06:09,895][01803] Avg episode reward: 26.568, avg true_objective: 10.968 [2023-03-08 18:07:15,873][01803] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-08 18:08:15,636][01803] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-08 18:08:15,638][01803] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-08 18:08:15,639][01803] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-08 18:08:15,643][01803] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-08 18:08:15,647][01803] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-08 18:08:15,648][01803] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-08 18:08:15,650][01803] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-03-08 18:08:15,652][01803] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-08 18:08:15,653][01803] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-03-08 18:08:15,654][01803] Adding new argument 'hf_repository'='jinhu2659/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-03-08 18:08:15,655][01803] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-08 18:08:15,656][01803] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-08 18:08:15,657][01803] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-08 18:08:15,658][01803] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-03-08 18:08:15,659][01803] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-08 18:08:15,683][01803] RunningMeanStd input shape: (3, 72, 128) [2023-03-08 18:08:15,685][01803] RunningMeanStd input shape: (1,) [2023-03-08 18:08:15,699][01803] ConvEncoder: input_channels=3 [2023-03-08 18:08:15,737][01803] Conv encoder output size: 512 [2023-03-08 18:08:15,738][01803] Policy head output size: 512 [2023-03-08 18:08:15,759][01803] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... 
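This pass reloads the same 8,007,680-frame checkpoint but with push_to_hub=True, a concrete hf_repository, and max_num_frames capped at 100000 instead of 1e9. In the per-episode output that follows, "Avg episode rewards" is the shaped reward the agent was trained on, while "true rewards" / "avg true_objective" appears to track raw episode length (roughly frames/100), so the two legitimately diverge. A sketch of the invocation behind this block, with two assumptions flagged: parse_vizdoom_cfg stands in for the course notebook's wrapper around Sample Factory's argument parser, and the env name is inferred from the repository name; enjoy is Sample Factory's evaluation entry point, and every flag mirrors an "Overriding arg"/"Adding new argument" line above:

from sample_factory.enjoy import enjoy

# parse_vizdoom_cfg is a hypothetical stand-in for the notebook helper that
# registers the Doom envs and parses these flags into a Sample Factory config.
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",   # assumed; inferred from the repo name
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=jinhu2659/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # plays the episodes, writes replay.mp4, then uploads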
[2023-03-08 18:08:16,193][01803] Num frames 100... [2023-03-08 18:08:16,306][01803] Num frames 200... [2023-03-08 18:08:16,416][01803] Num frames 300... [2023-03-08 18:08:16,532][01803] Num frames 400... [2023-03-08 18:08:16,646][01803] Num frames 500... [2023-03-08 18:08:16,757][01803] Num frames 600... [2023-03-08 18:08:16,865][01803] Num frames 700... [2023-03-08 18:08:16,984][01803] Num frames 800... [2023-03-08 18:08:17,037][01803] Avg episode rewards: #0: 16.000, true rewards: #0: 8.000 [2023-03-08 18:08:17,040][01803] Avg episode reward: 16.000, avg true_objective: 8.000 [2023-03-08 18:08:17,149][01803] Num frames 900... [2023-03-08 18:08:17,262][01803] Num frames 1000... [2023-03-08 18:08:17,374][01803] Num frames 1100... [2023-03-08 18:08:17,485][01803] Num frames 1200... [2023-03-08 18:08:17,596][01803] Num frames 1300... [2023-03-08 18:08:17,705][01803] Num frames 1400... [2023-03-08 18:08:17,822][01803] Num frames 1500... [2023-03-08 18:08:17,940][01803] Num frames 1600... [2023-03-08 18:08:18,056][01803] Num frames 1700... [2023-03-08 18:08:18,166][01803] Num frames 1800... [2023-03-08 18:08:18,279][01803] Num frames 1900... [2023-03-08 18:08:18,357][01803] Avg episode rewards: #0: 20.100, true rewards: #0: 9.600 [2023-03-08 18:08:18,358][01803] Avg episode reward: 20.100, avg true_objective: 9.600 [2023-03-08 18:08:18,447][01803] Num frames 2000... [2023-03-08 18:08:18,555][01803] Num frames 2100... [2023-03-08 18:08:18,663][01803] Num frames 2200... [2023-03-08 18:08:18,774][01803] Num frames 2300... [2023-03-08 18:08:18,887][01803] Num frames 2400... [2023-03-08 18:08:18,997][01803] Num frames 2500... [2023-03-08 18:08:19,117][01803] Num frames 2600... [2023-03-08 18:08:19,246][01803] Num frames 2700... [2023-03-08 18:08:19,356][01803] Num frames 2800... [2023-03-08 18:08:19,468][01803] Num frames 2900... [2023-03-08 18:08:19,560][01803] Avg episode rewards: #0: 20.444, true rewards: #0: 9.777 [2023-03-08 18:08:19,562][01803] Avg episode reward: 20.444, avg true_objective: 9.777 [2023-03-08 18:08:19,639][01803] Num frames 3000... [2023-03-08 18:08:19,749][01803] Num frames 3100... [2023-03-08 18:08:19,860][01803] Num frames 3200... [2023-03-08 18:08:19,970][01803] Num frames 3300... [2023-03-08 18:08:20,095][01803] Avg episode rewards: #0: 16.873, true rewards: #0: 8.372 [2023-03-08 18:08:20,097][01803] Avg episode reward: 16.873, avg true_objective: 8.372 [2023-03-08 18:08:20,157][01803] Num frames 3400... [2023-03-08 18:08:20,270][01803] Num frames 3500... [2023-03-08 18:08:20,383][01803] Num frames 3600... [2023-03-08 18:08:20,491][01803] Num frames 3700... [2023-03-08 18:08:20,614][01803] Num frames 3800... [2023-03-08 18:08:20,725][01803] Num frames 3900... [2023-03-08 18:08:20,835][01803] Num frames 4000... [2023-03-08 18:08:20,945][01803] Num frames 4100... [2023-03-08 18:08:21,065][01803] Num frames 4200... [2023-03-08 18:08:21,182][01803] Num frames 4300... [2023-03-08 18:08:21,290][01803] Avg episode rewards: #0: 17.482, true rewards: #0: 8.682 [2023-03-08 18:08:21,292][01803] Avg episode reward: 17.482, avg true_objective: 8.682 [2023-03-08 18:08:21,360][01803] Num frames 4400... [2023-03-08 18:08:21,481][01803] Num frames 4500... [2023-03-08 18:08:21,637][01803] Num frames 4600... [2023-03-08 18:08:21,799][01803] Num frames 4700... [2023-03-08 18:08:21,956][01803] Num frames 4800... [2023-03-08 18:08:22,117][01803] Num frames 4900... [2023-03-08 18:08:22,274][01803] Num frames 5000... [2023-03-08 18:08:22,430][01803] Num frames 5100... 
[2023-03-08 18:08:22,587][01803] Num frames 5200... [2023-03-08 18:08:22,745][01803] Num frames 5300... [2023-03-08 18:08:22,896][01803] Num frames 5400... [2023-03-08 18:08:23,060][01803] Num frames 5500... [2023-03-08 18:08:23,236][01803] Num frames 5600... [2023-03-08 18:08:23,398][01803] Num frames 5700... [2023-03-08 18:08:23,561][01803] Num frames 5800... [2023-03-08 18:08:23,684][01803] Avg episode rewards: #0: 21.235, true rewards: #0: 9.735 [2023-03-08 18:08:23,686][01803] Avg episode reward: 21.235, avg true_objective: 9.735 [2023-03-08 18:08:23,777][01803] Num frames 5900... [2023-03-08 18:08:23,935][01803] Num frames 6000... [2023-03-08 18:08:24,091][01803] Num frames 6100... [2023-03-08 18:08:24,254][01803] Num frames 6200... [2023-03-08 18:08:24,414][01803] Num frames 6300... [2023-03-08 18:08:24,572][01803] Num frames 6400... [2023-03-08 18:08:24,732][01803] Num frames 6500... [2023-03-08 18:08:24,885][01803] Num frames 6600... [2023-03-08 18:08:24,995][01803] Num frames 6700... [2023-03-08 18:08:25,115][01803] Num frames 6800... [2023-03-08 18:08:25,233][01803] Num frames 6900... [2023-03-08 18:08:25,347][01803] Num frames 7000... [2023-03-08 18:08:25,460][01803] Num frames 7100... [2023-03-08 18:08:25,581][01803] Num frames 7200... [2023-03-08 18:08:25,693][01803] Num frames 7300... [2023-03-08 18:08:25,806][01803] Num frames 7400... [2023-03-08 18:08:25,873][01803] Avg episode rewards: #0: 24.870, true rewards: #0: 10.584 [2023-03-08 18:08:25,874][01803] Avg episode reward: 24.870, avg true_objective: 10.584 [2023-03-08 18:08:25,977][01803] Num frames 7500... [2023-03-08 18:08:26,088][01803] Num frames 7600... [2023-03-08 18:08:26,207][01803] Num frames 7700... [2023-03-08 18:08:26,317][01803] Num frames 7800... [2023-03-08 18:08:26,427][01803] Num frames 7900... [2023-03-08 18:08:26,535][01803] Num frames 8000... [2023-03-08 18:08:26,644][01803] Num frames 8100... [2023-03-08 18:08:26,758][01803] Num frames 8200... [2023-03-08 18:08:26,870][01803] Num frames 8300... [2023-03-08 18:08:26,983][01803] Num frames 8400... [2023-03-08 18:08:27,100][01803] Num frames 8500... [2023-03-08 18:08:27,215][01803] Num frames 8600... [2023-03-08 18:08:27,329][01803] Num frames 8700... [2023-03-08 18:08:27,445][01803] Num frames 8800... [2023-03-08 18:08:27,555][01803] Num frames 8900... [2023-03-08 18:08:27,663][01803] Num frames 9000... [2023-03-08 18:08:27,778][01803] Num frames 9100... [2023-03-08 18:08:27,891][01803] Num frames 9200... [2023-03-08 18:08:28,002][01803] Num frames 9300... [2023-03-08 18:08:28,115][01803] Num frames 9400... [2023-03-08 18:08:28,238][01803] Num frames 9500... [2023-03-08 18:08:28,306][01803] Avg episode rewards: #0: 27.761, true rewards: #0: 11.886 [2023-03-08 18:08:28,307][01803] Avg episode reward: 27.761, avg true_objective: 11.886 [2023-03-08 18:08:28,410][01803] Num frames 9600... [2023-03-08 18:08:28,520][01803] Num frames 9700... [2023-03-08 18:08:28,630][01803] Num frames 9800... [2023-03-08 18:08:28,743][01803] Num frames 9900... [2023-03-08 18:08:28,855][01803] Num frames 10000... [2023-03-08 18:08:28,975][01803] Num frames 10100... [2023-03-08 18:08:29,087][01803] Num frames 10200... [2023-03-08 18:08:29,200][01803] Num frames 10300... [2023-03-08 18:08:29,320][01803] Num frames 10400... [2023-03-08 18:08:29,431][01803] Num frames 10500... [2023-03-08 18:08:29,545][01803] Num frames 10600... [2023-03-08 18:08:29,659][01803] Num frames 10700... [2023-03-08 18:08:29,774][01803] Num frames 10800... 
[2023-03-08 18:08:29,889][01803] Num frames 10900... [2023-03-08 18:08:30,009][01803] Num frames 11000... [2023-03-08 18:08:30,151][01803] Avg episode rewards: #0: 28.863, true rewards: #0: 12.308 [2023-03-08 18:08:30,153][01803] Avg episode reward: 28.863, avg true_objective: 12.308 [2023-03-08 18:08:30,182][01803] Num frames 11100... [2023-03-08 18:08:30,297][01803] Num frames 11200... [2023-03-08 18:08:30,409][01803] Num frames 11300... [2023-03-08 18:08:30,519][01803] Num frames 11400... [2023-03-08 18:08:30,628][01803] Num frames 11500... [2023-03-08 18:08:30,738][01803] Num frames 11600... [2023-03-08 18:08:30,854][01803] Avg episode rewards: #0: 27.052, true rewards: #0: 11.652 [2023-03-08 18:08:30,855][01803] Avg episode reward: 27.052, avg true_objective: 11.652 [2023-03-08 18:09:38,503][01803] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-03-08 18:12:32,253][01803] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-03-08 18:12:32,255][01803] Overriding arg 'num_workers' with value 1 passed from command line [2023-03-08 18:12:32,257][01803] Adding new argument 'no_render'=True that is not in the saved config file! [2023-03-08 18:12:32,260][01803] Adding new argument 'save_video'=True that is not in the saved config file! [2023-03-08 18:12:32,262][01803] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-03-08 18:12:32,264][01803] Adding new argument 'video_name'=None that is not in the saved config file! [2023-03-08 18:12:32,266][01803] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-03-08 18:12:32,267][01803] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-03-08 18:12:32,269][01803] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-03-08 18:12:32,270][01803] Adding new argument 'hf_repository'='jinhu2659/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-03-08 18:12:32,271][01803] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-03-08 18:12:32,272][01803] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-03-08 18:12:32,274][01803] Adding new argument 'train_script'=None that is not in the saved config file! [2023-03-08 18:12:32,275][01803] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-03-08 18:12:32,276][01803] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-03-08 18:12:32,302][01803] RunningMeanStd input shape: (3, 72, 128) [2023-03-08 18:12:32,304][01803] RunningMeanStd input shape: (1,) [2023-03-08 18:12:32,318][01803] ConvEncoder: input_channels=3 [2023-03-08 18:12:32,355][01803] Conv encoder output size: 512 [2023-03-08 18:12:32,356][01803] Policy head output size: 512 [2023-03-08 18:12:32,376][01803] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2023-03-08 18:12:32,814][01803] Num frames 100... [2023-03-08 18:12:32,925][01803] Num frames 200... [2023-03-08 18:12:33,042][01803] Num frames 300... [2023-03-08 18:12:33,163][01803] Num frames 400... [2023-03-08 18:12:33,276][01803] Num frames 500... [2023-03-08 18:12:33,389][01803] Num frames 600... [2023-03-08 18:12:33,506][01803] Num frames 700... [2023-03-08 18:12:33,617][01803] Num frames 800... [2023-03-08 18:12:33,739][01803] Num frames 900... 
[2023-03-08 18:12:33,850][01803] Num frames 1000... [2023-03-08 18:12:33,965][01803] Num frames 1100... [2023-03-08 18:12:34,082][01803] Num frames 1200... [2023-03-08 18:12:34,141][01803] Avg episode rewards: #0: 30.020, true rewards: #0: 12.020 [2023-03-08 18:12:34,145][01803] Avg episode reward: 30.020, avg true_objective: 12.020 [2023-03-08 18:12:34,258][01803] Num frames 1300... [2023-03-08 18:12:34,373][01803] Num frames 1400... [2023-03-08 18:12:34,494][01803] Num frames 1500... [2023-03-08 18:12:34,607][01803] Num frames 1600... [2023-03-08 18:12:34,717][01803] Num frames 1700... [2023-03-08 18:12:34,834][01803] Num frames 1800... [2023-03-08 18:12:34,946][01803] Num frames 1900... [2023-03-08 18:12:35,059][01803] Num frames 2000... [2023-03-08 18:12:35,181][01803] Num frames 2100... [2023-03-08 18:12:35,291][01803] Num frames 2200... [2023-03-08 18:12:35,404][01803] Num frames 2300... [2023-03-08 18:12:35,501][01803] Avg episode rewards: #0: 26.685, true rewards: #0: 11.685 [2023-03-08 18:12:35,502][01803] Avg episode reward: 26.685, avg true_objective: 11.685 [2023-03-08 18:12:35,577][01803] Num frames 2400... [2023-03-08 18:12:35,684][01803] Num frames 2500... [2023-03-08 18:12:35,800][01803] Num frames 2600... [2023-03-08 18:12:35,913][01803] Num frames 2700... [2023-03-08 18:12:36,024][01803] Num frames 2800... [2023-03-08 18:12:36,135][01803] Num frames 2900... [2023-03-08 18:12:36,255][01803] Num frames 3000... [2023-03-08 18:12:36,371][01803] Num frames 3100... [2023-03-08 18:12:36,494][01803] Num frames 3200... [2023-03-08 18:12:36,553][01803] Avg episode rewards: #0: 24.004, true rewards: #0: 10.670 [2023-03-08 18:12:36,555][01803] Avg episode reward: 24.004, avg true_objective: 10.670 [2023-03-08 18:12:36,666][01803] Num frames 3300... [2023-03-08 18:12:36,777][01803] Num frames 3400... [2023-03-08 18:12:36,888][01803] Num frames 3500... [2023-03-08 18:12:37,002][01803] Num frames 3600... [2023-03-08 18:12:37,116][01803] Num frames 3700... [2023-03-08 18:12:37,227][01803] Num frames 3800... [2023-03-08 18:12:37,337][01803] Num frames 3900... [2023-03-08 18:12:37,456][01803] Num frames 4000... [2023-03-08 18:12:37,572][01803] Num frames 4100... [2023-03-08 18:12:37,710][01803] Num frames 4200... [2023-03-08 18:12:37,905][01803] Num frames 4300... [2023-03-08 18:12:38,076][01803] Num frames 4400... [2023-03-08 18:12:38,259][01803] Num frames 4500... [2023-03-08 18:12:38,423][01803] Num frames 4600... [2023-03-08 18:12:38,590][01803] Num frames 4700... [2023-03-08 18:12:38,768][01803] Num frames 4800... [2023-03-08 18:12:38,984][01803] Avg episode rewards: #0: 27.993, true rewards: #0: 12.242 [2023-03-08 18:12:38,989][01803] Avg episode reward: 27.993, avg true_objective: 12.242 [2023-03-08 18:12:38,996][01803] Num frames 4900... [2023-03-08 18:12:39,150][01803] Num frames 5000... [2023-03-08 18:12:39,301][01803] Num frames 5100... [2023-03-08 18:12:39,456][01803] Num frames 5200... [2023-03-08 18:12:39,619][01803] Num frames 5300... [2023-03-08 18:12:39,779][01803] Num frames 5400... [2023-03-08 18:12:39,936][01803] Num frames 5500... [2023-03-08 18:12:40,091][01803] Num frames 5600... [2023-03-08 18:12:40,247][01803] Num frames 5700... [2023-03-08 18:12:40,604][01803] Avg episode rewards: #0: 25.786, true rewards: #0: 11.586 [2023-03-08 18:12:40,606][01803] Avg episode reward: 25.786, avg true_objective: 11.586 [2023-03-08 18:12:40,621][01803] Num frames 5800... [2023-03-08 18:12:40,775][01803] Num frames 5900... [2023-03-08 18:12:40,932][01803] Num frames 6000... 
[2023-03-08 18:12:41,092][01803] Num frames 6100... [2023-03-08 18:12:41,247][01803] Num frames 6200... [2023-03-08 18:12:41,355][01803] Num frames 6300... [2023-03-08 18:12:41,473][01803] Num frames 6400... [2023-03-08 18:12:41,591][01803] Num frames 6500... [2023-03-08 18:12:41,706][01803] Num frames 6600... [2023-03-08 18:12:41,828][01803] Num frames 6700... [2023-03-08 18:12:41,941][01803] Num frames 6800... [2023-03-08 18:12:42,055][01803] Num frames 6900... [2023-03-08 18:12:42,175][01803] Num frames 7000... [2023-03-08 18:12:42,292][01803] Num frames 7100... [2023-03-08 18:12:42,405][01803] Num frames 7200... [2023-03-08 18:12:42,516][01803] Num frames 7300... [2023-03-08 18:12:42,637][01803] Num frames 7400... [2023-03-08 18:12:42,767][01803] Num frames 7500... [2023-03-08 18:12:42,884][01803] Num frames 7600... [2023-03-08 18:12:43,002][01803] Num frames 7700... [2023-03-08 18:12:43,119][01803] Num frames 7800... [2023-03-08 18:12:43,187][01803] Avg episode rewards: #0: 31.015, true rewards: #0: 13.015 [2023-03-08 18:12:43,188][01803] Avg episode reward: 31.015, avg true_objective: 13.015 [2023-03-08 18:12:43,288][01803] Num frames 7900... [2023-03-08 18:12:43,401][01803] Num frames 8000... [2023-03-08 18:12:43,516][01803] Num frames 8100... [2023-03-08 18:12:43,629][01803] Num frames 8200... [2023-03-08 18:12:43,751][01803] Num frames 8300... [2023-03-08 18:12:43,864][01803] Num frames 8400... [2023-03-08 18:12:43,979][01803] Num frames 8500... [2023-03-08 18:12:44,092][01803] Num frames 8600... [2023-03-08 18:12:44,206][01803] Num frames 8700... [2023-03-08 18:12:44,318][01803] Num frames 8800... [2023-03-08 18:12:44,432][01803] Num frames 8900... [2023-03-08 18:12:44,545][01803] Num frames 9000... [2023-03-08 18:12:44,662][01803] Num frames 9100... [2023-03-08 18:12:44,772][01803] Num frames 9200... [2023-03-08 18:12:44,888][01803] Num frames 9300... [2023-03-08 18:12:45,009][01803] Num frames 9400... [2023-03-08 18:12:45,112][01803] Avg episode rewards: #0: 32.916, true rewards: #0: 13.487 [2023-03-08 18:12:45,114][01803] Avg episode reward: 32.916, avg true_objective: 13.487 [2023-03-08 18:12:45,183][01803] Num frames 9500... [2023-03-08 18:12:45,303][01803] Num frames 9600... [2023-03-08 18:12:45,415][01803] Num frames 9700... [2023-03-08 18:12:45,536][01803] Num frames 9800... [2023-03-08 18:12:45,647][01803] Num frames 9900... [2023-03-08 18:12:45,765][01803] Num frames 10000... [2023-03-08 18:12:45,878][01803] Num frames 10100... [2023-03-08 18:12:45,986][01803] Avg episode rewards: #0: 30.806, true rewards: #0: 12.681 [2023-03-08 18:12:45,989][01803] Avg episode reward: 30.806, avg true_objective: 12.681 [2023-03-08 18:12:46,053][01803] Num frames 10200... [2023-03-08 18:12:46,169][01803] Num frames 10300... [2023-03-08 18:12:46,286][01803] Num frames 10400... [2023-03-08 18:12:46,400][01803] Num frames 10500... [2023-03-08 18:12:46,508][01803] Num frames 10600... [2023-03-08 18:12:46,620][01803] Num frames 10700... [2023-03-08 18:12:46,737][01803] Num frames 10800... [2023-03-08 18:12:46,880][01803] Avg episode rewards: #0: 29.090, true rewards: #0: 12.090 [2023-03-08 18:12:46,882][01803] Avg episode reward: 29.090, avg true_objective: 12.090 [2023-03-08 18:12:46,906][01803] Num frames 10900... [2023-03-08 18:12:47,015][01803] Num frames 11000... [2023-03-08 18:12:47,127][01803] Num frames 11100... [2023-03-08 18:12:47,244][01803] Num frames 11200... [2023-03-08 18:12:47,357][01803] Num frames 11300... [2023-03-08 18:12:47,469][01803] Num frames 11400... 
[2023-03-08 18:12:47,581][01803] Num frames 11500... [2023-03-08 18:12:47,697][01803] Num frames 11600... [2023-03-08 18:12:47,817][01803] Num frames 11700... [2023-03-08 18:12:47,932][01803] Num frames 11800... [2023-03-08 18:12:48,055][01803] Num frames 11900... [2023-03-08 18:12:48,173][01803] Num frames 12000... [2023-03-08 18:12:48,288][01803] Num frames 12100... [2023-03-08 18:12:48,420][01803] Num frames 12200... [2023-03-08 18:12:48,532][01803] Num frames 12300... [2023-03-08 18:12:48,646][01803] Num frames 12400... [2023-03-08 18:12:48,769][01803] Num frames 12500... [2023-03-08 18:12:48,883][01803] Num frames 12600... [2023-03-08 18:12:48,995][01803] Num frames 12700... [2023-03-08 18:12:49,094][01803] Avg episode rewards: #0: 30.937, true rewards: #0: 12.737 [2023-03-08 18:12:49,096][01803] Avg episode reward: 30.937, avg true_objective: 12.737 [2023-03-08 18:14:04,556][01803] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
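End of the log: the same checkpoint_000001955_8007680.pth was evaluated three times, once locally (push_to_hub=False, 18:05 to 18:07) and twice with push_to_hub=True (18:08 to 18:09, then again 18:12 to 18:14), each pass saving replay.mp4 before uploading the experiment directory to jinhu2659/rl_course_vizdoom_health_gathering_supreme on the Hugging Face Hub. To pull the published artifacts back down, huggingface_hub's snapshot_download is sufficient (the repo id is taken from the log; the listed contents are what this run saved):

from huggingface_hub import snapshot_download

# Fetches the uploaded experiment directory: config.json, the checkpoint_p0
# folder with the .pth files, and replay.mp4.
local_dir = snapshot_download(
    repo_id="jinhu2659/rl_course_vizdoom_health_gathering_supreme"
)
print(local_dir)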