diff --git "a/sf_log.txt" "b/sf_log.txt"
--- "a/sf_log.txt"
+++ "b/sf_log.txt"
@@ -1333,3 +1333,1259 @@ main_loop: 1553.8885
 [2024-06-06 13:32:02,036][00159] Avg episode rewards: #0: 22.029, true rewards: #0: 9.629
 [2024-06-06 13:32:02,037][00159] Avg episode reward: 22.029, avg true_objective: 9.629
 [2024-06-06 13:33:05,226][00159] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-06-06 13:33:10,908][00159] The model has been pushed to https://huggingface.co/swritchie/rl_course_vizdoom_health_gathering_supreme
+[2024-06-06 13:37:26,950][20018] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-06-06 13:37:26,956][20018] Rollout worker 0 uses device cpu
+[2024-06-06 13:37:26,960][20018] Rollout worker 1 uses device cpu
+[2024-06-06 13:37:26,965][20018] Rollout worker 2 uses device cpu
+[2024-06-06 13:37:26,969][20018] Rollout worker 3 uses device cpu
+[2024-06-06 13:37:26,986][20018] Rollout worker 4 uses device cpu
+[2024-06-06 13:37:26,987][20018] Rollout worker 5 uses device cpu
+[2024-06-06 13:37:26,989][20018] Rollout worker 6 uses device cpu
+[2024-06-06 13:37:26,991][20018] Rollout worker 7 uses device cpu
+[2024-06-06 13:37:27,229][20018] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-06-06 13:37:27,232][20018] InferenceWorker_p0-w0: min num requests: 2
+[2024-06-06 13:37:27,303][20018] Starting all processes...
+[2024-06-06 13:37:27,309][20018] Starting process learner_proc0
+[2024-06-06 13:37:31,191][20018] Starting all processes...
+[2024-06-06 13:37:31,231][20018] Starting process inference_proc0-0
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc0
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc1
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc2
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc3
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc4
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc5
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc6
+[2024-06-06 13:37:31,232][20018] Starting process rollout_proc7
+[2024-06-06 13:37:48,597][20333] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-06-06 13:37:48,602][20333] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-06-06 13:37:48,649][20354] Worker 3 uses CPU cores [1]
+[2024-06-06 13:37:48,675][20333] Num visible devices: 1
+[2024-06-06 13:37:48,702][20355] Worker 4 uses CPU cores [0]
+[2024-06-06 13:37:48,712][20333] Starting seed is not provided
+[2024-06-06 13:37:48,713][20333] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-06-06 13:37:48,714][20333] Initializing actor-critic model on device cuda:0
+[2024-06-06 13:37:48,715][20333] RunningMeanStd input shape: (3, 72, 128)
+[2024-06-06 13:37:48,717][20333] RunningMeanStd input shape: (1,)
+[2024-06-06 13:37:48,730][20018] Heartbeat connected on Batcher_0
+[2024-06-06 13:37:48,744][20353] Worker 2 uses CPU cores [0]
+[2024-06-06 13:37:48,775][20333] ConvEncoder: input_channels=3
+[2024-06-06 13:37:48,789][20018] Heartbeat connected on RolloutWorker_w3
+[2024-06-06 13:37:48,861][20018] Heartbeat connected on RolloutWorker_w4
+[2024-06-06 13:37:48,874][20358] Worker 7 uses CPU cores [1]
+[2024-06-06 13:37:48,884][20018] Heartbeat connected on RolloutWorker_w2
+[2024-06-06 13:37:48,920][20351] Worker 0 uses CPU cores [0]
+[2024-06-06 13:37:48,953][20352] Worker 1 uses CPU cores [1]
+[2024-06-06 13:37:48,988][20018] Heartbeat connected on RolloutWorker_w7
+[2024-06-06 13:37:48,999][20018] Heartbeat connected on RolloutWorker_w0
+[2024-06-06 13:37:49,043][20350] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-06-06 13:37:49,043][20350] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-06-06 13:37:49,057][20018] Heartbeat connected on RolloutWorker_w1
+[2024-06-06 13:37:49,091][20350] Num visible devices: 1
+[2024-06-06 13:37:49,115][20018] Heartbeat connected on InferenceWorker_p0-w0
+[2024-06-06 13:37:49,120][20356] Worker 5 uses CPU cores [1]
+[2024-06-06 13:37:49,142][20357] Worker 6 uses CPU cores [0]
+[2024-06-06 13:37:49,155][20018] Heartbeat connected on RolloutWorker_w6
+[2024-06-06 13:37:49,163][20018] Heartbeat connected on RolloutWorker_w5
+[2024-06-06 13:37:49,197][20333] Conv encoder output size: 512
+[2024-06-06 13:37:49,198][20333] Policy head output size: 512
+[2024-06-06 13:37:49,217][20333] Created Actor Critic model with architecture:
+[2024-06-06 13:37:49,217][20333] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-06-06 13:37:49,491][20333] Using optimizer
+[2024-06-06 13:37:50,572][20333] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-06-06 13:37:50,607][20333] Loading model from checkpoint
+[2024-06-06 13:37:50,609][20333] Loaded experiment state at self.train_step=978, self.env_steps=4005888
+[2024-06-06 13:37:50,610][20333] Initialized policy 0 weights for model version 978
+[2024-06-06 13:37:50,613][20333] LearnerWorker_p0 finished initialization!
+[2024-06-06 13:37:50,614][20333] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-06-06 13:37:50,614][20018] Heartbeat connected on LearnerWorker_p0
+[2024-06-06 13:37:50,832][20350] RunningMeanStd input shape: (3, 72, 128)
+[2024-06-06 13:37:50,833][20350] RunningMeanStd input shape: (1,)
+[2024-06-06 13:37:50,846][20350] ConvEncoder: input_channels=3
+[2024-06-06 13:37:50,962][20350] Conv encoder output size: 512
+[2024-06-06 13:37:50,962][20350] Policy head output size: 512
+[2024-06-06 13:37:51,022][20018] Inference worker 0-0 is ready!
+[2024-06-06 13:37:51,024][20018] All inference workers are ready! Signal rollout workers to start!
+[2024-06-06 13:37:51,270][20358] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:51,274][20351] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:51,274][20352] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:51,276][20355] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:51,272][20354] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:51,278][20353] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:51,273][20357] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:51,276][20356] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 13:37:52,359][20356] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:52,364][20358] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:52,797][20358] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:53,078][20357] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:53,082][20355] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:53,088][20353] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:53,090][20351] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:53,819][20358] Decorrelating experience for 64 frames...
+[2024-06-06 13:37:54,003][20018] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 13:37:54,190][20356] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:54,250][20354] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:54,478][20353] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:54,479][20355] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:54,493][20357] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:55,253][20351] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:55,593][20358] Decorrelating experience for 96 frames...
+[2024-06-06 13:37:55,611][20354] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:56,227][20352] Decorrelating experience for 0 frames...
+[2024-06-06 13:37:56,660][20355] Decorrelating experience for 64 frames...
+[2024-06-06 13:37:58,247][20356] Decorrelating experience for 64 frames...
+[2024-06-06 13:37:58,379][20351] Decorrelating experience for 64 frames...
+[2024-06-06 13:37:59,003][20018] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 1.2. Samples: 6. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 13:37:59,053][20357] Decorrelating experience for 64 frames...
+[2024-06-06 13:37:59,260][20352] Decorrelating experience for 32 frames...
+[2024-06-06 13:37:59,392][20353] Decorrelating experience for 64 frames...
+[2024-06-06 13:37:59,600][20355] Decorrelating experience for 96 frames...
+[2024-06-06 13:38:01,376][20356] Decorrelating experience for 96 frames...
+[2024-06-06 13:38:01,481][20351] Decorrelating experience for 96 frames...
+[2024-06-06 13:38:02,211][20354] Decorrelating experience for 64 frames...
+[2024-06-06 13:38:02,733][20353] Decorrelating experience for 96 frames...
+[2024-06-06 13:38:03,101][20352] Decorrelating experience for 64 frames...
+[2024-06-06 13:38:04,003][20018] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 108.8. Samples: 1088. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-06-06 13:38:04,006][20018] Avg episode reward: [(0, '2.592')]
+[2024-06-06 13:38:06,445][20354] Decorrelating experience for 96 frames...
+[2024-06-06 13:38:06,576][20357] Decorrelating experience for 96 frames...
+[2024-06-06 13:38:07,122][20333] Signal inference workers to stop experience collection...
+[2024-06-06 13:38:07,142][20350] InferenceWorker_p0-w0: stopping experience collection
+[2024-06-06 13:38:07,253][20352] Decorrelating experience for 96 frames...
+[2024-06-06 13:38:08,378][20333] Signal inference workers to resume experience collection...
+[2024-06-06 13:38:08,380][20350] InferenceWorker_p0-w0: resuming experience collection
+[2024-06-06 13:38:09,003][20018] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4009984. Throughput: 0: 128.0. Samples: 1920. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2024-06-06 13:38:09,009][20018] Avg episode reward: [(0, '2.427')]
+[2024-06-06 13:38:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 4026368. Throughput: 0: 251.3. Samples: 5026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:38:14,009][20018] Avg episode reward: [(0, '7.512')]
+[2024-06-06 13:38:19,003][20018] Fps is (10 sec: 2457.4, 60 sec: 1146.8, 300 sec: 1146.8). Total num frames: 4034560. Throughput: 0: 318.0. Samples: 7950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:38:19,006][20018] Avg episode reward: [(0, '8.834')]
+[2024-06-06 13:38:24,003][20018] Fps is (10 sec: 1638.4, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 4042752. Throughput: 0: 306.0. Samples: 9180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:38:24,005][20018] Avg episode reward: [(0, '10.417')]
+[2024-06-06 13:38:24,951][20350] Updated weights for policy 0, policy_version 988 (0.0046)
+[2024-06-06 13:38:29,003][20018] Fps is (10 sec: 2457.7, 60 sec: 1521.4, 300 sec: 1521.4). Total num frames: 4059136. Throughput: 0: 354.3. Samples: 12402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:38:29,005][20018] Avg episode reward: [(0, '17.916')]
+[2024-06-06 13:38:34,003][20018] Fps is (10 sec: 3276.8, 60 sec: 1740.8, 300 sec: 1740.8). Total num frames: 4075520. Throughput: 0: 433.9. Samples: 17354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:38:34,005][20018] Avg episode reward: [(0, '25.079')]
+[2024-06-06 13:38:38,948][20350] Updated weights for policy 0, policy_version 998 (0.0036)
+[2024-06-06 13:38:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 1820.4, 300 sec: 1820.4). Total num frames: 4087808. Throughput: 0: 433.9. Samples: 19524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:38:39,010][20018] Avg episode reward: [(0, '24.380')]
+[2024-06-06 13:38:44,003][20018] Fps is (10 sec: 2048.0, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 4096000. Throughput: 0: 502.3. Samples: 22608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:38:44,012][20018] Avg episode reward: [(0, '24.183')]
+[2024-06-06 13:38:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2010.8, 300 sec: 2010.8). Total num frames: 4116480. Throughput: 0: 585.4. Samples: 27430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:38:49,005][20018] Avg episode reward: [(0, '24.553')]
+[2024-06-06 13:38:52,696][20350] Updated weights for policy 0, policy_version 1008 (0.0027)
+[2024-06-06 13:38:54,005][20018] Fps is (10 sec: 3276.7, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4128768. Throughput: 0: 624.5. Samples: 30022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:38:54,008][20018] Avg episode reward: [(0, '24.900')]
+[2024-06-06 13:38:59,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2184.5, 300 sec: 2016.5). Total num frames: 4136960. Throughput: 0: 620.7. Samples: 32958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:38:59,008][20018] Avg episode reward: [(0, '26.942')]
+[2024-06-06 13:39:04,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2106.5). Total num frames: 4153344. Throughput: 0: 640.6. Samples: 36776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:39:04,010][20018] Avg episode reward: [(0, '27.074')]
+[2024-06-06 13:39:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2075.3). Total num frames: 4161536. Throughput: 0: 666.5. Samples: 39172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-06-06 13:39:09,009][20018] Avg episode reward: [(0, '26.819')]
+[2024-06-06 13:39:12,170][20350] Updated weights for policy 0, policy_version 1018 (0.0046)
+[2024-06-06 13:39:14,003][20018] Fps is (10 sec: 1638.4, 60 sec: 2389.3, 300 sec: 2048.0). Total num frames: 4169728. Throughput: 0: 648.0. Samples: 41562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:39:14,008][20018] Avg episode reward: [(0, '25.730')]
+[2024-06-06 13:39:19,003][20018] Fps is (10 sec: 1638.4, 60 sec: 2389.4, 300 sec: 2023.9). Total num frames: 4177920. Throughput: 0: 588.7. Samples: 43846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:39:19,011][20018] Avg episode reward: [(0, '25.004')]
+[2024-06-06 13:39:19,024][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001020_4177920.pth...
+[2024-06-06 13:39:19,182][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000902_3694592.pth
+[2024-06-06 13:39:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2093.5). Total num frames: 4194304. Throughput: 0: 590.5. Samples: 46098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:39:24,005][20018] Avg episode reward: [(0, '25.016')]
+[2024-06-06 13:39:27,843][20350] Updated weights for policy 0, policy_version 1028 (0.0036)
+[2024-06-06 13:39:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2525.9, 300 sec: 2155.8). Total num frames: 4210688. Throughput: 0: 634.8. Samples: 51176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:39:29,010][20018] Avg episode reward: [(0, '24.625')]
+[2024-06-06 13:39:34,008][20018] Fps is (10 sec: 2865.6, 60 sec: 2457.4, 300 sec: 2170.8). Total num frames: 4222976. Throughput: 0: 609.7. Samples: 54870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:39:34,011][20018] Avg episode reward: [(0, '23.834')]
+[2024-06-06 13:39:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2184.5). Total num frames: 4235264. Throughput: 0: 587.4. Samples: 56456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:39:39,011][20018] Avg episode reward: [(0, '22.416')]
+[2024-06-06 13:39:43,121][20350] Updated weights for policy 0, policy_version 1038 (0.0018)
+[2024-06-06 13:39:44,003][20018] Fps is (10 sec: 2868.8, 60 sec: 2594.1, 300 sec: 2234.2). Total num frames: 4251648. Throughput: 0: 627.0. Samples: 61174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:39:44,005][20018] Avg episode reward: [(0, '23.734')]
+[2024-06-06 13:39:49,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2525.8, 300 sec: 2279.5). Total num frames: 4268032. Throughput: 0: 649.0. Samples: 65984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:39:49,009][20018] Avg episode reward: [(0, '23.523')]
+[2024-06-06 13:39:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2252.8). Total num frames: 4276224. Throughput: 0: 630.6. Samples: 67550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:39:54,005][20018] Avg episode reward: [(0, '23.176')]
+[2024-06-06 13:39:58,050][20350] Updated weights for policy 0, policy_version 1048 (0.0022)
+[2024-06-06 13:39:59,003][20018] Fps is (10 sec: 2458.1, 60 sec: 2594.1, 300 sec: 2293.8). Total num frames: 4292608. Throughput: 0: 659.5. Samples: 71238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:39:59,010][20018] Avg episode reward: [(0, '22.383')]
+[2024-06-06 13:40:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2331.6). Total num frames: 4308992. Throughput: 0: 722.6. Samples: 76364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:40:04,005][20018] Avg episode reward: [(0, '22.037')]
+[2024-06-06 13:40:09,006][20018] Fps is (10 sec: 2866.3, 60 sec: 2662.3, 300 sec: 2336.2). Total num frames: 4321280. Throughput: 0: 724.4. Samples: 78698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:40:09,008][20018] Avg episode reward: [(0, '22.129')]
+[2024-06-06 13:40:13,300][20350] Updated weights for policy 0, policy_version 1058 (0.0029)
+[2024-06-06 13:40:14,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2340.6). Total num frames: 4333568. Throughput: 0: 677.6. Samples: 81670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:40:14,008][20018] Avg episode reward: [(0, '23.298')]
+[2024-06-06 13:40:19,003][20018] Fps is (10 sec: 2868.1, 60 sec: 2867.2, 300 sec: 2372.9). Total num frames: 4349952. Throughput: 0: 700.6. Samples: 86394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:40:19,005][20018] Avg episode reward: [(0, '21.871')]
+[2024-06-06 13:40:24,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2403.0). Total num frames: 4366336. Throughput: 0: 720.8. Samples: 88892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:40:24,008][20018] Avg episode reward: [(0, '22.377')]
+[2024-06-06 13:40:27,116][20350] Updated weights for policy 0, policy_version 1068 (0.0026)
+[2024-06-06 13:40:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2378.3). Total num frames: 4374528. Throughput: 0: 702.4. Samples: 92784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:40:29,005][20018] Avg episode reward: [(0, '21.844')]
+[2024-06-06 13:40:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.2, 300 sec: 2406.4). Total num frames: 4390912. Throughput: 0: 675.3. Samples: 96372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:40:34,008][20018] Avg episode reward: [(0, '21.551')]
+[2024-06-06 13:40:39,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2432.8). Total num frames: 4407296. Throughput: 0: 696.4. Samples: 98888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:40:39,005][20018] Avg episode reward: [(0, '23.563')]
+[2024-06-06 13:40:40,930][20350] Updated weights for policy 0, policy_version 1078 (0.0041)
+[2024-06-06 13:40:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2433.5). Total num frames: 4419584. Throughput: 0: 721.7. Samples: 103714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:40:44,005][20018] Avg episode reward: [(0, '24.745')]
+[2024-06-06 13:40:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2434.2). Total num frames: 4431872. Throughput: 0: 674.7. Samples: 106726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:40:49,007][20018] Avg episode reward: [(0, '24.777')]
+[2024-06-06 13:40:54,002][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2457.6). Total num frames: 4448256. Throughput: 0: 668.8. Samples: 108792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:40:54,005][20018] Avg episode reward: [(0, '25.201')]
+[2024-06-06 13:40:56,205][20350] Updated weights for policy 0, policy_version 1088 (0.0022)
+[2024-06-06 13:40:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2479.7). Total num frames: 4464640. Throughput: 0: 713.4. Samples: 113772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:40:59,005][20018] Avg episode reward: [(0, '26.337')]
+[2024-06-06 13:41:04,002][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2457.6). Total num frames: 4472832. Throughput: 0: 694.6. Samples: 117652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:41:04,007][20018] Avg episode reward: [(0, '26.051')]
+[2024-06-06 13:41:09,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.8, 300 sec: 2457.6). Total num frames: 4485120. Throughput: 0: 673.9. Samples: 119216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:41:09,005][20018] Avg episode reward: [(0, '26.368')]
+[2024-06-06 13:41:12,219][20350] Updated weights for policy 0, policy_version 1098 (0.0042)
+[2024-06-06 13:41:14,002][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2478.1). Total num frames: 4501504. Throughput: 0: 681.0. Samples: 123430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:41:14,009][20018] Avg episode reward: [(0, '27.421')]
+[2024-06-06 13:41:19,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2497.5). Total num frames: 4517888. Throughput: 0: 713.5. Samples: 128482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:41:19,010][20018] Avg episode reward: [(0, '27.133')]
+[2024-06-06 13:41:19,023][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001103_4517888.pth...
+[2024-06-06 13:41:19,204][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
+[2024-06-06 13:41:24,002][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2477.1). Total num frames: 4526080. Throughput: 0: 690.6. Samples: 129964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:41:24,006][20018] Avg episode reward: [(0, '26.136')]
+[2024-06-06 13:41:27,164][20350] Updated weights for policy 0, policy_version 1108 (0.0038)
+[2024-06-06 13:41:29,002][20018] Fps is (10 sec: 2458.3, 60 sec: 2798.9, 300 sec: 2495.7). Total num frames: 4542464. Throughput: 0: 662.7. Samples: 133534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:41:29,005][20018] Avg episode reward: [(0, '25.694')]
+[2024-06-06 13:41:34,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2513.5). Total num frames: 4558848. Throughput: 0: 709.6. Samples: 138660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:41:34,011][20018] Avg episode reward: [(0, '24.416')]
+[2024-06-06 13:41:39,002][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2512.2). Total num frames: 4571136. Throughput: 0: 718.5. Samples: 141126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:41:39,007][20018] Avg episode reward: [(0, '24.509')]
+[2024-06-06 13:41:41,673][20350] Updated weights for policy 0, policy_version 1118 (0.0036)
+[2024-06-06 13:41:44,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2730.6, 300 sec: 2511.0). Total num frames: 4583424. Throughput: 0: 675.4. Samples: 144168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:41:44,010][20018] Avg episode reward: [(0, '25.813')]
+[2024-06-06 13:41:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2527.3). Total num frames: 4599808. Throughput: 0: 688.6. Samples: 148638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:41:49,011][20018] Avg episode reward: [(0, '25.435')]
+[2024-06-06 13:41:54,002][20018] Fps is (10 sec: 3277.4, 60 sec: 2798.9, 300 sec: 2542.9). Total num frames: 4616192. Throughput: 0: 709.4. Samples: 151138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:41:54,009][20018] Avg episode reward: [(0, '26.091')]
+[2024-06-06 13:41:54,882][20350] Updated weights for policy 0, policy_version 1128 (0.0031)
+[2024-06-06 13:41:59,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2662.4, 300 sec: 2524.5). Total num frames: 4624384. Throughput: 0: 707.6. Samples: 155272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:41:59,006][20018] Avg episode reward: [(0, '26.249')]
+[2024-06-06 13:42:04,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2539.5). Total num frames: 4640768. Throughput: 0: 672.4. Samples: 158736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:42:04,005][20018] Avg episode reward: [(0, '25.724')]
+[2024-06-06 13:42:09,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2867.2, 300 sec: 2554.0). Total num frames: 4657152. Throughput: 0: 696.8. Samples: 161320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:42:09,005][20018] Avg episode reward: [(0, '26.064')]
+[2024-06-06 13:42:09,849][20350] Updated weights for policy 0, policy_version 1138 (0.0023)
+[2024-06-06 13:42:14,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2552.1). Total num frames: 4669440. Throughput: 0: 728.6. Samples: 166322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:42:14,009][20018] Avg episode reward: [(0, '27.983')]
+[2024-06-06 13:42:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2550.3). Total num frames: 4681728. Throughput: 0: 684.3. Samples: 169454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:42:19,010][20018] Avg episode reward: [(0, '27.945')]
+[2024-06-06 13:42:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2563.8). Total num frames: 4698112. Throughput: 0: 672.3. Samples: 171378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:42:24,010][20018] Avg episode reward: [(0, '27.245')]
+[2024-06-06 13:42:24,913][20350] Updated weights for policy 0, policy_version 1148 (0.0050)
+[2024-06-06 13:42:29,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2867.2, 300 sec: 2576.8). Total num frames: 4714496. Throughput: 0: 719.1. Samples: 176526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:42:29,006][20018] Avg episode reward: [(0, '26.392')]
+[2024-06-06 13:42:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2574.6). Total num frames: 4726784. Throughput: 0: 709.0. Samples: 180542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:42:34,005][20018] Avg episode reward: [(0, '25.773')]
+[2024-06-06 13:42:39,003][20018] Fps is (10 sec: 2048.2, 60 sec: 2730.7, 300 sec: 2558.2). Total num frames: 4734976. Throughput: 0: 686.4. Samples: 182026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:42:39,005][20018] Avg episode reward: [(0, '26.134')]
+[2024-06-06 13:42:40,562][20350] Updated weights for policy 0, policy_version 1158 (0.0017)
+[2024-06-06 13:42:44,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2570.6). Total num frames: 4751360. Throughput: 0: 688.4. Samples: 186248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:42:44,010][20018] Avg episode reward: [(0, '26.152')]
+[2024-06-06 13:42:49,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2582.6). Total num frames: 4767744. Throughput: 0: 722.9. Samples: 191266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:42:49,007][20018] Avg episode reward: [(0, '26.560')]
+[2024-06-06 13:42:54,004][20018] Fps is (10 sec: 2866.7, 60 sec: 2730.6, 300 sec: 2624.2). Total num frames: 4780032. Throughput: 0: 701.8. Samples: 192904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:42:54,009][20018] Avg episode reward: [(0, '26.713')]
+[2024-06-06 13:42:55,456][20350] Updated weights for policy 0, policy_version 1168 (0.0018)
+[2024-06-06 13:42:59,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2665.9). Total num frames: 4792320. Throughput: 0: 665.1. Samples: 196252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:42:59,010][20018] Avg episode reward: [(0, '27.026')]
+[2024-06-06 13:43:04,006][20018] Fps is (10 sec: 2866.8, 60 sec: 2798.8, 300 sec: 2707.5). Total num frames: 4808704. Throughput: 0: 698.2. Samples: 200876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:43:04,019][20018] Avg episode reward: [(0, '26.138')]
+[2024-06-06 13:43:09,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2662.4, 300 sec: 2679.8). Total num frames: 4816896. Throughput: 0: 689.1. Samples: 202388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:43:09,008][20018] Avg episode reward: [(0, '26.782')]
+[2024-06-06 13:43:13,126][20350] Updated weights for policy 0, policy_version 1178 (0.0036)
+[2024-06-06 13:43:14,005][20018] Fps is (10 sec: 1638.5, 60 sec: 2594.0, 300 sec: 2679.7). Total num frames: 4825088. Throughput: 0: 629.6. Samples: 204860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:43:14,010][20018] Avg episode reward: [(0, '26.224')]
+[2024-06-06 13:43:19,003][20018] Fps is (10 sec: 2048.1, 60 sec: 2594.1, 300 sec: 2693.6). Total num frames: 4837376. Throughput: 0: 620.8. Samples: 208480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:43:19,008][20018] Avg episode reward: [(0, '26.233')]
+[2024-06-06 13:43:19,021][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001181_4837376.pth...
+[2024-06-06 13:43:19,163][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001020_4177920.pth
+[2024-06-06 13:43:24,003][20018] Fps is (10 sec: 3277.5, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 4857856. Throughput: 0: 643.1. Samples: 210964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:43:24,005][20018] Avg episode reward: [(0, '27.320')]
+[2024-06-06 13:43:26,526][20350] Updated weights for policy 0, policy_version 1188 (0.0036)
+[2024-06-06 13:43:29,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2594.0, 300 sec: 2693.6). Total num frames: 4870144. Throughput: 0: 659.2. Samples: 215916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:43:29,008][20018] Avg episode reward: [(0, '26.221')]
+[2024-06-06 13:43:34,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2679.8). Total num frames: 4878336. Throughput: 0: 614.2. Samples: 218906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:43:34,005][20018] Avg episode reward: [(0, '25.327')]
+[2024-06-06 13:43:39,003][20018] Fps is (10 sec: 2458.2, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 4894720. Throughput: 0: 624.6. Samples: 221012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:43:39,011][20018] Avg episode reward: [(0, '24.100')]
+[2024-06-06 13:43:42,093][20350] Updated weights for policy 0, policy_version 1198 (0.0031)
+[2024-06-06 13:43:44,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 4911104. Throughput: 0: 659.3. Samples: 225922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:43:44,005][20018] Avg episode reward: [(0, '24.614')]
+[2024-06-06 13:43:49,003][20018] Fps is (10 sec: 2867.3, 60 sec: 2594.1, 300 sec: 2693.6). Total num frames: 4923392. Throughput: 0: 643.6. Samples: 229834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:43:49,008][20018] Avg episode reward: [(0, '24.841')]
+[2024-06-06 13:43:54,003][20018] Fps is (10 sec: 2457.5, 60 sec: 2594.2, 300 sec: 2707.5). Total num frames: 4935680. Throughput: 0: 643.3. Samples: 231336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:43:54,005][20018] Avg episode reward: [(0, '26.531')]
+[2024-06-06 13:43:57,414][20350] Updated weights for policy 0, policy_version 1208 (0.0026)
+[2024-06-06 13:43:59,003][20018] Fps is (10 sec: 2867.1, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 4952064. Throughput: 0: 689.0. Samples: 235864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:43:59,011][20018] Avg episode reward: [(0, '26.444')]
+[2024-06-06 13:44:04,003][20018] Fps is (10 sec: 3276.9, 60 sec: 2662.5, 300 sec: 2735.3). Total num frames: 4968448. Throughput: 0: 718.9. Samples: 240830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:44:04,009][20018] Avg episode reward: [(0, '25.697')]
+[2024-06-06 13:44:09,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2662.3, 300 sec: 2735.3). Total num frames: 4976640. Throughput: 0: 697.9. Samples: 242372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:44:09,007][20018] Avg episode reward: [(0, '25.071')]
+[2024-06-06 13:44:12,791][20350] Updated weights for policy 0, policy_version 1218 (0.0020)
+[2024-06-06 13:44:14,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 4988928. Throughput: 0: 662.2. Samples: 245712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:44:14,006][20018] Avg episode reward: [(0, '25.129')]
+[2024-06-06 13:44:19,003][20018] Fps is (10 sec: 3277.4, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5009408. Throughput: 0: 708.8. Samples: 250800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:44:19,005][20018] Avg episode reward: [(0, '24.814')]
+[2024-06-06 13:44:24,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2730.5, 300 sec: 2749.2). Total num frames: 5021696. Throughput: 0: 718.1. Samples: 253328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:44:24,008][20018] Avg episode reward: [(0, '24.198')]
+[2024-06-06 13:44:27,256][20350] Updated weights for policy 0, policy_version 1228 (0.0038)
+[2024-06-06 13:44:29,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.5, 300 sec: 2735.3). Total num frames: 5029888. Throughput: 0: 676.8. Samples: 256378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:44:29,005][20018] Avg episode reward: [(0, '22.802')]
+[2024-06-06 13:44:34,003][20018] Fps is (10 sec: 2867.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5050368. Throughput: 0: 688.9. Samples: 260836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-06-06 13:44:34,006][20018] Avg episode reward: [(0, '22.799')]
+[2024-06-06 13:44:39,003][20018] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5066752. Throughput: 0: 711.8. Samples: 263366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:44:39,005][20018] Avg episode reward: [(0, '23.041')]
+[2024-06-06 13:44:41,049][20350] Updated weights for policy 0, policy_version 1238 (0.0036)
+[2024-06-06 13:44:44,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5074944. Throughput: 0: 698.6. Samples: 267302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:44:44,005][20018] Avg episode reward: [(0, '23.123')]
+[2024-06-06 13:44:49,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5087232. Throughput: 0: 665.2. Samples: 270766.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:44:49,008][20018] Avg episode reward: [(0, '24.571')] +[2024-06-06 13:44:54,002][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5107712. Throughput: 0: 687.5. Samples: 273308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-06-06 13:44:54,005][20018] Avg episode reward: [(0, '25.210')] +[2024-06-06 13:44:55,121][20350] Updated weights for policy 0, policy_version 1248 (0.0025) +[2024-06-06 13:44:59,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2749.2). Total num frames: 5120000. Throughput: 0: 734.1. Samples: 278750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:44:59,008][20018] Avg episode reward: [(0, '25.624')] +[2024-06-06 13:45:04,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5132288. Throughput: 0: 695.2. Samples: 282086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:45:04,005][20018] Avg episode reward: [(0, '26.340')] +[2024-06-06 13:45:09,005][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2763.0). Total num frames: 5148672. Throughput: 0: 684.1. Samples: 284114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:45:09,012][20018] Avg episode reward: [(0, '26.613')] +[2024-06-06 13:45:09,735][20350] Updated weights for policy 0, policy_version 1258 (0.0021) +[2024-06-06 13:45:14,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2763.1). Total num frames: 5165056. Throughput: 0: 728.5. Samples: 289160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:45:14,012][20018] Avg episode reward: [(0, '28.372')] +[2024-06-06 13:45:19,003][20018] Fps is (10 sec: 2868.0, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5177344. Throughput: 0: 720.7. Samples: 293266. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:45:19,009][20018] Avg episode reward: [(0, '28.487')] +[2024-06-06 13:45:19,023][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001264_5177344.pth... +[2024-06-06 13:45:19,218][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001103_4517888.pth +[2024-06-06 13:45:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2763.1). Total num frames: 5189632. Throughput: 0: 696.0. Samples: 294686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:45:24,010][20018] Avg episode reward: [(0, '29.361')] +[2024-06-06 13:45:24,013][20333] Saving new best policy, reward=29.361! +[2024-06-06 13:45:24,983][20350] Updated weights for policy 0, policy_version 1268 (0.0045) +[2024-06-06 13:45:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2763.1). Total num frames: 5206016. Throughput: 0: 709.5. Samples: 299230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:45:29,010][20018] Avg episode reward: [(0, '28.119')] +[2024-06-06 13:45:34,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5222400. Throughput: 0: 741.4. Samples: 304130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:45:34,005][20018] Avg episode reward: [(0, '28.570')] +[2024-06-06 13:45:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5230592. Throughput: 0: 718.2. Samples: 305628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:45:39,013][20018] Avg episode reward: [(0, '29.010')] +[2024-06-06 13:45:40,207][20350] Updated weights for policy 0, policy_version 1278 (0.0037) +[2024-06-06 13:45:44,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5242880. Throughput: 0: 669.8. Samples: 308888. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:45:44,005][20018] Avg episode reward: [(0, '28.986')] +[2024-06-06 13:45:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 5259264. Throughput: 0: 707.2. Samples: 313910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:45:49,010][20018] Avg episode reward: [(0, '29.509')] +[2024-06-06 13:45:49,104][20333] Saving new best policy, reward=29.509! +[2024-06-06 13:45:54,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5271552. Throughput: 0: 714.5. Samples: 316264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:45:54,010][20018] Avg episode reward: [(0, '30.129')] +[2024-06-06 13:45:54,012][20333] Saving new best policy, reward=30.129! +[2024-06-06 13:45:54,691][20350] Updated weights for policy 0, policy_version 1288 (0.0019) +[2024-06-06 13:45:59,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.8, 300 sec: 2749.2). Total num frames: 5283840. Throughput: 0: 668.8. Samples: 319258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-06-06 13:45:59,005][20018] Avg episode reward: [(0, '29.440')] +[2024-06-06 13:46:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5300224. Throughput: 0: 677.4. Samples: 323748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:46:04,005][20018] Avg episode reward: [(0, '27.721')] +[2024-06-06 13:46:08,245][20350] Updated weights for policy 0, policy_version 1298 (0.0016) +[2024-06-06 13:46:09,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2799.0, 300 sec: 2763.0). Total num frames: 5316608. Throughput: 0: 701.0. Samples: 326234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:46:09,014][20018] Avg episode reward: [(0, '28.397')] +[2024-06-06 13:46:14,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5328896. Throughput: 0: 687.9. Samples: 330186. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:46:14,005][20018] Avg episode reward: [(0, '28.027')] +[2024-06-06 13:46:19,003][20018] Fps is (10 sec: 2048.5, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 5337088. Throughput: 0: 655.3. Samples: 333618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:46:19,008][20018] Avg episode reward: [(0, '28.289')] +[2024-06-06 13:46:23,737][20350] Updated weights for policy 0, policy_version 1308 (0.0028) +[2024-06-06 13:46:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5357568. Throughput: 0: 678.7. Samples: 336170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-06-06 13:46:24,009][20018] Avg episode reward: [(0, '27.744')] +[2024-06-06 13:46:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5369856. Throughput: 0: 719.3. Samples: 341258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:46:29,011][20018] Avg episode reward: [(0, '27.312')] +[2024-06-06 13:46:34,005][20018] Fps is (10 sec: 2456.9, 60 sec: 2662.3, 300 sec: 2749.2). Total num frames: 5382144. Throughput: 0: 675.3. Samples: 344302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:46:34,008][20018] Avg episode reward: [(0, '26.906')] +[2024-06-06 13:46:38,998][20350] Updated weights for policy 0, policy_version 1318 (0.0028) +[2024-06-06 13:46:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5398528. Throughput: 0: 665.5. Samples: 346212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-06-06 13:46:39,005][20018] Avg episode reward: [(0, '27.093')] +[2024-06-06 13:46:44,003][20018] Fps is (10 sec: 2868.0, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5410816. Throughput: 0: 710.7. Samples: 351240. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:46:44,006][20018] Avg episode reward: [(0, '28.390')] +[2024-06-06 13:46:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5423104. Throughput: 0: 699.8. Samples: 355238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:46:49,007][20018] Avg episode reward: [(0, '29.759')] +[2024-06-06 13:46:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5435392. Throughput: 0: 678.9. Samples: 356782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-06-06 13:46:54,010][20018] Avg episode reward: [(0, '28.427')] +[2024-06-06 13:46:54,545][20350] Updated weights for policy 0, policy_version 1328 (0.0046) +[2024-06-06 13:46:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5455872. Throughput: 0: 689.9. Samples: 361232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:46:59,005][20018] Avg episode reward: [(0, '27.701')] +[2024-06-06 13:47:04,008][20018] Fps is (10 sec: 3274.9, 60 sec: 2798.7, 300 sec: 2749.1). Total num frames: 5468160. Throughput: 0: 725.8. Samples: 366284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:47:04,011][20018] Avg episode reward: [(0, '28.158')] +[2024-06-06 13:47:08,721][20350] Updated weights for policy 0, policy_version 1338 (0.0026) +[2024-06-06 13:47:09,007][20018] Fps is (10 sec: 2456.4, 60 sec: 2730.6, 300 sec: 2749.1). Total num frames: 5480448. Throughput: 0: 703.2. Samples: 367818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:47:09,011][20018] Avg episode reward: [(0, '28.420')] +[2024-06-06 13:47:14,003][20018] Fps is (10 sec: 2459.0, 60 sec: 2730.7, 300 sec: 2749.2). Total num frames: 5492736. Throughput: 0: 665.4. Samples: 371202. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:47:14,009][20018] Avg episode reward: [(0, '29.323')] +[2024-06-06 13:47:19,003][20018] Fps is (10 sec: 2868.6, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 5509120. Throughput: 0: 708.7. Samples: 376192. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:47:19,009][20018] Avg episode reward: [(0, '30.120')] +[2024-06-06 13:47:19,023][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth... +[2024-06-06 13:47:19,152][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001181_4837376.pth +[2024-06-06 13:47:22,380][20350] Updated weights for policy 0, policy_version 1348 (0.0015) +[2024-06-06 13:47:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5521408. Throughput: 0: 721.7. Samples: 378688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:47:24,007][20018] Avg episode reward: [(0, '28.871')] +[2024-06-06 13:47:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5533696. Throughput: 0: 678.1. Samples: 381756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:47:29,005][20018] Avg episode reward: [(0, '29.723')] +[2024-06-06 13:47:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2799.1, 300 sec: 2763.1). Total num frames: 5550080. Throughput: 0: 684.9. Samples: 386060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:47:34,017][20018] Avg episode reward: [(0, '29.338')] +[2024-06-06 13:47:37,416][20350] Updated weights for policy 0, policy_version 1358 (0.0027) +[2024-06-06 13:47:39,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2763.1). Total num frames: 5566464. Throughput: 0: 706.0. Samples: 388552. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:47:39,009][20018] Avg episode reward: [(0, '31.329')] +[2024-06-06 13:47:39,021][20333] Saving new best policy, reward=31.329! +[2024-06-06 13:47:44,004][20018] Fps is (10 sec: 2457.4, 60 sec: 2730.6, 300 sec: 2735.3). Total num frames: 5574656. Throughput: 0: 696.3. Samples: 392566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:47:44,007][20018] Avg episode reward: [(0, '31.324')] +[2024-06-06 13:47:49,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5586944. Throughput: 0: 655.0. Samples: 395754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:47:49,005][20018] Avg episode reward: [(0, '28.732')] +[2024-06-06 13:47:54,006][20018] Fps is (10 sec: 2457.1, 60 sec: 2730.5, 300 sec: 2735.3). Total num frames: 5599232. Throughput: 0: 667.0. Samples: 397834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:47:54,010][20018] Avg episode reward: [(0, '28.208')] +[2024-06-06 13:47:54,917][20350] Updated weights for policy 0, policy_version 1368 (0.0016) +[2024-06-06 13:47:59,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2707.6). Total num frames: 5607424. Throughput: 0: 659.3. Samples: 400872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:47:59,011][20018] Avg episode reward: [(0, '27.807')] +[2024-06-06 13:48:04,003][20018] Fps is (10 sec: 2048.6, 60 sec: 2526.1, 300 sec: 2721.4). Total num frames: 5619712. Throughput: 0: 614.0. Samples: 403820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:48:04,008][20018] Avg episode reward: [(0, '27.721')] +[2024-06-06 13:48:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.1, 300 sec: 2735.3). Total num frames: 5632000. Throughput: 0: 594.4. Samples: 405434. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:48:09,012][20018] Avg episode reward: [(0, '29.175')] +[2024-06-06 13:48:11,618][20350] Updated weights for policy 0, policy_version 1378 (0.0033) +[2024-06-06 13:48:14,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 5648384. Throughput: 0: 639.5. Samples: 410534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:48:14,005][20018] Avg episode reward: [(0, '28.412')] +[2024-06-06 13:48:19,006][20018] Fps is (10 sec: 3275.8, 60 sec: 2594.0, 300 sec: 2735.3). Total num frames: 5664768. Throughput: 0: 638.8. Samples: 414808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:48:19,012][20018] Avg episode reward: [(0, '28.011')] +[2024-06-06 13:48:24,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2525.8, 300 sec: 2721.4). Total num frames: 5672960. Throughput: 0: 616.5. Samples: 416296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:48:24,013][20018] Avg episode reward: [(0, '26.777')] +[2024-06-06 13:48:27,085][20350] Updated weights for policy 0, policy_version 1388 (0.0027) +[2024-06-06 13:48:29,003][20018] Fps is (10 sec: 2458.4, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 5689344. Throughput: 0: 619.1. Samples: 420426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:48:29,005][20018] Avg episode reward: [(0, '26.795')] +[2024-06-06 13:48:34,003][20018] Fps is (10 sec: 3277.3, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 5705728. Throughput: 0: 659.1. Samples: 425412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:48:34,010][20018] Avg episode reward: [(0, '27.800')] +[2024-06-06 13:48:39,004][20018] Fps is (10 sec: 2866.8, 60 sec: 2525.8, 300 sec: 2735.3). Total num frames: 5718016. Throughput: 0: 656.6. Samples: 427378. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:48:39,007][20018] Avg episode reward: [(0, '28.618')] +[2024-06-06 13:48:42,087][20350] Updated weights for policy 0, policy_version 1398 (0.0029) +[2024-06-06 13:48:44,005][20018] Fps is (10 sec: 2456.9, 60 sec: 2594.1, 300 sec: 2735.3). Total num frames: 5730304. Throughput: 0: 658.9. Samples: 430524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:48:44,008][20018] Avg episode reward: [(0, '28.016')] +[2024-06-06 13:48:49,003][20018] Fps is (10 sec: 2867.6, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 5746688. Throughput: 0: 709.1. Samples: 435730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:48:49,011][20018] Avg episode reward: [(0, '27.836')] +[2024-06-06 13:48:54,003][20018] Fps is (10 sec: 3277.7, 60 sec: 2730.8, 300 sec: 2749.2). Total num frames: 5763072. Throughput: 0: 732.4. Samples: 438394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:48:54,005][20018] Avg episode reward: [(0, '28.303')] +[2024-06-06 13:48:54,897][20350] Updated weights for policy 0, policy_version 1408 (0.0022) +[2024-06-06 13:48:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 5775360. Throughput: 0: 696.5. Samples: 441876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:48:59,005][20018] Avg episode reward: [(0, '27.045')] +[2024-06-06 13:49:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2763.1). Total num frames: 5791744. Throughput: 0: 698.0. Samples: 446214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:49:04,005][20018] Avg episode reward: [(0, '27.788')] +[2024-06-06 13:49:08,580][20350] Updated weights for policy 0, policy_version 1418 (0.0015) +[2024-06-06 13:49:09,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2777.0). Total num frames: 5808128. Throughput: 0: 725.2. Samples: 448930. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:49:09,010][20018] Avg episode reward: [(0, '27.232')] +[2024-06-06 13:49:14,004][20018] Fps is (10 sec: 2866.7, 60 sec: 2867.1, 300 sec: 2749.2). Total num frames: 5820416. Throughput: 0: 731.0. Samples: 453324. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:49:14,006][20018] Avg episode reward: [(0, '27.308')] +[2024-06-06 13:49:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2749.2). Total num frames: 5832704. Throughput: 0: 689.1. Samples: 456422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:49:19,010][20018] Avg episode reward: [(0, '27.351')] +[2024-06-06 13:49:19,022][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001424_5832704.pth... +[2024-06-06 13:49:19,171][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001264_5177344.pth +[2024-06-06 13:49:23,901][20350] Updated weights for policy 0, policy_version 1428 (0.0028) +[2024-06-06 13:49:24,003][20018] Fps is (10 sec: 2867.6, 60 sec: 2935.5, 300 sec: 2776.9). Total num frames: 5849088. Throughput: 0: 699.8. Samples: 458870. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-06-06 13:49:24,008][20018] Avg episode reward: [(0, '25.795')] +[2024-06-06 13:49:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 5861376. Throughput: 0: 741.7. Samples: 463900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 13:49:29,009][20018] Avg episode reward: [(0, '25.712')] +[2024-06-06 13:49:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 5873664. Throughput: 0: 700.2. Samples: 467238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:49:34,005][20018] Avg episode reward: [(0, '25.408')] +[2024-06-06 13:49:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2749.2). Total num frames: 5885952. Throughput: 0: 674.3. Samples: 468738. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:49:39,011][20018] Avg episode reward: [(0, '24.941')] +[2024-06-06 13:49:39,527][20350] Updated weights for policy 0, policy_version 1438 (0.0052) +[2024-06-06 13:49:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.3, 300 sec: 2763.1). Total num frames: 5902336. Throughput: 0: 708.3. Samples: 473748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:49:44,005][20018] Avg episode reward: [(0, '25.818')] +[2024-06-06 13:49:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 5914624. Throughput: 0: 707.2. Samples: 478036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-06-06 13:49:49,011][20018] Avg episode reward: [(0, '26.302')] +[2024-06-06 13:49:54,004][20018] Fps is (10 sec: 2457.3, 60 sec: 2730.6, 300 sec: 2735.3). Total num frames: 5926912. Throughput: 0: 679.9. Samples: 479524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:49:54,007][20018] Avg episode reward: [(0, '26.634')] +[2024-06-06 13:49:54,915][20350] Updated weights for policy 0, policy_version 1448 (0.0042) +[2024-06-06 13:49:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5943296. Throughput: 0: 671.6. Samples: 483544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 13:49:59,006][20018] Avg episode reward: [(0, '25.409')] +[2024-06-06 13:50:04,003][20018] Fps is (10 sec: 3277.2, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 5959680. Throughput: 0: 712.5. Samples: 488486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:50:04,005][20018] Avg episode reward: [(0, '26.158')] +[2024-06-06 13:50:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 5967872. Throughput: 0: 699.3. Samples: 490340. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:50:09,005][20018] Avg episode reward: [(0, '27.691')] +[2024-06-06 13:50:09,682][20350] Updated weights for policy 0, policy_version 1458 (0.0026) +[2024-06-06 13:50:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.5, 300 sec: 2721.4). Total num frames: 5980160. Throughput: 0: 653.2. Samples: 493294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:50:14,009][20018] Avg episode reward: [(0, '27.932')] +[2024-06-06 13:50:19,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 5996544. Throughput: 0: 688.4. Samples: 498216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:50:19,006][20018] Avg episode reward: [(0, '28.560')] +[2024-06-06 13:50:23,591][20350] Updated weights for policy 0, policy_version 1468 (0.0021) +[2024-06-06 13:50:24,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6012928. Throughput: 0: 710.6. Samples: 500714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:50:24,006][20018] Avg episode reward: [(0, '29.066')] +[2024-06-06 13:50:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 6021120. Throughput: 0: 674.4. Samples: 504094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:50:29,005][20018] Avg episode reward: [(0, '29.491')] +[2024-06-06 13:50:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6037504. Throughput: 0: 666.8. Samples: 508040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:50:34,008][20018] Avg episode reward: [(0, '29.805')] +[2024-06-06 13:50:38,366][20350] Updated weights for policy 0, policy_version 1478 (0.0018) +[2024-06-06 13:50:39,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 6053888. Throughput: 0: 688.5. Samples: 510506. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:50:39,005][20018] Avg episode reward: [(0, '29.112')] +[2024-06-06 13:50:44,004][20018] Fps is (10 sec: 2866.7, 60 sec: 2730.6, 300 sec: 2735.3). Total num frames: 6066176. Throughput: 0: 698.1. Samples: 514960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 13:50:44,007][20018] Avg episode reward: [(0, '30.930')] +[2024-06-06 13:50:49,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6074368. Throughput: 0: 654.6. Samples: 517942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:50:49,005][20018] Avg episode reward: [(0, '30.438')] +[2024-06-06 13:50:54,003][20018] Fps is (10 sec: 2458.0, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6090752. Throughput: 0: 666.6. Samples: 520336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:50:54,011][20018] Avg episode reward: [(0, '29.671')] +[2024-06-06 13:50:54,100][20350] Updated weights for policy 0, policy_version 1488 (0.0018) +[2024-06-06 13:50:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6107136. Throughput: 0: 711.0. Samples: 525290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:50:59,006][20018] Avg episode reward: [(0, '29.318')] +[2024-06-06 13:51:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6119424. Throughput: 0: 679.6. Samples: 528798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-06-06 13:51:04,009][20018] Avg episode reward: [(0, '29.295')] +[2024-06-06 13:51:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6131712. Throughput: 0: 658.0. Samples: 530326. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:51:09,005][20018] Avg episode reward: [(0, '28.874')] +[2024-06-06 13:51:09,429][20350] Updated weights for policy 0, policy_version 1498 (0.0039) +[2024-06-06 13:51:14,008][20018] Fps is (10 sec: 2865.6, 60 sec: 2798.7, 300 sec: 2749.1). Total num frames: 6148096. Throughput: 0: 691.1. Samples: 535198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:51:14,016][20018] Avg episode reward: [(0, '29.764')] +[2024-06-06 13:51:19,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6160384. Throughput: 0: 703.8. Samples: 539712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-06-06 13:51:19,008][20018] Avg episode reward: [(0, '29.126')] +[2024-06-06 13:51:19,064][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001505_6164480.pth... +[2024-06-06 13:51:19,218][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001345_5509120.pth +[2024-06-06 13:51:24,003][20018] Fps is (10 sec: 2459.0, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6172672. Throughput: 0: 682.7. Samples: 541226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 13:51:24,005][20018] Avg episode reward: [(0, '28.099')] +[2024-06-06 13:51:25,018][20350] Updated weights for policy 0, policy_version 1508 (0.0024) +[2024-06-06 13:51:29,003][20018] Fps is (10 sec: 2867.1, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6189056. Throughput: 0: 667.7. Samples: 545004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-06-06 13:51:29,005][20018] Avg episode reward: [(0, '27.215')] +[2024-06-06 13:51:34,005][20018] Fps is (10 sec: 3275.9, 60 sec: 2798.8, 300 sec: 2735.3). Total num frames: 6205440. Throughput: 0: 712.8. Samples: 550022. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:51:34,007][20018] Avg episode reward: [(0, '25.411')]
+[2024-06-06 13:51:37,990][20350] Updated weights for policy 0, policy_version 1518 (0.0020)
+[2024-06-06 13:51:39,003][20018] Fps is (10 sec: 2867.3, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6217728. Throughput: 0: 711.2. Samples: 552338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:51:39,006][20018] Avg episode reward: [(0, '25.190')]
+[2024-06-06 13:51:44,003][20018] Fps is (10 sec: 2458.2, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6230016. Throughput: 0: 670.4. Samples: 555458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:51:44,011][20018] Avg episode reward: [(0, '24.904')]
+[2024-06-06 13:51:49,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6246400. Throughput: 0: 696.0. Samples: 560118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:51:49,005][20018] Avg episode reward: [(0, '23.187')]
+[2024-06-06 13:51:52,289][20350] Updated weights for policy 0, policy_version 1528 (0.0036)
+[2024-06-06 13:51:54,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 6262784. Throughput: 0: 718.1. Samples: 562642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:51:54,009][20018] Avg episode reward: [(0, '23.643')]
+[2024-06-06 13:51:59,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.5). Total num frames: 6270976. Throughput: 0: 693.8. Samples: 566416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:51:59,009][20018] Avg episode reward: [(0, '24.250')]
+[2024-06-06 13:52:04,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6287360. Throughput: 0: 678.8. Samples: 570258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:52:04,005][20018] Avg episode reward: [(0, '25.145')]
+[2024-06-06 13:52:07,087][20350] Updated weights for policy 0, policy_version 1538 (0.0019)
+[2024-06-06 13:52:09,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6303744. Throughput: 0: 704.5. Samples: 572928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:52:09,006][20018] Avg episode reward: [(0, '26.829')]
+[2024-06-06 13:52:14,004][20018] Fps is (10 sec: 2866.8, 60 sec: 2799.1, 300 sec: 2735.3). Total num frames: 6316032. Throughput: 0: 732.2. Samples: 577952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:52:14,012][20018] Avg episode reward: [(0, '26.352')]
+[2024-06-06 13:52:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6328320. Throughput: 0: 688.4. Samples: 580996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:52:19,012][20018] Avg episode reward: [(0, '26.017')]
+[2024-06-06 13:52:22,239][20350] Updated weights for policy 0, policy_version 1548 (0.0036)
+[2024-06-06 13:52:24,003][20018] Fps is (10 sec: 2867.7, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6344704. Throughput: 0: 683.8. Samples: 583108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:52:24,008][20018] Avg episode reward: [(0, '25.576')]
+[2024-06-06 13:52:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2749.2). Total num frames: 6361088. Throughput: 0: 729.5. Samples: 588286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:52:29,008][20018] Avg episode reward: [(0, '24.663')]
+[2024-06-06 13:52:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2799.1, 300 sec: 2735.3). Total num frames: 6373376. Throughput: 0: 709.6. Samples: 592052. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:52:34,008][20018] Avg episode reward: [(0, '23.795')]
+[2024-06-06 13:52:38,398][20350] Updated weights for policy 0, policy_version 1558 (0.0029)
+[2024-06-06 13:52:39,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6381568. Throughput: 0: 682.4. Samples: 593352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:52:39,008][20018] Avg episode reward: [(0, '23.474')]
+[2024-06-06 13:52:44,003][20018] Fps is (10 sec: 1638.4, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6389760. Throughput: 0: 660.3. Samples: 596130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:52:44,005][20018] Avg episode reward: [(0, '23.506')]
+[2024-06-06 13:52:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2735.3). Total num frames: 6406144. Throughput: 0: 669.0. Samples: 600364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:52:49,010][20018] Avg episode reward: [(0, '23.392')]
+[2024-06-06 13:52:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2735.3). Total num frames: 6414336. Throughput: 0: 650.0. Samples: 602180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:52:54,008][20018] Avg episode reward: [(0, '23.988')]
+[2024-06-06 13:52:56,234][20350] Updated weights for policy 0, policy_version 1568 (0.0023)
+[2024-06-06 13:52:59,003][20018] Fps is (10 sec: 2048.1, 60 sec: 2594.1, 300 sec: 2735.3). Total num frames: 6426624. Throughput: 0: 606.3. Samples: 605236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:52:59,010][20018] Avg episode reward: [(0, '23.889')]
+[2024-06-06 13:53:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2763.1). Total num frames: 6447104. Throughput: 0: 646.8. Samples: 610102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:53:04,011][20018] Avg episode reward: [(0, '26.348')]
+[2024-06-06 13:53:09,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 6459392. Throughput: 0: 656.2. Samples: 612638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:53:09,006][20018] Avg episode reward: [(0, '26.564')]
+[2024-06-06 13:53:09,801][20350] Updated weights for policy 0, policy_version 1578 (0.0015)
+[2024-06-06 13:53:14,003][20018] Fps is (10 sec: 2457.5, 60 sec: 2594.2, 300 sec: 2735.3). Total num frames: 6471680. Throughput: 0: 618.2. Samples: 616106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:53:14,010][20018] Avg episode reward: [(0, '27.438')]
+[2024-06-06 13:53:19,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 6483968. Throughput: 0: 622.3. Samples: 620056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:53:19,006][20018] Avg episode reward: [(0, '27.859')]
+[2024-06-06 13:53:19,021][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001583_6483968.pth...
+[2024-06-06 13:53:19,166][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001424_5832704.pth
+[2024-06-06 13:53:24,003][20018] Fps is (10 sec: 2867.4, 60 sec: 2594.1, 300 sec: 2749.2). Total num frames: 6500352. Throughput: 0: 647.5. Samples: 622490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:53:24,012][20018] Avg episode reward: [(0, '28.806')]
+[2024-06-06 13:53:24,277][20350] Updated weights for policy 0, policy_version 1588 (0.0020)
+[2024-06-06 13:53:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2735.3). Total num frames: 6512640. Throughput: 0: 685.0. Samples: 626956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:53:29,013][20018] Avg episode reward: [(0, '29.426')]
+[2024-06-06 13:53:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2525.9, 300 sec: 2735.3). Total num frames: 6524928. Throughput: 0: 658.1. Samples: 629980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:53:34,005][20018] Avg episode reward: [(0, '28.733')]
+[2024-06-06 13:53:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2749.2). Total num frames: 6541312. Throughput: 0: 672.5. Samples: 632444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:53:39,011][20018] Avg episode reward: [(0, '28.706')]
+[2024-06-06 13:53:39,721][20350] Updated weights for policy 0, policy_version 1598 (0.0020)
+[2024-06-06 13:53:44,006][20018] Fps is (10 sec: 3275.6, 60 sec: 2798.8, 300 sec: 2749.1). Total num frames: 6557696. Throughput: 0: 717.3. Samples: 637516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:53:44,009][20018] Avg episode reward: [(0, '25.904')]
+[2024-06-06 13:53:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6565888. Throughput: 0: 682.5. Samples: 640816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:53:49,013][20018] Avg episode reward: [(0, '26.589')]
+[2024-06-06 13:53:54,003][20018] Fps is (10 sec: 2048.8, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6578176. Throughput: 0: 661.6. Samples: 642410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:53:54,011][20018] Avg episode reward: [(0, '25.272')]
+[2024-06-06 13:53:55,364][20350] Updated weights for policy 0, policy_version 1608 (0.0026)
+[2024-06-06 13:53:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 6598656. Throughput: 0: 694.4. Samples: 647352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-06-06 13:53:59,010][20018] Avg episode reward: [(0, '25.098')]
+[2024-06-06 13:54:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6610944. Throughput: 0: 706.0. Samples: 651826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:54:04,010][20018] Avg episode reward: [(0, '25.363')]
+[2024-06-06 13:54:09,004][20018] Fps is (10 sec: 2047.7, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 6619136. Throughput: 0: 683.4. Samples: 653244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:54:09,007][20018] Avg episode reward: [(0, '26.583')]
+[2024-06-06 13:54:10,830][20350] Updated weights for policy 0, policy_version 1618 (0.0027)
+[2024-06-06 13:54:14,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6635520. Throughput: 0: 671.2. Samples: 657160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-06-06 13:54:14,009][20018] Avg episode reward: [(0, '26.282')]
+[2024-06-06 13:54:19,003][20018] Fps is (10 sec: 3277.3, 60 sec: 2798.9, 300 sec: 2721.4). Total num frames: 6651904. Throughput: 0: 714.1. Samples: 662116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:54:19,010][20018] Avg episode reward: [(0, '25.950')]
+[2024-06-06 13:54:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6664192. Throughput: 0: 702.9. Samples: 664074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:54:24,012][20018] Avg episode reward: [(0, '26.865')]
+[2024-06-06 13:54:25,249][20350] Updated weights for policy 0, policy_version 1628 (0.0020)
+[2024-06-06 13:54:29,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6676480. Throughput: 0: 657.3. Samples: 667094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:54:29,005][20018] Avg episode reward: [(0, '26.736')]
+[2024-06-06 13:54:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6692864. Throughput: 0: 693.7. Samples: 672034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:54:34,010][20018] Avg episode reward: [(0, '27.051')]
+[2024-06-06 13:54:39,003][20350] Updated weights for policy 0, policy_version 1638 (0.0015)
+[2024-06-06 13:54:39,005][20018] Fps is (10 sec: 3276.1, 60 sec: 2798.8, 300 sec: 2735.3). Total num frames: 6709248. Throughput: 0: 714.3. Samples: 674554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:54:39,012][20018] Avg episode reward: [(0, '25.769')]
+[2024-06-06 13:54:44,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.6, 300 sec: 2721.4). Total num frames: 6717440. Throughput: 0: 682.4. Samples: 678060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:54:44,008][20018] Avg episode reward: [(0, '24.369')]
+[2024-06-06 13:54:49,003][20018] Fps is (10 sec: 2458.1, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6733824. Throughput: 0: 670.1. Samples: 681982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:54:49,005][20018] Avg episode reward: [(0, '24.376')]
+[2024-06-06 13:54:53,706][20350] Updated weights for policy 0, policy_version 1648 (0.0020)
+[2024-06-06 13:54:54,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 6750208. Throughput: 0: 691.7. Samples: 684370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:54:54,006][20018] Avg episode reward: [(0, '24.360')]
+[2024-06-06 13:54:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6762496. Throughput: 0: 702.0. Samples: 688752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:54:59,011][20018] Avg episode reward: [(0, '25.821')]
+[2024-06-06 13:55:04,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6770688. Throughput: 0: 658.8. Samples: 691760. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-06-06 13:55:04,005][20018] Avg episode reward: [(0, '25.932')]
+[2024-06-06 13:55:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2735.3). Total num frames: 6787072. Throughput: 0: 670.7. Samples: 694254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:55:09,008][20018] Avg episode reward: [(0, '27.138')]
+[2024-06-06 13:55:09,221][20350] Updated weights for policy 0, policy_version 1658 (0.0029)
+[2024-06-06 13:55:14,003][20018] Fps is (10 sec: 3276.6, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6803456. Throughput: 0: 714.8. Samples: 699260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:55:14,010][20018] Avg episode reward: [(0, '27.601')]
+[2024-06-06 13:55:19,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 6811648. Throughput: 0: 680.2. Samples: 702644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:55:19,005][20018] Avg episode reward: [(0, '27.858')]
+[2024-06-06 13:55:19,110][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001664_6815744.pth...
+[2024-06-06 13:55:19,341][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001505_6164480.pth
+[2024-06-06 13:55:24,003][20018] Fps is (10 sec: 2457.8, 60 sec: 2730.7, 300 sec: 2735.3). Total num frames: 6828032. Throughput: 0: 657.2. Samples: 704126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:55:24,011][20018] Avg episode reward: [(0, '29.921')]
+[2024-06-06 13:55:24,907][20350] Updated weights for policy 0, policy_version 1668 (0.0025)
+[2024-06-06 13:55:29,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6844416. Throughput: 0: 685.9. Samples: 708926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:55:29,008][20018] Avg episode reward: [(0, '30.611')]
+[2024-06-06 13:55:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 6856704. Throughput: 0: 700.0. Samples: 713482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:55:34,011][20018] Avg episode reward: [(0, '29.801')]
+[2024-06-06 13:55:39,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.5, 300 sec: 2721.4). Total num frames: 6868992. Throughput: 0: 679.5. Samples: 714946. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:55:39,010][20018] Avg episode reward: [(0, '29.201')]
+[2024-06-06 13:55:40,356][20350] Updated weights for policy 0, policy_version 1678 (0.0047)
+[2024-06-06 13:55:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2749.2). Total num frames: 6885376. Throughput: 0: 664.5. Samples: 718654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:55:44,005][20018] Avg episode reward: [(0, '29.323')]
+[2024-06-06 13:55:49,005][20018] Fps is (10 sec: 3276.0, 60 sec: 2798.8, 300 sec: 2749.2). Total num frames: 6901760. Throughput: 0: 708.9. Samples: 723662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:55:49,012][20018] Avg episode reward: [(0, '29.066')]
+[2024-06-06 13:55:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6909952. Throughput: 0: 699.1. Samples: 725714. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:55:54,005][20018] Avg episode reward: [(0, '29.126')]
+[2024-06-06 13:55:54,497][20350] Updated weights for policy 0, policy_version 1688 (0.0065)
+[2024-06-06 13:55:59,003][20018] Fps is (10 sec: 2048.5, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 6922240. Throughput: 0: 655.7. Samples: 728768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:55:59,005][20018] Avg episode reward: [(0, '28.612')]
+[2024-06-06 13:56:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6938624. Throughput: 0: 685.4. Samples: 733488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:56:04,005][20018] Avg episode reward: [(0, '28.922')]
+[2024-06-06 13:56:08,421][20350] Updated weights for policy 0, policy_version 1698 (0.0027)
+[2024-06-06 13:56:09,008][20018] Fps is (10 sec: 3275.0, 60 sec: 2798.7, 300 sec: 2735.3). Total num frames: 6955008. Throughput: 0: 707.8. Samples: 735982. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-06-06 13:56:09,011][20018] Avg episode reward: [(0, '28.923')]
+[2024-06-06 13:56:14,007][20018] Fps is (10 sec: 2456.5, 60 sec: 2662.2, 300 sec: 2721.4). Total num frames: 6963200. Throughput: 0: 683.2. Samples: 739672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:56:14,014][20018] Avg episode reward: [(0, '28.466')]
+[2024-06-06 13:56:19,003][20018] Fps is (10 sec: 2459.0, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6979584. Throughput: 0: 664.4. Samples: 743380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:56:19,006][20018] Avg episode reward: [(0, '28.805')]
+[2024-06-06 13:56:23,762][20350] Updated weights for policy 0, policy_version 1708 (0.0021)
+[2024-06-06 13:56:24,003][20018] Fps is (10 sec: 3278.3, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 6995968. Throughput: 0: 686.4. Samples: 745832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:56:24,009][20018] Avg episode reward: [(0, '27.745')]
+[2024-06-06 13:56:29,003][20018] Fps is (10 sec: 2867.1, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 7008256. Throughput: 0: 709.1. Samples: 750562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:56:29,008][20018] Avg episode reward: [(0, '27.989')]
+[2024-06-06 13:56:34,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2707.5). Total num frames: 7016448. Throughput: 0: 664.0. Samples: 753542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:56:34,008][20018] Avg episode reward: [(0, '27.450')]
+[2024-06-06 13:56:39,005][20018] Fps is (10 sec: 2457.1, 60 sec: 2730.6, 300 sec: 2721.4). Total num frames: 7032832. Throughput: 0: 667.3. Samples: 755744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:56:39,007][20018] Avg episode reward: [(0, '27.028')]
+[2024-06-06 13:56:39,108][20350] Updated weights for policy 0, policy_version 1718 (0.0045)
+[2024-06-06 13:56:44,003][20018] Fps is (10 sec: 3686.4, 60 sec: 2798.9, 300 sec: 2735.3). Total num frames: 7053312. Throughput: 0: 711.8. Samples: 760800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:56:44,011][20018] Avg episode reward: [(0, '26.760')]
+[2024-06-06 13:56:49,007][20018] Fps is (10 sec: 2866.8, 60 sec: 2662.3, 300 sec: 2707.5). Total num frames: 7061504. Throughput: 0: 689.9. Samples: 764534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:56:49,013][20018] Avg episode reward: [(0, '26.369')]
+[2024-06-06 13:56:54,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2730.7, 300 sec: 2721.4). Total num frames: 7073792. Throughput: 0: 667.1. Samples: 765996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:56:54,005][20018] Avg episode reward: [(0, '26.248')]
+[2024-06-06 13:56:54,619][20350] Updated weights for policy 0, policy_version 1728 (0.0023)
+[2024-06-06 13:56:59,003][20018] Fps is (10 sec: 2868.3, 60 sec: 2798.9, 300 sec: 2721.4). Total num frames: 7090176. Throughput: 0: 687.0. Samples: 770582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:56:59,005][20018] Avg episode reward: [(0, '26.419')]
+[2024-06-06 13:57:04,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2721.4). Total num frames: 7106560. Throughput: 0: 711.3. Samples: 775390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:57:04,010][20018] Avg episode reward: [(0, '27.420')]
+[2024-06-06 13:57:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.6, 300 sec: 2707.5). Total num frames: 7114752. Throughput: 0: 688.9. Samples: 776834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:57:09,012][20018] Avg episode reward: [(0, '28.205')]
+[2024-06-06 13:57:09,775][20350] Updated weights for policy 0, policy_version 1738 (0.0039)
+[2024-06-06 13:57:14,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2721.4). Total num frames: 7131136. Throughput: 0: 664.2. Samples: 780450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:57:14,005][20018] Avg episode reward: [(0, '28.978')]
+[2024-06-06 13:57:19,008][20018] Fps is (10 sec: 3274.9, 60 sec: 2798.7, 300 sec: 2721.4). Total num frames: 7147520. Throughput: 0: 709.9. Samples: 785492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:57:19,011][20018] Avg episode reward: [(0, '28.194')]
+[2024-06-06 13:57:19,026][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001745_7147520.pth...
+[2024-06-06 13:57:19,174][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001583_6483968.pth
+[2024-06-06 13:57:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 7155712. Throughput: 0: 707.9. Samples: 787598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:57:24,008][20018] Avg episode reward: [(0, '27.636')]
+[2024-06-06 13:57:24,236][20350] Updated weights for policy 0, policy_version 1748 (0.0039)
+[2024-06-06 13:57:29,003][20018] Fps is (10 sec: 2049.1, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 7168000. Throughput: 0: 662.8. Samples: 790626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:57:29,005][20018] Avg episode reward: [(0, '28.152')]
+[2024-06-06 13:57:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2707.5). Total num frames: 7180288. Throughput: 0: 658.8. Samples: 794178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:57:34,006][20018] Avg episode reward: [(0, '28.515')]
+[2024-06-06 13:57:39,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2594.2, 300 sec: 2707.5). Total num frames: 7188480. Throughput: 0: 660.2. Samples: 795706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:57:39,006][20018] Avg episode reward: [(0, '27.867')]
+[2024-06-06 13:57:43,182][20350] Updated weights for policy 0, policy_version 1758 (0.0035)
+[2024-06-06 13:57:44,007][20018] Fps is (10 sec: 2047.2, 60 sec: 2457.4, 300 sec: 2693.6). Total num frames: 7200768. Throughput: 0: 627.3. Samples: 798812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:57:44,009][20018] Avg episode reward: [(0, '28.362')]
+[2024-06-06 13:57:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.0, 300 sec: 2707.5). Total num frames: 7213056. Throughput: 0: 599.7. Samples: 802376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 13:57:49,008][20018] Avg episode reward: [(0, '28.330')]
+[2024-06-06 13:57:54,003][20018] Fps is (10 sec: 2868.3, 60 sec: 2594.1, 300 sec: 2721.4). Total num frames: 7229440. Throughput: 0: 619.9. Samples: 804730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:57:54,005][20018] Avg episode reward: [(0, '28.256')]
+[2024-06-06 13:57:56,791][20350] Updated weights for policy 0, policy_version 1768 (0.0034)
+[2024-06-06 13:57:59,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2707.5). Total num frames: 7245824. Throughput: 0: 648.1. Samples: 809616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:57:59,008][20018] Avg episode reward: [(0, '26.941')]
+[2024-06-06 13:58:04,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2457.5, 300 sec: 2693.6). Total num frames: 7254016. Throughput: 0: 604.1. Samples: 812674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:58:04,011][20018] Avg episode reward: [(0, '26.977')]
+[2024-06-06 13:58:09,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2707.5). Total num frames: 7270400. Throughput: 0: 602.9. Samples: 814728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 13:58:09,011][20018] Avg episode reward: [(0, '27.320')]
+[2024-06-06 13:58:11,921][20350] Updated weights for policy 0, policy_version 1778 (0.0042)
+[2024-06-06 13:58:14,003][20018] Fps is (10 sec: 3277.3, 60 sec: 2594.1, 300 sec: 2721.4). Total num frames: 7286784. Throughput: 0: 648.7. Samples: 819818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:58:14,010][20018] Avg episode reward: [(0, '26.777')]
+[2024-06-06 13:58:19,003][20018] Fps is (10 sec: 2867.0, 60 sec: 2526.1, 300 sec: 2707.5). Total num frames: 7299072. Throughput: 0: 656.2. Samples: 823706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:58:19,006][20018] Avg episode reward: [(0, '27.121')]
+[2024-06-06 13:58:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2707.5). Total num frames: 7311360. Throughput: 0: 654.2. Samples: 825144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:58:24,005][20018] Avg episode reward: [(0, '27.621')]
+[2024-06-06 13:58:27,288][20350] Updated weights for policy 0, policy_version 1788 (0.0031)
+[2024-06-06 13:58:29,003][20018] Fps is (10 sec: 2867.4, 60 sec: 2662.4, 300 sec: 2721.4). Total num frames: 7327744. Throughput: 0: 688.5. Samples: 829790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:58:29,005][20018] Avg episode reward: [(0, '28.006')]
+[2024-06-06 13:58:34,009][20018] Fps is (10 sec: 2865.3, 60 sec: 2662.1, 300 sec: 2707.5). Total num frames: 7340032. Throughput: 0: 716.7. Samples: 834632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:58:34,016][20018] Avg episode reward: [(0, '29.933')]
+[2024-06-06 13:58:39,003][20018] Fps is (10 sec: 2457.5, 60 sec: 2730.6, 300 sec: 2693.7). Total num frames: 7352320. Throughput: 0: 698.5. Samples: 836162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:58:39,011][20018] Avg episode reward: [(0, '29.013')]
+[2024-06-06 13:58:42,890][20350] Updated weights for policy 0, policy_version 1798 (0.0041)
+[2024-06-06 13:58:44,003][20018] Fps is (10 sec: 2459.2, 60 sec: 2730.8, 300 sec: 2707.5). Total num frames: 7364608. Throughput: 0: 667.6. Samples: 839660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:58:44,011][20018] Avg episode reward: [(0, '29.093')]
+[2024-06-06 13:58:49,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2867.2, 300 sec: 2735.3). Total num frames: 7385088. Throughput: 0: 711.5. Samples: 844692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:58:49,011][20018] Avg episode reward: [(0, '29.243')]
+[2024-06-06 13:58:54,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2693.6). Total num frames: 7393280. Throughput: 0: 717.2. Samples: 847002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-06-06 13:58:54,008][20018] Avg episode reward: [(0, '28.239')]
+[2024-06-06 13:58:58,093][20350] Updated weights for policy 0, policy_version 1808 (0.0041)
+[2024-06-06 13:58:59,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2693.6). Total num frames: 7405568. Throughput: 0: 670.0. Samples: 849968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:58:59,011][20018] Avg episode reward: [(0, '29.264')]
+[2024-06-06 13:59:04,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2799.0, 300 sec: 2721.4). Total num frames: 7421952. Throughput: 0: 680.3. Samples: 854318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:04,005][20018] Avg episode reward: [(0, '29.837')]
+[2024-06-06 13:59:09,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2707.5). Total num frames: 7434240. Throughput: 0: 689.5. Samples: 856172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:09,006][20018] Avg episode reward: [(0, '30.005')]
+[2024-06-06 13:59:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2594.1, 300 sec: 2679.8). Total num frames: 7442432. Throughput: 0: 658.9. Samples: 859440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-06-06 13:59:14,009][20018] Avg episode reward: [(0, '29.818')]
+[2024-06-06 13:59:14,980][20350] Updated weights for policy 0, policy_version 1818 (0.0026)
+[2024-06-06 13:59:19,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2594.2, 300 sec: 2679.8). Total num frames: 7454720. Throughput: 0: 613.3. Samples: 862226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:19,009][20018] Avg episode reward: [(0, '29.373')]
+[2024-06-06 13:59:19,030][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001820_7454720.pth...
+[2024-06-06 13:59:19,200][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001664_6815744.pth
+[2024-06-06 13:59:24,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2679.8). Total num frames: 7467008. Throughput: 0: 626.1. Samples: 864338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 13:59:24,005][20018] Avg episode reward: [(0, '27.982')]
+[2024-06-06 13:59:29,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2679.8). Total num frames: 7483392. Throughput: 0: 648.6. Samples: 868846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:29,005][20018] Avg episode reward: [(0, '28.805')]
+[2024-06-06 13:59:30,588][20350] Updated weights for policy 0, policy_version 1828 (0.0028)
+[2024-06-06 13:59:34,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.1, 300 sec: 2652.0). Total num frames: 7491584. Throughput: 0: 611.2. Samples: 872194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:34,005][20018] Avg episode reward: [(0, '28.086')]
+[2024-06-06 13:59:39,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2665.9). Total num frames: 7503872. Throughput: 0: 594.5. Samples: 873756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 13:59:39,005][20018] Avg episode reward: [(0, '28.578')]
+[2024-06-06 13:59:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2665.9). Total num frames: 7520256. Throughput: 0: 625.4. Samples: 878112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:44,005][20018] Avg episode reward: [(0, '29.435')]
+[2024-06-06 13:59:46,179][20350] Updated weights for policy 0, policy_version 1838 (0.0038)
+[2024-06-06 13:59:49,006][20018] Fps is (10 sec: 2866.2, 60 sec: 2457.5, 300 sec: 2652.0). Total num frames: 7532544. Throughput: 0: 621.8. Samples: 882302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:49,014][20018] Avg episode reward: [(0, '29.261')]
+[2024-06-06 13:59:54,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2457.6, 300 sec: 2638.1). Total num frames: 7540736. Throughput: 0: 612.9. Samples: 883752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 13:59:54,010][20018] Avg episode reward: [(0, '28.890')]
+[2024-06-06 13:59:59,003][20018] Fps is (10 sec: 2458.5, 60 sec: 2525.9, 300 sec: 2665.9). Total num frames: 7557120. Throughput: 0: 614.4. Samples: 887086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 13:59:59,011][20018] Avg episode reward: [(0, '28.550')]
+[2024-06-06 14:00:02,285][20350] Updated weights for policy 0, policy_version 1848 (0.0042)
+[2024-06-06 14:00:04,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2525.9, 300 sec: 2665.9). Total num frames: 7573504. Throughput: 0: 662.1. Samples: 892022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:00:04,005][20018] Avg episode reward: [(0, '29.092')]
+[2024-06-06 14:00:09,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2652.0). Total num frames: 7585792. Throughput: 0: 661.4. Samples: 894100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 14:00:09,005][20018] Avg episode reward: [(0, '32.084')]
+[2024-06-06 14:00:09,020][20333] Saving new best policy, reward=32.084!
+[2024-06-06 14:00:14,007][20018] Fps is (10 sec: 2047.1, 60 sec: 2525.7, 300 sec: 2651.9). Total num frames: 7593984. Throughput: 0: 621.5. Samples: 896818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:00:14,014][20018] Avg episode reward: [(0, '32.263')]
+[2024-06-06 14:00:14,017][20333] Saving new best policy, reward=32.263!
+[2024-06-06 14:00:19,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2638.1). Total num frames: 7606272. Throughput: 0: 631.0. Samples: 900588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:00:19,010][20018] Avg episode reward: [(0, '32.543')]
+[2024-06-06 14:00:19,025][20333] Saving new best policy, reward=32.543!
+[2024-06-06 14:00:19,670][20350] Updated weights for policy 0, policy_version 1858 (0.0034)
+[2024-06-06 14:00:24,003][20018] Fps is (10 sec: 2868.5, 60 sec: 2594.1, 300 sec: 2638.1). Total num frames: 7622656. Throughput: 0: 646.2. Samples: 902836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 14:00:24,005][20018] Avg episode reward: [(0, '32.598')]
+[2024-06-06 14:00:24,008][20333] Saving new best policy, reward=32.598!
+[2024-06-06 14:00:29,003][20018] Fps is (10 sec: 2867.0, 60 sec: 2525.8, 300 sec: 2638.1). Total num frames: 7634944. Throughput: 0: 642.3. Samples: 907016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 14:00:29,006][20018] Avg episode reward: [(0, '32.495')]
+[2024-06-06 14:00:34,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2624.2). Total num frames: 7643136. Throughput: 0: 617.0. Samples: 910064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:00:34,005][20018] Avg episode reward: [(0, '33.102')]
+[2024-06-06 14:00:34,008][20333] Saving new best policy, reward=33.102!
+[2024-06-06 14:00:35,915][20350] Updated weights for policy 0, policy_version 1868 (0.0053)
+[2024-06-06 14:00:39,003][20018] Fps is (10 sec: 2457.7, 60 sec: 2594.1, 300 sec: 2624.2). Total num frames: 7659520. Throughput: 0: 628.1. Samples: 912016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-06-06 14:00:39,008][20018] Avg episode reward: [(0, '33.295')]
+[2024-06-06 14:00:39,020][20333] Saving new best policy, reward=33.295!
+[2024-06-06 14:00:44,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2610.4). Total num frames: 7671808. Throughput: 0: 649.7. Samples: 916324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 14:00:44,011][20018] Avg episode reward: [(0, '32.684')]
+[2024-06-06 14:00:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2526.0, 300 sec: 2624.2). Total num frames: 7684096. Throughput: 0: 612.2. Samples: 919572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:00:49,005][20018] Avg episode reward: [(0, '32.187')]
+[2024-06-06 14:00:52,530][20350] Updated weights for policy 0, policy_version 1878 (0.0034)
+[2024-06-06 14:00:54,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.2, 300 sec: 2624.2). Total num frames: 7696384. Throughput: 0: 598.6. Samples: 921038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:00:54,012][20018] Avg episode reward: [(0, '30.082')]
+[2024-06-06 14:00:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2624.2). Total num frames: 7712768. Throughput: 0: 640.7. Samples: 925646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-06-06 14:00:59,008][20018] Avg episode reward: [(0, '28.313')]
+[2024-06-06 14:01:04,003][20018] Fps is (10 sec: 2867.3, 60 sec: 2525.9, 300 sec: 2610.4). Total num frames: 7725056. Throughput: 0: 656.2. Samples: 930116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:01:04,011][20018] Avg episode reward: [(0, '26.783')]
+[2024-06-06 14:01:07,433][20350] Updated weights for policy 0, policy_version 1888 (0.0020)
+[2024-06-06 14:01:09,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2457.6, 300 sec: 2610.4). Total num frames: 7733248. Throughput: 0: 638.0. Samples: 931548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 14:01:09,005][20018] Avg episode reward: [(0, '27.125')]
+[2024-06-06 14:01:14,003][20018] Fps is (10 sec: 2048.0, 60 sec: 2526.1, 300 sec: 2596.4). Total num frames: 7745536. Throughput: 0: 615.3. Samples: 934704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-06-06 14:01:14,005][20018] Avg episode reward: [(0, '26.927')]
+[2024-06-06 14:01:19,003][20018] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2610.3). Total num frames: 7766016. Throughput: 0: 662.2. Samples: 939864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 14:01:19,012][20018] Avg episode reward: [(0, '26.410')]
+[2024-06-06 14:01:19,028][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001896_7766016.pth...
+[2024-06-06 14:01:19,176][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001745_7147520.pth
+[2024-06-06 14:01:22,154][20350] Updated weights for policy 0, policy_version 1898 (0.0029)
+[2024-06-06 14:01:24,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2525.9, 300 sec: 2596.4). Total num frames: 7774208. Throughput: 0: 665.9. Samples: 941980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:01:24,007][20018] Avg episode reward: [(0, '27.335')]
+[2024-06-06 14:01:29,003][20018] Fps is (10 sec: 2047.9, 60 sec: 2525.9, 300 sec: 2610.3). Total num frames: 7786496. Throughput: 0: 637.6. Samples: 945016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 14:01:29,006][20018] Avg episode reward: [(0, '26.305')]
+[2024-06-06 14:01:34,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2610.4). Total num frames: 7802880. Throughput: 0: 658.5. Samples: 949204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 14:01:34,011][20018] Avg episode reward: [(0, '26.415')]
+[2024-06-06 14:01:37,617][20350] Updated weights for policy 0, policy_version 1908 (0.0027)
+[2024-06-06 14:01:39,005][20018] Fps is (10 sec: 3276.2, 60 sec: 2662.3, 300 sec: 2596.4). Total num frames: 7819264. Throughput: 0: 682.5. Samples: 951752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 14:01:39,008][20018] Avg episode reward: [(0, '28.778')]
+[2024-06-06 14:01:44,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2594.1, 300 sec: 2596.5). Total num frames: 7827456. Throughput: 0: 667.5. Samples: 955686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-06-06 14:01:44,006][20018] Avg episode reward: [(0, '29.039')]
+[2024-06-06 14:01:49,003][20018] Fps is (10 sec: 2458.2, 60 sec: 2662.4, 300 sec: 2610.3). Total num frames: 7843840. Throughput: 0: 647.1. Samples: 959234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-06-06 14:01:49,010][20018] Avg episode reward: [(0, '29.154')]
+[2024-06-06 14:01:52,550][20350] Updated weights for policy 0, policy_version 1918 (0.0023)
+[2024-06-06 14:01:54,003][20018] Fps is (10 sec: 3277.1, 60 sec: 2730.7, 300 sec: 2610.3). Total num frames: 7860224. Throughput: 0: 672.3. Samples: 961800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 14:01:54,006][20018] Avg episode reward: [(0, '29.461')]
+[2024-06-06 14:01:59,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 2596.4). Total num frames: 7872512. Throughput: 0: 711.3. Samples: 966714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-06-06 14:01:59,009][20018] Avg episode reward: [(0, '29.328')]
+[2024-06-06 14:02:04,003][20018] Fps is (10 sec: 2457.4, 60 sec: 2662.4, 300 sec: 2610.3). Total num frames: 7884800. Throughput: 0: 664.7. Samples: 969778.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 14:02:04,006][20018] Avg episode reward: [(0, '29.197')] +[2024-06-06 14:02:08,162][20350] Updated weights for policy 0, policy_version 1928 (0.0033) +[2024-06-06 14:02:09,004][20018] Fps is (10 sec: 2457.2, 60 sec: 2730.6, 300 sec: 2596.4). Total num frames: 7897088. Throughput: 0: 656.0. Samples: 971502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-06-06 14:02:09,014][20018] Avg episode reward: [(0, '28.760')] +[2024-06-06 14:02:14,003][20018] Fps is (10 sec: 3277.0, 60 sec: 2867.2, 300 sec: 2610.4). Total num frames: 7917568. Throughput: 0: 706.2. Samples: 976794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 14:02:14,008][20018] Avg episode reward: [(0, '29.731')] +[2024-06-06 14:02:19,003][20018] Fps is (10 sec: 3277.1, 60 sec: 2730.6, 300 sec: 2624.2). Total num frames: 7929856. Throughput: 0: 708.1. Samples: 981068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 14:02:19,008][20018] Avg episode reward: [(0, '29.703')] +[2024-06-06 14:02:23,768][20350] Updated weights for policy 0, policy_version 1938 (0.0034) +[2024-06-06 14:02:24,004][20018] Fps is (10 sec: 2047.7, 60 sec: 2730.6, 300 sec: 2610.3). Total num frames: 7938048. Throughput: 0: 680.6. Samples: 982378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 14:02:24,011][20018] Avg episode reward: [(0, '28.214')] +[2024-06-06 14:02:29,003][20018] Fps is (10 sec: 1638.5, 60 sec: 2662.4, 300 sec: 2596.4). Total num frames: 7946240. Throughput: 0: 647.7. Samples: 984832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 14:02:29,007][20018] Avg episode reward: [(0, '28.220')] +[2024-06-06 14:02:34,003][20018] Fps is (10 sec: 2048.3, 60 sec: 2594.1, 300 sec: 2610.3). Total num frames: 7958528. Throughput: 0: 652.4. Samples: 988592. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-06-06 14:02:34,005][20018] Avg episode reward: [(0, '28.241')] +[2024-06-06 14:02:39,003][20018] Fps is (10 sec: 2867.2, 60 sec: 2594.2, 300 sec: 2624.3). Total num frames: 7974912. Throughput: 0: 654.2. Samples: 991240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 14:02:39,006][20018] Avg episode reward: [(0, '29.947')] +[2024-06-06 14:02:39,863][20350] Updated weights for policy 0, policy_version 1948 (0.0046) +[2024-06-06 14:02:44,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.2, 300 sec: 2610.3). Total num frames: 7983104. Throughput: 0: 623.1. Samples: 994754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-06-06 14:02:44,009][20018] Avg episode reward: [(0, '29.953')] +[2024-06-06 14:02:49,003][20018] Fps is (10 sec: 2457.6, 60 sec: 2594.1, 300 sec: 2610.3). Total num frames: 7999488. Throughput: 0: 642.5. Samples: 998692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-06-06 14:02:49,009][20018] Avg episode reward: [(0, '30.059')] +[2024-06-06 14:02:50,689][20333] Stopping Batcher_0... +[2024-06-06 14:02:50,691][20333] Loop batcher_evt_loop terminating... +[2024-06-06 14:02:50,692][20018] Component Batcher_0 stopped! +[2024-06-06 14:02:50,709][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2024-06-06 14:02:50,782][20350] Weights refcount: 2 0 +[2024-06-06 14:02:50,785][20350] Stopping InferenceWorker_p0-w0... +[2024-06-06 14:02:50,786][20350] Loop inference_proc0-0_evt_loop terminating... +[2024-06-06 14:02:50,786][20018] Component InferenceWorker_p0-w0 stopped! +[2024-06-06 14:02:50,799][20353] Stopping RolloutWorker_w2... +[2024-06-06 14:02:50,802][20355] Stopping RolloutWorker_w4... +[2024-06-06 14:02:50,799][20018] Component RolloutWorker_w2 stopped! +[2024-06-06 14:02:50,806][20018] Component RolloutWorker_w4 stopped! +[2024-06-06 14:02:50,803][20353] Loop rollout_proc2_evt_loop terminating... 
+[2024-06-06 14:02:50,819][20018] Component RolloutWorker_w6 stopped!
+[2024-06-06 14:02:50,812][20355] Loop rollout_proc4_evt_loop terminating...
+[2024-06-06 14:02:50,818][20357] Stopping RolloutWorker_w6...
+[2024-06-06 14:02:50,822][20357] Loop rollout_proc6_evt_loop terminating...
+[2024-06-06 14:02:50,835][20018] Component RolloutWorker_w5 stopped!
+[2024-06-06 14:02:50,835][20356] Stopping RolloutWorker_w5...
+[2024-06-06 14:02:50,845][20356] Loop rollout_proc5_evt_loop terminating...
+[2024-06-06 14:02:50,851][20018] Component RolloutWorker_w0 stopped!
+[2024-06-06 14:02:50,853][20351] Stopping RolloutWorker_w0...
+[2024-06-06 14:02:50,854][20351] Loop rollout_proc0_evt_loop terminating...
+[2024-06-06 14:02:50,895][20018] Component RolloutWorker_w7 stopped!
+[2024-06-06 14:02:50,904][20352] Stopping RolloutWorker_w1...
+[2024-06-06 14:02:50,904][20352] Loop rollout_proc1_evt_loop terminating...
+[2024-06-06 14:02:50,904][20018] Component RolloutWorker_w1 stopped!
+[2024-06-06 14:02:50,895][20358] Stopping RolloutWorker_w7...
+[2024-06-06 14:02:50,917][20354] Stopping RolloutWorker_w3...
+[2024-06-06 14:02:50,912][20358] Loop rollout_proc7_evt_loop terminating...
+[2024-06-06 14:02:50,917][20018] Component RolloutWorker_w3 stopped!
+[2024-06-06 14:02:50,919][20354] Loop rollout_proc3_evt_loop terminating...
+[2024-06-06 14:02:50,933][20333] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001820_7454720.pth
+[2024-06-06 14:02:50,959][20333] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+[2024-06-06 14:02:51,157][20018] Component LearnerWorker_p0 stopped!
+[2024-06-06 14:02:51,160][20018] Waiting for process learner_proc0 to stop...
+[2024-06-06 14:02:51,162][20333] Stopping LearnerWorker_p0...
+[2024-06-06 14:02:51,163][20333] Loop learner_proc0_evt_loop terminating...
+[2024-06-06 14:02:53,237][20018] Waiting for process inference_proc0-0 to join...
+[2024-06-06 14:02:53,243][20018] Waiting for process rollout_proc0 to join...
+[2024-06-06 14:02:55,217][20018] Waiting for process rollout_proc1 to join...
+[2024-06-06 14:02:55,443][20018] Waiting for process rollout_proc2 to join...
+[2024-06-06 14:02:55,449][20018] Waiting for process rollout_proc3 to join...
+[2024-06-06 14:02:55,452][20018] Waiting for process rollout_proc4 to join...
+[2024-06-06 14:02:55,457][20018] Waiting for process rollout_proc5 to join...
+[2024-06-06 14:02:55,462][20018] Waiting for process rollout_proc6 to join...
+[2024-06-06 14:02:55,467][20018] Waiting for process rollout_proc7 to join...
+[2024-06-06 14:02:55,470][20018] Batcher 0 profile tree view:
+batching: 29.3518, releasing_batches: 0.0397
+[2024-06-06 14:02:55,472][20018] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0016
+ wait_policy_total: 507.1168
+update_model: 14.6358
+ weight_update: 0.0043
+one_step: 0.0068
+ handle_policy_step: 910.5851
+ deserialize: 24.4844, stack: 5.5174, obs_to_device_normalize: 183.6583, forward: 499.7374, send_messages: 43.8067
+ prepare_outputs: 109.8175
+ to_cpu: 56.8004
+[2024-06-06 14:02:55,474][20018] Learner 0 profile tree view:
+misc: 0.0068, prepare_batch: 14.2515
+train: 79.3266
+ epoch_init: 0.0070, minibatch_init: 0.0106, losses_postprocess: 0.7325, kl_divergence: 0.8510, after_optimizer: 4.9770
+ calculate_losses: 28.0587
+ losses_init: 0.0044, forward_head: 1.6277, bptt_initial: 17.9514, tail: 1.3531, advantages_returns: 0.4493, losses: 3.7936
+ bptt: 2.3988
+ bptt_forward_core: 2.2933
+ update: 43.8737
+ clip: 1.1407
+[2024-06-06 14:02:55,477][20018] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.5428, enqueue_policy_requests: 159.9715, env_step: 1153.6276, overhead: 24.4674, complete_rollouts: 9.4613
+save_policy_outputs: 31.6610
+ split_output_tensors: 12.4379
+[2024-06-06 14:02:55,479][20018] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.3626, enqueue_policy_requests: 165.2894, env_step: 1155.8202, overhead: 23.7456, complete_rollouts: 9.1582
+save_policy_outputs: 30.2311
+ split_output_tensors: 11.9144
+[2024-06-06 14:02:55,480][20018] Loop Runner_EvtLoop terminating...
+[2024-06-06 14:02:55,482][20018] Runner profile tree view:
+main_loop: 1528.1796
+[2024-06-06 14:02:55,483][20018] Collected {0: 8007680}, FPS: 2618.7
+[2024-06-06 14:02:55,533][20018] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-06-06 14:02:55,536][20018] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-06-06 14:02:55,538][20018] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-06-06 14:02:55,540][20018] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-06-06 14:02:55,542][20018] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-06-06 14:02:55,544][20018] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-06-06 14:02:55,545][20018] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-06-06 14:02:55,546][20018] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-06-06 14:02:55,548][20018] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-06-06 14:02:55,549][20018] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-06-06 14:02:55,550][20018] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-06-06 14:02:55,551][20018] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-06-06 14:02:55,552][20018] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-06-06 14:02:55,553][20018] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-06-06 14:02:55,555][20018] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-06-06 14:02:55,605][20018] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-06-06 14:02:55,611][20018] RunningMeanStd input shape: (3, 72, 128)
+[2024-06-06 14:02:55,613][20018] RunningMeanStd input shape: (1,)
+[2024-06-06 14:02:55,634][20018] ConvEncoder: input_channels=3
+[2024-06-06 14:02:55,784][20018] Conv encoder output size: 512
+[2024-06-06 14:02:55,786][20018] Policy head output size: 512
+[2024-06-06 14:02:56,177][20018] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+[2024-06-06 14:02:57,558][20018] Num frames 100...
+[2024-06-06 14:02:57,775][20018] Num frames 200...
+[2024-06-06 14:02:58,021][20018] Num frames 300...
+[2024-06-06 14:02:58,250][20018] Num frames 400...
+[2024-06-06 14:02:58,467][20018] Num frames 500...
+[2024-06-06 14:02:58,679][20018] Num frames 600...
+[2024-06-06 14:02:58,757][20018] Avg episode rewards: #0: 15.080, true rewards: #0: 6.080
+[2024-06-06 14:02:58,759][20018] Avg episode reward: 15.080, avg true_objective: 6.080
+[2024-06-06 14:02:58,967][20018] Num frames 700...
+[2024-06-06 14:02:59,157][20018] Num frames 800...
+[2024-06-06 14:02:59,316][20018] Num frames 900...
+[2024-06-06 14:02:59,456][20018] Num frames 1000...
+[2024-06-06 14:02:59,596][20018] Num frames 1100...
+[2024-06-06 14:02:59,741][20018] Num frames 1200...
+[2024-06-06 14:02:59,889][20018] Num frames 1300...
+[2024-06-06 14:03:00,039][20018] Num frames 1400...
+[2024-06-06 14:03:00,152][20018] Avg episode rewards: #0: 15.700, true rewards: #0: 7.200
+[2024-06-06 14:03:00,155][20018] Avg episode reward: 15.700, avg true_objective: 7.200
+[2024-06-06 14:03:00,244][20018] Num frames 1500...
+[2024-06-06 14:03:00,398][20018] Num frames 1600...
+[2024-06-06 14:03:00,541][20018] Num frames 1700...
+[2024-06-06 14:03:00,683][20018] Num frames 1800...
+[2024-06-06 14:03:00,830][20018] Num frames 1900...
+[2024-06-06 14:03:00,970][20018] Num frames 2000...
+[2024-06-06 14:03:01,122][20018] Num frames 2100...
+[2024-06-06 14:03:01,272][20018] Num frames 2200...
+[2024-06-06 14:03:01,442][20018] Num frames 2300...
+[2024-06-06 14:03:01,590][20018] Num frames 2400...
+[2024-06-06 14:03:01,734][20018] Num frames 2500...
+[2024-06-06 14:03:01,881][20018] Num frames 2600...
+[2024-06-06 14:03:02,022][20018] Num frames 2700...
+[2024-06-06 14:03:02,172][20018] Num frames 2800...
+[2024-06-06 14:03:02,327][20018] Num frames 2900...
+[2024-06-06 14:03:02,475][20018] Num frames 3000...
+[2024-06-06 14:03:02,634][20018] Num frames 3100...
+[2024-06-06 14:03:02,785][20018] Num frames 3200...
+[2024-06-06 14:03:02,932][20018] Num frames 3300...
+[2024-06-06 14:03:03,083][20018] Num frames 3400...
+[2024-06-06 14:03:03,228][20018] Num frames 3500...
+[2024-06-06 14:03:03,345][20018] Avg episode rewards: #0: 30.133, true rewards: #0: 11.800
+[2024-06-06 14:03:03,347][20018] Avg episode reward: 30.133, avg true_objective: 11.800
+[2024-06-06 14:03:03,452][20018] Num frames 3600...
+[2024-06-06 14:03:03,596][20018] Num frames 3700...
+[2024-06-06 14:03:03,745][20018] Num frames 3800...
+[2024-06-06 14:03:03,893][20018] Num frames 3900...
+[2024-06-06 14:03:04,033][20018] Num frames 4000...
+[2024-06-06 14:03:04,182][20018] Num frames 4100...
+[2024-06-06 14:03:04,321][20018] Num frames 4200...
+[2024-06-06 14:03:04,472][20018] Num frames 4300...
+[2024-06-06 14:03:04,623][20018] Num frames 4400...
+[2024-06-06 14:03:04,771][20018] Num frames 4500...
+[2024-06-06 14:03:04,913][20018] Num frames 4600...
+[2024-06-06 14:03:05,056][20018] Num frames 4700...
+[2024-06-06 14:03:05,209][20018] Num frames 4800...
+[2024-06-06 14:03:05,355][20018] Num frames 4900...
+[2024-06-06 14:03:05,424][20018] Avg episode rewards: #0: 32.020, true rewards: #0: 12.270
+[2024-06-06 14:03:05,425][20018] Avg episode reward: 32.020, avg true_objective: 12.270
+[2024-06-06 14:03:05,572][20018] Num frames 5000...
+[2024-06-06 14:03:05,720][20018] Num frames 5100...
+[2024-06-06 14:03:05,864][20018] Num frames 5200...
+[2024-06-06 14:03:06,008][20018] Num frames 5300...
+[2024-06-06 14:03:06,153][20018] Num frames 5400...
+[2024-06-06 14:03:06,297][20018] Num frames 5500...
+[2024-06-06 14:03:06,457][20018] Num frames 5600...
+[2024-06-06 14:03:06,616][20018] Num frames 5700...
+[2024-06-06 14:03:06,761][20018] Num frames 5800...
+[2024-06-06 14:03:06,900][20018] Num frames 5900...
+[2024-06-06 14:03:07,055][20018] Avg episode rewards: #0: 29.936, true rewards: #0: 11.936
+[2024-06-06 14:03:07,057][20018] Avg episode reward: 29.936, avg true_objective: 11.936
+[2024-06-06 14:03:07,108][20018] Num frames 6000...
+[2024-06-06 14:03:07,244][20018] Num frames 6100...
+[2024-06-06 14:03:07,399][20018] Num frames 6200...
+[2024-06-06 14:03:07,559][20018] Num frames 6300...
+[2024-06-06 14:03:07,698][20018] Num frames 6400...
+[2024-06-06 14:03:07,841][20018] Num frames 6500...
+[2024-06-06 14:03:07,984][20018] Num frames 6600...
+[2024-06-06 14:03:08,130][20018] Num frames 6700...
+[2024-06-06 14:03:08,280][20018] Num frames 6800...
+[2024-06-06 14:03:08,424][20018] Num frames 6900...
+[2024-06-06 14:03:08,580][20018] Num frames 7000...
+[2024-06-06 14:03:08,731][20018] Num frames 7100...
+[2024-06-06 14:03:08,886][20018] Num frames 7200...
+[2024-06-06 14:03:09,039][20018] Num frames 7300...
+[2024-06-06 14:03:09,228][20018] Num frames 7400...
+[2024-06-06 14:03:09,454][20018] Num frames 7500...
+[2024-06-06 14:03:09,697][20018] Num frames 7600...
+[2024-06-06 14:03:09,914][20018] Num frames 7700...
+[2024-06-06 14:03:10,129][20018] Num frames 7800...
+[2024-06-06 14:03:10,346][20018] Num frames 7900...
+[2024-06-06 14:03:10,563][20018] Num frames 8000...
+[2024-06-06 14:03:10,780][20018] Avg episode rewards: #0: 34.280, true rewards: #0: 13.447
+[2024-06-06 14:03:10,783][20018] Avg episode reward: 34.280, avg true_objective: 13.447
+[2024-06-06 14:03:10,861][20018] Num frames 8100...
+[2024-06-06 14:03:11,085][20018] Num frames 8200...
+[2024-06-06 14:03:11,316][20018] Num frames 8300...
+[2024-06-06 14:03:11,540][20018] Num frames 8400...
+[2024-06-06 14:03:11,768][20018] Num frames 8500...
+[2024-06-06 14:03:11,984][20018] Num frames 8600...
+[2024-06-06 14:03:12,249][20018] Avg episode rewards: #0: 31.128, true rewards: #0: 12.414
+[2024-06-06 14:03:12,251][20018] Avg episode reward: 31.128, avg true_objective: 12.414
+[2024-06-06 14:03:12,279][20018] Num frames 8700...
+[2024-06-06 14:03:12,449][20018] Num frames 8800...
+[2024-06-06 14:03:12,608][20018] Num frames 8900...
+[2024-06-06 14:03:12,766][20018] Num frames 9000...
+[2024-06-06 14:03:12,914][20018] Num frames 9100...
+[2024-06-06 14:03:13,058][20018] Num frames 9200...
+[2024-06-06 14:03:13,208][20018] Num frames 9300...
+[2024-06-06 14:03:13,350][20018] Num frames 9400...
+[2024-06-06 14:03:13,493][20018] Num frames 9500...
+[2024-06-06 14:03:13,648][20018] Num frames 9600...
+[2024-06-06 14:03:13,809][20018] Num frames 9700...
+[2024-06-06 14:03:13,873][20018] Avg episode rewards: #0: 30.129, true rewards: #0: 12.129
+[2024-06-06 14:03:13,876][20018] Avg episode reward: 30.129, avg true_objective: 12.129
+[2024-06-06 14:03:14,018][20018] Num frames 9800...
+[2024-06-06 14:03:14,167][20018] Num frames 9900...
+[2024-06-06 14:03:14,313][20018] Num frames 10000...
+[2024-06-06 14:03:14,462][20018] Num frames 10100...
+[2024-06-06 14:03:14,612][20018] Num frames 10200...
+[2024-06-06 14:03:14,762][20018] Num frames 10300...
+[2024-06-06 14:03:14,920][20018] Num frames 10400...
+[2024-06-06 14:03:15,066][20018] Num frames 10500...
+[2024-06-06 14:03:15,217][20018] Num frames 10600...
+[2024-06-06 14:03:15,364][20018] Num frames 10700...
+[2024-06-06 14:03:15,512][20018] Num frames 10800...
+[2024-06-06 14:03:15,658][20018] Num frames 10900...
+[2024-06-06 14:03:15,818][20018] Num frames 11000...
+[2024-06-06 14:03:15,971][20018] Num frames 11100...
+[2024-06-06 14:03:16,123][20018] Num frames 11200...
+[2024-06-06 14:03:16,194][20018] Avg episode rewards: #0: 31.230, true rewards: #0: 12.452
+[2024-06-06 14:03:16,196][20018] Avg episode reward: 31.230, avg true_objective: 12.452
+[2024-06-06 14:03:16,341][20018] Num frames 11300...
+[2024-06-06 14:03:16,498][20018] Num frames 11400...
+[2024-06-06 14:03:16,644][20018] Num frames 11500...
+[2024-06-06 14:03:16,787][20018] Num frames 11600...
+[2024-06-06 14:03:16,952][20018] Num frames 11700...
+[2024-06-06 14:03:17,096][20018] Num frames 11800...
+[2024-06-06 14:03:17,298][20018] Num frames 11900...
+[2024-06-06 14:03:17,497][20018] Num frames 12000...
+[2024-06-06 14:03:17,667][20018] Num frames 12100...
+[2024-06-06 14:03:17,730][20018] Avg episode rewards: #0: 30.003, true rewards: #0: 12.103
+[2024-06-06 14:03:17,732][20018] Avg episode reward: 30.003, avg true_objective: 12.103
+[2024-06-06 14:04:41,427][20018] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-06-06 14:04:42,218][20018] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-06-06 14:04:42,221][20018] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-06-06 14:04:42,225][20018] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-06-06 14:04:42,227][20018] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-06-06 14:04:42,229][20018] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-06-06 14:04:42,230][20018] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-06-06 14:04:42,232][20018] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-06-06 14:04:42,233][20018] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-06-06 14:04:42,234][20018] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-06-06 14:04:42,236][20018] Adding new argument 'hf_repository'='swritchie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-06-06 14:04:42,237][20018] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-06-06 14:04:42,238][20018] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-06-06 14:04:42,239][20018] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-06-06 14:04:42,240][20018] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-06-06 14:04:42,241][20018] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-06-06 14:04:42,287][20018] RunningMeanStd input shape: (3, 72, 128)
+[2024-06-06 14:04:42,290][20018] RunningMeanStd input shape: (1,)
+[2024-06-06 14:04:42,314][20018] ConvEncoder: input_channels=3
+[2024-06-06 14:04:42,381][20018] Conv encoder output size: 512
+[2024-06-06 14:04:42,384][20018] Policy head output size: 512
+[2024-06-06 14:04:42,413][20018] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
+[2024-06-06 14:04:43,189][20018] Num frames 100...
+[2024-06-06 14:04:43,400][20018] Num frames 200...
+[2024-06-06 14:04:43,618][20018] Num frames 300...
+[2024-06-06 14:04:43,841][20018] Num frames 400...
+[2024-06-06 14:04:44,052][20018] Num frames 500...
+[2024-06-06 14:04:44,271][20018] Num frames 600...
+[2024-06-06 14:04:44,491][20018] Num frames 700...
+[2024-06-06 14:04:44,725][20018] Num frames 800...
+[2024-06-06 14:04:44,955][20018] Num frames 900...
+[2024-06-06 14:04:45,171][20018] Num frames 1000...
+[2024-06-06 14:04:45,378][20018] Num frames 1100...
+[2024-06-06 14:04:45,582][20018] Num frames 1200...
+[2024-06-06 14:04:45,811][20018] Num frames 1300...
+[2024-06-06 14:04:46,056][20018] Num frames 1400...
+[2024-06-06 14:04:46,283][20018] Num frames 1500...
+[2024-06-06 14:04:46,528][20018] Num frames 1600...
+[2024-06-06 14:04:46,808][20018] Num frames 1700...
+[2024-06-06 14:04:47,110][20018] Num frames 1800...
+[2024-06-06 14:04:47,344][20018] Num frames 1900...
+[2024-06-06 14:04:47,588][20018] Num frames 2000...
+[2024-06-06 14:04:47,858][20018] Num frames 2100...
+[2024-06-06 14:04:47,913][20018] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
+[2024-06-06 14:04:47,915][20018] Avg episode reward: 58.999, avg true_objective: 21.000
+[2024-06-06 14:04:48,209][20018] Num frames 2200...
+[2024-06-06 14:04:48,495][20018] Num frames 2300...
+[2024-06-06 14:04:48,740][20018] Num frames 2400...
+[2024-06-06 14:04:49,003][20018] Num frames 2500...
+[2024-06-06 14:04:49,252][20018] Num frames 2600...
+[2024-06-06 14:04:49,517][20018] Num frames 2700...
+[2024-06-06 14:04:49,783][20018] Num frames 2800...
+[2024-06-06 14:04:50,061][20018] Num frames 2900...
+[2024-06-06 14:04:50,160][20018] Avg episode rewards: #0: 37.564, true rewards: #0: 14.565
+[2024-06-06 14:04:50,163][20018] Avg episode reward: 37.564, avg true_objective: 14.565
+[2024-06-06 14:04:50,382][20018] Num frames 3000...
+[2024-06-06 14:04:50,646][20018] Num frames 3100...
+[2024-06-06 14:04:50,911][20018] Num frames 3200...
+[2024-06-06 14:04:51,245][20018] Avg episode rewards: #0: 26.323, true rewards: #0: 10.990
+[2024-06-06 14:04:51,248][20018] Avg episode reward: 26.323, avg true_objective: 10.990
+[2024-06-06 14:04:51,258][20018] Num frames 3300...
+[2024-06-06 14:04:51,551][20018] Num frames 3400...
+[2024-06-06 14:04:51,857][20018] Num frames 3500...
+[2024-06-06 14:04:52,100][20018] Num frames 3600...
+[2024-06-06 14:04:52,329][20018] Num frames 3700...
+[2024-06-06 14:04:52,425][20018] Avg episode rewards: #0: 21.040, true rewards: #0: 9.290
+[2024-06-06 14:04:52,427][20018] Avg episode reward: 21.040, avg true_objective: 9.290
+[2024-06-06 14:04:52,575][20018] Num frames 3800...
+[2024-06-06 14:04:52,723][20018] Num frames 3900...
+[2024-06-06 14:04:52,879][20018] Num frames 4000...
+[2024-06-06 14:04:53,027][20018] Num frames 4100...
+[2024-06-06 14:04:53,176][20018] Num frames 4200...
+[2024-06-06 14:04:53,336][20018] Num frames 4300...
+[2024-06-06 14:04:53,487][20018] Num frames 4400...
+[2024-06-06 14:04:53,644][20018] Num frames 4500...
+[2024-06-06 14:04:53,797][20018] Num frames 4600...
+[2024-06-06 14:04:53,956][20018] Num frames 4700...
+[2024-06-06 14:04:54,119][20018] Num frames 4800...
+[2024-06-06 14:04:54,320][20018] Num frames 4900...
+[2024-06-06 14:04:54,394][20018] Avg episode rewards: #0: 22.218, true rewards: #0: 9.818
+[2024-06-06 14:04:54,396][20018] Avg episode reward: 22.218, avg true_objective: 9.818
+[2024-06-06 14:04:54,533][20018] Num frames 5000...
+[2024-06-06 14:04:54,718][20018] Num frames 5100...
+[2024-06-06 14:04:54,893][20018] Num frames 5200...
+[2024-06-06 14:04:55,080][20018] Num frames 5300...
+[2024-06-06 14:04:55,253][20018] Num frames 5400...
+[2024-06-06 14:04:55,425][20018] Num frames 5500...
+[2024-06-06 14:04:55,604][20018] Num frames 5600...
+[2024-06-06 14:04:55,697][20018] Avg episode rewards: #0: 20.522, true rewards: #0: 9.355
+[2024-06-06 14:04:55,699][20018] Avg episode reward: 20.522, avg true_objective: 9.355
+[2024-06-06 14:04:55,830][20018] Num frames 5700...
+[2024-06-06 14:04:55,976][20018] Num frames 5800...
+[2024-06-06 14:04:56,127][20018] Num frames 5900...
+[2024-06-06 14:04:56,288][20018] Num frames 6000...
+[2024-06-06 14:04:56,476][20018] Avg episode rewards: #0: 19.116, true rewards: #0: 8.687
+[2024-06-06 14:04:56,478][20018] Avg episode reward: 19.116, avg true_objective: 8.687
+[2024-06-06 14:04:56,515][20018] Num frames 6100...
+[2024-06-06 14:04:56,727][20018] Num frames 6200...
+[2024-06-06 14:04:56,939][20018] Num frames 6300...
+[2024-06-06 14:04:57,115][20018] Num frames 6400...
+[2024-06-06 14:04:57,265][20018] Num frames 6500...
+[2024-06-06 14:04:57,431][20018] Num frames 6600...
+[2024-06-06 14:04:57,583][20018] Num frames 6700...
+[2024-06-06 14:04:57,738][20018] Num frames 6800...
+[2024-06-06 14:04:57,886][20018] Num frames 6900...
+[2024-06-06 14:04:58,062][20018] Avg episode rewards: #0: 19.467, true rewards: #0: 8.717
+[2024-06-06 14:04:58,064][20018] Avg episode reward: 19.467, avg true_objective: 8.717
+[2024-06-06 14:04:58,110][20018] Num frames 7000...
+[2024-06-06 14:04:58,258][20018] Num frames 7100...
+[2024-06-06 14:04:58,415][20018] Num frames 7200...
+[2024-06-06 14:04:58,577][20018] Num frames 7300...
+[2024-06-06 14:04:58,731][20018] Num frames 7400...
+[2024-06-06 14:04:58,890][20018] Num frames 7500...
+[2024-06-06 14:04:58,976][20018] Avg episode rewards: #0: 18.353, true rewards: #0: 8.353
+[2024-06-06 14:04:58,978][20018] Avg episode reward: 18.353, avg true_objective: 8.353
+[2024-06-06 14:04:59,107][20018] Num frames 7600...
+[2024-06-06 14:04:59,256][20018] Num frames 7700...
+[2024-06-06 14:04:59,417][20018] Num frames 7800...
+[2024-06-06 14:04:59,586][20018] Num frames 7900...
+[2024-06-06 14:04:59,738][20018] Num frames 8000...
+[2024-06-06 14:04:59,904][20018] Num frames 8100...
+[2024-06-06 14:05:00,057][20018] Num frames 8200...
+[2024-06-06 14:05:00,212][20018] Num frames 8300...
+[2024-06-06 14:05:00,374][20018] Avg episode rewards: #0: 18.369, true rewards: #0: 8.369
+[2024-06-06 14:05:00,377][20018] Avg episode reward: 18.369, avg true_objective: 8.369
+[2024-06-06 14:06:03,065][20018] Replay video saved to /content/train_dir/default_experiment/replay.mp4!