[2024-06-06 14:15:06,308][01062] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-06-06 14:15:06,314][01062] Rollout worker 0 uses device cpu
[2024-06-06 14:15:06,317][01062] Rollout worker 1 uses device cpu
[2024-06-06 14:15:06,320][01062] Rollout worker 2 uses device cpu
[2024-06-06 14:15:06,321][01062] Rollout worker 3 uses device cpu
[2024-06-06 14:15:06,323][01062] Rollout worker 4 uses device cpu
[2024-06-06 14:15:06,324][01062] Rollout worker 5 uses device cpu
[2024-06-06 14:15:06,326][01062] Rollout worker 6 uses device cpu
[2024-06-06 14:15:06,329][01062] Rollout worker 7 uses device cpu
[2024-06-06 14:15:06,587][01062] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:15:06,589][01062] InferenceWorker_p0-w0: min num requests: 2
[2024-06-06 14:15:06,624][01062] Starting all processes...
[2024-06-06 14:15:06,625][01062] Starting process learner_proc0
[2024-06-06 14:15:08,172][01062] Starting all processes...
[2024-06-06 14:15:08,184][01062] Starting process inference_proc0-0
[2024-06-06 14:15:08,184][01062] Starting process rollout_proc0
[2024-06-06 14:15:08,189][01062] Starting process rollout_proc1
[2024-06-06 14:15:08,189][01062] Starting process rollout_proc2
[2024-06-06 14:15:08,189][01062] Starting process rollout_proc3
[2024-06-06 14:15:08,189][01062] Starting process rollout_proc4
[2024-06-06 14:15:08,189][01062] Starting process rollout_proc5
[2024-06-06 14:15:08,189][01062] Starting process rollout_proc6
[2024-06-06 14:15:08,190][01062] Starting process rollout_proc7
[2024-06-06 14:15:23,103][03191] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:15:23,110][03191] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-06 14:15:23,194][03191] Num visible devices: 1
[2024-06-06 14:15:23,239][03191] Starting seed is not provided
[2024-06-06 14:15:23,240][03191] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:15:23,241][03191] Initializing actor-critic model on device cuda:0
[2024-06-06 14:15:23,242][03191] RunningMeanStd input shape: (3, 72, 128)
[2024-06-06 14:15:23,245][03191] RunningMeanStd input shape: (1,)
[2024-06-06 14:15:23,326][03191] ConvEncoder: input_channels=3
[2024-06-06 14:15:23,449][03204] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:15:23,451][03204] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-06 14:15:23,559][03204] Num visible devices: 1
[2024-06-06 14:15:23,767][03208] Worker 3 uses CPU cores [1]
[2024-06-06 14:15:23,794][03205] Worker 0 uses CPU cores [0]
[2024-06-06 14:15:23,831][03211] Worker 6 uses CPU cores [0]
[2024-06-06 14:15:23,842][03209] Worker 5 uses CPU cores [1]
[2024-06-06 14:15:23,966][03206] Worker 1 uses CPU cores [1]
[2024-06-06 14:15:23,973][03207] Worker 2 uses CPU cores [0]
[2024-06-06 14:15:24,029][03212] Worker 7 uses CPU cores [1]
[2024-06-06 14:15:24,050][03210] Worker 4 uses CPU cores [0]
[2024-06-06 14:15:24,083][03191] Conv encoder output size: 512
[2024-06-06 14:15:24,084][03191] Policy head output size: 512
[2024-06-06 14:15:24,146][03191] Created Actor Critic model with architecture:
[2024-06-06 14:15:24,146][03191] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-06-06 14:15:24,463][03191] Using optimizer
[2024-06-06 14:15:25,947][03191] No checkpoints found
[2024-06-06 14:15:25,947][03191] Did not load from checkpoint, starting from scratch!
[2024-06-06 14:15:25,947][03191] Initialized policy 0 weights for model version 0
[2024-06-06 14:15:25,952][03191] LearnerWorker_p0 finished initialization!
[2024-06-06 14:15:25,956][03191] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-06 14:15:26,161][03204] RunningMeanStd input shape: (3, 72, 128)
[2024-06-06 14:15:26,162][03204] RunningMeanStd input shape: (1,)
[2024-06-06 14:15:26,182][03204] ConvEncoder: input_channels=3
[2024-06-06 14:15:26,342][03204] Conv encoder output size: 512
[2024-06-06 14:15:26,343][03204] Policy head output size: 512
[2024-06-06 14:15:26,420][01062] Inference worker 0-0 is ready!
[2024-06-06 14:15:26,423][01062] All inference workers are ready! Signal rollout workers to start!
[2024-06-06 14:15:26,582][01062] Heartbeat connected on Batcher_0 [2024-06-06 14:15:26,586][01062] Heartbeat connected on LearnerWorker_p0 [2024-06-06 14:15:26,625][01062] Heartbeat connected on InferenceWorker_p0-w0 [2024-06-06 14:15:26,890][03208] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:26,905][03205] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:26,910][03206] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:26,966][03212] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:26,979][03207] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:26,982][03210] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:26,997][03211] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:27,022][03209] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:15:28,138][01062] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:15:29,163][03206] Decorrelating experience for 0 frames... [2024-06-06 14:15:29,162][03209] Decorrelating experience for 0 frames... [2024-06-06 14:15:29,161][03208] Decorrelating experience for 0 frames... [2024-06-06 14:15:29,165][03207] Decorrelating experience for 0 frames... [2024-06-06 14:15:29,164][03211] Decorrelating experience for 0 frames... [2024-06-06 14:15:29,161][03205] Decorrelating experience for 0 frames... [2024-06-06 14:15:30,277][03205] Decorrelating experience for 32 frames... [2024-06-06 14:15:30,288][03207] Decorrelating experience for 32 frames... [2024-06-06 14:15:30,291][03210] Decorrelating experience for 0 frames... [2024-06-06 14:15:30,557][03208] Decorrelating experience for 32 frames... [2024-06-06 14:15:30,560][03206] Decorrelating experience for 32 frames... [2024-06-06 14:15:30,562][03209] Decorrelating experience for 32 frames... 
[2024-06-06 14:15:30,633][03212] Decorrelating experience for 0 frames... [2024-06-06 14:15:31,295][03211] Decorrelating experience for 32 frames... [2024-06-06 14:15:31,342][03207] Decorrelating experience for 64 frames... [2024-06-06 14:15:31,765][03212] Decorrelating experience for 32 frames... [2024-06-06 14:15:31,786][03206] Decorrelating experience for 64 frames... [2024-06-06 14:15:31,783][03208] Decorrelating experience for 64 frames... [2024-06-06 14:15:32,656][03209] Decorrelating experience for 64 frames... [2024-06-06 14:15:32,728][03208] Decorrelating experience for 96 frames... [2024-06-06 14:15:32,909][03205] Decorrelating experience for 64 frames... [2024-06-06 14:15:32,940][03211] Decorrelating experience for 64 frames... [2024-06-06 14:15:32,973][03210] Decorrelating experience for 32 frames... [2024-06-06 14:15:33,136][03207] Decorrelating experience for 96 frames... [2024-06-06 14:15:33,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:15:33,935][03206] Decorrelating experience for 96 frames... [2024-06-06 14:15:34,182][03212] Decorrelating experience for 64 frames... [2024-06-06 14:15:34,315][03210] Decorrelating experience for 64 frames... [2024-06-06 14:15:34,361][03211] Decorrelating experience for 96 frames... [2024-06-06 14:15:34,413][03209] Decorrelating experience for 96 frames... [2024-06-06 14:15:35,312][03207] Decorrelating experience for 128 frames... [2024-06-06 14:15:35,595][03205] Decorrelating experience for 96 frames... [2024-06-06 14:15:35,974][03208] Decorrelating experience for 128 frames... [2024-06-06 14:15:36,038][03212] Decorrelating experience for 96 frames... [2024-06-06 14:15:36,369][03206] Decorrelating experience for 128 frames... [2024-06-06 14:15:36,415][03211] Decorrelating experience for 128 frames... [2024-06-06 14:15:36,954][03209] Decorrelating experience for 128 frames... 
[2024-06-06 14:15:37,012][03210] Decorrelating experience for 96 frames... [2024-06-06 14:15:37,638][03205] Decorrelating experience for 128 frames... [2024-06-06 14:15:37,765][03207] Decorrelating experience for 160 frames... [2024-06-06 14:15:37,797][03208] Decorrelating experience for 160 frames... [2024-06-06 14:15:38,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:15:38,505][03206] Decorrelating experience for 160 frames... [2024-06-06 14:15:38,578][03212] Decorrelating experience for 128 frames... [2024-06-06 14:15:39,389][03210] Decorrelating experience for 128 frames... [2024-06-06 14:15:39,416][03211] Decorrelating experience for 160 frames... [2024-06-06 14:15:39,429][03209] Decorrelating experience for 160 frames... [2024-06-06 14:15:39,947][03207] Decorrelating experience for 192 frames... [2024-06-06 14:15:40,936][03212] Decorrelating experience for 160 frames... [2024-06-06 14:15:41,695][03206] Decorrelating experience for 192 frames... [2024-06-06 14:15:41,747][03205] Decorrelating experience for 160 frames... [2024-06-06 14:15:42,196][03208] Decorrelating experience for 192 frames... [2024-06-06 14:15:42,256][03209] Decorrelating experience for 192 frames... [2024-06-06 14:15:42,380][03211] Decorrelating experience for 192 frames... [2024-06-06 14:15:42,931][03207] Decorrelating experience for 224 frames... [2024-06-06 14:15:43,139][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:15:43,776][03212] Decorrelating experience for 192 frames... [2024-06-06 14:15:43,929][01062] Heartbeat connected on RolloutWorker_w2 [2024-06-06 14:15:44,486][03210] Decorrelating experience for 160 frames... [2024-06-06 14:15:45,068][03206] Decorrelating experience for 224 frames... 
[2024-06-06 14:15:45,276][03205] Decorrelating experience for 192 frames... [2024-06-06 14:15:45,675][03209] Decorrelating experience for 224 frames... [2024-06-06 14:15:45,997][03211] Decorrelating experience for 224 frames... [2024-06-06 14:15:46,070][01062] Heartbeat connected on RolloutWorker_w1 [2024-06-06 14:15:46,599][01062] Heartbeat connected on RolloutWorker_w5 [2024-06-06 14:15:46,785][03208] Decorrelating experience for 224 frames... [2024-06-06 14:15:47,086][01062] Heartbeat connected on RolloutWorker_w6 [2024-06-06 14:15:48,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 29.6. Samples: 592. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:15:48,143][01062] Avg episode reward: [(0, '0.823')] [2024-06-06 14:15:48,208][01062] Heartbeat connected on RolloutWorker_w3 [2024-06-06 14:15:49,190][03210] Decorrelating experience for 192 frames... [2024-06-06 14:15:49,353][03205] Decorrelating experience for 224 frames... [2024-06-06 14:15:51,458][01062] Heartbeat connected on RolloutWorker_w0 [2024-06-06 14:15:52,633][03212] Decorrelating experience for 224 frames... [2024-06-06 14:15:53,138][01062] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 77.8. Samples: 1944. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-06 14:15:53,141][01062] Avg episode reward: [(0, '1.875')] [2024-06-06 14:15:53,420][03191] Signal inference workers to stop experience collection... [2024-06-06 14:15:53,437][03204] InferenceWorker_p0-w0: stopping experience collection [2024-06-06 14:15:53,480][01062] Heartbeat connected on RolloutWorker_w7 [2024-06-06 14:15:53,693][03210] Decorrelating experience for 224 frames... [2024-06-06 14:15:53,938][01062] Heartbeat connected on RolloutWorker_w4 [2024-06-06 14:15:55,002][03191] Signal inference workers to resume experience collection... 
[2024-06-06 14:15:55,003][03204] InferenceWorker_p0-w0: resuming experience collection [2024-06-06 14:15:58,138][01062] Fps is (10 sec: 1638.4, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 16384. Throughput: 0: 134.4. Samples: 4032. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) [2024-06-06 14:15:58,140][01062] Avg episode reward: [(0, '2.384')] [2024-06-06 14:16:03,138][01062] Fps is (10 sec: 3276.8, 60 sec: 936.2, 300 sec: 936.2). Total num frames: 32768. Throughput: 0: 256.6. Samples: 8980. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:16:03,145][01062] Avg episode reward: [(0, '3.114')] [2024-06-06 14:16:05,548][03204] Updated weights for policy 0, policy_version 10 (0.0025) [2024-06-06 14:16:08,138][01062] Fps is (10 sec: 2867.2, 60 sec: 1126.4, 300 sec: 1126.4). Total num frames: 45056. Throughput: 0: 281.3. Samples: 11252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:16:08,140][01062] Avg episode reward: [(0, '3.869')] [2024-06-06 14:16:13,138][01062] Fps is (10 sec: 3686.4, 60 sec: 1547.4, 300 sec: 1547.4). Total num frames: 69632. Throughput: 0: 371.1. Samples: 16700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:16:13,140][01062] Avg episode reward: [(0, '4.295')] [2024-06-06 14:16:15,821][03204] Updated weights for policy 0, policy_version 20 (0.0041) [2024-06-06 14:16:18,141][01062] Fps is (10 sec: 4504.5, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 90112. Throughput: 0: 525.9. Samples: 23668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:16:18,143][01062] Avg episode reward: [(0, '4.471')] [2024-06-06 14:16:23,141][01062] Fps is (10 sec: 3685.5, 60 sec: 1936.2, 300 sec: 1936.2). Total num frames: 106496. Throughput: 0: 588.6. Samples: 26488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:16:23,149][01062] Avg episode reward: [(0, '4.388')] [2024-06-06 14:16:23,180][03191] Saving new best policy, reward=4.388! 
[2024-06-06 14:16:27,677][03204] Updated weights for policy 0, policy_version 30 (0.0015) [2024-06-06 14:16:28,138][01062] Fps is (10 sec: 3277.6, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 122880. Throughput: 0: 686.8. Samples: 30908. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:16:28,144][01062] Avg episode reward: [(0, '4.407')] [2024-06-06 14:16:28,162][03191] Saving new best policy, reward=4.407! [2024-06-06 14:16:33,138][01062] Fps is (10 sec: 3687.3, 60 sec: 2389.3, 300 sec: 2205.5). Total num frames: 143360. Throughput: 0: 801.4. Samples: 36656. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:16:33,141][01062] Avg episode reward: [(0, '4.372')] [2024-06-06 14:16:38,138][01062] Fps is (10 sec: 3276.8, 60 sec: 2594.1, 300 sec: 2223.5). Total num frames: 155648. Throughput: 0: 826.2. Samples: 39124. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:16:38,140][01062] Avg episode reward: [(0, '4.499')] [2024-06-06 14:16:38,156][03191] Saving new best policy, reward=4.499! [2024-06-06 14:16:41,557][03204] Updated weights for policy 0, policy_version 40 (0.0022) [2024-06-06 14:16:43,138][01062] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2239.1). Total num frames: 167936. Throughput: 0: 846.8. Samples: 42136. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:16:43,141][01062] Avg episode reward: [(0, '4.490')] [2024-06-06 14:16:48,141][01062] Fps is (10 sec: 2866.5, 60 sec: 3071.9, 300 sec: 2303.9). Total num frames: 184320. Throughput: 0: 840.3. Samples: 46796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:16:48,143][01062] Avg episode reward: [(0, '4.416')] [2024-06-06 14:16:53,003][03204] Updated weights for policy 0, policy_version 50 (0.0019) [2024-06-06 14:16:53,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2409.4). Total num frames: 204800. Throughput: 0: 850.0. Samples: 49504. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:16:53,141][01062] Avg episode reward: [(0, '4.348')] [2024-06-06 14:16:58,138][01062] Fps is (10 sec: 4097.0, 60 sec: 3481.6, 300 sec: 2503.1). Total num frames: 225280. Throughput: 0: 880.7. Samples: 56332. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:16:58,140][01062] Avg episode reward: [(0, '4.553')] [2024-06-06 14:16:58,148][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000055_225280.pth... [2024-06-06 14:16:58,339][03191] Saving new best policy, reward=4.553! [2024-06-06 14:17:03,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2500.7). Total num frames: 237568. Throughput: 0: 826.9. Samples: 60876. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:17:03,144][01062] Avg episode reward: [(0, '4.536')] [2024-06-06 14:17:05,293][03204] Updated weights for policy 0, policy_version 60 (0.0014) [2024-06-06 14:17:08,141][01062] Fps is (10 sec: 2457.0, 60 sec: 3413.2, 300 sec: 2498.5). Total num frames: 249856. Throughput: 0: 806.0. Samples: 62760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:17:08,143][01062] Avg episode reward: [(0, '4.382')] [2024-06-06 14:17:13,139][01062] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 2496.6). Total num frames: 262144. Throughput: 0: 788.7. Samples: 66400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:17:13,145][01062] Avg episode reward: [(0, '4.361')] [2024-06-06 14:17:17,829][03204] Updated weights for policy 0, policy_version 70 (0.0021) [2024-06-06 14:17:18,138][01062] Fps is (10 sec: 3687.3, 60 sec: 3276.9, 300 sec: 2606.5). Total num frames: 286720. Throughput: 0: 788.4. Samples: 72136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:17:18,146][01062] Avg episode reward: [(0, '4.555')] [2024-06-06 14:17:18,155][03191] Saving new best policy, reward=4.555! [2024-06-06 14:17:23,138][01062] Fps is (10 sec: 4505.8, 60 sec: 3345.2, 300 sec: 2671.3). 
Total num frames: 307200. Throughput: 0: 810.0. Samples: 75572. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:17:23,144][01062] Avg episode reward: [(0, '4.358')] [2024-06-06 14:17:28,139][01062] Fps is (10 sec: 3686.2, 60 sec: 3345.0, 300 sec: 2696.5). Total num frames: 323584. Throughput: 0: 875.8. Samples: 81548. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:17:28,144][01062] Avg episode reward: [(0, '4.320')] [2024-06-06 14:17:28,464][03204] Updated weights for policy 0, policy_version 80 (0.0015) [2024-06-06 14:17:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2719.7). Total num frames: 339968. Throughput: 0: 874.8. Samples: 86160. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:17:33,142][01062] Avg episode reward: [(0, '4.445')] [2024-06-06 14:17:38,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 2772.7). Total num frames: 360448. Throughput: 0: 875.6. Samples: 88908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:17:38,140][01062] Avg episode reward: [(0, '4.510')] [2024-06-06 14:17:39,070][03204] Updated weights for policy 0, policy_version 90 (0.0019) [2024-06-06 14:17:43,139][01062] Fps is (10 sec: 4505.4, 60 sec: 3618.1, 300 sec: 2852.0). Total num frames: 385024. Throughput: 0: 879.2. Samples: 95896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:17:43,143][01062] Avg episode reward: [(0, '4.438')] [2024-06-06 14:17:48,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 2867.2). Total num frames: 401408. Throughput: 0: 910.0. Samples: 101828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:17:48,142][01062] Avg episode reward: [(0, '4.380')] [2024-06-06 14:17:49,421][03204] Updated weights for policy 0, policy_version 100 (0.0025) [2024-06-06 14:17:53,141][01062] Fps is (10 sec: 3276.1, 60 sec: 3549.7, 300 sec: 2881.3). Total num frames: 417792. Throughput: 0: 919.5. Samples: 104136. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:17:53,143][01062] Avg episode reward: [(0, '4.462')] [2024-06-06 14:17:58,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 2921.8). Total num frames: 438272. Throughput: 0: 947.9. Samples: 109056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:17:58,141][01062] Avg episode reward: [(0, '4.451')] [2024-06-06 14:18:00,560][03204] Updated weights for policy 0, policy_version 110 (0.0014) [2024-06-06 14:18:03,138][01062] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 2959.7). Total num frames: 458752. Throughput: 0: 970.1. Samples: 115792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:18:03,142][01062] Avg episode reward: [(0, '4.736')] [2024-06-06 14:18:03,149][03191] Saving new best policy, reward=4.736! [2024-06-06 14:18:08,140][01062] Fps is (10 sec: 4095.3, 60 sec: 3823.0, 300 sec: 2995.2). Total num frames: 479232. Throughput: 0: 967.2. Samples: 119096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:18:08,143][01062] Avg episode reward: [(0, '4.829')] [2024-06-06 14:18:08,150][03191] Saving new best policy, reward=4.829! [2024-06-06 14:18:12,072][03204] Updated weights for policy 0, policy_version 120 (0.0036) [2024-06-06 14:18:13,139][01062] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 2978.9). Total num frames: 491520. Throughput: 0: 930.8. Samples: 123432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:18:13,143][01062] Avg episode reward: [(0, '4.837')] [2024-06-06 14:18:13,150][03191] Saving new best policy, reward=4.837! [2024-06-06 14:18:18,138][01062] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3011.8). Total num frames: 512000. Throughput: 0: 948.1. Samples: 128824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:18:18,141][01062] Avg episode reward: [(0, '4.952')] [2024-06-06 14:18:18,156][03191] Saving new best policy, reward=4.952! 
[2024-06-06 14:18:22,748][03204] Updated weights for policy 0, policy_version 130 (0.0017) [2024-06-06 14:18:23,138][01062] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3042.7). Total num frames: 532480. Throughput: 0: 950.8. Samples: 131696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:18:23,141][01062] Avg episode reward: [(0, '4.646')] [2024-06-06 14:18:28,139][01062] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3072.0). Total num frames: 552960. Throughput: 0: 937.5. Samples: 138084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:18:28,147][01062] Avg episode reward: [(0, '4.636')] [2024-06-06 14:18:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3055.4). Total num frames: 565248. Throughput: 0: 901.2. Samples: 142380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:18:33,141][01062] Avg episode reward: [(0, '4.697')] [2024-06-06 14:18:35,059][03204] Updated weights for policy 0, policy_version 140 (0.0014) [2024-06-06 14:18:38,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3082.8). Total num frames: 585728. Throughput: 0: 902.5. Samples: 144748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:18:38,141][01062] Avg episode reward: [(0, '4.681')] [2024-06-06 14:18:43,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3108.8). Total num frames: 606208. Throughput: 0: 942.4. Samples: 151464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:18:43,144][01062] Avg episode reward: [(0, '4.927')] [2024-06-06 14:18:44,389][03204] Updated weights for policy 0, policy_version 150 (0.0014) [2024-06-06 14:18:48,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3133.4). Total num frames: 626688. Throughput: 0: 932.7. Samples: 157764. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:18:48,144][01062] Avg episode reward: [(0, '4.908')] [2024-06-06 14:18:53,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3136.9). 
Total num frames: 643072. Throughput: 0: 907.9. Samples: 159948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:18:53,141][01062] Avg episode reward: [(0, '5.032')] [2024-06-06 14:18:53,146][03191] Saving new best policy, reward=5.032! [2024-06-06 14:18:56,770][03204] Updated weights for policy 0, policy_version 160 (0.0028) [2024-06-06 14:18:58,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3140.3). Total num frames: 659456. Throughput: 0: 913.5. Samples: 164540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:18:58,144][01062] Avg episode reward: [(0, '4.678')] [2024-06-06 14:18:58,160][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000161_659456.pth... [2024-06-06 14:19:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3162.5). Total num frames: 679936. Throughput: 0: 940.4. Samples: 171144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:19:03,140][01062] Avg episode reward: [(0, '4.798')] [2024-06-06 14:19:05,836][03204] Updated weights for policy 0, policy_version 170 (0.0017) [2024-06-06 14:19:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3183.7). Total num frames: 700416. Throughput: 0: 955.0. Samples: 174672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:19:08,141][01062] Avg episode reward: [(0, '5.174')] [2024-06-06 14:19:08,156][03191] Saving new best policy, reward=5.174! [2024-06-06 14:19:13,140][01062] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3185.8). Total num frames: 716800. Throughput: 0: 913.8. Samples: 179208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:19:13,142][01062] Avg episode reward: [(0, '5.269')] [2024-06-06 14:19:13,147][03191] Saving new best policy, reward=5.269! [2024-06-06 14:19:18,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3187.8). Total num frames: 733184. Throughput: 0: 937.5. Samples: 184568. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:19:18,141][01062] Avg episode reward: [(0, '5.044')] [2024-06-06 14:19:18,173][03204] Updated weights for policy 0, policy_version 180 (0.0017) [2024-06-06 14:19:23,139][01062] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3224.5). Total num frames: 757760. Throughput: 0: 963.0. Samples: 188084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:19:23,141][01062] Avg episode reward: [(0, '5.440')] [2024-06-06 14:19:23,144][03191] Saving new best policy, reward=5.440! [2024-06-06 14:19:27,737][03204] Updated weights for policy 0, policy_version 190 (0.0033) [2024-06-06 14:19:28,139][01062] Fps is (10 sec: 4505.3, 60 sec: 3754.6, 300 sec: 3242.7). Total num frames: 778240. Throughput: 0: 959.5. Samples: 194644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:19:28,141][01062] Avg episode reward: [(0, '6.046')] [2024-06-06 14:19:28,158][03191] Saving new best policy, reward=6.046! [2024-06-06 14:19:33,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3243.4). Total num frames: 794624. Throughput: 0: 918.3. Samples: 199088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:19:33,142][01062] Avg episode reward: [(0, '5.971')] [2024-06-06 14:19:38,138][01062] Fps is (10 sec: 3277.0, 60 sec: 3754.7, 300 sec: 3244.0). Total num frames: 811008. Throughput: 0: 923.8. Samples: 201520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:19:38,141][01062] Avg episode reward: [(0, '5.575')] [2024-06-06 14:19:39,554][03204] Updated weights for policy 0, policy_version 200 (0.0029) [2024-06-06 14:19:43,140][01062] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3276.8). Total num frames: 835584. Throughput: 0: 965.9. Samples: 208008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:19:43,143][01062] Avg episode reward: [(0, '5.472')] [2024-06-06 14:19:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3292.6). Total num frames: 856064. 
Throughput: 0: 971.4. Samples: 214856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:19:48,142][01062] Avg episode reward: [(0, '5.546')] [2024-06-06 14:19:48,882][03204] Updated weights for policy 0, policy_version 210 (0.0025) [2024-06-06 14:19:53,139][01062] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3292.3). Total num frames: 872448. Throughput: 0: 942.9. Samples: 217104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:19:53,142][01062] Avg episode reward: [(0, '5.513')] [2024-06-06 14:19:58,140][01062] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3292.0). Total num frames: 888832. Throughput: 0: 943.4. Samples: 221660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:19:58,143][01062] Avg episode reward: [(0, '5.777')] [2024-06-06 14:20:00,674][03204] Updated weights for policy 0, policy_version 220 (0.0015) [2024-06-06 14:20:03,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3306.6). Total num frames: 909312. Throughput: 0: 974.9. Samples: 228440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:20:03,141][01062] Avg episode reward: [(0, '5.791')] [2024-06-06 14:20:08,138][01062] Fps is (10 sec: 4506.3, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 933888. Throughput: 0: 974.7. Samples: 231944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:20:08,145][01062] Avg episode reward: [(0, '5.619')] [2024-06-06 14:20:10,329][03204] Updated weights for policy 0, policy_version 230 (0.0040) [2024-06-06 14:20:13,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3334.3). Total num frames: 950272. Throughput: 0: 947.7. Samples: 237292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:20:13,142][01062] Avg episode reward: [(0, '5.875')] [2024-06-06 14:20:18,140][01062] Fps is (10 sec: 3276.2, 60 sec: 3891.1, 300 sec: 3333.3). Total num frames: 966656. Throughput: 0: 956.0. Samples: 242108. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:20:18,145][01062] Avg episode reward: [(0, '5.897')] [2024-06-06 14:20:21,337][03204] Updated weights for policy 0, policy_version 240 (0.0020) [2024-06-06 14:20:23,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 982.4. Samples: 245728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:20:23,141][01062] Avg episode reward: [(0, '6.586')] [2024-06-06 14:20:23,147][03191] Saving new best policy, reward=6.586! [2024-06-06 14:20:28,138][01062] Fps is (10 sec: 4506.5, 60 sec: 3891.2, 300 sec: 3429.5). Total num frames: 1011712. Throughput: 0: 992.4. Samples: 252664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:20:28,143][01062] Avg episode reward: [(0, '7.013')] [2024-06-06 14:20:28,153][03191] Saving new best policy, reward=7.013! [2024-06-06 14:20:32,014][03204] Updated weights for policy 0, policy_version 250 (0.0017) [2024-06-06 14:20:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 948.7. Samples: 257548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:20:33,145][01062] Avg episode reward: [(0, '7.335')] [2024-06-06 14:20:33,146][03191] Saving new best policy, reward=7.335! [2024-06-06 14:20:38,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3526.7). Total num frames: 1040384. Throughput: 0: 949.7. Samples: 259840. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:20:38,146][01062] Avg episode reward: [(0, '7.087')] [2024-06-06 14:20:42,626][03204] Updated weights for policy 0, policy_version 260 (0.0027) [2024-06-06 14:20:43,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3610.0). Total num frames: 1064960. Throughput: 0: 985.4. Samples: 266000. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:20:43,144][01062] Avg episode reward: [(0, '6.606')] [2024-06-06 14:20:48,139][01062] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 988.5. Samples: 272924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:20:48,145][01062] Avg episode reward: [(0, '6.679')] [2024-06-06 14:20:53,142][01062] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3679.4). Total num frames: 1101824. Throughput: 0: 962.9. Samples: 275276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:20:53,148][01062] Avg episode reward: [(0, '6.956')] [2024-06-06 14:20:53,477][03204] Updated weights for policy 0, policy_version 270 (0.0023) [2024-06-06 14:20:58,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3823.0, 300 sec: 3679.5). Total num frames: 1118208. Throughput: 0: 943.4. Samples: 279744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:20:58,141][01062] Avg episode reward: [(0, '7.375')] [2024-06-06 14:20:58,150][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth... [2024-06-06 14:20:58,267][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000055_225280.pth [2024-06-06 14:20:58,287][03191] Saving new best policy, reward=7.375! [2024-06-06 14:21:03,138][01062] Fps is (10 sec: 4097.4, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1142784. Throughput: 0: 974.8. Samples: 285972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:21:03,144][01062] Avg episode reward: [(0, '7.984')] [2024-06-06 14:21:03,145][03191] Saving new best policy, reward=7.984! [2024-06-06 14:21:04,187][03204] Updated weights for policy 0, policy_version 280 (0.0014) [2024-06-06 14:21:08,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1163264. Throughput: 0: 972.2. Samples: 289476. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:21:08,141][01062] Avg episode reward: [(0, '8.775')] [2024-06-06 14:21:08,148][03191] Saving new best policy, reward=8.775! [2024-06-06 14:21:13,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3693.4). Total num frames: 1179648. Throughput: 0: 938.8. Samples: 294912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:21:13,145][01062] Avg episode reward: [(0, '8.699')] [2024-06-06 14:21:15,737][03204] Updated weights for policy 0, policy_version 290 (0.0020) [2024-06-06 14:21:18,139][01062] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 3679.5). Total num frames: 1191936. Throughput: 0: 933.2. Samples: 299540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:21:18,149][01062] Avg episode reward: [(0, '8.852')] [2024-06-06 14:21:18,267][03191] Saving new best policy, reward=8.852! [2024-06-06 14:21:23,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1220608. Throughput: 0: 955.7. Samples: 302848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:21:23,144][01062] Avg episode reward: [(0, '8.610')] [2024-06-06 14:21:25,159][03204] Updated weights for policy 0, policy_version 300 (0.0019) [2024-06-06 14:21:28,143][01062] Fps is (10 sec: 4913.1, 60 sec: 3822.6, 300 sec: 3721.1). Total num frames: 1241088. Throughput: 0: 970.5. Samples: 309676. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:21:28,145][01062] Avg episode reward: [(0, '9.365')] [2024-06-06 14:21:28,157][03191] Saving new best policy, reward=9.365! [2024-06-06 14:21:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1257472. Throughput: 0: 928.5. Samples: 314708. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:21:33,141][01062] Avg episode reward: [(0, '8.847')] [2024-06-06 14:21:37,191][03204] Updated weights for policy 0, policy_version 310 (0.0020) [2024-06-06 14:21:38,138][01062] Fps is (10 sec: 2868.5, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1269760. Throughput: 0: 927.5. Samples: 317012. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:21:38,142][01062] Avg episode reward: [(0, '9.023')] [2024-06-06 14:21:43,141][01062] Fps is (10 sec: 3685.5, 60 sec: 3822.8, 300 sec: 3762.8). Total num frames: 1294336. Throughput: 0: 962.7. Samples: 323068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:21:43,142][01062] Avg episode reward: [(0, '8.614')] [2024-06-06 14:21:46,296][03204] Updated weights for policy 0, policy_version 320 (0.0018) [2024-06-06 14:21:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1314816. Throughput: 0: 978.8. Samples: 330020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:21:48,142][01062] Avg episode reward: [(0, '9.085')] [2024-06-06 14:21:53,140][01062] Fps is (10 sec: 3277.0, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 1327104. Throughput: 0: 943.4. Samples: 331932. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:21:53,145][01062] Avg episode reward: [(0, '8.994')] [2024-06-06 14:21:58,141][01062] Fps is (10 sec: 2457.0, 60 sec: 3686.2, 300 sec: 3735.0). Total num frames: 1339392. Throughput: 0: 904.9. Samples: 335636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:21:58,143][01062] Avg episode reward: [(0, '9.775')] [2024-06-06 14:21:58,154][03191] Saving new best policy, reward=9.775! [2024-06-06 14:22:02,099][03204] Updated weights for policy 0, policy_version 330 (0.0021) [2024-06-06 14:22:03,138][01062] Fps is (10 sec: 2458.1, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 1351680. Throughput: 0: 878.6. Samples: 339076. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:22:03,144][01062] Avg episode reward: [(0, '9.998')] [2024-06-06 14:22:03,148][03191] Saving new best policy, reward=9.998! [2024-06-06 14:22:08,139][01062] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 1376256. Throughput: 0: 873.7. Samples: 342164. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:22:08,146][01062] Avg episode reward: [(0, '10.185')] [2024-06-06 14:22:08,158][03191] Saving new best policy, reward=10.185! [2024-06-06 14:22:11,558][03204] Updated weights for policy 0, policy_version 340 (0.0027) [2024-06-06 14:22:13,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1396736. Throughput: 0: 871.8. Samples: 348904. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:22:13,149][01062] Avg episode reward: [(0, '10.488')] [2024-06-06 14:22:13,153][03191] Saving new best policy, reward=10.488! [2024-06-06 14:22:18,140][01062] Fps is (10 sec: 3686.0, 60 sec: 3686.3, 300 sec: 3748.9). Total num frames: 1413120. Throughput: 0: 874.9. Samples: 354080. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:22:18,144][01062] Avg episode reward: [(0, '10.401')] [2024-06-06 14:22:23,141][01062] Fps is (10 sec: 3276.0, 60 sec: 3481.5, 300 sec: 3748.9). Total num frames: 1429504. Throughput: 0: 875.1. Samples: 356392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:22:23,143][01062] Avg episode reward: [(0, '10.562')] [2024-06-06 14:22:23,146][03191] Saving new best policy, reward=10.562! [2024-06-06 14:22:24,044][03204] Updated weights for policy 0, policy_version 350 (0.0025) [2024-06-06 14:22:28,138][01062] Fps is (10 sec: 3686.8, 60 sec: 3481.9, 300 sec: 3762.8). Total num frames: 1449984. Throughput: 0: 871.0. Samples: 362260. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:22:28,141][01062] Avg episode reward: [(0, '11.011')] [2024-06-06 14:22:28,152][03191] Saving new best policy, reward=11.011! [2024-06-06 14:22:33,138][01062] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 1470464. Throughput: 0: 865.6. Samples: 368972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:22:33,147][01062] Avg episode reward: [(0, '10.808')] [2024-06-06 14:22:33,162][03204] Updated weights for policy 0, policy_version 360 (0.0016) [2024-06-06 14:22:38,140][01062] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3735.0). Total num frames: 1486848. Throughput: 0: 884.2. Samples: 371720. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:22:38,143][01062] Avg episode reward: [(0, '10.526')] [2024-06-06 14:22:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3735.0). Total num frames: 1503232. Throughput: 0: 904.8. Samples: 376348. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-06-06 14:22:43,146][01062] Avg episode reward: [(0, '11.324')] [2024-06-06 14:22:43,153][03191] Saving new best policy, reward=11.324! [2024-06-06 14:22:45,313][03204] Updated weights for policy 0, policy_version 370 (0.0024) [2024-06-06 14:22:48,138][01062] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 1527808. Throughput: 0: 963.6. Samples: 382436. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2024-06-06 14:22:48,141][01062] Avg episode reward: [(0, '11.287')] [2024-06-06 14:22:53,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 1548288. Throughput: 0: 974.0. Samples: 385992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:22:53,140][01062] Avg episode reward: [(0, '10.889')] [2024-06-06 14:22:54,099][03204] Updated weights for policy 0, policy_version 380 (0.0029) [2024-06-06 14:22:58,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3762.8). 
Total num frames: 1568768. Throughput: 0: 959.7. Samples: 392092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:22:58,141][01062] Avg episode reward: [(0, '11.199')] [2024-06-06 14:22:58,151][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000383_1568768.pth... [2024-06-06 14:22:58,319][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000161_659456.pth [2024-06-06 14:23:03,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1581056. Throughput: 0: 943.9. Samples: 396556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:23:03,148][01062] Avg episode reward: [(0, '11.385')] [2024-06-06 14:23:03,155][03191] Saving new best policy, reward=11.385! [2024-06-06 14:23:06,336][03204] Updated weights for policy 0, policy_version 390 (0.0024) [2024-06-06 14:23:08,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1601536. Throughput: 0: 953.2. Samples: 399284. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-06-06 14:23:08,152][01062] Avg episode reward: [(0, '11.939')] [2024-06-06 14:23:08,227][03191] Saving new best policy, reward=11.939! [2024-06-06 14:23:13,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1626112. Throughput: 0: 973.7. Samples: 406076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:23:13,144][01062] Avg episode reward: [(0, '13.326')] [2024-06-06 14:23:13,150][03191] Saving new best policy, reward=13.326! [2024-06-06 14:23:16,002][03204] Updated weights for policy 0, policy_version 400 (0.0013) [2024-06-06 14:23:18,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 1642496. Throughput: 0: 951.5. Samples: 411788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:23:18,146][01062] Avg episode reward: [(0, '13.823')] [2024-06-06 14:23:18,161][03191] Saving new best policy, reward=13.823! 
[2024-06-06 14:23:23,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3748.9). Total num frames: 1658880. Throughput: 0: 939.9. Samples: 414016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:23:23,148][01062] Avg episode reward: [(0, '12.718')] [2024-06-06 14:23:27,784][03204] Updated weights for policy 0, policy_version 410 (0.0025) [2024-06-06 14:23:28,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 1679360. Throughput: 0: 954.5. Samples: 419300. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:23:28,141][01062] Avg episode reward: [(0, '12.704')] [2024-06-06 14:23:33,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1699840. Throughput: 0: 964.2. Samples: 425824. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2024-06-06 14:23:33,145][01062] Avg episode reward: [(0, '12.302')] [2024-06-06 14:23:38,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 1716224. Throughput: 0: 954.3. Samples: 428936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:23:38,145][01062] Avg episode reward: [(0, '12.086')] [2024-06-06 14:23:38,700][03204] Updated weights for policy 0, policy_version 420 (0.0015) [2024-06-06 14:23:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1732608. Throughput: 0: 917.9. Samples: 433396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:23:43,141][01062] Avg episode reward: [(0, '12.367')] [2024-06-06 14:23:48,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1753088. Throughput: 0: 945.1. Samples: 439084. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:23:48,141][01062] Avg episode reward: [(0, '13.654')] [2024-06-06 14:23:49,668][03204] Updated weights for policy 0, policy_version 430 (0.0032) [2024-06-06 14:23:53,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). 
Total num frames: 1777664. Throughput: 0: 960.9. Samples: 442524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:23:53,141][01062] Avg episode reward: [(0, '15.404')] [2024-06-06 14:23:53,143][03191] Saving new best policy, reward=15.404! [2024-06-06 14:23:58,145][01062] Fps is (10 sec: 4093.3, 60 sec: 3754.3, 300 sec: 3776.6). Total num frames: 1794048. Throughput: 0: 945.2. Samples: 448616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:23:58,147][01062] Avg episode reward: [(0, '15.927')] [2024-06-06 14:23:58,159][03191] Saving new best policy, reward=15.927! [2024-06-06 14:24:00,775][03204] Updated weights for policy 0, policy_version 440 (0.0033) [2024-06-06 14:24:03,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1806336. Throughput: 0: 912.1. Samples: 452832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:24:03,146][01062] Avg episode reward: [(0, '16.395')] [2024-06-06 14:24:03,150][03191] Saving new best policy, reward=16.395! [2024-06-06 14:24:08,138][01062] Fps is (10 sec: 3278.9, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1826816. Throughput: 0: 913.5. Samples: 455124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:24:08,141][01062] Avg episode reward: [(0, '15.661')] [2024-06-06 14:24:11,280][03204] Updated weights for policy 0, policy_version 450 (0.0028) [2024-06-06 14:24:13,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1847296. Throughput: 0: 945.5. Samples: 461848. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:24:13,140][01062] Avg episode reward: [(0, '15.786')] [2024-06-06 14:24:18,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1867776. Throughput: 0: 936.2. Samples: 467952. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:24:18,141][01062] Avg episode reward: [(0, '16.091')] [2024-06-06 14:24:23,141][01062] Fps is (10 sec: 3276.0, 60 sec: 3686.3, 300 sec: 3735.0). Total num frames: 1880064. Throughput: 0: 917.2. Samples: 470212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:24:23,143][01062] Avg episode reward: [(0, '16.049')] [2024-06-06 14:24:23,202][03204] Updated weights for policy 0, policy_version 460 (0.0033) [2024-06-06 14:24:28,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1904640. Throughput: 0: 931.7. Samples: 475324. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-06-06 14:24:28,141][01062] Avg episode reward: [(0, '16.104')] [2024-06-06 14:24:32,776][03204] Updated weights for policy 0, policy_version 470 (0.0016) [2024-06-06 14:24:33,138][01062] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1925120. Throughput: 0: 954.9. Samples: 482056. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:24:33,141][01062] Avg episode reward: [(0, '18.296')] [2024-06-06 14:24:33,145][03191] Saving new best policy, reward=18.296! [2024-06-06 14:24:38,142][01062] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3748.9). Total num frames: 1941504. Throughput: 0: 951.4. Samples: 485340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:24:38,149][01062] Avg episode reward: [(0, '17.789')] [2024-06-06 14:24:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1957888. Throughput: 0: 915.2. Samples: 489796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:24:43,142][01062] Avg episode reward: [(0, '17.496')] [2024-06-06 14:24:45,061][03204] Updated weights for policy 0, policy_version 480 (0.0023) [2024-06-06 14:24:48,138][01062] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1978368. Throughput: 0: 939.3. Samples: 495100. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:24:48,143][01062] Avg episode reward: [(0, '16.777')] [2024-06-06 14:24:53,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2002944. Throughput: 0: 964.8. Samples: 498540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-06-06 14:24:53,141][01062] Avg episode reward: [(0, '17.355')] [2024-06-06 14:24:54,168][03204] Updated weights for policy 0, policy_version 490 (0.0021) [2024-06-06 14:24:58,142][01062] Fps is (10 sec: 4094.7, 60 sec: 3754.9, 300 sec: 3762.7). Total num frames: 2019328. Throughput: 0: 962.3. Samples: 505156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:24:58,144][01062] Avg episode reward: [(0, '17.685')] [2024-06-06 14:24:58,157][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth... [2024-06-06 14:24:58,348][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000273_1118208.pth [2024-06-06 14:25:03,141][01062] Fps is (10 sec: 3275.9, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 2035712. Throughput: 0: 924.1. Samples: 509540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:25:03,144][01062] Avg episode reward: [(0, '18.050')] [2024-06-06 14:25:06,887][03204] Updated weights for policy 0, policy_version 500 (0.0021) [2024-06-06 14:25:08,138][01062] Fps is (10 sec: 3277.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2052096. Throughput: 0: 923.8. Samples: 511780. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2024-06-06 14:25:08,141][01062] Avg episode reward: [(0, '19.307')] [2024-06-06 14:25:08,153][03191] Saving new best policy, reward=19.307! [2024-06-06 14:25:13,139][01062] Fps is (10 sec: 4097.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2076672. Throughput: 0: 953.6. Samples: 518236. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2024-06-06 14:25:13,141][01062] Avg episode reward: [(0, '18.955')] [2024-06-06 14:25:15,971][03204] Updated weights for policy 0, policy_version 510 (0.0016) [2024-06-06 14:25:18,139][01062] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 2093056. Throughput: 0: 942.3. Samples: 524460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:25:18,143][01062] Avg episode reward: [(0, '19.265')] [2024-06-06 14:25:23,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3721.1). Total num frames: 2109440. Throughput: 0: 918.4. Samples: 526664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:25:23,151][01062] Avg episode reward: [(0, '18.438')] [2024-06-06 14:25:28,138][01062] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2125824. Throughput: 0: 922.1. Samples: 531292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:25:28,144][01062] Avg episode reward: [(0, '18.850')] [2024-06-06 14:25:28,273][03204] Updated weights for policy 0, policy_version 520 (0.0017) [2024-06-06 14:25:33,138][01062] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2150400. Throughput: 0: 950.3. Samples: 537864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:25:33,151][01062] Avg episode reward: [(0, '18.175')] [2024-06-06 14:25:38,143][01062] Fps is (10 sec: 4094.2, 60 sec: 3754.6, 300 sec: 3734.9). Total num frames: 2166784. Throughput: 0: 948.3. Samples: 541216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:25:38,148][01062] Avg episode reward: [(0, '17.803')] [2024-06-06 14:25:38,614][03204] Updated weights for policy 0, policy_version 530 (0.0018) [2024-06-06 14:25:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2183168. Throughput: 0: 907.1. Samples: 545972. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2024-06-06 14:25:43,148][01062] Avg episode reward: [(0, '17.692')] [2024-06-06 14:25:48,138][01062] Fps is (10 sec: 3278.2, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 2199552. Throughput: 0: 919.3. Samples: 550904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:25:48,144][01062] Avg episode reward: [(0, '16.980')] [2024-06-06 14:25:50,168][03204] Updated weights for policy 0, policy_version 540 (0.0030) [2024-06-06 14:25:53,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2224128. Throughput: 0: 946.9. Samples: 554392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:25:53,146][01062] Avg episode reward: [(0, '17.171')] [2024-06-06 14:25:58,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.9, 300 sec: 3735.0). Total num frames: 2244608. Throughput: 0: 954.4. Samples: 561184. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:25:58,143][01062] Avg episode reward: [(0, '17.859')] [2024-06-06 14:26:00,186][03204] Updated weights for policy 0, policy_version 550 (0.0013) [2024-06-06 14:26:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 2260992. Throughput: 0: 915.8. Samples: 565672. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:26:03,142][01062] Avg episode reward: [(0, '18.680')] [2024-06-06 14:26:08,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2273280. Throughput: 0: 916.9. Samples: 567924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:26:08,140][01062] Avg episode reward: [(0, '18.439')] [2024-06-06 14:26:11,533][03204] Updated weights for policy 0, policy_version 560 (0.0023) [2024-06-06 14:26:13,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2297856. Throughput: 0: 953.2. Samples: 574184. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:26:13,141][01062] Avg episode reward: [(0, '18.135')] [2024-06-06 14:26:18,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2318336. Throughput: 0: 959.3. Samples: 581032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:26:18,143][01062] Avg episode reward: [(0, '19.209')] [2024-06-06 14:26:22,657][03204] Updated weights for policy 0, policy_version 570 (0.0040) [2024-06-06 14:26:23,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 2334720. Throughput: 0: 935.4. Samples: 583304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:26:23,142][01062] Avg episode reward: [(0, '19.396')] [2024-06-06 14:26:23,147][03191] Saving new best policy, reward=19.396! [2024-06-06 14:26:28,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2351104. Throughput: 0: 927.7. Samples: 587720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:26:28,143][01062] Avg episode reward: [(0, '19.639')] [2024-06-06 14:26:28,156][03191] Saving new best policy, reward=19.639! [2024-06-06 14:26:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 2371584. Throughput: 0: 956.8. Samples: 593960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:26:33,144][01062] Avg episode reward: [(0, '19.612')] [2024-06-06 14:26:33,542][03204] Updated weights for policy 0, policy_version 580 (0.0016) [2024-06-06 14:26:38,144][01062] Fps is (10 sec: 4093.7, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 2392064. Throughput: 0: 951.9. Samples: 597232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-06-06 14:26:38,147][01062] Avg episode reward: [(0, '18.191')] [2024-06-06 14:26:43,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2404352. Throughput: 0: 908.4. Samples: 602064. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:26:43,146][01062] Avg episode reward: [(0, '18.162')] [2024-06-06 14:26:46,382][03204] Updated weights for policy 0, policy_version 590 (0.0014) [2024-06-06 14:26:48,139][01062] Fps is (10 sec: 2458.8, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 2416640. Throughput: 0: 889.7. Samples: 605708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:26:48,149][01062] Avg episode reward: [(0, '16.772')] [2024-06-06 14:26:53,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3707.3). Total num frames: 2433024. Throughput: 0: 881.4. Samples: 607588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:26:53,145][01062] Avg episode reward: [(0, '16.765')] [2024-06-06 14:26:58,139][01062] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 2453504. Throughput: 0: 866.3. Samples: 613168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-06-06 14:26:58,141][01062] Avg episode reward: [(0, '16.780')] [2024-06-06 14:26:58,153][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000599_2453504.pth... [2024-06-06 14:26:58,277][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000383_1568768.pth [2024-06-06 14:26:58,494][03204] Updated weights for policy 0, policy_version 600 (0.0045) [2024-06-06 14:27:03,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2473984. Throughput: 0: 855.3. Samples: 619520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:27:03,145][01062] Avg episode reward: [(0, '17.099')] [2024-06-06 14:27:08,141][01062] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3707.2). Total num frames: 2490368. Throughput: 0: 853.8. Samples: 621728. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-06-06 14:27:08,145][01062] Avg episode reward: [(0, '18.185')] [2024-06-06 14:27:10,629][03204] Updated weights for policy 0, policy_version 610 (0.0036) [2024-06-06 14:27:13,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 2506752. Throughput: 0: 855.1. Samples: 626200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-06-06 14:27:13,141][01062] Avg episode reward: [(0, '18.622')] [2024-06-06 14:27:18,138][01062] Fps is (10 sec: 3687.3, 60 sec: 3481.6, 300 sec: 3721.1). Total num frames: 2527232. Throughput: 0: 860.9. Samples: 632700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:27:18,140][01062] Avg episode reward: [(0, '18.443')] [2024-06-06 14:27:20,149][03204] Updated weights for policy 0, policy_version 620 (0.0019) [2024-06-06 14:27:23,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2547712. Throughput: 0: 863.0. Samples: 636060. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:27:23,155][01062] Avg episode reward: [(0, '18.576')] [2024-06-06 14:27:28,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 2564096. Throughput: 0: 873.5. Samples: 641372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:27:28,143][01062] Avg episode reward: [(0, '19.483')] [2024-06-06 14:27:32,700][03204] Updated weights for policy 0, policy_version 630 (0.0038) [2024-06-06 14:27:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 2580480. Throughput: 0: 894.7. Samples: 645968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:27:33,142][01062] Avg episode reward: [(0, '18.771')] [2024-06-06 14:27:38,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3550.2, 300 sec: 3735.0). Total num frames: 2605056. Throughput: 0: 926.8. Samples: 649292. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:27:38,140][01062] Avg episode reward: [(0, '20.714')] [2024-06-06 14:27:38,153][03191] Saving new best policy, reward=20.714! [2024-06-06 14:27:41,542][03204] Updated weights for policy 0, policy_version 640 (0.0013) [2024-06-06 14:27:43,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2625536. Throughput: 0: 954.0. Samples: 656100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:27:43,145][01062] Avg episode reward: [(0, '20.295')] [2024-06-06 14:27:48,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2641920. Throughput: 0: 921.5. Samples: 660988. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:27:48,144][01062] Avg episode reward: [(0, '20.871')] [2024-06-06 14:27:48,165][03191] Saving new best policy, reward=20.871! [2024-06-06 14:27:53,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2654208. Throughput: 0: 919.6. Samples: 663108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:27:53,145][01062] Avg episode reward: [(0, '19.491')] [2024-06-06 14:27:54,251][03204] Updated weights for policy 0, policy_version 650 (0.0016) [2024-06-06 14:27:58,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2674688. Throughput: 0: 950.1. Samples: 668956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:27:58,141][01062] Avg episode reward: [(0, '20.176')] [2024-06-06 14:28:03,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2699264. Throughput: 0: 955.3. Samples: 675688. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:28:03,141][01062] Avg episode reward: [(0, '19.620')] [2024-06-06 14:28:03,980][03204] Updated weights for policy 0, policy_version 660 (0.0027) [2024-06-06 14:28:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3693.3). 
Total num frames: 2715648. Throughput: 0: 936.0. Samples: 678180. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:28:08,140][01062] Avg episode reward: [(0, '19.397')] [2024-06-06 14:28:13,139][01062] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2732032. Throughput: 0: 918.1. Samples: 682688. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:28:13,146][01062] Avg episode reward: [(0, '19.502')] [2024-06-06 14:28:16,114][03204] Updated weights for policy 0, policy_version 670 (0.0026) [2024-06-06 14:28:18,141][01062] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3707.2). Total num frames: 2752512. Throughput: 0: 954.4. Samples: 688920. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:28:18,152][01062] Avg episode reward: [(0, '19.841')] [2024-06-06 14:28:23,138][01062] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2772992. Throughput: 0: 957.6. Samples: 692384. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:28:23,141][01062] Avg episode reward: [(0, '19.592')] [2024-06-06 14:28:25,902][03204] Updated weights for policy 0, policy_version 680 (0.0019) [2024-06-06 14:28:28,143][01062] Fps is (10 sec: 3685.7, 60 sec: 3754.4, 300 sec: 3693.3). Total num frames: 2789376. Throughput: 0: 931.4. Samples: 698016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:28:28,145][01062] Avg episode reward: [(0, '19.944')] [2024-06-06 14:28:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2805760. Throughput: 0: 922.6. Samples: 702504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:28:33,146][01062] Avg episode reward: [(0, '20.631')] [2024-06-06 14:28:37,639][03204] Updated weights for policy 0, policy_version 690 (0.0019) [2024-06-06 14:28:38,138][01062] Fps is (10 sec: 4097.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2830336. Throughput: 0: 941.0. Samples: 705452. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:28:38,146][01062] Avg episode reward: [(0, '21.620')] [2024-06-06 14:28:38,157][03191] Saving new best policy, reward=21.620! [2024-06-06 14:28:43,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2850816. Throughput: 0: 961.4. Samples: 712220. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:28:43,140][01062] Avg episode reward: [(0, '20.966')] [2024-06-06 14:28:47,177][03204] Updated weights for policy 0, policy_version 700 (0.0018) [2024-06-06 14:28:48,142][01062] Fps is (10 sec: 3685.1, 60 sec: 3754.4, 300 sec: 3693.3). Total num frames: 2867200. Throughput: 0: 932.5. Samples: 717656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:28:48,145][01062] Avg episode reward: [(0, '20.508')] [2024-06-06 14:28:53,141][01062] Fps is (10 sec: 3276.0, 60 sec: 3822.8, 300 sec: 3693.4). Total num frames: 2883584. Throughput: 0: 927.2. Samples: 719904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:28:53,143][01062] Avg episode reward: [(0, '20.375')] [2024-06-06 14:28:58,139][01062] Fps is (10 sec: 3687.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2904064. Throughput: 0: 951.8. Samples: 725520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:28:58,144][01062] Avg episode reward: [(0, '18.262')] [2024-06-06 14:28:58,155][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000709_2904064.pth... [2024-06-06 14:28:58,278][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth [2024-06-06 14:28:59,015][03204] Updated weights for policy 0, policy_version 710 (0.0022) [2024-06-06 14:29:03,143][01062] Fps is (10 sec: 4095.0, 60 sec: 3754.4, 300 sec: 3721.1). Total num frames: 2924544. Throughput: 0: 965.5. Samples: 732368. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:29:03,148][01062] Avg episode reward: [(0, '18.459')] [2024-06-06 14:29:08,139][01062] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2940928. Throughput: 0: 948.0. Samples: 735044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:29:08,143][01062] Avg episode reward: [(0, '18.768')] [2024-06-06 14:29:09,333][03204] Updated weights for policy 0, policy_version 720 (0.0020) [2024-06-06 14:29:13,138][01062] Fps is (10 sec: 3278.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2957312. Throughput: 0: 923.9. Samples: 739588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:29:13,144][01062] Avg episode reward: [(0, '19.583')] [2024-06-06 14:29:18,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 2977792. Throughput: 0: 953.9. Samples: 745428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:29:18,144][01062] Avg episode reward: [(0, '19.984')] [2024-06-06 14:29:20,271][03204] Updated weights for policy 0, policy_version 730 (0.0018) [2024-06-06 14:29:23,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3002368. Throughput: 0: 965.7. Samples: 748908. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:29:23,141][01062] Avg episode reward: [(0, '21.977')] [2024-06-06 14:29:23,144][03191] Saving new best policy, reward=21.977! [2024-06-06 14:29:28,139][01062] Fps is (10 sec: 4095.8, 60 sec: 3823.2, 300 sec: 3707.2). Total num frames: 3018752. Throughput: 0: 948.3. Samples: 754892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:29:28,151][01062] Avg episode reward: [(0, '23.512')] [2024-06-06 14:29:28,161][03191] Saving new best policy, reward=23.512! [2024-06-06 14:29:31,729][03204] Updated weights for policy 0, policy_version 740 (0.0014) [2024-06-06 14:29:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.3). 
Total num frames: 3035136. Throughput: 0: 927.5. Samples: 759392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:29:33,141][01062] Avg episode reward: [(0, '23.463')] [2024-06-06 14:29:38,138][01062] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3051520. Throughput: 0: 933.3. Samples: 761900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:29:38,141][01062] Avg episode reward: [(0, '22.775')] [2024-06-06 14:29:41,556][03204] Updated weights for policy 0, policy_version 750 (0.0018) [2024-06-06 14:29:43,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3076096. Throughput: 0: 966.5. Samples: 769012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:29:43,144][01062] Avg episode reward: [(0, '23.050')] [2024-06-06 14:29:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3823.2, 300 sec: 3707.2). Total num frames: 3096576. Throughput: 0: 945.4. Samples: 774908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:29:48,144][01062] Avg episode reward: [(0, '21.764')] [2024-06-06 14:29:53,139][01062] Fps is (10 sec: 3276.7, 60 sec: 3754.8, 300 sec: 3693.4). Total num frames: 3108864. Throughput: 0: 936.9. Samples: 777204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:29:53,141][01062] Avg episode reward: [(0, '20.159')] [2024-06-06 14:29:53,336][03204] Updated weights for policy 0, policy_version 760 (0.0020) [2024-06-06 14:29:58,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 3133440. Throughput: 0: 955.6. Samples: 782588. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:29:58,141][01062] Avg episode reward: [(0, '20.834')] [2024-06-06 14:30:02,512][03204] Updated weights for policy 0, policy_version 770 (0.0014) [2024-06-06 14:30:03,138][01062] Fps is (10 sec: 4505.8, 60 sec: 3823.2, 300 sec: 3735.0). Total num frames: 3153920. Throughput: 0: 983.9. Samples: 789704. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:30:03,142][01062] Avg episode reward: [(0, '20.693')] [2024-06-06 14:30:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 3174400. Throughput: 0: 973.9. Samples: 792732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:30:08,142][01062] Avg episode reward: [(0, '21.257')] [2024-06-06 14:30:13,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3186688. Throughput: 0: 940.1. Samples: 797196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:30:13,142][01062] Avg episode reward: [(0, '22.433')] [2024-06-06 14:30:14,652][03204] Updated weights for policy 0, policy_version 780 (0.0048) [2024-06-06 14:30:18,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3207168. Throughput: 0: 959.8. Samples: 802584. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:30:18,149][01062] Avg episode reward: [(0, '23.303')] [2024-06-06 14:30:23,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3231744. Throughput: 0: 980.0. Samples: 806000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:30:23,143][01062] Avg episode reward: [(0, '23.025')] [2024-06-06 14:30:23,829][03204] Updated weights for policy 0, policy_version 790 (0.0013) [2024-06-06 14:30:28,140][01062] Fps is (10 sec: 4095.3, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3248128. Throughput: 0: 965.5. Samples: 812460. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:30:28,142][01062] Avg episode reward: [(0, '22.641')] [2024-06-06 14:30:33,140][01062] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 3264512. Throughput: 0: 932.7. Samples: 816880. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:30:33,144][01062] Avg episode reward: [(0, '22.827')] [2024-06-06 14:30:36,615][03204] Updated weights for policy 0, policy_version 800 (0.0030) [2024-06-06 14:30:38,138][01062] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3280896. Throughput: 0: 931.5. Samples: 819120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:30:38,141][01062] Avg episode reward: [(0, '21.787')] [2024-06-06 14:30:43,139][01062] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3305472. Throughput: 0: 965.2. Samples: 826024. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:30:43,145][01062] Avg episode reward: [(0, '20.264')] [2024-06-06 14:30:45,466][03204] Updated weights for policy 0, policy_version 810 (0.0021) [2024-06-06 14:30:48,141][01062] Fps is (10 sec: 4095.0, 60 sec: 3754.5, 300 sec: 3721.1). Total num frames: 3321856. Throughput: 0: 942.8. Samples: 832132. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:30:48,143][01062] Avg episode reward: [(0, '21.097')] [2024-06-06 14:30:53,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3338240. Throughput: 0: 925.0. Samples: 834356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:30:53,143][01062] Avg episode reward: [(0, '21.727')] [2024-06-06 14:30:58,088][03204] Updated weights for policy 0, policy_version 820 (0.0028) [2024-06-06 14:30:58,138][01062] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3358720. Throughput: 0: 931.6. Samples: 839116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:30:58,140][01062] Avg episode reward: [(0, '21.903')] [2024-06-06 14:30:58,154][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth... 
[2024-06-06 14:30:58,283][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000599_2453504.pth [2024-06-06 14:31:03,138][01062] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3379200. Throughput: 0: 960.3. Samples: 845796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:31:03,143][01062] Avg episode reward: [(0, '23.119')] [2024-06-06 14:31:07,863][03204] Updated weights for policy 0, policy_version 830 (0.0014) [2024-06-06 14:31:08,140][01062] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 3399680. Throughput: 0: 959.4. Samples: 849176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:31:08,144][01062] Avg episode reward: [(0, '23.000')] [2024-06-06 14:31:13,144][01062] Fps is (10 sec: 3275.0, 60 sec: 3754.3, 300 sec: 3707.2). Total num frames: 3411968. Throughput: 0: 917.9. Samples: 853768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:31:13,149][01062] Avg episode reward: [(0, '22.060')] [2024-06-06 14:31:18,138][01062] Fps is (10 sec: 3277.2, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3432448. Throughput: 0: 931.7. Samples: 858804. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:31:18,141][01062] Avg episode reward: [(0, '21.845')] [2024-06-06 14:31:19,482][03204] Updated weights for policy 0, policy_version 840 (0.0028) [2024-06-06 14:31:23,138][01062] Fps is (10 sec: 4508.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3457024. Throughput: 0: 957.5. Samples: 862208. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:31:23,140][01062] Avg episode reward: [(0, '20.799')] [2024-06-06 14:31:28,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 3473408. Throughput: 0: 953.6. Samples: 868936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:31:28,142][01062] Avg episode reward: [(0, '19.929')] [2024-06-06 14:31:29,780][03204] Updated weights for policy 0, policy_version 850 (0.0013) [2024-06-06 14:31:33,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3721.2). Total num frames: 3489792. Throughput: 0: 916.8. Samples: 873388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:31:33,141][01062] Avg episode reward: [(0, '20.564')] [2024-06-06 14:31:38,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3502080. Throughput: 0: 909.5. Samples: 875284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:31:38,148][01062] Avg episode reward: [(0, '21.687')] [2024-06-06 14:31:43,141][01062] Fps is (10 sec: 2457.0, 60 sec: 3481.5, 300 sec: 3721.1). Total num frames: 3514368. Throughput: 0: 893.1. Samples: 879308. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:31:43,144][01062] Avg episode reward: [(0, '21.494')] [2024-06-06 14:31:44,518][03204] Updated weights for policy 0, policy_version 860 (0.0014) [2024-06-06 14:31:48,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3735.0). Total num frames: 3534848. Throughput: 0: 864.7. Samples: 884708. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2024-06-06 14:31:48,141][01062] Avg episode reward: [(0, '21.781')] [2024-06-06 14:31:53,139][01062] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3551232. Throughput: 0: 845.3. Samples: 887212. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2024-06-06 14:31:53,143][01062] Avg episode reward: [(0, '21.727')] [2024-06-06 14:31:57,241][03204] Updated weights for policy 0, policy_version 870 (0.0030) [2024-06-06 14:31:58,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 3567616. Throughput: 0: 842.9. Samples: 891692. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:31:58,146][01062] Avg episode reward: [(0, '21.754')] [2024-06-06 14:32:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3721.1). Total num frames: 3588096. Throughput: 0: 865.5. Samples: 897752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:32:03,143][01062] Avg episode reward: [(0, '21.759')] [2024-06-06 14:32:06,106][03204] Updated weights for policy 0, policy_version 880 (0.0014) [2024-06-06 14:32:08,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3735.0). Total num frames: 3608576. Throughput: 0: 866.7. Samples: 901208. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0) [2024-06-06 14:32:08,142][01062] Avg episode reward: [(0, '21.220')] [2024-06-06 14:32:13,139][01062] Fps is (10 sec: 3686.3, 60 sec: 3550.2, 300 sec: 3721.1). Total num frames: 3624960. Throughput: 0: 837.1. Samples: 906604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:32:13,142][01062] Avg episode reward: [(0, '21.725')] [2024-06-06 14:32:18,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3693.3). Total num frames: 3637248. Throughput: 0: 836.2. Samples: 911016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2024-06-06 14:32:18,147][01062] Avg episode reward: [(0, '23.268')] [2024-06-06 14:32:18,990][03204] Updated weights for policy 0, policy_version 890 (0.0021) [2024-06-06 14:32:23,138][01062] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3721.1). Total num frames: 3661824. Throughput: 0: 856.5. Samples: 913828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:32:23,141][01062] Avg episode reward: [(0, '24.169')] [2024-06-06 14:32:23,144][03191] Saving new best policy, reward=24.169! [2024-06-06 14:32:28,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3735.0). Total num frames: 3682304. Throughput: 0: 917.6. Samples: 920600. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:32:28,141][01062] Avg episode reward: [(0, '22.924')] [2024-06-06 14:32:28,199][03204] Updated weights for policy 0, policy_version 900 (0.0018) [2024-06-06 14:32:33,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 3698688. Throughput: 0: 918.4. Samples: 926036. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:32:33,143][01062] Avg episode reward: [(0, '23.302')] [2024-06-06 14:32:38,139][01062] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 3715072. Throughput: 0: 912.8. Samples: 928288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:32:38,141][01062] Avg episode reward: [(0, '22.850')] [2024-06-06 14:32:40,672][03204] Updated weights for policy 0, policy_version 910 (0.0028) [2024-06-06 14:32:43,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3707.2). Total num frames: 3735552. Throughput: 0: 933.1. Samples: 933680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:32:43,140][01062] Avg episode reward: [(0, '22.013')] [2024-06-06 14:32:48,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3760128. Throughput: 0: 949.7. Samples: 940488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2024-06-06 14:32:48,144][01062] Avg episode reward: [(0, '22.537')] [2024-06-06 14:32:50,405][03204] Updated weights for policy 0, policy_version 920 (0.0017) [2024-06-06 14:32:53,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3772416. Throughput: 0: 935.6. Samples: 943308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:32:53,143][01062] Avg episode reward: [(0, '24.019')] [2024-06-06 14:32:58,138][01062] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3788800. Throughput: 0: 912.8. Samples: 947680. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:32:58,144][01062] Avg episode reward: [(0, '22.859')] [2024-06-06 14:32:58,164][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000925_3788800.pth... [2024-06-06 14:32:58,351][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000709_2904064.pth [2024-06-06 14:33:02,167][03204] Updated weights for policy 0, policy_version 930 (0.0017) [2024-06-06 14:33:03,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3809280. Throughput: 0: 943.4. Samples: 953468. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2024-06-06 14:33:03,145][01062] Avg episode reward: [(0, '22.769')] [2024-06-06 14:33:08,138][01062] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3833856. Throughput: 0: 957.7. Samples: 956924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:33:08,147][01062] Avg episode reward: [(0, '21.846')] [2024-06-06 14:33:12,658][03204] Updated weights for policy 0, policy_version 940 (0.0013) [2024-06-06 14:33:13,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3850240. Throughput: 0: 937.7. Samples: 962796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:33:13,141][01062] Avg episode reward: [(0, '20.240')] [2024-06-06 14:33:18,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 3866624. Throughput: 0: 915.9. Samples: 967252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-06-06 14:33:18,142][01062] Avg episode reward: [(0, '19.769')] [2024-06-06 14:33:23,138][01062] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.2). Total num frames: 3887104. Throughput: 0: 923.4. Samples: 969840. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:33:23,148][01062] Avg episode reward: [(0, '19.411')] [2024-06-06 14:33:23,855][03204] Updated weights for policy 0, policy_version 950 (0.0018) [2024-06-06 14:33:28,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3907584. Throughput: 0: 956.5. Samples: 976724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:33:28,141][01062] Avg episode reward: [(0, '19.823')] [2024-06-06 14:33:33,140][01062] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 3923968. Throughput: 0: 929.4. Samples: 982312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-06-06 14:33:33,142][01062] Avg episode reward: [(0, '19.990')] [2024-06-06 14:33:34,596][03204] Updated weights for policy 0, policy_version 960 (0.0030) [2024-06-06 14:33:38,138][01062] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 3940352. Throughput: 0: 917.0. Samples: 984572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-06-06 14:33:38,146][01062] Avg episode reward: [(0, '21.245')] [2024-06-06 14:33:43,138][01062] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 3960832. Throughput: 0: 933.2. Samples: 989676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:33:43,141][01062] Avg episode reward: [(0, '21.985')] [2024-06-06 14:33:45,490][03204] Updated weights for policy 0, policy_version 970 (0.0029) [2024-06-06 14:33:48,138][01062] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3981312. Throughput: 0: 955.4. Samples: 996460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-06-06 14:33:48,141][01062] Avg episode reward: [(0, '23.469')] [2024-06-06 14:33:53,142][01062] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3721.1). Total num frames: 4001792. Throughput: 0: 947.3. Samples: 999556. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2024-06-06 14:33:53,148][01062] Avg episode reward: [(0, '23.932')] [2024-06-06 14:33:54,303][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-06-06 14:33:54,315][01062] Component Batcher_0 stopped! [2024-06-06 14:33:54,310][03191] Stopping Batcher_0... [2024-06-06 14:33:54,316][03191] Loop batcher_evt_loop terminating... [2024-06-06 14:33:54,394][03204] Weights refcount: 2 0 [2024-06-06 14:33:54,410][03204] Stopping InferenceWorker_p0-w0... [2024-06-06 14:33:54,412][03204] Loop inference_proc0-0_evt_loop terminating... [2024-06-06 14:33:54,416][01062] Component InferenceWorker_p0-w0 stopped! [2024-06-06 14:33:54,517][01062] Component RolloutWorker_w3 stopped! [2024-06-06 14:33:54,522][03208] Stopping RolloutWorker_w3... [2024-06-06 14:33:54,524][03208] Loop rollout_proc3_evt_loop terminating... [2024-06-06 14:33:54,558][03211] Stopping RolloutWorker_w6... [2024-06-06 14:33:54,562][03211] Loop rollout_proc6_evt_loop terminating... [2024-06-06 14:33:54,558][01062] Component RolloutWorker_w6 stopped! [2024-06-06 14:33:54,563][03191] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000820_3358720.pth [2024-06-06 14:33:54,573][01062] Component RolloutWorker_w4 stopped! [2024-06-06 14:33:54,569][03210] Stopping RolloutWorker_w4... [2024-06-06 14:33:54,582][03210] Loop rollout_proc4_evt_loop terminating... [2024-06-06 14:33:54,593][03191] Saving new best policy, reward=24.311! [2024-06-06 14:33:54,598][03205] Stopping RolloutWorker_w0... [2024-06-06 14:33:54,602][01062] Component RolloutWorker_w0 stopped! [2024-06-06 14:33:54,602][03205] Loop rollout_proc0_evt_loop terminating... [2024-06-06 14:33:54,641][01062] Component RolloutWorker_w7 stopped! [2024-06-06 14:33:54,644][03207] Stopping RolloutWorker_w2... [2024-06-06 14:33:54,644][03207] Loop rollout_proc2_evt_loop terminating... 
[2024-06-06 14:33:54,648][01062] Component RolloutWorker_w2 stopped! [2024-06-06 14:33:54,641][03212] Stopping RolloutWorker_w7... [2024-06-06 14:33:54,654][03212] Loop rollout_proc7_evt_loop terminating... [2024-06-06 14:33:54,680][01062] Component RolloutWorker_w1 stopped! [2024-06-06 14:33:54,686][03206] Stopping RolloutWorker_w1... [2024-06-06 14:33:54,686][03206] Loop rollout_proc1_evt_loop terminating... [2024-06-06 14:33:54,696][01062] Component RolloutWorker_w5 stopped! [2024-06-06 14:33:54,698][03209] Stopping RolloutWorker_w5... [2024-06-06 14:33:54,700][03209] Loop rollout_proc5_evt_loop terminating... [2024-06-06 14:33:54,839][03191] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-06-06 14:33:55,142][03191] Stopping LearnerWorker_p0... [2024-06-06 14:33:55,142][03191] Loop learner_proc0_evt_loop terminating... [2024-06-06 14:33:55,143][01062] Component LearnerWorker_p0 stopped! [2024-06-06 14:33:55,147][01062] Waiting for process learner_proc0 to stop... [2024-06-06 14:33:57,211][01062] Waiting for process inference_proc0-0 to join... [2024-06-06 14:33:57,218][01062] Waiting for process rollout_proc0 to join... [2024-06-06 14:33:59,419][01062] Waiting for process rollout_proc1 to join... [2024-06-06 14:33:59,442][01062] Waiting for process rollout_proc2 to join... [2024-06-06 14:33:59,445][01062] Waiting for process rollout_proc3 to join... [2024-06-06 14:33:59,449][01062] Waiting for process rollout_proc4 to join... [2024-06-06 14:33:59,454][01062] Waiting for process rollout_proc5 to join... [2024-06-06 14:33:59,457][01062] Waiting for process rollout_proc6 to join... [2024-06-06 14:33:59,461][01062] Waiting for process rollout_proc7 to join... 
[2024-06-06 14:33:59,465][01062] Batcher 0 profile tree view: batching: 27.6540, releasing_batches: 0.0784 [2024-06-06 14:33:59,467][01062] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0058 wait_policy_total: 666.2380 update_model: 6.5978 weight_update: 0.0033 one_step: 0.0196 handle_policy_step: 403.4676 deserialize: 12.4335, stack: 2.3212, obs_to_device_normalize: 85.2330, forward: 215.5863, send_messages: 15.0034 prepare_outputs: 53.5224 to_cpu: 31.6488 [2024-06-06 14:33:59,469][01062] Learner 0 profile tree view: misc: 0.0060, prepare_batch: 14.2031 train: 75.8572 epoch_init: 0.0261, minibatch_init: 0.0295, losses_postprocess: 0.5978, kl_divergence: 0.6925, after_optimizer: 33.7659 calculate_losses: 28.2320 losses_init: 0.0048, forward_head: 1.4720, bptt_initial: 18.3517, tail: 1.4701, advantages_returns: 0.3125, losses: 4.2265 bptt: 2.0431 bptt_forward_core: 1.9579 update: 11.8191 clip: 0.9475 [2024-06-06 14:33:59,470][01062] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3227, enqueue_policy_requests: 142.7240, env_step: 863.4403, overhead: 18.4747, complete_rollouts: 4.6592 save_policy_outputs: 22.1226 split_output_tensors: 8.9616 [2024-06-06 14:33:59,472][01062] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2890, enqueue_policy_requests: 140.4736, env_step: 865.7577, overhead: 18.1760, complete_rollouts: 4.6386 save_policy_outputs: 21.6645 split_output_tensors: 8.4890 [2024-06-06 14:33:59,474][01062] Loop Runner_EvtLoop terminating... [2024-06-06 14:33:59,479][01062] Runner profile tree view: main_loop: 1132.8555 [2024-06-06 14:33:59,483][01062] Collected {0: 4005888}, FPS: 3536.1 [2024-06-06 14:33:59,530][01062] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-06-06 14:33:59,532][01062] Overriding arg 'num_workers' with value 1 passed from command line [2024-06-06 14:33:59,534][01062] Adding new argument 'no_render'=True that is not in the saved config file! 
[2024-06-06 14:33:59,535][01062] Adding new argument 'save_video'=True that is not in the saved config file! [2024-06-06 14:33:59,538][01062] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-06-06 14:33:59,539][01062] Adding new argument 'video_name'=None that is not in the saved config file! [2024-06-06 14:33:59,541][01062] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-06-06 14:33:59,542][01062] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-06-06 14:33:59,543][01062] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-06-06 14:33:59,544][01062] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-06-06 14:33:59,545][01062] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-06-06 14:33:59,546][01062] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-06-06 14:33:59,547][01062] Adding new argument 'train_script'=None that is not in the saved config file! [2024-06-06 14:33:59,549][01062] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-06-06 14:33:59,550][01062] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-06-06 14:33:59,597][01062] Doom resolution: 160x120, resize resolution: (128, 72) [2024-06-06 14:33:59,604][01062] RunningMeanStd input shape: (3, 72, 128) [2024-06-06 14:33:59,607][01062] RunningMeanStd input shape: (1,) [2024-06-06 14:33:59,625][01062] ConvEncoder: input_channels=3 [2024-06-06 14:33:59,734][01062] Conv encoder output size: 512 [2024-06-06 14:33:59,735][01062] Policy head output size: 512 [2024-06-06 14:33:59,909][01062] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-06-06 14:34:00,683][01062] Num frames 100... 
[2024-06-06 14:34:00,812][01062] Num frames 200... [2024-06-06 14:34:00,941][01062] Num frames 300... [2024-06-06 14:34:01,074][01062] Num frames 400... [2024-06-06 14:34:01,207][01062] Num frames 500... [2024-06-06 14:34:01,337][01062] Num frames 600... [2024-06-06 14:34:01,468][01062] Num frames 700... [2024-06-06 14:34:01,611][01062] Num frames 800... [2024-06-06 14:34:01,743][01062] Num frames 900... [2024-06-06 14:34:01,880][01062] Num frames 1000... [2024-06-06 14:34:02,054][01062] Avg episode rewards: #0: 27.880, true rewards: #0: 10.880 [2024-06-06 14:34:02,056][01062] Avg episode reward: 27.880, avg true_objective: 10.880 [2024-06-06 14:34:02,076][01062] Num frames 1100... [2024-06-06 14:34:02,204][01062] Num frames 1200... [2024-06-06 14:34:02,333][01062] Num frames 1300... [2024-06-06 14:34:02,468][01062] Num frames 1400... [2024-06-06 14:34:02,609][01062] Num frames 1500... [2024-06-06 14:34:02,740][01062] Num frames 1600... [2024-06-06 14:34:02,872][01062] Num frames 1700... [2024-06-06 14:34:03,001][01062] Num frames 1800... [2024-06-06 14:34:03,135][01062] Num frames 1900... [2024-06-06 14:34:03,264][01062] Num frames 2000... [2024-06-06 14:34:03,394][01062] Num frames 2100... [2024-06-06 14:34:03,526][01062] Num frames 2200... [2024-06-06 14:34:03,669][01062] Num frames 2300... [2024-06-06 14:34:03,798][01062] Avg episode rewards: #0: 26.765, true rewards: #0: 11.765 [2024-06-06 14:34:03,800][01062] Avg episode reward: 26.765, avg true_objective: 11.765 [2024-06-06 14:34:03,861][01062] Num frames 2400... [2024-06-06 14:34:03,992][01062] Num frames 2500... [2024-06-06 14:34:04,124][01062] Num frames 2600... [2024-06-06 14:34:04,255][01062] Num frames 2700... [2024-06-06 14:34:04,383][01062] Num frames 2800... [2024-06-06 14:34:04,524][01062] Avg episode rewards: #0: 22.550, true rewards: #0: 9.550 [2024-06-06 14:34:04,526][01062] Avg episode reward: 22.550, avg true_objective: 9.550 [2024-06-06 14:34:04,577][01062] Num frames 2900... 
[2024-06-06 14:34:04,710][01062] Num frames 3000... [2024-06-06 14:34:04,841][01062] Num frames 3100... [2024-06-06 14:34:04,971][01062] Avg episode rewards: #0: 18.133, true rewards: #0: 7.882 [2024-06-06 14:34:04,972][01062] Avg episode reward: 18.133, avg true_objective: 7.882 [2024-06-06 14:34:05,037][01062] Num frames 3200... [2024-06-06 14:34:05,169][01062] Num frames 3300... [2024-06-06 14:34:05,296][01062] Num frames 3400... [2024-06-06 14:34:05,427][01062] Num frames 3500... [2024-06-06 14:34:05,558][01062] Num frames 3600... [2024-06-06 14:34:05,696][01062] Num frames 3700... [2024-06-06 14:34:05,842][01062] Avg episode rewards: #0: 17.138, true rewards: #0: 7.538 [2024-06-06 14:34:05,843][01062] Avg episode reward: 17.138, avg true_objective: 7.538 [2024-06-06 14:34:05,887][01062] Num frames 3800... [2024-06-06 14:34:06,018][01062] Num frames 3900... [2024-06-06 14:34:06,147][01062] Num frames 4000... [2024-06-06 14:34:06,278][01062] Num frames 4100... [2024-06-06 14:34:06,411][01062] Num frames 4200... [2024-06-06 14:34:06,540][01062] Num frames 4300... [2024-06-06 14:34:06,680][01062] Num frames 4400... [2024-06-06 14:34:06,826][01062] Num frames 4500... [2024-06-06 14:34:06,936][01062] Avg episode rewards: #0: 17.228, true rewards: #0: 7.562 [2024-06-06 14:34:06,937][01062] Avg episode reward: 17.228, avg true_objective: 7.562 [2024-06-06 14:34:07,022][01062] Num frames 4600... [2024-06-06 14:34:07,152][01062] Num frames 4700... [2024-06-06 14:34:07,284][01062] Num frames 4800... [2024-06-06 14:34:07,428][01062] Num frames 4900... [2024-06-06 14:34:07,616][01062] Num frames 5000... [2024-06-06 14:34:07,811][01062] Num frames 5100... [2024-06-06 14:34:08,002][01062] Num frames 5200... [2024-06-06 14:34:08,188][01062] Num frames 5300... [2024-06-06 14:34:08,376][01062] Num frames 5400... [2024-06-06 14:34:08,561][01062] Num frames 5500... [2024-06-06 14:34:08,747][01062] Num frames 5600... [2024-06-06 14:34:08,940][01062] Num frames 5700... 
[2024-06-06 14:34:09,130][01062] Num frames 5800... [2024-06-06 14:34:09,316][01062] Num frames 5900... [2024-06-06 14:34:09,511][01062] Num frames 6000... [2024-06-06 14:34:09,705][01062] Num frames 6100... [2024-06-06 14:34:09,908][01062] Num frames 6200... [2024-06-06 14:34:10,100][01062] Num frames 6300... [2024-06-06 14:34:10,293][01062] Num frames 6400... [2024-06-06 14:34:10,466][01062] Num frames 6500... [2024-06-06 14:34:10,610][01062] Num frames 6600... [2024-06-06 14:34:10,720][01062] Avg episode rewards: #0: 23.338, true rewards: #0: 9.481 [2024-06-06 14:34:10,722][01062] Avg episode reward: 23.338, avg true_objective: 9.481 [2024-06-06 14:34:10,806][01062] Num frames 6700... [2024-06-06 14:34:10,941][01062] Num frames 6800... [2024-06-06 14:34:11,075][01062] Num frames 6900... [2024-06-06 14:34:11,209][01062] Num frames 7000... [2024-06-06 14:34:11,336][01062] Num frames 7100... [2024-06-06 14:34:11,468][01062] Num frames 7200... [2024-06-06 14:34:11,603][01062] Num frames 7300... [2024-06-06 14:34:11,746][01062] Num frames 7400... [2024-06-06 14:34:11,907][01062] Avg episode rewards: #0: 22.472, true rewards: #0: 9.347 [2024-06-06 14:34:11,910][01062] Avg episode reward: 22.472, avg true_objective: 9.347 [2024-06-06 14:34:11,941][01062] Num frames 7500... [2024-06-06 14:34:12,086][01062] Num frames 7600... [2024-06-06 14:34:12,226][01062] Num frames 7700... [2024-06-06 14:34:12,364][01062] Num frames 7800... [2024-06-06 14:34:12,497][01062] Num frames 7900... [2024-06-06 14:34:12,631][01062] Num frames 8000... [2024-06-06 14:34:12,758][01062] Num frames 8100... [2024-06-06 14:34:12,896][01062] Num frames 8200... [2024-06-06 14:34:13,031][01062] Num frames 8300... [2024-06-06 14:34:13,160][01062] Num frames 8400... [2024-06-06 14:34:13,291][01062] Num frames 8500... [2024-06-06 14:34:13,421][01062] Num frames 8600... [2024-06-06 14:34:13,551][01062] Num frames 8700... [2024-06-06 14:34:13,681][01062] Num frames 8800... 
[2024-06-06 14:34:13,808][01062] Num frames 8900... [2024-06-06 14:34:13,949][01062] Num frames 9000... [2024-06-06 14:34:14,080][01062] Num frames 9100... [2024-06-06 14:34:14,250][01062] Avg episode rewards: #0: 24.874, true rewards: #0: 10.208 [2024-06-06 14:34:14,252][01062] Avg episode reward: 24.874, avg true_objective: 10.208 [2024-06-06 14:34:14,272][01062] Num frames 9200... [2024-06-06 14:34:14,398][01062] Num frames 9300... [2024-06-06 14:34:14,526][01062] Num frames 9400... [2024-06-06 14:34:14,659][01062] Num frames 9500... [2024-06-06 14:34:14,790][01062] Num frames 9600... [2024-06-06 14:34:14,919][01062] Num frames 9700... [2024-06-06 14:34:15,056][01062] Num frames 9800... [2024-06-06 14:34:15,189][01062] Num frames 9900... [2024-06-06 14:34:15,320][01062] Num frames 10000... [2024-06-06 14:34:15,454][01062] Num frames 10100... [2024-06-06 14:34:15,584][01062] Num frames 10200... [2024-06-06 14:34:15,696][01062] Avg episode rewards: #0: 25.040, true rewards: #0: 10.240 [2024-06-06 14:34:15,697][01062] Avg episode reward: 25.040, avg true_objective: 10.240 [2024-06-06 14:35:21,028][01062] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-06-06 14:35:21,661][01062] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-06-06 14:35:21,663][01062] Overriding arg 'num_workers' with value 1 passed from command line [2024-06-06 14:35:21,664][01062] Adding new argument 'no_render'=True that is not in the saved config file! [2024-06-06 14:35:21,666][01062] Adding new argument 'save_video'=True that is not in the saved config file! [2024-06-06 14:35:21,668][01062] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-06-06 14:35:21,669][01062] Adding new argument 'video_name'=None that is not in the saved config file! [2024-06-06 14:35:21,670][01062] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2024-06-06 14:35:21,671][01062] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-06-06 14:35:21,673][01062] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-06-06 14:35:21,674][01062] Adding new argument 'hf_repository'='swritchie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-06-06 14:35:21,675][01062] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-06-06 14:35:21,676][01062] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-06-06 14:35:21,677][01062] Adding new argument 'train_script'=None that is not in the saved config file! [2024-06-06 14:35:21,678][01062] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-06-06 14:35:21,679][01062] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-06-06 14:35:21,721][01062] RunningMeanStd input shape: (3, 72, 128) [2024-06-06 14:35:21,723][01062] RunningMeanStd input shape: (1,) [2024-06-06 14:35:21,741][01062] ConvEncoder: input_channels=3 [2024-06-06 14:35:21,796][01062] Conv encoder output size: 512 [2024-06-06 14:35:21,798][01062] Policy head output size: 512 [2024-06-06 14:35:21,824][01062] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-06-06 14:35:22,498][01062] Num frames 100... [2024-06-06 14:35:22,673][01062] Num frames 200... [2024-06-06 14:35:22,840][01062] Num frames 300... [2024-06-06 14:35:23,022][01062] Num frames 400... [2024-06-06 14:35:23,202][01062] Num frames 500... [2024-06-06 14:35:23,386][01062] Num frames 600... [2024-06-06 14:35:23,570][01062] Num frames 700... [2024-06-06 14:35:23,756][01062] Num frames 800... [2024-06-06 14:35:23,945][01062] Num frames 900... [2024-06-06 14:35:24,119][01062] Num frames 1000... [2024-06-06 14:35:24,308][01062] Num frames 1100... 
[2024-06-06 14:35:24,501][01062] Num frames 1200... [2024-06-06 14:35:24,644][01062] Avg episode rewards: #0: 26.480, true rewards: #0: 12.480 [2024-06-06 14:35:24,647][01062] Avg episode reward: 26.480, avg true_objective: 12.480 [2024-06-06 14:35:24,748][01062] Num frames 1300... [2024-06-06 14:35:24,956][01062] Num frames 1400... [2024-06-06 14:35:25,151][01062] Num frames 1500... [2024-06-06 14:35:25,326][01062] Num frames 1600... [2024-06-06 14:35:25,521][01062] Num frames 1700... [2024-06-06 14:35:25,738][01062] Num frames 1800... [2024-06-06 14:35:25,941][01062] Num frames 1900... [2024-06-06 14:35:26,136][01062] Num frames 2000... [2024-06-06 14:35:26,191][01062] Avg episode rewards: #0: 21.500, true rewards: #0: 10.000 [2024-06-06 14:35:26,194][01062] Avg episode reward: 21.500, avg true_objective: 10.000 [2024-06-06 14:35:26,376][01062] Num frames 2100... [2024-06-06 14:35:26,571][01062] Num frames 2200... [2024-06-06 14:35:26,753][01062] Num frames 2300... [2024-06-06 14:35:26,953][01062] Num frames 2400... [2024-06-06 14:35:27,159][01062] Num frames 2500... [2024-06-06 14:35:27,374][01062] Num frames 2600... [2024-06-06 14:35:27,565][01062] Num frames 2700... [2024-06-06 14:35:27,776][01062] Num frames 2800... [2024-06-06 14:35:27,991][01062] Num frames 2900... [2024-06-06 14:35:28,043][01062] Avg episode rewards: #0: 21.334, true rewards: #0: 9.667 [2024-06-06 14:35:28,045][01062] Avg episode reward: 21.334, avg true_objective: 9.667 [2024-06-06 14:35:28,269][01062] Num frames 3000... [2024-06-06 14:35:28,460][01062] Num frames 3100... [2024-06-06 14:35:28,660][01062] Num frames 3200... [2024-06-06 14:35:28,847][01062] Num frames 3300... [2024-06-06 14:35:29,035][01062] Num frames 3400... [2024-06-06 14:35:29,220][01062] Num frames 3500... [2024-06-06 14:35:29,414][01062] Num frames 3600... [2024-06-06 14:35:29,599][01062] Num frames 3700... [2024-06-06 14:35:29,797][01062] Num frames 3800... [2024-06-06 14:35:30,016][01062] Num frames 3900... 
[2024-06-06 14:35:30,210][01062] Num frames 4000... [2024-06-06 14:35:30,339][01062] Num frames 4100... [2024-06-06 14:35:30,468][01062] Num frames 4200... [2024-06-06 14:35:30,598][01062] Num frames 4300... [2024-06-06 14:35:30,736][01062] Num frames 4400... [2024-06-06 14:35:30,866][01062] Num frames 4500... [2024-06-06 14:35:30,998][01062] Num frames 4600... [2024-06-06 14:35:31,128][01062] Num frames 4700... [2024-06-06 14:35:31,257][01062] Num frames 4800... [2024-06-06 14:35:31,437][01062] Avg episode rewards: #0: 29.240, true rewards: #0: 12.240 [2024-06-06 14:35:31,439][01062] Avg episode reward: 29.240, avg true_objective: 12.240 [2024-06-06 14:35:31,449][01062] Num frames 4900... [2024-06-06 14:35:31,576][01062] Num frames 5000... [2024-06-06 14:35:31,712][01062] Num frames 5100... [2024-06-06 14:35:31,855][01062] Num frames 5200... [2024-06-06 14:35:31,988][01062] Num frames 5300... [2024-06-06 14:35:32,120][01062] Num frames 5400... [2024-06-06 14:35:32,247][01062] Num frames 5500... [2024-06-06 14:35:32,377][01062] Num frames 5600... [2024-06-06 14:35:32,503][01062] Num frames 5700... [2024-06-06 14:35:32,633][01062] Num frames 5800... [2024-06-06 14:35:32,765][01062] Num frames 5900... [2024-06-06 14:35:32,893][01062] Avg episode rewards: #0: 27.504, true rewards: #0: 11.904 [2024-06-06 14:35:32,894][01062] Avg episode reward: 27.504, avg true_objective: 11.904 [2024-06-06 14:35:32,960][01062] Num frames 6000... [2024-06-06 14:35:33,086][01062] Num frames 6100... [2024-06-06 14:35:33,214][01062] Num frames 6200... [2024-06-06 14:35:33,342][01062] Num frames 6300... [2024-06-06 14:35:33,404][01062] Avg episode rewards: #0: 23.673, true rewards: #0: 10.507 [2024-06-06 14:35:33,405][01062] Avg episode reward: 23.673, avg true_objective: 10.507 [2024-06-06 14:35:33,528][01062] Num frames 6400... [2024-06-06 14:35:33,659][01062] Num frames 6500... [2024-06-06 14:35:33,788][01062] Num frames 6600... [2024-06-06 14:35:33,922][01062] Num frames 6700... 
[2024-06-06 14:35:34,047][01062] Num frames 6800... [2024-06-06 14:35:34,124][01062] Avg episode rewards: #0: 21.309, true rewards: #0: 9.737 [2024-06-06 14:35:34,126][01062] Avg episode reward: 21.309, avg true_objective: 9.737 [2024-06-06 14:35:34,235][01062] Num frames 6900... [2024-06-06 14:35:34,360][01062] Num frames 7000... [2024-06-06 14:35:34,487][01062] Num frames 7100... [2024-06-06 14:35:34,619][01062] Num frames 7200... [2024-06-06 14:35:34,748][01062] Num frames 7300... [2024-06-06 14:35:34,883][01062] Num frames 7400... [2024-06-06 14:35:35,062][01062] Num frames 7500... [2024-06-06 14:35:35,239][01062] Num frames 7600... [2024-06-06 14:35:35,387][01062] Num frames 7700... [2024-06-06 14:35:35,514][01062] Num frames 7800... [2024-06-06 14:35:35,645][01062] Num frames 7900... [2024-06-06 14:35:35,789][01062] Avg episode rewards: #0: 21.835, true rewards: #0: 9.960 [2024-06-06 14:35:35,791][01062] Avg episode reward: 21.835, avg true_objective: 9.960 [2024-06-06 14:35:35,841][01062] Num frames 8000... [2024-06-06 14:35:35,976][01062] Num frames 8100... [2024-06-06 14:35:36,102][01062] Num frames 8200... [2024-06-06 14:35:36,229][01062] Num frames 8300... [2024-06-06 14:35:36,359][01062] Num frames 8400... [2024-06-06 14:35:36,528][01062] Avg episode rewards: #0: 20.311, true rewards: #0: 9.422 [2024-06-06 14:35:36,529][01062] Avg episode reward: 20.311, avg true_objective: 9.422 [2024-06-06 14:35:36,560][01062] Num frames 8500... [2024-06-06 14:35:36,689][01062] Num frames 8600... [2024-06-06 14:35:36,818][01062] Num frames 8700... [2024-06-06 14:35:36,956][01062] Num frames 8800... [2024-06-06 14:35:37,142][01062] Avg episode rewards: #0: 18.995, true rewards: #0: 8.895 [2024-06-06 14:35:37,144][01062] Avg episode reward: 18.995, avg true_objective: 8.895 [2024-06-06 14:36:34,978][01062] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
[2024-06-06 14:45:01,257][01062] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-06-06 14:45:01,260][01062] Overriding arg 'num_workers' with value 1 passed from command line [2024-06-06 14:45:01,262][01062] Adding new argument 'no_render'=True that is not in the saved config file! [2024-06-06 14:45:01,264][01062] Adding new argument 'save_video'=True that is not in the saved config file! [2024-06-06 14:45:01,266][01062] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-06-06 14:45:01,268][01062] Adding new argument 'video_name'=None that is not in the saved config file! [2024-06-06 14:45:01,270][01062] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-06-06 14:45:01,271][01062] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-06-06 14:45:01,272][01062] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-06-06 14:45:01,274][01062] Adding new argument 'hf_repository'='swritchie/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-06-06 14:45:01,275][01062] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-06-06 14:45:01,276][01062] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-06-06 14:45:01,277][01062] Adding new argument 'train_script'=None that is not in the saved config file! [2024-06-06 14:45:01,278][01062] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-06-06 14:45:01,279][01062] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-06-06 14:45:01,322][01062] RunningMeanStd input shape: (3, 72, 128) [2024-06-06 14:45:01,323][01062] RunningMeanStd input shape: (1,) [2024-06-06 14:45:01,338][01062] ConvEncoder: input_channels=3 [2024-06-06 14:45:01,377][01062] Conv encoder output size: 512 [2024-06-06 14:45:01,378][01062] Policy head output size: 512 [2024-06-06 14:45:01,401][01062] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-06-06 14:45:01,859][01062] Num frames 100... [2024-06-06 14:45:01,988][01062] Num frames 200... [2024-06-06 14:45:02,120][01062] Num frames 300... [2024-06-06 14:45:02,254][01062] Num frames 400... [2024-06-06 14:45:02,389][01062] Num frames 500... [2024-06-06 14:45:02,526][01062] Num frames 600... [2024-06-06 14:45:02,663][01062] Num frames 700... [2024-06-06 14:45:02,736][01062] Avg episode rewards: #0: 14.120, true rewards: #0: 7.120 [2024-06-06 14:45:02,738][01062] Avg episode reward: 14.120, avg true_objective: 7.120 [2024-06-06 14:45:02,861][01062] Num frames 800... [2024-06-06 14:45:03,028][01062] Num frames 900... [2024-06-06 14:45:03,244][01062] Num frames 1000... [2024-06-06 14:45:03,429][01062] Num frames 1100... [2024-06-06 14:45:03,617][01062] Num frames 1200... [2024-06-06 14:45:03,809][01062] Num frames 1300... [2024-06-06 14:45:03,993][01062] Num frames 1400... [2024-06-06 14:45:04,175][01062] Num frames 1500... [2024-06-06 14:45:04,355][01062] Num frames 1600... [2024-06-06 14:45:04,545][01062] Num frames 1700... [2024-06-06 14:45:04,741][01062] Avg episode rewards: #0: 20.340, true rewards: #0: 8.840 [2024-06-06 14:45:04,743][01062] Avg episode reward: 20.340, avg true_objective: 8.840 [2024-06-06 14:45:04,812][01062] Num frames 1800... [2024-06-06 14:45:05,008][01062] Num frames 1900... [2024-06-06 14:45:05,200][01062] Num frames 2000... 
[2024-06-06 14:45:05,406][01062] Num frames 2100... [2024-06-06 14:45:05,603][01062] Num frames 2200... [2024-06-06 14:45:05,765][01062] Num frames 2300... [2024-06-06 14:45:05,893][01062] Num frames 2400... [2024-06-06 14:45:06,031][01062] Num frames 2500... [2024-06-06 14:45:06,162][01062] Num frames 2600... [2024-06-06 14:45:06,297][01062] Num frames 2700... [2024-06-06 14:45:06,433][01062] Avg episode rewards: #0: 21.200, true rewards: #0: 9.200 [2024-06-06 14:45:06,435][01062] Avg episode reward: 21.200, avg true_objective: 9.200 [2024-06-06 14:45:06,492][01062] Num frames 2800... [2024-06-06 14:45:06,624][01062] Num frames 2900... [2024-06-06 14:45:06,769][01062] Num frames 3000... [2024-06-06 14:45:06,904][01062] Num frames 3100... [2024-06-06 14:45:07,064][01062] Avg episode rewards: #0: 17.690, true rewards: #0: 7.940 [2024-06-06 14:45:07,067][01062] Avg episode reward: 17.690, avg true_objective: 7.940 [2024-06-06 14:45:07,103][01062] Num frames 3200... [2024-06-06 14:45:07,237][01062] Num frames 3300... [2024-06-06 14:45:07,368][01062] Num frames 3400... [2024-06-06 14:45:07,497][01062] Num frames 3500... [2024-06-06 14:45:07,626][01062] Num frames 3600... [2024-06-06 14:45:07,763][01062] Num frames 3700... [2024-06-06 14:45:07,891][01062] Num frames 3800... [2024-06-06 14:45:08,024][01062] Num frames 3900... [2024-06-06 14:45:08,152][01062] Num frames 4000... [2024-06-06 14:45:08,283][01062] Num frames 4100... [2024-06-06 14:45:08,415][01062] Num frames 4200... [2024-06-06 14:45:08,544][01062] Num frames 4300... [2024-06-06 14:45:08,678][01062] Num frames 4400... [2024-06-06 14:45:08,859][01062] Avg episode rewards: #0: 20.576, true rewards: #0: 8.976 [2024-06-06 14:45:08,861][01062] Avg episode reward: 20.576, avg true_objective: 8.976 [2024-06-06 14:45:08,881][01062] Num frames 4500... [2024-06-06 14:45:09,014][01062] Num frames 4600... [2024-06-06 14:45:09,144][01062] Num frames 4700... [2024-06-06 14:45:09,277][01062] Num frames 4800... 
[2024-06-06 14:45:09,407][01062] Num frames 4900... [2024-06-06 14:45:09,535][01062] Num frames 5000... [2024-06-06 14:45:09,674][01062] Num frames 5100... [2024-06-06 14:45:09,735][01062] Avg episode rewards: #0: 19.338, true rewards: #0: 8.505 [2024-06-06 14:45:09,737][01062] Avg episode reward: 19.338, avg true_objective: 8.505 [2024-06-06 14:45:09,866][01062] Num frames 5200... [2024-06-06 14:45:09,995][01062] Num frames 5300... [2024-06-06 14:45:10,124][01062] Num frames 5400... [2024-06-06 14:45:10,264][01062] Num frames 5500... [2024-06-06 14:45:10,408][01062] Num frames 5600... [2024-06-06 14:45:10,552][01062] Num frames 5700... [2024-06-06 14:45:10,705][01062] Num frames 5800... [2024-06-06 14:45:10,827][01062] Avg episode rewards: #0: 18.627, true rewards: #0: 8.341 [2024-06-06 14:45:10,829][01062] Avg episode reward: 18.627, avg true_objective: 8.341 [2024-06-06 14:45:10,908][01062] Num frames 5900... [2024-06-06 14:45:11,067][01062] Num frames 6000... [2024-06-06 14:45:11,203][01062] Num frames 6100... [2024-06-06 14:45:11,335][01062] Num frames 6200... [2024-06-06 14:45:11,469][01062] Num frames 6300... [2024-06-06 14:45:11,545][01062] Avg episode rewards: #0: 17.393, true rewards: #0: 7.892 [2024-06-06 14:45:11,546][01062] Avg episode reward: 17.393, avg true_objective: 7.892 [2024-06-06 14:45:11,670][01062] Num frames 6400... [2024-06-06 14:45:11,809][01062] Num frames 6500... [2024-06-06 14:45:11,940][01062] Num frames 6600... [2024-06-06 14:45:12,072][01062] Num frames 6700... [2024-06-06 14:45:12,211][01062] Num frames 6800... [2024-06-06 14:45:12,342][01062] Num frames 6900... [2024-06-06 14:45:12,471][01062] Num frames 7000... [2024-06-06 14:45:12,593][01062] Avg episode rewards: #0: 17.278, true rewards: #0: 7.833 [2024-06-06 14:45:12,597][01062] Avg episode reward: 17.278, avg true_objective: 7.833 [2024-06-06 14:45:12,663][01062] Num frames 7100... [2024-06-06 14:45:12,799][01062] Num frames 7200... 
[2024-06-06 14:45:12,932][01062] Num frames 7300... [2024-06-06 14:45:13,065][01062] Num frames 7400... [2024-06-06 14:45:13,195][01062] Num frames 7500... [2024-06-06 14:45:13,326][01062] Num frames 7600... [2024-06-06 14:45:13,458][01062] Num frames 7700... [2024-06-06 14:45:13,544][01062] Avg episode rewards: #0: 17.022, true rewards: #0: 7.722 [2024-06-06 14:45:13,545][01062] Avg episode reward: 17.022, avg true_objective: 7.722 [2024-06-06 14:46:03,108][01062] Replay video saved to /content/train_dir/default_experiment/replay.mp4!