[2025-02-05 11:26:26,351][01013] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-05 11:26:26,354][01013] Rollout worker 0 uses device cpu
[2025-02-05 11:26:26,357][01013] Rollout worker 1 uses device cpu
[2025-02-05 11:26:26,358][01013] Rollout worker 2 uses device cpu
[2025-02-05 11:26:26,360][01013] Rollout worker 3 uses device cpu
[2025-02-05 11:26:26,362][01013] Rollout worker 4 uses device cpu
[2025-02-05 11:26:26,363][01013] Rollout worker 5 uses device cpu
[2025-02-05 11:26:26,365][01013] Rollout worker 6 uses device cpu
[2025-02-05 11:26:26,366][01013] Rollout worker 7 uses device cpu
[2025-02-05 11:26:26,529][01013] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-05 11:26:26,531][01013] InferenceWorker_p0-w0: min num requests: 2
[2025-02-05 11:26:26,563][01013] Starting all processes...
[2025-02-05 11:26:26,565][01013] Starting process learner_proc0
[2025-02-05 11:26:26,623][01013] Starting all processes...
[2025-02-05 11:26:26,633][01013] Starting process inference_proc0-0
[2025-02-05 11:26:26,634][01013] Starting process rollout_proc0
[2025-02-05 11:26:26,634][01013] Starting process rollout_proc1
[2025-02-05 11:26:26,634][01013] Starting process rollout_proc2
[2025-02-05 11:26:26,634][01013] Starting process rollout_proc3
[2025-02-05 11:26:26,638][01013] Starting process rollout_proc4
[2025-02-05 11:26:26,638][01013] Starting process rollout_proc5
[2025-02-05 11:26:26,638][01013] Starting process rollout_proc6
[2025-02-05 11:26:26,638][01013] Starting process rollout_proc7
[2025-02-05 11:26:42,488][02128] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-05 11:26:42,491][02128] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-05 11:26:42,570][02128] Num visible devices: 1
[2025-02-05 11:26:42,624][02128] Starting seed is not provided
[2025-02-05 11:26:42,625][02128] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-05 11:26:42,625][02128] Initializing actor-critic model on device cuda:0
[2025-02-05 11:26:42,626][02128] RunningMeanStd input shape: (3, 72, 128)
[2025-02-05 11:26:42,628][02144] Worker 1 uses CPU cores [1]
[2025-02-05 11:26:42,630][02128] RunningMeanStd input shape: (1,)
[2025-02-05 11:26:42,690][02128] ConvEncoder: input_channels=3
[2025-02-05 11:26:42,833][02149] Worker 7 uses CPU cores [1]
[2025-02-05 11:26:42,982][02146] Worker 5 uses CPU cores [1]
[2025-02-05 11:26:42,986][02143] Worker 2 uses CPU cores [0]
[2025-02-05 11:26:43,014][02148] Worker 6 uses CPU cores [0]
[2025-02-05 11:26:43,083][02142] Worker 0 uses CPU cores [0]
[2025-02-05 11:26:43,098][02147] Worker 3 uses CPU cores [1]
[2025-02-05 11:26:43,142][02141] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-05 11:26:43,143][02141] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-05 11:26:43,157][02145] Worker 4 uses CPU cores [0]
[2025-02-05 11:26:43,168][02141] Num visible devices: 1
[2025-02-05 11:26:43,220][02128] Conv encoder output size: 512
[2025-02-05 11:26:43,221][02128] Policy head output size: 512
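Every entry in this transcript shares the same [timestamp][pid] message layout, so the run can be post-processed with a few lines of stdlib Python. A hypothetical helper (not part of Sample Factory) for pulling entries apart:

import re
from datetime import datetime

# Parses entries of the form:
# [2025-02-05 11:26:26,351][01013] Saving configuration to ...
LINE_RE = re.compile(r"^\[(?P<ts>[\d-]+ [\d:,]+)\]\[(?P<pid>\d+)\]\s+(?P<msg>.*)$")

def parse_entry(line: str):
    """Return (timestamp, pid, message) for one log line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return ts, int(m["pid"]), m["msg"]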
[2025-02-05 11:26:43,271][02128] Created Actor Critic model with architecture:
[2025-02-05 11:26:43,271][02128] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-05 11:26:43,507][02128] Using optimizer
[2025-02-05 11:26:46,530][01013] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-05 11:26:46,538][01013] Heartbeat connected on RolloutWorker_w0
[2025-02-05 11:26:46,541][01013] Heartbeat connected on RolloutWorker_w1
[2025-02-05 11:26:46,545][01013] Heartbeat connected on RolloutWorker_w2
[2025-02-05 11:26:46,549][01013] Heartbeat connected on RolloutWorker_w3
[2025-02-05 11:26:46,552][01013] Heartbeat connected on RolloutWorker_w4
[2025-02-05 11:26:46,556][01013] Heartbeat connected on RolloutWorker_w5
[2025-02-05 11:26:46,563][01013] Heartbeat connected on RolloutWorker_w6
[2025-02-05 11:26:46,564][01013] Heartbeat connected on RolloutWorker_w7
[2025-02-05 11:26:46,631][01013] Heartbeat connected on Batcher_0
[2025-02-05 11:26:48,457][02128] No checkpoints found
[2025-02-05 11:26:48,457][02128] Did not load from checkpoint, starting from scratch!
[2025-02-05 11:26:48,457][02128] Initialized policy 0 weights for model version 0
[2025-02-05 11:26:48,460][02128] LearnerWorker_p0 finished initialization!
[2025-02-05 11:26:48,465][02128] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-05 11:26:48,465][01013] Heartbeat connected on LearnerWorker_p0
[2025-02-05 11:26:48,618][02141] RunningMeanStd input shape: (3, 72, 128)
[2025-02-05 11:26:48,619][02141] RunningMeanStd input shape: (1,)
[2025-02-05 11:26:48,630][02141] ConvEncoder: input_channels=3
[2025-02-05 11:26:48,732][02141] Conv encoder output size: 512
[2025-02-05 11:26:48,733][02141] Policy head output size: 512
[2025-02-05 11:26:48,768][01013] Inference worker 0-0 is ready!
[2025-02-05 11:26:48,769][01013] All inference workers are ready! Signal rollout workers to start!
[2025-02-05 11:26:49,070][02143] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:49,069][02146] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:49,073][02147] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:49,126][02145] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:49,146][02142] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:49,147][02148] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:49,206][02144] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:49,200][02149] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-05 11:26:50,436][02147] Decorrelating experience for 0 frames...
[2025-02-05 11:26:50,437][02149] Decorrelating experience for 0 frames...
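The module tree printed above pins down the shapes end to end: a (3, 72, 128) observation goes through three Conv2d+ELU stages, a Linear+ELU projection to 512, a GRU(512, 512) core, then a 1-dim value head and a 5-dim action-logits head. A minimal PyTorch sketch of the same structure; the kernel sizes, strides, and channel widths (32/64/128) are assumptions in the usual Atari style, since the log does not print them:

import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Stand-in for the ActorCriticSharedWeights model printed above.
    Only the 512-dim encoder/core sizes, 5 actions, ELUs, and the
    (3, 72, 128) input come from the log; conv hyperparameters are assumed."""
    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened size for a (3, 72, 128) observation
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)          # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(hidden, 1)   # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state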
[2025-02-05 11:26:50,691][02148] Decorrelating experience for 0 frames...
[2025-02-05 11:26:50,698][02143] Decorrelating experience for 0 frames...
[2025-02-05 11:26:50,695][02145] Decorrelating experience for 0 frames...
[2025-02-05 11:26:50,702][02142] Decorrelating experience for 0 frames...
[2025-02-05 11:26:51,102][02149] Decorrelating experience for 32 frames...
[2025-02-05 11:26:51,457][02142] Decorrelating experience for 32 frames...
[2025-02-05 11:26:51,459][02148] Decorrelating experience for 32 frames...
[2025-02-05 11:26:51,654][01013] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-05 11:26:51,906][02147] Decorrelating experience for 32 frames...
[2025-02-05 11:26:51,927][02144] Decorrelating experience for 0 frames...
[2025-02-05 11:26:52,553][02149] Decorrelating experience for 64 frames...
[2025-02-05 11:26:52,904][02144] Decorrelating experience for 32 frames...
[2025-02-05 11:26:53,077][02142] Decorrelating experience for 64 frames...
[2025-02-05 11:26:53,084][02148] Decorrelating experience for 64 frames...
[2025-02-05 11:26:53,383][02145] Decorrelating experience for 32 frames...
[2025-02-05 11:26:53,382][02143] Decorrelating experience for 32 frames...
[2025-02-05 11:26:53,521][02149] Decorrelating experience for 96 frames...
[2025-02-05 11:26:54,375][02144] Decorrelating experience for 64 frames...
[2025-02-05 11:26:54,900][02147] Decorrelating experience for 64 frames...
[2025-02-05 11:26:54,942][02142] Decorrelating experience for 96 frames...
[2025-02-05 11:26:54,944][02146] Decorrelating experience for 0 frames...
[2025-02-05 11:26:55,125][02145] Decorrelating experience for 64 frames...
[2025-02-05 11:26:55,758][02143] Decorrelating experience for 64 frames...
[2025-02-05 11:26:56,359][02148] Decorrelating experience for 96 frames...
[2025-02-05 11:26:56,654][01013] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-05 11:26:56,656][01013] Avg episode reward: [(0, '2.645')]
[2025-02-05 11:26:56,815][02144] Decorrelating experience for 96 frames...
[2025-02-05 11:26:56,949][02147] Decorrelating experience for 96 frames...
[2025-02-05 11:26:57,529][02145] Decorrelating experience for 96 frames...
[2025-02-05 11:26:58,908][02143] Decorrelating experience for 96 frames...
[2025-02-05 11:27:00,652][02128] Signal inference workers to stop experience collection...
[2025-02-05 11:27:00,677][02141] InferenceWorker_p0-w0: stopping experience collection
[2025-02-05 11:27:00,953][02146] Decorrelating experience for 32 frames...
[2025-02-05 11:27:01,654][01013] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 245.0. Samples: 2450. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-05 11:27:01,658][01013] Avg episode reward: [(0, '2.664')]
[2025-02-05 11:27:01,901][02128] Signal inference workers to resume experience collection...
[2025-02-05 11:27:01,902][02141] InferenceWorker_p0-w0: resuming experience collection
[2025-02-05 11:27:03,297][02146] Decorrelating experience for 64 frames...
[2025-02-05 11:27:06,654][01013] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 16384. Throughput: 0: 222.9. Samples: 3344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
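The "Decorrelating experience for N frames..." entries above show each rollout worker burning a different number of warm-up frames (0/32/64/96) before real collection starts, so the eight parallel workers do not begin their episodes in lockstep. A toy illustration of the idea, assuming Gymnasium-style environments; this is not Sample Factory's actual implementation:

def decorrelate(envs, block: int = 32, max_blocks: int = 4):
    """Step each env a different multiple of `block` random-action frames
    so parallel workers start out of phase (toy version of the idea)."""
    for i, env in enumerate(envs):
        n = (i % max_blocks) * block  # 0, 32, 64, 96, 0, ...
        print(f"Decorrelating experience for {n} frames...")
        env.reset()
        for _ in range(n):
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()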
[2025-02-05 11:27:06,656][01013] Avg episode reward: [(0, '3.332')]
[2025-02-05 11:27:06,827][02146] Decorrelating experience for 96 frames...
[2025-02-05 11:27:11,386][02141] Updated weights for policy 0, policy_version 10 (0.0024)
[2025-02-05 11:27:11,654][01013] Fps is (10 sec: 4096.0, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 40960. Throughput: 0: 447.6. Samples: 8952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:27:11,659][01013] Avg episode reward: [(0, '4.011')]
[2025-02-05 11:27:16,654][01013] Fps is (10 sec: 4505.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 644.2. Samples: 16104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:27:16,659][01013] Avg episode reward: [(0, '4.422')]
[2025-02-05 11:27:21,654][01013] Fps is (10 sec: 3686.4, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 77824. Throughput: 0: 616.5. Samples: 18494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:27:21,658][01013] Avg episode reward: [(0, '4.407')]
[2025-02-05 11:27:22,556][02141] Updated weights for policy 0, policy_version 20 (0.0025)
[2025-02-05 11:27:26,654][01013] Fps is (10 sec: 3276.8, 60 sec: 2691.7, 300 sec: 2691.7). Total num frames: 94208. Throughput: 0: 653.7. Samples: 22878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:27:26,658][01013] Avg episode reward: [(0, '4.468')]
[2025-02-05 11:27:31,654][01013] Fps is (10 sec: 4096.0, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 118784. Throughput: 0: 747.4. Samples: 29894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:27:31,660][01013] Avg episode reward: [(0, '4.367')]
[2025-02-05 11:27:31,662][02128] Saving new best policy, reward=4.367!
[2025-02-05 11:27:31,914][02141] Updated weights for policy 0, policy_version 30 (0.0025)
[2025-02-05 11:27:36,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 135168. Throughput: 0: 736.7. Samples: 33150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:27:36,657][01013] Avg episode reward: [(0, '4.281')]
[2025-02-05 11:27:41,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 151552. Throughput: 0: 831.3. Samples: 37420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:27:41,661][01013] Avg episode reward: [(0, '4.332')]
[2025-02-05 11:27:43,595][02141] Updated weights for policy 0, policy_version 40 (0.0015)
[2025-02-05 11:27:46,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3202.3, 300 sec: 3202.3). Total num frames: 176128. Throughput: 0: 926.5. Samples: 44144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:27:46,661][01013] Avg episode reward: [(0, '4.352')]
[2025-02-05 11:27:51,656][01013] Fps is (10 sec: 4504.8, 60 sec: 3276.7, 300 sec: 3276.7). Total num frames: 196608. Throughput: 0: 984.9. Samples: 47664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:27:51,659][01013] Avg episode reward: [(0, '4.303')]
[2025-02-05 11:27:53,201][02141] Updated weights for policy 0, policy_version 50 (0.0017)
[2025-02-05 11:27:56,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 212992. Throughput: 0: 976.2. Samples: 52882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:27:56,658][01013] Avg episode reward: [(0, '4.302')]
[2025-02-05 11:28:01,654][01013] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 233472. Throughput: 0: 944.8. Samples: 58620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
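Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports frame throughput over three trailing windows, which is why the 60 s and 300 s figures lag the 10 s figure early in the run. A sketch of that bookkeeping, assuming a monotonically growing frame counter sampled every few seconds; the class and its behavior are illustrative, not Sample Factory's code:

import time
from collections import deque

class FpsTracker:
    """Trailing-window FPS over 10/60/300 s, in the spirit of the report lines above."""
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # (timestamp, total_frames) samples

    def update(self, total_frames: int, now: float | None = None) -> dict[int, float]:
        now = time.monotonic() if now is None else now
        self.history.append((now, total_frames))
        # Drop samples older than the largest window.
        while self.history and now - self.history[0][0] > max(self.windows):
            self.history.popleft()
        fps = {}
        for w in self.windows:
            # Oldest sample still inside this window, if any.
            old = next(((t, f) for t, f in self.history if now - t <= w), None)
            fps[w] = (total_frames - old[1]) / w if old else float("nan")
        return fps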
[2025-02-05 11:28:01,660][01013] Avg episode reward: [(0, '4.304')]
[2025-02-05 11:28:03,710][02141] Updated weights for policy 0, policy_version 60 (0.0028)
[2025-02-05 11:28:06,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 970.0. Samples: 62144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:28:06,657][01013] Avg episode reward: [(0, '4.415')]
[2025-02-05 11:28:06,665][02128] Saving new best policy, reward=4.415!
[2025-02-05 11:28:11,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 1008.2. Samples: 68246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:28:11,657][01013] Avg episode reward: [(0, '4.582')]
[2025-02-05 11:28:11,660][02128] Saving new best policy, reward=4.582!
[2025-02-05 11:28:15,115][02141] Updated weights for policy 0, policy_version 70 (0.0014)
[2025-02-05 11:28:16,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3421.4). Total num frames: 290816. Throughput: 0: 958.9. Samples: 73046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:28:16,661][01013] Avg episode reward: [(0, '4.579')]
[2025-02-05 11:28:16,673][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth...
[2025-02-05 11:28:21,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3504.4). Total num frames: 315392. Throughput: 0: 964.2. Samples: 76540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:28:21,661][01013] Avg episode reward: [(0, '4.478')]
[2025-02-05 11:28:23,869][02141] Updated weights for policy 0, policy_version 80 (0.0029)
[2025-02-05 11:28:26,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 1025.8. Samples: 83582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:28:26,658][01013] Avg episode reward: [(0, '4.498')]
[2025-02-05 11:28:31,655][01013] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3522.5). Total num frames: 352256. Throughput: 0: 977.8. Samples: 88146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:28:31,661][01013] Avg episode reward: [(0, '4.508')]
[2025-02-05 11:28:35,378][02141] Updated weights for policy 0, policy_version 90 (0.0030)
[2025-02-05 11:28:36,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 963.9. Samples: 91038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:28:36,660][01013] Avg episode reward: [(0, '4.590')]
[2025-02-05 11:28:36,667][02128] Saving new best policy, reward=4.590!
[2025-02-05 11:28:41,654][01013] Fps is (10 sec: 4506.1, 60 sec: 4096.0, 300 sec: 3611.9). Total num frames: 397312. Throughput: 0: 999.0. Samples: 97838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:28:41,660][01013] Avg episode reward: [(0, '4.567')]
[2025-02-05 11:28:45,198][02141] Updated weights for policy 0, policy_version 100 (0.0034)
[2025-02-05 11:28:46,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3561.7). Total num frames: 409600. Throughput: 0: 992.4. Samples: 103280. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:28:46,657][01013] Avg episode reward: [(0, '4.557')]
[2025-02-05 11:28:51,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3584.0). Total num frames: 430080. Throughput: 0: 963.3. Samples: 105494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
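The "Policy #0 lag: (min/avg/max)" figures measure how many policy versions older the weights that generated each sampled trajectory are than the learner's current version; the -1.0 values at startup correspond to no samples existing yet. A hedged sketch of that statistic (illustrative only; the real bookkeeping lives in Sample Factory's learner):

def policy_lag(current_version: int, sample_versions: list[int]) -> tuple[float, float, float]:
    """Min/avg/max staleness of a batch, as in the 'Policy #0 lag' entries."""
    if not sample_versions:
        return (-1.0, -1.0, -1.0)  # matches the log before any samples arrive
    lags = [current_version - v for v in sample_versions]
    return (float(min(lags)), sum(lags) / len(lags), float(max(lags)))

# e.g. learner at version 30, batch collected by versions 28-30:
print(policy_lag(30, [30, 29, 28, 30]))  # (0.0, 0.75, 2.0)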
[2025-02-05 11:28:51,656][01013] Avg episode reward: [(0, '4.529')]
[2025-02-05 11:28:55,442][02141] Updated weights for policy 0, policy_version 110 (0.0021)
[2025-02-05 11:28:56,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3637.3). Total num frames: 454656. Throughput: 0: 980.7. Samples: 112376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:28:56,657][01013] Avg episode reward: [(0, '4.665')]
[2025-02-05 11:28:56,663][02128] Saving new best policy, reward=4.665!
[2025-02-05 11:29:01,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3654.9). Total num frames: 475136. Throughput: 0: 1014.6. Samples: 118702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-05 11:29:01,657][01013] Avg episode reward: [(0, '4.751')]
[2025-02-05 11:29:01,658][02128] Saving new best policy, reward=4.751!
[2025-02-05 11:29:06,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3610.6). Total num frames: 487424. Throughput: 0: 983.8. Samples: 120812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:29:06,662][01013] Avg episode reward: [(0, '4.650')]
[2025-02-05 11:29:07,212][02141] Updated weights for policy 0, policy_version 120 (0.0020)
[2025-02-05 11:29:11,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3657.1). Total num frames: 512000. Throughput: 0: 953.9. Samples: 126508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:29:11,658][01013] Avg episode reward: [(0, '4.610')]
[2025-02-05 11:29:15,717][02141] Updated weights for policy 0, policy_version 130 (0.0017)
[2025-02-05 11:29:16,654][01013] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3700.5). Total num frames: 536576. Throughput: 0: 1011.2. Samples: 133650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-05 11:29:16,657][01013] Avg episode reward: [(0, '4.536')]
[2025-02-05 11:29:21,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3659.1). Total num frames: 548864. Throughput: 0: 1004.5. Samples: 136242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:29:21,656][01013] Avg episode reward: [(0, '4.598')]
[2025-02-05 11:29:26,654][01013] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3673.2). Total num frames: 569344. Throughput: 0: 959.6. Samples: 141018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:29:26,656][01013] Avg episode reward: [(0, '4.575')]
[2025-02-05 11:29:27,259][02141] Updated weights for policy 0, policy_version 140 (0.0021)
[2025-02-05 11:29:31,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3712.0). Total num frames: 593920. Throughput: 0: 998.4. Samples: 148206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-05 11:29:31,660][01013] Avg episode reward: [(0, '4.576')]
[2025-02-05 11:29:36,655][01013] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3698.8). Total num frames: 610304. Throughput: 0: 1026.1. Samples: 151670. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-05 11:29:36,658][01013] Avg episode reward: [(0, '4.813')]
[2025-02-05 11:29:36,663][02128] Saving new best policy, reward=4.813!
[2025-02-05 11:29:37,092][02141] Updated weights for policy 0, policy_version 150 (0.0046)
[2025-02-05 11:29:41,658][01013] Fps is (10 sec: 2866.1, 60 sec: 3754.4, 300 sec: 3662.2). Total num frames: 622592. Throughput: 0: 966.1. Samples: 155852. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-05 11:29:41,664][01013] Avg episode reward: [(0, '4.711')]
[2025-02-05 11:29:46,657][01013] Fps is (10 sec: 2866.6, 60 sec: 3822.8, 300 sec: 3651.2). Total num frames: 638976. Throughput: 0: 915.5. Samples: 159904. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-05 11:29:46,660][01013] Avg episode reward: [(0, '4.713')]
[2025-02-05 11:29:50,394][02141] Updated weights for policy 0, policy_version 160 (0.0027)
[2025-02-05 11:29:51,654][01013] Fps is (10 sec: 3687.8, 60 sec: 3822.9, 300 sec: 3663.6). Total num frames: 659456. Throughput: 0: 923.6. Samples: 162376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:29:51,657][01013] Avg episode reward: [(0, '4.569')]
[2025-02-05 11:29:56,654][01013] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3631.1). Total num frames: 671744. Throughput: 0: 923.2. Samples: 168050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-05 11:29:56,658][01013] Avg episode reward: [(0, '4.766')]
[2025-02-05 11:30:01,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3643.3). Total num frames: 692224. Throughput: 0: 877.7. Samples: 173148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:30:01,656][01013] Avg episode reward: [(0, '4.705')]
[2025-02-05 11:30:02,124][02141] Updated weights for policy 0, policy_version 170 (0.0022)
[2025-02-05 11:30:06,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3675.9). Total num frames: 716800. Throughput: 0: 899.3. Samples: 176712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:30:06,657][01013] Avg episode reward: [(0, '4.798')]
[2025-02-05 11:30:11,656][01013] Fps is (10 sec: 4095.3, 60 sec: 3686.3, 300 sec: 3665.9). Total num frames: 733184. Throughput: 0: 941.3. Samples: 183380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:30:11,658][01013] Avg episode reward: [(0, '4.778')]
[2025-02-05 11:30:12,128][02141] Updated weights for policy 0, policy_version 180 (0.0016)
[2025-02-05 11:30:16,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3656.4). Total num frames: 749568. Throughput: 0: 876.7. Samples: 187658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:30:16,660][01013] Avg episode reward: [(0, '4.688')]
[2025-02-05 11:30:16,669][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth...
[2025-02-05 11:30:21,654][01013] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3666.9). Total num frames: 770048. Throughput: 0: 872.6. Samples: 190938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:30:21,662][01013] Avg episode reward: [(0, '4.648')]
[2025-02-05 11:30:22,567][02141] Updated weights for policy 0, policy_version 190 (0.0022)
[2025-02-05 11:30:26,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3695.9). Total num frames: 794624. Throughput: 0: 933.7. Samples: 197864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:30:26,662][01013] Avg episode reward: [(0, '4.698')]
[2025-02-05 11:30:31,657][01013] Fps is (10 sec: 4094.8, 60 sec: 3618.0, 300 sec: 3686.4). Total num frames: 811008. Throughput: 0: 951.2. Samples: 202710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:30:31,660][01013] Avg episode reward: [(0, '4.849')]
[2025-02-05 11:30:31,665][02128] Saving new best policy, reward=4.849!
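The recurring "Saving new best policy, reward=...!" entries come from the learner tracking the best average episode reward seen so far and writing a fresh snapshot whenever it improves. A minimal version of that rule; the class name and file path are illustrative, with only the train_dir layout taken from this run:

import torch

class BestPolicySaver:
    """Save a snapshot whenever the avg episode reward beats the previous best."""
    def __init__(self, path: str = "/content/train_dir/default_experiment/best_policy.pth"):
        self.path = path
        self.best = float("-inf")

    def maybe_save(self, model: torch.nn.Module, avg_reward: float) -> bool:
        if avg_reward <= self.best:
            return False
        self.best = avg_reward
        torch.save(model.state_dict(), self.path)
        print(f"Saving new best policy, reward={avg_reward:.3f}!")
        return True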
[2025-02-05 11:30:33,988][02141] Updated weights for policy 0, policy_version 200 (0.0017)
[2025-02-05 11:30:36,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3677.3). Total num frames: 827392. Throughput: 0: 947.5. Samples: 205012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:30:36,661][01013] Avg episode reward: [(0, '4.974')]
[2025-02-05 11:30:36,726][02128] Saving new best policy, reward=4.974!
[2025-02-05 11:30:41,654][01013] Fps is (10 sec: 4097.2, 60 sec: 3823.2, 300 sec: 3704.2). Total num frames: 851968. Throughput: 0: 968.9. Samples: 211652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-05 11:30:41,657][01013] Avg episode reward: [(0, '4.976')]
[2025-02-05 11:30:41,659][02128] Saving new best policy, reward=4.976!
[2025-02-05 11:30:43,256][02141] Updated weights for policy 0, policy_version 210 (0.0018)
[2025-02-05 11:30:46,655][01013] Fps is (10 sec: 4095.6, 60 sec: 3823.1, 300 sec: 3695.1). Total num frames: 868352. Throughput: 0: 982.5. Samples: 217360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:30:46,657][01013] Avg episode reward: [(0, '4.835')]
[2025-02-05 11:30:51,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3686.4). Total num frames: 884736. Throughput: 0: 950.2. Samples: 219472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:30:51,664][01013] Avg episode reward: [(0, '4.603')]
[2025-02-05 11:30:54,968][02141] Updated weights for policy 0, policy_version 220 (0.0021)
[2025-02-05 11:30:56,654][01013] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3694.8). Total num frames: 905216. Throughput: 0: 942.2. Samples: 225778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:30:56,660][01013] Avg episode reward: [(0, '4.714')]
[2025-02-05 11:31:01,656][01013] Fps is (10 sec: 4504.6, 60 sec: 3959.3, 300 sec: 3719.1). Total num frames: 929792. Throughput: 0: 998.1. Samples: 232574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:31:01,666][01013] Avg episode reward: [(0, '4.546')]
[2025-02-05 11:31:05,608][02141] Updated weights for policy 0, policy_version 230 (0.0032)
[2025-02-05 11:31:06,659][01013] Fps is (10 sec: 3684.6, 60 sec: 3754.4, 300 sec: 3694.4). Total num frames: 942080. Throughput: 0: 973.7. Samples: 234760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:31:06,662][01013] Avg episode reward: [(0, '4.527')]
[2025-02-05 11:31:11,654][01013] Fps is (10 sec: 3277.5, 60 sec: 3823.0, 300 sec: 3702.2). Total num frames: 962560. Throughput: 0: 936.2. Samples: 239992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:31:11,664][01013] Avg episode reward: [(0, '4.530')]
[2025-02-05 11:31:15,154][02141] Updated weights for policy 0, policy_version 240 (0.0018)
[2025-02-05 11:31:16,654][01013] Fps is (10 sec: 4507.8, 60 sec: 3959.5, 300 sec: 3725.0). Total num frames: 987136. Throughput: 0: 985.4. Samples: 247052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:31:16,657][01013] Avg episode reward: [(0, '4.472')]
[2025-02-05 11:31:21,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3716.7). Total num frames: 1003520. Throughput: 0: 1001.5. Samples: 250080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:31:21,660][01013] Avg episode reward: [(0, '4.775')]
[2025-02-05 11:31:26,535][02141] Updated weights for policy 0, policy_version 250 (0.0025)
[2025-02-05 11:31:26,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3723.6). Total num frames: 1024000. Throughput: 0: 948.2. Samples: 254322. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:31:26,658][01013] Avg episode reward: [(0, '4.781')]
[2025-02-05 11:31:31,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3730.3). Total num frames: 1044480. Throughput: 0: 979.2. Samples: 261424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:31:31,656][01013] Avg episode reward: [(0, '5.045')]
[2025-02-05 11:31:31,663][02128] Saving new best policy, reward=5.045!
[2025-02-05 11:31:35,714][02141] Updated weights for policy 0, policy_version 260 (0.0028)
[2025-02-05 11:31:36,657][01013] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3736.7). Total num frames: 1064960. Throughput: 0: 1006.7. Samples: 264778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:31:36,664][01013] Avg episode reward: [(0, '5.128')]
[2025-02-05 11:31:36,675][02128] Saving new best policy, reward=5.128!
[2025-02-05 11:31:41,655][01013] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3728.8). Total num frames: 1081344. Throughput: 0: 972.2. Samples: 269526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:31:41,660][01013] Avg episode reward: [(0, '5.245')]
[2025-02-05 11:31:41,666][02128] Saving new best policy, reward=5.245!
[2025-02-05 11:31:46,654][01013] Fps is (10 sec: 3687.5, 60 sec: 3891.3, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 946.1. Samples: 275148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:31:46,660][01013] Avg episode reward: [(0, '4.967')]
[2025-02-05 11:31:47,372][02141] Updated weights for policy 0, policy_version 270 (0.0029)
[2025-02-05 11:31:51,654][01013] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 974.6. Samples: 278614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:31:51,661][01013] Avg episode reward: [(0, '5.062')]
[2025-02-05 11:31:56,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1138688. Throughput: 0: 988.9. Samples: 284492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:31:56,658][01013] Avg episode reward: [(0, '5.171')]
[2025-02-05 11:31:58,711][02141] Updated weights for policy 0, policy_version 280 (0.0019)
[2025-02-05 11:32:01,655][01013] Fps is (10 sec: 3686.0, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 1159168. Throughput: 0: 939.6. Samples: 289334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:32:01,659][01013] Avg episode reward: [(0, '5.268')]
[2025-02-05 11:32:01,661][02128] Saving new best policy, reward=5.268!
[2025-02-05 11:32:06,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3860.0). Total num frames: 1179648. Throughput: 0: 950.2. Samples: 292838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:32:06,661][01013] Avg episode reward: [(0, '5.373')]
[2025-02-05 11:32:06,667][02128] Saving new best policy, reward=5.373!
[2025-02-05 11:32:08,103][02141] Updated weights for policy 0, policy_version 290 (0.0018)
[2025-02-05 11:32:11,654][01013] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1200128. Throughput: 0: 1006.4. Samples: 299608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:32:11,659][01013] Avg episode reward: [(0, '5.380')]
[2025-02-05 11:32:11,661][02128] Saving new best policy, reward=5.380!
[2025-02-05 11:32:16,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1212416. Throughput: 0: 940.2. Samples: 303734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:32:16,656][01013] Avg episode reward: [(0, '5.344')]
[2025-02-05 11:32:16,669][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth...
[2025-02-05 11:32:16,830][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth
[2025-02-05 11:32:19,594][02141] Updated weights for policy 0, policy_version 300 (0.0031)
[2025-02-05 11:32:21,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1236992. Throughput: 0: 930.3. Samples: 306638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:32:21,657][01013] Avg episode reward: [(0, '5.253')]
[2025-02-05 11:32:26,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1257472. Throughput: 0: 979.9. Samples: 313620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:32:26,657][01013] Avg episode reward: [(0, '5.277')]
[2025-02-05 11:32:29,644][02141] Updated weights for policy 0, policy_version 310 (0.0028)
[2025-02-05 11:32:31,656][01013] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 1273856. Throughput: 0: 969.5. Samples: 318776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:32:31,659][01013] Avg episode reward: [(0, '5.528')]
[2025-02-05 11:32:31,667][02128] Saving new best policy, reward=5.528!
[2025-02-05 11:32:36,654][01013] Fps is (10 sec: 3686.3, 60 sec: 3823.1, 300 sec: 3873.8). Total num frames: 1294336. Throughput: 0: 939.8. Samples: 320906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-05 11:32:36,656][01013] Avg episode reward: [(0, '5.605')]
[2025-02-05 11:32:36,667][02128] Saving new best policy, reward=5.605!
[2025-02-05 11:32:40,284][02141] Updated weights for policy 0, policy_version 320 (0.0027)
[2025-02-05 11:32:41,654][01013] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1314816. Throughput: 0: 959.9. Samples: 327686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-05 11:32:41,658][01013] Avg episode reward: [(0, '5.492')]
[2025-02-05 11:32:46,654][01013] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1335296. Throughput: 0: 992.6. Samples: 333998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:32:46,657][01013] Avg episode reward: [(0, '5.319')]
[2025-02-05 11:32:51,654][01013] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1347584. Throughput: 0: 962.4. Samples: 336144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:32:51,657][01013] Avg episode reward: [(0, '5.305')]
[2025-02-05 11:32:51,670][02141] Updated weights for policy 0, policy_version 330 (0.0022)
[2025-02-05 11:32:56,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1372160. Throughput: 0: 945.9. Samples: 342172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:32:56,658][01013] Avg episode reward: [(0, '5.477')]
[2025-02-05 11:33:00,206][02141] Updated weights for policy 0, policy_version 340 (0.0021)
[2025-02-05 11:33:01,654][01013] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1396736. Throughput: 0: 1014.4. Samples: 349380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:33:01,658][01013] Avg episode reward: [(0, '5.565')]
[2025-02-05 11:33:06,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1409024. Throughput: 0: 1000.0. Samples: 351640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
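The Saving/Removing pair above (checkpoint_000000296_1212416.pth written, checkpoint_000000071_290816.pth deleted) shows the periodic checkpoint rotation: files are named checkpoint_<policy_version>_<env_frames>.pth and only the newest few are kept (two, judging by the pairs in this run). A sketch of that rotation; the function name is illustrative and the keep=2 policy is inferred from the log:

import glob
import os
import torch

def save_with_rotation(model, ckpt_dir: str, version: int, frames: int, keep: int = 2) -> str:
    """Write checkpoint_<version>_<frames>.pth and prune all but the newest `keep`."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"checkpoint_{version:09d}_{frames}.pth")
    torch.save(model.state_dict(), path)
    print(f"Saving {path}...")
    # Zero-padded version numbers make lexicographic order chronological.
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in ckpts[:-keep]:
        print(f"Removing {old}")
        os.remove(old)
    return path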
[2025-02-05 11:33:06,660][01013] Avg episode reward: [(0, '5.363')]
[2025-02-05 11:33:11,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1429504. Throughput: 0: 955.0. Samples: 356594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:33:11,657][01013] Avg episode reward: [(0, '5.393')]
[2025-02-05 11:33:11,767][02141] Updated weights for policy 0, policy_version 350 (0.0023)
[2025-02-05 11:33:16,654][01013] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 1454080. Throughput: 0: 1001.2. Samples: 363830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:33:16,659][01013] Avg episode reward: [(0, '5.502')]
[2025-02-05 11:33:21,153][02141] Updated weights for policy 0, policy_version 360 (0.0021)
[2025-02-05 11:33:21,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1474560. Throughput: 0: 1030.6. Samples: 367282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:33:21,657][01013] Avg episode reward: [(0, '5.870')]
[2025-02-05 11:33:21,663][02128] Saving new best policy, reward=5.870!
[2025-02-05 11:33:26,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1486848. Throughput: 0: 970.9. Samples: 371376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:33:26,656][01013] Avg episode reward: [(0, '5.888')]
[2025-02-05 11:33:26,664][02128] Saving new best policy, reward=5.888!
[2025-02-05 11:33:31,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 1511424. Throughput: 0: 973.2. Samples: 377794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:33:31,656][01013] Avg episode reward: [(0, '5.949')]
[2025-02-05 11:33:31,662][02128] Saving new best policy, reward=5.949!
[2025-02-05 11:33:32,189][02141] Updated weights for policy 0, policy_version 370 (0.0026)
[2025-02-05 11:33:36,654][01013] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1531904. Throughput: 0: 1003.5. Samples: 381304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:33:36,660][01013] Avg episode reward: [(0, '6.250')]
[2025-02-05 11:33:36,671][02128] Saving new best policy, reward=6.250!
[2025-02-05 11:33:41,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1548288. Throughput: 0: 983.3. Samples: 386422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:33:41,659][01013] Avg episode reward: [(0, '6.023')]
[2025-02-05 11:33:44,015][02141] Updated weights for policy 0, policy_version 380 (0.0023)
[2025-02-05 11:33:46,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1568768. Throughput: 0: 941.2. Samples: 391732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:33:46,662][01013] Avg episode reward: [(0, '5.856')]
[2025-02-05 11:33:51,654][01013] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1589248. Throughput: 0: 969.9. Samples: 395286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:33:51,663][01013] Avg episode reward: [(0, '6.301')]
[2025-02-05 11:33:51,665][02128] Saving new best policy, reward=6.301!
[2025-02-05 11:33:53,090][02141] Updated weights for policy 0, policy_version 390 (0.0023)
[2025-02-05 11:33:56,655][01013] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 1605632. Throughput: 0: 998.7. Samples: 401536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:33:56,657][01013] Avg episode reward: [(0, '6.489')]
[2025-02-05 11:33:56,672][02128] Saving new best policy, reward=6.489!
[2025-02-05 11:34:01,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1622016. Throughput: 0: 932.0. Samples: 405770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:34:01,662][01013] Avg episode reward: [(0, '6.269')]
[2025-02-05 11:34:04,679][02141] Updated weights for policy 0, policy_version 400 (0.0015)
[2025-02-05 11:34:06,654][01013] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1646592. Throughput: 0: 931.5. Samples: 409200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:34:06,665][01013] Avg episode reward: [(0, '6.035')]
[2025-02-05 11:34:11,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1667072. Throughput: 0: 994.1. Samples: 416112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:34:11,656][01013] Avg episode reward: [(0, '6.262')]
[2025-02-05 11:34:15,573][02141] Updated weights for policy 0, policy_version 410 (0.0028)
[2025-02-05 11:34:16,655][01013] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 1679360. Throughput: 0: 950.5. Samples: 420568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:34:16,664][01013] Avg episode reward: [(0, '6.224')]
[2025-02-05 11:34:16,671][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth...
[2025-02-05 11:34:16,834][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000183_749568.pth
[2025-02-05 11:34:21,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1699840. Throughput: 0: 923.6. Samples: 422866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:34:21,660][01013] Avg episode reward: [(0, '6.165')]
[2025-02-05 11:34:25,862][02141] Updated weights for policy 0, policy_version 420 (0.0027)
[2025-02-05 11:34:26,654][01013] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1720320. Throughput: 0: 958.2. Samples: 429540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:34:26,656][01013] Avg episode reward: [(0, '6.056')]
[2025-02-05 11:34:31,655][01013] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 1736704. Throughput: 0: 956.7. Samples: 434786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:34:31,660][01013] Avg episode reward: [(0, '5.807')]
[2025-02-05 11:34:36,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1753088. Throughput: 0: 922.4. Samples: 436792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:34:36,661][01013] Avg episode reward: [(0, '6.203')]
[2025-02-05 11:34:37,843][02141] Updated weights for policy 0, policy_version 430 (0.0014)
[2025-02-05 11:34:41,654][01013] Fps is (10 sec: 3277.2, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1769472. Throughput: 0: 905.7. Samples: 442290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:34:41,659][01013] Avg episode reward: [(0, '6.192')]
[2025-02-05 11:34:46,660][01013] Fps is (10 sec: 2865.6, 60 sec: 3549.5, 300 sec: 3804.3). Total num frames: 1781760. Throughput: 0: 900.2. Samples: 446282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:34:46,668][01013] Avg episode reward: [(0, '6.721')]
[2025-02-05 11:34:46,677][02128] Saving new best policy, reward=6.721!
[2025-02-05 11:34:51,654][01013] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3818.3). Total num frames: 1798144. Throughput: 0: 868.3. Samples: 448272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:34:51,656][01013] Avg episode reward: [(0, '7.065')]
[2025-02-05 11:34:51,663][02128] Saving new best policy, reward=7.065!
[2025-02-05 11:34:52,871][02141] Updated weights for policy 0, policy_version 440 (0.0028)
[2025-02-05 11:34:56,654][01013] Fps is (10 sec: 3688.5, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 1818624. Throughput: 0: 828.3. Samples: 453386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:34:56,657][01013] Avg episode reward: [(0, '6.951')]
[2025-02-05 11:35:01,654][01013] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1839104. Throughput: 0: 882.9. Samples: 460298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:35:01,657][01013] Avg episode reward: [(0, '6.399')]
[2025-02-05 11:35:01,728][02141] Updated weights for policy 0, policy_version 450 (0.0025)
[2025-02-05 11:35:06,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3804.4). Total num frames: 1855488. Throughput: 0: 896.3. Samples: 463198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:35:06,659][01013] Avg episode reward: [(0, '6.226')]
[2025-02-05 11:35:11,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3804.4). Total num frames: 1871872. Throughput: 0: 835.2. Samples: 467124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:35:11,657][01013] Avg episode reward: [(0, '6.327')]
[2025-02-05 11:35:13,657][02141] Updated weights for policy 0, policy_version 460 (0.0032)
[2025-02-05 11:35:16,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3818.3). Total num frames: 1896448. Throughput: 0: 871.5. Samples: 474002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:35:16,657][01013] Avg episode reward: [(0, '6.979')]
[2025-02-05 11:35:21,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1916928. Throughput: 0: 903.2. Samples: 477436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:35:21,664][01013] Avg episode reward: [(0, '7.309')]
[2025-02-05 11:35:21,666][02128] Saving new best policy, reward=7.309!
[2025-02-05 11:35:24,601][02141] Updated weights for policy 0, policy_version 470 (0.0022)
[2025-02-05 11:35:26,655][01013] Fps is (10 sec: 3276.5, 60 sec: 3481.5, 300 sec: 3790.6). Total num frames: 1929216. Throughput: 0: 880.7. Samples: 481924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:35:26,661][01013] Avg episode reward: [(0, '7.504')]
[2025-02-05 11:35:26,681][02128] Saving new best policy, reward=7.504!
[2025-02-05 11:35:31,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 1949696. Throughput: 0: 913.6. Samples: 487388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:35:31,659][01013] Avg episode reward: [(0, '7.693')]
[2025-02-05 11:35:31,663][02128] Saving new best policy, reward=7.693!
[2025-02-05 11:35:35,027][02141] Updated weights for policy 0, policy_version 480 (0.0022)
[2025-02-05 11:35:36,654][01013] Fps is (10 sec: 4096.4, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 1970176. Throughput: 0: 942.4. Samples: 490680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:35:36,660][01013] Avg episode reward: [(0, '8.125')]
[2025-02-05 11:35:36,667][02128] Saving new best policy, reward=8.125!
[2025-02-05 11:35:41,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 1986560. Throughput: 0: 951.6. Samples: 496210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:35:41,657][01013] Avg episode reward: [(0, '8.285')]
[2025-02-05 11:35:41,663][02128] Saving new best policy, reward=8.285!
[2025-02-05 11:35:46,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3790.5). Total num frames: 2002944. Throughput: 0: 901.3. Samples: 500858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:35:46,657][01013] Avg episode reward: [(0, '7.711')]
[2025-02-05 11:35:47,048][02141] Updated weights for policy 0, policy_version 490 (0.0027)
[2025-02-05 11:35:51,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2027520. Throughput: 0: 911.9. Samples: 504234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:35:51,658][01013] Avg episode reward: [(0, '7.280')]
[2025-02-05 11:35:56,657][01013] Fps is (10 sec: 4094.8, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 2043904. Throughput: 0: 975.4. Samples: 511018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:35:56,659][01013] Avg episode reward: [(0, '7.572')]
[2025-02-05 11:35:57,140][02141] Updated weights for policy 0, policy_version 500 (0.0012)
[2025-02-05 11:36:01,654][01013] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2056192. Throughput: 0: 913.1. Samples: 515092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:36:01,657][01013] Avg episode reward: [(0, '7.616')]
[2025-02-05 11:36:06,654][01013] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2080768. Throughput: 0: 898.0. Samples: 517844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:36:06,661][01013] Avg episode reward: [(0, '8.349')]
[2025-02-05 11:36:06,666][02128] Saving new best policy, reward=8.349!
[2025-02-05 11:36:08,285][02141] Updated weights for policy 0, policy_version 510 (0.0038)
[2025-02-05 11:36:11,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2101248. Throughput: 0: 944.8. Samples: 524438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:36:11,659][01013] Avg episode reward: [(0, '8.128')]
[2025-02-05 11:36:16,655][01013] Fps is (10 sec: 3686.0, 60 sec: 3686.3, 300 sec: 3776.6). Total num frames: 2117632. Throughput: 0: 937.2. Samples: 529564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:36:16,662][01013] Avg episode reward: [(0, '9.181')]
[2025-02-05 11:36:16,671][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth...
[2025-02-05 11:36:16,834][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth
[2025-02-05 11:36:16,863][02128] Saving new best policy, reward=9.181!
[2025-02-05 11:36:20,466][02141] Updated weights for policy 0, policy_version 520 (0.0021)
[2025-02-05 11:36:21,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 2134016. Throughput: 0: 907.3. Samples: 531508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:36:21,661][01013] Avg episode reward: [(0, '9.092')]
[2025-02-05 11:36:26,654][01013] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2154496. Throughput: 0: 932.2. Samples: 538160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:36:26,661][01013] Avg episode reward: [(0, '9.037')]
[2025-02-05 11:36:29,353][02141] Updated weights for policy 0, policy_version 530 (0.0012)
[2025-02-05 11:36:31,655][01013] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 2174976. Throughput: 0: 967.0. Samples: 544376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:36:31,658][01013] Avg episode reward: [(0, '8.419')]
[2025-02-05 11:36:36,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2187264. Throughput: 0: 936.5. Samples: 546378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:36:36,657][01013] Avg episode reward: [(0, '8.455')]
[2025-02-05 11:36:41,325][02141] Updated weights for policy 0, policy_version 540 (0.0029)
[2025-02-05 11:36:41,654][01013] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2211840. Throughput: 0: 911.8. Samples: 552048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:36:41,658][01013] Avg episode reward: [(0, '8.392')]
[2025-02-05 11:36:46,654][01013] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2236416. Throughput: 0: 970.5. Samples: 558764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:36:46,657][01013] Avg episode reward: [(0, '8.141')]
[2025-02-05 11:36:51,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2248704. Throughput: 0: 963.2. Samples: 561188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:36:51,658][01013] Avg episode reward: [(0, '8.241')]
[2025-02-05 11:36:52,600][02141] Updated weights for policy 0, policy_version 550 (0.0023)
[2025-02-05 11:36:56,654][01013] Fps is (10 sec: 2867.2, 60 sec: 3686.6, 300 sec: 3748.9). Total num frames: 2265088. Throughput: 0: 919.0. Samples: 565792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:36:56,656][01013] Avg episode reward: [(0, '8.833')]
[2025-02-05 11:37:01,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2289664. Throughput: 0: 955.4. Samples: 572556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:37:01,662][01013] Avg episode reward: [(0, '9.003')]
[2025-02-05 11:37:02,138][02141] Updated weights for policy 0, policy_version 560 (0.0018)
[2025-02-05 11:37:06,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2306048. Throughput: 0: 985.6. Samples: 575858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:37:06,660][01013] Avg episode reward: [(0, '8.719')]
[2025-02-05 11:37:11,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2322432. Throughput: 0: 930.4. Samples: 580030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:37:11,661][01013] Avg episode reward: [(0, '7.920')]
[2025-02-05 11:37:14,060][02141] Updated weights for policy 0, policy_version 570 (0.0013)
[2025-02-05 11:37:16,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2342912. Throughput: 0: 929.4. Samples: 586198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:37:16,662][01013] Avg episode reward: [(0, '8.421')]
[2025-02-05 11:37:21,655][01013] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2367488. Throughput: 0: 958.8. Samples: 589526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-02-05 11:37:21,657][01013] Avg episode reward: [(0, '8.131')]
[2025-02-05 11:37:24,527][02141] Updated weights for policy 0, policy_version 580 (0.0014)
[2025-02-05 11:37:26,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2379776. Throughput: 0: 947.6. Samples: 594692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:37:26,657][01013] Avg episode reward: [(0, '8.159')]
[2025-02-05 11:37:31,654][01013] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2400256. Throughput: 0: 919.0. Samples: 600118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:37:31,657][01013] Avg episode reward: [(0, '8.050')]
[2025-02-05 11:37:34,858][02141] Updated weights for policy 0, policy_version 590 (0.0016)
[2025-02-05 11:37:36,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 2424832. Throughput: 0: 941.4. Samples: 603552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:37:36,661][01013] Avg episode reward: [(0, '8.795')]
[2025-02-05 11:37:41,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2441216. Throughput: 0: 979.2. Samples: 609858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:37:41,657][01013] Avg episode reward: [(0, '9.867')]
[2025-02-05 11:37:41,659][02128] Saving new best policy, reward=9.867!
[2025-02-05 11:37:46,654][01013] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2453504. Throughput: 0: 917.2. Samples: 613830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-05 11:37:46,656][01013] Avg episode reward: [(0, '10.196')]
[2025-02-05 11:37:46,665][02128] Saving new best policy, reward=10.196!
[2025-02-05 11:37:46,902][02141] Updated weights for policy 0, policy_version 600 (0.0016)
[2025-02-05 11:37:51,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2478080. Throughput: 0: 917.0. Samples: 617122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:37:51,662][01013] Avg episode reward: [(0, '9.784')]
[2025-02-05 11:37:55,597][02141] Updated weights for policy 0, policy_version 610 (0.0032)
[2025-02-05 11:37:56,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 2498560. Throughput: 0: 979.1. Samples: 624088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-05 11:37:56,656][01013] Avg episode reward: [(0, '10.793')]
[2025-02-05 11:37:56,662][02128] Saving new best policy, reward=10.793!
[2025-02-05 11:38:01,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2514944. Throughput: 0: 948.2. Samples: 628868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-05 11:38:01,656][01013] Avg episode reward: [(0, '10.391')]
[2025-02-05 11:38:06,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2535424. Throughput: 0: 931.3. Samples: 631436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:38:06,656][01013] Avg episode reward: [(0, '11.390')]
[2025-02-05 11:38:06,664][02128] Saving new best policy, reward=11.390!
[2025-02-05 11:38:07,383][02141] Updated weights for policy 0, policy_version 620 (0.0037)
[2025-02-05 11:38:11,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 2560000. Throughput: 0: 973.6. Samples: 638504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:38:11,657][01013] Avg episode reward: [(0, '10.837')]
[2025-02-05 11:38:16,659][01013] Fps is (10 sec: 4093.9, 60 sec: 3890.9, 300 sec: 3734.9). Total num frames: 2576384. Throughput: 0: 978.3. Samples: 644146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-05 11:38:16,662][01013] Avg episode reward: [(0, '10.837')]
[2025-02-05 11:38:16,673][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth...
[2025-02-05 11:38:16,847][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000410_1679360.pth
[2025-02-05 11:38:18,066][02141] Updated weights for policy 0, policy_version 630 (0.0012)
[2025-02-05 11:38:21,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2592768. Throughput: 0: 948.7. Samples: 646244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:38:21,659][01013] Avg episode reward: [(0, '10.555')]
[2025-02-05 11:38:26,654][01013] Fps is (10 sec: 4098.0, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 2617344. Throughput: 0: 953.8. Samples: 652778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:38:26,661][01013] Avg episode reward: [(0, '10.292')]
[2025-02-05 11:38:27,376][02141] Updated weights for policy 0, policy_version 640 (0.0014)
[2025-02-05 11:38:31,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 2637824. Throughput: 0: 1018.8. Samples: 659674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:38:31,659][01013] Avg episode reward: [(0, '9.631')]
[2025-02-05 11:38:36,654][01013] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2650112. Throughput: 0: 993.0. Samples: 661806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:38:36,660][01013] Avg episode reward: [(0, '10.230')]
[2025-02-05 11:38:38,830][02141] Updated weights for policy 0, policy_version 650 (0.0026)
[2025-02-05 11:38:41,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2674688. Throughput: 0: 963.8. Samples: 667460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:38:41,661][01013] Avg episode reward: [(0, '10.746')]
[2025-02-05 11:38:46,654][01013] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3748.9). Total num frames: 2695168. Throughput: 0: 1012.9. Samples: 674448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:38:46,661][01013] Avg episode reward: [(0, '11.986')]
[2025-02-05 11:38:46,669][02128] Saving new best policy, reward=11.986!
[2025-02-05 11:38:47,826][02141] Updated weights for policy 0, policy_version 660 (0.0032)
[2025-02-05 11:38:51,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2711552. Throughput: 0: 1015.1. Samples: 677114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-05 11:38:51,657][01013] Avg episode reward: [(0, '11.927')]
[2025-02-05 11:38:56,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2732032. Throughput: 0: 954.3. Samples: 681446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-05 11:38:56,660][01013] Avg episode reward: [(0, '12.402')]
[2025-02-05 11:38:56,669][02128] Saving new best policy, reward=12.402!
[2025-02-05 11:38:59,448][02141] Updated weights for policy 0, policy_version 670 (0.0026)
[2025-02-05 11:39:01,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 2752512. Throughput: 0: 980.6. Samples: 688266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
Throughput: 0: 980.6. Samples: 688266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-05 11:39:01,659][01013] Avg episode reward: [(0, '12.694')] [2025-02-05 11:39:01,662][02128] Saving new best policy, reward=12.694! [2025-02-05 11:39:06,655][01013] Fps is (10 sec: 4095.8, 60 sec: 3959.4, 300 sec: 3748.9). Total num frames: 2772992. Throughput: 0: 1012.0. Samples: 691786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:39:06,662][01013] Avg episode reward: [(0, '13.335')] [2025-02-05 11:39:06,669][02128] Saving new best policy, reward=13.335! [2025-02-05 11:39:10,779][02141] Updated weights for policy 0, policy_version 680 (0.0012) [2025-02-05 11:39:11,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2785280. Throughput: 0: 967.6. Samples: 696322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-05 11:39:11,659][01013] Avg episode reward: [(0, '13.055')] [2025-02-05 11:39:16,654][01013] Fps is (10 sec: 3686.6, 60 sec: 3891.5, 300 sec: 3762.8). Total num frames: 2809856. Throughput: 0: 953.5. Samples: 702582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-05 11:39:16,661][01013] Avg episode reward: [(0, '12.355')] [2025-02-05 11:39:19,722][02141] Updated weights for policy 0, policy_version 690 (0.0030) [2025-02-05 11:39:21,654][01013] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 2834432. Throughput: 0: 986.4. Samples: 706196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-02-05 11:39:21,661][01013] Avg episode reward: [(0, '12.835')] [2025-02-05 11:39:26,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2846720. Throughput: 0: 986.9. Samples: 711870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:39:26,658][01013] Avg episode reward: [(0, '12.664')] [2025-02-05 11:39:31,177][02141] Updated weights for policy 0, policy_version 700 (0.0022) [2025-02-05 11:39:31,655][01013] Fps is (10 sec: 3276.5, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2867200. Throughput: 0: 948.5. Samples: 717132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:39:31,660][01013] Avg episode reward: [(0, '12.539')] [2025-02-05 11:39:36,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2879488. Throughput: 0: 938.2. Samples: 719334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:39:36,661][01013] Avg episode reward: [(0, '12.705')] [2025-02-05 11:39:41,654][01013] Fps is (10 sec: 2867.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2895872. Throughput: 0: 944.7. Samples: 723958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:39:41,657][01013] Avg episode reward: [(0, '12.825')] [2025-02-05 11:39:44,890][02141] Updated weights for policy 0, policy_version 710 (0.0028) [2025-02-05 11:39:46,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2912256. Throughput: 0: 888.2. Samples: 728236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:39:46,657][01013] Avg episode reward: [(0, '13.683')] [2025-02-05 11:39:46,670][02128] Saving new best policy, reward=13.683! [2025-02-05 11:39:51,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2932736. Throughput: 0: 879.1. Samples: 731346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:39:51,656][01013] Avg episode reward: [(0, '14.351')] [2025-02-05 11:39:51,717][02128] Saving new best policy, reward=14.351! 
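The "Saving new best policy, reward=N!" lines above come from a simple high-water-mark check: whenever the smoothed average episode reward beats the best value seen so far, the learner writes out a dedicated best-policy snapshot. A minimal sketch of that logic, assuming a PyTorch model (the class and file names are illustrative, not Sample Factory's exact API):

    import os
    import torch

    class BestPolicyTracker:
        """Save a 'best_policy.pth' snapshot whenever the average
        episode reward improves on the best value seen so far."""

        def __init__(self, checkpoint_dir):
            self.checkpoint_dir = checkpoint_dir
            self.best_reward = float("-inf")

        def maybe_save_best(self, avg_episode_reward, model):
            if avg_episode_reward <= self.best_reward:
                return False  # no improvement, nothing to save
            self.best_reward = avg_episode_reward
            path = os.path.join(self.checkpoint_dir, "best_policy.pth")
            torch.save({"model": model.state_dict(),
                        "reward": avg_episode_reward}, path)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
            return True

Because the check runs on a smoothed average, the best-policy file is only rewritten on genuine improvements, which is why the saves above become sparser as the reward curve flattens.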
[2025-02-05 11:39:54,479][02141] Updated weights for policy 0, policy_version 720 (0.0022) [2025-02-05 11:39:56,655][01013] Fps is (10 sec: 4505.3, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 2957312. Throughput: 0: 933.1. Samples: 738314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:39:56,661][01013] Avg episode reward: [(0, '15.986')] [2025-02-05 11:39:56,673][02128] Saving new best policy, reward=15.986! [2025-02-05 11:40:01,655][01013] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 3776.6). Total num frames: 2969600. Throughput: 0: 902.3. Samples: 743188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:40:01,665][01013] Avg episode reward: [(0, '15.417')] [2025-02-05 11:40:06,094][02141] Updated weights for policy 0, policy_version 730 (0.0025) [2025-02-05 11:40:06,654][01013] Fps is (10 sec: 3277.0, 60 sec: 3618.2, 300 sec: 3790.5). Total num frames: 2990080. Throughput: 0: 870.7. Samples: 745378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:40:06,661][01013] Avg episode reward: [(0, '15.100')] [2025-02-05 11:40:11,654][01013] Fps is (10 sec: 4506.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3014656. Throughput: 0: 902.5. Samples: 752482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:40:11,662][01013] Avg episode reward: [(0, '16.347')] [2025-02-05 11:40:11,666][02128] Saving new best policy, reward=16.347! [2025-02-05 11:40:15,306][02141] Updated weights for policy 0, policy_version 740 (0.0019) [2025-02-05 11:40:16,655][01013] Fps is (10 sec: 4095.6, 60 sec: 3686.3, 300 sec: 3776.6). Total num frames: 3031040. Throughput: 0: 921.9. Samples: 758616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:40:16,663][01013] Avg episode reward: [(0, '16.584')] [2025-02-05 11:40:16,734][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000741_3035136.pth... [2025-02-05 11:40:16,890][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth [2025-02-05 11:40:16,921][02128] Saving new best policy, reward=16.584! [2025-02-05 11:40:21,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 3047424. Throughput: 0: 917.8. Samples: 760634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-05 11:40:21,663][01013] Avg episode reward: [(0, '16.717')] [2025-02-05 11:40:21,668][02128] Saving new best policy, reward=16.717! [2025-02-05 11:40:26,365][02141] Updated weights for policy 0, policy_version 750 (0.0023) [2025-02-05 11:40:26,654][01013] Fps is (10 sec: 4096.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3072000. Throughput: 0: 946.8. Samples: 766562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:40:26,660][01013] Avg episode reward: [(0, '15.937')] [2025-02-05 11:40:31,655][01013] Fps is (10 sec: 4505.2, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3092480. Throughput: 0: 1012.7. Samples: 773808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:40:31,657][01013] Avg episode reward: [(0, '15.116')] [2025-02-05 11:40:36,658][01013] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3804.4). Total num frames: 3108864. Throughput: 0: 998.4. Samples: 776276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:40:36,677][01013] Avg episode reward: [(0, '14.974')] [2025-02-05 11:40:37,345][02141] Updated weights for policy 0, policy_version 760 (0.0012) [2025-02-05 11:40:41,654][01013] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3818.3). 
Total num frames: 3129344. Throughput: 0: 955.1. Samples: 781292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:40:41,662][01013] Avg episode reward: [(0, '14.797')] [2025-02-05 11:40:46,418][02141] Updated weights for policy 0, policy_version 770 (0.0022) [2025-02-05 11:40:46,654][01013] Fps is (10 sec: 4507.2, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3153920. Throughput: 0: 1003.2. Samples: 788332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-05 11:40:46,661][01013] Avg episode reward: [(0, '15.947')] [2025-02-05 11:40:51,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3170304. Throughput: 0: 1030.9. Samples: 791768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-05 11:40:51,657][01013] Avg episode reward: [(0, '16.833')] [2025-02-05 11:40:51,659][02128] Saving new best policy, reward=16.833! [2025-02-05 11:40:56,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 3186688. Throughput: 0: 968.8. Samples: 796078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-02-05 11:40:56,657][01013] Avg episode reward: [(0, '16.192')] [2025-02-05 11:40:58,097][02141] Updated weights for policy 0, policy_version 780 (0.0012) [2025-02-05 11:41:01,654][01013] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3832.2). Total num frames: 3211264. Throughput: 0: 979.0. Samples: 802668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-05 11:41:01,657][01013] Avg episode reward: [(0, '16.496')] [2025-02-05 11:41:06,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3231744. Throughput: 0: 1014.5. Samples: 806288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:41:06,659][01013] Avg episode reward: [(0, '17.875')] [2025-02-05 11:41:06,673][02128] Saving new best policy, reward=17.875! [2025-02-05 11:41:06,989][02141] Updated weights for policy 0, policy_version 790 (0.0013) [2025-02-05 11:41:11,656][01013] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 3248128. Throughput: 0: 997.4. Samples: 811448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:41:11,658][01013] Avg episode reward: [(0, '18.450')] [2025-02-05 11:41:11,661][02128] Saving new best policy, reward=18.450! [2025-02-05 11:41:16,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3268608. Throughput: 0: 955.7. Samples: 816814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:41:16,660][01013] Avg episode reward: [(0, '19.196')] [2025-02-05 11:41:16,669][02128] Saving new best policy, reward=19.196! [2025-02-05 11:41:18,598][02141] Updated weights for policy 0, policy_version 800 (0.0049) [2025-02-05 11:41:21,654][01013] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 3289088. Throughput: 0: 976.0. Samples: 820192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-05 11:41:21,661][01013] Avg episode reward: [(0, '20.332')] [2025-02-05 11:41:21,667][02128] Saving new best policy, reward=20.332! [2025-02-05 11:41:26,655][01013] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 3305472. Throughput: 0: 997.9. Samples: 826198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:41:26,658][01013] Avg episode reward: [(0, '20.405')] [2025-02-05 11:41:26,675][02128] Saving new best policy, reward=20.405! 
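Each progress record above reports throughput over three trailing windows ("10 sec", "60 sec", "300 sec") alongside the policy lag, i.e. how many policy versions behind the learner the sampled experience was (min/avg/max across recent trajectories). A sketch of the windowed FPS computation from periodic (timestamp, total frames) samples; the class name and sampling cadence are assumptions:

    import time
    from collections import deque

    class FpsMeter:
        """Report average FPS over several trailing windows, computed
        from a history of (timestamp, total_frames) samples."""

        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.history = deque(maxlen=600)  # ample for a 300 s window

        def record(self, total_frames):
            self.history.append((time.monotonic(), total_frames))

        def fps(self):
            if not self.history:
                return {}
            now_t, now_f = self.history[-1]
            result = {}
            for w in self.windows:
                # Earliest sample that still falls inside the window;
                # fall back to the oldest sample early in the run.
                past_t, past_f = self.history[0]
                for t, f in self.history:
                    if now_t - t <= w:
                        past_t, past_f = t, f
                        break
                result[w] = (now_f - past_f) / max(now_t - past_t, 1e-9)
            return result

Called every few seconds with the global frame counter, this yields exactly the "(10 sec: ..., 60 sec: ..., 300 sec: ...)" triple seen in the log.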
[2025-02-05 11:41:30,096][02141] Updated weights for policy 0, policy_version 810 (0.0018) [2025-02-05 11:41:31,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 3321856. Throughput: 0: 940.5. Samples: 830656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:41:31,664][01013] Avg episode reward: [(0, '20.032')] [2025-02-05 11:41:36,654][01013] Fps is (10 sec: 4096.4, 60 sec: 3959.7, 300 sec: 3846.1). Total num frames: 3346432. Throughput: 0: 941.6. Samples: 834142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:41:36,658][01013] Avg episode reward: [(0, '18.856')] [2025-02-05 11:41:38,968][02141] Updated weights for policy 0, policy_version 820 (0.0014) [2025-02-05 11:41:41,658][01013] Fps is (10 sec: 4503.8, 60 sec: 3959.2, 300 sec: 3832.1). Total num frames: 3366912. Throughput: 0: 1001.3. Samples: 841140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-02-05 11:41:41,660][01013] Avg episode reward: [(0, '16.907')] [2025-02-05 11:41:46,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3379200. Throughput: 0: 955.9. Samples: 845682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:41:46,656][01013] Avg episode reward: [(0, '16.310')] [2025-02-05 11:41:50,597][02141] Updated weights for policy 0, policy_version 830 (0.0040) [2025-02-05 11:41:51,654][01013] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3403776. Throughput: 0: 932.8. Samples: 848266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:41:51,659][01013] Avg episode reward: [(0, '14.315')] [2025-02-05 11:41:56,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3424256. Throughput: 0: 976.8. Samples: 855404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:41:56,661][01013] Avg episode reward: [(0, '14.158')] [2025-02-05 11:42:00,244][02141] Updated weights for policy 0, policy_version 840 (0.0025) [2025-02-05 11:42:01,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3440640. Throughput: 0: 983.7. Samples: 861080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:42:01,659][01013] Avg episode reward: [(0, '14.889')] [2025-02-05 11:42:06,655][01013] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3461120. Throughput: 0: 957.3. Samples: 863270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:42:06,662][01013] Avg episode reward: [(0, '15.171')] [2025-02-05 11:42:10,739][02141] Updated weights for policy 0, policy_version 850 (0.0015) [2025-02-05 11:42:11,654][01013] Fps is (10 sec: 4095.9, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 3481600. Throughput: 0: 969.9. Samples: 869844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:42:11,661][01013] Avg episode reward: [(0, '15.222')] [2025-02-05 11:42:16,654][01013] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3502080. Throughput: 0: 1019.5. Samples: 876534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-05 11:42:16,660][01013] Avg episode reward: [(0, '16.445')] [2025-02-05 11:42:16,730][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000856_3506176.pth... [2025-02-05 11:42:16,892][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth [2025-02-05 11:42:21,654][01013] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3518464. 
Throughput: 0: 986.9. Samples: 878554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:42:21,662][01013] Avg episode reward: [(0, '17.495')] [2025-02-05 11:42:22,516][02141] Updated weights for policy 0, policy_version 860 (0.0029) [2025-02-05 11:42:26,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 3538944. Throughput: 0: 952.3. Samples: 883988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-02-05 11:42:26,660][01013] Avg episode reward: [(0, '17.506')] [2025-02-05 11:42:31,043][02141] Updated weights for policy 0, policy_version 870 (0.0016) [2025-02-05 11:42:31,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3563520. Throughput: 0: 1010.7. Samples: 891162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:42:31,657][01013] Avg episode reward: [(0, '19.227')] [2025-02-05 11:42:36,656][01013] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 3579904. Throughput: 0: 1017.1. Samples: 894036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:42:36,659][01013] Avg episode reward: [(0, '19.000')] [2025-02-05 11:42:41,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3873.8). Total num frames: 3596288. Throughput: 0: 956.9. Samples: 898464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:42:41,660][01013] Avg episode reward: [(0, '18.284')] [2025-02-05 11:42:42,546][02141] Updated weights for policy 0, policy_version 880 (0.0017) [2025-02-05 11:42:46,654][01013] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3620864. Throughput: 0: 983.8. Samples: 905350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:42:46,662][01013] Avg episode reward: [(0, '19.790')] [2025-02-05 11:42:51,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3641344. Throughput: 0: 1014.4. Samples: 908916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-02-05 11:42:51,659][01013] Avg episode reward: [(0, '21.327')] [2025-02-05 11:42:51,669][02128] Saving new best policy, reward=21.327! [2025-02-05 11:42:52,674][02141] Updated weights for policy 0, policy_version 890 (0.0034) [2025-02-05 11:42:56,655][01013] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3653632. Throughput: 0: 967.4. Samples: 913376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:42:56,661][01013] Avg episode reward: [(0, '21.515')] [2025-02-05 11:42:56,676][02128] Saving new best policy, reward=21.515! [2025-02-05 11:43:01,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3678208. Throughput: 0: 957.6. Samples: 919626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:43:01,661][01013] Avg episode reward: [(0, '21.246')] [2025-02-05 11:43:02,909][02141] Updated weights for policy 0, policy_version 900 (0.0017) [2025-02-05 11:43:06,659][01013] Fps is (10 sec: 4913.1, 60 sec: 4027.5, 300 sec: 3873.8). Total num frames: 3702784. Throughput: 0: 991.8. Samples: 923188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:43:06,661][01013] Avg episode reward: [(0, '20.455')] [2025-02-05 11:43:11,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3715072. Throughput: 0: 996.1. Samples: 928812. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:43:11,656][01013] Avg episode reward: [(0, '19.618')] [2025-02-05 11:43:14,387][02141] Updated weights for policy 0, policy_version 910 (0.0036) [2025-02-05 11:43:16,654][01013] Fps is (10 sec: 3278.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3735552. Throughput: 0: 950.4. Samples: 933932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-02-05 11:43:16,657][01013] Avg episode reward: [(0, '18.331')] [2025-02-05 11:43:21,654][01013] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3760128. Throughput: 0: 964.6. Samples: 937440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:43:21,656][01013] Avg episode reward: [(0, '17.567')] [2025-02-05 11:43:23,138][02141] Updated weights for policy 0, policy_version 920 (0.0018) [2025-02-05 11:43:26,654][01013] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3776512. Throughput: 0: 1017.5. Samples: 944250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:43:26,658][01013] Avg episode reward: [(0, '18.102')] [2025-02-05 11:43:31,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3792896. Throughput: 0: 960.3. Samples: 948562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:43:31,656][01013] Avg episode reward: [(0, '19.144')] [2025-02-05 11:43:34,459][02141] Updated weights for policy 0, policy_version 930 (0.0018) [2025-02-05 11:43:36,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 3817472. Throughput: 0: 957.9. Samples: 952020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:43:36,656][01013] Avg episode reward: [(0, '19.506')] [2025-02-05 11:43:41,654][01013] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 3842048. Throughput: 0: 1018.7. Samples: 959216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:43:41,659][01013] Avg episode reward: [(0, '22.052')] [2025-02-05 11:43:41,664][02128] Saving new best policy, reward=22.052! [2025-02-05 11:43:44,583][02141] Updated weights for policy 0, policy_version 940 (0.0018) [2025-02-05 11:43:46,659][01013] Fps is (10 sec: 3684.5, 60 sec: 3890.9, 300 sec: 3873.8). Total num frames: 3854336. Throughput: 0: 986.6. Samples: 964030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:43:46,662][01013] Avg episode reward: [(0, '22.243')] [2025-02-05 11:43:46,671][02128] Saving new best policy, reward=22.243! [2025-02-05 11:43:51,654][01013] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3874816. Throughput: 0: 957.5. Samples: 966272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:43:51,657][01013] Avg episode reward: [(0, '21.063')] [2025-02-05 11:43:55,006][02141] Updated weights for policy 0, policy_version 950 (0.0018) [2025-02-05 11:43:56,655][01013] Fps is (10 sec: 4097.9, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3895296. Throughput: 0: 983.4. Samples: 973064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:43:56,663][01013] Avg episode reward: [(0, '22.168')] [2025-02-05 11:44:01,654][01013] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3915776. Throughput: 0: 1003.5. Samples: 979090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:44:01,657][01013] Avg episode reward: [(0, '23.097')] [2025-02-05 11:44:01,659][02128] Saving new best policy, reward=23.097! 
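The periodic checkpoint lines above work in pairs: every save of a new checkpoint_<version>_<frames>.pth (e.g. checkpoint_000000970_3973120.pth) is followed by removal of an older one (checkpoint_000000741_3035136.pth), so only a couple of recent checkpoints, plus the separate best policy, remain on disk. A sketch of that rotation, assuming the zero-padded filename layout visible in the log:

    import glob
    import os
    import torch

    def save_and_rotate(checkpoint_dir, model, policy_version, env_steps, keep=2):
        """Write checkpoint_<version>_<frames>.pth, then prune the oldest
        checkpoints so at most `keep` remain."""
        name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
        path = os.path.join(checkpoint_dir, name)
        print(f"Saving {path}...")
        torch.save({"model": model.state_dict(), "env_steps": env_steps}, path)

        # Zero-padded versions make lexicographic order == chronological order.
        checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
        for old in checkpoints[:-keep]:
            print(f"Removing {old}...")
            os.remove(old)

Keeping the rotation separate from the best-policy snapshot means a late regression in reward can never evict the strongest weights.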
[2025-02-05 11:44:06,654][01013] Fps is (10 sec: 3277.0, 60 sec: 3755.0, 300 sec: 3873.8). Total num frames: 3928064. Throughput: 0: 970.5. Samples: 981112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:44:06,661][01013] Avg episode reward: [(0, '23.322')] [2025-02-05 11:44:06,676][02128] Saving new best policy, reward=23.322! [2025-02-05 11:44:06,933][02141] Updated weights for policy 0, policy_version 960 (0.0029) [2025-02-05 11:44:11,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3952640. Throughput: 0: 952.9. Samples: 987130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:44:11,656][01013] Avg episode reward: [(0, '22.361')] [2025-02-05 11:44:15,655][02141] Updated weights for policy 0, policy_version 970 (0.0014) [2025-02-05 11:44:16,654][01013] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3973120. Throughput: 0: 1009.6. Samples: 993994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-02-05 11:44:16,660][01013] Avg episode reward: [(0, '21.850')] [2025-02-05 11:44:16,671][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000970_3973120.pth... [2025-02-05 11:44:16,848][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000741_3035136.pth [2025-02-05 11:44:21,654][01013] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3989504. Throughput: 0: 978.1. Samples: 996036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-02-05 11:44:21,659][01013] Avg episode reward: [(0, '21.354')] [2025-02-05 11:44:26,657][01013] Fps is (10 sec: 2866.4, 60 sec: 3754.5, 300 sec: 3846.0). Total num frames: 4001792. Throughput: 0: 919.1. Samples: 1000580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-02-05 11:44:26,659][01013] Avg episode reward: [(0, '19.820')] [2025-02-05 11:44:26,723][02128] Stopping Batcher_0... [2025-02-05 11:44:26,724][02128] Loop batcher_evt_loop terminating... [2025-02-05 11:44:26,724][01013] Component Batcher_0 stopped! [2025-02-05 11:44:26,731][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-05 11:44:26,838][02141] Weights refcount: 2 0 [2025-02-05 11:44:26,840][02141] Stopping InferenceWorker_p0-w0... [2025-02-05 11:44:26,841][02141] Loop inference_proc0-0_evt_loop terminating... [2025-02-05 11:44:26,843][01013] Component InferenceWorker_p0-w0 stopped! [2025-02-05 11:44:26,880][02128] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000856_3506176.pth [2025-02-05 11:44:26,895][02128] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-05 11:44:27,164][01013] Component LearnerWorker_p0 stopped! [2025-02-05 11:44:27,166][02128] Stopping LearnerWorker_p0... [2025-02-05 11:44:27,167][02128] Loop learner_proc0_evt_loop terminating... [2025-02-05 11:44:27,348][01013] Component RolloutWorker_w1 stopped! [2025-02-05 11:44:27,356][02144] Stopping RolloutWorker_w1... [2025-02-05 11:44:27,356][02144] Loop rollout_proc1_evt_loop terminating... [2025-02-05 11:44:27,380][01013] Component RolloutWorker_w7 stopped! [2025-02-05 11:44:27,386][02149] Stopping RolloutWorker_w7... [2025-02-05 11:44:27,386][02149] Loop rollout_proc7_evt_loop terminating... [2025-02-05 11:44:27,389][02148] Stopping RolloutWorker_w6... [2025-02-05 11:44:27,389][02148] Loop rollout_proc6_evt_loop terminating... [2025-02-05 11:44:27,392][01013] Component RolloutWorker_w6 stopped! 
[2025-02-05 11:44:27,406][02146] Stopping RolloutWorker_w5...
[2025-02-05 11:44:27,407][02146] Loop rollout_proc5_evt_loop terminating...
[2025-02-05 11:44:27,404][01013] Component RolloutWorker_w5 stopped!
[2025-02-05 11:44:27,454][01013] Component RolloutWorker_w3 stopped!
[2025-02-05 11:44:27,456][02147] Stopping RolloutWorker_w3...
[2025-02-05 11:44:27,467][02147] Loop rollout_proc3_evt_loop terminating...
[2025-02-05 11:44:27,477][02145] Stopping RolloutWorker_w4...
[2025-02-05 11:44:27,479][01013] Component RolloutWorker_w4 stopped!
[2025-02-05 11:44:27,478][02145] Loop rollout_proc4_evt_loop terminating...
[2025-02-05 11:44:27,544][01013] Component RolloutWorker_w2 stopped!
[2025-02-05 11:44:27,544][02143] Stopping RolloutWorker_w2...
[2025-02-05 11:44:27,548][01013] Component RolloutWorker_w0 stopped!
[2025-02-05 11:44:27,552][01013] Waiting for process learner_proc0 to stop...
[2025-02-05 11:44:27,556][02142] Stopping RolloutWorker_w0...
[2025-02-05 11:44:27,569][02142] Loop rollout_proc0_evt_loop terminating...
[2025-02-05 11:44:27,557][02143] Loop rollout_proc2_evt_loop terminating...
[2025-02-05 11:44:29,667][01013] Waiting for process inference_proc0-0 to join...
[2025-02-05 11:44:29,676][01013] Waiting for process rollout_proc0 to join...
[2025-02-05 11:44:31,915][01013] Waiting for process rollout_proc1 to join...
[2025-02-05 11:44:31,922][01013] Waiting for process rollout_proc2 to join...
[2025-02-05 11:44:31,936][01013] Waiting for process rollout_proc3 to join...
[2025-02-05 11:44:31,938][01013] Waiting for process rollout_proc4 to join...
[2025-02-05 11:44:31,944][01013] Waiting for process rollout_proc5 to join...
[2025-02-05 11:44:31,947][01013] Waiting for process rollout_proc6 to join...
[2025-02-05 11:44:31,950][01013] Waiting for process rollout_proc7 to join...
[2025-02-05 11:44:31,953][01013] Batcher 0 profile tree view:
batching: 26.4502, releasing_batches: 0.0277
[2025-02-05 11:44:31,956][01013] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0002
  wait_policy_total: 418.2394
update_model: 8.5876
  weight_update: 0.0041
one_step: 0.0133
  handle_policy_step: 593.1450
    deserialize: 14.1302, stack: 3.0352, obs_to_device_normalize: 124.4521, forward: 309.5701, send_messages: 28.5000
    prepare_outputs: 87.2144
      to_cpu: 53.1204
[2025-02-05 11:44:31,957][01013] Learner 0 profile tree view:
misc: 0.0041, prepare_batch: 13.5354
train: 73.9358
  epoch_init: 0.0100, minibatch_init: 0.0057, losses_postprocess: 0.6592, kl_divergence: 0.7014, after_optimizer: 33.3152
  calculate_losses: 26.6116
    losses_init: 0.0061, forward_head: 1.4959, bptt_initial: 17.3242, tail: 1.2739, advantages_returns: 0.3251, losses: 3.7804
    bptt: 2.1196
      bptt_forward_core: 2.0465
  update: 12.0529
    clip: 0.8865
[2025-02-05 11:44:31,958][01013] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2893, enqueue_policy_requests: 111.1616, env_step: 824.7142, overhead: 12.6535, complete_rollouts: 7.1985
save_policy_outputs: 18.7947
  split_output_tensors: 7.1942
[2025-02-05 11:44:31,959][01013] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2752, enqueue_policy_requests: 112.7437, env_step: 821.4751, overhead: 12.4579, complete_rollouts: 7.0604
save_policy_outputs: 19.0826
  split_output_tensors: 7.3204
[2025-02-05 11:44:31,961][01013] Loop Runner_EvtLoop terminating...
[2025-02-05 11:44:31,962][01013] Runner profile tree view:
main_loop: 1085.3997
[2025-02-05 11:44:31,965][01013] Collected {0: 4005888}, FPS: 3690.7
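The "profile tree view" blocks just above are the output of a hierarchical wall-clock profiler: named sections nest, each node accumulates total seconds, and the report indents children under their parents (e.g. to_cpu under prepare_outputs). A generic sketch of such a timer; this is a reimplementation of the idea, not Sample Factory's actual Timing class:

    import time
    from contextlib import contextmanager

    class TimingTree:
        """Accumulate wall-clock totals for nested named sections and
        print them as an indented tree."""

        def __init__(self):
            self.totals = {}  # path tuple -> accumulated seconds
            self.stack = []   # names of currently open sections

        @contextmanager
        def timeit(self, name):
            self.stack.append(name)
            path = tuple(self.stack)
            start = time.perf_counter()
            try:
                yield
            finally:
                self.totals[path] = (self.totals.get(path, 0.0)
                                     + time.perf_counter() - start)
                self.stack.pop()

        def report(self, title):
            print(f"{title} profile tree view:")
            for path in sorted(self.totals):  # parent paths sort before children
                print(f"{'  ' * (len(path) - 1)}{path[-1]}: {self.totals[path]:.4f}")

Wrapping the learner's inner loop as "with timing.timeit('train'):" with a nested timeit('calculate_losses') would reproduce the indentation pattern seen in the report above.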
[2025-02-05 11:44:32,374][01013] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-05 11:44:32,377][01013] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-05 11:44:32,380][01013] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-05 11:44:32,382][01013] Adding new argument 'save_video'=True that is not in the saved config file! [2025-02-05 11:44:32,385][01013] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-05 11:44:32,386][01013] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-05 11:44:32,387][01013] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-02-05 11:44:32,389][01013] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-05 11:44:32,390][01013] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-02-05 11:44:32,391][01013] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-02-05 11:44:32,393][01013] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-05 11:44:32,394][01013] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-05 11:44:32,395][01013] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-05 11:44:32,396][01013] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-05 11:44:32,398][01013] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-05 11:44:32,431][01013] Doom resolution: 160x120, resize resolution: (128, 72) [2025-02-05 11:44:32,435][01013] RunningMeanStd input shape: (3, 72, 128) [2025-02-05 11:44:32,439][01013] RunningMeanStd input shape: (1,) [2025-02-05 11:44:32,453][01013] ConvEncoder: input_channels=3 [2025-02-05 11:44:32,549][01013] Conv encoder output size: 512 [2025-02-05 11:44:32,550][01013] Policy head output size: 512 [2025-02-05 11:44:32,741][01013] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-05 11:44:34,067][01013] Num frames 100... [2025-02-05 11:44:34,234][01013] Num frames 200... [2025-02-05 11:44:34,404][01013] Num frames 300... [2025-02-05 11:44:34,578][01013] Num frames 400... [2025-02-05 11:44:34,753][01013] Num frames 500... [2025-02-05 11:44:34,927][01013] Num frames 600... [2025-02-05 11:44:35,122][01013] Num frames 700... [2025-02-05 11:44:35,303][01013] Num frames 800... [2025-02-05 11:44:35,478][01013] Avg episode rewards: #0: 17.640, true rewards: #0: 8.640 [2025-02-05 11:44:35,480][01013] Avg episode reward: 17.640, avg true_objective: 8.640 [2025-02-05 11:44:35,550][01013] Num frames 900... [2025-02-05 11:44:35,731][01013] Num frames 1000... [2025-02-05 11:44:35,862][01013] Num frames 1100... [2025-02-05 11:44:35,998][01013] Num frames 1200... [2025-02-05 11:44:36,144][01013] Num frames 1300... [2025-02-05 11:44:36,274][01013] Num frames 1400... [2025-02-05 11:44:36,406][01013] Num frames 1500... [2025-02-05 11:44:36,533][01013] Num frames 1600... [2025-02-05 11:44:36,661][01013] Num frames 1700... [2025-02-05 11:44:36,797][01013] Num frames 1800... [2025-02-05 11:44:36,931][01013] Num frames 1900...
[2025-02-05 11:44:37,061][01013] Num frames 2000... [2025-02-05 11:44:37,208][01013] Num frames 2100... [2025-02-05 11:44:37,336][01013] Num frames 2200... [2025-02-05 11:44:37,467][01013] Num frames 2300... [2025-02-05 11:44:37,536][01013] Avg episode rewards: #0: 27.545, true rewards: #0: 11.545 [2025-02-05 11:44:37,537][01013] Avg episode reward: 27.545, avg true_objective: 11.545 [2025-02-05 11:44:37,653][01013] Num frames 2400... [2025-02-05 11:44:37,781][01013] Num frames 2500... [2025-02-05 11:44:37,910][01013] Num frames 2600... [2025-02-05 11:44:38,041][01013] Num frames 2700... [2025-02-05 11:44:38,181][01013] Num frames 2800... [2025-02-05 11:44:38,319][01013] Num frames 2900... [2025-02-05 11:44:38,397][01013] Avg episode rewards: #0: 21.390, true rewards: #0: 9.723 [2025-02-05 11:44:38,398][01013] Avg episode reward: 21.390, avg true_objective: 9.723 [2025-02-05 11:44:38,506][01013] Num frames 3000... [2025-02-05 11:44:38,637][01013] Num frames 3100... [2025-02-05 11:44:38,768][01013] Num frames 3200... [2025-02-05 11:44:38,905][01013] Num frames 3300... [2025-02-05 11:44:39,042][01013] Num frames 3400... [2025-02-05 11:44:39,181][01013] Num frames 3500... [2025-02-05 11:44:39,318][01013] Num frames 3600... [2025-02-05 11:44:39,450][01013] Num frames 3700... [2025-02-05 11:44:39,529][01013] Avg episode rewards: #0: 19.543, true rewards: #0: 9.292 [2025-02-05 11:44:39,535][01013] Avg episode reward: 19.543, avg true_objective: 9.292 [2025-02-05 11:44:39,645][01013] Num frames 3800... [2025-02-05 11:44:39,778][01013] Num frames 3900... [2025-02-05 11:44:39,907][01013] Num frames 4000... [2025-02-05 11:44:40,039][01013] Num frames 4100... [2025-02-05 11:44:40,176][01013] Num frames 4200... [2025-02-05 11:44:40,312][01013] Num frames 4300... [2025-02-05 11:44:40,440][01013] Num frames 4400... [2025-02-05 11:44:40,570][01013] Num frames 4500... [2025-02-05 11:44:40,701][01013] Num frames 4600... [2025-02-05 11:44:40,832][01013] Num frames 4700... [2025-02-05 11:44:40,961][01013] Num frames 4800... [2025-02-05 11:44:41,091][01013] Num frames 4900... [2025-02-05 11:44:41,226][01013] Num frames 5000... [2025-02-05 11:44:41,347][01013] Avg episode rewards: #0: 21.884, true rewards: #0: 10.084 [2025-02-05 11:44:41,349][01013] Avg episode reward: 21.884, avg true_objective: 10.084 [2025-02-05 11:44:41,427][01013] Num frames 5100... [2025-02-05 11:44:41,553][01013] Num frames 5200... [2025-02-05 11:44:41,685][01013] Num frames 5300... [2025-02-05 11:44:41,816][01013] Num frames 5400... [2025-02-05 11:44:41,953][01013] Num frames 5500... [2025-02-05 11:44:42,082][01013] Num frames 5600... [2025-02-05 11:44:42,224][01013] Num frames 5700... [2025-02-05 11:44:42,362][01013] Num frames 5800... [2025-02-05 11:44:42,498][01013] Num frames 5900... [2025-02-05 11:44:42,634][01013] Avg episode rewards: #0: 22.268, true rewards: #0: 9.935 [2025-02-05 11:44:42,636][01013] Avg episode reward: 22.268, avg true_objective: 9.935 [2025-02-05 11:44:42,691][01013] Num frames 6000... [2025-02-05 11:44:42,823][01013] Num frames 6100... [2025-02-05 11:44:42,954][01013] Num frames 6200... [2025-02-05 11:44:43,083][01013] Num frames 6300... [2025-02-05 11:44:43,221][01013] Num frames 6400... [2025-02-05 11:44:43,358][01013] Num frames 6500... [2025-02-05 11:44:43,484][01013] Num frames 6600... [2025-02-05 11:44:43,613][01013] Num frames 6700... [2025-02-05 11:44:43,741][01013] Num frames 6800... [2025-02-05 11:44:43,871][01013] Num frames 6900... [2025-02-05 11:44:44,002][01013] Num frames 7000... 
[2025-02-05 11:44:44,146][01013] Num frames 7100... [2025-02-05 11:44:44,276][01013] Num frames 7200... [2025-02-05 11:44:44,415][01013] Num frames 7300... [2025-02-05 11:44:44,545][01013] Num frames 7400... [2025-02-05 11:44:44,679][01013] Num frames 7500... [2025-02-05 11:44:44,815][01013] Num frames 7600... [2025-02-05 11:44:44,982][01013] Num frames 7700... [2025-02-05 11:44:45,122][01013] Num frames 7800... [2025-02-05 11:44:45,254][01013] Num frames 7900... [2025-02-05 11:44:45,329][01013] Avg episode rewards: #0: 26.161, true rewards: #0: 11.304 [2025-02-05 11:44:45,331][01013] Avg episode reward: 26.161, avg true_objective: 11.304 [2025-02-05 11:44:45,447][01013] Num frames 8000... [2025-02-05 11:44:45,581][01013] Num frames 8100... [2025-02-05 11:44:45,714][01013] Num frames 8200... [2025-02-05 11:44:45,907][01013] Num frames 8300... [2025-02-05 11:44:46,078][01013] Num frames 8400... [2025-02-05 11:44:46,248][01013] Num frames 8500... [2025-02-05 11:44:46,420][01013] Num frames 8600... [2025-02-05 11:44:46,589][01013] Num frames 8700... [2025-02-05 11:44:46,760][01013] Num frames 8800... [2025-02-05 11:44:46,924][01013] Num frames 8900... [2025-02-05 11:44:47,100][01013] Num frames 9000... [2025-02-05 11:44:47,286][01013] Num frames 9100... [2025-02-05 11:44:47,464][01013] Num frames 9200... [2025-02-05 11:44:47,644][01013] Num frames 9300... [2025-02-05 11:44:47,827][01013] Num frames 9400... [2025-02-05 11:44:48,010][01013] Num frames 9500... [2025-02-05 11:44:48,196][01013] Num frames 9600... [2025-02-05 11:44:48,290][01013] Avg episode rewards: #0: 27.399, true rewards: #0: 12.024 [2025-02-05 11:44:48,293][01013] Avg episode reward: 27.399, avg true_objective: 12.024 [2025-02-05 11:44:48,406][01013] Num frames 9700... [2025-02-05 11:44:48,543][01013] Num frames 9800... [2025-02-05 11:44:48,674][01013] Num frames 9900... [2025-02-05 11:44:48,805][01013] Num frames 10000... [2025-02-05 11:44:48,937][01013] Num frames 10100... [2025-02-05 11:44:49,069][01013] Num frames 10200... [2025-02-05 11:44:49,206][01013] Num frames 10300... [2025-02-05 11:44:49,329][01013] Avg episode rewards: #0: 26.166, true rewards: #0: 11.499 [2025-02-05 11:44:49,331][01013] Avg episode reward: 26.166, avg true_objective: 11.499 [2025-02-05 11:44:49,396][01013] Num frames 10400... [2025-02-05 11:44:49,534][01013] Num frames 10500... [2025-02-05 11:44:49,662][01013] Num frames 10600... [2025-02-05 11:44:49,793][01013] Num frames 10700... [2025-02-05 11:44:49,925][01013] Num frames 10800... [2025-02-05 11:44:50,056][01013] Num frames 10900... [2025-02-05 11:44:50,191][01013] Num frames 11000... [2025-02-05 11:44:50,328][01013] Num frames 11100... [2025-02-05 11:44:50,457][01013] Num frames 11200... [2025-02-05 11:44:50,595][01013] Num frames 11300... [2025-02-05 11:44:50,747][01013] Avg episode rewards: #0: 25.773, true rewards: #0: 11.373 [2025-02-05 11:44:50,748][01013] Avg episode reward: 25.773, avg true_objective: 11.373 [2025-02-05 11:46:00,046][01013] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-02-05 11:47:40,761][01013] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-02-05 11:47:40,763][01013] Overriding arg 'num_workers' with value 1 passed from command line [2025-02-05 11:47:40,765][01013] Adding new argument 'no_render'=True that is not in the saved config file! [2025-02-05 11:47:40,767][01013] Adding new argument 'save_video'=True that is not in the saved config file! 
[2025-02-05 11:47:40,769][01013] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-02-05 11:47:40,771][01013] Adding new argument 'video_name'=None that is not in the saved config file! [2025-02-05 11:47:40,773][01013] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-02-05 11:47:40,774][01013] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-02-05 11:47:40,775][01013] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-02-05 11:47:40,776][01013] Adding new argument 'hf_repository'='avneetreen-2397/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-02-05 11:47:40,777][01013] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-02-05 11:47:40,778][01013] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-02-05 11:47:40,779][01013] Adding new argument 'train_script'=None that is not in the saved config file! [2025-02-05 11:47:40,780][01013] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-02-05 11:47:40,781][01013] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-02-05 11:47:40,814][01013] RunningMeanStd input shape: (3, 72, 128) [2025-02-05 11:47:40,815][01013] RunningMeanStd input shape: (1,) [2025-02-05 11:47:40,828][01013] ConvEncoder: input_channels=3 [2025-02-05 11:47:40,867][01013] Conv encoder output size: 512 [2025-02-05 11:47:40,868][01013] Policy head output size: 512 [2025-02-05 11:47:40,890][01013] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-02-05 11:47:41,355][01013] Num frames 100... [2025-02-05 11:47:41,489][01013] Num frames 200... [2025-02-05 11:47:41,616][01013] Num frames 300... [2025-02-05 11:47:41,744][01013] Num frames 400... [2025-02-05 11:47:41,877][01013] Num frames 500... [2025-02-05 11:47:42,011][01013] Num frames 600... [2025-02-05 11:47:42,156][01013] Num frames 700... [2025-02-05 11:47:42,285][01013] Num frames 800... [2025-02-05 11:47:42,417][01013] Num frames 900... [2025-02-05 11:47:42,546][01013] Num frames 1000... [2025-02-05 11:47:42,675][01013] Num frames 1100... [2025-02-05 11:47:42,807][01013] Num frames 1200... [2025-02-05 11:47:42,884][01013] Avg episode rewards: #0: 28.160, true rewards: #0: 12.160 [2025-02-05 11:47:42,885][01013] Avg episode reward: 28.160, avg true_objective: 12.160 [2025-02-05 11:47:42,994][01013] Num frames 1300... [2025-02-05 11:47:43,127][01013] Num frames 1400... [2025-02-05 11:47:43,260][01013] Num frames 1500... [2025-02-05 11:47:43,396][01013] Num frames 1600... [2025-02-05 11:47:43,528][01013] Num frames 1700... [2025-02-05 11:47:43,657][01013] Num frames 1800... [2025-02-05 11:47:43,791][01013] Num frames 1900... [2025-02-05 11:47:43,920][01013] Num frames 2000... [2025-02-05 11:47:44,052][01013] Num frames 2100... [2025-02-05 11:47:44,198][01013] Num frames 2200... [2025-02-05 11:47:44,328][01013] Num frames 2300... [2025-02-05 11:47:44,464][01013] Num frames 2400... [2025-02-05 11:47:44,598][01013] Num frames 2500... [2025-02-05 11:47:44,730][01013] Num frames 2600... [2025-02-05 11:47:44,819][01013] Avg episode rewards: #0: 29.620, true rewards: #0: 13.120 [2025-02-05 11:47:44,820][01013] Avg episode reward: 29.620, avg true_objective: 13.120 [2025-02-05 11:47:44,914][01013] Num frames 2700... 
[2025-02-05 11:47:45,042][01013] Num frames 2800... [2025-02-05 11:47:45,180][01013] Num frames 2900... [2025-02-05 11:47:45,315][01013] Num frames 3000... [2025-02-05 11:47:45,446][01013] Num frames 3100... [2025-02-05 11:47:45,578][01013] Num frames 3200... [2025-02-05 11:47:45,676][01013] Avg episode rewards: #0: 23.107, true rewards: #0: 10.773 [2025-02-05 11:47:45,678][01013] Avg episode reward: 23.107, avg true_objective: 10.773 [2025-02-05 11:47:45,784][01013] Num frames 3300... [2025-02-05 11:47:45,910][01013] Num frames 3400... [2025-02-05 11:47:46,040][01013] Num frames 3500... [2025-02-05 11:47:46,173][01013] Num frames 3600... [2025-02-05 11:47:46,312][01013] Num frames 3700... [2025-02-05 11:47:46,438][01013] Num frames 3800... [2025-02-05 11:47:46,567][01013] Num frames 3900... [2025-02-05 11:47:46,696][01013] Num frames 4000... [2025-02-05 11:47:46,828][01013] Num frames 4100... [2025-02-05 11:47:46,966][01013] Avg episode rewards: #0: 21.900, true rewards: #0: 10.400 [2025-02-05 11:47:46,968][01013] Avg episode reward: 21.900, avg true_objective: 10.400 [2025-02-05 11:47:47,023][01013] Num frames 4200... [2025-02-05 11:47:47,168][01013] Num frames 4300... [2025-02-05 11:47:47,304][01013] Num frames 4400... [2025-02-05 11:47:47,430][01013] Num frames 4500... [2025-02-05 11:47:47,559][01013] Num frames 4600... [2025-02-05 11:47:47,695][01013] Num frames 4700... [2025-02-05 11:47:47,826][01013] Num frames 4800... [2025-02-05 11:47:47,955][01013] Num frames 4900... [2025-02-05 11:47:48,085][01013] Num frames 5000... [2025-02-05 11:47:48,223][01013] Avg episode rewards: #0: 21.516, true rewards: #0: 10.116 [2025-02-05 11:47:48,225][01013] Avg episode reward: 21.516, avg true_objective: 10.116 [2025-02-05 11:47:48,286][01013] Num frames 5100... [2025-02-05 11:47:48,416][01013] Num frames 5200... [2025-02-05 11:47:48,548][01013] Num frames 5300... [2025-02-05 11:47:48,681][01013] Num frames 5400... [2025-02-05 11:47:48,810][01013] Num frames 5500... [2025-02-05 11:47:48,937][01013] Num frames 5600... [2025-02-05 11:47:49,076][01013] Num frames 5700... [2025-02-05 11:47:49,214][01013] Num frames 5800... [2025-02-05 11:47:49,348][01013] Num frames 5900... [2025-02-05 11:47:49,471][01013] Avg episode rewards: #0: 20.757, true rewards: #0: 9.923 [2025-02-05 11:47:49,472][01013] Avg episode reward: 20.757, avg true_objective: 9.923 [2025-02-05 11:47:49,536][01013] Num frames 6000... [2025-02-05 11:47:49,660][01013] Num frames 6100... [2025-02-05 11:47:49,790][01013] Num frames 6200... [2025-02-05 11:47:49,928][01013] Num frames 6300... [2025-02-05 11:47:50,059][01013] Num frames 6400... [2025-02-05 11:47:50,247][01013] Avg episode rewards: #0: 18.854, true rewards: #0: 9.283 [2025-02-05 11:47:50,249][01013] Avg episode reward: 18.854, avg true_objective: 9.283 [2025-02-05 11:47:50,254][01013] Num frames 6500... [2025-02-05 11:47:50,440][01013] Num frames 6600... [2025-02-05 11:47:50,611][01013] Num frames 6700... [2025-02-05 11:47:50,788][01013] Num frames 6800... [2025-02-05 11:47:50,957][01013] Num frames 6900... [2025-02-05 11:47:51,129][01013] Num frames 7000... [2025-02-05 11:47:51,297][01013] Num frames 7100... [2025-02-05 11:47:51,470][01013] Num frames 7200... [2025-02-05 11:47:51,533][01013] Avg episode rewards: #0: 18.378, true rewards: #0: 9.002 [2025-02-05 11:47:51,535][01013] Avg episode reward: 18.378, avg true_objective: 9.002 [2025-02-05 11:47:51,712][01013] Num frames 7300... [2025-02-05 11:47:51,900][01013] Num frames 7400... 
[2025-02-05 11:47:52,100][01013] Num frames 7500... [2025-02-05 11:47:52,286][01013] Num frames 7600... [2025-02-05 11:47:52,484][01013] Num frames 7700... [2025-02-05 11:47:52,666][01013] Num frames 7800... [2025-02-05 11:47:52,859][01013] Num frames 7900... [2025-02-05 11:47:53,009][01013] Num frames 8000... [2025-02-05 11:47:53,148][01013] Num frames 8100... [2025-02-05 11:47:53,279][01013] Num frames 8200... [2025-02-05 11:47:53,408][01013] Num frames 8300... [2025-02-05 11:47:53,548][01013] Num frames 8400... [2025-02-05 11:47:53,628][01013] Avg episode rewards: #0: 19.465, true rewards: #0: 9.353 [2025-02-05 11:47:53,629][01013] Avg episode reward: 19.465, avg true_objective: 9.353 [2025-02-05 11:47:53,745][01013] Num frames 8500... [2025-02-05 11:47:53,875][01013] Num frames 8600... [2025-02-05 11:47:54,004][01013] Num frames 8700... [2025-02-05 11:47:54,138][01013] Num frames 8800... [2025-02-05 11:47:54,270][01013] Num frames 8900... [2025-02-05 11:47:54,404][01013] Avg episode rewards: #0: 18.262, true rewards: #0: 8.962 [2025-02-05 11:47:54,406][01013] Avg episode reward: 18.262, avg true_objective: 8.962 [2025-02-05 11:48:48,408][01013] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
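Both evaluation passes above follow the same recipe: load the saved config.json, override or add arguments supplied on the command line (the second pass flips push_to_hub and sets hf_repository), replay the policy for up to max_num_episodes with render_action_repeat=4, and track both the shaped reward and the true objective before writing replay.mp4. A condensed sketch of that loop; env and policy construction are elided, a classic gym-style API is assumed, and the helper names are illustrative:

    import json

    def enjoy(train_dir, make_env, policy, max_num_episodes=10, action_repeat=4):
        """Replay a trained policy, printing running average rewards."""
        with open(f"{train_dir}/config.json") as f:
            cfg = json.load(f)  # saved experiment configuration
        # Command-line overrides would be merged into cfg here.

        env = make_env(cfg)
        rewards, true_rewards = [], []
        for _ in range(max_num_episodes):
            obs, done, ep_r, ep_true = env.reset(), False, 0.0, 0.0
            while not done:
                action = policy(obs)
                for _ in range(action_repeat):  # eval-time frameskip
                    obs, r, done, info = env.step(action)
                    ep_r += r
                    ep_true += info.get("true_objective", r)
                    if done:
                        break
            rewards.append(ep_r)
            true_rewards.append(ep_true)
            print(f"Avg episode reward: {sum(rewards) / len(rewards):.3f}, "
                  f"avg true_objective: {sum(true_rewards) / len(true_rewards):.3f}")

Tracking the true objective separately from the shaped reward is what lets the log report both "Avg episode rewards: #0: ..." and "true rewards: #0: ..." for the health-gathering scenario.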