[2024-12-20 08:59:01,672][00248] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-20 08:59:01,675][00248] Rollout worker 0 uses device cpu
[2024-12-20 08:59:01,676][00248] Rollout worker 1 uses device cpu
[2024-12-20 08:59:01,678][00248] Rollout worker 2 uses device cpu
[2024-12-20 08:59:01,679][00248] Rollout worker 3 uses device cpu
[2024-12-20 08:59:01,680][00248] Rollout worker 4 uses device cpu
[2024-12-20 08:59:01,681][00248] Rollout worker 5 uses device cpu
[2024-12-20 08:59:01,682][00248] Rollout worker 6 uses device cpu
[2024-12-20 08:59:01,683][00248] Rollout worker 7 uses device cpu
[2024-12-20 08:59:01,839][00248] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-20 08:59:01,840][00248] InferenceWorker_p0-w0: min num requests: 2
[2024-12-20 08:59:01,874][00248] Starting all processes...
[2024-12-20 08:59:01,876][00248] Starting process learner_proc0
[2024-12-20 08:59:01,930][00248] Starting all processes...
[2024-12-20 08:59:01,939][00248] Starting process inference_proc0-0
[2024-12-20 08:59:01,939][00248] Starting process rollout_proc0
[2024-12-20 08:59:01,943][00248] Starting process rollout_proc1
[2024-12-20 08:59:01,943][00248] Starting process rollout_proc2
[2024-12-20 08:59:01,943][00248] Starting process rollout_proc3
[2024-12-20 08:59:01,943][00248] Starting process rollout_proc4
[2024-12-20 08:59:01,943][00248] Starting process rollout_proc5
[2024-12-20 08:59:01,943][00248] Starting process rollout_proc6
[2024-12-20 08:59:01,943][00248] Starting process rollout_proc7
[2024-12-20 08:59:18,988][04354] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-20 08:59:18,991][04354] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-20 08:59:19,103][04354] Num visible devices: 1
[2024-12-20 08:59:19,300][04355] Worker 0 uses CPU cores [0]
[2024-12-20 08:59:19,486][04359] Worker 4 uses CPU cores [0]
[2024-12-20 08:59:19,552][04362] Worker 7 uses CPU cores [1]
[2024-12-20 08:59:19,638][04341] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-20 08:59:19,642][04341] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-20 08:59:19,694][04341] Num visible devices: 1
[2024-12-20 08:59:19,721][04341] Starting seed is not provided
[2024-12-20 08:59:19,721][04341] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-20 08:59:19,722][04341] Initializing actor-critic model on device cuda:0
[2024-12-20 08:59:19,722][04341] RunningMeanStd input shape: (3, 72, 128)
[2024-12-20 08:59:19,725][04341] RunningMeanStd input shape: (1,)
[2024-12-20 08:59:19,730][04357] Worker 2 uses CPU cores [0]
[2024-12-20 08:59:19,746][04361] Worker 6 uses CPU cores [0]
[2024-12-20 08:59:19,753][04358] Worker 3 uses CPU cores [1]
[2024-12-20 08:59:19,769][04341] ConvEncoder: input_channels=3
[2024-12-20 08:59:19,779][04360] Worker 5 uses CPU cores [1]
[2024-12-20 08:59:19,783][04356] Worker 1 uses CPU cores [1]
[2024-12-20 08:59:20,032][04341] Conv encoder output size: 512
[2024-12-20 08:59:20,033][04341] Policy head output size: 512
[2024-12-20 08:59:20,086][04341] Created Actor Critic model with architecture:
[2024-12-20 08:59:20,086][04341] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-20 08:59:20,503][04341] Using optimizer
[2024-12-20 08:59:21,832][00248] Heartbeat connected on Batcher_0
[2024-12-20 08:59:21,839][00248] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-20 08:59:21,851][00248] Heartbeat connected on RolloutWorker_w0
[2024-12-20 08:59:21,854][00248] Heartbeat connected on RolloutWorker_w1
[2024-12-20 08:59:21,857][00248] Heartbeat connected on RolloutWorker_w2
[2024-12-20 08:59:21,860][00248] Heartbeat connected on RolloutWorker_w3
[2024-12-20 08:59:21,864][00248] Heartbeat connected on RolloutWorker_w4
[2024-12-20 08:59:21,867][00248] Heartbeat connected on RolloutWorker_w5
[2024-12-20 08:59:21,870][00248] Heartbeat connected on RolloutWorker_w6
[2024-12-20 08:59:21,874][00248] Heartbeat connected on RolloutWorker_w7
[2024-12-20 08:59:23,961][04341] No checkpoints found
[2024-12-20 08:59:23,961][04341] Did not load from checkpoint, starting from scratch!
[2024-12-20 08:59:23,961][04341] Initialized policy 0 weights for model version 0
[2024-12-20 08:59:23,965][04341] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-20 08:59:23,974][04341] LearnerWorker_p0 finished initialization!
[2024-12-20 08:59:23,975][00248] Heartbeat connected on LearnerWorker_p0
[2024-12-20 08:59:24,177][04354] RunningMeanStd input shape: (3, 72, 128)
[2024-12-20 08:59:24,178][04354] RunningMeanStd input shape: (1,)
[2024-12-20 08:59:24,191][04354] ConvEncoder: input_channels=3
[2024-12-20 08:59:24,299][04354] Conv encoder output size: 512
[2024-12-20 08:59:24,299][04354] Policy head output size: 512
[2024-12-20 08:59:24,355][00248] Inference worker 0-0 is ready!
[2024-12-20 08:59:24,359][00248] All inference workers are ready! Signal rollout workers to start!
[2024-12-20 08:59:24,577][04362] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:24,580][04356] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:24,582][04358] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:24,582][04360] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:24,576][04357] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:24,579][04359] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:24,581][04355] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:24,584][04361] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-20 08:59:25,616][04357] Decorrelating experience for 0 frames...
[2024-12-20 08:59:25,760][00248] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-20 08:59:26,326][04357] Decorrelating experience for 32 frames...
[2024-12-20 08:59:26,685][04356] Decorrelating experience for 0 frames...
[2024-12-20 08:59:26,692][04362] Decorrelating experience for 0 frames...
[2024-12-20 08:59:26,695][04358] Decorrelating experience for 0 frames...
[2024-12-20 08:59:26,702][04360] Decorrelating experience for 0 frames...
[2024-12-20 08:59:28,048][04356] Decorrelating experience for 32 frames...
[2024-12-20 08:59:28,050][04358] Decorrelating experience for 32 frames...
[2024-12-20 08:59:28,052][04360] Decorrelating experience for 32 frames...
[2024-12-20 08:59:28,247][04361] Decorrelating experience for 0 frames...
[2024-12-20 08:59:28,249][04359] Decorrelating experience for 0 frames...
[2024-12-20 08:59:29,530][04361] Decorrelating experience for 32 frames...
[2024-12-20 08:59:29,534][04359] Decorrelating experience for 32 frames...
[2024-12-20 08:59:30,223][04362] Decorrelating experience for 32 frames...
[2024-12-20 08:59:30,743][04360] Decorrelating experience for 64 frames...
[2024-12-20 08:59:30,750][04356] Decorrelating experience for 64 frames...
[2024-12-20 08:59:30,755][04358] Decorrelating experience for 64 frames...
[2024-12-20 08:59:30,760][00248] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-20 08:59:31,552][04361] Decorrelating experience for 64 frames...
[2024-12-20 08:59:31,554][04359] Decorrelating experience for 64 frames...
[2024-12-20 08:59:31,736][04355] Decorrelating experience for 0 frames...
[2024-12-20 08:59:32,167][04357] Decorrelating experience for 64 frames...
[2024-12-20 08:59:32,272][04362] Decorrelating experience for 64 frames...
[2024-12-20 08:59:32,393][04358] Decorrelating experience for 96 frames...
[2024-12-20 08:59:32,390][04356] Decorrelating experience for 96 frames...
[2024-12-20 08:59:32,818][04360] Decorrelating experience for 96 frames...
[2024-12-20 08:59:33,307][04355] Decorrelating experience for 32 frames...
[2024-12-20 08:59:33,666][04357] Decorrelating experience for 96 frames...
[2024-12-20 08:59:33,668][04359] Decorrelating experience for 96 frames...
[2024-12-20 08:59:34,002][04361] Decorrelating experience for 96 frames...
[2024-12-20 08:59:34,393][04355] Decorrelating experience for 64 frames...
[2024-12-20 08:59:34,609][04362] Decorrelating experience for 96 frames...
[2024-12-20 08:59:35,760][00248] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.8. Samples: 28. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-20 08:59:35,766][00248] Avg episode reward: [(0, '1.234')]
[2024-12-20 08:59:36,681][04355] Decorrelating experience for 96 frames...
[2024-12-20 08:59:37,413][04341] Signal inference workers to stop experience collection...
[2024-12-20 08:59:37,460][04354] InferenceWorker_p0-w0: stopping experience collection
[2024-12-20 08:59:40,760][00248] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 165.3. Samples: 2480. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-20 08:59:40,761][00248] Avg episode reward: [(0, '2.229')]
[2024-12-20 08:59:40,962][04341] Signal inference workers to resume experience collection...
[2024-12-20 08:59:40,963][04354] InferenceWorker_p0-w0: resuming experience collection
[2024-12-20 08:59:45,760][00248] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 245.9. Samples: 4918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 3.0)
[2024-12-20 08:59:45,763][00248] Avg episode reward: [(0, '3.130')]
[2024-12-20 08:59:50,760][00248] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 244.4. Samples: 6110. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-20 08:59:50,763][00248] Avg episode reward: [(0, '3.626')]
[2024-12-20 08:59:52,905][04354] Updated weights for policy 0, policy_version 10 (0.0185)
[2024-12-20 08:59:55,764][00248] Fps is (10 sec: 3684.9, 60 sec: 1774.7, 300 sec: 1774.7). Total num frames: 53248. Throughput: 0: 414.9. Samples: 12450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 08:59:55,767][00248] Avg episode reward: [(0, '4.372')]
[2024-12-20 09:00:00,760][00248] Fps is (10 sec: 4505.6, 60 sec: 2106.5, 300 sec: 2106.5). Total num frames: 73728. Throughput: 0: 538.2. Samples: 18836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-20 09:00:00,764][00248] Avg episode reward: [(0, '4.355')]
[2024-12-20 09:00:03,218][04354] Updated weights for policy 0, policy_version 20 (0.0039)
[2024-12-20 09:00:05,760][00248] Fps is (10 sec: 3278.1, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 521.4. Samples: 20856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-20 09:00:05,767][00248] Avg episode reward: [(0, '4.337')]
[2024-12-20 09:00:10,760][00248] Fps is (10 sec: 3276.8, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 585.3. Samples: 26338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:00:10,766][00248] Avg episode reward: [(0, '4.370')]
[2024-12-20 09:00:10,768][04341] Saving new best policy, reward=4.370!
[2024-12-20 09:00:13,348][04354] Updated weights for policy 0, policy_version 30 (0.0013)
[2024-12-20 09:00:15,761][00248] Fps is (10 sec: 4505.1, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 131072. Throughput: 0: 739.0. Samples: 33256. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-20 09:00:15,767][00248] Avg episode reward: [(0, '4.488')]
[2024-12-20 09:00:15,774][04341] Saving new best policy, reward=4.488!
[2024-12-20 09:00:20,760][00248] Fps is (10 sec: 3686.4, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 782.7. Samples: 35248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-20 09:00:20,765][00248] Avg episode reward: [(0, '4.399')]
[2024-12-20 09:00:25,760][00248] Fps is (10 sec: 2457.9, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 805.6. Samples: 38730. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-20 09:00:25,765][00248] Avg episode reward: [(0, '4.481')]
[2024-12-20 09:00:27,543][04354] Updated weights for policy 0, policy_version 40 (0.0031)
[2024-12-20 09:00:30,760][00248] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2709.7). Total num frames: 176128. Throughput: 0: 886.6. Samples: 44814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:00:30,766][00248] Avg episode reward: [(0, '4.474')]
[2024-12-20 09:00:35,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 939.3. Samples: 48378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:00:35,766][00248] Avg episode reward: [(0, '4.456')]
[2024-12-20 09:00:36,082][04354] Updated weights for policy 0, policy_version 50 (0.0025)
[2024-12-20 09:00:40,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 2894.5). Total num frames: 217088. Throughput: 0: 932.5. Samples: 54408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-20 09:00:40,763][00248] Avg episode reward: [(0, '4.441')]
[2024-12-20 09:00:45,760][00248] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 2969.6). Total num frames: 237568. Throughput: 0: 904.9. Samples: 59556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:00:45,762][00248] Avg episode reward: [(0, '4.478')]
[2024-12-20 09:00:47,327][04354] Updated weights for policy 0, policy_version 60 (0.0032)
[2024-12-20 09:00:50,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3035.9). Total num frames: 258048. Throughput: 0: 939.3. Samples: 63126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:00:50,763][00248] Avg episode reward: [(0, '4.335')]
[2024-12-20 09:00:55,760][00248] Fps is (10 sec: 4505.7, 60 sec: 3823.2, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 972.1. Samples: 70084. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:00:55,765][00248] Avg episode reward: [(0, '4.318')]
[2024-12-20 09:00:55,773][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
[2024-12-20 09:00:57,246][04354] Updated weights for policy 0, policy_version 70 (0.0019)
[2024-12-20 09:01:00,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3104.3). Total num frames: 294912. Throughput: 0: 916.3. Samples: 74490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:01:00,763][00248] Avg episode reward: [(0, '4.477')]
[2024-12-20 09:01:05,760][00248] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 949.4. Samples: 77970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:01:05,762][00248] Avg episode reward: [(0, '4.437')]
[2024-12-20 09:01:07,026][04354] Updated weights for policy 0, policy_version 80 (0.0032)
[2024-12-20 09:01:10,763][00248] Fps is (10 sec: 4913.9, 60 sec: 3959.3, 300 sec: 3276.7). Total num frames: 344064. Throughput: 0: 1033.8. Samples: 85254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:01:10,765][00248] Avg episode reward: [(0, '4.417')]
[2024-12-20 09:01:15,760][00248] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3239.6). Total num frames: 356352. Throughput: 0: 1009.8. Samples: 90254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:01:15,764][00248] Avg episode reward: [(0, '4.424')]
[2024-12-20 09:01:18,339][04354] Updated weights for policy 0, policy_version 90 (0.0033)
[2024-12-20 09:01:20,760][00248] Fps is (10 sec: 3277.6, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 376832. Throughput: 0: 986.0. Samples: 92748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:01:20,764][00248] Avg episode reward: [(0, '4.455')]
[2024-12-20 09:01:25,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 1010.4. Samples: 99878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:01:25,762][00248] Avg episode reward: [(0, '4.537')]
[2024-12-20 09:01:25,773][04341] Saving new best policy, reward=4.537!
[2024-12-20 09:01:27,001][04354] Updated weights for policy 0, policy_version 100 (0.0019)
[2024-12-20 09:01:30,761][00248] Fps is (10 sec: 4505.1, 60 sec: 4095.9, 300 sec: 3375.1). Total num frames: 421888. Throughput: 0: 1028.4. Samples: 105836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:01:30,768][00248] Avg episode reward: [(0, '4.709')]
[2024-12-20 09:01:30,770][04341] Saving new best policy, reward=4.709!
[2024-12-20 09:01:35,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3339.8). Total num frames: 434176. Throughput: 0: 995.2. Samples: 107912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:01:35,762][00248] Avg episode reward: [(0, '4.587')]
[2024-12-20 09:01:38,398][04354] Updated weights for policy 0, policy_version 110 (0.0036)
[2024-12-20 09:01:40,760][00248] Fps is (10 sec: 3686.9, 60 sec: 4027.7, 300 sec: 3398.2). Total num frames: 458752. Throughput: 0: 985.8. Samples: 114444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:01:40,762][00248] Avg episode reward: [(0, '4.493')]
[2024-12-20 09:01:45,760][00248] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3452.3). Total num frames: 483328. Throughput: 0: 1041.3. Samples: 121350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:01:45,762][00248] Avg episode reward: [(0, '4.617')]
[2024-12-20 09:01:48,366][04354] Updated weights for policy 0, policy_version 120 (0.0018)
[2024-12-20 09:01:50,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 1011.2. Samples: 123474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:01:50,762][00248] Avg episode reward: [(0, '4.555')]
[2024-12-20 09:01:55,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 973.3. Samples: 129050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:01:55,763][00248] Avg episode reward: [(0, '4.578')]
[2024-12-20 09:01:58,398][04354] Updated weights for policy 0, policy_version 130 (0.0030)
[2024-12-20 09:02:00,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3488.2). Total num frames: 540672. Throughput: 0: 1021.9. Samples: 136238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-20 09:02:00,762][00248] Avg episode reward: [(0, '4.531')]
[2024-12-20 09:02:05,761][00248] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3481.6). Total num frames: 557056. Throughput: 0: 1031.3. Samples: 139156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:02:05,764][00248] Avg episode reward: [(0, '4.432')]
[2024-12-20 09:02:09,540][04354] Updated weights for policy 0, policy_version 140 (0.0015)
[2024-12-20 09:02:10,761][00248] Fps is (10 sec: 3685.9, 60 sec: 3891.3, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 977.7. Samples: 143876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:02:10,769][00248] Avg episode reward: [(0, '4.549')]
[2024-12-20 09:02:15,760][00248] Fps is (10 sec: 4506.2, 60 sec: 4096.0, 300 sec: 3541.8). Total num frames: 602112. Throughput: 0: 1004.1. Samples: 151020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:02:15,764][00248] Avg episode reward: [(0, '4.553')]
[2024-12-20 09:02:17,887][04354] Updated weights for policy 0, policy_version 150 (0.0019)
[2024-12-20 09:02:20,760][00248] Fps is (10 sec: 4506.1, 60 sec: 4096.0, 300 sec: 3557.7). Total num frames: 622592. Throughput: 0: 1038.3. Samples: 154636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:02:20,763][00248] Avg episode reward: [(0, '4.400')]
[2024-12-20 09:02:25,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3527.1). Total num frames: 634880. Throughput: 0: 992.0. Samples: 159082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:02:25,764][00248] Avg episode reward: [(0, '4.505')]
[2024-12-20 09:02:29,218][04354] Updated weights for policy 0, policy_version 160 (0.0035)
[2024-12-20 09:02:30,760][00248] Fps is (10 sec: 3686.5, 60 sec: 3959.6, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 984.4. Samples: 165650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:02:30,766][00248] Avg episode reward: [(0, '4.466')]
[2024-12-20 09:02:35,760][00248] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3600.2). Total num frames: 684032. Throughput: 0: 1018.0. Samples: 169286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-20 09:02:35,763][00248] Avg episode reward: [(0, '4.381')]
[2024-12-20 09:02:39,228][04354] Updated weights for policy 0, policy_version 170 (0.0023)
[2024-12-20 09:02:40,761][00248] Fps is (10 sec: 4095.4, 60 sec: 4027.6, 300 sec: 3591.9). Total num frames: 700416. Throughput: 0: 1015.7. Samples: 174760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:02:40,764][00248] Avg episode reward: [(0, '4.418')]
[2024-12-20 09:02:45,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 720896. Throughput: 0: 981.3. Samples: 180396. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:02:45,762][00248] Avg episode reward: [(0, '4.473')]
[2024-12-20 09:02:49,071][04354] Updated weights for policy 0, policy_version 180 (0.0027)
[2024-12-20 09:02:50,760][00248] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 3616.5). Total num frames: 741376. Throughput: 0: 996.4. Samples: 183992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:02:50,763][00248] Avg episode reward: [(0, '4.679')]
[2024-12-20 09:02:55,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3627.9). Total num frames: 761856. Throughput: 0: 1030.6. Samples: 190252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:02:55,762][00248] Avg episode reward: [(0, '4.701')]
[2024-12-20 09:02:55,775][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth...
[2024-12-20 09:03:00,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 973.9. Samples: 194844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:03:00,764][00248] Avg episode reward: [(0, '4.610')]
[2024-12-20 09:03:00,783][04354] Updated weights for policy 0, policy_version 190 (0.0021)
[2024-12-20 09:03:05,760][00248] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 971.6. Samples: 198360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:03:05,767][00248] Avg episode reward: [(0, '4.616')]
[2024-12-20 09:03:09,332][04354] Updated weights for policy 0, policy_version 200 (0.0013)
[2024-12-20 09:03:10,760][00248] Fps is (10 sec: 4915.2, 60 sec: 4096.1, 300 sec: 3659.1). Total num frames: 823296. Throughput: 0: 1032.4. Samples: 205538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-20 09:03:10,764][00248] Avg episode reward: [(0, '4.278')]
[2024-12-20 09:03:15,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3633.0). Total num frames: 835584. Throughput: 0: 987.7. Samples: 210096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:03:15,766][00248] Avg episode reward: [(0, '4.437')]
[2024-12-20 09:03:20,718][04354] Updated weights for policy 0, policy_version 210 (0.0036)
[2024-12-20 09:03:20,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3660.3). Total num frames: 860160. Throughput: 0: 972.1. Samples: 213030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:03:20,762][00248] Avg episode reward: [(0, '4.593')]
[2024-12-20 09:03:25,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3669.3). Total num frames: 880640. Throughput: 0: 1006.7. Samples: 220062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:03:25,762][00248] Avg episode reward: [(0, '4.661')]
[2024-12-20 09:03:30,733][04354] Updated weights for policy 0, policy_version 220 (0.0013)
[2024-12-20 09:03:30,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3678.0). Total num frames: 901120. Throughput: 0: 1006.1. Samples: 225672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:03:30,764][00248] Avg episode reward: [(0, '4.627')]
[2024-12-20 09:03:35,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 975.8. Samples: 227904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:03:35,762][00248] Avg episode reward: [(0, '4.534')]
[2024-12-20 09:03:40,344][04354] Updated weights for policy 0, policy_version 230 (0.0017)
[2024-12-20 09:03:40,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3694.4). Total num frames: 942080. Throughput: 0: 992.9. Samples: 234932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:03:40,762][00248] Avg episode reward: [(0, '4.650')]
[2024-12-20 09:03:45,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3702.2). Total num frames: 962560. Throughput: 0: 1034.7. Samples: 241404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:03:45,765][00248] Avg episode reward: [(0, '4.650')]
[2024-12-20 09:03:50,761][00248] Fps is (10 sec: 3276.5, 60 sec: 3891.1, 300 sec: 3678.7). Total num frames: 974848. Throughput: 0: 1004.7. Samples: 243572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-20 09:03:50,766][00248] Avg episode reward: [(0, '4.504')]
[2024-12-20 09:03:51,757][04354] Updated weights for policy 0, policy_version 240 (0.0035)
[2024-12-20 09:03:55,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3701.6). Total num frames: 999424. Throughput: 0: 975.2. Samples: 249424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:03:55,766][00248] Avg episode reward: [(0, '4.464')]
[2024-12-20 09:04:00,514][04354] Updated weights for policy 0, policy_version 250 (0.0020)
[2024-12-20 09:04:00,760][00248] Fps is (10 sec: 4915.6, 60 sec: 4164.3, 300 sec: 3723.6). Total num frames: 1024000. Throughput: 0: 1032.0. Samples: 256534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:04:00,767][00248] Avg episode reward: [(0, '4.424')]
[2024-12-20 09:04:05,760][00248] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3715.7). Total num frames: 1040384. Throughput: 0: 1021.1. Samples: 258980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:04:05,765][00248] Avg episode reward: [(0, '4.614')]
[2024-12-20 09:04:10,760][00248] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3722.3). Total num frames: 1060864. Throughput: 0: 978.1. Samples: 264076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-20 09:04:10,766][00248] Avg episode reward: [(0, '4.606')]
[2024-12-20 09:04:11,740][04354] Updated weights for policy 0, policy_version 260 (0.0022)
[2024-12-20 09:04:15,760][00248] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3728.8). Total num frames: 1081344. Throughput: 0: 1012.9. Samples: 271254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:04:15,762][00248] Avg episode reward: [(0, '4.742')]
[2024-12-20 09:04:15,774][04341] Saving new best policy, reward=4.742!
[2024-12-20 09:04:20,764][00248] Fps is (10 sec: 4094.3, 60 sec: 4027.5, 300 sec: 3734.9). Total num frames: 1101824. Throughput: 0: 1037.2. Samples: 274584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:04:20,766][00248] Avg episode reward: [(0, '4.798')]
[2024-12-20 09:04:20,772][04341] Saving new best policy, reward=4.798!
[2024-12-20 09:04:22,453][04354] Updated weights for policy 0, policy_version 270 (0.0018)
[2024-12-20 09:04:25,760][00248] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1110016. Throughput: 0: 961.2. Samples: 278188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:04:25,768][00248] Avg episode reward: [(0, '4.960')]
[2024-12-20 09:04:25,826][04341] Saving new best policy, reward=4.960!
[2024-12-20 09:04:30,760][00248] Fps is (10 sec: 2458.6, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1126400. Throughput: 0: 910.2. Samples: 282364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:04:30,766][00248] Avg episode reward: [(0, '4.881')]
[2024-12-20 09:04:34,584][04354] Updated weights for policy 0, policy_version 280 (0.0024)
[2024-12-20 09:04:35,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1150976. Throughput: 0: 941.4. Samples: 285934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-20 09:04:35,766][00248] Avg episode reward: [(0, '4.984')]
[2024-12-20 09:04:35,775][04341] Saving new best policy, reward=4.984!
[2024-12-20 09:04:40,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1171456. Throughput: 0: 965.8. Samples: 292884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-20 09:04:40,767][00248] Avg episode reward: [(0, '5.162')]
[2024-12-20 09:04:40,769][04341] Saving new best policy, reward=5.162!
[2024-12-20 09:04:45,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 1183744. Throughput: 0: 903.1. Samples: 297172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:04:45,766][00248] Avg episode reward: [(0, '5.248')]
[2024-12-20 09:04:45,848][04354] Updated weights for policy 0, policy_version 290 (0.0018)
[2024-12-20 09:04:45,856][04341] Saving new best policy, reward=5.248!
[2024-12-20 09:04:50,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3915.6). Total num frames: 1208320. Throughput: 0: 924.3. Samples: 300572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:04:50,762][00248] Avg episode reward: [(0, '5.067')]
[2024-12-20 09:04:54,454][04354] Updated weights for policy 0, policy_version 300 (0.0016)
[2024-12-20 09:04:55,760][00248] Fps is (10 sec: 4915.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1232896. Throughput: 0: 969.2. Samples: 307692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:04:55,762][00248] Avg episode reward: [(0, '4.915')]
[2024-12-20 09:04:55,770][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth...
[2024-12-20 09:04:55,931][04341] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth
[2024-12-20 09:05:00,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 1249280. Throughput: 0: 920.2. Samples: 312664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:05:00,762][00248] Avg episode reward: [(0, '5.065')]
[2024-12-20 09:05:05,760][00248] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 1265664. Throughput: 0: 900.1. Samples: 315084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:05:05,769][00248] Avg episode reward: [(0, '5.157')]
[2024-12-20 09:05:05,811][04354] Updated weights for policy 0, policy_version 310 (0.0030)
[2024-12-20 09:05:10,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1290240. Throughput: 0: 980.7. Samples: 322318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:05:10,763][00248] Avg episode reward: [(0, '5.271')]
[2024-12-20 09:05:10,767][04341] Saving new best policy, reward=5.271!
[2024-12-20 09:05:15,476][04354] Updated weights for policy 0, policy_version 320 (0.0031)
[2024-12-20 09:05:15,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 1310720. Throughput: 0: 1018.7. Samples: 328206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:05:15,762][00248] Avg episode reward: [(0, '5.470')]
[2024-12-20 09:05:15,774][04341] Saving new best policy, reward=5.470!
[2024-12-20 09:05:20,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3971.0). Total num frames: 1327104. Throughput: 0: 985.7. Samples: 330290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-20 09:05:20,766][00248] Avg episode reward: [(0, '5.174')]
[2024-12-20 09:05:25,760][00248] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1347584. Throughput: 0: 975.7. Samples: 336790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-20 09:05:25,762][00248] Avg episode reward: [(0, '5.050')]
[2024-12-20 09:05:25,785][04354] Updated weights for policy 0, policy_version 330 (0.0033)
[2024-12-20 09:05:30,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1372160. Throughput: 0: 1036.8. Samples: 343830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:05:30,764][00248] Avg episode reward: [(0, '5.123')]
[2024-12-20 09:05:35,760][00248] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1388544. Throughput: 0: 1008.8. Samples: 345970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:05:35,762][00248] Avg episode reward: [(0, '5.110')]
[2024-12-20 09:05:37,108][04354] Updated weights for policy 0, policy_version 340 (0.0018)
[2024-12-20 09:05:40,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1409024. Throughput: 0: 973.0. Samples: 351476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:05:40,765][00248] Avg episode reward: [(0, '5.412')]
[2024-12-20 09:05:45,745][04354] Updated weights for policy 0, policy_version 350 (0.0016)
[2024-12-20 09:05:45,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3984.9). Total num frames: 1433600. Throughput: 0: 1022.6. Samples: 358680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-20 09:05:45,766][00248] Avg episode reward: [(0, '5.374')]
[2024-12-20 09:05:50,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1449984. Throughput: 0: 1034.0. Samples: 361614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:05:50,766][00248] Avg episode reward: [(0, '5.294')]
[2024-12-20 09:05:55,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1466368. Throughput: 0: 972.9. Samples: 366100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-20 09:05:55,762][00248] Avg episode reward: [(0, '5.324')]
[2024-12-20 09:05:56,985][04354] Updated weights for policy 0, policy_version 360 (0.0027)
[2024-12-20 09:06:00,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1490944. Throughput: 0: 1000.8. Samples: 373240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-20 09:06:00,762][00248] Avg episode reward: [(0, '5.573')]
[2024-12-20 09:06:00,765][04341] Saving new best policy, reward=5.573!
[2024-12-20 09:06:05,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1511424. Throughput: 0: 1033.4. Samples: 376794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-20 09:06:05,764][00248] Avg episode reward: [(0, '5.644')]
[2024-12-20 09:06:05,772][04341] Saving new best policy, reward=5.644!
[2024-12-20 09:06:06,723][04354] Updated weights for policy 0, policy_version 370 (0.0016)
[2024-12-20 09:06:10,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1523712. Throughput: 0: 992.6. Samples: 381456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-20 09:06:10,762][00248] Avg episode reward: [(0, '5.392')]
[2024-12-20 09:06:15,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1548288. Throughput: 0: 977.8. Samples: 387832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-20 09:06:15,765][00248] Avg episode reward: [(0, '5.149')]
[2024-12-20 09:06:16,980][04354] Updated weights for policy 0, policy_version 380 (0.0021)
[2024-12-20 09:06:20,760][00248] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3971.0).
Total num frames: 1572864. Throughput: 0: 1009.2. Samples: 391386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:06:20,762][00248] Avg episode reward: [(0, '5.785')] [2024-12-20 09:06:20,767][04341] Saving new best policy, reward=5.785! [2024-12-20 09:06:25,762][00248] Fps is (10 sec: 4095.3, 60 sec: 4027.6, 300 sec: 3957.1). Total num frames: 1589248. Throughput: 0: 1009.0. Samples: 396882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:06:25,766][00248] Avg episode reward: [(0, '5.867')] [2024-12-20 09:06:25,780][04341] Saving new best policy, reward=5.867! [2024-12-20 09:06:28,430][04354] Updated weights for policy 0, policy_version 390 (0.0026) [2024-12-20 09:06:30,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1605632. Throughput: 0: 967.4. Samples: 402214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:06:30,762][00248] Avg episode reward: [(0, '5.508')] [2024-12-20 09:06:35,760][00248] Fps is (10 sec: 4096.7, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1630208. Throughput: 0: 982.1. Samples: 405808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:06:35,762][00248] Avg episode reward: [(0, '5.720')] [2024-12-20 09:06:36,916][04354] Updated weights for policy 0, policy_version 400 (0.0017) [2024-12-20 09:06:40,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1650688. Throughput: 0: 1028.9. Samples: 412400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:06:40,762][00248] Avg episode reward: [(0, '5.449')] [2024-12-20 09:06:45,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 1662976. Throughput: 0: 961.5. Samples: 416508. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-20 09:06:45,765][00248] Avg episode reward: [(0, '5.581')] [2024-12-20 09:06:48,825][04354] Updated weights for policy 0, policy_version 410 (0.0021) [2024-12-20 09:06:50,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1687552. Throughput: 0: 958.9. Samples: 419946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:06:50,763][00248] Avg episode reward: [(0, '5.397')] [2024-12-20 09:06:55,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1708032. Throughput: 0: 1003.4. Samples: 426610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:06:55,763][00248] Avg episode reward: [(0, '5.634')] [2024-12-20 09:06:55,777][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000417_1708032.pth... [2024-12-20 09:06:55,948][04341] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth [2024-12-20 09:06:59,552][04354] Updated weights for policy 0, policy_version 420 (0.0021) [2024-12-20 09:07:00,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 1720320. Throughput: 0: 965.3. Samples: 431272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:07:00,763][00248] Avg episode reward: [(0, '5.595')] [2024-12-20 09:07:05,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 1740800. Throughput: 0: 946.2. Samples: 433966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:07:05,762][00248] Avg episode reward: [(0, '5.712')] [2024-12-20 09:07:09,260][04354] Updated weights for policy 0, policy_version 430 (0.0015) [2024-12-20 09:07:10,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1765376. Throughput: 0: 982.7. Samples: 441102. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:07:10,763][00248] Avg episode reward: [(0, '5.572')] [2024-12-20 09:07:15,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1785856. Throughput: 0: 992.0. Samples: 446856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:07:15,763][00248] Avg episode reward: [(0, '5.719')] [2024-12-20 09:07:20,454][04354] Updated weights for policy 0, policy_version 440 (0.0030) [2024-12-20 09:07:20,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 1802240. Throughput: 0: 960.5. Samples: 449030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:07:20,767][00248] Avg episode reward: [(0, '5.885')] [2024-12-20 09:07:20,770][04341] Saving new best policy, reward=5.885! [2024-12-20 09:07:25,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 1822720. Throughput: 0: 957.4. Samples: 455482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:07:25,762][00248] Avg episode reward: [(0, '5.843')] [2024-12-20 09:07:29,244][04354] Updated weights for policy 0, policy_version 450 (0.0037) [2024-12-20 09:07:30,760][00248] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1847296. Throughput: 0: 1016.9. Samples: 462268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:07:30,766][00248] Avg episode reward: [(0, '5.674')] [2024-12-20 09:07:35,760][00248] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1859584. Throughput: 0: 986.3. Samples: 464330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:07:35,762][00248] Avg episode reward: [(0, '6.076')] [2024-12-20 09:07:35,775][04341] Saving new best policy, reward=6.076! [2024-12-20 09:07:40,706][04354] Updated weights for policy 0, policy_version 460 (0.0015) [2024-12-20 09:07:40,760][00248] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3943.3). 
Total num frames: 1884160. Throughput: 0: 966.7. Samples: 470112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:07:40,766][00248] Avg episode reward: [(0, '5.947')] [2024-12-20 09:07:45,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1904640. Throughput: 0: 1022.2. Samples: 477270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:07:45,766][00248] Avg episode reward: [(0, '5.821')] [2024-12-20 09:07:50,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1921024. Throughput: 0: 1021.5. Samples: 479934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:07:50,763][00248] Avg episode reward: [(0, '5.659')] [2024-12-20 09:07:50,999][04354] Updated weights for policy 0, policy_version 470 (0.0022) [2024-12-20 09:07:55,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1941504. Throughput: 0: 963.0. Samples: 484436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:07:55,762][00248] Avg episode reward: [(0, '5.722')] [2024-12-20 09:08:00,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1961984. Throughput: 0: 994.7. Samples: 491616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:08:00,762][00248] Avg episode reward: [(0, '6.143')] [2024-12-20 09:08:00,793][04341] Saving new best policy, reward=6.143! [2024-12-20 09:08:00,798][04354] Updated weights for policy 0, policy_version 480 (0.0027) [2024-12-20 09:08:05,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1982464. Throughput: 0: 1023.2. Samples: 495074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:08:05,762][00248] Avg episode reward: [(0, '6.399')] [2024-12-20 09:08:05,775][04341] Saving new best policy, reward=6.399! [2024-12-20 09:08:10,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1998848. 
Throughput: 0: 977.9. Samples: 499488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-20 09:08:10,765][00248] Avg episode reward: [(0, '6.262')] [2024-12-20 09:08:12,265][04354] Updated weights for policy 0, policy_version 490 (0.0023) [2024-12-20 09:08:15,764][00248] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3943.2). Total num frames: 2023424. Throughput: 0: 972.5. Samples: 506032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:08:15,770][00248] Avg episode reward: [(0, '6.600')] [2024-12-20 09:08:15,781][04341] Saving new best policy, reward=6.600! [2024-12-20 09:08:20,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2043904. Throughput: 0: 1004.7. Samples: 509542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:08:20,767][00248] Avg episode reward: [(0, '6.687')] [2024-12-20 09:08:20,770][04341] Saving new best policy, reward=6.687! [2024-12-20 09:08:21,183][04354] Updated weights for policy 0, policy_version 500 (0.0029) [2024-12-20 09:08:25,761][00248] Fps is (10 sec: 3277.8, 60 sec: 3891.1, 300 sec: 3915.5). Total num frames: 2056192. Throughput: 0: 990.0. Samples: 514662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:08:25,763][00248] Avg episode reward: [(0, '6.744')] [2024-12-20 09:08:25,780][04341] Saving new best policy, reward=6.744! [2024-12-20 09:08:30,760][00248] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 2068480. Throughput: 0: 907.6. Samples: 518114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:08:30,764][00248] Avg episode reward: [(0, '6.795')] [2024-12-20 09:08:30,768][04341] Saving new best policy, reward=6.795! [2024-12-20 09:08:35,285][04354] Updated weights for policy 0, policy_version 510 (0.0028) [2024-12-20 09:08:35,760][00248] Fps is (10 sec: 3277.1, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2088960. Throughput: 0: 902.2. Samples: 520534. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:08:35,765][00248] Avg episode reward: [(0, '6.495')] [2024-12-20 09:08:40,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2113536. Throughput: 0: 961.2. Samples: 527690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-20 09:08:40,765][00248] Avg episode reward: [(0, '6.349')] [2024-12-20 09:08:45,770][04354] Updated weights for policy 0, policy_version 520 (0.0027) [2024-12-20 09:08:45,776][00248] Fps is (10 sec: 4089.3, 60 sec: 3753.6, 300 sec: 3915.3). Total num frames: 2129920. Throughput: 0: 914.8. Samples: 532796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:08:45,783][00248] Avg episode reward: [(0, '6.247')] [2024-12-20 09:08:50,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 2146304. Throughput: 0: 891.0. Samples: 535170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:08:50,762][00248] Avg episode reward: [(0, '6.572')] [2024-12-20 09:08:55,415][04354] Updated weights for policy 0, policy_version 530 (0.0015) [2024-12-20 09:08:55,760][00248] Fps is (10 sec: 4102.7, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2170880. Throughput: 0: 950.3. Samples: 542250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:08:55,765][00248] Avg episode reward: [(0, '7.099')] [2024-12-20 09:08:55,773][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth... [2024-12-20 09:08:55,899][04341] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth [2024-12-20 09:08:55,924][04341] Saving new best policy, reward=7.099! [2024-12-20 09:09:00,761][00248] Fps is (10 sec: 4505.2, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2191360. Throughput: 0: 935.9. Samples: 548146. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:09:00,763][00248] Avg episode reward: [(0, '7.001')] [2024-12-20 09:09:05,760][00248] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3887.7). Total num frames: 2207744. Throughput: 0: 904.7. Samples: 550256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:09:05,763][00248] Avg episode reward: [(0, '6.349')] [2024-12-20 09:09:06,602][04354] Updated weights for policy 0, policy_version 540 (0.0014) [2024-12-20 09:09:10,760][00248] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2228224. Throughput: 0: 935.5. Samples: 556758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-20 09:09:10,762][00248] Avg episode reward: [(0, '6.085')] [2024-12-20 09:09:15,479][04354] Updated weights for policy 0, policy_version 550 (0.0015) [2024-12-20 09:09:15,765][00248] Fps is (10 sec: 4503.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2252800. Throughput: 0: 1014.0. Samples: 563748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:09:15,767][00248] Avg episode reward: [(0, '6.778')] [2024-12-20 09:09:20,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3915.5). Total num frames: 2265088. Throughput: 0: 1008.9. Samples: 565934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:09:20,764][00248] Avg episode reward: [(0, '7.350')] [2024-12-20 09:09:20,794][04341] Saving new best policy, reward=7.350! [2024-12-20 09:09:25,760][00248] Fps is (10 sec: 3688.2, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2289664. Throughput: 0: 968.6. Samples: 571278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:09:25,774][00248] Avg episode reward: [(0, '7.382')] [2024-12-20 09:09:25,794][04341] Saving new best policy, reward=7.382! [2024-12-20 09:09:26,710][04354] Updated weights for policy 0, policy_version 560 (0.0044) [2024-12-20 09:09:30,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). 
Total num frames: 2310144. Throughput: 0: 1009.2. Samples: 578192. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-12-20 09:09:30,762][00248] Avg episode reward: [(0, '7.341')] [2024-12-20 09:09:35,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 2326528. Throughput: 0: 1023.8. Samples: 581242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:09:35,767][00248] Avg episode reward: [(0, '7.704')] [2024-12-20 09:09:35,774][04341] Saving new best policy, reward=7.704! [2024-12-20 09:09:37,431][04354] Updated weights for policy 0, policy_version 570 (0.0021) [2024-12-20 09:09:40,760][00248] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2347008. Throughput: 0: 963.1. Samples: 585588. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-20 09:09:40,764][00248] Avg episode reward: [(0, '7.584')] [2024-12-20 09:09:45,760][00248] Fps is (10 sec: 4096.1, 60 sec: 3960.6, 300 sec: 3929.4). Total num frames: 2367488. Throughput: 0: 991.7. Samples: 592770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:09:45,767][00248] Avg episode reward: [(0, '7.547')] [2024-12-20 09:09:46,886][04354] Updated weights for policy 0, policy_version 580 (0.0017) [2024-12-20 09:09:50,763][00248] Fps is (10 sec: 4094.8, 60 sec: 4027.5, 300 sec: 3915.5). Total num frames: 2387968. Throughput: 0: 1025.8. Samples: 596420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-20 09:09:50,769][00248] Avg episode reward: [(0, '8.133')] [2024-12-20 09:09:50,777][04341] Saving new best policy, reward=8.133! [2024-12-20 09:09:55,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2404352. Throughput: 0: 983.3. Samples: 601008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:09:55,762][00248] Avg episode reward: [(0, '8.142')] [2024-12-20 09:09:55,780][04341] Saving new best policy, reward=8.142! 
[2024-12-20 09:09:58,236][04354] Updated weights for policy 0, policy_version 590 (0.0026) [2024-12-20 09:10:00,760][00248] Fps is (10 sec: 3687.5, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 2424832. Throughput: 0: 965.8. Samples: 607206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:10:00,763][00248] Avg episode reward: [(0, '8.504')] [2024-12-20 09:10:00,767][04341] Saving new best policy, reward=8.504! [2024-12-20 09:10:05,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3929.4). Total num frames: 2449408. Throughput: 0: 996.2. Samples: 610762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:10:05,763][00248] Avg episode reward: [(0, '8.272')] [2024-12-20 09:10:07,041][04354] Updated weights for policy 0, policy_version 600 (0.0021) [2024-12-20 09:10:10,760][00248] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 2465792. Throughput: 0: 1006.2. Samples: 616556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:10:10,762][00248] Avg episode reward: [(0, '8.451')] [2024-12-20 09:10:15,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3929.4). Total num frames: 2486272. Throughput: 0: 970.4. Samples: 621858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:10:15,762][00248] Avg episode reward: [(0, '8.506')] [2024-12-20 09:10:15,777][04341] Saving new best policy, reward=8.506! [2024-12-20 09:10:18,244][04354] Updated weights for policy 0, policy_version 610 (0.0021) [2024-12-20 09:10:20,760][00248] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2506752. Throughput: 0: 979.3. Samples: 625310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:10:20,766][00248] Avg episode reward: [(0, '8.377')] [2024-12-20 09:10:25,763][00248] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3915.5). Total num frames: 2527232. Throughput: 0: 1032.7. Samples: 632064. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:10:25,765][00248] Avg episode reward: [(0, '8.168')] [2024-12-20 09:10:29,177][04354] Updated weights for policy 0, policy_version 620 (0.0020) [2024-12-20 09:10:30,760][00248] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2543616. Throughput: 0: 969.7. Samples: 636406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:10:30,762][00248] Avg episode reward: [(0, '8.282')] [2024-12-20 09:10:35,760][00248] Fps is (10 sec: 3687.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2564096. Throughput: 0: 962.8. Samples: 639744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:10:35,762][00248] Avg episode reward: [(0, '8.423')] [2024-12-20 09:10:38,264][04354] Updated weights for policy 0, policy_version 630 (0.0014) [2024-12-20 09:10:40,760][00248] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2588672. Throughput: 0: 1020.9. Samples: 646948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:10:40,766][00248] Avg episode reward: [(0, '9.147')] [2024-12-20 09:10:40,767][04341] Saving new best policy, reward=9.147! [2024-12-20 09:10:45,761][00248] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 2605056. Throughput: 0: 990.1. Samples: 651762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:10:45,763][00248] Avg episode reward: [(0, '9.227')] [2024-12-20 09:10:45,780][04341] Saving new best policy, reward=9.227! [2024-12-20 09:10:49,938][04354] Updated weights for policy 0, policy_version 640 (0.0025) [2024-12-20 09:10:50,760][00248] Fps is (10 sec: 3276.9, 60 sec: 3891.4, 300 sec: 3915.5). Total num frames: 2621440. Throughput: 0: 963.6. Samples: 654124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:10:50,765][00248] Avg episode reward: [(0, '9.172')] [2024-12-20 09:10:55,760][00248] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 3915.5). 
Total num frames: 2646016. Throughput: 0: 988.2. Samples: 661026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:10:55,767][00248] Avg episode reward: [(0, '9.581')] [2024-12-20 09:10:55,775][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth... [2024-12-20 09:10:55,900][04341] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000417_1708032.pth [2024-12-20 09:10:55,925][04341] Saving new best policy, reward=9.581! [2024-12-20 09:10:59,567][04354] Updated weights for policy 0, policy_version 650 (0.0020) [2024-12-20 09:11:00,760][00248] Fps is (10 sec: 4095.8, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 2662400. Throughput: 0: 993.0. Samples: 666544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:11:00,766][00248] Avg episode reward: [(0, '9.655')] [2024-12-20 09:11:00,770][04341] Saving new best policy, reward=9.655! [2024-12-20 09:11:05,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2678784. Throughput: 0: 961.9. Samples: 668596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:11:05,766][00248] Avg episode reward: [(0, '9.502')] [2024-12-20 09:11:10,747][04354] Updated weights for policy 0, policy_version 660 (0.0019) [2024-12-20 09:11:10,760][00248] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2703360. Throughput: 0: 952.4. Samples: 674918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:11:10,762][00248] Avg episode reward: [(0, '8.929')] [2024-12-20 09:11:15,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2723840. Throughput: 0: 1008.1. Samples: 681770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-20 09:11:15,764][00248] Avg episode reward: [(0, '9.117')] [2024-12-20 09:11:20,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2740224. Throughput: 0: 981.6. 
Samples: 683914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:11:20,765][00248] Avg episode reward: [(0, '9.413')] [2024-12-20 09:11:22,186][04354] Updated weights for policy 0, policy_version 670 (0.0022) [2024-12-20 09:11:25,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3901.6). Total num frames: 2756608. Throughput: 0: 938.7. Samples: 689190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-20 09:11:25,767][00248] Avg episode reward: [(0, '9.691')] [2024-12-20 09:11:25,812][04341] Saving new best policy, reward=9.691! [2024-12-20 09:11:30,760][00248] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2781184. Throughput: 0: 981.5. Samples: 695928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:11:30,766][00248] Avg episode reward: [(0, '11.349')] [2024-12-20 09:11:30,771][04341] Saving new best policy, reward=11.349! [2024-12-20 09:11:31,171][04354] Updated weights for policy 0, policy_version 680 (0.0019) [2024-12-20 09:11:35,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2797568. Throughput: 0: 993.2. Samples: 698818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-20 09:11:35,765][00248] Avg episode reward: [(0, '10.534')] [2024-12-20 09:11:40,760][00248] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 2813952. Throughput: 0: 934.1. Samples: 703060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:11:40,767][00248] Avg episode reward: [(0, '11.106')] [2024-12-20 09:11:43,105][04354] Updated weights for policy 0, policy_version 690 (0.0020) [2024-12-20 09:11:45,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 2838528. Throughput: 0: 963.6. Samples: 709904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:11:45,767][00248] Avg episode reward: [(0, '11.301')] [2024-12-20 09:11:50,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). 
Total num frames: 2859008. Throughput: 0: 995.2. Samples: 713382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:11:50,762][00248] Avg episode reward: [(0, '10.987')] [2024-12-20 09:11:53,545][04354] Updated weights for policy 0, policy_version 700 (0.0020) [2024-12-20 09:11:55,761][00248] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3901.6). Total num frames: 2871296. Throughput: 0: 957.9. Samples: 718024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:11:55,763][00248] Avg episode reward: [(0, '10.516')] [2024-12-20 09:12:00,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 2891776. Throughput: 0: 937.4. Samples: 723954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:00,762][00248] Avg episode reward: [(0, '10.263')] [2024-12-20 09:12:03,442][04354] Updated weights for policy 0, policy_version 710 (0.0014) [2024-12-20 09:12:05,760][00248] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2916352. Throughput: 0: 968.8. Samples: 727510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:05,767][00248] Avg episode reward: [(0, '10.778')] [2024-12-20 09:12:10,761][00248] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2932736. Throughput: 0: 978.4. Samples: 733220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:10,765][00248] Avg episode reward: [(0, '11.211')] [2024-12-20 09:12:15,011][04354] Updated weights for policy 0, policy_version 720 (0.0032) [2024-12-20 09:12:15,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 2949120. Throughput: 0: 943.3. Samples: 738378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:15,765][00248] Avg episode reward: [(0, '11.619')] [2024-12-20 09:12:15,798][04341] Saving new best policy, reward=11.619! [2024-12-20 09:12:20,760][00248] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3901.6). 
Total num frames: 2973696. Throughput: 0: 957.5. Samples: 741904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:20,767][00248] Avg episode reward: [(0, '11.939')] [2024-12-20 09:12:20,775][04341] Saving new best policy, reward=11.939! [2024-12-20 09:12:23,653][04354] Updated weights for policy 0, policy_version 730 (0.0031) [2024-12-20 09:12:25,760][00248] Fps is (10 sec: 4505.3, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 2994176. Throughput: 0: 1010.3. Samples: 748524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:25,763][00248] Avg episode reward: [(0, '12.564')] [2024-12-20 09:12:25,790][04341] Saving new best policy, reward=12.564! [2024-12-20 09:12:30,762][00248] Fps is (10 sec: 3276.0, 60 sec: 3754.5, 300 sec: 3887.7). Total num frames: 3006464. Throughput: 0: 934.9. Samples: 751976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:12:30,767][00248] Avg episode reward: [(0, '13.074')] [2024-12-20 09:12:30,771][04341] Saving new best policy, reward=13.074! [2024-12-20 09:12:35,760][00248] Fps is (10 sec: 2457.7, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3018752. Throughput: 0: 893.7. Samples: 753598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:12:35,762][00248] Avg episode reward: [(0, '12.860')] [2024-12-20 09:12:37,993][04354] Updated weights for policy 0, policy_version 740 (0.0021) [2024-12-20 09:12:40,760][00248] Fps is (10 sec: 3687.3, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3043328. Throughput: 0: 932.9. Samples: 760004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:12:40,762][00248] Avg episode reward: [(0, '13.122')] [2024-12-20 09:12:40,766][04341] Saving new best policy, reward=13.122! [2024-12-20 09:12:45,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3063808. Throughput: 0: 945.4. Samples: 766496. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:45,762][00248] Avg episode reward: [(0, '13.172')] [2024-12-20 09:12:45,776][04341] Saving new best policy, reward=13.172! [2024-12-20 09:12:48,103][04354] Updated weights for policy 0, policy_version 750 (0.0045) [2024-12-20 09:12:50,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 3076096. Throughput: 0: 914.4. Samples: 768658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:12:50,770][00248] Avg episode reward: [(0, '12.247')] [2024-12-20 09:12:55,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3100672. Throughput: 0: 923.7. Samples: 774784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:12:55,765][00248] Avg episode reward: [(0, '12.402')] [2024-12-20 09:12:55,777][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000757_3100672.pth... [2024-12-20 09:12:55,892][04341] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth [2024-12-20 09:12:57,886][04354] Updated weights for policy 0, policy_version 760 (0.0038) [2024-12-20 09:13:00,760][00248] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3125248. Throughput: 0: 966.8. Samples: 781884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:13:00,762][00248] Avg episode reward: [(0, '12.642')] [2024-12-20 09:13:05,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 3137536. Throughput: 0: 942.1. Samples: 784300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:13:05,763][00248] Avg episode reward: [(0, '12.847')] [2024-12-20 09:13:09,083][04354] Updated weights for policy 0, policy_version 770 (0.0018) [2024-12-20 09:13:10,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3162112. Throughput: 0: 909.4. Samples: 789448. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:13:10,762][00248] Avg episode reward: [(0, '13.537')] [2024-12-20 09:13:10,766][04341] Saving new best policy, reward=13.537! [2024-12-20 09:13:15,760][00248] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3182592. Throughput: 0: 990.3. Samples: 796536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:13:15,762][00248] Avg episode reward: [(0, '13.292')] [2024-12-20 09:13:17,599][04354] Updated weights for policy 0, policy_version 780 (0.0020) [2024-12-20 09:13:20,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3203072. Throughput: 0: 1029.2. Samples: 799912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:13:20,763][00248] Avg episode reward: [(0, '13.727')] [2024-12-20 09:13:20,765][04341] Saving new best policy, reward=13.727! [2024-12-20 09:13:25,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3901.6). Total num frames: 3219456. Throughput: 0: 983.7. Samples: 804272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:13:25,762][00248] Avg episode reward: [(0, '14.194')] [2024-12-20 09:13:25,768][04341] Saving new best policy, reward=14.194! [2024-12-20 09:13:29,137][04354] Updated weights for policy 0, policy_version 790 (0.0014) [2024-12-20 09:13:30,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3901.6). Total num frames: 3239936. Throughput: 0: 991.5. Samples: 811112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:13:30,765][00248] Avg episode reward: [(0, '13.336')] [2024-12-20 09:13:35,761][00248] Fps is (10 sec: 4505.0, 60 sec: 4095.9, 300 sec: 3901.6). Total num frames: 3264512. Throughput: 0: 1020.5. Samples: 814580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:13:35,767][00248] Avg episode reward: [(0, '14.283')] [2024-12-20 09:13:35,785][04341] Saving new best policy, reward=14.283! 
[2024-12-20 09:13:39,635][04354] Updated weights for policy 0, policy_version 800 (0.0021) [2024-12-20 09:13:40,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.9). Total num frames: 3276800. Throughput: 0: 994.5. Samples: 819538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:13:40,767][00248] Avg episode reward: [(0, '14.501')] [2024-12-20 09:13:40,771][04341] Saving new best policy, reward=14.501! [2024-12-20 09:13:45,760][00248] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3301376. Throughput: 0: 973.1. Samples: 825674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:13:45,762][00248] Avg episode reward: [(0, '14.080')] [2024-12-20 09:13:48,931][04354] Updated weights for policy 0, policy_version 810 (0.0013) [2024-12-20 09:13:50,760][00248] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3915.5). Total num frames: 3325952. Throughput: 0: 1000.1. Samples: 829304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:13:50,768][00248] Avg episode reward: [(0, '15.178')] [2024-12-20 09:13:50,771][04341] Saving new best policy, reward=15.178! [2024-12-20 09:13:55,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3338240. Throughput: 0: 1015.2. Samples: 835132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:13:55,762][00248] Avg episode reward: [(0, '15.281')] [2024-12-20 09:13:55,776][04341] Saving new best policy, reward=15.281! [2024-12-20 09:14:00,617][04354] Updated weights for policy 0, policy_version 820 (0.0023) [2024-12-20 09:14:00,761][00248] Fps is (10 sec: 3276.4, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 3358720. Throughput: 0: 965.3. Samples: 839974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:14:00,767][00248] Avg episode reward: [(0, '14.735')] [2024-12-20 09:14:05,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3383296. Throughput: 0: 969.9. 
Samples: 843556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:14:05,763][00248] Avg episode reward: [(0, '14.198')] [2024-12-20 09:14:09,029][04354] Updated weights for policy 0, policy_version 830 (0.0013) [2024-12-20 09:14:10,766][00248] Fps is (10 sec: 4503.4, 60 sec: 4027.3, 300 sec: 3901.6). Total num frames: 3403776. Throughput: 0: 1034.7. Samples: 850838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:14:10,768][00248] Avg episode reward: [(0, '13.639')] [2024-12-20 09:14:15,760][00248] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3416064. Throughput: 0: 982.1. Samples: 855308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:14:15,765][00248] Avg episode reward: [(0, '13.010')] [2024-12-20 09:14:20,212][04354] Updated weights for policy 0, policy_version 840 (0.0023) [2024-12-20 09:14:20,763][00248] Fps is (10 sec: 3687.3, 60 sec: 3959.2, 300 sec: 3901.6). Total num frames: 3440640. Throughput: 0: 975.8. Samples: 858494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:14:20,768][00248] Avg episode reward: [(0, '13.522')] [2024-12-20 09:14:25,760][00248] Fps is (10 sec: 4915.4, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3465216. Throughput: 0: 1025.5. Samples: 865684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:14:25,767][00248] Avg episode reward: [(0, '14.170')] [2024-12-20 09:14:30,404][04354] Updated weights for policy 0, policy_version 850 (0.0020) [2024-12-20 09:14:30,762][00248] Fps is (10 sec: 4096.5, 60 sec: 4027.6, 300 sec: 3915.5). Total num frames: 3481600. Throughput: 0: 1007.3. Samples: 871006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:14:30,770][00248] Avg episode reward: [(0, '14.035')] [2024-12-20 09:14:35,760][00248] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 3497984. Throughput: 0: 975.4. Samples: 873196. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:14:35,767][00248] Avg episode reward: [(0, '14.705')] [2024-12-20 09:14:40,165][04354] Updated weights for policy 0, policy_version 860 (0.0017) [2024-12-20 09:14:40,760][00248] Fps is (10 sec: 4096.8, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3522560. Throughput: 0: 1006.8. Samples: 880438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-20 09:14:40,767][00248] Avg episode reward: [(0, '13.680')] [2024-12-20 09:14:45,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 3543040. Throughput: 0: 1042.7. Samples: 886894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:14:45,765][00248] Avg episode reward: [(0, '15.311')] [2024-12-20 09:14:45,775][04341] Saving new best policy, reward=15.311! [2024-12-20 09:14:50,760][00248] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3559424. Throughput: 0: 1011.3. Samples: 889066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-20 09:14:50,766][00248] Avg episode reward: [(0, '16.050')] [2024-12-20 09:14:50,771][04341] Saving new best policy, reward=16.050! [2024-12-20 09:14:51,314][04354] Updated weights for policy 0, policy_version 870 (0.0028) [2024-12-20 09:14:55,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 3584000. Throughput: 0: 985.2. Samples: 895166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-20 09:14:55,765][00248] Avg episode reward: [(0, '17.914')] [2024-12-20 09:14:55,776][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000875_3584000.pth... [2024-12-20 09:14:55,906][04341] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000646_2646016.pth [2024-12-20 09:14:55,920][04341] Saving new best policy, reward=17.914! 
[2024-12-20 09:15:00,252][04354] Updated weights for policy 0, policy_version 880 (0.0023) [2024-12-20 09:15:00,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3915.5). Total num frames: 3604480. Throughput: 0: 1034.5. Samples: 901860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:15:00,765][00248] Avg episode reward: [(0, '19.485')] [2024-12-20 09:15:00,767][04341] Saving new best policy, reward=19.485! [2024-12-20 09:15:05,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3620864. Throughput: 0: 1016.9. Samples: 904252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:15:05,767][00248] Avg episode reward: [(0, '20.078')] [2024-12-20 09:15:05,774][04341] Saving new best policy, reward=20.078! [2024-12-20 09:15:10,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3915.5). Total num frames: 3641344. Throughput: 0: 971.1. Samples: 909384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:15:10,762][00248] Avg episode reward: [(0, '19.071')] [2024-12-20 09:15:11,399][04354] Updated weights for policy 0, policy_version 890 (0.0031) [2024-12-20 09:15:15,760][00248] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3661824. Throughput: 0: 1012.3. Samples: 916558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:15:15,761][00248] Avg episode reward: [(0, '18.990')] [2024-12-20 09:15:20,762][00248] Fps is (10 sec: 4095.3, 60 sec: 4027.9, 300 sec: 3915.5). Total num frames: 3682304. Throughput: 0: 1040.1. Samples: 920004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:15:20,764][00248] Avg episode reward: [(0, '18.760')] [2024-12-20 09:15:21,336][04354] Updated weights for policy 0, policy_version 900 (0.0033) [2024-12-20 09:15:25,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3698688. Throughput: 0: 976.8. Samples: 924394. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:15:25,766][00248] Avg episode reward: [(0, '18.225')] [2024-12-20 09:15:30,760][00248] Fps is (10 sec: 4096.7, 60 sec: 4027.9, 300 sec: 3929.4). Total num frames: 3723264. Throughput: 0: 987.7. Samples: 931342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-20 09:15:30,762][00248] Avg episode reward: [(0, '18.180')] [2024-12-20 09:15:31,194][04354] Updated weights for policy 0, policy_version 910 (0.0022) [2024-12-20 09:15:35,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 3743744. Throughput: 0: 1020.5. Samples: 934988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:15:35,770][00248] Avg episode reward: [(0, '17.563')] [2024-12-20 09:15:40,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3760128. Throughput: 0: 994.8. Samples: 939932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-20 09:15:40,764][00248] Avg episode reward: [(0, '17.224')] [2024-12-20 09:15:42,427][04354] Updated weights for policy 0, policy_version 920 (0.0018) [2024-12-20 09:15:45,760][00248] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3780608. Throughput: 0: 981.1. Samples: 946010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:15:45,762][00248] Avg episode reward: [(0, '18.223')] [2024-12-20 09:15:50,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 3805184. Throughput: 0: 1010.1. Samples: 949708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:15:50,766][00248] Avg episode reward: [(0, '17.727')] [2024-12-20 09:15:51,019][04354] Updated weights for policy 0, policy_version 930 (0.0025) [2024-12-20 09:15:55,760][00248] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3821568. Throughput: 0: 1031.0. Samples: 955780. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-20 09:15:55,762][00248] Avg episode reward: [(0, '17.236')] [2024-12-20 09:16:00,760][00248] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3842048. Throughput: 0: 981.7. Samples: 960736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:16:00,762][00248] Avg episode reward: [(0, '18.238')] [2024-12-20 09:16:02,416][04354] Updated weights for policy 0, policy_version 940 (0.0030) [2024-12-20 09:16:05,760][00248] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 3866624. Throughput: 0: 985.2. Samples: 964336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:16:05,766][00248] Avg episode reward: [(0, '18.669')] [2024-12-20 09:16:10,760][00248] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 3887104. Throughput: 0: 1046.2. Samples: 971472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:16:10,766][00248] Avg episode reward: [(0, '19.462')] [2024-12-20 09:16:12,025][04354] Updated weights for policy 0, policy_version 950 (0.0028) [2024-12-20 09:16:15,765][00248] Fps is (10 sec: 3275.1, 60 sec: 3959.1, 300 sec: 3929.3). Total num frames: 3899392. Throughput: 0: 989.0. Samples: 975852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:16:15,768][00248] Avg episode reward: [(0, '20.496')] [2024-12-20 09:16:15,781][04341] Saving new best policy, reward=20.496! [2024-12-20 09:16:20,760][00248] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3957.2). Total num frames: 3923968. Throughput: 0: 981.5. Samples: 979156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:16:20,767][00248] Avg episode reward: [(0, '21.513')] [2024-12-20 09:16:20,771][04341] Saving new best policy, reward=21.513! [2024-12-20 09:16:22,084][04354] Updated weights for policy 0, policy_version 960 (0.0022) [2024-12-20 09:16:25,760][00248] Fps is (10 sec: 4917.7, 60 sec: 4164.3, 300 sec: 3957.2). 
Total num frames: 3948544. Throughput: 0: 1032.9. Samples: 986412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:16:25,762][00248] Avg episode reward: [(0, '20.086')] [2024-12-20 09:16:30,762][00248] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 3943.2). Total num frames: 3960832. Throughput: 0: 1004.8. Samples: 991230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-20 09:16:30,764][00248] Avg episode reward: [(0, '20.565')] [2024-12-20 09:16:34,870][04354] Updated weights for policy 0, policy_version 970 (0.0027) [2024-12-20 09:16:35,760][00248] Fps is (10 sec: 2457.6, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 3973120. Throughput: 0: 961.5. Samples: 992976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-20 09:16:35,762][00248] Avg episode reward: [(0, '20.768')] [2024-12-20 09:16:40,760][00248] Fps is (10 sec: 3277.5, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3993600. Throughput: 0: 936.0. Samples: 997902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-20 09:16:40,761][00248] Avg episode reward: [(0, '19.667')] [2024-12-20 09:16:42,791][04341] Stopping Batcher_0... [2024-12-20 09:16:42,791][04341] Loop batcher_evt_loop terminating... [2024-12-20 09:16:42,792][00248] Component Batcher_0 stopped! [2024-12-20 09:16:42,805][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-20 09:16:42,846][04354] Weights refcount: 2 0 [2024-12-20 09:16:42,849][04354] Stopping InferenceWorker_p0-w0... [2024-12-20 09:16:42,849][04354] Loop inference_proc0-0_evt_loop terminating... [2024-12-20 09:16:42,849][00248] Component InferenceWorker_p0-w0 stopped! [2024-12-20 09:16:42,938][04341] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000757_3100672.pth [2024-12-20 09:16:42,960][04341] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-20 09:16:43,132][04341] Stopping LearnerWorker_p0... 
[2024-12-20 09:16:43,135][04341] Loop learner_proc0_evt_loop terminating...
[2024-12-20 09:16:43,132][00248] Component LearnerWorker_p0 stopped!
[2024-12-20 09:16:43,201][00248] Component RolloutWorker_w6 stopped!
[2024-12-20 09:16:43,205][04356] Stopping RolloutWorker_w1...
[2024-12-20 09:16:43,207][00248] Component RolloutWorker_w1 stopped!
[2024-12-20 09:16:43,211][04361] Stopping RolloutWorker_w6...
[2024-12-20 09:16:43,209][04356] Loop rollout_proc1_evt_loop terminating...
[2024-12-20 09:16:43,218][04360] Stopping RolloutWorker_w5...
[2024-12-20 09:16:43,218][00248] Component RolloutWorker_w5 stopped!
[2024-12-20 09:16:43,219][04360] Loop rollout_proc5_evt_loop terminating...
[2024-12-20 09:16:43,211][04361] Loop rollout_proc6_evt_loop terminating...
[2024-12-20 09:16:43,236][04362] Stopping RolloutWorker_w7...
[2024-12-20 09:16:43,236][00248] Component RolloutWorker_w7 stopped!
[2024-12-20 09:16:43,236][04362] Loop rollout_proc7_evt_loop terminating...
[2024-12-20 09:16:43,247][04358] Stopping RolloutWorker_w3...
[2024-12-20 09:16:43,247][00248] Component RolloutWorker_w3 stopped!
[2024-12-20 09:16:43,248][04358] Loop rollout_proc3_evt_loop terminating...
[2024-12-20 09:16:43,255][00248] Component RolloutWorker_w0 stopped!
[2024-12-20 09:16:43,260][04355] Stopping RolloutWorker_w0...
[2024-12-20 09:16:43,267][04355] Loop rollout_proc0_evt_loop terminating...
[2024-12-20 09:16:43,285][00248] Component RolloutWorker_w4 stopped!
[2024-12-20 09:16:43,290][04359] Stopping RolloutWorker_w4...
[2024-12-20 09:16:43,300][04359] Loop rollout_proc4_evt_loop terminating...
[2024-12-20 09:16:43,305][00248] Component RolloutWorker_w2 stopped!
[2024-12-20 09:16:43,310][00248] Waiting for process learner_proc0 to stop...
[2024-12-20 09:16:43,314][04357] Stopping RolloutWorker_w2...
[2024-12-20 09:16:43,324][04357] Loop rollout_proc2_evt_loop terminating...
[2024-12-20 09:16:44,701][00248] Waiting for process inference_proc0-0 to join...
[2024-12-20 09:16:44,707][00248] Waiting for process rollout_proc0 to join...
[2024-12-20 09:16:46,989][00248] Waiting for process rollout_proc1 to join...
[2024-12-20 09:16:46,997][00248] Waiting for process rollout_proc2 to join...
[2024-12-20 09:16:47,002][00248] Waiting for process rollout_proc3 to join...
[2024-12-20 09:16:47,020][00248] Waiting for process rollout_proc4 to join...
[2024-12-20 09:16:47,023][00248] Waiting for process rollout_proc5 to join...
[2024-12-20 09:16:47,026][00248] Waiting for process rollout_proc6 to join...
[2024-12-20 09:16:47,033][00248] Waiting for process rollout_proc7 to join...
[2024-12-20 09:16:47,036][00248] Batcher 0 profile tree view:
batching: 25.7768, releasing_batches: 0.0259
[2024-12-20 09:16:47,041][00248] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0022
  wait_policy_total: 423.3173
update_model: 8.3883
  weight_update: 0.0027
one_step: 0.0039
  handle_policy_step: 561.7595
    deserialize: 14.2113, stack: 3.1711, obs_to_device_normalize: 121.2390, forward: 280.2151, send_messages: 27.7658
    prepare_outputs: 86.9552
      to_cpu: 52.5157
[2024-12-20 09:16:47,043][00248] Learner 0 profile tree view:
misc: 0.0101, prepare_batch: 13.3892
train: 73.1009
  epoch_init: 0.0067, minibatch_init: 0.0062, losses_postprocess: 0.6646, kl_divergence: 0.5030, after_optimizer: 33.6950
  calculate_losses: 25.9979
    losses_init: 0.0159, forward_head: 1.1827, bptt_initial: 17.5537, tail: 1.0841, advantages_returns: 0.2926, losses: 3.7085
    bptt: 1.8831
      bptt_forward_core: 1.7915
  update: 11.6531
    clip: 0.8345
[2024-12-20 09:16:47,046][00248] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3722, enqueue_policy_requests: 100.9610, env_step: 808.9115, overhead: 12.4214, complete_rollouts: 6.5032
save_policy_outputs: 21.1292
  split_output_tensors: 8.3141
[2024-12-20 09:16:47,047][00248] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3317, enqueue_policy_requests: 102.0719, env_step: 809.1501, overhead: 12.8644, complete_rollouts: 6.5869
save_policy_outputs: 20.8547
  split_output_tensors: 8.3624
[2024-12-20 09:16:47,049][00248] Loop Runner_EvtLoop terminating...
[2024-12-20 09:16:47,050][00248] Runner profile tree view:
main_loop: 1065.1760
[2024-12-20 09:16:47,053][00248] Collected {0: 4005888}, FPS: 3760.8
[2024-12-20 09:19:23,715][00248] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-20 09:19:23,717][00248] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-20 09:19:23,719][00248] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-20 09:19:23,721][00248] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-20 09:19:23,723][00248] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-20 09:19:23,725][00248] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-20 09:19:23,726][00248] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-20 09:19:23,727][00248] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-20 09:19:23,728][00248] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-20 09:19:23,729][00248] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-20 09:19:23,730][00248] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-20 09:19:23,731][00248] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-20 09:19:23,733][00248] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-20 09:19:23,734][00248] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-20 09:19:23,735][00248] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-20 09:19:23,768][00248] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-20 09:19:23,772][00248] RunningMeanStd input shape: (3, 72, 128) [2024-12-20 09:19:23,774][00248] RunningMeanStd input shape: (1,) [2024-12-20 09:19:23,793][00248] ConvEncoder: input_channels=3 [2024-12-20 09:19:23,899][00248] Conv encoder output size: 512 [2024-12-20 09:19:23,901][00248] Policy head output size: 512 [2024-12-20 09:19:24,183][00248] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-20 09:19:24,963][00248] Num frames 100... [2024-12-20 09:19:25,088][00248] Num frames 200... [2024-12-20 09:19:25,211][00248] Num frames 300... [2024-12-20 09:19:25,333][00248] Num frames 400... [2024-12-20 09:19:25,454][00248] Num frames 500... [2024-12-20 09:19:25,600][00248] Avg episode rewards: #0: 9.760, true rewards: #0: 5.760 [2024-12-20 09:19:25,602][00248] Avg episode reward: 9.760, avg true_objective: 5.760 [2024-12-20 09:19:25,638][00248] Num frames 600... [2024-12-20 09:19:25,757][00248] Num frames 700... [2024-12-20 09:19:25,884][00248] Num frames 800... [2024-12-20 09:19:26,002][00248] Num frames 900... [2024-12-20 09:19:26,120][00248] Num frames 1000... [2024-12-20 09:19:26,256][00248] Num frames 1100... [2024-12-20 09:19:26,376][00248] Num frames 1200... [2024-12-20 09:19:26,497][00248] Num frames 1300... [2024-12-20 09:19:26,623][00248] Num frames 1400... [2024-12-20 09:19:26,743][00248] Num frames 1500... [2024-12-20 09:19:26,863][00248] Num frames 1600... [2024-12-20 09:19:26,991][00248] Num frames 1700... [2024-12-20 09:19:27,122][00248] Avg episode rewards: #0: 17.300, true rewards: #0: 8.800 [2024-12-20 09:19:27,124][00248] Avg episode reward: 17.300, avg true_objective: 8.800 [2024-12-20 09:19:27,183][00248] Num frames 1800... [2024-12-20 09:19:27,301][00248] Num frames 1900... 
[2024-12-20 09:19:27,451][00248] Num frames 2000... [2024-12-20 09:19:27,624][00248] Num frames 2100... [2024-12-20 09:19:27,792][00248] Num frames 2200... [2024-12-20 09:19:27,973][00248] Num frames 2300... [2024-12-20 09:19:28,097][00248] Avg episode rewards: #0: 15.120, true rewards: #0: 7.787 [2024-12-20 09:19:28,099][00248] Avg episode reward: 15.120, avg true_objective: 7.787 [2024-12-20 09:19:28,221][00248] Num frames 2400... [2024-12-20 09:19:28,387][00248] Num frames 2500... [2024-12-20 09:19:28,551][00248] Num frames 2600... [2024-12-20 09:19:28,731][00248] Num frames 2700... [2024-12-20 09:19:28,908][00248] Num frames 2800... [2024-12-20 09:19:29,054][00248] Avg episode rewards: #0: 14.120, true rewards: #0: 7.120 [2024-12-20 09:19:29,056][00248] Avg episode reward: 14.120, avg true_objective: 7.120 [2024-12-20 09:19:29,143][00248] Num frames 2900... [2024-12-20 09:19:29,327][00248] Num frames 3000... [2024-12-20 09:19:29,502][00248] Num frames 3100... [2024-12-20 09:19:29,680][00248] Num frames 3200... [2024-12-20 09:19:29,862][00248] Num frames 3300... [2024-12-20 09:19:30,035][00248] Num frames 3400... [2024-12-20 09:19:30,159][00248] Num frames 3500... [2024-12-20 09:19:30,296][00248] Avg episode rewards: #0: 14.532, true rewards: #0: 7.132 [2024-12-20 09:19:30,298][00248] Avg episode reward: 14.532, avg true_objective: 7.132 [2024-12-20 09:19:30,340][00248] Num frames 3600... [2024-12-20 09:19:30,458][00248] Num frames 3700... [2024-12-20 09:19:30,582][00248] Num frames 3800... [2024-12-20 09:19:30,701][00248] Num frames 3900... [2024-12-20 09:19:30,820][00248] Num frames 4000... [2024-12-20 09:19:30,890][00248] Avg episode rewards: #0: 13.350, true rewards: #0: 6.683 [2024-12-20 09:19:30,891][00248] Avg episode reward: 13.350, avg true_objective: 6.683 [2024-12-20 09:19:30,998][00248] Num frames 4100... [2024-12-20 09:19:31,124][00248] Num frames 4200... [2024-12-20 09:19:31,250][00248] Num frames 4300... 
[2024-12-20 09:19:31,373][00248] Num frames 4400... [2024-12-20 09:19:31,500][00248] Avg episode rewards: #0: 12.226, true rewards: #0: 6.369 [2024-12-20 09:19:31,501][00248] Avg episode reward: 12.226, avg true_objective: 6.369 [2024-12-20 09:19:31,554][00248] Num frames 4500... [2024-12-20 09:19:31,672][00248] Num frames 4600... [2024-12-20 09:19:31,792][00248] Num frames 4700... [2024-12-20 09:19:31,909][00248] Num frames 4800... [2024-12-20 09:19:32,029][00248] Num frames 4900... [2024-12-20 09:19:32,167][00248] Num frames 5000... [2024-12-20 09:19:32,293][00248] Num frames 5100... [2024-12-20 09:19:32,412][00248] Num frames 5200... [2024-12-20 09:19:32,535][00248] Num frames 5300... [2024-12-20 09:19:32,655][00248] Num frames 5400... [2024-12-20 09:19:32,775][00248] Num frames 5500... [2024-12-20 09:19:32,895][00248] Num frames 5600... [2024-12-20 09:19:33,018][00248] Num frames 5700... [2024-12-20 09:19:33,150][00248] Num frames 5800... [2024-12-20 09:19:33,279][00248] Num frames 5900... [2024-12-20 09:19:33,399][00248] Num frames 6000... [2024-12-20 09:19:33,523][00248] Num frames 6100... [2024-12-20 09:19:33,649][00248] Num frames 6200... [2024-12-20 09:19:33,770][00248] Num frames 6300... [2024-12-20 09:19:33,894][00248] Num frames 6400... [2024-12-20 09:19:34,024][00248] Num frames 6500... [2024-12-20 09:19:34,149][00248] Avg episode rewards: #0: 17.447, true rewards: #0: 8.197 [2024-12-20 09:19:34,151][00248] Avg episode reward: 17.447, avg true_objective: 8.197 [2024-12-20 09:19:34,207][00248] Num frames 6600... [2024-12-20 09:19:34,323][00248] Num frames 6700... [2024-12-20 09:19:34,443][00248] Num frames 6800... [2024-12-20 09:19:34,600][00248] Avg episode rewards: #0: 15.983, true rewards: #0: 7.650 [2024-12-20 09:19:34,602][00248] Avg episode reward: 15.983, avg true_objective: 7.650 [2024-12-20 09:19:34,622][00248] Num frames 6900... [2024-12-20 09:19:34,742][00248] Num frames 7000... [2024-12-20 09:19:34,857][00248] Num frames 7100... 
[2024-12-20 09:19:34,979][00248] Num frames 7200... [2024-12-20 09:19:35,094][00248] Num frames 7300... [2024-12-20 09:19:35,197][00248] Avg episode rewards: #0: 14.933, true rewards: #0: 7.333 [2024-12-20 09:19:35,199][00248] Avg episode reward: 14.933, avg true_objective: 7.333 [2024-12-20 09:20:17,839][00248] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-12-20 09:28:56,696][00248] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-20 09:28:56,703][00248] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-20 09:28:56,704][00248] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-20 09:28:56,706][00248] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-20 09:28:56,708][00248] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-20 09:28:56,709][00248] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-20 09:28:56,714][00248] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-20 09:28:56,715][00248] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-20 09:28:56,716][00248] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-20 09:28:56,717][00248] Adding new argument 'hf_repository'='ITSheep/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-20 09:28:56,721][00248] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-20 09:28:56,722][00248] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-20 09:28:56,723][00248] Adding new argument 'train_script'=None that is not in the saved config file! 
[2024-12-20 09:28:56,724][00248] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-20 09:28:56,725][00248] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-20 09:28:56,774][00248] RunningMeanStd input shape: (3, 72, 128) [2024-12-20 09:28:56,776][00248] RunningMeanStd input shape: (1,) [2024-12-20 09:28:56,794][00248] ConvEncoder: input_channels=3 [2024-12-20 09:28:56,852][00248] Conv encoder output size: 512 [2024-12-20 09:28:56,853][00248] Policy head output size: 512 [2024-12-20 09:28:56,883][00248] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-20 09:28:57,600][00248] Num frames 100... [2024-12-20 09:28:57,798][00248] Num frames 200... [2024-12-20 09:28:58,033][00248] Num frames 300... [2024-12-20 09:28:58,225][00248] Num frames 400... [2024-12-20 09:28:58,496][00248] Num frames 500... [2024-12-20 09:28:58,677][00248] Num frames 600... [2024-12-20 09:28:58,918][00248] Num frames 700... [2024-12-20 09:28:59,160][00248] Num frames 800... [2024-12-20 09:28:59,354][00248] Num frames 900... [2024-12-20 09:28:59,577][00248] Num frames 1000... [2024-12-20 09:28:59,812][00248] Num frames 1100... [2024-12-20 09:28:59,914][00248] Avg episode rewards: #0: 23.200, true rewards: #0: 11.200 [2024-12-20 09:28:59,919][00248] Avg episode reward: 23.200, avg true_objective: 11.200 [2024-12-20 09:29:00,104][00248] Num frames 1200... [2024-12-20 09:29:00,426][00248] Num frames 1300... [2024-12-20 09:29:00,742][00248] Num frames 1400... [2024-12-20 09:29:00,949][00248] Num frames 1500... [2024-12-20 09:29:01,133][00248] Num frames 1600... [2024-12-20 09:29:01,327][00248] Num frames 1700... [2024-12-20 09:29:01,553][00248] Num frames 1800... [2024-12-20 09:29:01,748][00248] Num frames 1900... 
[2024-12-20 09:29:01,822][00248] Avg episode rewards: #0: 19.530, true rewards: #0: 9.530 [2024-12-20 09:29:01,823][00248] Avg episode reward: 19.530, avg true_objective: 9.530 [2024-12-20 09:29:01,937][00248] Num frames 2000... [2024-12-20 09:29:02,063][00248] Num frames 2100... [2024-12-20 09:29:02,191][00248] Num frames 2200... [2024-12-20 09:29:02,327][00248] Num frames 2300... [2024-12-20 09:29:02,464][00248] Num frames 2400... [2024-12-20 09:29:02,585][00248] Num frames 2500... [2024-12-20 09:29:02,706][00248] Num frames 2600... [2024-12-20 09:29:02,824][00248] Num frames 2700... [2024-12-20 09:29:02,976][00248] Avg episode rewards: #0: 20.943, true rewards: #0: 9.277 [2024-12-20 09:29:02,978][00248] Avg episode reward: 20.943, avg true_objective: 9.277 [2024-12-20 09:29:03,007][00248] Num frames 2800... [2024-12-20 09:29:03,124][00248] Num frames 2900... [2024-12-20 09:29:03,254][00248] Num frames 3000... [2024-12-20 09:29:03,375][00248] Num frames 3100... [2024-12-20 09:29:03,496][00248] Num frames 3200... [2024-12-20 09:29:03,619][00248] Num frames 3300... [2024-12-20 09:29:03,741][00248] Num frames 3400... [2024-12-20 09:29:03,859][00248] Num frames 3500... [2024-12-20 09:29:03,978][00248] Num frames 3600... [2024-12-20 09:29:04,107][00248] Num frames 3700... [2024-12-20 09:29:04,230][00248] Num frames 3800... [2024-12-20 09:29:04,350][00248] Num frames 3900... [2024-12-20 09:29:04,480][00248] Num frames 4000... [2024-12-20 09:29:04,601][00248] Num frames 4100... [2024-12-20 09:29:04,721][00248] Num frames 4200... [2024-12-20 09:29:04,844][00248] Num frames 4300... [2024-12-20 09:29:04,964][00248] Num frames 4400... [2024-12-20 09:29:05,099][00248] Num frames 4500... [2024-12-20 09:29:05,231][00248] Num frames 4600... [2024-12-20 09:29:05,351][00248] Num frames 4700... [2024-12-20 09:29:05,470][00248] Num frames 4800... 
[2024-12-20 09:29:05,624][00248] Avg episode rewards: #0: 28.957, true rewards: #0: 12.208
[2024-12-20 09:29:05,625][00248] Avg episode reward: 28.957, avg true_objective: 12.208
[2024-12-20 09:29:05,649][00248] Num frames 4900...
[2024-12-20 09:29:05,764][00248] Num frames 5000...
[2024-12-20 09:29:05,881][00248] Num frames 5100...
[2024-12-20 09:29:06,000][00248] Num frames 5200...
[2024-12-20 09:29:06,098][00248] Avg episode rewards: #0: 24.070, true rewards: #0: 10.470
[2024-12-20 09:29:06,100][00248] Avg episode reward: 24.070, avg true_objective: 10.470
[2024-12-20 09:29:06,181][00248] Num frames 5300...
[2024-12-20 09:29:06,303][00248] Num frames 5400...
[2024-12-20 09:29:06,422][00248] Num frames 5500...
[2024-12-20 09:29:06,540][00248] Num frames 5600...
[2024-12-20 09:29:06,656][00248] Num frames 5700...
[2024-12-20 09:29:06,771][00248] Num frames 5800...
[2024-12-20 09:29:06,890][00248] Num frames 5900...
[2024-12-20 09:29:07,010][00248] Num frames 6000...
[2024-12-20 09:29:07,106][00248] Avg episode rewards: #0: 22.392, true rewards: #0: 10.058
[2024-12-20 09:29:07,108][00248] Avg episode reward: 22.392, avg true_objective: 10.058
[2024-12-20 09:29:07,194][00248] Num frames 6100...
[2024-12-20 09:29:07,312][00248] Num frames 6200...
[2024-12-20 09:29:07,436][00248] Num frames 6300...
[2024-12-20 09:29:07,552][00248] Num frames 6400...
[2024-12-20 09:29:07,682][00248] Num frames 6500...
[2024-12-20 09:29:07,852][00248] Num frames 6600...
[2024-12-20 09:29:08,022][00248] Num frames 6700...
[2024-12-20 09:29:08,204][00248] Num frames 6800...
[2024-12-20 09:29:08,367][00248] Num frames 6900...
[2024-12-20 09:29:08,530][00248] Num frames 7000...
[2024-12-20 09:29:08,733][00248] Avg episode rewards: #0: 22.701, true rewards: #0: 10.130
[2024-12-20 09:29:08,738][00248] Avg episode reward: 22.701, avg true_objective: 10.130
[2024-12-20 09:29:08,756][00248] Num frames 7100...
[2024-12-20 09:29:08,912][00248] Num frames 7200...
[2024-12-20 09:29:09,087][00248] Num frames 7300...
[2024-12-20 09:29:09,266][00248] Num frames 7400...
[2024-12-20 09:29:09,433][00248] Num frames 7500...
[2024-12-20 09:29:09,606][00248] Num frames 7600...
[2024-12-20 09:29:09,778][00248] Num frames 7700...
[2024-12-20 09:29:09,947][00248] Num frames 7800...
[2024-12-20 09:29:10,119][00248] Num frames 7900...
[2024-12-20 09:29:10,261][00248] Num frames 8000...
[2024-12-20 09:29:10,379][00248] Num frames 8100...
[2024-12-20 09:29:10,531][00248] Avg episode rewards: #0: 22.724, true rewards: #0: 10.224
[2024-12-20 09:29:10,533][00248] Avg episode reward: 22.724, avg true_objective: 10.224
[2024-12-20 09:29:10,562][00248] Num frames 8200...
[2024-12-20 09:29:10,685][00248] Num frames 8300...
[2024-12-20 09:29:10,803][00248] Num frames 8400...
[2024-12-20 09:29:10,932][00248] Num frames 8500...
[2024-12-20 09:29:11,061][00248] Num frames 8600...
[2024-12-20 09:29:11,203][00248] Num frames 8700...
[2024-12-20 09:29:11,332][00248] Num frames 8800...
[2024-12-20 09:29:11,459][00248] Num frames 8900...
[2024-12-20 09:29:11,577][00248] Num frames 9000...
[2024-12-20 09:29:11,698][00248] Num frames 9100...
[2024-12-20 09:29:11,817][00248] Num frames 9200...
[2024-12-20 09:29:11,938][00248] Num frames 9300...
[2024-12-20 09:29:12,060][00248] Num frames 9400...
[2024-12-20 09:29:12,186][00248] Num frames 9500...
[2024-12-20 09:29:12,319][00248] Num frames 9600...
[2024-12-20 09:29:12,441][00248] Num frames 9700...
[2024-12-20 09:29:12,566][00248] Num frames 9800...
[2024-12-20 09:29:12,688][00248] Num frames 9900...
[2024-12-20 09:29:12,817][00248] Num frames 10000...
[2024-12-20 09:29:12,951][00248] Num frames 10100...
[2024-12-20 09:29:13,071][00248] Num frames 10200...
[2024-12-20 09:29:13,228][00248] Avg episode rewards: #0: 25.754, true rewards: #0: 11.421
[2024-12-20 09:29:13,229][00248] Avg episode reward: 25.754, avg true_objective: 11.421
[2024-12-20 09:29:13,263][00248] Num frames 10300...
[2024-12-20 09:29:13,381][00248] Num frames 10400...
[2024-12-20 09:29:13,502][00248] Num frames 10500...
[2024-12-20 09:29:13,625][00248] Num frames 10600...
[2024-12-20 09:29:13,743][00248] Num frames 10700...
[2024-12-20 09:29:13,905][00248] Avg episode rewards: #0: 24.191, true rewards: #0: 10.791
[2024-12-20 09:29:13,907][00248] Avg episode reward: 24.191, avg true_objective: 10.791
[2024-12-20 09:30:17,485][00248] Replay video saved to /content/train_dir/default_experiment/replay.mp4!