diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1155 @@
+[2024-09-13 07:42:07,500][06556] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-09-13 07:42:07,504][06556] Rollout worker 0 uses device cpu
+[2024-09-13 07:42:07,506][06556] Rollout worker 1 uses device cpu
+[2024-09-13 07:42:07,508][06556] Rollout worker 2 uses device cpu
+[2024-09-13 07:42:07,511][06556] Rollout worker 3 uses device cpu
+[2024-09-13 07:42:07,512][06556] Rollout worker 4 uses device cpu
+[2024-09-13 07:42:07,514][06556] Rollout worker 5 uses device cpu
+[2024-09-13 07:42:07,515][06556] Rollout worker 6 uses device cpu
+[2024-09-13 07:42:07,516][06556] Rollout worker 7 uses device cpu
+[2024-09-13 07:42:07,697][06556] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-13 07:42:07,698][06556] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-13 07:42:07,736][06556] Starting all processes...
+[2024-09-13 07:42:07,738][06556] Starting process learner_proc0
+[2024-09-13 07:42:08,428][06556] Starting all processes...
+[2024-09-13 07:42:08,443][06556] Starting process inference_proc0-0
+[2024-09-13 07:42:08,443][06556] Starting process rollout_proc0
+[2024-09-13 07:42:08,447][06556] Starting process rollout_proc1
+[2024-09-13 07:42:08,447][06556] Starting process rollout_proc2
+[2024-09-13 07:42:08,447][06556] Starting process rollout_proc3
+[2024-09-13 07:42:08,447][06556] Starting process rollout_proc4
+[2024-09-13 07:42:08,447][06556] Starting process rollout_proc5
+[2024-09-13 07:42:08,447][06556] Starting process rollout_proc6
+[2024-09-13 07:42:08,447][06556] Starting process rollout_proc7
+[2024-09-13 07:42:23,889][08718] Worker 0 uses CPU cores [0]
+[2024-09-13 07:42:23,999][08723] Worker 6 uses CPU cores [0]
+[2024-09-13 07:42:24,112][08716] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-13 07:42:24,116][08716] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-13 07:42:24,186][08722] Worker 5 uses CPU cores [1]
+[2024-09-13 07:42:24,228][08716] Num visible devices: 1
+[2024-09-13 07:42:24,325][08717] Worker 1 uses CPU cores [1]
+[2024-09-13 07:42:24,350][08721] Worker 4 uses CPU cores [0]
+[2024-09-13 07:42:24,378][08719] Worker 2 uses CPU cores [0]
+[2024-09-13 07:42:24,392][08703] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-13 07:42:24,393][08703] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-13 07:42:24,439][08703] Num visible devices: 1
+[2024-09-13 07:42:24,468][08724] Worker 7 uses CPU cores [1]
+[2024-09-13 07:42:24,473][08703] Starting seed is not provided
+[2024-09-13 07:42:24,474][08703] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-13 07:42:24,474][08703] Initializing actor-critic model on device cuda:0
+[2024-09-13 07:42:24,474][08703] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-13 07:42:24,478][08703] RunningMeanStd input shape: (1,)
+[2024-09-13 07:42:24,521][08703] ConvEncoder: input_channels=3
+[2024-09-13 07:42:24,531][08720] Worker 3 uses CPU cores [1]
+[2024-09-13 07:42:24,846][08703] Conv encoder output size: 512
+[2024-09-13 07:42:24,847][08703] Policy head output size: 512
+[2024-09-13 07:42:24,911][08703] Created Actor Critic model with architecture:
+[2024-09-13 07:42:24,911][08703] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-13 07:42:25,355][08703] Using optimizer
+[2024-09-13 07:42:26,296][08703] No checkpoints found
+[2024-09-13 07:42:26,297][08703] Did not load from checkpoint, starting from scratch!
+[2024-09-13 07:42:26,298][08703] Initialized policy 0 weights for model version 0
+[2024-09-13 07:42:26,305][08703] LearnerWorker_p0 finished initialization!
+[2024-09-13 07:42:26,306][08703] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-13 07:42:26,521][08716] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-13 07:42:26,522][08716] RunningMeanStd input shape: (1,)
+[2024-09-13 07:42:26,534][08716] ConvEncoder: input_channels=3
+[2024-09-13 07:42:26,648][08716] Conv encoder output size: 512
+[2024-09-13 07:42:26,649][08716] Policy head output size: 512
+[2024-09-13 07:42:26,701][06556] Inference worker 0-0 is ready!
+[2024-09-13 07:42:26,703][06556] All inference workers are ready! Signal rollout workers to start!
+[2024-09-13 07:42:26,897][08724] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:26,898][08722] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:26,900][08720] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:26,904][08717] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:26,901][08719] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:26,905][08718] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:26,910][08721] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:26,910][08723] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-13 07:42:27,553][08722] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:27,691][06556] Heartbeat connected on Batcher_0
+[2024-09-13 07:42:27,696][06556] Heartbeat connected on LearnerWorker_p0
+[2024-09-13 07:42:27,747][06556] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-13 07:42:28,099][08722] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:28,230][06556] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-13 07:42:28,668][08719] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:28,665][08721] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:28,677][08723] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:28,679][08718] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:29,289][08722] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:29,811][08721] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:29,826][08723] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:30,158][08722] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:30,319][06556] Heartbeat connected on RolloutWorker_w5
+[2024-09-13 07:42:30,739][08724] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:30,935][08719] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:31,759][08720] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:32,424][08721] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:33,098][08723] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:33,229][06556] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 6.0. Samples: 30. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-13 07:42:33,231][06556] Avg episode reward: [(0, '1.280')]
+[2024-09-13 07:42:33,430][08720] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:34,031][08719] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:34,198][08718] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:34,425][08724] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:34,608][08721] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:34,778][06556] Heartbeat connected on RolloutWorker_w4
+[2024-09-13 07:42:34,870][08723] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:35,087][06556] Heartbeat connected on RolloutWorker_w6
+[2024-09-13 07:42:36,629][08718] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:36,881][08717] Decorrelating experience for 0 frames...
+[2024-09-13 07:42:37,831][08720] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:38,229][06556] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 106.4. Samples: 1064. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-13 07:42:38,234][06556] Avg episode reward: [(0, '3.031')]
+[2024-09-13 07:42:38,422][08724] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:40,888][08703] Signal inference workers to stop experience collection...
+[2024-09-13 07:42:40,908][08716] InferenceWorker_p0-w0: stopping experience collection
+[2024-09-13 07:42:41,220][08717] Decorrelating experience for 32 frames...
+[2024-09-13 07:42:41,440][08719] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:42,004][06556] Heartbeat connected on RolloutWorker_w2
+[2024-09-13 07:42:42,099][08720] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:42,593][06556] Heartbeat connected on RolloutWorker_w3
+[2024-09-13 07:42:43,235][06556] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 151.9. Samples: 2280. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-13 07:42:43,238][06556] Avg episode reward: [(0, '3.506')]
+[2024-09-13 07:42:43,442][08718] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:43,746][08703] Signal inference workers to resume experience collection...
+[2024-09-13 07:42:43,750][08716] InferenceWorker_p0-w0: resuming experience collection
+[2024-09-13 07:42:43,829][06556] Heartbeat connected on RolloutWorker_w0
+[2024-09-13 07:42:43,898][08717] Decorrelating experience for 64 frames...
+[2024-09-13 07:42:43,914][08724] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:44,319][06556] Heartbeat connected on RolloutWorker_w7
+[2024-09-13 07:42:46,083][08717] Decorrelating experience for 96 frames...
+[2024-09-13 07:42:46,472][06556] Heartbeat connected on RolloutWorker_w1
+[2024-09-13 07:42:48,229][06556] Fps is (10 sec: 2048.0, 60 sec: 1024.1, 300 sec: 1024.1). Total num frames: 20480. Throughput: 0: 298.9. Samples: 5978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-13 07:42:48,231][06556] Avg episode reward: [(0, '3.729')]
+[2024-09-13 07:42:51,815][08716] Updated weights for policy 0, policy_version 10 (0.0236)
+[2024-09-13 07:42:53,229][06556] Fps is (10 sec: 4508.4, 60 sec: 1802.3, 300 sec: 1802.3). Total num frames: 45056. Throughput: 0: 378.7. Samples: 9466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:42:53,231][06556] Avg episode reward: [(0, '4.083')]
+[2024-09-13 07:42:58,229][06556] Fps is (10 sec: 4095.7, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 500.6. Samples: 15018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:42:58,232][06556] Avg episode reward: [(0, '4.263')]
+[2024-09-13 07:43:03,229][06556] Fps is (10 sec: 3276.8, 60 sec: 2223.6, 300 sec: 2223.6). Total num frames: 77824. Throughput: 0: 582.6. Samples: 20392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:43:03,235][06556] Avg episode reward: [(0, '4.440')]
+[2024-09-13 07:43:03,326][08716] Updated weights for policy 0, policy_version 20 (0.0036)
+[2024-09-13 07:43:08,231][06556] Fps is (10 sec: 4095.3, 60 sec: 2559.9, 300 sec: 2559.9). Total num frames: 102400. Throughput: 0: 598.8. Samples: 23952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-13 07:43:08,234][06556] Avg episode reward: [(0, '4.443')]
+[2024-09-13 07:43:08,243][08703] Saving new best policy, reward=4.443!
+[2024-09-13 07:43:13,230][06556] Fps is (10 sec: 4095.3, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 665.1. Samples: 29932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-13 07:43:13,237][06556] Avg episode reward: [(0, '4.601')]
+[2024-09-13 07:43:13,240][08703] Saving new best policy, reward=4.601!
+[2024-09-13 07:43:14,753][08716] Updated weights for policy 0, policy_version 30 (0.0015)
+[2024-09-13 07:43:18,229][06556] Fps is (10 sec: 3277.6, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 749.2. Samples: 33742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:43:18,231][06556] Avg episode reward: [(0, '4.459')]
+[2024-09-13 07:43:23,229][06556] Fps is (10 sec: 3687.0, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 805.6. Samples: 37314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:43:23,231][06556] Avg episode reward: [(0, '4.285')]
+[2024-09-13 07:43:24,225][08716] Updated weights for policy 0, policy_version 40 (0.0018)
+[2024-09-13 07:43:28,236][06556] Fps is (10 sec: 4502.4, 60 sec: 3003.4, 300 sec: 3003.4). Total num frames: 180224. Throughput: 0: 937.2. Samples: 44456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:43:28,241][06556] Avg episode reward: [(0, '4.494')]
+[2024-09-13 07:43:33,229][06556] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 2961.8). Total num frames: 192512. Throughput: 0: 951.9. Samples: 48812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:43:33,234][06556] Avg episode reward: [(0, '4.606')]
+[2024-09-13 07:43:33,241][08703] Saving new best policy, reward=4.606!
+[2024-09-13 07:43:35,660][08716] Updated weights for policy 0, policy_version 50 (0.0036)
+[2024-09-13 07:43:38,229][06556] Fps is (10 sec: 3689.0, 60 sec: 3618.1, 300 sec: 3101.3). Total num frames: 217088. Throughput: 0: 938.4. Samples: 51692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-13 07:43:38,237][06556] Avg episode reward: [(0, '4.270')]
+[2024-09-13 07:43:43,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3959.9, 300 sec: 3167.6). Total num frames: 237568. Throughput: 0: 968.2. Samples: 58588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:43:43,233][06556] Avg episode reward: [(0, '4.337')]
+[2024-09-13 07:43:45,058][08716] Updated weights for policy 0, policy_version 60 (0.0038)
+[2024-09-13 07:43:48,230][06556] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3174.4). Total num frames: 253952. Throughput: 0: 967.1. Samples: 63912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-13 07:43:48,240][06556] Avg episode reward: [(0, '4.458')]
+[2024-09-13 07:43:53,229][06556] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3180.5). Total num frames: 270336. Throughput: 0: 936.2. Samples: 66080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:43:53,234][06556] Avg episode reward: [(0, '4.477')]
+[2024-09-13 07:43:55,905][08716] Updated weights for policy 0, policy_version 70 (0.0041)
+[2024-09-13 07:43:58,229][06556] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 294912. Throughput: 0: 962.7. Samples: 73250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:43:58,232][06556] Avg episode reward: [(0, '4.318')]
+[2024-09-13 07:43:58,245][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth...
+[2024-09-13 07:44:03,229][06556] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3320.0). Total num frames: 315392. Throughput: 0: 1016.3. Samples: 79476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:44:03,232][06556] Avg episode reward: [(0, '4.525')]
+[2024-09-13 07:44:06,763][08716] Updated weights for policy 0, policy_version 80 (0.0042)
+[2024-09-13 07:44:08,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 985.6. Samples: 81668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-13 07:44:08,231][06556] Avg episode reward: [(0, '4.515')]
+[2024-09-13 07:44:13,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3354.9). Total num frames: 352256. Throughput: 0: 956.8. Samples: 87506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-13 07:44:13,236][06556] Avg episode reward: [(0, '4.199')]
+[2024-09-13 07:44:16,205][08716] Updated weights for policy 0, policy_version 90 (0.0035)
+[2024-09-13 07:44:18,229][06556] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3425.8). Total num frames: 376832. Throughput: 0: 1018.5. Samples: 94644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-13 07:44:18,232][06556] Avg episode reward: [(0, '4.388')]
+[2024-09-13 07:44:23,229][06556] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 998.2. Samples: 96610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:44:23,233][06556] Avg episode reward: [(0, '4.503')]
+[2024-09-13 07:44:28,008][08716] Updated weights for policy 0, policy_version 100 (0.0021)
+[2024-09-13 07:44:28,229][06556] Fps is (10 sec: 3276.9, 60 sec: 3823.4, 300 sec: 3413.4). Total num frames: 409600. Throughput: 0: 953.4. Samples: 101490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:44:28,233][06556] Avg episode reward: [(0, '4.503')]
+[2024-09-13 07:44:33,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3440.7). Total num frames: 430080. Throughput: 0: 987.9. Samples: 108364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:44:33,235][06556] Avg episode reward: [(0, '4.591')]
+[2024-09-13 07:44:38,148][08716] Updated weights for policy 0, policy_version 110 (0.0029)
+[2024-09-13 07:44:38,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3465.9). Total num frames: 450560. Throughput: 0: 1009.3. Samples: 111500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:44:38,232][06556] Avg episode reward: [(0, '4.476')]
+[2024-09-13 07:44:43,232][06556] Fps is (10 sec: 2866.3, 60 sec: 3686.2, 300 sec: 3398.1). Total num frames: 458752. Throughput: 0: 936.3. Samples: 115386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:44:43,236][06556] Avg episode reward: [(0, '4.520')]
+[2024-09-13 07:44:48,233][06556] Fps is (10 sec: 2456.6, 60 sec: 3686.2, 300 sec: 3393.7). Total num frames: 475136. Throughput: 0: 888.5. Samples: 119462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:44:48,238][06556] Avg episode reward: [(0, '4.530')]
+[2024-09-13 07:44:51,353][08716] Updated weights for policy 0, policy_version 120 (0.0035)
+[2024-09-13 07:44:53,229][06556] Fps is (10 sec: 4097.2, 60 sec: 3822.9, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 911.1. Samples: 122668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-13 07:44:53,233][06556] Avg episode reward: [(0, '4.596')]
+[2024-09-13 07:44:58,229][06556] Fps is (10 sec: 3688.0, 60 sec: 3618.1, 300 sec: 3413.4). Total num frames: 512000. Throughput: 0: 897.3. Samples: 127886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:44:58,232][06556] Avg episode reward: [(0, '4.673')]
+[2024-09-13 07:44:58,241][08703] Saving new best policy, reward=4.673!
+[2024-09-13 07:45:03,231][06556] Fps is (10 sec: 2866.6, 60 sec: 3549.7, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 852.7. Samples: 133018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:03,238][06556] Avg episode reward: [(0, '4.528')]
+[2024-09-13 07:45:03,380][08716] Updated weights for policy 0, policy_version 130 (0.0031)
+[2024-09-13 07:45:08,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 883.0. Samples: 136346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:08,237][06556] Avg episode reward: [(0, '4.378')]
+[2024-09-13 07:45:13,229][06556] Fps is (10 sec: 2867.8, 60 sec: 3413.3, 300 sec: 3376.1). Total num frames: 557056. Throughput: 0: 844.1. Samples: 139474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:13,231][06556] Avg episode reward: [(0, '4.446')]
+[2024-09-13 07:45:17,890][08716] Updated weights for policy 0, policy_version 140 (0.0046)
+[2024-09-13 07:45:18,229][06556] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3373.2). Total num frames: 573440. Throughput: 0: 784.7. Samples: 143674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:45:18,230][06556] Avg episode reward: [(0, '4.458')]
+[2024-09-13 07:45:23,229][06556] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3417.3). Total num frames: 598016. Throughput: 0: 791.8. Samples: 147130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:23,230][06556] Avg episode reward: [(0, '4.312')]
+[2024-09-13 07:45:26,578][08716] Updated weights for policy 0, policy_version 150 (0.0027)
+[2024-09-13 07:45:28,232][06556] Fps is (10 sec: 4504.0, 60 sec: 3481.4, 300 sec: 3436.0). Total num frames: 618496. Throughput: 0: 864.5. Samples: 154288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:45:28,237][06556] Avg episode reward: [(0, '4.288')]
+[2024-09-13 07:45:33,229][06556] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3409.7). Total num frames: 630784. Throughput: 0: 869.8. Samples: 158598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:33,235][06556] Avg episode reward: [(0, '4.372')]
+[2024-09-13 07:45:37,972][08716] Updated weights for policy 0, policy_version 160 (0.0033)
+[2024-09-13 07:45:38,229][06556] Fps is (10 sec: 3687.7, 60 sec: 3413.3, 300 sec: 3449.3). Total num frames: 655360. Throughput: 0: 867.1. Samples: 161688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:38,232][06556] Avg episode reward: [(0, '4.585')]
+[2024-09-13 07:45:43,229][06556] Fps is (10 sec: 4505.8, 60 sec: 3618.3, 300 sec: 3465.9). Total num frames: 675840. Throughput: 0: 907.2. Samples: 168710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-13 07:45:43,231][06556] Avg episode reward: [(0, '4.627')]
+[2024-09-13 07:45:48,229][06556] Fps is (10 sec: 3686.3, 60 sec: 3618.4, 300 sec: 3461.1). Total num frames: 692224. Throughput: 0: 910.4. Samples: 173986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:48,233][06556] Avg episode reward: [(0, '4.605')]
+[2024-09-13 07:45:48,676][08716] Updated weights for policy 0, policy_version 170 (0.0045)
+[2024-09-13 07:45:53,229][06556] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3476.6). Total num frames: 712704. Throughput: 0: 885.1. Samples: 176176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:53,234][06556] Avg episode reward: [(0, '4.656')]
+[2024-09-13 07:45:57,974][08716] Updated weights for policy 0, policy_version 180 (0.0023)
+[2024-09-13 07:45:58,229][06556] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3510.9). Total num frames: 737280. Throughput: 0: 976.5. Samples: 183418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:45:58,231][06556] Avg episode reward: [(0, '4.616')]
+[2024-09-13 07:45:58,243][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth...
+[2024-09-13 07:46:03,229][06556] Fps is (10 sec: 4505.7, 60 sec: 3823.1, 300 sec: 3524.5). Total num frames: 757760. Throughput: 0: 1024.1. Samples: 189758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:46:03,234][06556] Avg episode reward: [(0, '4.828')]
+[2024-09-13 07:46:03,238][08703] Saving new best policy, reward=4.828!
+[2024-09-13 07:46:08,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3500.2). Total num frames: 770048. Throughput: 0: 992.0. Samples: 191770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:46:08,230][06556] Avg episode reward: [(0, '4.729')]
+[2024-09-13 07:46:09,254][08716] Updated weights for policy 0, policy_version 190 (0.0032)
+[2024-09-13 07:46:13,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3531.7). Total num frames: 794624. Throughput: 0: 970.6. Samples: 197960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:46:13,235][06556] Avg episode reward: [(0, '4.602')]
+[2024-09-13 07:46:18,229][06556] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3543.9). Total num frames: 815104. Throughput: 0: 1026.1. Samples: 204774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:46:18,230][06556] Avg episode reward: [(0, '4.606')]
+[2024-09-13 07:46:18,713][08716] Updated weights for policy 0, policy_version 200 (0.0041)
+[2024-09-13 07:46:23,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3520.8). Total num frames: 827392. Throughput: 0: 998.1. Samples: 206602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:46:23,234][06556] Avg episode reward: [(0, '4.524')]
+[2024-09-13 07:46:28,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3532.8). Total num frames: 847872. Throughput: 0: 955.7. Samples: 211718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:46:28,231][06556] Avg episode reward: [(0, '4.629')]
+[2024-09-13 07:46:30,113][08716] Updated weights for policy 0, policy_version 210 (0.0019)
+[2024-09-13 07:46:33,229][06556] Fps is (10 sec: 4505.7, 60 sec: 4027.8, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 994.5. Samples: 218740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:46:33,235][06556] Avg episode reward: [(0, '4.737')]
+[2024-09-13 07:46:38,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3555.3). Total num frames: 888832. Throughput: 0: 1014.8. Samples: 221842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:46:38,235][06556] Avg episode reward: [(0, '4.555')]
+[2024-09-13 07:46:41,417][08716] Updated weights for policy 0, policy_version 220 (0.0021)
+[2024-09-13 07:46:43,229][06556] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 905216. Throughput: 0: 947.8. Samples: 226068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:46:43,237][06556] Avg episode reward: [(0, '4.531')]
+[2024-09-13 07:46:48,229][06556] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3528.9). Total num frames: 917504. Throughput: 0: 899.6. Samples: 230242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:46:48,231][06556] Avg episode reward: [(0, '4.560')]
+[2024-09-13 07:46:53,229][06556] Fps is (10 sec: 3277.0, 60 sec: 3754.7, 300 sec: 3539.6). Total num frames: 937984. Throughput: 0: 920.7. Samples: 233202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:46:53,232][06556] Avg episode reward: [(0, '4.614')]
+[2024-09-13 07:46:54,017][08716] Updated weights for policy 0, policy_version 230 (0.0022)
+[2024-09-13 07:46:58,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3519.5). Total num frames: 950272. Throughput: 0: 887.0. Samples: 237874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:46:58,233][06556] Avg episode reward: [(0, '4.698')]
+[2024-09-13 07:47:03,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3544.9). Total num frames: 974848. Throughput: 0: 868.3. Samples: 243848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:47:03,231][06556] Avg episode reward: [(0, '4.739')]
+[2024-09-13 07:47:04,951][08716] Updated weights for policy 0, policy_version 240 (0.0020)
+[2024-09-13 07:47:08,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3554.8). Total num frames: 995328. Throughput: 0: 904.4. Samples: 247302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:47:08,238][06556] Avg episode reward: [(0, '4.693')]
+[2024-09-13 07:47:13,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3549.9). Total num frames: 1011712. Throughput: 0: 914.0. Samples: 252848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:47:13,235][06556] Avg episode reward: [(0, '4.911')]
+[2024-09-13 07:47:13,238][08703] Saving new best policy, reward=4.911!
+[2024-09-13 07:47:16,872][08716] Updated weights for policy 0, policy_version 250 (0.0019)
+[2024-09-13 07:47:18,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3545.2). Total num frames: 1028096. Throughput: 0: 863.0. Samples: 257576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:47:18,233][06556] Avg episode reward: [(0, '4.736')]
+[2024-09-13 07:47:23,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 867.0. Samples: 260858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:47:23,231][06556] Avg episode reward: [(0, '4.583')]
+[2024-09-13 07:47:26,068][08716] Updated weights for policy 0, policy_version 260 (0.0029)
+[2024-09-13 07:47:28,232][06556] Fps is (10 sec: 4094.8, 60 sec: 3686.2, 300 sec: 3623.9). Total num frames: 1069056. Throughput: 0: 914.0. Samples: 267200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:47:28,234][06556] Avg episode reward: [(0, '4.809')]
+[2024-09-13 07:47:33,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 908.7. Samples: 271134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:47:33,231][06556] Avg episode reward: [(0, '4.901')]
+[2024-09-13 07:47:38,091][08716] Updated weights for policy 0, policy_version 270 (0.0017)
+[2024-09-13 07:47:38,229][06556] Fps is (10 sec: 3687.2, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 913.6. Samples: 274316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:47:38,235][06556] Avg episode reward: [(0, '5.159')]
+[2024-09-13 07:47:38,246][08703] Saving new best policy, reward=5.159!
+[2024-09-13 07:47:43,233][06556] Fps is (10 sec: 4503.8, 60 sec: 3686.2, 300 sec: 3748.8). Total num frames: 1126400. Throughput: 0: 959.7. Samples: 281064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:47:43,237][06556] Avg episode reward: [(0, '5.225')]
+[2024-09-13 07:47:43,239][08703] Saving new best policy, reward=5.225!
+[2024-09-13 07:47:48,229][06556] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 1138688. Throughput: 0: 929.0. Samples: 285654. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-13 07:47:48,235][06556] Avg episode reward: [(0, '5.281')]
+[2024-09-13 07:47:48,246][08703] Saving new best policy, reward=5.281!
+[2024-09-13 07:47:50,111][08716] Updated weights for policy 0, policy_version 280 (0.0039)
+[2024-09-13 07:47:53,229][06556] Fps is (10 sec: 3278.1, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1159168. Throughput: 0: 903.1. Samples: 287940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:47:53,236][06556] Avg episode reward: [(0, '5.115')]
+[2024-09-13 07:47:58,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1183744. Throughput: 0: 931.6. Samples: 294770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:47:58,231][06556] Avg episode reward: [(0, '4.934')]
+[2024-09-13 07:47:58,239][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000289_1183744.pth...
+[2024-09-13 07:47:58,357][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth
+[2024-09-13 07:47:59,036][08716] Updated weights for policy 0, policy_version 290 (0.0042)
+[2024-09-13 07:48:03,234][06556] Fps is (10 sec: 4093.9, 60 sec: 3754.3, 300 sec: 3721.1). Total num frames: 1200128. Throughput: 0: 953.0. Samples: 300464. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-13 07:48:03,236][06556] Avg episode reward: [(0, '4.970')]
+[2024-09-13 07:48:08,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 1216512. Throughput: 0: 927.2. Samples: 302580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-13 07:48:08,232][06556] Avg episode reward: [(0, '5.045')]
+[2024-09-13 07:48:10,582][08716] Updated weights for policy 0, policy_version 300 (0.0019)
+[2024-09-13 07:48:13,229][06556] Fps is (10 sec: 3688.3, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1236992. Throughput: 0: 930.6. Samples: 309074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:48:13,231][06556] Avg episode reward: [(0, '5.119')]
+[2024-09-13 07:48:18,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1261568. Throughput: 0: 991.1. Samples: 315732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-13 07:48:18,237][06556] Avg episode reward: [(0, '5.495')]
+[2024-09-13 07:48:18,249][08703] Saving new best policy, reward=5.495!
+[2024-09-13 07:48:21,131][08716] Updated weights for policy 0, policy_version 310 (0.0023)
+[2024-09-13 07:48:23,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 1273856. Throughput: 0: 964.9. Samples: 317734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:48:23,231][06556] Avg episode reward: [(0, '5.555')]
+[2024-09-13 07:48:23,234][08703] Saving new best policy, reward=5.555!
+[2024-09-13 07:48:28,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3735.0). Total num frames: 1294336. Throughput: 0: 935.9. Samples: 323174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:48:28,230][06556] Avg episode reward: [(0, '5.388')]
+[2024-09-13 07:48:30,896][08716] Updated weights for policy 0, policy_version 320 (0.0023)
+[2024-09-13 07:48:33,229][06556] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 1318912. Throughput: 0: 992.2. Samples: 330302. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-13 07:48:33,236][06556] Avg episode reward: [(0, '5.229')]
+[2024-09-13 07:48:38,231][06556] Fps is (10 sec: 4095.2, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1335296. Throughput: 0: 1002.4. Samples: 333048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:48:38,238][06556] Avg episode reward: [(0, '5.462')]
+[2024-09-13 07:48:42,617][08716] Updated weights for policy 0, policy_version 330 (0.0016)
+[2024-09-13 07:48:43,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3721.1). Total num frames: 1351680. Throughput: 0: 950.2. Samples: 337528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-13 07:48:43,232][06556] Avg episode reward: [(0, '5.348')]
+[2024-09-13 07:48:48,229][06556] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 1376256. Throughput: 0: 978.6. Samples: 344496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-13 07:48:48,232][06556] Avg episode reward: [(0, '5.463')]
+[2024-09-13 07:48:51,933][08716] Updated weights for policy 0, policy_version 340 (0.0030)
+[2024-09-13 07:48:53,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1392640. Throughput: 0: 1007.2. Samples: 347906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:48:53,235][06556] Avg episode reward: [(0, '5.486')]
+[2024-09-13 07:48:58,231][06556] Fps is (10 sec: 2866.6, 60 sec: 3686.3, 300 sec: 3693.3). Total num frames: 1404928. Throughput: 0: 941.2. Samples: 351428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-13 07:48:58,238][06556] Avg episode reward: [(0, '5.529')]
+[2024-09-13 07:49:03,229][06556] Fps is (10 sec: 2457.6, 60 sec: 3618.4, 300 sec: 3679.5). Total num frames: 1417216. Throughput: 0: 879.6. Samples: 355312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-13 07:49:03,231][06556] Avg episode reward: [(0, '5.313')]
+[2024-09-13 07:49:05,846][08716] Updated weights for policy 0, policy_version 350 (0.0032)
+[2024-09-13 07:49:08,229][06556] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1441792. Throughput: 0: 912.3. Samples: 358786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:49:08,235][06556] Avg episode reward: [(0, '5.554')]
+[2024-09-13 07:49:13,233][06556] Fps is (10 sec: 4503.8, 60 sec: 3754.4, 300 sec: 3679.4). Total num frames: 1462272. Throughput: 0: 936.4. Samples: 365316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:49:13,235][06556] Avg episode reward: [(0, '5.604')]
+[2024-09-13 07:49:13,239][08703] Saving new best policy, reward=5.604!
+[2024-09-13 07:49:17,543][08716] Updated weights for policy 0, policy_version 360 (0.0016)
+[2024-09-13 07:49:18,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 1474560. Throughput: 0: 865.2. Samples: 369236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-13 07:49:18,234][06556] Avg episode reward: [(0, '5.749')]
+[2024-09-13 07:49:18,251][08703] Saving new best policy, reward=5.749!
+[2024-09-13 07:49:23,229][06556] Fps is (10 sec: 3278.1, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1495040. Throughput: 0: 871.0. Samples: 372242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-13 07:49:23,231][06556] Avg episode reward: [(0, '5.943')]
+[2024-09-13 07:49:23,237][08703] Saving new best policy, reward=5.943!
+[2024-09-13 07:49:27,302][08716] Updated weights for policy 0, policy_version 370 (0.0025) +[2024-09-13 07:49:28,233][06556] Fps is (10 sec: 4503.5, 60 sec: 3754.4, 300 sec: 3693.3). Total num frames: 1519616. Throughput: 0: 917.3. Samples: 378812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:49:28,235][06556] Avg episode reward: [(0, '6.383')] +[2024-09-13 07:49:28,246][08703] Saving new best policy, reward=6.383! +[2024-09-13 07:49:33,231][06556] Fps is (10 sec: 3685.5, 60 sec: 3549.7, 300 sec: 3665.5). Total num frames: 1531904. Throughput: 0: 870.6. Samples: 383674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:49:33,236][06556] Avg episode reward: [(0, '6.351')] +[2024-09-13 07:49:38,229][06556] Fps is (10 sec: 3278.3, 60 sec: 3618.2, 300 sec: 3707.3). Total num frames: 1552384. Throughput: 0: 845.9. Samples: 385970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:49:38,231][06556] Avg episode reward: [(0, '6.314')] +[2024-09-13 07:49:39,019][08716] Updated weights for policy 0, policy_version 380 (0.0023) +[2024-09-13 07:49:43,229][06556] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3721.2). Total num frames: 1572864. Throughput: 0: 919.4. Samples: 392798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 07:49:43,232][06556] Avg episode reward: [(0, '6.300')] +[2024-09-13 07:49:48,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 1589248. Throughput: 0: 958.1. Samples: 398428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:49:48,237][06556] Avg episode reward: [(0, '6.140')] +[2024-09-13 07:49:49,989][08716] Updated weights for policy 0, policy_version 390 (0.0027) +[2024-09-13 07:49:53,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 1605632. Throughput: 0: 926.1. Samples: 400460. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:49:53,231][06556] Avg episode reward: [(0, '5.911')] +[2024-09-13 07:49:58,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3735.0). Total num frames: 1630208. Throughput: 0: 919.5. Samples: 406688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:49:58,233][06556] Avg episode reward: [(0, '6.122')] +[2024-09-13 07:49:58,242][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000398_1630208.pth... +[2024-09-13 07:49:58,384][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_737280.pth +[2024-09-13 07:49:59,744][08716] Updated weights for policy 0, policy_version 400 (0.0014) +[2024-09-13 07:50:03,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 1650688. Throughput: 0: 981.6. Samples: 413408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:50:03,237][06556] Avg episode reward: [(0, '6.071')] +[2024-09-13 07:50:08,231][06556] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3762.7). Total num frames: 1667072. Throughput: 0: 960.3. Samples: 415458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:50:08,237][06556] Avg episode reward: [(0, '6.041')] +[2024-09-13 07:50:11,454][08716] Updated weights for policy 0, policy_version 410 (0.0030) +[2024-09-13 07:50:13,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3762.8). Total num frames: 1683456. Throughput: 0: 932.4. Samples: 420766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:50:13,231][06556] Avg episode reward: [(0, '6.095')] +[2024-09-13 07:50:18,229][06556] Fps is (10 sec: 4096.8, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 1708032. Throughput: 0: 975.6. Samples: 427576. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:50:18,231][06556] Avg episode reward: [(0, '6.233')] +[2024-09-13 07:50:20,491][08716] Updated weights for policy 0, policy_version 420 (0.0023) +[2024-09-13 07:50:23,233][06556] Fps is (10 sec: 4094.3, 60 sec: 3822.7, 300 sec: 3748.9). Total num frames: 1724416. Throughput: 0: 991.1. Samples: 430574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:50:23,237][06556] Avg episode reward: [(0, '7.068')] +[2024-09-13 07:50:23,242][08703] Saving new best policy, reward=7.068! +[2024-09-13 07:50:28,230][06556] Fps is (10 sec: 3276.5, 60 sec: 3686.6, 300 sec: 3762.8). Total num frames: 1740800. Throughput: 0: 932.9. Samples: 434778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:50:28,233][06556] Avg episode reward: [(0, '7.311')] +[2024-09-13 07:50:28,254][08703] Saving new best policy, reward=7.311! +[2024-09-13 07:50:32,043][08716] Updated weights for policy 0, policy_version 430 (0.0015) +[2024-09-13 07:50:33,229][06556] Fps is (10 sec: 4097.7, 60 sec: 3891.4, 300 sec: 3762.8). Total num frames: 1765376. Throughput: 0: 963.3. Samples: 441778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:50:33,232][06556] Avg episode reward: [(0, '7.644')] +[2024-09-13 07:50:33,235][08703] Saving new best policy, reward=7.644! +[2024-09-13 07:50:38,229][06556] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 1785856. Throughput: 0: 992.9. Samples: 445140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:50:38,233][06556] Avg episode reward: [(0, '7.413')] +[2024-09-13 07:50:43,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1798144. Throughput: 0: 956.6. Samples: 449736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:50:43,233][06556] Avg episode reward: [(0, '8.253')] +[2024-09-13 07:50:43,239][08703] Saving new best policy, reward=8.253! 
+[2024-09-13 07:50:43,701][08716] Updated weights for policy 0, policy_version 440 (0.0027) +[2024-09-13 07:50:48,229][06556] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 1822720. Throughput: 0: 942.7. Samples: 455830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:50:48,232][06556] Avg episode reward: [(0, '7.967')] +[2024-09-13 07:50:52,840][08716] Updated weights for policy 0, policy_version 450 (0.0013) +[2024-09-13 07:50:53,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 1843200. Throughput: 0: 973.9. Samples: 459282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:50:53,231][06556] Avg episode reward: [(0, '8.189')] +[2024-09-13 07:50:58,233][06556] Fps is (10 sec: 2866.1, 60 sec: 3686.1, 300 sec: 3707.2). Total num frames: 1851392. Throughput: 0: 949.9. Samples: 463516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:50:58,237][06556] Avg episode reward: [(0, '8.696')] +[2024-09-13 07:50:58,314][08703] Saving new best policy, reward=8.696! +[2024-09-13 07:51:03,229][06556] Fps is (10 sec: 2457.6, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1867776. Throughput: 0: 877.7. Samples: 467074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:51:03,231][06556] Avg episode reward: [(0, '8.941')] +[2024-09-13 07:51:03,239][08703] Saving new best policy, reward=8.941! +[2024-09-13 07:51:06,837][08716] Updated weights for policy 0, policy_version 460 (0.0038) +[2024-09-13 07:51:08,229][06556] Fps is (10 sec: 3687.9, 60 sec: 3686.5, 300 sec: 3707.2). Total num frames: 1888256. Throughput: 0: 883.5. Samples: 470330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:51:08,231][06556] Avg episode reward: [(0, '9.828')] +[2024-09-13 07:51:08,245][08703] Saving new best policy, reward=9.828! +[2024-09-13 07:51:13,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1912832. Throughput: 0: 944.8. 
Samples: 477292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:51:13,231][06556] Avg episode reward: [(0, '10.600')] +[2024-09-13 07:51:13,238][08703] Saving new best policy, reward=10.600! +[2024-09-13 07:51:17,466][08716] Updated weights for policy 0, policy_version 470 (0.0045) +[2024-09-13 07:51:18,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1925120. Throughput: 0: 896.0. Samples: 482098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:51:18,233][06556] Avg episode reward: [(0, '10.656')] +[2024-09-13 07:51:18,243][08703] Saving new best policy, reward=10.656! +[2024-09-13 07:51:23,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3721.1). Total num frames: 1945600. Throughput: 0: 867.6. Samples: 484182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 07:51:23,233][06556] Avg episode reward: [(0, '11.067')] +[2024-09-13 07:51:23,238][08703] Saving new best policy, reward=11.067! +[2024-09-13 07:51:28,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 1961984. Throughput: 0: 915.2. Samples: 490920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:51:28,234][06556] Avg episode reward: [(0, '11.214')] +[2024-09-13 07:51:28,250][08703] Saving new best policy, reward=11.214! +[2024-09-13 07:51:28,740][08716] Updated weights for policy 0, policy_version 480 (0.0041) +[2024-09-13 07:51:33,229][06556] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 1974272. Throughput: 0: 848.2. Samples: 494000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:51:33,235][06556] Avg episode reward: [(0, '11.460')] +[2024-09-13 07:51:33,242][08703] Saving new best policy, reward=11.460! +[2024-09-13 07:51:38,235][06556] Fps is (10 sec: 2046.8, 60 sec: 3276.5, 300 sec: 3651.6). Total num frames: 1982464. Throughput: 0: 804.5. Samples: 495488. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:51:38,237][06556] Avg episode reward: [(0, '11.404')] +[2024-09-13 07:51:43,229][06556] Fps is (10 sec: 2048.0, 60 sec: 3276.8, 300 sec: 3651.7). Total num frames: 1994752. Throughput: 0: 779.0. Samples: 498568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:51:43,232][06556] Avg episode reward: [(0, '11.211')] +[2024-09-13 07:51:48,229][06556] Fps is (10 sec: 2049.2, 60 sec: 3003.7, 300 sec: 3610.0). Total num frames: 2002944. Throughput: 0: 768.5. Samples: 501658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:51:48,233][06556] Avg episode reward: [(0, '10.885')] +[2024-09-13 07:51:49,147][08716] Updated weights for policy 0, policy_version 490 (0.0071) +[2024-09-13 07:51:53,232][06556] Fps is (10 sec: 2047.4, 60 sec: 2867.1, 300 sec: 3610.0). Total num frames: 2015232. Throughput: 0: 732.1. Samples: 503276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:51:53,238][06556] Avg episode reward: [(0, '10.710')] +[2024-09-13 07:51:58,232][06556] Fps is (10 sec: 2047.3, 60 sec: 2867.2, 300 sec: 3554.5). Total num frames: 2023424. Throughput: 0: 647.0. Samples: 506410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:51:58,235][06556] Avg episode reward: [(0, '11.324')] +[2024-09-13 07:51:58,248][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth... +[2024-09-13 07:51:58,489][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000289_1183744.pth +[2024-09-13 07:52:03,229][06556] Fps is (10 sec: 2048.6, 60 sec: 2798.9, 300 sec: 3526.7). Total num frames: 2035712. Throughput: 0: 607.5. Samples: 509434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:52:03,237][06556] Avg episode reward: [(0, '11.434')] +[2024-09-13 07:52:08,229][06556] Fps is (10 sec: 2048.7, 60 sec: 2594.1, 300 sec: 3499.0). Total num frames: 2043904. Throughput: 0: 595.1. Samples: 510962. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:52:08,233][06556] Avg episode reward: [(0, '11.187')] +[2024-09-13 07:52:09,080][08716] Updated weights for policy 0, policy_version 500 (0.0044) +[2024-09-13 07:52:13,229][06556] Fps is (10 sec: 2048.0, 60 sec: 2389.3, 300 sec: 3485.1). Total num frames: 2056192. Throughput: 0: 517.4. Samples: 514202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:52:13,235][06556] Avg episode reward: [(0, '12.064')] +[2024-09-13 07:52:13,238][08703] Saving new best policy, reward=12.064! +[2024-09-13 07:52:18,231][06556] Fps is (10 sec: 2457.1, 60 sec: 2389.3, 300 sec: 3457.3). Total num frames: 2068480. Throughput: 0: 522.4. Samples: 517510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:52:18,235][06556] Avg episode reward: [(0, '11.948')] +[2024-09-13 07:52:23,229][06556] Fps is (10 sec: 2048.0, 60 sec: 2184.5, 300 sec: 3415.7). Total num frames: 2076672. Throughput: 0: 522.3. Samples: 518990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:52:23,232][06556] Avg episode reward: [(0, '12.522')] +[2024-09-13 07:52:23,237][08703] Saving new best policy, reward=12.522! +[2024-09-13 07:52:27,865][08716] Updated weights for policy 0, policy_version 510 (0.0062) +[2024-09-13 07:52:28,229][06556] Fps is (10 sec: 2048.4, 60 sec: 2116.3, 300 sec: 3415.6). Total num frames: 2088960. Throughput: 0: 525.1. Samples: 522198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:52:28,235][06556] Avg episode reward: [(0, '12.601')] +[2024-09-13 07:52:28,244][08703] Saving new best policy, reward=12.601! +[2024-09-13 07:52:33,231][06556] Fps is (10 sec: 2047.6, 60 sec: 2047.9, 300 sec: 3360.1). Total num frames: 2097152. Throughput: 0: 530.9. Samples: 525548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:52:33,237][06556] Avg episode reward: [(0, '13.221')] +[2024-09-13 07:52:33,243][08703] Saving new best policy, reward=13.221! 
+[2024-09-13 07:52:38,232][06556] Fps is (10 sec: 2047.4, 60 sec: 2116.4, 300 sec: 3332.4). Total num frames: 2109440. Throughput: 0: 530.7. Samples: 527156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 07:52:38,234][06556] Avg episode reward: [(0, '12.854')] +[2024-09-13 07:52:43,231][06556] Fps is (10 sec: 2457.6, 60 sec: 2116.2, 300 sec: 3332.3). Total num frames: 2121728. Throughput: 0: 536.0. Samples: 530530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-09-13 07:52:43,236][06556] Avg episode reward: [(0, '14.008')] +[2024-09-13 07:52:43,238][08703] Saving new best policy, reward=14.008! +[2024-09-13 07:52:47,017][08716] Updated weights for policy 0, policy_version 520 (0.0030) +[2024-09-13 07:52:48,229][06556] Fps is (10 sec: 2048.6, 60 sec: 2116.3, 300 sec: 3290.7). Total num frames: 2129920. Throughput: 0: 537.0. Samples: 533598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:52:48,235][06556] Avg episode reward: [(0, '13.610')] +[2024-09-13 07:52:53,231][06556] Fps is (10 sec: 2048.0, 60 sec: 2116.3, 300 sec: 3249.0). Total num frames: 2142208. Throughput: 0: 537.8. Samples: 535162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:52:53,237][06556] Avg episode reward: [(0, '13.722')] +[2024-09-13 07:52:58,229][06556] Fps is (10 sec: 2867.2, 60 sec: 2252.9, 300 sec: 3249.1). Total num frames: 2158592. Throughput: 0: 556.8. Samples: 539256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:52:58,233][06556] Avg episode reward: [(0, '14.267')] +[2024-09-13 07:52:58,243][08703] Saving new best policy, reward=14.267! +[2024-09-13 07:53:00,332][08716] Updated weights for policy 0, policy_version 530 (0.0043) +[2024-09-13 07:53:03,229][06556] Fps is (10 sec: 3687.2, 60 sec: 2389.3, 300 sec: 3262.9). Total num frames: 2179072. Throughput: 0: 624.3. Samples: 545604. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:53:03,230][06556] Avg episode reward: [(0, '12.735')] +[2024-09-13 07:53:08,233][06556] Fps is (10 sec: 3275.5, 60 sec: 2457.4, 300 sec: 3235.1). Total num frames: 2191360. Throughput: 0: 635.1. Samples: 547570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:53:08,235][06556] Avg episode reward: [(0, '11.693')] +[2024-09-13 07:53:13,230][06556] Fps is (10 sec: 2457.2, 60 sec: 2457.5, 300 sec: 3193.5). Total num frames: 2203648. Throughput: 0: 642.6. Samples: 551118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:53:13,236][06556] Avg episode reward: [(0, '11.991')] +[2024-09-13 07:53:14,817][08716] Updated weights for policy 0, policy_version 540 (0.0019) +[2024-09-13 07:53:18,229][06556] Fps is (10 sec: 3278.1, 60 sec: 2594.2, 300 sec: 3221.3). Total num frames: 2224128. Throughput: 0: 697.1. Samples: 556916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:53:18,231][06556] Avg episode reward: [(0, '12.036')] +[2024-09-13 07:53:23,231][06556] Fps is (10 sec: 4095.9, 60 sec: 2798.8, 300 sec: 3221.2). Total num frames: 2244608. Throughput: 0: 732.7. Samples: 560128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:53:23,235][06556] Avg episode reward: [(0, '12.534')] +[2024-09-13 07:53:25,791][08716] Updated weights for policy 0, policy_version 550 (0.0036) +[2024-09-13 07:53:28,229][06556] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 3193.5). Total num frames: 2260992. Throughput: 0: 752.8. Samples: 564404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:53:28,231][06556] Avg episode reward: [(0, '13.825')] +[2024-09-13 07:53:33,229][06556] Fps is (10 sec: 3687.2, 60 sec: 3072.1, 300 sec: 3207.4). Total num frames: 2281472. Throughput: 0: 834.0. Samples: 571130. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:53:33,231][06556] Avg episode reward: [(0, '14.675')] +[2024-09-13 07:53:33,236][08703] Saving new best policy, reward=14.675! +[2024-09-13 07:53:35,283][08716] Updated weights for policy 0, policy_version 560 (0.0020) +[2024-09-13 07:53:38,233][06556] Fps is (10 sec: 4094.4, 60 sec: 3208.5, 300 sec: 3221.2). Total num frames: 2301952. Throughput: 0: 874.8. Samples: 574530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:53:38,240][06556] Avg episode reward: [(0, '15.060')] +[2024-09-13 07:53:38,250][08703] Saving new best policy, reward=15.060! +[2024-09-13 07:53:43,233][06556] Fps is (10 sec: 3684.9, 60 sec: 3276.7, 300 sec: 3193.4). Total num frames: 2318336. Throughput: 0: 888.2. Samples: 579228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:53:43,235][06556] Avg episode reward: [(0, '17.042')] +[2024-09-13 07:53:43,239][08703] Saving new best policy, reward=17.042! +[2024-09-13 07:53:47,133][08716] Updated weights for policy 0, policy_version 570 (0.0045) +[2024-09-13 07:53:48,229][06556] Fps is (10 sec: 3687.8, 60 sec: 3481.6, 300 sec: 3207.4). Total num frames: 2338816. Throughput: 0: 874.2. Samples: 584942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:53:48,231][06556] Avg episode reward: [(0, '17.394')] +[2024-09-13 07:53:48,243][08703] Saving new best policy, reward=17.394! +[2024-09-13 07:53:53,229][06556] Fps is (10 sec: 4097.5, 60 sec: 3618.2, 300 sec: 3235.2). Total num frames: 2359296. Throughput: 0: 903.9. Samples: 588240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:53:53,239][06556] Avg episode reward: [(0, '19.085')] +[2024-09-13 07:53:53,243][08703] Saving new best policy, reward=19.085! +[2024-09-13 07:53:57,604][08716] Updated weights for policy 0, policy_version 580 (0.0015) +[2024-09-13 07:53:58,231][06556] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3249.0). Total num frames: 2375680. Throughput: 0: 952.4. 
Samples: 593976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:53:58,238][06556] Avg episode reward: [(0, '19.111')] +[2024-09-13 07:53:58,247][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000580_2375680.pth... +[2024-09-13 07:53:58,386][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000398_1630208.pth +[2024-09-13 07:53:58,413][08703] Saving new best policy, reward=19.111! +[2024-09-13 07:54:03,229][06556] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3221.3). Total num frames: 2392064. Throughput: 0: 932.4. Samples: 598876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:54:03,231][06556] Avg episode reward: [(0, '18.149')] +[2024-09-13 07:54:07,950][08716] Updated weights for policy 0, policy_version 590 (0.0022) +[2024-09-13 07:54:08,229][06556] Fps is (10 sec: 4096.9, 60 sec: 3754.9, 300 sec: 3235.2). Total num frames: 2416640. Throughput: 0: 937.3. Samples: 602304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-09-13 07:54:08,234][06556] Avg episode reward: [(0, '17.476')] +[2024-09-13 07:54:13,229][06556] Fps is (10 sec: 4096.1, 60 sec: 3823.0, 300 sec: 3249.0). Total num frames: 2433024. Throughput: 0: 986.2. Samples: 608784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:54:13,231][06556] Avg episode reward: [(0, '14.861')] +[2024-09-13 07:54:18,229][06556] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3235.1). Total num frames: 2449408. Throughput: 0: 928.3. Samples: 612902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 07:54:18,233][06556] Avg episode reward: [(0, '15.612')] +[2024-09-13 07:54:19,736][08716] Updated weights for policy 0, policy_version 600 (0.0040) +[2024-09-13 07:54:23,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3235.2). Total num frames: 2473984. Throughput: 0: 926.8. Samples: 616230. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:54:23,234][06556] Avg episode reward: [(0, '16.141')] +[2024-09-13 07:54:28,229][06556] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3262.9). Total num frames: 2494464. Throughput: 0: 974.7. Samples: 623086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:54:28,233][06556] Avg episode reward: [(0, '17.850')] +[2024-09-13 07:54:29,011][08716] Updated weights for policy 0, policy_version 610 (0.0025) +[2024-09-13 07:54:33,231][06556] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3235.1). Total num frames: 2506752. Throughput: 0: 950.4. Samples: 627712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:54:33,234][06556] Avg episode reward: [(0, '16.840')] +[2024-09-13 07:54:38,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3235.1). Total num frames: 2527232. Throughput: 0: 931.5. Samples: 630156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 07:54:38,236][06556] Avg episode reward: [(0, '18.947')] +[2024-09-13 07:54:40,394][08716] Updated weights for policy 0, policy_version 620 (0.0033) +[2024-09-13 07:54:43,229][06556] Fps is (10 sec: 4506.6, 60 sec: 3891.4, 300 sec: 3262.9). Total num frames: 2551808. Throughput: 0: 955.2. Samples: 636958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:54:43,236][06556] Avg episode reward: [(0, '19.311')] +[2024-09-13 07:54:43,240][08703] Saving new best policy, reward=19.311! +[2024-09-13 07:54:48,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3262.9). Total num frames: 2568192. Throughput: 0: 966.7. Samples: 642378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:54:48,235][06556] Avg episode reward: [(0, '18.451')] +[2024-09-13 07:54:52,274][08716] Updated weights for policy 0, policy_version 630 (0.0019) +[2024-09-13 07:54:53,229][06556] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3235.1). Total num frames: 2584576. Throughput: 0: 936.2. Samples: 644432. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 07:54:53,235][06556] Avg episode reward: [(0, '18.076')] +[2024-09-13 07:54:58,229][06556] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3235.1). Total num frames: 2605056. Throughput: 0: 933.9. Samples: 650808. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-13 07:54:58,236][06556] Avg episode reward: [(0, '16.460')] +[2024-09-13 07:55:01,157][08716] Updated weights for policy 0, policy_version 640 (0.0023) +[2024-09-13 07:55:03,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3249.1). Total num frames: 2625536. Throughput: 0: 987.3. Samples: 657332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:55:03,235][06556] Avg episode reward: [(0, '16.498')] +[2024-09-13 07:55:08,229][06556] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3235.1). Total num frames: 2637824. Throughput: 0: 954.0. Samples: 659160. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-13 07:55:08,234][06556] Avg episode reward: [(0, '16.264')] +[2024-09-13 07:55:13,229][06556] Fps is (10 sec: 2457.6, 60 sec: 3618.1, 300 sec: 3193.5). Total num frames: 2650112. Throughput: 0: 877.6. Samples: 662580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-09-13 07:55:13,233][06556] Avg episode reward: [(0, '16.489')] +[2024-09-13 07:55:15,383][08716] Updated weights for policy 0, policy_version 650 (0.0025) +[2024-09-13 07:55:18,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3221.3). Total num frames: 2674688. Throughput: 0: 911.4. Samples: 668722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:55:18,231][06556] Avg episode reward: [(0, '17.310')] +[2024-09-13 07:55:23,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3235.2). Total num frames: 2695168. Throughput: 0: 933.2. Samples: 672150. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:55:23,231][06556] Avg episode reward: [(0, '18.721')] +[2024-09-13 07:55:26,524][08716] Updated weights for policy 0, policy_version 660 (0.0046) +[2024-09-13 07:55:28,233][06556] Fps is (10 sec: 3275.5, 60 sec: 3549.6, 300 sec: 3193.4). Total num frames: 2707456. Throughput: 0: 878.7. Samples: 676502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:55:28,236][06556] Avg episode reward: [(0, '19.153')] +[2024-09-13 07:55:33,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3193.5). Total num frames: 2727936. Throughput: 0: 893.9. Samples: 682602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:55:33,230][06556] Avg episode reward: [(0, '20.116')] +[2024-09-13 07:55:33,240][08703] Saving new best policy, reward=20.116! +[2024-09-13 07:55:36,458][08716] Updated weights for policy 0, policy_version 670 (0.0021) +[2024-09-13 07:55:38,229][06556] Fps is (10 sec: 4097.7, 60 sec: 3686.4, 300 sec: 3221.3). Total num frames: 2748416. Throughput: 0: 921.8. Samples: 685912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:55:38,231][06556] Avg episode reward: [(0, '20.785')] +[2024-09-13 07:55:38,243][08703] Saving new best policy, reward=20.785! +[2024-09-13 07:55:43,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3193.5). Total num frames: 2764800. Throughput: 0: 899.4. Samples: 691280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:55:43,236][06556] Avg episode reward: [(0, '20.117')] +[2024-09-13 07:55:47,752][08716] Updated weights for policy 0, policy_version 680 (0.0030) +[2024-09-13 07:55:48,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3193.5). Total num frames: 2785280. Throughput: 0: 877.2. Samples: 696806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:55:48,231][06556] Avg episode reward: [(0, '20.374')] +[2024-09-13 07:55:53,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3249.1). 
Total num frames: 2809856. Throughput: 0: 915.2. Samples: 700344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:55:53,231][06556] Avg episode reward: [(0, '18.752')] +[2024-09-13 07:55:57,082][08716] Updated weights for policy 0, policy_version 690 (0.0030) +[2024-09-13 07:55:58,229][06556] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3249.0). Total num frames: 2826240. Throughput: 0: 982.9. Samples: 706812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:55:58,231][06556] Avg episode reward: [(0, '19.737')] +[2024-09-13 07:55:58,250][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000690_2826240.pth... +[2024-09-13 07:55:58,452][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth +[2024-09-13 07:56:03,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3235.1). Total num frames: 2842624. Throughput: 0: 944.5. Samples: 711224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:56:03,231][06556] Avg episode reward: [(0, '20.367')] +[2024-09-13 07:56:07,837][08716] Updated weights for policy 0, policy_version 700 (0.0022) +[2024-09-13 07:56:08,229][06556] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3235.1). Total num frames: 2867200. Throughput: 0: 945.9. Samples: 714714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:56:08,235][06556] Avg episode reward: [(0, '21.938')] +[2024-09-13 07:56:08,246][08703] Saving new best policy, reward=21.938! +[2024-09-13 07:56:13,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3262.9). Total num frames: 2887680. Throughput: 0: 1009.7. Samples: 721934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:56:13,232][06556] Avg episode reward: [(0, '22.099')] +[2024-09-13 07:56:13,241][08703] Saving new best policy, reward=22.099! +[2024-09-13 07:56:18,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3249.0). Total num frames: 2904064. 
Throughput: 0: 973.5. Samples: 726410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:56:18,233][06556] Avg episode reward: [(0, '23.312')] +[2024-09-13 07:56:18,246][08703] Saving new best policy, reward=23.312! +[2024-09-13 07:56:19,295][08716] Updated weights for policy 0, policy_version 710 (0.0040) +[2024-09-13 07:56:23,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3262.9). Total num frames: 2924544. Throughput: 0: 962.8. Samples: 729240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:56:23,231][06556] Avg episode reward: [(0, '24.438')] +[2024-09-13 07:56:23,238][08703] Saving new best policy, reward=24.438! +[2024-09-13 07:56:28,088][08716] Updated weights for policy 0, policy_version 720 (0.0021) +[2024-09-13 07:56:28,229][06556] Fps is (10 sec: 4505.7, 60 sec: 4028.0, 300 sec: 3304.6). Total num frames: 2949120. Throughput: 0: 1004.0. Samples: 736460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:56:28,234][06556] Avg episode reward: [(0, '23.997')] +[2024-09-13 07:56:33,231][06556] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 3332.4). Total num frames: 2965504. Throughput: 0: 1004.7. Samples: 742020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:56:33,233][06556] Avg episode reward: [(0, '24.831')] +[2024-09-13 07:56:33,235][08703] Saving new best policy, reward=24.831! +[2024-09-13 07:56:38,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3346.2). Total num frames: 2981888. Throughput: 0: 974.1. Samples: 744178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:56:38,231][06556] Avg episode reward: [(0, '24.032')] +[2024-09-13 07:56:39,244][08716] Updated weights for policy 0, policy_version 730 (0.0023) +[2024-09-13 07:56:43,231][06556] Fps is (10 sec: 4096.0, 60 sec: 4027.6, 300 sec: 3401.7). Total num frames: 3006464. Throughput: 0: 986.8. Samples: 751220. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:56:43,237][06556] Avg episode reward: [(0, '24.377')] +[2024-09-13 07:56:48,229][06556] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3429.6). Total num frames: 3026944. Throughput: 0: 1035.5. Samples: 757820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:56:48,234][06556] Avg episode reward: [(0, '22.515')] +[2024-09-13 07:56:48,415][08716] Updated weights for policy 0, policy_version 740 (0.0018) +[2024-09-13 07:56:53,229][06556] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3457.3). Total num frames: 3043328. Throughput: 0: 1002.5. Samples: 759826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:56:53,232][06556] Avg episode reward: [(0, '22.849')] +[2024-09-13 07:56:58,229][06556] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3499.0). Total num frames: 3067904. Throughput: 0: 979.0. Samples: 765990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:56:58,233][06556] Avg episode reward: [(0, '21.958')] +[2024-09-13 07:56:58,876][08716] Updated weights for policy 0, policy_version 750 (0.0025) +[2024-09-13 07:57:03,234][06556] Fps is (10 sec: 4912.8, 60 sec: 4163.9, 300 sec: 3554.4). Total num frames: 3092480. Throughput: 0: 1041.0. Samples: 773262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:57:03,237][06556] Avg episode reward: [(0, '22.647')] +[2024-09-13 07:57:08,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3554.5). Total num frames: 3104768. Throughput: 0: 1032.3. Samples: 775692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:57:08,231][06556] Avg episode reward: [(0, '21.454')] +[2024-09-13 07:57:10,031][08716] Updated weights for policy 0, policy_version 760 (0.0018) +[2024-09-13 07:57:13,229][06556] Fps is (10 sec: 3278.4, 60 sec: 3959.5, 300 sec: 3582.3). Total num frames: 3125248. Throughput: 0: 987.7. Samples: 780908. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:57:13,231][06556] Avg episode reward: [(0, '20.963')] +[2024-09-13 07:57:18,229][06556] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3623.9). Total num frames: 3145728. Throughput: 0: 1009.4. Samples: 787442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:57:18,231][06556] Avg episode reward: [(0, '20.804')] +[2024-09-13 07:57:20,217][08716] Updated weights for policy 0, policy_version 770 (0.0021) +[2024-09-13 07:57:23,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3623.9). Total num frames: 3158016. Throughput: 0: 1006.9. Samples: 789490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:57:23,233][06556] Avg episode reward: [(0, '20.484')] +[2024-09-13 07:57:28,230][06556] Fps is (10 sec: 2866.9, 60 sec: 3754.6, 300 sec: 3651.7). Total num frames: 3174400. Throughput: 0: 934.4. Samples: 793266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:57:28,232][06556] Avg episode reward: [(0, '20.563')] +[2024-09-13 07:57:32,522][08716] Updated weights for policy 0, policy_version 780 (0.0013) +[2024-09-13 07:57:33,229][06556] Fps is (10 sec: 3686.5, 60 sec: 3823.1, 300 sec: 3679.5). Total num frames: 3194880. Throughput: 0: 930.2. Samples: 799680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:57:33,232][06556] Avg episode reward: [(0, '21.385')] +[2024-09-13 07:57:38,229][06556] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 3721.1). Total num frames: 3219456. Throughput: 0: 967.5. Samples: 803362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:57:38,232][06556] Avg episode reward: [(0, '20.795')] +[2024-09-13 07:57:42,244][08716] Updated weights for policy 0, policy_version 790 (0.0030) +[2024-09-13 07:57:43,235][06556] Fps is (10 sec: 4093.5, 60 sec: 3822.7, 300 sec: 3748.8). Total num frames: 3235840. Throughput: 0: 958.5. Samples: 809128. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:57:43,237][06556] Avg episode reward: [(0, '21.572')] +[2024-09-13 07:57:48,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3256320. Throughput: 0: 920.0. Samples: 814658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:57:48,237][06556] Avg episode reward: [(0, '22.758')] +[2024-09-13 07:57:52,127][08716] Updated weights for policy 0, policy_version 800 (0.0028) +[2024-09-13 07:57:53,229][06556] Fps is (10 sec: 4508.4, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3280896. Throughput: 0: 942.0. Samples: 818080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:57:53,235][06556] Avg episode reward: [(0, '23.295')] +[2024-09-13 07:57:58,232][06556] Fps is (10 sec: 4094.7, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 3297280. Throughput: 0: 975.5. Samples: 824810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:57:58,239][06556] Avg episode reward: [(0, '22.572')] +[2024-09-13 07:57:58,267][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000806_3301376.pth... +[2024-09-13 07:57:58,453][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000580_2375680.pth +[2024-09-13 07:58:03,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3804.5). Total num frames: 3313664. Throughput: 0: 932.0. Samples: 829380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:58:03,230][06556] Avg episode reward: [(0, '22.942')] +[2024-09-13 07:58:03,252][08716] Updated weights for policy 0, policy_version 810 (0.0044) +[2024-09-13 07:58:08,229][06556] Fps is (10 sec: 4097.2, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3338240. Throughput: 0: 967.2. Samples: 833012. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:58:08,231][06556] Avg episode reward: [(0, '22.372')] +[2024-09-13 07:58:11,734][08716] Updated weights for policy 0, policy_version 820 (0.0014) +[2024-09-13 07:58:13,231][06556] Fps is (10 sec: 4914.2, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 3362816. Throughput: 0: 1040.4. Samples: 840084. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:58:13,236][06556] Avg episode reward: [(0, '22.221')] +[2024-09-13 07:58:18,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3375104. Throughput: 0: 994.8. Samples: 844448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 07:58:18,233][06556] Avg episode reward: [(0, '22.734')] +[2024-09-13 07:58:23,229][06556] Fps is (10 sec: 3277.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3395584. Throughput: 0: 980.1. Samples: 847468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:58:23,236][06556] Avg episode reward: [(0, '22.756')] +[2024-09-13 07:58:23,245][08716] Updated weights for policy 0, policy_version 830 (0.0025) +[2024-09-13 07:58:28,229][06556] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3860.0). Total num frames: 3420160. Throughput: 0: 1011.9. Samples: 854656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:58:28,236][06556] Avg episode reward: [(0, '23.802')] +[2024-09-13 07:58:33,229][06556] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 3436544. Throughput: 0: 1007.0. Samples: 859974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:58:33,231][06556] Avg episode reward: [(0, '24.945')] +[2024-09-13 07:58:33,243][08703] Saving new best policy, reward=24.945! +[2024-09-13 07:58:33,949][08716] Updated weights for policy 0, policy_version 840 (0.0037) +[2024-09-13 07:58:38,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3457024. Throughput: 0: 978.3. Samples: 862104. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:58:38,231][06556] Avg episode reward: [(0, '23.971')] +[2024-09-13 07:58:43,156][08716] Updated weights for policy 0, policy_version 850 (0.0040) +[2024-09-13 07:58:43,229][06556] Fps is (10 sec: 4505.7, 60 sec: 4096.4, 300 sec: 3873.9). Total num frames: 3481600. Throughput: 0: 990.9. Samples: 869396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:58:43,234][06556] Avg episode reward: [(0, '23.277')] +[2024-09-13 07:58:48,231][06556] Fps is (10 sec: 4095.1, 60 sec: 4027.6, 300 sec: 3859.9). Total num frames: 3497984. Throughput: 0: 1026.4. Samples: 875570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:58:48,236][06556] Avg episode reward: [(0, '23.599')] +[2024-09-13 07:58:53,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3514368. Throughput: 0: 992.4. Samples: 877672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:58:53,236][06556] Avg episode reward: [(0, '23.988')] +[2024-09-13 07:58:54,330][08716] Updated weights for policy 0, policy_version 860 (0.0030) +[2024-09-13 07:58:58,229][06556] Fps is (10 sec: 4097.0, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 3538944. Throughput: 0: 976.8. Samples: 884040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:58:58,231][06556] Avg episode reward: [(0, '24.404')] +[2024-09-13 07:59:02,827][08716] Updated weights for policy 0, policy_version 870 (0.0026) +[2024-09-13 07:59:03,236][06556] Fps is (10 sec: 4911.7, 60 sec: 4163.8, 300 sec: 3887.6). Total num frames: 3563520. Throughput: 0: 1042.1. Samples: 891350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 07:59:03,238][06556] Avg episode reward: [(0, '25.270')] +[2024-09-13 07:59:03,240][08703] Saving new best policy, reward=25.270! +[2024-09-13 07:59:08,231][06556] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 3575808. Throughput: 0: 1021.0. Samples: 893414. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:59:08,238][06556] Avg episode reward: [(0, '24.936')] +[2024-09-13 07:59:13,229][06556] Fps is (10 sec: 3279.0, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 3596288. Throughput: 0: 974.5. Samples: 898510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:59:13,231][06556] Avg episode reward: [(0, '24.969')] +[2024-09-13 07:59:14,337][08716] Updated weights for policy 0, policy_version 880 (0.0021) +[2024-09-13 07:59:18,229][06556] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 3616768. Throughput: 0: 998.0. Samples: 904884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:59:18,234][06556] Avg episode reward: [(0, '24.339')] +[2024-09-13 07:59:23,229][06556] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3629056. Throughput: 0: 992.8. Samples: 906782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:59:23,231][06556] Avg episode reward: [(0, '23.931')] +[2024-09-13 07:59:28,235][06556] Fps is (10 sec: 2456.1, 60 sec: 3686.0, 300 sec: 3846.0). Total num frames: 3641344. Throughput: 0: 910.7. Samples: 910382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:59:28,242][06556] Avg episode reward: [(0, '24.730')] +[2024-09-13 07:59:28,879][08716] Updated weights for policy 0, policy_version 890 (0.0019) +[2024-09-13 07:59:33,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3665920. Throughput: 0: 908.2. Samples: 916438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-13 07:59:33,237][06556] Avg episode reward: [(0, '25.515')] +[2024-09-13 07:59:33,240][08703] Saving new best policy, reward=25.515! +[2024-09-13 07:59:37,522][08716] Updated weights for policy 0, policy_version 900 (0.0036) +[2024-09-13 07:59:38,229][06556] Fps is (10 sec: 4508.2, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3686400. Throughput: 0: 937.7. Samples: 919870. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:59:38,236][06556] Avg episode reward: [(0, '25.519')] +[2024-09-13 07:59:38,247][08703] Saving new best policy, reward=25.519! +[2024-09-13 07:59:43,243][06556] Fps is (10 sec: 3681.2, 60 sec: 3685.5, 300 sec: 3845.9). Total num frames: 3702784. Throughput: 0: 921.0. Samples: 925496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:59:43,246][06556] Avg episode reward: [(0, '24.352')] +[2024-09-13 07:59:48,229][06556] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3846.1). Total num frames: 3719168. Throughput: 0: 874.7. Samples: 930704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-13 07:59:48,237][06556] Avg episode reward: [(0, '24.549')] +[2024-09-13 07:59:49,179][08716] Updated weights for policy 0, policy_version 910 (0.0043) +[2024-09-13 07:59:53,229][06556] Fps is (10 sec: 4101.9, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3743744. Throughput: 0: 904.3. Samples: 934104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-13 07:59:53,237][06556] Avg episode reward: [(0, '24.033')] +[2024-09-13 07:59:58,229][06556] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3764224. Throughput: 0: 937.4. Samples: 940692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 07:59:58,237][06556] Avg episode reward: [(0, '22.913')] +[2024-09-13 07:59:58,248][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000919_3764224.pth... +[2024-09-13 07:59:58,433][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000690_2826240.pth +[2024-09-13 07:59:59,628][08716] Updated weights for policy 0, policy_version 920 (0.0019) +[2024-09-13 08:00:03,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3550.3, 300 sec: 3860.0). Total num frames: 3776512. Throughput: 0: 890.0. Samples: 944934. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 08:00:03,236][06556] Avg episode reward: [(0, '22.472')] +[2024-09-13 08:00:08,229][06556] Fps is (10 sec: 3686.3, 60 sec: 3754.8, 300 sec: 3901.6). Total num frames: 3801088. Throughput: 0: 926.1. Samples: 948456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 08:00:08,237][06556] Avg episode reward: [(0, '22.402')] +[2024-09-13 08:00:09,541][08716] Updated weights for policy 0, policy_version 930 (0.0026) +[2024-09-13 08:00:13,229][06556] Fps is (10 sec: 4915.2, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 3825664. Throughput: 0: 1004.8. Samples: 955590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 08:00:13,233][06556] Avg episode reward: [(0, '22.276')] +[2024-09-13 08:00:18,229][06556] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 3837952. Throughput: 0: 972.3. Samples: 960190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 08:00:18,235][06556] Avg episode reward: [(0, '21.903')] +[2024-09-13 08:00:21,058][08716] Updated weights for policy 0, policy_version 940 (0.0034) +[2024-09-13 08:00:23,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.7). Total num frames: 3858432. Throughput: 0: 958.0. Samples: 962978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-13 08:00:23,231][06556] Avg episode reward: [(0, '21.812')] +[2024-09-13 08:00:28,229][06556] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3915.5). Total num frames: 3883008. Throughput: 0: 989.3. Samples: 970002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 08:00:28,231][06556] Avg episode reward: [(0, '21.481')] +[2024-09-13 08:00:29,956][08716] Updated weights for policy 0, policy_version 950 (0.0027) +[2024-09-13 08:00:33,229][06556] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3899392. Throughput: 0: 994.2. Samples: 975442. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 08:00:33,234][06556] Avg episode reward: [(0, '21.655')] +[2024-09-13 08:00:38,229][06556] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3915776. Throughput: 0: 965.6. Samples: 977556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 08:00:38,232][06556] Avg episode reward: [(0, '21.733')] +[2024-09-13 08:00:41,192][08716] Updated weights for policy 0, policy_version 960 (0.0028) +[2024-09-13 08:00:43,229][06556] Fps is (10 sec: 4096.0, 60 sec: 3960.4, 300 sec: 3915.5). Total num frames: 3940352. Throughput: 0: 970.0. Samples: 984344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 08:00:43,232][06556] Avg episode reward: [(0, '23.182')] +[2024-09-13 08:00:48,229][06556] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3960832. Throughput: 0: 1018.7. Samples: 990774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-13 08:00:48,232][06556] Avg episode reward: [(0, '22.205')] +[2024-09-13 08:00:52,256][08716] Updated weights for policy 0, policy_version 970 (0.0038) +[2024-09-13 08:00:53,229][06556] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3973120. Throughput: 0: 984.8. Samples: 992770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-13 08:00:53,231][06556] Avg episode reward: [(0, '23.980')] +[2024-09-13 08:00:58,229][06556] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3997696. Throughput: 0: 956.3. Samples: 998622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-13 08:00:58,236][06556] Avg episode reward: [(0, '24.793')] +[2024-09-13 08:00:59,584][06556] Component Batcher_0 stopped! +[2024-09-13 08:00:59,582][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-13 08:00:59,583][08703] Stopping Batcher_0... +[2024-09-13 08:00:59,599][08703] Loop batcher_evt_loop terminating... 
+[2024-09-13 08:00:59,658][08716] Weights refcount: 2 0 +[2024-09-13 08:00:59,664][06556] Component InferenceWorker_p0-w0 stopped! +[2024-09-13 08:00:59,670][08716] Stopping InferenceWorker_p0-w0... +[2024-09-13 08:00:59,671][08716] Loop inference_proc0-0_evt_loop terminating... +[2024-09-13 08:00:59,728][08703] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000806_3301376.pth +[2024-09-13 08:00:59,739][08703] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-13 08:00:59,885][08722] Stopping RolloutWorker_w5... +[2024-09-13 08:00:59,885][06556] Component RolloutWorker_w5 stopped! +[2024-09-13 08:00:59,886][08722] Loop rollout_proc5_evt_loop terminating... +[2024-09-13 08:00:59,912][08720] Stopping RolloutWorker_w3... +[2024-09-13 08:00:59,912][06556] Component RolloutWorker_w3 stopped! +[2024-09-13 08:00:59,945][08720] Loop rollout_proc3_evt_loop terminating... +[2024-09-13 08:00:59,951][08724] Stopping RolloutWorker_w7... +[2024-09-13 08:00:59,948][06556] Component LearnerWorker_p0 stopped! +[2024-09-13 08:00:59,955][08717] Stopping RolloutWorker_w1... +[2024-09-13 08:00:59,953][08724] Loop rollout_proc7_evt_loop terminating... +[2024-09-13 08:00:59,953][06556] Component RolloutWorker_w7 stopped! +[2024-09-13 08:00:59,956][08717] Loop rollout_proc1_evt_loop terminating... +[2024-09-13 08:00:59,958][06556] Component RolloutWorker_w1 stopped! +[2024-09-13 08:00:59,960][08703] Stopping LearnerWorker_p0... +[2024-09-13 08:00:59,962][08703] Loop learner_proc0_evt_loop terminating... +[2024-09-13 08:01:00,143][06556] Component RolloutWorker_w2 stopped! +[2024-09-13 08:01:00,153][06556] Component RolloutWorker_w0 stopped! +[2024-09-13 08:01:00,151][08719] Stopping RolloutWorker_w2... +[2024-09-13 08:01:00,158][08718] Stopping RolloutWorker_w0... +[2024-09-13 08:01:00,166][06556] Component RolloutWorker_w4 stopped! +[2024-09-13 08:01:00,175][08721] Stopping RolloutWorker_w4... 
+[2024-09-13 08:01:00,159][08719] Loop rollout_proc2_evt_loop terminating... +[2024-09-13 08:01:00,162][08718] Loop rollout_proc0_evt_loop terminating... +[2024-09-13 08:01:00,175][08721] Loop rollout_proc4_evt_loop terminating... +[2024-09-13 08:01:00,258][08723] Stopping RolloutWorker_w6... +[2024-09-13 08:01:00,254][06556] Component RolloutWorker_w6 stopped! +[2024-09-13 08:01:00,260][06556] Waiting for process learner_proc0 to stop... +[2024-09-13 08:01:00,265][08723] Loop rollout_proc6_evt_loop terminating... +[2024-09-13 08:01:01,586][06556] Waiting for process inference_proc0-0 to join... +[2024-09-13 08:01:01,591][06556] Waiting for process rollout_proc0 to join... +[2024-09-13 08:01:03,660][06556] Waiting for process rollout_proc1 to join... +[2024-09-13 08:01:03,760][06556] Waiting for process rollout_proc2 to join... +[2024-09-13 08:01:03,767][06556] Waiting for process rollout_proc3 to join... +[2024-09-13 08:01:03,772][06556] Waiting for process rollout_proc4 to join... +[2024-09-13 08:01:03,776][06556] Waiting for process rollout_proc5 to join... +[2024-09-13 08:01:03,780][06556] Waiting for process rollout_proc6 to join... +[2024-09-13 08:01:03,785][06556] Waiting for process rollout_proc7 to join... 
+[2024-09-13 08:01:03,788][06556] Batcher 0 profile tree view: +batching: 27.6738, releasing_batches: 0.0273 +[2024-09-13 08:01:03,793][06556] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 439.0637 +update_model: 9.3492 + weight_update: 0.0037 +one_step: 0.0110 + handle_policy_step: 618.2118 + deserialize: 15.6031, stack: 3.0755, obs_to_device_normalize: 123.7936, forward: 329.4834, send_messages: 30.6046 + prepare_outputs: 85.8051 + to_cpu: 49.6883 +[2024-09-13 08:01:03,795][06556] Learner 0 profile tree view: +misc: 0.0049, prepare_batch: 14.5482 +train: 75.9954 + epoch_init: 0.0059, minibatch_init: 0.0076, losses_postprocess: 0.6887, kl_divergence: 0.5920, after_optimizer: 33.6075 + calculate_losses: 27.5135 + losses_init: 0.0037, forward_head: 1.3443, bptt_initial: 18.3959, tail: 1.2359, advantages_returns: 0.2355, losses: 3.8971 + bptt: 2.0301 + bptt_forward_core: 1.9392 + update: 12.7821 + clip: 0.9648 +[2024-09-13 08:01:03,796][06556] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.3427, enqueue_policy_requests: 108.2061, env_step: 859.9153, overhead: 14.2251, complete_rollouts: 7.3306 +save_policy_outputs: 22.1663 + split_output_tensors: 8.9209 +[2024-09-13 08:01:03,797][06556] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3571, enqueue_policy_requests: 111.1229, env_step: 856.0089, overhead: 14.1207, complete_rollouts: 6.6716 +save_policy_outputs: 21.2568 + split_output_tensors: 8.4563 +[2024-09-13 08:01:03,802][06556] Loop Runner_EvtLoop terminating... 
+[2024-09-13 08:01:03,803][06556] Runner profile tree view: +main_loop: 1136.0680 +[2024-09-13 08:01:03,805][06556] Collected {0: 4005888}, FPS: 3526.1 +[2024-09-13 08:01:04,228][06556] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-09-13 08:01:04,230][06556] Overriding arg 'num_workers' with value 1 passed from command line +[2024-09-13 08:01:04,232][06556] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-09-13 08:01:04,235][06556] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-09-13 08:01:04,237][06556] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-09-13 08:01:04,240][06556] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-09-13 08:01:04,242][06556] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-09-13 08:01:04,243][06556] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-09-13 08:01:04,245][06556] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-09-13 08:01:04,246][06556] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-09-13 08:01:04,247][06556] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-09-13 08:01:04,248][06556] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-09-13 08:01:04,249][06556] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-09-13 08:01:04,251][06556] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-09-13 08:01:04,252][06556] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-09-13 08:01:04,299][06556] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-13 08:01:04,304][06556] RunningMeanStd input shape: (3, 72, 128) +[2024-09-13 08:01:04,306][06556] RunningMeanStd input shape: (1,) +[2024-09-13 08:01:04,327][06556] ConvEncoder: input_channels=3 +[2024-09-13 08:01:04,486][06556] Conv encoder output size: 512 +[2024-09-13 08:01:04,489][06556] Policy head output size: 512 +[2024-09-13 08:01:04,837][06556] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-13 08:01:06,017][06556] Num frames 100... +[2024-09-13 08:01:06,140][06556] Num frames 200... +[2024-09-13 08:01:06,265][06556] Num frames 300... +[2024-09-13 08:01:06,385][06556] Num frames 400... +[2024-09-13 08:01:06,506][06556] Num frames 500... +[2024-09-13 08:01:06,632][06556] Num frames 600... +[2024-09-13 08:01:06,755][06556] Num frames 700... +[2024-09-13 08:01:06,848][06556] Avg episode rewards: #0: 15.310, true rewards: #0: 7.310 +[2024-09-13 08:01:06,851][06556] Avg episode reward: 15.310, avg true_objective: 7.310 +[2024-09-13 08:01:06,939][06556] Num frames 800... +[2024-09-13 08:01:07,068][06556] Num frames 900... +[2024-09-13 08:01:07,192][06556] Num frames 1000... +[2024-09-13 08:01:07,311][06556] Num frames 1100... +[2024-09-13 08:01:07,433][06556] Num frames 1200... +[2024-09-13 08:01:07,559][06556] Num frames 1300... +[2024-09-13 08:01:07,677][06556] Num frames 1400... +[2024-09-13 08:01:07,800][06556] Num frames 1500... +[2024-09-13 08:01:07,930][06556] Num frames 1600... +[2024-09-13 08:01:08,020][06556] Avg episode rewards: #0: 16.135, true rewards: #0: 8.135 +[2024-09-13 08:01:08,022][06556] Avg episode reward: 16.135, avg true_objective: 8.135 +[2024-09-13 08:01:08,124][06556] Num frames 1700... +[2024-09-13 08:01:08,247][06556] Num frames 1800... 
+[2024-09-13 08:01:08,373][06556] Num frames 1900... +[2024-09-13 08:01:08,524][06556] Avg episode rewards: #0: 12.263, true rewards: #0: 6.597 +[2024-09-13 08:01:08,526][06556] Avg episode reward: 12.263, avg true_objective: 6.597 +[2024-09-13 08:01:08,556][06556] Num frames 2000... +[2024-09-13 08:01:08,684][06556] Num frames 2100... +[2024-09-13 08:01:08,805][06556] Num frames 2200... +[2024-09-13 08:01:08,939][06556] Num frames 2300... +[2024-09-13 08:01:09,067][06556] Num frames 2400... +[2024-09-13 08:01:09,188][06556] Num frames 2500... +[2024-09-13 08:01:09,313][06556] Num frames 2600... +[2024-09-13 08:01:09,432][06556] Num frames 2700... +[2024-09-13 08:01:09,556][06556] Num frames 2800... +[2024-09-13 08:01:09,686][06556] Num frames 2900... +[2024-09-13 08:01:09,808][06556] Num frames 3000... +[2024-09-13 08:01:09,939][06556] Num frames 3100... +[2024-09-13 08:01:10,066][06556] Num frames 3200... +[2024-09-13 08:01:10,189][06556] Num frames 3300... +[2024-09-13 08:01:10,313][06556] Num frames 3400... +[2024-09-13 08:01:10,442][06556] Num frames 3500... +[2024-09-13 08:01:10,566][06556] Num frames 3600... +[2024-09-13 08:01:10,739][06556] Avg episode rewards: #0: 20.495, true rewards: #0: 9.245 +[2024-09-13 08:01:10,740][06556] Avg episode reward: 20.495, avg true_objective: 9.245 +[2024-09-13 08:01:10,746][06556] Num frames 3700... +[2024-09-13 08:01:10,864][06556] Num frames 3800... +[2024-09-13 08:01:10,996][06556] Num frames 3900... +[2024-09-13 08:01:11,124][06556] Num frames 4000... +[2024-09-13 08:01:11,256][06556] Num frames 4100... +[2024-09-13 08:01:11,379][06556] Num frames 4200... +[2024-09-13 08:01:11,498][06556] Num frames 4300... +[2024-09-13 08:01:11,619][06556] Num frames 4400... +[2024-09-13 08:01:11,739][06556] Num frames 4500... 
+[2024-09-13 08:01:11,836][06556] Avg episode rewards: #0: 19.468, true rewards: #0: 9.068 +[2024-09-13 08:01:11,838][06556] Avg episode reward: 19.468, avg true_objective: 9.068 +[2024-09-13 08:01:11,918][06556] Num frames 4600... +[2024-09-13 08:01:12,049][06556] Num frames 4700... +[2024-09-13 08:01:12,175][06556] Num frames 4800... +[2024-09-13 08:01:12,295][06556] Num frames 4900... +[2024-09-13 08:01:12,417][06556] Num frames 5000... +[2024-09-13 08:01:12,539][06556] Num frames 5100... +[2024-09-13 08:01:12,664][06556] Num frames 5200... +[2024-09-13 08:01:12,803][06556] Avg episode rewards: #0: 18.950, true rewards: #0: 8.783 +[2024-09-13 08:01:12,804][06556] Avg episode reward: 18.950, avg true_objective: 8.783 +[2024-09-13 08:01:12,843][06556] Num frames 5300... +[2024-09-13 08:01:12,965][06556] Num frames 5400... +[2024-09-13 08:01:13,100][06556] Num frames 5500... +[2024-09-13 08:01:13,219][06556] Num frames 5600... +[2024-09-13 08:01:13,342][06556] Num frames 5700... +[2024-09-13 08:01:13,463][06556] Num frames 5800... +[2024-09-13 08:01:13,586][06556] Num frames 5900... +[2024-09-13 08:01:13,720][06556] Avg episode rewards: #0: 17.810, true rewards: #0: 8.524 +[2024-09-13 08:01:13,722][06556] Avg episode reward: 17.810, avg true_objective: 8.524 +[2024-09-13 08:01:13,764][06556] Num frames 6000... +[2024-09-13 08:01:13,886][06556] Num frames 6100... +[2024-09-13 08:01:14,011][06556] Num frames 6200... +[2024-09-13 08:01:14,151][06556] Num frames 6300... +[2024-09-13 08:01:14,271][06556] Num frames 6400... +[2024-09-13 08:01:14,398][06556] Num frames 6500... +[2024-09-13 08:01:14,519][06556] Num frames 6600... +[2024-09-13 08:01:14,640][06556] Num frames 6700... +[2024-09-13 08:01:14,761][06556] Num frames 6800... +[2024-09-13 08:01:14,882][06556] Num frames 6900... +[2024-09-13 08:01:15,005][06556] Num frames 7000... +[2024-09-13 08:01:15,142][06556] Num frames 7100... +[2024-09-13 08:01:15,263][06556] Num frames 7200... 
+[2024-09-13 08:01:15,386][06556] Num frames 7300... +[2024-09-13 08:01:15,506][06556] Num frames 7400... +[2024-09-13 08:01:15,572][06556] Avg episode rewards: #0: 19.884, true rewards: #0: 9.259 +[2024-09-13 08:01:15,573][06556] Avg episode reward: 19.884, avg true_objective: 9.259 +[2024-09-13 08:01:15,689][06556] Num frames 7500... +[2024-09-13 08:01:15,810][06556] Num frames 7600... +[2024-09-13 08:01:15,982][06556] Num frames 7700... +[2024-09-13 08:01:16,163][06556] Num frames 7800... +[2024-09-13 08:01:16,330][06556] Num frames 7900... +[2024-09-13 08:01:16,500][06556] Num frames 8000... +[2024-09-13 08:01:16,666][06556] Num frames 8100... +[2024-09-13 08:01:16,833][06556] Num frames 8200... +[2024-09-13 08:01:16,996][06556] Num frames 8300... +[2024-09-13 08:01:17,200][06556] Num frames 8400... +[2024-09-13 08:01:17,381][06556] Num frames 8500... +[2024-09-13 08:01:17,555][06556] Num frames 8600... +[2024-09-13 08:01:17,729][06556] Num frames 8700... +[2024-09-13 08:01:17,907][06556] Num frames 8800... +[2024-09-13 08:01:18,091][06556] Num frames 8900... +[2024-09-13 08:01:18,270][06556] Num frames 9000... +[2024-09-13 08:01:18,415][06556] Num frames 9100... +[2024-09-13 08:01:18,537][06556] Num frames 9200... +[2024-09-13 08:01:18,664][06556] Num frames 9300... +[2024-09-13 08:01:18,790][06556] Avg episode rewards: #0: 23.399, true rewards: #0: 10.399 +[2024-09-13 08:01:18,791][06556] Avg episode reward: 23.399, avg true_objective: 10.399 +[2024-09-13 08:01:18,845][06556] Num frames 9400... +[2024-09-13 08:01:18,968][06556] Num frames 9500... +[2024-09-13 08:01:19,097][06556] Num frames 9600... +[2024-09-13 08:01:19,220][06556] Num frames 9700... +[2024-09-13 08:01:19,352][06556] Num frames 9800... +[2024-09-13 08:01:19,485][06556] Num frames 9900... +[2024-09-13 08:01:19,607][06556] Num frames 10000... +[2024-09-13 08:01:19,725][06556] Num frames 10100... +[2024-09-13 08:01:19,849][06556] Num frames 10200... 
+[2024-09-13 08:01:19,969][06556] Num frames 10300... +[2024-09-13 08:01:20,098][06556] Num frames 10400... +[2024-09-13 08:01:20,222][06556] Num frames 10500... +[2024-09-13 08:01:20,353][06556] Num frames 10600... +[2024-09-13 08:01:20,483][06556] Num frames 10700... +[2024-09-13 08:01:20,621][06556] Num frames 10800... +[2024-09-13 08:01:20,749][06556] Num frames 10900... +[2024-09-13 08:01:20,873][06556] Num frames 11000... +[2024-09-13 08:01:20,997][06556] Num frames 11100... +[2024-09-13 08:01:21,128][06556] Num frames 11200... +[2024-09-13 08:01:21,250][06556] Num frames 11300... +[2024-09-13 08:01:21,384][06556] Num frames 11400... +[2024-09-13 08:01:21,539][06556] Avg episode rewards: #0: 27.159, true rewards: #0: 11.459 +[2024-09-13 08:01:21,541][06556] Avg episode reward: 27.159, avg true_objective: 11.459 +[2024-09-13 08:02:27,662][06556] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2024-09-13 08:03:24,278][06556] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-09-13 08:03:24,280][06556] Overriding arg 'num_workers' with value 1 passed from command line +[2024-09-13 08:03:24,282][06556] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-09-13 08:03:24,283][06556] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-09-13 08:03:24,285][06556] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-09-13 08:03:24,286][06556] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-09-13 08:03:24,288][06556] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-09-13 08:03:24,289][06556] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-09-13 08:03:24,292][06556] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
+[2024-09-13 08:03:24,293][06556] Adding new argument 'hf_repository'='D3MI4N/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-09-13 08:03:24,295][06556] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-09-13 08:03:24,297][06556] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-09-13 08:03:24,298][06556] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-09-13 08:03:24,300][06556] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-09-13 08:03:24,301][06556] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-09-13 08:03:24,329][06556] RunningMeanStd input shape: (3, 72, 128) +[2024-09-13 08:03:24,332][06556] RunningMeanStd input shape: (1,) +[2024-09-13 08:03:24,344][06556] ConvEncoder: input_channels=3 +[2024-09-13 08:03:24,380][06556] Conv encoder output size: 512 +[2024-09-13 08:03:24,383][06556] Policy head output size: 512 +[2024-09-13 08:03:24,401][06556] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-13 08:03:24,836][06556] Num frames 100... +[2024-09-13 08:03:24,959][06556] Num frames 200... +[2024-09-13 08:03:25,093][06556] Num frames 300... +[2024-09-13 08:03:25,224][06556] Num frames 400... +[2024-09-13 08:03:25,347][06556] Num frames 500... +[2024-09-13 08:03:25,467][06556] Num frames 600... +[2024-09-13 08:03:25,591][06556] Num frames 700... +[2024-09-13 08:03:25,721][06556] Num frames 800... +[2024-09-13 08:03:25,847][06556] Num frames 900... +[2024-09-13 08:03:25,982][06556] Num frames 1000... +[2024-09-13 08:03:26,165][06556] Num frames 1100... +[2024-09-13 08:03:26,336][06556] Num frames 1200... +[2024-09-13 08:03:26,498][06556] Num frames 1300... +[2024-09-13 08:03:26,665][06556] Num frames 1400... +[2024-09-13 08:03:26,847][06556] Num frames 1500... 
+[2024-09-13 08:03:27,015][06556] Num frames 1600... +[2024-09-13 08:03:27,176][06556] Num frames 1700... +[2024-09-13 08:03:27,272][06556] Avg episode rewards: #0: 40.230, true rewards: #0: 17.230 +[2024-09-13 08:03:27,274][06556] Avg episode reward: 40.230, avg true_objective: 17.230 +[2024-09-13 08:03:27,419][06556] Num frames 1800... +[2024-09-13 08:03:27,587][06556] Num frames 1900... +[2024-09-13 08:03:27,764][06556] Num frames 2000... +[2024-09-13 08:03:27,952][06556] Num frames 2100... +[2024-09-13 08:03:28,134][06556] Avg episode rewards: #0: 24.335, true rewards: #0: 10.835 +[2024-09-13 08:03:28,137][06556] Avg episode reward: 24.335, avg true_objective: 10.835 +[2024-09-13 08:03:28,202][06556] Num frames 2200... +[2024-09-13 08:03:28,379][06556] Num frames 2300... +[2024-09-13 08:03:28,509][06556] Num frames 2400... +[2024-09-13 08:03:28,640][06556] Num frames 2500... +[2024-09-13 08:03:28,759][06556] Num frames 2600... +[2024-09-13 08:03:28,891][06556] Num frames 2700... +[2024-09-13 08:03:29,013][06556] Num frames 2800... +[2024-09-13 08:03:29,145][06556] Num frames 2900... +[2024-09-13 08:03:29,263][06556] Num frames 3000... +[2024-09-13 08:03:29,388][06556] Num frames 3100... +[2024-09-13 08:03:29,513][06556] Num frames 3200... +[2024-09-13 08:03:29,638][06556] Avg episode rewards: #0: 24.183, true rewards: #0: 10.850 +[2024-09-13 08:03:29,639][06556] Avg episode reward: 24.183, avg true_objective: 10.850 +[2024-09-13 08:03:29,698][06556] Num frames 3300... +[2024-09-13 08:03:29,819][06556] Num frames 3400... +[2024-09-13 08:03:29,953][06556] Num frames 3500... +[2024-09-13 08:03:30,083][06556] Num frames 3600... +[2024-09-13 08:03:30,208][06556] Num frames 3700... +[2024-09-13 08:03:30,332][06556] Num frames 3800... +[2024-09-13 08:03:30,492][06556] Num frames 3900... +[2024-09-13 08:03:30,662][06556] Num frames 4000... +[2024-09-13 08:03:30,829][06556] Num frames 4100... +[2024-09-13 08:03:30,998][06556] Num frames 4200... 
+[2024-09-13 08:03:31,169][06556] Num frames 4300... +[2024-09-13 08:03:31,352][06556] Num frames 4400... +[2024-09-13 08:03:31,521][06556] Avg episode rewards: #0: 24.928, true rewards: #0: 11.177 +[2024-09-13 08:03:31,523][06556] Avg episode reward: 24.928, avg true_objective: 11.177 +[2024-09-13 08:03:31,574][06556] Num frames 4500... +[2024-09-13 08:03:31,749][06556] Num frames 4600... +[2024-09-13 08:03:31,923][06556] Num frames 4700... +[2024-09-13 08:03:32,104][06556] Num frames 4800... +[2024-09-13 08:03:32,276][06556] Num frames 4900... +[2024-09-13 08:03:32,458][06556] Num frames 5000... +[2024-09-13 08:03:32,649][06556] Avg episode rewards: #0: 21.758, true rewards: #0: 10.158 +[2024-09-13 08:03:32,652][06556] Avg episode reward: 21.758, avg true_objective: 10.158 +[2024-09-13 08:03:32,699][06556] Num frames 5100... +[2024-09-13 08:03:32,861][06556] Num frames 5200... +[2024-09-13 08:03:32,990][06556] Num frames 5300... +[2024-09-13 08:03:33,116][06556] Num frames 5400... +[2024-09-13 08:03:33,237][06556] Num frames 5500... +[2024-09-13 08:03:33,366][06556] Num frames 5600... +[2024-09-13 08:03:33,488][06556] Num frames 5700... +[2024-09-13 08:03:33,608][06556] Num frames 5800... +[2024-09-13 08:03:33,729][06556] Num frames 5900... +[2024-09-13 08:03:33,855][06556] Num frames 6000... +[2024-09-13 08:03:33,975][06556] Num frames 6100... +[2024-09-13 08:03:34,109][06556] Num frames 6200... +[2024-09-13 08:03:34,231][06556] Num frames 6300... +[2024-09-13 08:03:34,355][06556] Num frames 6400... +[2024-09-13 08:03:34,475][06556] Num frames 6500... +[2024-09-13 08:03:34,603][06556] Num frames 6600... +[2024-09-13 08:03:34,723][06556] Num frames 6700... +[2024-09-13 08:03:34,848][06556] Num frames 6800... +[2024-09-13 08:03:34,989][06556] Avg episode rewards: #0: 25.285, true rewards: #0: 11.452 +[2024-09-13 08:03:34,992][06556] Avg episode reward: 25.285, avg true_objective: 11.452 +[2024-09-13 08:03:35,038][06556] Num frames 6900... 
+[2024-09-13 08:03:35,168][06556] Num frames 7000... +[2024-09-13 08:03:35,287][06556] Num frames 7100... +[2024-09-13 08:03:35,410][06556] Num frames 7200... +[2024-09-13 08:03:35,533][06556] Num frames 7300... +[2024-09-13 08:03:35,660][06556] Num frames 7400... +[2024-09-13 08:03:35,780][06556] Num frames 7500... +[2024-09-13 08:03:35,905][06556] Num frames 7600... +[2024-09-13 08:03:36,028][06556] Num frames 7700... +[2024-09-13 08:03:36,163][06556] Num frames 7800... +[2024-09-13 08:03:36,288][06556] Num frames 7900... +[2024-09-13 08:03:36,414][06556] Num frames 8000... +[2024-09-13 08:03:36,540][06556] Num frames 8100... +[2024-09-13 08:03:36,600][06556] Avg episode rewards: #0: 25.574, true rewards: #0: 11.574 +[2024-09-13 08:03:36,602][06556] Avg episode reward: 25.574, avg true_objective: 11.574 +[2024-09-13 08:03:36,721][06556] Num frames 8200... +[2024-09-13 08:03:36,855][06556] Num frames 8300... +[2024-09-13 08:03:36,977][06556] Num frames 8400... +[2024-09-13 08:03:37,117][06556] Num frames 8500... +[2024-09-13 08:03:37,237][06556] Num frames 8600... +[2024-09-13 08:03:37,360][06556] Num frames 8700... +[2024-09-13 08:03:37,480][06556] Num frames 8800... +[2024-09-13 08:03:37,605][06556] Num frames 8900... +[2024-09-13 08:03:37,727][06556] Num frames 9000... +[2024-09-13 08:03:37,848][06556] Num frames 9100... +[2024-09-13 08:03:37,970][06556] Num frames 9200... +[2024-09-13 08:03:38,099][06556] Num frames 9300... +[2024-09-13 08:03:38,282][06556] Avg episode rewards: #0: 26.746, true rewards: #0: 11.746 +[2024-09-13 08:03:38,284][06556] Avg episode reward: 26.746, avg true_objective: 11.746 +[2024-09-13 08:03:38,290][06556] Num frames 9400... +[2024-09-13 08:03:38,413][06556] Num frames 9500... +[2024-09-13 08:03:38,538][06556] Num frames 9600... +[2024-09-13 08:03:38,661][06556] Num frames 9700... +[2024-09-13 08:03:38,783][06556] Num frames 9800... 
+[2024-09-13 08:03:38,897][06556] Avg episode rewards: #0: 24.383, true rewards: #0: 10.939 +[2024-09-13 08:03:38,898][06556] Avg episode reward: 24.383, avg true_objective: 10.939 +[2024-09-13 08:03:38,964][06556] Num frames 9900... +[2024-09-13 08:03:39,093][06556] Num frames 10000... +[2024-09-13 08:03:39,222][06556] Num frames 10100... +[2024-09-13 08:03:39,343][06556] Num frames 10200... +[2024-09-13 08:03:39,469][06556] Num frames 10300... +[2024-09-13 08:03:39,592][06556] Num frames 10400... +[2024-09-13 08:03:39,716][06556] Num frames 10500... +[2024-09-13 08:03:39,838][06556] Num frames 10600... +[2024-09-13 08:03:39,961][06556] Num frames 10700... +[2024-09-13 08:03:40,087][06556] Num frames 10800... +[2024-09-13 08:03:40,215][06556] Num frames 10900... +[2024-09-13 08:03:40,349][06556] Num frames 11000... +[2024-09-13 08:03:40,470][06556] Num frames 11100... +[2024-09-13 08:03:40,600][06556] Avg episode rewards: #0: 24.957, true rewards: #0: 11.157 +[2024-09-13 08:03:40,602][06556] Avg episode reward: 24.957, avg true_objective: 11.157 +[2024-09-13 08:04:42,283][06556] Replay video saved to /content/train_dir/default_experiment/replay.mp4!