[2024-09-30 13:50:20,068][02144] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-30 13:50:20,072][02144] Rollout worker 0 uses device cpu
[2024-09-30 13:50:20,073][02144] Rollout worker 1 uses device cpu
[2024-09-30 13:50:20,074][02144] Rollout worker 2 uses device cpu
[2024-09-30 13:50:20,078][02144] Rollout worker 3 uses device cpu
[2024-09-30 13:50:20,079][02144] Rollout worker 4 uses device cpu
[2024-09-30 13:50:20,080][02144] Rollout worker 5 uses device cpu
[2024-09-30 13:50:20,081][02144] Rollout worker 6 uses device cpu
[2024-09-30 13:50:20,082][02144] Rollout worker 7 uses device cpu
[2024-09-30 13:50:20,254][02144] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 13:50:20,256][02144] InferenceWorker_p0-w0: min num requests: 2
[2024-09-30 13:50:20,290][02144] Starting all processes...
[2024-09-30 13:50:20,291][02144] Starting process learner_proc0
[2024-09-30 13:50:20,993][02144] Starting all processes...
[2024-09-30 13:50:21,003][02144] Starting process inference_proc0-0
[2024-09-30 13:50:21,004][02144] Starting process rollout_proc0
[2024-09-30 13:50:21,005][02144] Starting process rollout_proc1
[2024-09-30 13:50:21,005][02144] Starting process rollout_proc2
[2024-09-30 13:50:21,005][02144] Starting process rollout_proc3
[2024-09-30 13:50:21,005][02144] Starting process rollout_proc4
[2024-09-30 13:50:21,005][02144] Starting process rollout_proc5
[2024-09-30 13:50:21,005][02144] Starting process rollout_proc6
[2024-09-30 13:50:21,006][02144] Starting process rollout_proc7
[2024-09-30 13:50:36,971][05217] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 13:50:36,978][05217] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-30 13:50:37,083][05230] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 13:50:37,086][05230] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-30 13:50:37,087][05217] Num visible devices: 1
[2024-09-30 13:50:37,144][05217] Starting seed is not provided
[2024-09-30 13:50:37,145][05217] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 13:50:37,145][05217] Initializing actor-critic model on device cuda:0
[2024-09-30 13:50:37,145][05217] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 13:50:37,157][05217] RunningMeanStd input shape: (1,)
[2024-09-30 13:50:37,189][05237] Worker 6 uses CPU cores [0]
[2024-09-30 13:50:37,253][05230] Num visible devices: 1
[2024-09-30 13:50:37,305][05232] Worker 2 uses CPU cores [0]
[2024-09-30 13:50:37,310][05217] ConvEncoder: input_channels=3
[2024-09-30 13:50:37,314][05233] Worker 1 uses CPU cores [1]
[2024-09-30 13:50:37,405][05231] Worker 0 uses CPU cores [0]
[2024-09-30 13:50:37,531][05234] Worker 3 uses CPU cores [1]
[2024-09-30 13:50:37,641][05238] Worker 7 uses CPU cores [1]
[2024-09-30 13:50:37,650][05235] Worker 4 uses CPU cores [0]
[2024-09-30 13:50:37,675][05236] Worker 5 uses CPU cores [1]
[2024-09-30 13:50:37,792][05217] Conv encoder output size: 512
[2024-09-30 13:50:37,792][05217] Policy head output size: 512
[2024-09-30 13:50:37,866][05217] Created Actor Critic model with architecture:
[2024-09-30 13:50:37,866][05217] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-30 13:50:38,349][05217] Using optimizer
[2024-09-30 13:50:39,324][05217] No checkpoints found
[2024-09-30 13:50:39,325][05217] Did not load from checkpoint, starting from scratch!
[2024-09-30 13:50:39,325][05217] Initialized policy 0 weights for model version 0
[2024-09-30 13:50:39,335][05217] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-30 13:50:39,343][05217] LearnerWorker_p0 finished initialization!
[2024-09-30 13:50:39,610][05230] RunningMeanStd input shape: (3, 72, 128)
[2024-09-30 13:50:39,612][05230] RunningMeanStd input shape: (1,)
[2024-09-30 13:50:39,631][05230] ConvEncoder: input_channels=3
[2024-09-30 13:50:39,798][05230] Conv encoder output size: 512
[2024-09-30 13:50:39,800][05230] Policy head output size: 512
[2024-09-30 13:50:39,882][02144] Inference worker 0-0 is ready!
[2024-09-30 13:50:39,888][02144] All inference workers are ready! Signal rollout workers to start!
[2024-09-30 13:50:40,006][02144] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-30 13:50:40,114][05232] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,113][05231] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,112][05237] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,110][05235] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,250][02144] Heartbeat connected on Batcher_0
[2024-09-30 13:50:40,254][02144] Heartbeat connected on LearnerWorker_p0
[2024-09-30 13:50:40,266][05238] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,268][05233] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,267][05236] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,271][05234] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-30 13:50:40,293][02144] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-30 13:50:41,285][05236] Decorrelating experience for 0 frames...
[2024-09-30 13:50:41,283][05233] Decorrelating experience for 0 frames...
[2024-09-30 13:50:41,508][05231] Decorrelating experience for 0 frames...
[2024-09-30 13:50:41,510][05235] Decorrelating experience for 0 frames...
[2024-09-30 13:50:41,513][05237] Decorrelating experience for 0 frames...
[2024-09-30 13:50:41,769][05234] Decorrelating experience for 0 frames...
[2024-09-30 13:50:42,524][05238] Decorrelating experience for 0 frames...
[2024-09-30 13:50:42,653][05231] Decorrelating experience for 32 frames...
[2024-09-30 13:50:42,657][05235] Decorrelating experience for 32 frames...
[2024-09-30 13:50:42,659][05237] Decorrelating experience for 32 frames...
[2024-09-30 13:50:42,907][05234] Decorrelating experience for 32 frames...
[2024-09-30 13:50:42,962][05233] Decorrelating experience for 32 frames...
[2024-09-30 13:50:43,581][05238] Decorrelating experience for 32 frames...
[2024-09-30 13:50:44,069][05234] Decorrelating experience for 64 frames...
[2024-09-30 13:50:44,220][05232] Decorrelating experience for 0 frames...
[2024-09-30 13:50:44,567][05238] Decorrelating experience for 64 frames...
[2024-09-30 13:50:44,598][05237] Decorrelating experience for 64 frames...
[2024-09-30 13:50:44,600][05231] Decorrelating experience for 64 frames...
[2024-09-30 13:50:44,602][05235] Decorrelating experience for 64 frames...
[2024-09-30 13:50:45,006][02144] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-30 13:50:45,341][05232] Decorrelating experience for 32 frames...
[2024-09-30 13:50:45,410][05236] Decorrelating experience for 32 frames...
[2024-09-30 13:50:45,675][05231] Decorrelating experience for 96 frames...
[2024-09-30 13:50:45,850][02144] Heartbeat connected on RolloutWorker_w0
[2024-09-30 13:50:46,115][05234] Decorrelating experience for 96 frames...
[2024-09-30 13:50:46,137][05233] Decorrelating experience for 64 frames...
[2024-09-30 13:50:46,196][05238] Decorrelating experience for 96 frames...
[2024-09-30 13:50:46,425][02144] Heartbeat connected on RolloutWorker_w3
[2024-09-30 13:50:46,483][02144] Heartbeat connected on RolloutWorker_w7
[2024-09-30 13:50:46,778][05237] Decorrelating experience for 96 frames...
[2024-09-30 13:50:46,928][05236] Decorrelating experience for 64 frames...
[2024-09-30 13:50:46,958][02144] Heartbeat connected on RolloutWorker_w6
[2024-09-30 13:50:47,060][05232] Decorrelating experience for 64 frames...
[2024-09-30 13:50:47,339][05235] Decorrelating experience for 96 frames...
[2024-09-30 13:50:47,467][02144] Heartbeat connected on RolloutWorker_w4
[2024-09-30 13:50:47,727][05232] Decorrelating experience for 96 frames...
[2024-09-30 13:50:47,816][02144] Heartbeat connected on RolloutWorker_w2
[2024-09-30 13:50:47,989][05233] Decorrelating experience for 96 frames...
[2024-09-30 13:50:48,133][02144] Heartbeat connected on RolloutWorker_w1 [2024-09-30 13:50:48,280][05236] Decorrelating experience for 96 frames... [2024-09-30 13:50:48,511][02144] Heartbeat connected on RolloutWorker_w5 [2024-09-30 13:50:50,006][02144] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 68.0. Samples: 680. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-30 13:50:50,008][02144] Avg episode reward: [(0, '1.060')] [2024-09-30 13:50:51,702][05217] Signal inference workers to stop experience collection... [2024-09-30 13:50:51,742][05230] InferenceWorker_p0-w0: stopping experience collection [2024-09-30 13:50:55,006][02144] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 175.1. Samples: 2626. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-30 13:50:55,007][02144] Avg episode reward: [(0, '2.331')] [2024-09-30 13:50:55,624][05217] Signal inference workers to resume experience collection... [2024-09-30 13:50:55,625][05230] InferenceWorker_p0-w0: resuming experience collection [2024-09-30 13:51:00,006][02144] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 201.6. Samples: 4032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-30 13:51:00,009][02144] Avg episode reward: [(0, '3.628')] [2024-09-30 13:51:03,954][05230] Updated weights for policy 0, policy_version 10 (0.0020) [2024-09-30 13:51:05,006][02144] Fps is (10 sec: 4505.6, 60 sec: 1802.3, 300 sec: 1802.3). Total num frames: 45056. Throughput: 0: 435.4. Samples: 10886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:51:05,010][02144] Avg episode reward: [(0, '4.219')] [2024-09-30 13:51:10,006][02144] Fps is (10 sec: 3686.4, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 521.5. Samples: 15644. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:51:10,008][02144] Avg episode reward: [(0, '4.310')] [2024-09-30 13:51:15,006][02144] Fps is (10 sec: 3276.8, 60 sec: 2223.6, 300 sec: 2223.6). Total num frames: 77824. Throughput: 0: 514.2. Samples: 17996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:51:15,008][02144] Avg episode reward: [(0, '4.322')] [2024-09-30 13:51:15,561][05230] Updated weights for policy 0, policy_version 20 (0.0043) [2024-09-30 13:51:20,006][02144] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 621.1. Samples: 24842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:51:20,015][02144] Avg episode reward: [(0, '4.350')] [2024-09-30 13:51:25,008][02144] Fps is (10 sec: 4095.2, 60 sec: 2639.5, 300 sec: 2639.5). Total num frames: 118784. Throughput: 0: 674.7. Samples: 30364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:51:25,013][02144] Avg episode reward: [(0, '4.401')] [2024-09-30 13:51:25,026][05217] Saving new best policy, reward=4.401! [2024-09-30 13:51:26,327][05230] Updated weights for policy 0, policy_version 30 (0.0014) [2024-09-30 13:51:30,006][02144] Fps is (10 sec: 3276.8, 60 sec: 2621.5, 300 sec: 2621.5). Total num frames: 131072. Throughput: 0: 719.2. Samples: 32362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:51:30,008][02144] Avg episode reward: [(0, '4.407')] [2024-09-30 13:51:30,012][05217] Saving new best policy, reward=4.407! [2024-09-30 13:51:35,006][02144] Fps is (10 sec: 3687.1, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 839.9. Samples: 38476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:51:35,010][02144] Avg episode reward: [(0, '4.278')] [2024-09-30 13:51:36,554][05230] Updated weights for policy 0, policy_version 40 (0.0026) [2024-09-30 13:51:40,006][02144] Fps is (10 sec: 4505.6, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 176128. 
Throughput: 0: 945.4. Samples: 45168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:51:40,015][02144] Avg episode reward: [(0, '4.390')] [2024-09-30 13:51:45,008][02144] Fps is (10 sec: 3276.2, 60 sec: 3140.2, 300 sec: 2898.6). Total num frames: 188416. Throughput: 0: 959.7. Samples: 47220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:51:45,010][02144] Avg episode reward: [(0, '4.307')] [2024-09-30 13:51:48,384][05230] Updated weights for policy 0, policy_version 50 (0.0018) [2024-09-30 13:51:50,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 921.4. Samples: 52350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:51:50,013][02144] Avg episode reward: [(0, '4.398')] [2024-09-30 13:51:55,006][02144] Fps is (10 sec: 4506.4, 60 sec: 3891.2, 300 sec: 3113.0). Total num frames: 233472. Throughput: 0: 967.6. Samples: 59184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:51:55,012][02144] Avg episode reward: [(0, '4.475')] [2024-09-30 13:51:55,022][05217] Saving new best policy, reward=4.475! [2024-09-30 13:51:57,759][05230] Updated weights for policy 0, policy_version 60 (0.0030) [2024-09-30 13:52:00,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 976.1. Samples: 61920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:52:00,010][02144] Avg episode reward: [(0, '4.575')] [2024-09-30 13:52:00,012][05217] Saving new best policy, reward=4.575! [2024-09-30 13:52:05,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 913.7. Samples: 65960. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-30 13:52:05,014][02144] Avg episode reward: [(0, '4.620')] [2024-09-30 13:52:05,020][05217] Saving new best policy, reward=4.620! 
[2024-09-30 13:52:09,422][05230] Updated weights for policy 0, policy_version 70 (0.0030) [2024-09-30 13:52:10,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3185.8). Total num frames: 286720. Throughput: 0: 943.2. Samples: 72806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:52:10,007][02144] Avg episode reward: [(0, '4.585')] [2024-09-30 13:52:15,006][02144] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 976.0. Samples: 76284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:52:15,008][02144] Avg episode reward: [(0, '4.452')] [2024-09-30 13:52:15,020][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth... [2024-09-30 13:52:20,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3235.9). Total num frames: 323584. Throughput: 0: 939.5. Samples: 80752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:52:20,008][02144] Avg episode reward: [(0, '4.331')] [2024-09-30 13:52:21,133][05230] Updated weights for policy 0, policy_version 80 (0.0041) [2024-09-30 13:52:25,006][02144] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3276.8). Total num frames: 344064. Throughput: 0: 920.7. Samples: 86600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:52:25,008][02144] Avg episode reward: [(0, '4.285')] [2024-09-30 13:52:30,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 952.0. Samples: 90058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:52:30,008][02144] Avg episode reward: [(0, '4.250')] [2024-09-30 13:52:30,297][05230] Updated weights for policy 0, policy_version 90 (0.0038) [2024-09-30 13:52:35,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 958.9. Samples: 95502. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-30 13:52:35,009][02144] Avg episode reward: [(0, '4.393')] [2024-09-30 13:52:40,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3310.9). Total num frames: 397312. Throughput: 0: 920.9. Samples: 100624. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-30 13:52:40,008][02144] Avg episode reward: [(0, '4.693')] [2024-09-30 13:52:40,013][05217] Saving new best policy, reward=4.693! [2024-09-30 13:52:42,089][05230] Updated weights for policy 0, policy_version 100 (0.0025) [2024-09-30 13:52:45,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3375.1). Total num frames: 421888. Throughput: 0: 930.5. Samples: 103794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:52:45,013][02144] Avg episode reward: [(0, '4.572')] [2024-09-30 13:52:50,006][02144] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 981.3. Samples: 110118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:52:50,012][02144] Avg episode reward: [(0, '4.606')] [2024-09-30 13:52:53,564][05230] Updated weights for policy 0, policy_version 110 (0.0038) [2024-09-30 13:52:55,009][02144] Fps is (10 sec: 2866.4, 60 sec: 3618.0, 300 sec: 3337.4). Total num frames: 450560. Throughput: 0: 921.0. Samples: 114254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:52:55,012][02144] Avg episode reward: [(0, '4.705')] [2024-09-30 13:52:55,052][05217] Saving new best policy, reward=4.705! [2024-09-30 13:53:00,006][02144] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 916.0. Samples: 117502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 13:53:00,008][02144] Avg episode reward: [(0, '4.532')] [2024-09-30 13:53:03,307][05230] Updated weights for policy 0, policy_version 120 (0.0025) [2024-09-30 13:53:05,006][02144] Fps is (10 sec: 4506.8, 60 sec: 3822.9, 300 sec: 3418.0). Total num frames: 495616. 
Throughput: 0: 966.9. Samples: 124264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:53:05,008][02144] Avg episode reward: [(0, '4.425')] [2024-09-30 13:53:10,008][02144] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3413.3). Total num frames: 512000. Throughput: 0: 939.6. Samples: 128882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-30 13:53:10,010][02144] Avg episode reward: [(0, '4.420')] [2024-09-30 13:53:15,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 917.9. Samples: 131364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:53:15,008][02144] Avg episode reward: [(0, '4.603')] [2024-09-30 13:53:15,093][05230] Updated weights for policy 0, policy_version 130 (0.0026) [2024-09-30 13:53:20,006][02144] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 948.8. Samples: 138198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:53:20,010][02144] Avg episode reward: [(0, '4.719')] [2024-09-30 13:53:20,015][05217] Saving new best policy, reward=4.719! [2024-09-30 13:53:25,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3450.6). Total num frames: 569344. Throughput: 0: 956.9. Samples: 143684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:53:25,008][02144] Avg episode reward: [(0, '4.757')] [2024-09-30 13:53:25,025][05217] Saving new best policy, reward=4.757! [2024-09-30 13:53:25,304][05230] Updated weights for policy 0, policy_version 140 (0.0021) [2024-09-30 13:53:30,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3445.5). Total num frames: 585728. Throughput: 0: 929.8. Samples: 145634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:53:30,008][02144] Avg episode reward: [(0, '4.460')] [2024-09-30 13:53:35,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3464.1). Total num frames: 606208. Throughput: 0: 926.7. Samples: 151818. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:53:35,008][02144] Avg episode reward: [(0, '4.667')] [2024-09-30 13:53:36,022][05230] Updated weights for policy 0, policy_version 150 (0.0018) [2024-09-30 13:53:40,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3504.4). Total num frames: 630784. Throughput: 0: 978.0. Samples: 158260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:53:40,013][02144] Avg episode reward: [(0, '4.724')] [2024-09-30 13:53:45,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3476.1). Total num frames: 643072. Throughput: 0: 950.0. Samples: 160250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:53:45,013][02144] Avg episode reward: [(0, '4.774')] [2024-09-30 13:53:45,023][05217] Saving new best policy, reward=4.774! [2024-09-30 13:53:47,768][05230] Updated weights for policy 0, policy_version 160 (0.0013) [2024-09-30 13:53:50,006][02144] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 919.8. Samples: 165656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:53:50,010][02144] Avg episode reward: [(0, '4.969')] [2024-09-30 13:53:50,015][05217] Saving new best policy, reward=4.969! [2024-09-30 13:53:55,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3507.9). Total num frames: 684032. Throughput: 0: 968.1. Samples: 172446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:53:55,013][02144] Avg episode reward: [(0, '4.774')] [2024-09-30 13:53:57,468][05230] Updated weights for policy 0, policy_version 170 (0.0041) [2024-09-30 13:54:00,006][02144] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3502.1). Total num frames: 700416. Throughput: 0: 969.6. Samples: 174994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:54:00,013][02144] Avg episode reward: [(0, '4.753')] [2024-09-30 13:54:05,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3496.6). Total num frames: 716800. 
Throughput: 0: 913.0. Samples: 179282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:54:05,012][02144] Avg episode reward: [(0, '4.716')] [2024-09-30 13:54:08,711][05230] Updated weights for policy 0, policy_version 180 (0.0027) [2024-09-30 13:54:10,006][02144] Fps is (10 sec: 4095.9, 60 sec: 3823.0, 300 sec: 3530.4). Total num frames: 741376. Throughput: 0: 945.8. Samples: 186244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:54:10,010][02144] Avg episode reward: [(0, '4.654')] [2024-09-30 13:54:15,008][02144] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3543.5). Total num frames: 761856. Throughput: 0: 979.2. Samples: 189700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:54:15,012][02144] Avg episode reward: [(0, '5.077')] [2024-09-30 13:54:15,026][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth... [2024-09-30 13:54:15,228][05217] Saving new best policy, reward=5.077! [2024-09-30 13:54:20,007][02144] Fps is (10 sec: 3276.6, 60 sec: 3686.3, 300 sec: 3518.8). Total num frames: 774144. Throughput: 0: 934.3. Samples: 193862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:54:20,014][02144] Avg episode reward: [(0, '5.246')] [2024-09-30 13:54:20,017][05217] Saving new best policy, reward=5.246! [2024-09-30 13:54:20,819][05230] Updated weights for policy 0, policy_version 190 (0.0023) [2024-09-30 13:54:25,006][02144] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3531.7). Total num frames: 794624. Throughput: 0: 928.5. Samples: 200042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-30 13:54:25,008][02144] Avg episode reward: [(0, '5.521')] [2024-09-30 13:54:25,019][05217] Saving new best policy, reward=5.521! [2024-09-30 13:54:29,507][05230] Updated weights for policy 0, policy_version 200 (0.0024) [2024-09-30 13:54:30,006][02144] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3561.7). Total num frames: 819200. Throughput: 0: 959.2. 
Samples: 203414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:54:30,008][02144] Avg episode reward: [(0, '5.354')] [2024-09-30 13:54:35,009][02144] Fps is (10 sec: 3685.1, 60 sec: 3754.4, 300 sec: 3538.2). Total num frames: 831488. Throughput: 0: 952.7. Samples: 208530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:54:35,016][02144] Avg episode reward: [(0, '5.600')] [2024-09-30 13:54:35,042][05217] Saving new best policy, reward=5.600! [2024-09-30 13:54:40,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 851968. Throughput: 0: 920.6. Samples: 213872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:54:40,008][02144] Avg episode reward: [(0, '5.434')] [2024-09-30 13:54:41,366][05230] Updated weights for policy 0, policy_version 210 (0.0025) [2024-09-30 13:54:45,006][02144] Fps is (10 sec: 4097.3, 60 sec: 3822.9, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 940.5. Samples: 217318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:54:45,008][02144] Avg episode reward: [(0, '5.259')] [2024-09-30 13:54:50,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 982.0. Samples: 223472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:54:50,013][02144] Avg episode reward: [(0, '5.482')] [2024-09-30 13:54:52,228][05230] Updated weights for policy 0, policy_version 220 (0.0040) [2024-09-30 13:54:55,006][02144] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3565.9). Total num frames: 909312. Throughput: 0: 923.1. Samples: 227782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:54:55,011][02144] Avg episode reward: [(0, '5.555')] [2024-09-30 13:55:00,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3576.1). Total num frames: 929792. Throughput: 0: 922.1. Samples: 231194. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:55:00,009][02144] Avg episode reward: [(0, '5.824')] [2024-09-30 13:55:00,012][05217] Saving new best policy, reward=5.824! [2024-09-30 13:55:02,446][05230] Updated weights for policy 0, policy_version 230 (0.0022) [2024-09-30 13:55:05,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 977.7. Samples: 237858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:55:05,009][02144] Avg episode reward: [(0, '5.657')] [2024-09-30 13:55:10,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3580.2). Total num frames: 966656. Throughput: 0: 938.3. Samples: 242266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:55:10,013][02144] Avg episode reward: [(0, '5.724')] [2024-09-30 13:55:14,064][05230] Updated weights for policy 0, policy_version 240 (0.0029) [2024-09-30 13:55:15,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3589.6). Total num frames: 987136. Throughput: 0: 922.9. Samples: 244946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:55:15,008][02144] Avg episode reward: [(0, '5.487')] [2024-09-30 13:55:20,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3598.6). Total num frames: 1007616. Throughput: 0: 963.5. Samples: 251886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:55:20,008][02144] Avg episode reward: [(0, '5.673')] [2024-09-30 13:55:24,006][05230] Updated weights for policy 0, policy_version 250 (0.0038) [2024-09-30 13:55:25,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3593.0). Total num frames: 1024000. Throughput: 0: 964.0. Samples: 257250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:55:25,011][02144] Avg episode reward: [(0, '6.044')] [2024-09-30 13:55:25,022][05217] Saving new best policy, reward=6.044! [2024-09-30 13:55:30,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3587.5). Total num frames: 1040384. 
Throughput: 0: 931.7. Samples: 259244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-30 13:55:30,013][02144] Avg episode reward: [(0, '5.907')] [2024-09-30 13:55:34,733][05230] Updated weights for policy 0, policy_version 260 (0.0027) [2024-09-30 13:55:35,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3610.0). Total num frames: 1064960. Throughput: 0: 937.8. Samples: 265674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:55:35,008][02144] Avg episode reward: [(0, '5.771')] [2024-09-30 13:55:40,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 981.9. Samples: 271966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:55:40,010][02144] Avg episode reward: [(0, '6.137')] [2024-09-30 13:55:40,018][05217] Saving new best policy, reward=6.137! [2024-09-30 13:55:45,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 949.4. Samples: 273916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:55:45,013][02144] Avg episode reward: [(0, '6.140')] [2024-09-30 13:55:45,032][05217] Saving new best policy, reward=6.140! [2024-09-30 13:55:46,768][05230] Updated weights for policy 0, policy_version 270 (0.0028) [2024-09-30 13:55:50,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 927.0. Samples: 279572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 13:55:50,009][02144] Avg episode reward: [(0, '6.099')] [2024-09-30 13:55:55,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1142784. Throughput: 0: 981.5. Samples: 286434. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:55:55,012][02144] Avg episode reward: [(0, '5.826')] [2024-09-30 13:55:55,653][05230] Updated weights for policy 0, policy_version 280 (0.0028) [2024-09-30 13:56:00,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1159168. Throughput: 0: 976.7. Samples: 288896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:56:00,012][02144] Avg episode reward: [(0, '6.086')] [2024-09-30 13:56:05,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1175552. Throughput: 0: 922.4. Samples: 293392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:56:05,012][02144] Avg episode reward: [(0, '5.960')] [2024-09-30 13:56:07,470][05230] Updated weights for policy 0, policy_version 290 (0.0031) [2024-09-30 13:56:10,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1196032. Throughput: 0: 956.1. Samples: 300274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:56:10,009][02144] Avg episode reward: [(0, '6.102')] [2024-09-30 13:56:15,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1216512. Throughput: 0: 989.2. Samples: 303756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 13:56:15,013][02144] Avg episode reward: [(0, '6.541')] [2024-09-30 13:56:15,024][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth... [2024-09-30 13:56:15,233][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth [2024-09-30 13:56:15,264][05217] Saving new best policy, reward=6.541! [2024-09-30 13:56:19,010][05230] Updated weights for policy 0, policy_version 300 (0.0030) [2024-09-30 13:56:20,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1228800. Throughput: 0: 934.5. Samples: 307728. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:56:20,010][02144] Avg episode reward: [(0, '6.586')] [2024-09-30 13:56:20,015][05217] Saving new best policy, reward=6.586! [2024-09-30 13:56:25,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1253376. Throughput: 0: 934.4. Samples: 314014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:56:25,013][02144] Avg episode reward: [(0, '6.863')] [2024-09-30 13:56:25,021][05217] Saving new best policy, reward=6.863! [2024-09-30 13:56:28,548][05230] Updated weights for policy 0, policy_version 310 (0.0032) [2024-09-30 13:56:30,006][02144] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1273856. Throughput: 0: 964.6. Samples: 317322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 13:56:30,013][02144] Avg episode reward: [(0, '7.465')] [2024-09-30 13:56:30,015][05217] Saving new best policy, reward=7.465! [2024-09-30 13:56:35,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1286144. Throughput: 0: 950.7. Samples: 322354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:56:35,008][02144] Avg episode reward: [(0, '7.373')] [2024-09-30 13:56:40,006][02144] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1306624. Throughput: 0: 919.9. Samples: 327830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:56:40,011][02144] Avg episode reward: [(0, '7.864')] [2024-09-30 13:56:40,014][05217] Saving new best policy, reward=7.864! [2024-09-30 13:56:40,382][05230] Updated weights for policy 0, policy_version 320 (0.0036) [2024-09-30 13:56:45,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1331200. Throughput: 0: 939.0. Samples: 331152. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:56:45,009][02144] Avg episode reward: [(0, '7.935')] [2024-09-30 13:56:45,019][05217] Saving new best policy, reward=7.935! [2024-09-30 13:56:50,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1347584. Throughput: 0: 969.2. Samples: 337004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:56:50,008][02144] Avg episode reward: [(0, '8.830')] [2024-09-30 13:56:50,011][05217] Saving new best policy, reward=8.830! [2024-09-30 13:56:51,171][05230] Updated weights for policy 0, policy_version 330 (0.0022) [2024-09-30 13:56:55,006][02144] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 1363968. Throughput: 0: 913.9. Samples: 341400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:56:55,009][02144] Avg episode reward: [(0, '8.470')] [2024-09-30 13:57:00,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1384448. Throughput: 0: 912.2. Samples: 344804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 13:57:00,011][02144] Avg episode reward: [(0, '8.763')] [2024-09-30 13:57:01,300][05230] Updated weights for policy 0, policy_version 340 (0.0023) [2024-09-30 13:57:05,006][02144] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1404928. Throughput: 0: 973.6. Samples: 351540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:57:05,011][02144] Avg episode reward: [(0, '8.569')] [2024-09-30 13:57:10,007][02144] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1417216. Throughput: 0: 926.6. Samples: 355712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:57:10,009][02144] Avg episode reward: [(0, '9.030')] [2024-09-30 13:57:10,011][05217] Saving new best policy, reward=9.030! 
[2024-09-30 13:57:13,224][05230] Updated weights for policy 0, policy_version 350 (0.0026) [2024-09-30 13:57:15,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1441792. Throughput: 0: 916.8. Samples: 358578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:57:15,013][02144] Avg episode reward: [(0, '9.349')] [2024-09-30 13:57:15,023][05217] Saving new best policy, reward=9.349! [2024-09-30 13:57:20,006][02144] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1462272. Throughput: 0: 955.8. Samples: 365364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:57:20,008][02144] Avg episode reward: [(0, '9.273')] [2024-09-30 13:57:23,068][05230] Updated weights for policy 0, policy_version 360 (0.0025) [2024-09-30 13:57:25,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 1478656. Throughput: 0: 946.5. Samples: 370422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:57:25,014][02144] Avg episode reward: [(0, '9.046')] [2024-09-30 13:57:30,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1495040. Throughput: 0: 917.1. Samples: 372422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:57:30,010][02144] Avg episode reward: [(0, '8.770')] [2024-09-30 13:57:34,074][05230] Updated weights for policy 0, policy_version 370 (0.0025) [2024-09-30 13:57:35,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1519616. Throughput: 0: 936.7. Samples: 379156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:57:35,007][02144] Avg episode reward: [(0, '9.630')] [2024-09-30 13:57:35,017][05217] Saving new best policy, reward=9.630! [2024-09-30 13:57:40,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1536000. Throughput: 0: 968.2. Samples: 384968. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:57:40,012][02144] Avg episode reward: [(0, '10.076')] [2024-09-30 13:57:40,015][05217] Saving new best policy, reward=10.076! [2024-09-30 13:57:45,006][02144] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1548288. Throughput: 0: 935.9. Samples: 386918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:57:45,012][02144] Avg episode reward: [(0, '10.912')] [2024-09-30 13:57:45,023][05217] Saving new best policy, reward=10.912! [2024-09-30 13:57:46,192][05230] Updated weights for policy 0, policy_version 380 (0.0029) [2024-09-30 13:57:50,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 1572864. Throughput: 0: 916.4. Samples: 392778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:57:50,011][02144] Avg episode reward: [(0, '12.522')] [2024-09-30 13:57:50,015][05217] Saving new best policy, reward=12.522! [2024-09-30 13:57:55,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1593344. Throughput: 0: 973.3. Samples: 399510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:57:55,008][02144] Avg episode reward: [(0, '13.268')] [2024-09-30 13:57:55,030][05217] Saving new best policy, reward=13.268! [2024-09-30 13:57:55,412][05230] Updated weights for policy 0, policy_version 390 (0.0026) [2024-09-30 13:58:00,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1605632. Throughput: 0: 953.7. Samples: 401496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:58:00,010][02144] Avg episode reward: [(0, '13.488')] [2024-09-30 13:58:00,019][05217] Saving new best policy, reward=13.488! [2024-09-30 13:58:05,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1626112. Throughput: 0: 910.9. Samples: 406354. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 13:58:05,013][02144] Avg episode reward: [(0, '13.000')] [2024-09-30 13:58:07,450][05230] Updated weights for policy 0, policy_version 400 (0.0031) [2024-09-30 13:58:10,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 1646592. Throughput: 0: 949.0. Samples: 413128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 13:58:10,013][02144] Avg episode reward: [(0, '12.906')] [2024-09-30 13:58:15,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 1667072. Throughput: 0: 969.3. Samples: 416042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:58:15,012][02144] Avg episode reward: [(0, '13.305')] [2024-09-30 13:58:15,024][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000407_1667072.pth... [2024-09-30 13:58:15,227][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth [2024-09-30 13:58:19,059][05230] Updated weights for policy 0, policy_version 410 (0.0023) [2024-09-30 13:58:20,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1683456. Throughput: 0: 911.0. Samples: 420150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:58:20,011][02144] Avg episode reward: [(0, '14.141')] [2024-09-30 13:58:20,014][05217] Saving new best policy, reward=14.141! [2024-09-30 13:58:25,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1703936. Throughput: 0: 932.3. Samples: 426920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:58:25,009][02144] Avg episode reward: [(0, '13.650')] [2024-09-30 13:58:28,263][05230] Updated weights for policy 0, policy_version 420 (0.0047) [2024-09-30 13:58:30,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1724416. Throughput: 0: 965.8. Samples: 430378. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:58:30,011][02144] Avg episode reward: [(0, '13.773')] [2024-09-30 13:58:35,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 1736704. Throughput: 0: 933.1. Samples: 434768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:58:35,011][02144] Avg episode reward: [(0, '12.944')] [2024-09-30 13:58:40,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1757184. Throughput: 0: 917.1. Samples: 440778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 13:58:40,008][02144] Avg episode reward: [(0, '12.494')] [2024-09-30 13:58:40,076][05230] Updated weights for policy 0, policy_version 430 (0.0028) [2024-09-30 13:58:45,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1781760. Throughput: 0: 948.4. Samples: 444174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:58:45,008][02144] Avg episode reward: [(0, '13.885')] [2024-09-30 13:58:50,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1798144. Throughput: 0: 963.0. Samples: 449688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:58:50,010][02144] Avg episode reward: [(0, '13.613')] [2024-09-30 13:58:51,122][05230] Updated weights for policy 0, policy_version 440 (0.0031) [2024-09-30 13:58:55,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 1814528. Throughput: 0: 925.5. Samples: 454776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:58:55,009][02144] Avg episode reward: [(0, '14.241')] [2024-09-30 13:58:55,025][05217] Saving new best policy, reward=14.241! [2024-09-30 13:59:00,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1839104. Throughput: 0: 936.2. Samples: 458172. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:59:00,008][02144] Avg episode reward: [(0, '13.675')] [2024-09-30 13:59:00,695][05230] Updated weights for policy 0, policy_version 450 (0.0052) [2024-09-30 13:59:05,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1855488. Throughput: 0: 982.7. Samples: 464372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:59:05,009][02144] Avg episode reward: [(0, '14.243')] [2024-09-30 13:59:05,020][05217] Saving new best policy, reward=14.243! [2024-09-30 13:59:10,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1871872. Throughput: 0: 923.0. Samples: 468454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:59:10,008][02144] Avg episode reward: [(0, '14.455')] [2024-09-30 13:59:10,019][05217] Saving new best policy, reward=14.455! [2024-09-30 13:59:12,783][05230] Updated weights for policy 0, policy_version 460 (0.0041) [2024-09-30 13:59:15,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1892352. Throughput: 0: 920.0. Samples: 471780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:59:15,013][02144] Avg episode reward: [(0, '14.069')] [2024-09-30 13:59:20,011][02144] Fps is (10 sec: 4503.5, 60 sec: 3890.9, 300 sec: 3804.4). Total num frames: 1916928. Throughput: 0: 975.2. Samples: 478658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 13:59:20,018][02144] Avg episode reward: [(0, '14.095')] [2024-09-30 13:59:22,998][05230] Updated weights for policy 0, policy_version 470 (0.0043) [2024-09-30 13:59:25,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1929216. Throughput: 0: 940.2. Samples: 483088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:59:25,012][02144] Avg episode reward: [(0, '14.572')] [2024-09-30 13:59:25,026][05217] Saving new best policy, reward=14.572! 
[2024-09-30 13:59:30,006][02144] Fps is (10 sec: 3278.4, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 1949696. Throughput: 0: 920.8. Samples: 485608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 13:59:30,008][02144] Avg episode reward: [(0, '14.825')] [2024-09-30 13:59:30,013][05217] Saving new best policy, reward=14.825! [2024-09-30 13:59:33,896][05230] Updated weights for policy 0, policy_version 480 (0.0026) [2024-09-30 13:59:35,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1970176. Throughput: 0: 945.0. Samples: 492214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:59:35,014][02144] Avg episode reward: [(0, '15.375')] [2024-09-30 13:59:35,024][05217] Saving new best policy, reward=15.375! [2024-09-30 13:59:40,008][02144] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 1986560. Throughput: 0: 949.2. Samples: 497490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 13:59:40,010][02144] Avg episode reward: [(0, '15.640')] [2024-09-30 13:59:40,013][05217] Saving new best policy, reward=15.640! [2024-09-30 13:59:45,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2002944. Throughput: 0: 919.7. Samples: 499560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 13:59:45,008][02144] Avg episode reward: [(0, '15.696')] [2024-09-30 13:59:45,019][05217] Saving new best policy, reward=15.696! [2024-09-30 13:59:45,801][05230] Updated weights for policy 0, policy_version 490 (0.0017) [2024-09-30 13:59:50,006][02144] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 2023424. Throughput: 0: 926.7. Samples: 506074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:59:50,008][02144] Avg episode reward: [(0, '16.758')] [2024-09-30 13:59:50,013][05217] Saving new best policy, reward=16.758! [2024-09-30 13:59:55,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). 
Total num frames: 2043904. Throughput: 0: 974.4. Samples: 512300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 13:59:55,011][02144] Avg episode reward: [(0, '16.260')] [2024-09-30 13:59:55,365][05230] Updated weights for policy 0, policy_version 500 (0.0032) [2024-09-30 14:00:00,009][02144] Fps is (10 sec: 3685.2, 60 sec: 3686.2, 300 sec: 3762.7). Total num frames: 2060288. Throughput: 0: 945.4. Samples: 514328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:00:00,013][02144] Avg episode reward: [(0, '16.642')] [2024-09-30 14:00:05,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2080768. Throughput: 0: 916.3. Samples: 519888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:00:05,013][02144] Avg episode reward: [(0, '17.172')] [2024-09-30 14:00:05,022][05217] Saving new best policy, reward=17.172! [2024-09-30 14:00:06,478][05230] Updated weights for policy 0, policy_version 510 (0.0030) [2024-09-30 14:00:10,006][02144] Fps is (10 sec: 4507.1, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2105344. Throughput: 0: 972.1. Samples: 526834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:00:10,011][02144] Avg episode reward: [(0, '18.830')] [2024-09-30 14:00:10,014][05217] Saving new best policy, reward=18.830! [2024-09-30 14:00:15,010][02144] Fps is (10 sec: 3684.9, 60 sec: 3754.4, 300 sec: 3762.7). Total num frames: 2117632. Throughput: 0: 966.8. Samples: 529118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:00:15,017][02144] Avg episode reward: [(0, '19.025')] [2024-09-30 14:00:15,031][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth... [2024-09-30 14:00:15,227][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth [2024-09-30 14:00:15,245][05217] Saving new best policy, reward=19.025! 
[2024-09-30 14:00:18,453][05230] Updated weights for policy 0, policy_version 520 (0.0048) [2024-09-30 14:00:20,006][02144] Fps is (10 sec: 2867.2, 60 sec: 3618.4, 300 sec: 3762.8). Total num frames: 2134016. Throughput: 0: 918.0. Samples: 533524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:00:20,011][02144] Avg episode reward: [(0, '19.774')] [2024-09-30 14:00:20,014][05217] Saving new best policy, reward=19.774! [2024-09-30 14:00:25,006][02144] Fps is (10 sec: 4097.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2158592. Throughput: 0: 951.3. Samples: 540298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:00:25,011][02144] Avg episode reward: [(0, '21.144')] [2024-09-30 14:00:25,020][05217] Saving new best policy, reward=21.144! [2024-09-30 14:00:27,722][05230] Updated weights for policy 0, policy_version 530 (0.0025) [2024-09-30 14:00:30,008][02144] Fps is (10 sec: 4095.3, 60 sec: 3754.6, 300 sec: 3762.7). Total num frames: 2174976. Throughput: 0: 977.8. Samples: 543562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:00:30,009][02144] Avg episode reward: [(0, '21.281')] [2024-09-30 14:00:30,016][05217] Saving new best policy, reward=21.281! [2024-09-30 14:00:35,009][02144] Fps is (10 sec: 2866.4, 60 sec: 3618.0, 300 sec: 3735.0). Total num frames: 2187264. Throughput: 0: 920.7. Samples: 547508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:00:35,016][02144] Avg episode reward: [(0, '19.469')] [2024-09-30 14:00:39,671][05230] Updated weights for policy 0, policy_version 540 (0.0049) [2024-09-30 14:00:40,006][02144] Fps is (10 sec: 3687.0, 60 sec: 3754.8, 300 sec: 3776.6). Total num frames: 2211840. Throughput: 0: 923.7. Samples: 553868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:00:40,011][02144] Avg episode reward: [(0, '19.241')] [2024-09-30 14:00:45,006][02144] Fps is (10 sec: 4506.8, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2232320. Throughput: 0: 955.4. 
Samples: 557320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:00:45,011][02144] Avg episode reward: [(0, '18.327')] [2024-09-30 14:00:50,006][02144] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2248704. Throughput: 0: 944.7. Samples: 562400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:00:50,008][02144] Avg episode reward: [(0, '19.020')] [2024-09-30 14:00:51,012][05230] Updated weights for policy 0, policy_version 550 (0.0049) [2024-09-30 14:00:55,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 2265088. Throughput: 0: 910.5. Samples: 567808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:00:55,008][02144] Avg episode reward: [(0, '18.406')] [2024-09-30 14:01:00,011][02144] Fps is (10 sec: 4094.0, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 2289664. Throughput: 0: 934.8. Samples: 571184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:01:00,012][02144] Avg episode reward: [(0, '19.519')] [2024-09-30 14:01:00,312][05230] Updated weights for policy 0, policy_version 560 (0.0019) [2024-09-30 14:01:05,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2306048. Throughput: 0: 973.2. Samples: 577316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 14:01:05,008][02144] Avg episode reward: [(0, '19.522')] [2024-09-30 14:01:10,006][02144] Fps is (10 sec: 3278.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2322432. Throughput: 0: 917.5. Samples: 581584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:01:10,008][02144] Avg episode reward: [(0, '20.039')] [2024-09-30 14:01:12,219][05230] Updated weights for policy 0, policy_version 570 (0.0048) [2024-09-30 14:01:15,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3790.5). Total num frames: 2347008. Throughput: 0: 922.8. Samples: 585086. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:01:15,009][02144] Avg episode reward: [(0, '18.849')] [2024-09-30 14:01:20,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2367488. Throughput: 0: 989.4. Samples: 592026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:01:20,009][02144] Avg episode reward: [(0, '18.010')] [2024-09-30 14:01:21,879][05230] Updated weights for policy 0, policy_version 580 (0.0032) [2024-09-30 14:01:25,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2383872. Throughput: 0: 944.9. Samples: 596388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 14:01:25,009][02144] Avg episode reward: [(0, '18.734')] [2024-09-30 14:01:30,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2400256. Throughput: 0: 931.5. Samples: 599238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:01:30,013][02144] Avg episode reward: [(0, '20.008')] [2024-09-30 14:01:32,646][05230] Updated weights for policy 0, policy_version 590 (0.0032) [2024-09-30 14:01:35,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3790.5). Total num frames: 2424832. Throughput: 0: 972.2. Samples: 606150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:01:35,011][02144] Avg episode reward: [(0, '20.064')] [2024-09-30 14:01:40,007][02144] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3762.7). Total num frames: 2441216. Throughput: 0: 964.1. Samples: 611194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:01:40,017][02144] Avg episode reward: [(0, '21.102')] [2024-09-30 14:01:44,566][05230] Updated weights for policy 0, policy_version 600 (0.0033) [2024-09-30 14:01:45,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2457600. Throughput: 0: 935.2. Samples: 613262. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:01:45,013][02144] Avg episode reward: [(0, '21.657')] [2024-09-30 14:01:45,029][05217] Saving new best policy, reward=21.657! [2024-09-30 14:01:50,006][02144] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2482176. Throughput: 0: 945.6. Samples: 619868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:01:50,010][02144] Avg episode reward: [(0, '21.972')] [2024-09-30 14:01:50,016][05217] Saving new best policy, reward=21.972! [2024-09-30 14:01:53,639][05230] Updated weights for policy 0, policy_version 610 (0.0019) [2024-09-30 14:01:55,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2498560. Throughput: 0: 986.8. Samples: 625990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:01:55,012][02144] Avg episode reward: [(0, '21.667')] [2024-09-30 14:02:00,006][02144] Fps is (10 sec: 3276.7, 60 sec: 3755.0, 300 sec: 3762.8). Total num frames: 2514944. Throughput: 0: 953.2. Samples: 627978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:02:00,009][02144] Avg episode reward: [(0, '23.001')] [2024-09-30 14:02:00,017][05217] Saving new best policy, reward=23.001! [2024-09-30 14:02:05,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2535424. Throughput: 0: 929.3. Samples: 633846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:02:05,008][02144] Avg episode reward: [(0, '23.444')] [2024-09-30 14:02:05,018][05217] Saving new best policy, reward=23.444! [2024-09-30 14:02:05,495][05230] Updated weights for policy 0, policy_version 620 (0.0041) [2024-09-30 14:02:10,006][02144] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2555904. Throughput: 0: 976.6. Samples: 640336. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:02:10,008][02144] Avg episode reward: [(0, '22.812')] [2024-09-30 14:02:15,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2572288. Throughput: 0: 961.9. Samples: 642524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:02:15,013][02144] Avg episode reward: [(0, '21.320')] [2024-09-30 14:02:15,025][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth... [2024-09-30 14:02:15,235][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000407_1667072.pth [2024-09-30 14:02:17,394][05230] Updated weights for policy 0, policy_version 630 (0.0023) [2024-09-30 14:02:20,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2588672. Throughput: 0: 917.5. Samples: 647436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:02:20,013][02144] Avg episode reward: [(0, '22.866')] [2024-09-30 14:02:25,006][02144] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2613248. Throughput: 0: 958.9. Samples: 654342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:02:25,008][02144] Avg episode reward: [(0, '21.612')] [2024-09-30 14:02:26,226][05230] Updated weights for policy 0, policy_version 640 (0.0026) [2024-09-30 14:02:30,008][02144] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3776.6). Total num frames: 2633728. Throughput: 0: 982.1. Samples: 657458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:02:30,013][02144] Avg episode reward: [(0, '21.159')] [2024-09-30 14:02:35,006][02144] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2646016. Throughput: 0: 926.8. Samples: 661574. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:02:35,008][02144] Avg episode reward: [(0, '20.228')] [2024-09-30 14:02:38,344][05230] Updated weights for policy 0, policy_version 650 (0.0022) [2024-09-30 14:02:40,006][02144] Fps is (10 sec: 3687.1, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 2670592. Throughput: 0: 938.1. Samples: 668206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:02:40,008][02144] Avg episode reward: [(0, '19.205')] [2024-09-30 14:02:45,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2691072. Throughput: 0: 968.2. Samples: 671548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 14:02:45,008][02144] Avg episode reward: [(0, '19.292')] [2024-09-30 14:02:48,792][05230] Updated weights for policy 0, policy_version 660 (0.0020) [2024-09-30 14:02:50,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2703360. Throughput: 0: 944.4. Samples: 676346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:02:50,013][02144] Avg episode reward: [(0, '17.952')] [2024-09-30 14:02:55,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2723840. Throughput: 0: 931.8. Samples: 682266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:02:55,008][02144] Avg episode reward: [(0, '19.153')] [2024-09-30 14:02:58,741][05230] Updated weights for policy 0, policy_version 670 (0.0044) [2024-09-30 14:03:00,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2748416. Throughput: 0: 958.9. Samples: 685676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:03:00,012][02144] Avg episode reward: [(0, '19.794')] [2024-09-30 14:03:05,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2764800. Throughput: 0: 974.9. Samples: 691306. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:03:05,013][02144] Avg episode reward: [(0, '21.999')] [2024-09-30 14:03:10,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2781184. Throughput: 0: 927.9. Samples: 696096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:03:10,009][02144] Avg episode reward: [(0, '22.567')] [2024-09-30 14:03:10,703][05230] Updated weights for policy 0, policy_version 680 (0.0026) [2024-09-30 14:03:15,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2801664. Throughput: 0: 934.6. Samples: 699512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:03:15,011][02144] Avg episode reward: [(0, '23.072')] [2024-09-30 14:03:20,011][02144] Fps is (10 sec: 4094.1, 60 sec: 3890.9, 300 sec: 3790.5). Total num frames: 2822144. Throughput: 0: 988.3. Samples: 706054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:03:20,020][02144] Avg episode reward: [(0, '22.839')] [2024-09-30 14:03:20,282][05230] Updated weights for policy 0, policy_version 690 (0.0030) [2024-09-30 14:03:25,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2838528. Throughput: 0: 934.4. Samples: 710254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:03:25,011][02144] Avg episode reward: [(0, '22.575')] [2024-09-30 14:03:30,006][02144] Fps is (10 sec: 3688.1, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 2859008. Throughput: 0: 932.0. Samples: 713486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:03:30,009][02144] Avg episode reward: [(0, '20.957')] [2024-09-30 14:03:31,319][05230] Updated weights for policy 0, policy_version 700 (0.0026) [2024-09-30 14:03:35,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2883584. Throughput: 0: 977.3. Samples: 720326. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:03:35,008][02144] Avg episode reward: [(0, '21.175')] [2024-09-30 14:03:40,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2895872. Throughput: 0: 947.6. Samples: 724906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:03:40,011][02144] Avg episode reward: [(0, '20.918')] [2024-09-30 14:03:44,012][05230] Updated weights for policy 0, policy_version 710 (0.0025) [2024-09-30 14:03:45,006][02144] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2912256. Throughput: 0: 904.2. Samples: 726364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:03:45,011][02144] Avg episode reward: [(0, '22.046')] [2024-09-30 14:03:50,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2932736. Throughput: 0: 931.0. Samples: 733202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:03:50,008][02144] Avg episode reward: [(0, '22.032')] [2024-09-30 14:03:53,238][05230] Updated weights for policy 0, policy_version 720 (0.0029) [2024-09-30 14:03:55,006][02144] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2953216. Throughput: 0: 951.7. Samples: 738922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:03:55,013][02144] Avg episode reward: [(0, '23.460')] [2024-09-30 14:03:55,023][05217] Saving new best policy, reward=23.460! [2024-09-30 14:04:00,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 2965504. Throughput: 0: 919.9. Samples: 740908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:04:00,010][02144] Avg episode reward: [(0, '23.732')] [2024-09-30 14:04:00,015][05217] Saving new best policy, reward=23.732! [2024-09-30 14:04:04,793][05230] Updated weights for policy 0, policy_version 730 (0.0043) [2024-09-30 14:04:05,006][02144] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3790.5). 
Total num frames: 2990080. Throughput: 0: 911.5. Samples: 747066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 14:04:05,009][02144] Avg episode reward: [(0, '22.539')] [2024-09-30 14:04:10,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3010560. Throughput: 0: 963.2. Samples: 753596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:04:10,009][02144] Avg episode reward: [(0, '22.128')] [2024-09-30 14:04:15,011][02144] Fps is (10 sec: 3275.2, 60 sec: 3686.1, 300 sec: 3748.9). Total num frames: 3022848. Throughput: 0: 935.2. Samples: 755576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:04:15,018][02144] Avg episode reward: [(0, '21.659')] [2024-09-30 14:04:15,031][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth... [2024-09-30 14:04:15,227][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth [2024-09-30 14:04:16,939][05230] Updated weights for policy 0, policy_version 740 (0.0052) [2024-09-30 14:04:20,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3776.7). Total num frames: 3043328. Throughput: 0: 896.0. Samples: 760648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:04:20,010][02144] Avg episode reward: [(0, '22.572')] [2024-09-30 14:04:25,006][02144] Fps is (10 sec: 4507.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3067904. Throughput: 0: 947.2. Samples: 767528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:04:25,008][02144] Avg episode reward: [(0, '21.575')] [2024-09-30 14:04:25,792][05230] Updated weights for policy 0, policy_version 750 (0.0029) [2024-09-30 14:04:30,009][02144] Fps is (10 sec: 4094.5, 60 sec: 3754.4, 300 sec: 3776.6). Total num frames: 3084288. Throughput: 0: 975.0. Samples: 770244. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:04:30,021][02144] Avg episode reward: [(0, '21.679')] [2024-09-30 14:04:35,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 3100672. Throughput: 0: 922.2. Samples: 774702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:04:35,008][02144] Avg episode reward: [(0, '21.453')] [2024-09-30 14:04:37,874][05230] Updated weights for policy 0, policy_version 760 (0.0026) [2024-09-30 14:04:40,006][02144] Fps is (10 sec: 3687.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3121152. Throughput: 0: 936.3. Samples: 781054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 14:04:40,013][02144] Avg episode reward: [(0, '23.634')] [2024-09-30 14:04:45,009][02144] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 3137536. Throughput: 0: 964.7. Samples: 784322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:04:45,011][02144] Avg episode reward: [(0, '23.779')] [2024-09-30 14:04:45,053][05217] Saving new best policy, reward=23.779! [2024-09-30 14:04:50,009][02144] Fps is (10 sec: 2866.4, 60 sec: 3618.0, 300 sec: 3748.8). Total num frames: 3149824. Throughput: 0: 916.2. Samples: 788298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 14:04:50,015][02144] Avg episode reward: [(0, '22.983')] [2024-09-30 14:04:50,015][05230] Updated weights for policy 0, policy_version 770 (0.0019) [2024-09-30 14:04:55,006][02144] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3174400. Throughput: 0: 907.8. Samples: 794448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:04:55,013][02144] Avg episode reward: [(0, '24.518')] [2024-09-30 14:04:55,022][05217] Saving new best policy, reward=24.518! [2024-09-30 14:04:59,276][05230] Updated weights for policy 0, policy_version 780 (0.0026) [2024-09-30 14:05:00,006][02144] Fps is (10 sec: 4506.9, 60 sec: 3822.9, 300 sec: 3776.7). 
Total num frames: 3194880. Throughput: 0: 937.8. Samples: 797774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:05:00,015][02144] Avg episode reward: [(0, '24.790')] [2024-09-30 14:05:00,077][05217] Saving new best policy, reward=24.790! [2024-09-30 14:05:05,008][02144] Fps is (10 sec: 3685.5, 60 sec: 3686.2, 300 sec: 3748.8). Total num frames: 3211264. Throughput: 0: 936.1. Samples: 802774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:05:05,011][02144] Avg episode reward: [(0, '26.196')] [2024-09-30 14:05:05,027][05217] Saving new best policy, reward=26.196! [2024-09-30 14:05:10,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3227648. Throughput: 0: 892.2. Samples: 807676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:05:10,009][02144] Avg episode reward: [(0, '26.401')] [2024-09-30 14:05:10,012][05217] Saving new best policy, reward=26.401! [2024-09-30 14:05:11,730][05230] Updated weights for policy 0, policy_version 790 (0.0029) [2024-09-30 14:05:15,006][02144] Fps is (10 sec: 3687.4, 60 sec: 3755.0, 300 sec: 3776.7). Total num frames: 3248128. Throughput: 0: 904.3. Samples: 810932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:05:15,008][02144] Avg episode reward: [(0, '25.184')] [2024-09-30 14:05:20,012][02144] Fps is (10 sec: 4093.4, 60 sec: 3754.3, 300 sec: 3762.7). Total num frames: 3268608. Throughput: 0: 943.3. Samples: 817158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:05:20,019][02144] Avg episode reward: [(0, '25.107')] [2024-09-30 14:05:22,674][05230] Updated weights for policy 0, policy_version 800 (0.0017) [2024-09-30 14:05:25,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3748.9). Total num frames: 3280896. Throughput: 0: 893.2. Samples: 821250. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:05:25,008][02144] Avg episode reward: [(0, '25.371')] [2024-09-30 14:05:30,006][02144] Fps is (10 sec: 3688.8, 60 sec: 3686.6, 300 sec: 3790.6). Total num frames: 3305472. Throughput: 0: 895.6. Samples: 824622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:05:30,009][02144] Avg episode reward: [(0, '24.616')] [2024-09-30 14:05:32,565][05230] Updated weights for policy 0, policy_version 810 (0.0018) [2024-09-30 14:05:35,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3325952. Throughput: 0: 957.3. Samples: 831376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:05:35,008][02144] Avg episode reward: [(0, '22.265')] [2024-09-30 14:05:40,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3338240. Throughput: 0: 920.4. Samples: 835868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:05:40,008][02144] Avg episode reward: [(0, '23.923')] [2024-09-30 14:05:44,541][05230] Updated weights for policy 0, policy_version 820 (0.0054) [2024-09-30 14:05:45,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3762.8). Total num frames: 3358720. Throughput: 0: 900.6. Samples: 838300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:05:45,013][02144] Avg episode reward: [(0, '23.838')] [2024-09-30 14:05:50,009][02144] Fps is (10 sec: 4094.9, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3379200. Throughput: 0: 941.8. Samples: 845154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-30 14:05:50,010][02144] Avg episode reward: [(0, '23.926')] [2024-09-30 14:05:54,635][05230] Updated weights for policy 0, policy_version 830 (0.0014) [2024-09-30 14:05:55,008][02144] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3762.8). Total num frames: 3399680. Throughput: 0: 956.1. Samples: 850704. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:05:55,015][02144] Avg episode reward: [(0, '24.526')] [2024-09-30 14:06:00,006][02144] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3416064. Throughput: 0: 929.3. Samples: 852752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-30 14:06:00,012][02144] Avg episode reward: [(0, '25.185')] [2024-09-30 14:06:05,006][02144] Fps is (10 sec: 3687.2, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 3436544. Throughput: 0: 932.5. Samples: 859114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:06:05,008][02144] Avg episode reward: [(0, '26.161')] [2024-09-30 14:06:05,361][05230] Updated weights for policy 0, policy_version 840 (0.0027) [2024-09-30 14:06:10,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3457024. Throughput: 0: 982.7. Samples: 865470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:06:10,008][02144] Avg episode reward: [(0, '24.459')] [2024-09-30 14:06:15,012][02144] Fps is (10 sec: 3684.3, 60 sec: 3754.3, 300 sec: 3748.8). Total num frames: 3473408. Throughput: 0: 952.0. Samples: 867466. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 14:06:15,014][02144] Avg episode reward: [(0, '24.192')] [2024-09-30 14:06:15,033][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth... [2024-09-30 14:06:15,183][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000628_2572288.pth [2024-09-30 14:06:17,253][05230] Updated weights for policy 0, policy_version 850 (0.0042) [2024-09-30 14:06:20,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3755.1, 300 sec: 3762.8). Total num frames: 3493888. Throughput: 0: 919.6. Samples: 872760. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 14:06:20,011][02144] Avg episode reward: [(0, '23.635')] [2024-09-30 14:06:25,006][02144] Fps is (10 sec: 4098.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3514368. Throughput: 0: 972.6. Samples: 879636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:06:25,008][02144] Avg episode reward: [(0, '23.619')] [2024-09-30 14:06:26,411][05230] Updated weights for policy 0, policy_version 860 (0.0018) [2024-09-30 14:06:30,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3530752. Throughput: 0: 977.7. Samples: 882298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:06:30,008][02144] Avg episode reward: [(0, '24.371')] [2024-09-30 14:06:35,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3547136. Throughput: 0: 919.7. Samples: 886540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:06:35,008][02144] Avg episode reward: [(0, '22.836')] [2024-09-30 14:06:38,310][05230] Updated weights for policy 0, policy_version 870 (0.0025) [2024-09-30 14:06:40,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3567616. Throughput: 0: 944.8. Samples: 893216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:06:40,012][02144] Avg episode reward: [(0, '24.224')] [2024-09-30 14:06:45,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3588096. Throughput: 0: 974.4. Samples: 896598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:06:45,008][02144] Avg episode reward: [(0, '23.827')] [2024-09-30 14:06:49,797][05230] Updated weights for policy 0, policy_version 880 (0.0029) [2024-09-30 14:06:50,006][02144] Fps is (10 sec: 3686.3, 60 sec: 3754.8, 300 sec: 3748.9). Total num frames: 3604480. Throughput: 0: 931.5. Samples: 901032. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:06:50,012][02144] Avg episode reward: [(0, '25.168')] [2024-09-30 14:06:55,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 3624960. Throughput: 0: 922.9. Samples: 907002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:06:55,014][02144] Avg episode reward: [(0, '26.750')] [2024-09-30 14:06:55,023][05217] Saving new best policy, reward=26.750! [2024-09-30 14:06:59,349][05230] Updated weights for policy 0, policy_version 890 (0.0022) [2024-09-30 14:07:00,006][02144] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3645440. Throughput: 0: 954.2. Samples: 910398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:07:00,009][02144] Avg episode reward: [(0, '26.127')] [2024-09-30 14:07:05,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3661824. Throughput: 0: 956.2. Samples: 915790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:07:05,008][02144] Avg episode reward: [(0, '26.382')] [2024-09-30 14:07:10,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3678208. Throughput: 0: 914.9. Samples: 920808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:07:10,009][02144] Avg episode reward: [(0, '26.280')] [2024-09-30 14:07:11,173][05230] Updated weights for policy 0, policy_version 900 (0.0039) [2024-09-30 14:07:15,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3823.3, 300 sec: 3776.7). Total num frames: 3702784. Throughput: 0: 930.7. Samples: 924178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:07:15,013][02144] Avg episode reward: [(0, '26.639')] [2024-09-30 14:07:20,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3719168. Throughput: 0: 977.7. Samples: 930538. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:07:20,013][02144] Avg episode reward: [(0, '26.788')] [2024-09-30 14:07:20,067][05217] Saving new best policy, reward=26.788! [2024-09-30 14:07:21,611][05230] Updated weights for policy 0, policy_version 910 (0.0016) [2024-09-30 14:07:25,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3735552. Throughput: 0: 922.6. Samples: 934734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:07:25,008][02144] Avg episode reward: [(0, '25.130')] [2024-09-30 14:07:30,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3756032. Throughput: 0: 924.1. Samples: 938184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:07:30,010][02144] Avg episode reward: [(0, '26.123')] [2024-09-30 14:07:31,844][05230] Updated weights for policy 0, policy_version 920 (0.0034) [2024-09-30 14:07:35,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3780608. Throughput: 0: 973.8. Samples: 944854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:07:35,008][02144] Avg episode reward: [(0, '27.729')] [2024-09-30 14:07:35,019][05217] Saving new best policy, reward=27.729! [2024-09-30 14:07:40,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3792896. Throughput: 0: 936.8. Samples: 949160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:07:40,012][02144] Avg episode reward: [(0, '28.676')] [2024-09-30 14:07:40,017][05217] Saving new best policy, reward=28.676! [2024-09-30 14:07:43,964][05230] Updated weights for policy 0, policy_version 930 (0.0022) [2024-09-30 14:07:45,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3813376. Throughput: 0: 916.0. Samples: 951620. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:07:45,008][02144] Avg episode reward: [(0, '28.761')] [2024-09-30 14:07:45,023][05217] Saving new best policy, reward=28.761! [2024-09-30 14:07:50,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 3833856. Throughput: 0: 946.4. Samples: 958378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-30 14:07:50,009][02144] Avg episode reward: [(0, '29.041')] [2024-09-30 14:07:50,013][05217] Saving new best policy, reward=29.041! [2024-09-30 14:07:53,791][05230] Updated weights for policy 0, policy_version 940 (0.0016) [2024-09-30 14:07:55,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 3850240. Throughput: 0: 954.8. Samples: 963774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:07:55,008][02144] Avg episode reward: [(0, '29.710')] [2024-09-30 14:07:55,024][05217] Saving new best policy, reward=29.710! [2024-09-30 14:08:00,006][02144] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3866624. Throughput: 0: 924.5. Samples: 965782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-30 14:08:00,009][02144] Avg episode reward: [(0, '29.466')] [2024-09-30 14:08:04,859][05230] Updated weights for policy 0, policy_version 950 (0.0019) [2024-09-30 14:08:05,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3891200. Throughput: 0: 925.5. Samples: 972186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-30 14:08:05,012][02144] Avg episode reward: [(0, '30.352')] [2024-09-30 14:08:05,023][05217] Saving new best policy, reward=30.352! [2024-09-30 14:08:10,006][02144] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3907584. Throughput: 0: 970.8. Samples: 978422. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-30 14:08:10,014][02144] Avg episode reward: [(0, '27.077')] [2024-09-30 14:08:15,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3735.1). Total num frames: 3923968. Throughput: 0: 938.8. Samples: 980432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:08:15,009][02144] Avg episode reward: [(0, '27.243')] [2024-09-30 14:08:15,029][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000958_3923968.pth... [2024-09-30 14:08:15,198][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth [2024-09-30 14:08:17,081][05230] Updated weights for policy 0, policy_version 960 (0.0026) [2024-09-30 14:08:20,006][02144] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3944448. Throughput: 0: 909.4. Samples: 985776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-30 14:08:20,013][02144] Avg episode reward: [(0, '26.558')] [2024-09-30 14:08:25,006][02144] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3969024. Throughput: 0: 968.2. Samples: 992728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:08:25,009][02144] Avg episode reward: [(0, '25.204')] [2024-09-30 14:08:25,914][05230] Updated weights for policy 0, policy_version 970 (0.0028) [2024-09-30 14:08:30,006][02144] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3981312. Throughput: 0: 973.9. Samples: 995444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:08:30,012][02144] Avg episode reward: [(0, '25.734')] [2024-09-30 14:08:35,006][02144] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 4001792. Throughput: 0: 924.8. Samples: 999996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-30 14:08:35,008][02144] Avg episode reward: [(0, '24.127')] [2024-09-30 14:08:35,645][05217] Stopping Batcher_0... 
[2024-09-30 14:08:35,646][05217] Loop batcher_evt_loop terminating... [2024-09-30 14:08:35,647][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-30 14:08:35,645][02144] Component Batcher_0 stopped! [2024-09-30 14:08:35,707][05230] Weights refcount: 2 0 [2024-09-30 14:08:35,710][02144] Component InferenceWorker_p0-w0 stopped! [2024-09-30 14:08:35,714][05230] Stopping InferenceWorker_p0-w0... [2024-09-30 14:08:35,715][05230] Loop inference_proc0-0_evt_loop terminating... [2024-09-30 14:08:35,780][05217] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth [2024-09-30 14:08:35,792][05217] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-30 14:08:35,987][05217] Stopping LearnerWorker_p0... [2024-09-30 14:08:35,986][02144] Component LearnerWorker_p0 stopped! [2024-09-30 14:08:35,988][05217] Loop learner_proc0_evt_loop terminating... [2024-09-30 14:08:36,056][05234] Stopping RolloutWorker_w3... [2024-09-30 14:08:36,056][02144] Component RolloutWorker_w3 stopped! [2024-09-30 14:08:36,060][05234] Loop rollout_proc3_evt_loop terminating... [2024-09-30 14:08:36,071][05233] Stopping RolloutWorker_w1... [2024-09-30 14:08:36,071][02144] Component RolloutWorker_w1 stopped! [2024-09-30 14:08:36,081][05238] Stopping RolloutWorker_w7... [2024-09-30 14:08:36,072][05233] Loop rollout_proc1_evt_loop terminating... [2024-09-30 14:08:36,081][02144] Component RolloutWorker_w7 stopped! [2024-09-30 14:08:36,089][05236] Stopping RolloutWorker_w5... [2024-09-30 14:08:36,089][02144] Component RolloutWorker_w5 stopped! [2024-09-30 14:08:36,082][05238] Loop rollout_proc7_evt_loop terminating... [2024-09-30 14:08:36,089][05236] Loop rollout_proc5_evt_loop terminating... [2024-09-30 14:08:36,152][02144] Component RolloutWorker_w6 stopped! [2024-09-30 14:08:36,154][05237] Stopping RolloutWorker_w6... 
[2024-09-30 14:08:36,167][05237] Loop rollout_proc6_evt_loop terminating... [2024-09-30 14:08:36,177][02144] Component RolloutWorker_w2 stopped! [2024-09-30 14:08:36,179][05232] Stopping RolloutWorker_w2... [2024-09-30 14:08:36,190][05232] Loop rollout_proc2_evt_loop terminating... [2024-09-30 14:08:36,208][02144] Component RolloutWorker_w4 stopped! [2024-09-30 14:08:36,210][05235] Stopping RolloutWorker_w4... [2024-09-30 14:08:36,213][05235] Loop rollout_proc4_evt_loop terminating... [2024-09-30 14:08:36,221][02144] Component RolloutWorker_w0 stopped! [2024-09-30 14:08:36,224][02144] Waiting for process learner_proc0 to stop... [2024-09-30 14:08:36,228][05231] Stopping RolloutWorker_w0... [2024-09-30 14:08:36,240][05231] Loop rollout_proc0_evt_loop terminating... [2024-09-30 14:08:37,514][02144] Waiting for process inference_proc0-0 to join... [2024-09-30 14:08:37,518][02144] Waiting for process rollout_proc0 to join... [2024-09-30 14:08:39,498][02144] Waiting for process rollout_proc1 to join... [2024-09-30 14:08:39,655][02144] Waiting for process rollout_proc2 to join... [2024-09-30 14:08:39,659][02144] Waiting for process rollout_proc3 to join... [2024-09-30 14:08:39,664][02144] Waiting for process rollout_proc4 to join... [2024-09-30 14:08:39,667][02144] Waiting for process rollout_proc5 to join... [2024-09-30 14:08:39,672][02144] Waiting for process rollout_proc6 to join... [2024-09-30 14:08:39,677][02144] Waiting for process rollout_proc7 to join... 
[2024-09-30 14:08:39,680][02144] Batcher 0 profile tree view: batching: 27.0077, releasing_batches: 0.0285 [2024-09-30 14:08:39,681][02144] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 411.2638 update_model: 9.1797 weight_update: 0.0017 one_step: 0.0045 handle_policy_step: 608.6833 deserialize: 15.1250, stack: 3.2112, obs_to_device_normalize: 121.9497, forward: 326.5842, send_messages: 29.7051 prepare_outputs: 82.3052 to_cpu: 47.1069 [2024-09-30 14:08:39,684][02144] Learner 0 profile tree view: misc: 0.0058, prepare_batch: 14.2126 train: 74.0950 epoch_init: 0.0145, minibatch_init: 0.0127, losses_postprocess: 0.6513, kl_divergence: 0.7030, after_optimizer: 33.5138 calculate_losses: 26.4556 losses_init: 0.0036, forward_head: 1.1708, bptt_initial: 17.7452, tail: 1.2211, advantages_returns: 0.2880, losses: 3.8268 bptt: 1.8964 bptt_forward_core: 1.8208 update: 12.1234 clip: 0.8854 [2024-09-30 14:08:39,685][02144] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3922, enqueue_policy_requests: 101.1380, env_step: 832.9282, overhead: 14.0323, complete_rollouts: 7.4122 save_policy_outputs: 21.7558 split_output_tensors: 8.6602 [2024-09-30 14:08:39,686][02144] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3266, enqueue_policy_requests: 104.5017, env_step: 830.5496, overhead: 14.0815, complete_rollouts: 6.2295 save_policy_outputs: 21.5388 split_output_tensors: 8.4054 [2024-09-30 14:08:39,688][02144] Loop Runner_EvtLoop terminating... [2024-09-30 14:08:39,689][02144] Runner profile tree view: main_loop: 1099.3995 [2024-09-30 14:08:39,690][02144] Collected {0: 4005888}, FPS: 3643.7 [2024-09-30 14:09:48,548][02144] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-30 14:09:48,550][02144] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-30 14:09:48,552][02144] Adding new argument 'no_render'=True that is not in the saved config file! 
[2024-09-30 14:09:48,554][02144] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-30 14:09:48,555][02144] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-30 14:09:48,557][02144] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-30 14:09:48,559][02144] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-30 14:09:48,560][02144] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-30 14:09:48,561][02144] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-30 14:09:48,562][02144] Adding new argument 'hf_repository'='caiiofc/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-30 14:09:48,563][02144] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-30 14:09:48,564][02144] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-30 14:09:48,565][02144] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-30 14:09:48,566][02144] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-30 14:09:48,567][02144] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-30 14:09:48,601][02144] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-30 14:09:48,605][02144] RunningMeanStd input shape: (3, 72, 128) [2024-09-30 14:09:48,607][02144] RunningMeanStd input shape: (1,) [2024-09-30 14:09:48,623][02144] ConvEncoder: input_channels=3 [2024-09-30 14:09:48,724][02144] Conv encoder output size: 512 [2024-09-30 14:09:48,726][02144] Policy head output size: 512 [2024-09-30 14:09:49,009][02144] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-30 14:09:49,825][02144] Num frames 100... 
[2024-09-30 14:09:49,955][02144] Num frames 200... [2024-09-30 14:09:50,078][02144] Num frames 300... [2024-09-30 14:09:50,202][02144] Num frames 400... [2024-09-30 14:09:50,332][02144] Num frames 500... [2024-09-30 14:09:50,453][02144] Num frames 600... [2024-09-30 14:09:50,519][02144] Avg episode rewards: #0: 11.080, true rewards: #0: 6.080 [2024-09-30 14:09:50,522][02144] Avg episode reward: 11.080, avg true_objective: 6.080 [2024-09-30 14:09:50,636][02144] Num frames 700... [2024-09-30 14:09:50,760][02144] Num frames 800... [2024-09-30 14:09:50,893][02144] Num frames 900... [2024-09-30 14:09:51,021][02144] Num frames 1000... [2024-09-30 14:09:51,152][02144] Num frames 1100... [2024-09-30 14:09:51,280][02144] Num frames 1200... [2024-09-30 14:09:51,408][02144] Num frames 1300... [2024-09-30 14:09:51,530][02144] Num frames 1400... [2024-09-30 14:09:51,653][02144] Num frames 1500... [2024-09-30 14:09:51,820][02144] Num frames 1600... [2024-09-30 14:09:51,991][02144] Num frames 1700... [2024-09-30 14:09:52,166][02144] Num frames 1800... [2024-09-30 14:09:52,338][02144] Num frames 1900... [2024-09-30 14:09:52,540][02144] Num frames 2000... [2024-09-30 14:09:52,723][02144] Num frames 2100... [2024-09-30 14:09:52,905][02144] Num frames 2200... [2024-09-30 14:09:53,096][02144] Num frames 2300... [2024-09-30 14:09:53,270][02144] Avg episode rewards: #0: 26.340, true rewards: #0: 11.840 [2024-09-30 14:09:53,273][02144] Avg episode reward: 26.340, avg true_objective: 11.840 [2024-09-30 14:09:53,343][02144] Num frames 2400... [2024-09-30 14:09:53,523][02144] Num frames 2500... [2024-09-30 14:09:53,697][02144] Num frames 2600... [2024-09-30 14:09:53,872][02144] Num frames 2700... [2024-09-30 14:09:54,056][02144] Num frames 2800... [2024-09-30 14:09:54,239][02144] Num frames 2900... [2024-09-30 14:09:54,366][02144] Num frames 3000... [2024-09-30 14:09:54,496][02144] Num frames 3100... [2024-09-30 14:09:54,619][02144] Num frames 3200... 
[2024-09-30 14:09:54,742][02144] Num frames 3300... [2024-09-30 14:09:54,879][02144] Num frames 3400... [2024-09-30 14:09:55,003][02144] Num frames 3500... [2024-09-30 14:09:55,130][02144] Num frames 3600... [2024-09-30 14:09:55,256][02144] Num frames 3700... [2024-09-30 14:09:55,404][02144] Avg episode rewards: #0: 28.586, true rewards: #0: 12.587 [2024-09-30 14:09:55,405][02144] Avg episode reward: 28.586, avg true_objective: 12.587 [2024-09-30 14:09:55,439][02144] Num frames 3800... [2024-09-30 14:09:55,566][02144] Num frames 3900... [2024-09-30 14:09:55,686][02144] Num frames 4000... [2024-09-30 14:09:55,817][02144] Num frames 4100... [2024-09-30 14:09:55,950][02144] Num frames 4200... [2024-09-30 14:09:56,076][02144] Num frames 4300... [2024-09-30 14:09:56,196][02144] Num frames 4400... [2024-09-30 14:09:56,320][02144] Num frames 4500... [2024-09-30 14:09:56,391][02144] Avg episode rewards: #0: 24.530, true rewards: #0: 11.280 [2024-09-30 14:09:56,392][02144] Avg episode reward: 24.530, avg true_objective: 11.280 [2024-09-30 14:09:56,503][02144] Num frames 4600... [2024-09-30 14:09:56,633][02144] Num frames 4700... [2024-09-30 14:09:56,758][02144] Num frames 4800... [2024-09-30 14:09:56,904][02144] Avg episode rewards: #0: 20.528, true rewards: #0: 9.728 [2024-09-30 14:09:56,906][02144] Avg episode reward: 20.528, avg true_objective: 9.728 [2024-09-30 14:09:56,958][02144] Num frames 4900... [2024-09-30 14:09:57,090][02144] Num frames 5000... [2024-09-30 14:09:57,215][02144] Num frames 5100... [2024-09-30 14:09:57,337][02144] Num frames 5200... [2024-09-30 14:09:57,460][02144] Num frames 5300... [2024-09-30 14:09:57,590][02144] Num frames 5400... [2024-09-30 14:09:57,712][02144] Num frames 5500... [2024-09-30 14:09:57,846][02144] Num frames 5600... [2024-09-30 14:09:57,976][02144] Num frames 5700... [2024-09-30 14:09:58,105][02144] Num frames 5800... [2024-09-30 14:09:58,231][02144] Num frames 5900... [2024-09-30 14:09:58,359][02144] Num frames 6000... 
[2024-09-30 14:09:58,478][02144] Num frames 6100... [2024-09-30 14:09:58,606][02144] Num frames 6200... [2024-09-30 14:09:58,738][02144] Num frames 6300... [2024-09-30 14:09:58,873][02144] Num frames 6400... [2024-09-30 14:09:58,995][02144] Num frames 6500... [2024-09-30 14:09:59,121][02144] Num frames 6600... [2024-09-30 14:09:59,243][02144] Num frames 6700... [2024-09-30 14:09:59,365][02144] Num frames 6800... [2024-09-30 14:09:59,487][02144] Num frames 6900... [2024-09-30 14:09:59,600][02144] Avg episode rewards: #0: 26.406, true rewards: #0: 11.573 [2024-09-30 14:09:59,604][02144] Avg episode reward: 26.406, avg true_objective: 11.573 [2024-09-30 14:09:59,675][02144] Num frames 7000... [2024-09-30 14:09:59,795][02144] Num frames 7100... [2024-09-30 14:09:59,910][02144] Avg episode rewards: #0: 23.345, true rewards: #0: 10.203 [2024-09-30 14:09:59,913][02144] Avg episode reward: 23.345, avg true_objective: 10.203 [2024-09-30 14:09:59,985][02144] Num frames 7200... [2024-09-30 14:10:00,110][02144] Num frames 7300... [2024-09-30 14:10:00,236][02144] Num frames 7400... [2024-09-30 14:10:00,360][02144] Num frames 7500... [2024-09-30 14:10:00,484][02144] Num frames 7600... [2024-09-30 14:10:00,614][02144] Num frames 7700... [2024-09-30 14:10:00,736][02144] Num frames 7800... [2024-09-30 14:10:00,871][02144] Num frames 7900... [2024-09-30 14:10:01,003][02144] Num frames 8000... [2024-09-30 14:10:01,068][02144] Avg episode rewards: #0: 23.007, true rewards: #0: 10.007 [2024-09-30 14:10:01,070][02144] Avg episode reward: 23.007, avg true_objective: 10.007 [2024-09-30 14:10:01,189][02144] Num frames 8100... [2024-09-30 14:10:01,313][02144] Num frames 8200... [2024-09-30 14:10:01,437][02144] Num frames 8300... [2024-09-30 14:10:01,559][02144] Num frames 8400... [2024-09-30 14:10:01,690][02144] Num frames 8500... [2024-09-30 14:10:01,817][02144] Num frames 8600... [2024-09-30 14:10:01,947][02144] Num frames 8700... [2024-09-30 14:10:02,077][02144] Num frames 8800... 
[2024-09-30 14:10:02,165][02144] Avg episode rewards: #0: 22.027, true rewards: #0: 9.804 [2024-09-30 14:10:02,167][02144] Avg episode reward: 22.027, avg true_objective: 9.804 [2024-09-30 14:10:02,260][02144] Num frames 8900... [2024-09-30 14:10:02,382][02144] Num frames 9000... [2024-09-30 14:10:02,507][02144] Num frames 9100... [2024-09-30 14:10:02,634][02144] Num frames 9200... [2024-09-30 14:10:02,798][02144] Avg episode rewards: #0: 20.677, true rewards: #0: 9.277 [2024-09-30 14:10:02,799][02144] Avg episode reward: 20.677, avg true_objective: 9.277 [2024-09-30 14:10:58,713][02144] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-30 14:13:19,745][02144] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-30 14:13:19,750][02144] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-30 14:13:19,754][02144] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-30 14:13:19,758][02144] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-30 14:13:19,764][02144] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-30 14:13:19,768][02144] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-30 14:13:19,770][02144] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-30 14:13:19,780][02144] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-30 14:13:19,787][02144] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-30 14:13:19,791][02144] Adding new argument 'hf_repository'='caiiofc/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-30 14:13:19,795][02144] Adding new argument 'policy_index'=0 that is not in the saved config file! 
[2024-09-30 14:13:19,797][02144] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-30 14:13:19,800][02144] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-30 14:13:19,812][02144] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-30 14:13:19,815][02144] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-30 14:13:19,911][02144] RunningMeanStd input shape: (3, 72, 128) [2024-09-30 14:13:19,917][02144] RunningMeanStd input shape: (1,) [2024-09-30 14:13:19,936][02144] ConvEncoder: input_channels=3 [2024-09-30 14:13:20,002][02144] Conv encoder output size: 512 [2024-09-30 14:13:20,007][02144] Policy head output size: 512 [2024-09-30 14:13:20,047][02144] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-30 14:13:20,562][02144] Num frames 100... [2024-09-30 14:13:20,687][02144] Num frames 200... [2024-09-30 14:13:20,811][02144] Num frames 300... [2024-09-30 14:13:20,954][02144] Num frames 400... [2024-09-30 14:13:21,080][02144] Num frames 500... [2024-09-30 14:13:21,203][02144] Num frames 600... [2024-09-30 14:13:21,333][02144] Num frames 700... [2024-09-30 14:13:21,456][02144] Num frames 800... [2024-09-30 14:13:21,582][02144] Num frames 900... [2024-09-30 14:13:21,708][02144] Num frames 1000... [2024-09-30 14:13:21,812][02144] Avg episode rewards: #0: 25.390, true rewards: #0: 10.390 [2024-09-30 14:13:21,813][02144] Avg episode reward: 25.390, avg true_objective: 10.390 [2024-09-30 14:13:21,899][02144] Num frames 1100... [2024-09-30 14:13:22,022][02144] Num frames 1200... [2024-09-30 14:13:22,148][02144] Num frames 1300... [2024-09-30 14:13:22,281][02144] Num frames 1400... [2024-09-30 14:13:22,403][02144] Num frames 1500... [2024-09-30 14:13:22,524][02144] Num frames 1600... [2024-09-30 14:13:22,643][02144] Num frames 1700... 
[2024-09-30 14:13:22,764][02144] Num frames 1800... [2024-09-30 14:13:22,832][02144] Avg episode rewards: #0: 20.035, true rewards: #0: 9.035 [2024-09-30 14:13:22,834][02144] Avg episode reward: 20.035, avg true_objective: 9.035 [2024-09-30 14:13:22,952][02144] Num frames 1900... [2024-09-30 14:13:23,074][02144] Num frames 2000... [2024-09-30 14:13:23,193][02144] Num frames 2100... [2024-09-30 14:13:23,321][02144] Num frames 2200... [2024-09-30 14:13:23,444][02144] Num frames 2300... [2024-09-30 14:13:23,571][02144] Num frames 2400... [2024-09-30 14:13:23,694][02144] Num frames 2500... [2024-09-30 14:13:23,817][02144] Num frames 2600... [2024-09-30 14:13:23,947][02144] Num frames 2700... [2024-09-30 14:13:24,073][02144] Num frames 2800... [2024-09-30 14:13:24,197][02144] Num frames 2900... [2024-09-30 14:13:24,333][02144] Num frames 3000... [2024-09-30 14:13:24,460][02144] Num frames 3100... [2024-09-30 14:13:24,586][02144] Num frames 3200... [2024-09-30 14:13:24,742][02144] Avg episode rewards: #0: 24.597, true rewards: #0: 10.930 [2024-09-30 14:13:24,744][02144] Avg episode reward: 24.597, avg true_objective: 10.930 [2024-09-30 14:13:24,774][02144] Num frames 3300... [2024-09-30 14:13:24,907][02144] Num frames 3400... [2024-09-30 14:13:25,038][02144] Num frames 3500... [2024-09-30 14:13:25,159][02144] Num frames 3600... [2024-09-30 14:13:25,284][02144] Num frames 3700... [2024-09-30 14:13:25,411][02144] Num frames 3800... [2024-09-30 14:13:25,534][02144] Num frames 3900... [2024-09-30 14:13:25,658][02144] Num frames 4000... [2024-09-30 14:13:25,732][02144] Avg episode rewards: #0: 22.788, true rewards: #0: 10.037 [2024-09-30 14:13:25,734][02144] Avg episode reward: 22.788, avg true_objective: 10.037 [2024-09-30 14:13:25,849][02144] Num frames 4100... [2024-09-30 14:13:25,970][02144] Num frames 4200... [2024-09-30 14:13:26,102][02144] Num frames 4300... [2024-09-30 14:13:26,275][02144] Num frames 4400... [2024-09-30 14:13:26,457][02144] Num frames 4500... 
[2024-09-30 14:13:26,625][02144] Num frames 4600... [2024-09-30 14:13:26,801][02144] Num frames 4700... [2024-09-30 14:13:26,970][02144] Num frames 4800... [2024-09-30 14:13:27,145][02144] Num frames 4900... [2024-09-30 14:13:27,310][02144] Num frames 5000... [2024-09-30 14:13:27,491][02144] Num frames 5100... [2024-09-30 14:13:27,664][02144] Num frames 5200... [2024-09-30 14:13:27,808][02144] Avg episode rewards: #0: 24.502, true rewards: #0: 10.502 [2024-09-30 14:13:27,811][02144] Avg episode reward: 24.502, avg true_objective: 10.502 [2024-09-30 14:13:27,902][02144] Num frames 5300... [2024-09-30 14:13:28,079][02144] Num frames 5400... [2024-09-30 14:13:28,256][02144] Num frames 5500... [2024-09-30 14:13:28,428][02144] Num frames 5600... [2024-09-30 14:13:28,615][02144] Num frames 5700... [2024-09-30 14:13:28,737][02144] Num frames 5800... [2024-09-30 14:13:28,842][02144] Avg episode rewards: #0: 22.067, true rewards: #0: 9.733 [2024-09-30 14:13:28,844][02144] Avg episode reward: 22.067, avg true_objective: 9.733 [2024-09-30 14:13:28,919][02144] Num frames 5900... [2024-09-30 14:13:29,050][02144] Num frames 6000... [2024-09-30 14:13:29,170][02144] Num frames 6100... [2024-09-30 14:13:29,291][02144] Num frames 6200... [2024-09-30 14:13:29,417][02144] Num frames 6300... [2024-09-30 14:13:29,545][02144] Num frames 6400... [2024-09-30 14:13:29,668][02144] Num frames 6500... [2024-09-30 14:13:29,793][02144] Num frames 6600... [2024-09-30 14:13:29,924][02144] Num frames 6700... [2024-09-30 14:13:30,061][02144] Avg episode rewards: #0: 22.097, true rewards: #0: 9.669 [2024-09-30 14:13:30,064][02144] Avg episode reward: 22.097, avg true_objective: 9.669 [2024-09-30 14:13:30,105][02144] Num frames 6800... [2024-09-30 14:13:30,225][02144] Num frames 6900... [2024-09-30 14:13:30,351][02144] Num frames 7000... [2024-09-30 14:13:30,476][02144] Num frames 7100... [2024-09-30 14:13:30,608][02144] Num frames 7200... [2024-09-30 14:13:30,730][02144] Num frames 7300... 
[2024-09-30 14:13:30,862][02144] Num frames 7400... [2024-09-30 14:13:30,986][02144] Num frames 7500... [2024-09-30 14:13:31,114][02144] Num frames 7600... [2024-09-30 14:13:31,235][02144] Num frames 7700... [2024-09-30 14:13:31,357][02144] Num frames 7800... [2024-09-30 14:13:31,507][02144] Avg episode rewards: #0: 22.473, true rewards: #0: 9.847 [2024-09-30 14:13:31,509][02144] Avg episode reward: 22.473, avg true_objective: 9.847 [2024-09-30 14:13:31,539][02144] Num frames 7900... [2024-09-30 14:13:31,667][02144] Num frames 8000... [2024-09-30 14:13:31,789][02144] Num frames 8100... [2024-09-30 14:13:31,920][02144] Num frames 8200... [2024-09-30 14:13:32,041][02144] Num frames 8300... [2024-09-30 14:13:32,163][02144] Num frames 8400... [2024-09-30 14:13:32,290][02144] Num frames 8500... [2024-09-30 14:13:32,412][02144] Num frames 8600... [2024-09-30 14:13:32,532][02144] Num frames 8700... [2024-09-30 14:13:32,662][02144] Num frames 8800... [2024-09-30 14:13:32,787][02144] Num frames 8900... [2024-09-30 14:13:32,918][02144] Num frames 9000... [2024-09-30 14:13:33,043][02144] Num frames 9100... [2024-09-30 14:13:33,166][02144] Num frames 9200... [2024-09-30 14:13:33,288][02144] Num frames 9300... [2024-09-30 14:13:33,417][02144] Num frames 9400... [2024-09-30 14:13:33,539][02144] Num frames 9500... [2024-09-30 14:13:33,675][02144] Num frames 9600... [2024-09-30 14:13:33,796][02144] Num frames 9700... [2024-09-30 14:13:33,934][02144] Num frames 9800... [2024-09-30 14:13:34,058][02144] Num frames 9900... [2024-09-30 14:13:34,213][02144] Avg episode rewards: #0: 26.087, true rewards: #0: 11.087 [2024-09-30 14:13:34,215][02144] Avg episode reward: 26.087, avg true_objective: 11.087 [2024-09-30 14:13:34,247][02144] Num frames 10000... [2024-09-30 14:13:34,374][02144] Num frames 10100... [2024-09-30 14:13:34,496][02144] Num frames 10200... [2024-09-30 14:13:34,621][02144] Num frames 10300... [2024-09-30 14:13:34,752][02144] Num frames 10400... 
[2024-09-30 14:13:34,892][02144] Num frames 10500... [2024-09-30 14:13:35,018][02144] Num frames 10600... [2024-09-30 14:13:35,145][02144] Num frames 10700... [2024-09-30 14:13:35,305][02144] Avg episode rewards: #0: 25.186, true rewards: #0: 10.786 [2024-09-30 14:13:35,307][02144] Avg episode reward: 25.186, avg true_objective: 10.786 [2024-09-30 14:14:39,327][02144] Replay video saved to /content/train_dir/default_experiment/replay.mp4!