[2023-12-27 05:17:03,044][00255] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-12-27 05:17:03,046][00255] Rollout worker 0 uses device cpu
[2023-12-27 05:17:03,048][00255] Rollout worker 1 uses device cpu
[2023-12-27 05:17:03,049][00255] Rollout worker 2 uses device cpu
[2023-12-27 05:17:03,050][00255] Rollout worker 3 uses device cpu
[2023-12-27 05:17:03,056][00255] Rollout worker 4 uses device cpu
[2023-12-27 05:17:03,057][00255] Rollout worker 5 uses device cpu
[2023-12-27 05:17:03,058][00255] Rollout worker 6 uses device cpu
[2023-12-27 05:17:03,059][00255] Rollout worker 7 uses device cpu
[2023-12-27 05:17:03,252][00255] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-27 05:17:03,253][00255] InferenceWorker_p0-w0: min num requests: 2
[2023-12-27 05:17:03,288][00255] Starting all processes...
[2023-12-27 05:17:03,290][00255] Starting process learner_proc0
[2023-12-27 05:17:03,339][00255] Starting all processes...
[2023-12-27 05:17:03,348][00255] Starting process inference_proc0-0
[2023-12-27 05:17:03,348][00255] Starting process rollout_proc0
[2023-12-27 05:17:03,350][00255] Starting process rollout_proc1
[2023-12-27 05:17:03,351][00255] Starting process rollout_proc2
[2023-12-27 05:17:03,351][00255] Starting process rollout_proc3
[2023-12-27 05:17:03,351][00255] Starting process rollout_proc4
[2023-12-27 05:17:03,351][00255] Starting process rollout_proc5
[2023-12-27 05:17:03,351][00255] Starting process rollout_proc6
[2023-12-27 05:17:03,351][00255] Starting process rollout_proc7
[2023-12-27 05:17:19,993][01473] Worker 2 uses CPU cores [0]
[2023-12-27 05:17:20,000][01457] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-27 05:17:20,000][01457] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-12-27 05:17:20,074][01476] Worker 5 uses CPU cores [1]
[2023-12-27 05:17:20,077][01457] Num visible devices: 1
[2023-12-27 05:17:20,120][01457] Starting seed is not provided
[2023-12-27 05:17:20,121][01457] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-27 05:17:20,121][01457] Initializing actor-critic model on device cuda:0
[2023-12-27 05:17:20,122][01457] RunningMeanStd input shape: (3, 72, 128)
[2023-12-27 05:17:20,124][01457] RunningMeanStd input shape: (1,)
[2023-12-27 05:17:20,218][01457] ConvEncoder: input_channels=3
[2023-12-27 05:17:20,246][01471] Worker 0 uses CPU cores [0]
[2023-12-27 05:17:20,268][01470] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-27 05:17:20,282][01470] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-12-27 05:17:20,422][01470] Num visible devices: 1
[2023-12-27 05:17:20,450][01477] Worker 6 uses CPU cores [0]
[2023-12-27 05:17:20,511][01475] Worker 4 uses CPU cores [0]
[2023-12-27 05:17:20,586][01472] Worker 1 uses CPU cores [1]
[2023-12-27 05:17:20,589][01474] Worker 3 uses CPU cores [1]
[2023-12-27 05:17:20,648][01478] Worker 7 uses CPU cores [1]
[2023-12-27 05:17:20,697][01457] Conv encoder output size: 512
[2023-12-27 05:17:20,698][01457] Policy head output size: 512
[2023-12-27 05:17:20,756][01457] Created Actor Critic model with architecture:
[2023-12-27 05:17:20,756][01457] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-12-27 05:17:21,071][01457] Using optimizer
[2023-12-27 05:17:22,299][01457] No checkpoints found
[2023-12-27 05:17:22,300][01457] Did not load from checkpoint, starting from scratch!
[2023-12-27 05:17:22,300][01457] Initialized policy 0 weights for model version 0
[2023-12-27 05:17:22,304][01457] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-12-27 05:17:22,311][01457] LearnerWorker_p0 finished initialization!
[2023-12-27 05:17:22,400][01470] RunningMeanStd input shape: (3, 72, 128)
[2023-12-27 05:17:22,402][01470] RunningMeanStd input shape: (1,)
[2023-12-27 05:17:22,414][01470] ConvEncoder: input_channels=3
[2023-12-27 05:17:22,514][01470] Conv encoder output size: 512
[2023-12-27 05:17:22,515][01470] Policy head output size: 512
[2023-12-27 05:17:22,580][00255] Inference worker 0-0 is ready!
[2023-12-27 05:17:22,582][00255] All inference workers are ready! Signal rollout workers to start!
[2023-12-27 05:17:22,810][01478] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:22,812][01472] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:22,814][01474] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:22,816][01476] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:22,812][01471] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:22,809][01475] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:22,816][01473] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:22,819][01477] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:17:23,244][00255] Heartbeat connected on Batcher_0
[2023-12-27 05:17:23,250][00255] Heartbeat connected on LearnerWorker_p0
[2023-12-27 05:17:23,286][00255] Heartbeat connected on InferenceWorker_p0-w0
[2023-12-27 05:17:23,836][01475] Decorrelating experience for 0 frames...
[2023-12-27 05:17:23,838][01471] Decorrelating experience for 0 frames...
[2023-12-27 05:17:24,155][01474] Decorrelating experience for 0 frames...
[2023-12-27 05:17:24,160][01478] Decorrelating experience for 0 frames...
[2023-12-27 05:17:24,159][01476] Decorrelating experience for 0 frames...
[2023-12-27 05:17:24,827][01471] Decorrelating experience for 32 frames...
[2023-12-27 05:17:24,830][01475] Decorrelating experience for 32 frames...
[2023-12-27 05:17:25,314][01477] Decorrelating experience for 0 frames...
[2023-12-27 05:17:25,400][01476] Decorrelating experience for 32 frames...
[2023-12-27 05:17:25,397][01474] Decorrelating experience for 32 frames...
[2023-12-27 05:17:25,419][01472] Decorrelating experience for 0 frames...
[2023-12-27 05:17:26,261][01478] Decorrelating experience for 32 frames...
[2023-12-27 05:17:26,551][01475] Decorrelating experience for 64 frames...
[2023-12-27 05:17:26,660][01472] Decorrelating experience for 32 frames...
[2023-12-27 05:17:26,865][01477] Decorrelating experience for 32 frames...
[2023-12-27 05:17:26,913][01473] Decorrelating experience for 0 frames...
[2023-12-27 05:17:26,950][00255] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-12-27 05:17:26,975][01476] Decorrelating experience for 64 frames...
[2023-12-27 05:17:27,053][01471] Decorrelating experience for 64 frames...
[2023-12-27 05:17:27,933][01478] Decorrelating experience for 64 frames...
[2023-12-27 05:17:28,125][01474] Decorrelating experience for 64 frames...
[2023-12-27 05:17:28,198][01472] Decorrelating experience for 64 frames...
[2023-12-27 05:17:28,482][01475] Decorrelating experience for 96 frames...
[2023-12-27 05:17:28,546][01473] Decorrelating experience for 32 frames...
[2023-12-27 05:17:28,765][00255] Heartbeat connected on RolloutWorker_w4
[2023-12-27 05:17:28,919][01471] Decorrelating experience for 96 frames...
[2023-12-27 05:17:29,087][01477] Decorrelating experience for 64 frames...
[2023-12-27 05:17:29,544][00255] Heartbeat connected on RolloutWorker_w0
[2023-12-27 05:17:30,304][01476] Decorrelating experience for 96 frames...
[2023-12-27 05:17:30,472][01474] Decorrelating experience for 96 frames...
[2023-12-27 05:17:30,833][00255] Heartbeat connected on RolloutWorker_w5
[2023-12-27 05:17:30,969][00255] Heartbeat connected on RolloutWorker_w3
[2023-12-27 05:17:31,873][01478] Decorrelating experience for 96 frames...
[2023-12-27 05:17:31,951][00255] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 50.8. Samples: 254. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-12-27 05:17:31,953][00255] Avg episode reward: [(0, '0.640')]
[2023-12-27 05:17:32,055][01472] Decorrelating experience for 96 frames...
[2023-12-27 05:17:32,128][01473] Decorrelating experience for 64 frames...
[2023-12-27 05:17:32,553][00255] Heartbeat connected on RolloutWorker_w7
[2023-12-27 05:17:33,051][00255] Heartbeat connected on RolloutWorker_w1
[2023-12-27 05:17:36,595][01477] Decorrelating experience for 96 frames...
[2023-12-27 05:17:36,951][00255] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 111.0. Samples: 1110. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-12-27 05:17:36,958][00255] Avg episode reward: [(0, '2.331')]
[2023-12-27 05:17:37,514][00255] Heartbeat connected on RolloutWorker_w6
[2023-12-27 05:17:37,596][01473] Decorrelating experience for 96 frames...
[2023-12-27 05:17:38,522][00255] Heartbeat connected on RolloutWorker_w2
[2023-12-27 05:17:39,501][01457] Signal inference workers to stop experience collection...
[2023-12-27 05:17:39,514][01470] InferenceWorker_p0-w0: stopping experience collection
[2023-12-27 05:17:41,102][01457] Signal inference workers to resume experience collection...
[2023-12-27 05:17:41,105][01470] InferenceWorker_p0-w0: resuming experience collection
[2023-12-27 05:17:41,951][00255] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 194.5. Samples: 2918. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-12-27 05:17:41,959][00255] Avg episode reward: [(0, '2.863')]
[2023-12-27 05:17:46,950][00255] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 365.8. Samples: 7316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-12-27 05:17:46,961][00255] Avg episode reward: [(0, '3.513')]
[2023-12-27 05:17:51,950][00255] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 337.5. Samples: 8438. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-27 05:17:51,952][00255] Avg episode reward: [(0, '3.727')]
[2023-12-27 05:17:53,410][01470] Updated weights for policy 0, policy_version 10 (0.0194)
[2023-12-27 05:17:56,953][00255] Fps is (10 sec: 1638.0, 60 sec: 1501.8, 300 sec: 1501.8). Total num frames: 45056. Throughput: 0: 386.5. Samples: 11596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-27 05:17:56,958][00255] Avg episode reward: [(0, '4.055')]
[2023-12-27 05:18:01,951][00255] Fps is (10 sec: 2457.5, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 450.6. Samples: 15772. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:18:01,953][00255] Avg episode reward: [(0, '4.328')]
[2023-12-27 05:18:06,697][01470] Updated weights for policy 0, policy_version 20 (0.0068)
[2023-12-27 05:18:06,951][00255] Fps is (10 sec: 3687.2, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 81920. Throughput: 0: 470.5. Samples: 18822. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-12-27 05:18:06,953][00255] Avg episode reward: [(0, '4.475')]
[2023-12-27 05:18:11,954][00255] Fps is (10 sec: 3685.4, 60 sec: 2184.4, 300 sec: 2184.4). Total num frames: 98304. Throughput: 0: 541.3. Samples: 24362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:18:11,961][00255] Avg episode reward: [(0, '4.439')]
[2023-12-27 05:18:16,954][00255] Fps is (10 sec: 2866.3, 60 sec: 2211.7, 300 sec: 2211.7). Total num frames: 110592. Throughput: 0: 623.3. Samples: 28306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:18:16,956][00255] Avg episode reward: [(0, '4.344')]
[2023-12-27 05:18:16,964][01457] Saving new best policy, reward=4.344!
[2023-12-27 05:18:21,093][01470] Updated weights for policy 0, policy_version 30 (0.0031)
[2023-12-27 05:18:21,951][00255] Fps is (10 sec: 2458.4, 60 sec: 2234.2, 300 sec: 2234.2). Total num frames: 122880. Throughput: 0: 646.2. Samples: 30188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:18:21,958][00255] Avg episode reward: [(0, '4.396')]
[2023-12-27 05:18:21,963][01457] Saving new best policy, reward=4.396!
[2023-12-27 05:18:26,950][00255] Fps is (10 sec: 3277.8, 60 sec: 2389.3, 300 sec: 2389.3). Total num frames: 143360. Throughput: 0: 729.6. Samples: 35750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:18:26,957][00255] Avg episode reward: [(0, '4.449')]
[2023-12-27 05:18:26,972][01457] Saving new best policy, reward=4.449!
[2023-12-27 05:18:31,596][01470] Updated weights for policy 0, policy_version 40 (0.0019)
[2023-12-27 05:18:31,958][00255] Fps is (10 sec: 4093.0, 60 sec: 2730.3, 300 sec: 2520.3). Total num frames: 163840. Throughput: 0: 757.5. Samples: 41408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-27 05:18:31,961][00255] Avg episode reward: [(0, '4.386')]
[2023-12-27 05:18:36,950][00255] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2516.1). Total num frames: 176128. Throughput: 0: 775.5. Samples: 43334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:18:36,959][00255] Avg episode reward: [(0, '4.497')]
[2023-12-27 05:18:36,974][01457] Saving new best policy, reward=4.497!
[2023-12-27 05:18:41,951][00255] Fps is (10 sec: 2459.4, 60 sec: 3072.0, 300 sec: 2512.2). Total num frames: 188416. Throughput: 0: 789.8. Samples: 47134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:18:41,957][00255] Avg episode reward: [(0, '4.517')]
[2023-12-27 05:18:41,965][01457] Saving new best policy, reward=4.517!
[2023-12-27 05:18:45,740][01470] Updated weights for policy 0, policy_version 50 (0.0029)
[2023-12-27 05:18:46,951][00255] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2560.0). Total num frames: 204800. Throughput: 0: 816.6. Samples: 52518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-27 05:18:46,958][00255] Avg episode reward: [(0, '4.486')]
[2023-12-27 05:18:51,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2650.4). Total num frames: 225280. Throughput: 0: 807.3. Samples: 55152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:18:51,956][00255] Avg episode reward: [(0, '4.421')]
[2023-12-27 05:18:56,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 2639.6). Total num frames: 237568. Throughput: 0: 789.8. Samples: 59902. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-27 05:18:56,955][00255] Avg episode reward: [(0, '4.566')]
[2023-12-27 05:18:56,968][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth...
[2023-12-27 05:18:57,152][01457] Saving new best policy, reward=4.566!
[2023-12-27 05:18:58,801][01470] Updated weights for policy 0, policy_version 60 (0.0026)
[2023-12-27 05:19:01,954][00255] Fps is (10 sec: 2456.8, 60 sec: 3140.1, 300 sec: 2630.0). Total num frames: 249856. Throughput: 0: 790.2. Samples: 63866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:19:01,956][00255] Avg episode reward: [(0, '4.678')]
[2023-12-27 05:19:02,001][01457] Saving new best policy, reward=4.678!
[2023-12-27 05:19:06,951][00255] Fps is (10 sec: 3276.7, 60 sec: 3140.2, 300 sec: 2703.3). Total num frames: 270336. Throughput: 0: 799.6. Samples: 66172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:19:06,953][00255] Avg episode reward: [(0, '4.631')]
[2023-12-27 05:19:10,452][01470] Updated weights for policy 0, policy_version 70 (0.0032)
[2023-12-27 05:19:11,950][00255] Fps is (10 sec: 4097.3, 60 sec: 3208.7, 300 sec: 2769.7). Total num frames: 290816. Throughput: 0: 810.2. Samples: 72210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:19:11,953][00255] Avg episode reward: [(0, '4.506')]
[2023-12-27 05:19:16,952][00255] Fps is (10 sec: 3276.6, 60 sec: 3208.6, 300 sec: 2755.5). Total num frames: 303104. Throughput: 0: 790.4. Samples: 76970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:19:16,954][00255] Avg episode reward: [(0, '4.476')]
[2023-12-27 05:19:21,951][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2742.5). Total num frames: 315392. Throughput: 0: 790.1. Samples: 78888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-27 05:19:21,953][00255] Avg episode reward: [(0, '4.537')]
[2023-12-27 05:19:25,139][01470] Updated weights for policy 0, policy_version 80 (0.0026)
[2023-12-27 05:19:26,950][00255] Fps is (10 sec: 2867.5, 60 sec: 3140.3, 300 sec: 2764.8). Total num frames: 331776. Throughput: 0: 795.6. Samples: 82934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:19:26,954][00255] Avg episode reward: [(0, '4.569')]
[2023-12-27 05:19:31,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3140.6, 300 sec: 2818.0). Total num frames: 352256. Throughput: 0: 810.9. Samples: 89010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:19:31,953][00255] Avg episode reward: [(0, '4.420')]
[2023-12-27 05:19:35,530][01470] Updated weights for policy 0, policy_version 90 (0.0023)
[2023-12-27 05:19:36,953][00255] Fps is (10 sec: 3685.7, 60 sec: 3208.4, 300 sec: 2835.6). Total num frames: 368640. Throughput: 0: 820.1. Samples: 92060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:19:36,960][00255] Avg episode reward: [(0, '4.250')]
[2023-12-27 05:19:41,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2821.7). Total num frames: 380928. Throughput: 0: 795.6. Samples: 95706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:19:41,958][00255] Avg episode reward: [(0, '4.424')]
[2023-12-27 05:19:46,950][00255] Fps is (10 sec: 2867.8, 60 sec: 3208.5, 300 sec: 2837.9). Total num frames: 397312. Throughput: 0: 798.6. Samples: 99802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:19:46,953][00255] Avg episode reward: [(0, '4.420')]
[2023-12-27 05:19:49,678][01470] Updated weights for policy 0, policy_version 100 (0.0024)
[2023-12-27 05:19:51,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2881.3). Total num frames: 417792. Throughput: 0: 813.9. Samples: 102796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:19:51,960][00255] Avg episode reward: [(0, '4.572')]
[2023-12-27 05:19:56,950][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 2921.8). Total num frames: 438272. Throughput: 0: 816.4. Samples: 108946. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-27 05:19:56,957][00255] Avg episode reward: [(0, '4.463')]
[2023-12-27 05:20:01,955][00255] Fps is (10 sec: 2866.0, 60 sec: 3276.7, 300 sec: 2880.3). Total num frames: 446464. Throughput: 0: 798.2. Samples: 112892. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-12-27 05:20:01,957][00255] Avg episode reward: [(0, '4.337')]
[2023-12-27 05:20:02,240][01470] Updated weights for policy 0, policy_version 110 (0.0049)
[2023-12-27 05:20:06,951][00255] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 2892.8). Total num frames: 462848. Throughput: 0: 798.0. Samples: 114796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:20:06,960][00255] Avg episode reward: [(0, '4.443')]
[2023-12-27 05:20:11,950][00255] Fps is (10 sec: 3278.2, 60 sec: 3140.3, 300 sec: 2904.4). Total num frames: 479232. Throughput: 0: 819.6. Samples: 119814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:20:11,952][00255] Avg episode reward: [(0, '4.258')]
[2023-12-27 05:20:14,086][01470] Updated weights for policy 0, policy_version 120 (0.0019)
[2023-12-27 05:20:16,951][00255] Fps is (10 sec: 3686.5, 60 sec: 3276.9, 300 sec: 2939.5). Total num frames: 499712. Throughput: 0: 820.5. Samples: 125932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:20:16,952][00255] Avg episode reward: [(0, '4.235')]
[2023-12-27 05:20:21,952][00255] Fps is (10 sec: 3686.0, 60 sec: 3345.0, 300 sec: 2949.1). Total num frames: 516096. Throughput: 0: 802.8. Samples: 128184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:20:21,960][00255] Avg episode reward: [(0, '4.408')]
[2023-12-27 05:20:26,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2935.5). Total num frames: 528384. Throughput: 0: 811.1. Samples: 132206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:20:26,957][00255] Avg episode reward: [(0, '4.466')]
[2023-12-27 05:20:27,872][01470] Updated weights for policy 0, policy_version 130 (0.0023)
[2023-12-27 05:20:31,950][00255] Fps is (10 sec: 2867.5, 60 sec: 3208.5, 300 sec: 2944.7). Total num frames: 544768. Throughput: 0: 827.6. Samples: 137046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:20:31,954][00255] Avg episode reward: [(0, '4.454')]
[2023-12-27 05:20:36,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 2975.0). Total num frames: 565248. Throughput: 0: 830.8. Samples: 140180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:20:36,953][00255] Avg episode reward: [(0, '4.424')]
[2023-12-27 05:20:38,178][01470] Updated weights for policy 0, policy_version 140 (0.0026)
[2023-12-27 05:20:41,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2982.7). Total num frames: 581632. Throughput: 0: 816.7. Samples: 145696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:20:41,958][00255] Avg episode reward: [(0, '4.494')]
[2023-12-27 05:20:46,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2969.6). Total num frames: 593920. Throughput: 0: 814.7. Samples: 149548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:20:46,953][00255] Avg episode reward: [(0, '4.290')]
[2023-12-27 05:20:51,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2977.1). Total num frames: 610304. Throughput: 0: 814.3. Samples: 151440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:20:51,956][00255] Avg episode reward: [(0, '4.202')]
[2023-12-27 05:20:52,556][01470] Updated weights for policy 0, policy_version 150 (0.0021)
[2023-12-27 05:20:56,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3003.7). Total num frames: 630784. Throughput: 0: 824.6. Samples: 156922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:20:56,953][00255] Avg episode reward: [(0, '4.371')]
[2023-12-27 05:20:56,965][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000154_630784.pth...
[2023-12-27 05:21:01,951][00255] Fps is (10 sec: 3686.3, 60 sec: 3345.3, 300 sec: 3010.1). Total num frames: 647168. Throughput: 0: 814.8. Samples: 162598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:21:01,957][00255] Avg episode reward: [(0, '4.563')]
[2023-12-27 05:21:04,467][01470] Updated weights for policy 0, policy_version 160 (0.0048)
[2023-12-27 05:21:06,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2997.5). Total num frames: 659456. Throughput: 0: 807.5. Samples: 164520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:21:06,960][00255] Avg episode reward: [(0, '4.435')]
[2023-12-27 05:21:11,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3003.7). Total num frames: 675840. Throughput: 0: 806.3. Samples: 168490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:21:11,953][00255] Avg episode reward: [(0, '4.481')]
[2023-12-27 05:21:16,828][01470] Updated weights for policy 0, policy_version 170 (0.0023)
[2023-12-27 05:21:16,957][00255] Fps is (10 sec: 3684.2, 60 sec: 3276.5, 300 sec: 3027.4). Total num frames: 696320. Throughput: 0: 829.2. Samples: 174366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:21:16,959][00255] Avg episode reward: [(0, '4.348')]
[2023-12-27 05:21:21,950][00255] Fps is (10 sec: 3686.5, 60 sec: 3276.9, 300 sec: 3032.8). Total num frames: 712704. Throughput: 0: 825.4. Samples: 177322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:21:21,954][00255] Avg episode reward: [(0, '4.469')]
[2023-12-27 05:21:26,952][00255] Fps is (10 sec: 2868.5, 60 sec: 3276.7, 300 sec: 3020.8). Total num frames: 724992. Throughput: 0: 804.3. Samples: 181892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:21:26,955][00255] Avg episode reward: [(0, '4.511')]
[2023-12-27 05:21:30,187][01470] Updated weights for policy 0, policy_version 180 (0.0014)
[2023-12-27 05:21:31,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3026.0). Total num frames: 741376. Throughput: 0: 806.6. Samples: 185846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:21:31,956][00255] Avg episode reward: [(0, '4.611')]
[2023-12-27 05:21:36,950][00255] Fps is (10 sec: 3686.9, 60 sec: 3276.8, 300 sec: 3047.4). Total num frames: 761856. Throughput: 0: 824.4. Samples: 188540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:21:36,957][00255] Avg episode reward: [(0, '4.678')]
[2023-12-27 05:21:40,842][01470] Updated weights for policy 0, policy_version 190 (0.0014)
[2023-12-27 05:21:41,950][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3068.0). Total num frames: 782336. Throughput: 0: 841.8. Samples: 194804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:21:41,957][00255] Avg episode reward: [(0, '4.786')]
[2023-12-27 05:21:41,962][01457] Saving new best policy, reward=4.786!
[2023-12-27 05:21:46,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3056.2). Total num frames: 794624. Throughput: 0: 819.7. Samples: 199484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-27 05:21:46,954][00255] Avg episode reward: [(0, '4.831')]
[2023-12-27 05:21:46,974][01457] Saving new best policy, reward=4.831!
[2023-12-27 05:21:51,950][00255] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3045.0). Total num frames: 806912. Throughput: 0: 819.8. Samples: 201410. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-12-27 05:21:51,953][00255] Avg episode reward: [(0, '4.655')]
[2023-12-27 05:21:54,830][01470] Updated weights for policy 0, policy_version 200 (0.0034)
[2023-12-27 05:21:56,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3064.4). Total num frames: 827392. Throughput: 0: 836.3. Samples: 206122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:21:56,956][00255] Avg episode reward: [(0, '4.698')]
[2023-12-27 05:22:01,951][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3083.2). Total num frames: 847872. Throughput: 0: 845.3. Samples: 212398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:22:01,953][00255] Avg episode reward: [(0, '4.615')]
[2023-12-27 05:22:05,428][01470] Updated weights for policy 0, policy_version 210 (0.0017)
[2023-12-27 05:22:06,953][00255] Fps is (10 sec: 3685.6, 60 sec: 3413.2, 300 sec: 3086.6). Total num frames: 864256. Throughput: 0: 839.8. Samples: 215116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:22:06,960][00255] Avg episode reward: [(0, '4.560')]
[2023-12-27 05:22:11,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3075.6). Total num frames: 876544. Throughput: 0: 826.3. Samples: 219074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:22:11,955][00255] Avg episode reward: [(0, '4.567')]
[2023-12-27 05:22:16,951][00255] Fps is (10 sec: 2867.8, 60 sec: 3277.1, 300 sec: 3079.1). Total num frames: 892928. Throughput: 0: 842.3. Samples: 223748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:22:16,953][00255] Avg episode reward: [(0, '4.556')]
[2023-12-27 05:22:18,665][01470] Updated weights for policy 0, policy_version 220 (0.0026)
[2023-12-27 05:22:21,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3096.3). Total num frames: 913408. Throughput: 0: 850.0. Samples: 226790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:22:21,960][00255] Avg episode reward: [(0, '4.550')]
[2023-12-27 05:22:26,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3151.8). Total num frames: 929792. Throughput: 0: 837.4. Samples: 232486. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:22:26,954][00255] Avg episode reward: [(0, '4.614')]
[2023-12-27 05:22:31,225][01470] Updated weights for policy 0, policy_version 230 (0.0027)
[2023-12-27 05:22:31,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 942080. Throughput: 0: 819.2. Samples: 236350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:22:31,953][00255] Avg episode reward: [(0, '4.573')]
[2023-12-27 05:22:36,950][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 954368. Throughput: 0: 817.8. Samples: 238210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:22:36,953][00255] Avg episode reward: [(0, '4.598')]
[2023-12-27 05:22:41,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 974848. Throughput: 0: 834.7. Samples: 243684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:22:41,953][00255] Avg episode reward: [(0, '4.572')]
[2023-12-27 05:22:43,087][01470] Updated weights for policy 0, policy_version 240 (0.0023)
[2023-12-27 05:22:46,951][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 995328. Throughput: 0: 826.6. Samples: 249596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:22:46,955][00255] Avg episode reward: [(0, '4.432')]
[2023-12-27 05:22:51,954][00255] Fps is (10 sec: 3275.8, 60 sec: 3344.9, 300 sec: 3262.9). Total num frames: 1007616. Throughput: 0: 809.5. Samples: 251546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:22:51,965][00255] Avg episode reward: [(0, '4.419')]
[2023-12-27 05:22:56,951][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1019904. Throughput: 0: 808.1. Samples: 255440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:22:56,953][00255] Avg episode reward: [(0, '4.347')]
[2023-12-27 05:22:56,970][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000249_1019904.pth...
[2023-12-27 05:22:57,137][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth
[2023-12-27 05:22:57,610][01470] Updated weights for policy 0, policy_version 250 (0.0013)
[2023-12-27 05:23:01,950][00255] Fps is (10 sec: 3277.9, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1040384. Throughput: 0: 816.0. Samples: 260470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:23:01,957][00255] Avg episode reward: [(0, '4.476')]
[2023-12-27 05:23:06,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 3249.1). Total num frames: 1056768. Throughput: 0: 813.3. Samples: 263388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:23:06,952][00255] Avg episode reward: [(0, '4.557')]
[2023-12-27 05:23:08,362][01470] Updated weights for policy 0, policy_version 260 (0.0016)
[2023-12-27 05:23:11,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1073152. Throughput: 0: 793.2. Samples: 268182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:23:11,955][00255] Avg episode reward: [(0, '4.600')]
[2023-12-27 05:23:16,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1085440. Throughput: 0: 792.5. Samples: 272014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-27 05:23:16,957][00255] Avg episode reward: [(0, '4.668')]
[2023-12-27 05:23:21,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 1101824. Throughput: 0: 803.2. Samples: 274354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:23:21,953][00255] Avg episode reward: [(0, '4.668')]
[2023-12-27 05:23:22,122][01470] Updated weights for policy 0, policy_version 270 (0.0032)
[2023-12-27 05:23:26,951][00255] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3249.1). Total num frames: 1122304. Throughput: 0: 819.5. Samples: 280564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:23:26,953][00255] Avg episode reward: [(0, '4.513')]
[2023-12-27 05:23:31,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1138688. Throughput: 0: 795.4. Samples: 285390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:23:31,953][00255] Avg episode reward: [(0, '4.535')]
[2023-12-27 05:23:34,444][01470] Updated weights for policy 0, policy_version 280 (0.0018)
[2023-12-27 05:23:36,954][00255] Fps is (10 sec: 2866.4, 60 sec: 3276.6, 300 sec: 3262.9). Total num frames: 1150976. Throughput: 0: 794.8. Samples: 287314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:23:36,956][00255] Avg episode reward: [(0, '4.549')]
[2023-12-27 05:23:41,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1167360. Throughput: 0: 799.7. Samples: 291428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:23:41,953][00255] Avg episode reward: [(0, '4.548')]
[2023-12-27 05:23:46,561][01470] Updated weights for policy 0, policy_version 290 (0.0030)
[2023-12-27 05:23:46,950][00255] Fps is (10 sec: 3687.5, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1187840. Throughput: 0: 825.1. Samples: 297600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:23:46,953][00255] Avg episode reward: [(0, '4.577')]
[2023-12-27 05:23:51,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3277.0, 300 sec: 3276.8). Total num frames: 1204224. Throughput: 0: 824.8. Samples: 300502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:23:51,953][00255] Avg episode reward: [(0, '4.652')]
[2023-12-27 05:23:56,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1216512. Throughput: 0: 806.2. Samples: 304462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:23:56,956][00255] Avg episode reward: [(0, '4.614')]
[2023-12-27 05:24:00,510][01470] Updated weights for policy 0, policy_version 300 (0.0027)
[2023-12-27 05:24:01,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1232896. Throughput: 0: 811.0. Samples: 308510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:24:01,953][00255] Avg episode reward: [(0, '4.801')]
[2023-12-27 05:24:06,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1253376. Throughput: 0: 824.3. Samples: 311446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:24:06,959][00255] Avg episode reward: [(0, '4.736')]
[2023-12-27 05:24:11,034][01470] Updated weights for policy 0, policy_version 310 (0.0017)
[2023-12-27 05:24:11,952][00255] Fps is (10 sec: 3686.0, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 1269760. Throughput: 0: 820.8. Samples: 317500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:24:11,954][00255] Avg episode reward: [(0, '4.527')]
[2023-12-27 05:24:16,952][00255] Fps is (10 sec: 2866.8, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 1282048. Throughput: 0: 802.7. Samples: 321514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:24:16,955][00255] Avg episode reward: [(0, '4.559')]
[2023-12-27 05:24:21,950][00255] Fps is (10 sec: 2457.9, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1294336. Throughput: 0: 801.7. Samples: 323388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:24:21,956][00255] Avg episode reward: [(0, '4.601')]
[2023-12-27 05:24:25,432][01470] Updated weights for policy 0, policy_version 320 (0.0030)
[2023-12-27 05:24:26,951][00255] Fps is (10 sec: 3277.3, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1314816. Throughput: 0: 821.5. Samples: 328394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:24:26,957][00255] Avg episode reward: [(0, '4.755')]
[2023-12-27 05:24:31,950][00255] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 1335296. Throughput: 0: 817.6. Samples: 334390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:24:31,952][00255] Avg episode reward: [(0, '4.898')]
[2023-12-27 05:24:31,958][01457] Saving new best policy, reward=4.898!
[2023-12-27 05:24:36,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3277.0, 300 sec: 3276.8). Total num frames: 1347584. Throughput: 0: 804.1. Samples: 336686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:24:36,953][00255] Avg episode reward: [(0, '4.742')]
[2023-12-27 05:24:37,390][01470] Updated weights for policy 0, policy_version 330 (0.0020)
[2023-12-27 05:24:41,953][00255] Fps is (10 sec: 2457.1, 60 sec: 3208.4, 300 sec: 3262.9). Total num frames: 1359872. Throughput: 0: 801.2. Samples: 340516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:24:41,955][00255] Avg episode reward: [(0, '4.712')]
[2023-12-27 05:24:46,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1380352. Throughput: 0: 821.6. Samples: 345482.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:24:46,953][00255] Avg episode reward: [(0, '4.590')] [2023-12-27 05:24:49,727][01470] Updated weights for policy 0, policy_version 340 (0.0023) [2023-12-27 05:24:51,951][00255] Fps is (10 sec: 4096.9, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1400832. Throughput: 0: 824.4. Samples: 348542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:24:51,953][00255] Avg episode reward: [(0, '4.783')] [2023-12-27 05:24:56,953][00255] Fps is (10 sec: 3276.1, 60 sec: 3276.7, 300 sec: 3276.8). Total num frames: 1413120. Throughput: 0: 807.9. Samples: 353858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:24:56,955][00255] Avg episode reward: [(0, '4.909')] [2023-12-27 05:24:56,967][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_1413120.pth... [2023-12-27 05:24:57,152][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000154_630784.pth [2023-12-27 05:24:57,171][01457] Saving new best policy, reward=4.909! [2023-12-27 05:25:01,953][00255] Fps is (10 sec: 2457.1, 60 sec: 3208.4, 300 sec: 3262.9). Total num frames: 1425408. Throughput: 0: 801.8. Samples: 357594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:25:01,960][00255] Avg episode reward: [(0, '4.932')] [2023-12-27 05:25:01,967][01457] Saving new best policy, reward=4.932! [2023-12-27 05:25:03,971][01470] Updated weights for policy 0, policy_version 350 (0.0014) [2023-12-27 05:25:06,951][00255] Fps is (10 sec: 2867.8, 60 sec: 3140.3, 300 sec: 3262.9). Total num frames: 1441792. Throughput: 0: 798.8. Samples: 359332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:25:06,953][00255] Avg episode reward: [(0, '4.652')] [2023-12-27 05:25:11,950][00255] Fps is (10 sec: 3687.2, 60 sec: 3208.6, 300 sec: 3262.9). Total num frames: 1462272. Throughput: 0: 816.5. Samples: 365138. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:25:11,959][00255] Avg episode reward: [(0, '4.869')] [2023-12-27 05:25:14,629][01470] Updated weights for policy 0, policy_version 360 (0.0032) [2023-12-27 05:25:16,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3262.9). Total num frames: 1478656. Throughput: 0: 809.4. Samples: 370814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-12-27 05:25:16,953][00255] Avg episode reward: [(0, '5.124')] [2023-12-27 05:25:16,970][01457] Saving new best policy, reward=5.124! [2023-12-27 05:25:21,952][00255] Fps is (10 sec: 2866.7, 60 sec: 3276.7, 300 sec: 3262.9). Total num frames: 1490944. Throughput: 0: 799.4. Samples: 372658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-12-27 05:25:21,956][00255] Avg episode reward: [(0, '5.021')] [2023-12-27 05:25:26,951][00255] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 1507328. Throughput: 0: 799.2. Samples: 376478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-12-27 05:25:26,956][00255] Avg episode reward: [(0, '5.106')] [2023-12-27 05:25:28,910][01470] Updated weights for policy 0, policy_version 370 (0.0020) [2023-12-27 05:25:31,950][00255] Fps is (10 sec: 3277.3, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 1523712. Throughput: 0: 813.6. Samples: 382094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:25:31,957][00255] Avg episode reward: [(0, '5.133')] [2023-12-27 05:25:31,973][01457] Saving new best policy, reward=5.133! [2023-12-27 05:25:36,952][00255] Fps is (10 sec: 3686.1, 60 sec: 3276.7, 300 sec: 3262.9). Total num frames: 1544192. Throughput: 0: 811.6. Samples: 385064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:25:36,955][00255] Avg episode reward: [(0, '5.146')] [2023-12-27 05:25:36,973][01457] Saving new best policy, reward=5.146! 
[2023-12-27 05:25:40,409][01470] Updated weights for policy 0, policy_version 380 (0.0025) [2023-12-27 05:25:41,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3262.9). Total num frames: 1556480. Throughput: 0: 797.3. Samples: 389734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:25:41,955][00255] Avg episode reward: [(0, '5.205')] [2023-12-27 05:25:42,044][01457] Saving new best policy, reward=5.205! [2023-12-27 05:25:46,950][00255] Fps is (10 sec: 2457.9, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 1568768. Throughput: 0: 800.0. Samples: 393592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:25:46,956][00255] Avg episode reward: [(0, '5.145')] [2023-12-27 05:25:51,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 1589248. Throughput: 0: 815.4. Samples: 396024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:25:51,957][00255] Avg episode reward: [(0, '5.038')] [2023-12-27 05:25:53,291][01470] Updated weights for policy 0, policy_version 390 (0.0028) [2023-12-27 05:25:56,951][00255] Fps is (10 sec: 4096.0, 60 sec: 3276.9, 300 sec: 3262.9). Total num frames: 1609728. Throughput: 0: 823.9. Samples: 402214. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-27 05:25:56,959][00255] Avg episode reward: [(0, '4.940')] [2023-12-27 05:26:01,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3262.9). Total num frames: 1622016. Throughput: 0: 800.7. Samples: 406846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:26:01,953][00255] Avg episode reward: [(0, '5.062')] [2023-12-27 05:26:06,482][01470] Updated weights for policy 0, policy_version 400 (0.0023) [2023-12-27 05:26:06,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1638400. Throughput: 0: 804.3. Samples: 408852. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:26:06,953][00255] Avg episode reward: [(0, '4.962')] [2023-12-27 05:26:11,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.1). Total num frames: 1654784. Throughput: 0: 815.0. Samples: 413152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:26:11,953][00255] Avg episode reward: [(0, '4.911')] [2023-12-27 05:26:16,951][00255] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1675264. Throughput: 0: 826.7. Samples: 419296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:26:16,954][00255] Avg episode reward: [(0, '5.095')] [2023-12-27 05:26:17,647][01470] Updated weights for policy 0, policy_version 410 (0.0027) [2023-12-27 05:26:21,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3276.8). Total num frames: 1691648. Throughput: 0: 828.4. Samples: 422340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:26:21,957][00255] Avg episode reward: [(0, '5.311')] [2023-12-27 05:26:21,961][01457] Saving new best policy, reward=5.311! [2023-12-27 05:26:26,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1703936. Throughput: 0: 808.7. Samples: 426124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:26:26,955][00255] Avg episode reward: [(0, '5.430')] [2023-12-27 05:26:26,977][01457] Saving new best policy, reward=5.430! [2023-12-27 05:26:31,957][00255] Fps is (10 sec: 2865.2, 60 sec: 3276.4, 300 sec: 3249.0). Total num frames: 1720320. Throughput: 0: 814.9. Samples: 430266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:26:31,960][00255] Avg episode reward: [(0, '5.388')] [2023-12-27 05:26:31,964][01470] Updated weights for policy 0, policy_version 420 (0.0033) [2023-12-27 05:26:36,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 1736704. Throughput: 0: 827.8. Samples: 433276. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-12-27 05:26:36,953][00255] Avg episode reward: [(0, '5.960')] [2023-12-27 05:26:36,967][01457] Saving new best policy, reward=5.960! [2023-12-27 05:26:41,950][00255] Fps is (10 sec: 3689.0, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1757184. Throughput: 0: 821.6. Samples: 439186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:26:41,956][00255] Avg episode reward: [(0, '5.921')] [2023-12-27 05:26:43,084][01470] Updated weights for policy 0, policy_version 430 (0.0026) [2023-12-27 05:26:46,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1769472. Throughput: 0: 809.2. Samples: 443258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:26:46,955][00255] Avg episode reward: [(0, '5.926')] [2023-12-27 05:26:51,950][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1781760. Throughput: 0: 806.2. Samples: 445132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-12-27 05:26:51,953][00255] Avg episode reward: [(0, '5.682')] [2023-12-27 05:26:56,434][01470] Updated weights for policy 0, policy_version 440 (0.0026) [2023-12-27 05:26:56,951][00255] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1802240. Throughput: 0: 825.9. Samples: 450318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-12-27 05:26:56,957][00255] Avg episode reward: [(0, '5.745')] [2023-12-27 05:26:56,979][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth... [2023-12-27 05:26:57,126][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000249_1019904.pth [2023-12-27 05:27:01,950][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3249.1). Total num frames: 1822720. Throughput: 0: 821.1. Samples: 456246. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:27:01,952][00255] Avg episode reward: [(0, '5.611')] [2023-12-27 05:27:06,951][00255] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1835008. Throughput: 0: 802.2. Samples: 458438. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-12-27 05:27:06,955][00255] Avg episode reward: [(0, '5.645')] [2023-12-27 05:27:09,059][01470] Updated weights for policy 0, policy_version 450 (0.0043) [2023-12-27 05:27:11,957][00255] Fps is (10 sec: 2456.1, 60 sec: 3208.2, 300 sec: 3235.1). Total num frames: 1847296. Throughput: 0: 805.0. Samples: 462356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:27:11,963][00255] Avg episode reward: [(0, '5.403')] [2023-12-27 05:27:16,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1867776. Throughput: 0: 823.7. Samples: 467328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:27:16,953][00255] Avg episode reward: [(0, '5.377')] [2023-12-27 05:27:20,825][01470] Updated weights for policy 0, policy_version 460 (0.0015) [2023-12-27 05:27:21,950][00255] Fps is (10 sec: 4098.5, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1888256. Throughput: 0: 824.2. Samples: 470364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:27:21,957][00255] Avg episode reward: [(0, '5.339')] [2023-12-27 05:27:26,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1900544. Throughput: 0: 810.4. Samples: 475652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:27:26,955][00255] Avg episode reward: [(0, '5.785')] [2023-12-27 05:27:31,951][00255] Fps is (10 sec: 2457.5, 60 sec: 3208.9, 300 sec: 3249.0). Total num frames: 1912832. Throughput: 0: 803.2. Samples: 479400. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:27:31,960][00255] Avg episode reward: [(0, '5.798')] [2023-12-27 05:27:35,264][01470] Updated weights for policy 0, policy_version 470 (0.0037) [2023-12-27 05:27:36,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1929216. Throughput: 0: 804.5. Samples: 481336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:27:36,958][00255] Avg episode reward: [(0, '5.915')] [2023-12-27 05:27:41,950][00255] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1949696. Throughput: 0: 818.4. Samples: 487144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-27 05:27:41,957][00255] Avg episode reward: [(0, '5.742')] [2023-12-27 05:27:45,804][01470] Updated weights for policy 0, policy_version 480 (0.0026) [2023-12-27 05:27:46,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3249.1). Total num frames: 1966080. Throughput: 0: 808.0. Samples: 492608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:27:46,956][00255] Avg episode reward: [(0, '5.580')] [2023-12-27 05:27:51,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1978368. Throughput: 0: 801.7. Samples: 494514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:27:51,953][00255] Avg episode reward: [(0, '5.766')] [2023-12-27 05:27:56,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 1994752. Throughput: 0: 799.5. Samples: 498330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:27:56,953][00255] Avg episode reward: [(0, '5.926')] [2023-12-27 05:27:59,739][01470] Updated weights for policy 0, policy_version 490 (0.0025) [2023-12-27 05:28:01,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 2015232. Throughput: 0: 820.2. Samples: 504238. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:28:01,959][00255] Avg episode reward: [(0, '6.225')] [2023-12-27 05:28:01,963][01457] Saving new best policy, reward=6.225! [2023-12-27 05:28:06,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2031616. Throughput: 0: 818.4. Samples: 507194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:28:06,953][00255] Avg episode reward: [(0, '6.625')] [2023-12-27 05:28:06,973][01457] Saving new best policy, reward=6.625! [2023-12-27 05:28:11,953][00255] Fps is (10 sec: 2866.6, 60 sec: 3277.0, 300 sec: 3249.0). Total num frames: 2043904. Throughput: 0: 798.8. Samples: 511600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:28:11,957][00255] Avg episode reward: [(0, '6.803')] [2023-12-27 05:28:12,031][01457] Saving new best policy, reward=6.803! [2023-12-27 05:28:12,025][01470] Updated weights for policy 0, policy_version 500 (0.0032) [2023-12-27 05:28:16,953][00255] Fps is (10 sec: 2866.4, 60 sec: 3208.4, 300 sec: 3249.0). Total num frames: 2060288. Throughput: 0: 800.7. Samples: 515432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:28:16,960][00255] Avg episode reward: [(0, '6.459')] [2023-12-27 05:28:21,951][00255] Fps is (10 sec: 3277.5, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 2076672. Throughput: 0: 818.6. Samples: 518174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:28:21,960][00255] Avg episode reward: [(0, '6.296')] [2023-12-27 05:28:24,174][01470] Updated weights for policy 0, policy_version 510 (0.0035) [2023-12-27 05:28:26,950][00255] Fps is (10 sec: 3687.5, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2097152. Throughput: 0: 825.2. Samples: 524280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:28:26,953][00255] Avg episode reward: [(0, '6.423')] [2023-12-27 05:28:31,957][00255] Fps is (10 sec: 3274.8, 60 sec: 3276.5, 300 sec: 3249.0). Total num frames: 2109440. 
Throughput: 0: 803.0. Samples: 528750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-12-27 05:28:31,959][00255] Avg episode reward: [(0, '6.449')] [2023-12-27 05:28:36,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2125824. Throughput: 0: 804.5. Samples: 530716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-12-27 05:28:36,956][00255] Avg episode reward: [(0, '6.281')] [2023-12-27 05:28:38,242][01470] Updated weights for policy 0, policy_version 520 (0.0023) [2023-12-27 05:28:41,950][00255] Fps is (10 sec: 3278.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2142208. Throughput: 0: 822.2. Samples: 535330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:28:41,959][00255] Avg episode reward: [(0, '5.898')] [2023-12-27 05:28:46,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2162688. Throughput: 0: 825.4. Samples: 541380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-27 05:28:46,958][00255] Avg episode reward: [(0, '5.694')] [2023-12-27 05:28:48,507][01470] Updated weights for policy 0, policy_version 530 (0.0033) [2023-12-27 05:28:51,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2179072. Throughput: 0: 819.2. Samples: 544060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:28:51,953][00255] Avg episode reward: [(0, '5.771')] [2023-12-27 05:28:56,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2191360. Throughput: 0: 807.3. Samples: 547926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:28:56,956][00255] Avg episode reward: [(0, '5.680')] [2023-12-27 05:28:56,968][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000535_2191360.pth... 
[2023-12-27 05:28:57,136][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000345_1413120.pth [2023-12-27 05:29:01,957][00255] Fps is (10 sec: 2865.4, 60 sec: 3208.2, 300 sec: 3235.1). Total num frames: 2207744. Throughput: 0: 822.3. Samples: 552438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:29:01,965][00255] Avg episode reward: [(0, '6.078')] [2023-12-27 05:29:02,785][01470] Updated weights for policy 0, policy_version 540 (0.0033) [2023-12-27 05:29:06,956][00255] Fps is (10 sec: 3275.1, 60 sec: 3208.3, 300 sec: 3235.1). Total num frames: 2224128. Throughput: 0: 824.9. Samples: 555300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-27 05:29:06,986][00255] Avg episode reward: [(0, '5.992')] [2023-12-27 05:29:11,956][00255] Fps is (10 sec: 2867.5, 60 sec: 3208.4, 300 sec: 3235.1). Total num frames: 2236416. Throughput: 0: 772.4. Samples: 559042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-12-27 05:29:11,958][00255] Avg episode reward: [(0, '6.152')] [2023-12-27 05:29:16,952][00255] Fps is (10 sec: 2048.8, 60 sec: 3072.1, 300 sec: 3221.2). Total num frames: 2244608. Throughput: 0: 739.9. Samples: 562040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:16,955][00255] Avg episode reward: [(0, '5.736')] [2023-12-27 05:29:20,062][01470] Updated weights for policy 0, policy_version 550 (0.0027) [2023-12-27 05:29:21,950][00255] Fps is (10 sec: 2049.0, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 2256896. Throughput: 0: 729.0. Samples: 563522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:21,953][00255] Avg episode reward: [(0, '5.964')] [2023-12-27 05:29:26,951][00255] Fps is (10 sec: 2457.9, 60 sec: 2867.2, 300 sec: 3165.7). Total num frames: 2269184. Throughput: 0: 714.8. Samples: 567496. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:26,953][00255] Avg episode reward: [(0, '6.152')] [2023-12-27 05:29:31,951][00255] Fps is (10 sec: 3276.7, 60 sec: 3004.0, 300 sec: 3193.5). Total num frames: 2289664. Throughput: 0: 712.9. Samples: 573460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:31,963][00255] Avg episode reward: [(0, '6.556')] [2023-12-27 05:29:32,122][01470] Updated weights for policy 0, policy_version 560 (0.0023) [2023-12-27 05:29:36,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3207.4). Total num frames: 2306048. Throughput: 0: 718.5. Samples: 576394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:36,963][00255] Avg episode reward: [(0, '6.733')] [2023-12-27 05:29:41,950][00255] Fps is (10 sec: 3276.9, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 2322432. Throughput: 0: 717.5. Samples: 580214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:29:41,953][00255] Avg episode reward: [(0, '6.903')] [2023-12-27 05:29:41,959][01457] Saving new best policy, reward=6.903! [2023-12-27 05:29:46,572][01470] Updated weights for policy 0, policy_version 570 (0.0028) [2023-12-27 05:29:46,951][00255] Fps is (10 sec: 2867.1, 60 sec: 2867.2, 300 sec: 3165.7). Total num frames: 2334720. Throughput: 0: 706.0. Samples: 584202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:46,956][00255] Avg episode reward: [(0, '6.809')] [2023-12-27 05:29:51,950][00255] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3193.5). Total num frames: 2355200. Throughput: 0: 710.4. Samples: 587266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:51,956][00255] Avg episode reward: [(0, '6.234')] [2023-12-27 05:29:56,954][00255] Fps is (10 sec: 3685.2, 60 sec: 3003.6, 300 sec: 3207.4). Total num frames: 2371584. Throughput: 0: 764.1. Samples: 593424. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:29:56,956][00255] Avg episode reward: [(0, '6.015')] [2023-12-27 05:29:57,095][01470] Updated weights for policy 0, policy_version 580 (0.0021) [2023-12-27 05:30:01,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3004.0, 300 sec: 3207.4). Total num frames: 2387968. Throughput: 0: 783.4. Samples: 597294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:30:01,957][00255] Avg episode reward: [(0, '6.257')] [2023-12-27 05:30:06,950][00255] Fps is (10 sec: 2458.5, 60 sec: 2867.5, 300 sec: 3165.7). Total num frames: 2396160. Throughput: 0: 792.1. Samples: 599166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:30:06,957][00255] Avg episode reward: [(0, '6.258')] [2023-12-27 05:30:11,073][01470] Updated weights for policy 0, policy_version 590 (0.0026) [2023-12-27 05:30:11,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3004.0, 300 sec: 3179.6). Total num frames: 2416640. Throughput: 0: 816.0. Samples: 604216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-12-27 05:30:11,953][00255] Avg episode reward: [(0, '6.576')] [2023-12-27 05:30:16,951][00255] Fps is (10 sec: 4096.0, 60 sec: 3208.6, 300 sec: 3207.4). Total num frames: 2437120. Throughput: 0: 819.0. Samples: 610314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-12-27 05:30:16,953][00255] Avg episode reward: [(0, '6.614')] [2023-12-27 05:30:21,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2453504. Throughput: 0: 801.4. Samples: 612458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:30:21,954][00255] Avg episode reward: [(0, '6.891')] [2023-12-27 05:30:23,384][01470] Updated weights for policy 0, policy_version 600 (0.0029) [2023-12-27 05:30:26,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2465792. Throughput: 0: 802.5. Samples: 616326. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-12-27 05:30:26,953][00255] Avg episode reward: [(0, '7.018')] [2023-12-27 05:30:26,965][01457] Saving new best policy, reward=7.018! [2023-12-27 05:30:31,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3179.6). Total num frames: 2482176. Throughput: 0: 822.2. Samples: 621202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:30:31,959][00255] Avg episode reward: [(0, '7.376')] [2023-12-27 05:30:31,962][01457] Saving new best policy, reward=7.376! [2023-12-27 05:30:35,753][01470] Updated weights for policy 0, policy_version 610 (0.0021) [2023-12-27 05:30:36,953][00255] Fps is (10 sec: 3685.4, 60 sec: 3276.7, 300 sec: 3207.3). Total num frames: 2502656. Throughput: 0: 818.4. Samples: 624098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:30:36,962][00255] Avg episode reward: [(0, '7.248')] [2023-12-27 05:30:41,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 2514944. Throughput: 0: 796.2. Samples: 629252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:30:41,958][00255] Avg episode reward: [(0, '7.129')] [2023-12-27 05:30:46,951][00255] Fps is (10 sec: 2867.9, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2531328. Throughput: 0: 798.2. Samples: 633214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-12-27 05:30:46,953][00255] Avg episode reward: [(0, '7.143')] [2023-12-27 05:30:49,957][01470] Updated weights for policy 0, policy_version 620 (0.0024) [2023-12-27 05:30:51,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2547712. Throughput: 0: 797.8. Samples: 635068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-12-27 05:30:51,953][00255] Avg episode reward: [(0, '7.686')] [2023-12-27 05:30:51,955][01457] Saving new best policy, reward=7.686! [2023-12-27 05:30:56,950][00255] Fps is (10 sec: 3686.5, 60 sec: 3277.0, 300 sec: 3207.4). Total num frames: 2568192. 
Throughput: 0: 824.4. Samples: 641312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:30:56,953][00255] Avg episode reward: [(0, '8.020')]
[2023-12-27 05:30:56,967][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000627_2568192.pth...
[2023-12-27 05:30:57,092][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_1802240.pth
[2023-12-27 05:30:57,105][01457] Saving new best policy, reward=8.020!
[2023-12-27 05:31:00,422][01470] Updated weights for policy 0, policy_version 630 (0.0020)
[2023-12-27 05:31:01,955][00255] Fps is (10 sec: 3275.4, 60 sec: 3208.3, 300 sec: 3193.4). Total num frames: 2580480. Throughput: 0: 804.4. Samples: 646516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:31:01,964][00255] Avg episode reward: [(0, '8.400')]
[2023-12-27 05:31:01,995][01457] Saving new best policy, reward=8.400!
[2023-12-27 05:31:06,957][00255] Fps is (10 sec: 2865.5, 60 sec: 3344.7, 300 sec: 3193.4). Total num frames: 2596864. Throughput: 0: 797.8. Samples: 648366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:31:06,959][00255] Avg episode reward: [(0, '8.064')]
[2023-12-27 05:31:11,951][00255] Fps is (10 sec: 2868.4, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 2609152. Throughput: 0: 800.6. Samples: 652352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:31:11,953][00255] Avg episode reward: [(0, '7.965')]
[2023-12-27 05:31:14,229][01470] Updated weights for policy 0, policy_version 640 (0.0035)
[2023-12-27 05:31:16,951][00255] Fps is (10 sec: 3278.8, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2629632. Throughput: 0: 823.8. Samples: 658272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:31:16,961][00255] Avg episode reward: [(0, '8.334')]
[2023-12-27 05:31:21,950][00255] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2650112. Throughput: 0: 828.0. Samples: 661356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-27 05:31:21,957][00255] Avg episode reward: [(0, '8.497')]
[2023-12-27 05:31:21,959][01457] Saving new best policy, reward=8.497!
[2023-12-27 05:31:26,332][01470] Updated weights for policy 0, policy_version 650 (0.0019)
[2023-12-27 05:31:26,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3193.6). Total num frames: 2662400. Throughput: 0: 807.8. Samples: 665604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:31:26,956][00255] Avg episode reward: [(0, '8.667')]
[2023-12-27 05:31:26,972][01457] Saving new best policy, reward=8.667!
[2023-12-27 05:31:31,951][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2674688. Throughput: 0: 804.0. Samples: 669396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:31:31,953][00255] Avg episode reward: [(0, '8.984')]
[2023-12-27 05:31:31,960][01457] Saving new best policy, reward=8.984!
[2023-12-27 05:31:36,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.7, 300 sec: 3179.6). Total num frames: 2695168. Throughput: 0: 823.8. Samples: 672138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:31:36,958][00255] Avg episode reward: [(0, '8.875')]
[2023-12-27 05:31:38,852][01470] Updated weights for policy 0, policy_version 660 (0.0025)
[2023-12-27 05:31:41,951][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 2715648. Throughput: 0: 820.4. Samples: 678230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:31:41,953][00255] Avg episode reward: [(0, '8.441')]
[2023-12-27 05:31:46,952][00255] Fps is (10 sec: 3276.3, 60 sec: 3276.7, 300 sec: 3207.4). Total num frames: 2727936. Throughput: 0: 801.9. Samples: 682598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:31:46,960][00255] Avg episode reward: [(0, '7.680')]
[2023-12-27 05:31:51,950][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2740224. Throughput: 0: 804.5. Samples: 684564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:31:51,954][00255] Avg episode reward: [(0, '7.904')]
[2023-12-27 05:31:52,865][01470] Updated weights for policy 0, policy_version 670 (0.0031)
[2023-12-27 05:31:56,951][00255] Fps is (10 sec: 2867.6, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 2756608. Throughput: 0: 818.2. Samples: 689172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:31:56,958][00255] Avg episode reward: [(0, '8.476')]
[2023-12-27 05:32:01,957][00255] Fps is (10 sec: 3684.0, 60 sec: 3276.7, 300 sec: 3193.4). Total num frames: 2777088. Throughput: 0: 822.0. Samples: 695266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:32:01,967][00255] Avg episode reward: [(0, '8.657')]
[2023-12-27 05:32:03,232][01470] Updated weights for policy 0, policy_version 680 (0.0014)
[2023-12-27 05:32:06,957][00255] Fps is (10 sec: 3684.2, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2793472. Throughput: 0: 808.9. Samples: 697762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:32:06,961][00255] Avg episode reward: [(0, '8.903')]
[2023-12-27 05:32:11,950][00255] Fps is (10 sec: 2869.0, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 2805760. Throughput: 0: 800.5. Samples: 701628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:32:11,959][00255] Avg episode reward: [(0, '9.056')]
[2023-12-27 05:32:11,964][01457] Saving new best policy, reward=9.056!
[2023-12-27 05:32:16,951][00255] Fps is (10 sec: 2869.0, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 2822144. Throughput: 0: 819.7. Samples: 706282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:32:16,961][00255] Avg episode reward: [(0, '8.532')]
[2023-12-27 05:32:17,448][01470] Updated weights for policy 0, policy_version 690 (0.0027)
[2023-12-27 05:32:21,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2842624. Throughput: 0: 825.7. Samples: 709296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:32:21,953][00255] Avg episode reward: [(0, '8.239')]
[2023-12-27 05:32:26,953][00255] Fps is (10 sec: 3685.7, 60 sec: 3276.7, 300 sec: 3207.4). Total num frames: 2859008. Throughput: 0: 818.0. Samples: 715042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:32:26,955][00255] Avg episode reward: [(0, '8.236')]
[2023-12-27 05:32:29,214][01470] Updated weights for policy 0, policy_version 700 (0.0020)
[2023-12-27 05:32:31,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2871296. Throughput: 0: 807.0. Samples: 718910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:32:31,956][00255] Avg episode reward: [(0, '8.234')]
[2023-12-27 05:32:36,951][00255] Fps is (10 sec: 2867.6, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2887680. Throughput: 0: 806.2. Samples: 720842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:32:36,954][00255] Avg episode reward: [(0, '8.376')]
[2023-12-27 05:32:41,424][01470] Updated weights for policy 0, policy_version 710 (0.0020)
[2023-12-27 05:32:41,951][00255] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2908160. Throughput: 0: 832.8. Samples: 726650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:32:41,953][00255] Avg episode reward: [(0, '8.541')]
[2023-12-27 05:32:46,951][00255] Fps is (10 sec: 3686.6, 60 sec: 3276.9, 300 sec: 3207.4). Total num frames: 2924544. Throughput: 0: 823.1. Samples: 732300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:32:46,955][00255] Avg episode reward: [(0, '8.781')]
[2023-12-27 05:32:51,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 2940928. Throughput: 0: 809.2. Samples: 734170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:32:51,956][00255] Avg episode reward: [(0, '9.089')]
[2023-12-27 05:32:51,963][01457] Saving new best policy, reward=9.089!
[2023-12-27 05:32:55,322][01470] Updated weights for policy 0, policy_version 720 (0.0024)
[2023-12-27 05:32:56,952][00255] Fps is (10 sec: 2866.8, 60 sec: 3276.7, 300 sec: 3179.6). Total num frames: 2953216. Throughput: 0: 807.1. Samples: 737948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:32:56,954][00255] Avg episode reward: [(0, '9.055')]
[2023-12-27 05:32:56,978][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000721_2953216.pth...
[2023-12-27 05:32:57,163][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000535_2191360.pth
[2023-12-27 05:33:01,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.9, 300 sec: 3179.6). Total num frames: 2969600. Throughput: 0: 825.7. Samples: 743440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:33:01,960][00255] Avg episode reward: [(0, '8.816')]
[2023-12-27 05:33:06,039][01470] Updated weights for policy 0, policy_version 730 (0.0030)
[2023-12-27 05:33:06,950][00255] Fps is (10 sec: 4096.7, 60 sec: 3345.4, 300 sec: 3221.3). Total num frames: 2994176. Throughput: 0: 827.8. Samples: 746546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:33:06,952][00255] Avg episode reward: [(0, '9.295')]
[2023-12-27 05:33:06,962][01457] Saving new best policy, reward=9.295!
[2023-12-27 05:33:11,952][00255] Fps is (10 sec: 3686.0, 60 sec: 3345.0, 300 sec: 3207.4). Total num frames: 3006464. Throughput: 0: 802.6. Samples: 751158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:33:11,954][00255] Avg episode reward: [(0, '8.865')]
[2023-12-27 05:33:16,951][00255] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 3018752. Throughput: 0: 803.7. Samples: 755078. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:33:16,953][00255] Avg episode reward: [(0, '9.084')]
[2023-12-27 05:33:20,356][01470] Updated weights for policy 0, policy_version 740 (0.0025)
[2023-12-27 05:33:21,950][00255] Fps is (10 sec: 2867.5, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 3035136. Throughput: 0: 815.5. Samples: 757540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:33:21,957][00255] Avg episode reward: [(0, '8.719')]
[2023-12-27 05:33:26,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3207.4). Total num frames: 3055616. Throughput: 0: 823.3. Samples: 763700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:33:26,959][00255] Avg episode reward: [(0, '9.795')]
[2023-12-27 05:33:26,974][01457] Saving new best policy, reward=9.795!
[2023-12-27 05:33:31,918][01470] Updated weights for policy 0, policy_version 750 (0.0022)
[2023-12-27 05:33:31,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 3072000. Throughput: 0: 802.9. Samples: 768432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:33:31,953][00255] Avg episode reward: [(0, '9.813')]
[2023-12-27 05:33:31,955][01457] Saving new best policy, reward=9.813!
[2023-12-27 05:33:36,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 3084288. Throughput: 0: 803.0. Samples: 770304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:33:36,956][00255] Avg episode reward: [(0, '10.712')]
[2023-12-27 05:33:36,969][01457] Saving new best policy, reward=10.712!
[2023-12-27 05:33:41,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 3100672. Throughput: 0: 812.6. Samples: 774512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:33:41,959][00255] Avg episode reward: [(0, '11.331')]
[2023-12-27 05:33:41,965][01457] Saving new best policy, reward=11.331!
[2023-12-27 05:33:44,875][01470] Updated weights for policy 0, policy_version 760 (0.0018)
[2023-12-27 05:33:46,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 3121152. Throughput: 0: 826.6. Samples: 780636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:33:46,958][00255] Avg episode reward: [(0, '11.459')]
[2023-12-27 05:33:46,970][01457] Saving new best policy, reward=11.459!
[2023-12-27 05:33:51,957][00255] Fps is (10 sec: 3684.1, 60 sec: 3276.5, 300 sec: 3207.3). Total num frames: 3137536. Throughput: 0: 823.5. Samples: 783610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:33:51,959][00255] Avg episode reward: [(0, '11.136')]
[2023-12-27 05:33:56,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.9, 300 sec: 3193.6). Total num frames: 3149824. Throughput: 0: 809.0. Samples: 787560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:33:56,953][00255] Avg episode reward: [(0, '11.159')]
[2023-12-27 05:33:57,873][01470] Updated weights for policy 0, policy_version 770 (0.0020)
[2023-12-27 05:34:01,950][00255] Fps is (10 sec: 2869.0, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 3166208. Throughput: 0: 815.8. Samples: 791788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:34:01,953][00255] Avg episode reward: [(0, '12.184')]
[2023-12-27 05:34:01,956][01457] Saving new best policy, reward=12.184!
[2023-12-27 05:34:06,951][00255] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3186688. Throughput: 0: 827.8. Samples: 794790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:34:06,958][00255] Avg episode reward: [(0, '12.104')]
[2023-12-27 05:34:08,968][01470] Updated weights for policy 0, policy_version 780 (0.0038)
[2023-12-27 05:34:11,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3249.0). Total num frames: 3203072. Throughput: 0: 825.8. Samples: 800860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:34:11,954][00255] Avg episode reward: [(0, '12.616')]
[2023-12-27 05:34:11,961][01457] Saving new best policy, reward=12.616!
[2023-12-27 05:34:16,951][00255] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3215360. Throughput: 0: 806.1. Samples: 804706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:34:16,955][00255] Avg episode reward: [(0, '12.287')]
[2023-12-27 05:34:21,950][00255] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3227648. Throughput: 0: 808.2. Samples: 806674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:34:21,953][00255] Avg episode reward: [(0, '12.653')]
[2023-12-27 05:34:22,000][01457] Saving new best policy, reward=12.653!
[2023-12-27 05:34:23,153][01470] Updated weights for policy 0, policy_version 790 (0.0053)
[2023-12-27 05:34:26,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3248128. Throughput: 0: 834.8. Samples: 812080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:34:26,958][00255] Avg episode reward: [(0, '12.383')]
[2023-12-27 05:34:31,950][00255] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3268608. Throughput: 0: 836.1. Samples: 818262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-12-27 05:34:31,957][00255] Avg episode reward: [(0, '13.316')]
[2023-12-27 05:34:31,975][01457] Saving new best policy, reward=13.316!
[2023-12-27 05:34:34,133][01470] Updated weights for policy 0, policy_version 800 (0.0039)
[2023-12-27 05:34:36,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3280896. Throughput: 0: 811.9. Samples: 820142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:34:36,956][00255] Avg episode reward: [(0, '13.440')]
[2023-12-27 05:34:36,977][01457] Saving new best policy, reward=13.440!
[2023-12-27 05:34:41,951][00255] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3293184. Throughput: 0: 806.8. Samples: 823866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:34:41,959][00255] Avg episode reward: [(0, '14.119')]
[2023-12-27 05:34:41,961][01457] Saving new best policy, reward=14.119!
[2023-12-27 05:34:46,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3313664. Throughput: 0: 829.5. Samples: 829116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:34:46,958][00255] Avg episode reward: [(0, '14.377')]
[2023-12-27 05:34:46,974][01457] Saving new best policy, reward=14.377!
[2023-12-27 05:34:47,520][01470] Updated weights for policy 0, policy_version 810 (0.0051)
[2023-12-27 05:34:51,951][00255] Fps is (10 sec: 4096.0, 60 sec: 3277.1, 300 sec: 3263.0). Total num frames: 3334144. Throughput: 0: 828.4. Samples: 832068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:34:51,958][00255] Avg episode reward: [(0, '15.034')]
[2023-12-27 05:34:51,961][01457] Saving new best policy, reward=15.034!
[2023-12-27 05:34:56,952][00255] Fps is (10 sec: 3276.4, 60 sec: 3276.7, 300 sec: 3249.0). Total num frames: 3346432. Throughput: 0: 806.8. Samples: 837166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:34:56,954][00255] Avg episode reward: [(0, '15.198')]
[2023-12-27 05:34:56,966][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000817_3346432.pth...
[2023-12-27 05:34:57,154][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000627_2568192.pth
[2023-12-27 05:34:57,173][01457] Saving new best policy, reward=15.198!
[2023-12-27 05:35:00,393][01470] Updated weights for policy 0, policy_version 820 (0.0046)
[2023-12-27 05:35:01,950][00255] Fps is (10 sec: 2457.7, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3358720. Throughput: 0: 806.2. Samples: 840984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:35:01,957][00255] Avg episode reward: [(0, '16.401')]
[2023-12-27 05:35:01,960][01457] Saving new best policy, reward=16.401!
[2023-12-27 05:35:06,951][00255] Fps is (10 sec: 2867.5, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3375104. Throughput: 0: 807.6. Samples: 843016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:35:06,953][00255] Avg episode reward: [(0, '16.195')]
[2023-12-27 05:35:11,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3395584. Throughput: 0: 815.5. Samples: 848776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:35:11,953][00255] Avg episode reward: [(0, '17.832')]
[2023-12-27 05:35:11,955][01457] Saving new best policy, reward=17.832!
[2023-12-27 05:35:12,364][01470] Updated weights for policy 0, policy_version 830 (0.0028)
[2023-12-27 05:35:16,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3411968. Throughput: 0: 789.2. Samples: 853778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:35:16,955][00255] Avg episode reward: [(0, '18.517')]
[2023-12-27 05:35:16,974][01457] Saving new best policy, reward=18.517!
[2023-12-27 05:35:21,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3424256. Throughput: 0: 788.2. Samples: 855610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:35:21,956][00255] Avg episode reward: [(0, '18.321')]
[2023-12-27 05:35:26,631][01470] Updated weights for policy 0, policy_version 840 (0.0043)
[2023-12-27 05:35:26,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3440640. Throughput: 0: 795.0. Samples: 859640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:35:26,953][00255] Avg episode reward: [(0, '19.907')]
[2023-12-27 05:35:26,966][01457] Saving new best policy, reward=19.907!
[2023-12-27 05:35:31,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3249.1). Total num frames: 3461120. Throughput: 0: 811.6. Samples: 865640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:35:31,954][00255] Avg episode reward: [(0, '18.789')]
[2023-12-27 05:35:36,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3477504. Throughput: 0: 814.2. Samples: 868708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:35:36,956][00255] Avg episode reward: [(0, '19.099')]
[2023-12-27 05:35:37,512][01470] Updated weights for policy 0, policy_version 850 (0.0017)
[2023-12-27 05:35:41,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3489792. Throughput: 0: 793.0. Samples: 872852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:35:41,956][00255] Avg episode reward: [(0, '19.056')]
[2023-12-27 05:35:46,950][00255] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 3502080. Throughput: 0: 793.9. Samples: 876710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:35:46,953][00255] Avg episode reward: [(0, '18.768')]
[2023-12-27 05:35:50,956][01470] Updated weights for policy 0, policy_version 860 (0.0053)
[2023-12-27 05:35:51,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3526656. Throughput: 0: 816.8. Samples: 879772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:35:51,956][00255] Avg episode reward: [(0, '18.765')]
[2023-12-27 05:35:56,957][00255] Fps is (10 sec: 4093.5, 60 sec: 3276.5, 300 sec: 3262.9). Total num frames: 3543040. Throughput: 0: 824.6. Samples: 885888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:35:56,960][00255] Avg episode reward: [(0, '19.479')]
[2023-12-27 05:36:01,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.1). Total num frames: 3555328. Throughput: 0: 808.9. Samples: 890180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:36:01,952][00255] Avg episode reward: [(0, '21.078')]
[2023-12-27 05:36:01,984][01457] Saving new best policy, reward=21.078!
[2023-12-27 05:36:03,614][01470] Updated weights for policy 0, policy_version 870 (0.0017)
[2023-12-27 05:36:06,951][00255] Fps is (10 sec: 2869.0, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3571712. Throughput: 0: 811.6. Samples: 892134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:36:06,959][00255] Avg episode reward: [(0, '21.483')]
[2023-12-27 05:36:06,971][01457] Saving new best policy, reward=21.483!
[2023-12-27 05:36:11,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3588096. Throughput: 0: 827.2. Samples: 896866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:36:11,953][00255] Avg episode reward: [(0, '22.420')]
[2023-12-27 05:36:11,957][01457] Saving new best policy, reward=22.420!
[2023-12-27 05:36:15,192][01470] Updated weights for policy 0, policy_version 880 (0.0035)
[2023-12-27 05:36:16,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3608576. Throughput: 0: 831.9. Samples: 903076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:36:16,953][00255] Avg episode reward: [(0, '23.815')]
[2023-12-27 05:36:16,968][01457] Saving new best policy, reward=23.815!
[2023-12-27 05:36:21,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3624960. Throughput: 0: 814.8. Samples: 905376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:36:21,953][00255] Avg episode reward: [(0, '23.297')]
[2023-12-27 05:36:26,955][00255] Fps is (10 sec: 2866.0, 60 sec: 3276.6, 300 sec: 3262.9). Total num frames: 3637248. Throughput: 0: 810.6. Samples: 909334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:36:26,957][00255] Avg episode reward: [(0, '21.756')]
[2023-12-27 05:36:29,408][01470] Updated weights for policy 0, policy_version 890 (0.0029)
[2023-12-27 05:36:31,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3653632. Throughput: 0: 828.9. Samples: 914010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:36:31,953][00255] Avg episode reward: [(0, '20.304')]
[2023-12-27 05:36:36,951][00255] Fps is (10 sec: 3687.9, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3674112. Throughput: 0: 829.5. Samples: 917100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:36:36,960][00255] Avg episode reward: [(0, '20.527')]
[2023-12-27 05:36:39,827][01470] Updated weights for policy 0, policy_version 900 (0.0032)
[2023-12-27 05:36:41,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3690496. Throughput: 0: 820.0. Samples: 922784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-12-27 05:36:41,953][00255] Avg episode reward: [(0, '17.811')]
[2023-12-27 05:36:46,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3702784. Throughput: 0: 809.0. Samples: 926586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-12-27 05:36:46,960][00255] Avg episode reward: [(0, '17.526')]
[2023-12-27 05:36:51,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3719168. Throughput: 0: 810.0. Samples: 928582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:36:51,958][00255] Avg episode reward: [(0, '17.888')]
[2023-12-27 05:36:53,519][01470] Updated weights for policy 0, policy_version 910 (0.0021)
[2023-12-27 05:36:56,951][00255] Fps is (10 sec: 3686.4, 60 sec: 3277.1, 300 sec: 3263.0). Total num frames: 3739648. Throughput: 0: 835.2. Samples: 934450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:36:56,959][00255] Avg episode reward: [(0, '19.563')]
[2023-12-27 05:36:56,972][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000913_3739648.pth...
[2023-12-27 05:36:57,103][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000721_2953216.pth
[2023-12-27 05:37:01,950][00255] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3263.0). Total num frames: 3756032. Throughput: 0: 825.2. Samples: 940210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:37:01,953][00255] Avg episode reward: [(0, '19.596')]
[2023-12-27 05:37:05,173][01470] Updated weights for policy 0, policy_version 920 (0.0021)
[2023-12-27 05:37:06,950][00255] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3772416. Throughput: 0: 818.3. Samples: 942198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:37:06,955][00255] Avg episode reward: [(0, '19.845')]
[2023-12-27 05:37:11,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3784704. Throughput: 0: 816.7. Samples: 946082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:37:11,957][00255] Avg episode reward: [(0, '21.108')]
[2023-12-27 05:37:16,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3805184. Throughput: 0: 846.7. Samples: 952110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:37:16,960][00255] Avg episode reward: [(0, '20.120')]
[2023-12-27 05:37:17,330][01470] Updated weights for policy 0, policy_version 930 (0.0052)
[2023-12-27 05:37:21,956][00255] Fps is (10 sec: 4093.9, 60 sec: 3344.8, 300 sec: 3276.8). Total num frames: 3825664. Throughput: 0: 845.4. Samples: 955146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:37:21,964][00255] Avg episode reward: [(0, '19.889')]
[2023-12-27 05:37:26,957][00255] Fps is (10 sec: 3274.7, 60 sec: 3344.9, 300 sec: 3276.7). Total num frames: 3837952. Throughput: 0: 821.5. Samples: 959756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:37:26,963][00255] Avg episode reward: [(0, '19.906')]
[2023-12-27 05:37:30,653][01470] Updated weights for policy 0, policy_version 940 (0.0026)
[2023-12-27 05:37:31,951][00255] Fps is (10 sec: 2458.9, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3850240. Throughput: 0: 822.7. Samples: 963608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:37:31,953][00255] Avg episode reward: [(0, '19.631')]
[2023-12-27 05:37:36,951][00255] Fps is (10 sec: 3278.9, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3870720. Throughput: 0: 834.3. Samples: 966124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:37:36,953][00255] Avg episode reward: [(0, '19.920')]
[2023-12-27 05:37:41,656][01470] Updated weights for policy 0, policy_version 950 (0.0022)
[2023-12-27 05:37:41,951][00255] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3891200. Throughput: 0: 840.8. Samples: 972284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:37:41,958][00255] Avg episode reward: [(0, '20.346')]
[2023-12-27 05:37:46,955][00255] Fps is (10 sec: 3275.5, 60 sec: 3344.8, 300 sec: 3262.9). Total num frames: 3903488. Throughput: 0: 818.9. Samples: 977062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:37:46,963][00255] Avg episode reward: [(0, '20.876')]
[2023-12-27 05:37:51,951][00255] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3919872. Throughput: 0: 819.2. Samples: 979064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-12-27 05:37:51,958][00255] Avg episode reward: [(0, '20.605')]
[2023-12-27 05:37:55,771][01470] Updated weights for policy 0, policy_version 960 (0.0025)
[2023-12-27 05:37:56,951][00255] Fps is (10 sec: 3278.1, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3936256. Throughput: 0: 830.8. Samples: 983466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:37:56,953][00255] Avg episode reward: [(0, '21.127')]
[2023-12-27 05:38:01,951][00255] Fps is (10 sec: 3686.3, 60 sec: 3345.0, 300 sec: 3262.9). Total num frames: 3956736. Throughput: 0: 830.0. Samples: 989462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-12-27 05:38:01,960][00255] Avg episode reward: [(0, '22.391')]
[2023-12-27 05:38:06,951][00255] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3969024. Throughput: 0: 827.2. Samples: 992368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:38:06,959][00255] Avg episode reward: [(0, '22.114')]
[2023-12-27 05:38:07,129][01470] Updated weights for policy 0, policy_version 970 (0.0028)
[2023-12-27 05:38:11,955][00255] Fps is (10 sec: 2866.1, 60 sec: 3344.8, 300 sec: 3276.8). Total num frames: 3985408. Throughput: 0: 810.6. Samples: 996232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-12-27 05:38:11,958][00255] Avg episode reward: [(0, '22.570')]
[2023-12-27 05:38:16,950][00255] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 3997696. Throughput: 0: 817.3. Samples: 1000388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-12-27 05:38:16,956][00255] Avg episode reward: [(0, '23.071')]
[2023-12-27 05:38:18,251][01457] Stopping Batcher_0...
[2023-12-27 05:38:18,252][01457] Loop batcher_evt_loop terminating...
[2023-12-27 05:38:18,254][00255] Component Batcher_0 stopped!
[2023-12-27 05:38:18,262][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-27 05:38:18,306][00255] Component RolloutWorker_w1 stopped!
[2023-12-27 05:38:18,305][01472] Stopping RolloutWorker_w1...
[2023-12-27 05:38:18,322][01474] Stopping RolloutWorker_w3...
[2023-12-27 05:38:18,318][01472] Loop rollout_proc1_evt_loop terminating...
[2023-12-27 05:38:18,322][00255] Component RolloutWorker_w3 stopped!
[2023-12-27 05:38:18,326][01474] Loop rollout_proc3_evt_loop terminating...
[2023-12-27 05:38:18,336][00255] Component RolloutWorker_w0 stopped!
[2023-12-27 05:38:18,338][01471] Stopping RolloutWorker_w0...
[2023-12-27 05:38:18,338][01471] Loop rollout_proc0_evt_loop terminating...
[2023-12-27 05:38:18,337][01470] Weights refcount: 2 0
[2023-12-27 05:38:18,347][01470] Stopping InferenceWorker_p0-w0...
[2023-12-27 05:38:18,344][01478] Stopping RolloutWorker_w7...
[2023-12-27 05:38:18,344][00255] Component RolloutWorker_w7 stopped!
[2023-12-27 05:38:18,348][01470] Loop inference_proc0-0_evt_loop terminating...
[2023-12-27 05:38:18,348][00255] Component InferenceWorker_p0-w0 stopped!
[2023-12-27 05:38:18,355][01476] Stopping RolloutWorker_w5...
[2023-12-27 05:38:18,356][01473] Stopping RolloutWorker_w2...
[2023-12-27 05:38:18,357][01478] Loop rollout_proc7_evt_loop terminating...
[2023-12-27 05:38:18,355][00255] Component RolloutWorker_w5 stopped!
[2023-12-27 05:38:18,362][00255] Component RolloutWorker_w2 stopped!
[2023-12-27 05:38:18,361][01476] Loop rollout_proc5_evt_loop terminating...
[2023-12-27 05:38:18,392][00255] Component RolloutWorker_w4 stopped!
[2023-12-27 05:38:18,397][01473] Loop rollout_proc2_evt_loop terminating...
[2023-12-27 05:38:18,392][01475] Stopping RolloutWorker_w4...
[2023-12-27 05:38:18,409][01475] Loop rollout_proc4_evt_loop terminating...
[2023-12-27 05:38:18,430][01477] Stopping RolloutWorker_w6...
[2023-12-27 05:38:18,430][00255] Component RolloutWorker_w6 stopped!
[2023-12-27 05:38:18,435][01477] Loop rollout_proc6_evt_loop terminating...
[2023-12-27 05:38:18,462][01457] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000817_3346432.pth
[2023-12-27 05:38:18,476][01457] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-27 05:38:18,668][01457] Stopping LearnerWorker_p0...
[2023-12-27 05:38:18,668][01457] Loop learner_proc0_evt_loop terminating...
[2023-12-27 05:38:18,669][00255] Component LearnerWorker_p0 stopped!
[2023-12-27 05:38:18,671][00255] Waiting for process learner_proc0 to stop...
[2023-12-27 05:38:20,135][00255] Waiting for process inference_proc0-0 to join...
[2023-12-27 05:38:20,138][00255] Waiting for process rollout_proc0 to join...
[2023-12-27 05:38:22,084][00255] Waiting for process rollout_proc1 to join...
[2023-12-27 05:38:22,085][00255] Waiting for process rollout_proc2 to join...
[2023-12-27 05:38:22,095][00255] Waiting for process rollout_proc3 to join...
[2023-12-27 05:38:22,096][00255] Waiting for process rollout_proc4 to join...
[2023-12-27 05:38:22,098][00255] Waiting for process rollout_proc5 to join...
[2023-12-27 05:38:22,101][00255] Waiting for process rollout_proc6 to join...
[2023-12-27 05:38:22,103][00255] Waiting for process rollout_proc7 to join...
[2023-12-27 05:38:22,105][00255] Batcher 0 profile tree view:
batching: 27.9790, releasing_batches: 0.0290
[2023-12-27 05:38:22,107][00255] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0047
  wait_policy_total: 587.6832
update_model: 9.3048
  weight_update: 0.0043
one_step: 0.0025
  handle_policy_step: 610.4699
    deserialize: 16.3160, stack: 3.1757, obs_to_device_normalize: 121.7183, forward: 325.9882, send_messages: 29.4057
    prepare_outputs: 82.9239
      to_cpu: 47.5954
[2023-12-27 05:38:22,108][00255] Learner 0 profile tree view:
misc: 0.0058, prepare_batch: 14.4575
train: 75.5340
  epoch_init: 0.0149, minibatch_init: 0.0141, losses_postprocess: 0.6047, kl_divergence: 0.6845, after_optimizer: 34.4028
  calculate_losses: 27.2563
    losses_init: 0.0041, forward_head: 1.3566, bptt_initial: 18.0255, tail: 1.1528, advantages_returns: 0.2833, losses: 4.0400
    bptt: 2.0874
      bptt_forward_core: 2.0028
  update: 11.8721
    clip: 0.9887
[2023-12-27 05:38:22,110][00255] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4184, enqueue_policy_requests: 173.7067, env_step: 938.4276, overhead: 24.6588, complete_rollouts: 7.9782
save_policy_outputs: 23.0460
  split_output_tensors: 10.8217
[2023-12-27 05:38:22,111][00255] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3706, enqueue_policy_requests: 176.3866, env_step: 932.8229, overhead: 25.5552, complete_rollouts: 7.1231
save_policy_outputs: 22.4502
  split_output_tensors: 10.6770
[2023-12-27 05:38:22,113][00255] Loop Runner_EvtLoop terminating...
[2023-12-27 05:38:22,114][00255] Runner profile tree view:
main_loop: 1278.8266
[2023-12-27 05:38:22,116][00255] Collected {0: 4005888}, FPS: 3132.5
[2023-12-27 05:39:42,547][00255] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-12-27 05:39:42,550][00255] Overriding arg 'num_workers' with value 1 passed from command line
[2023-12-27 05:39:42,552][00255] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-12-27 05:39:42,554][00255] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-12-27 05:39:42,556][00255] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-12-27 05:39:42,558][00255] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-12-27 05:39:42,560][00255] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-12-27 05:39:42,561][00255] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-12-27 05:39:42,563][00255] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-12-27 05:39:42,564][00255] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-12-27 05:39:42,565][00255] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-12-27 05:39:42,566][00255] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-12-27 05:39:42,567][00255] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-12-27 05:39:42,569][00255] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-12-27 05:39:42,570][00255] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-12-27 05:39:42,613][00255] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-12-27 05:39:42,618][00255] RunningMeanStd input shape: (3, 72, 128)
[2023-12-27 05:39:42,621][00255] RunningMeanStd input shape: (1,)
[2023-12-27 05:39:42,637][00255] ConvEncoder: input_channels=3
[2023-12-27 05:39:42,741][00255] Conv encoder output size: 512
[2023-12-27 05:39:42,743][00255] Policy head output size: 512
[2023-12-27 05:39:42,936][00255] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-27 05:39:43,662][00255] Num frames 100...
[2023-12-27 05:39:43,809][00255] Num frames 200...
[2023-12-27 05:39:43,951][00255] Num frames 300...
[2023-12-27 05:39:44,078][00255] Num frames 400...
[2023-12-27 05:39:44,219][00255] Avg episode rewards: #0: 7.650, true rewards: #0: 4.650
[2023-12-27 05:39:44,221][00255] Avg episode reward: 7.650, avg true_objective: 4.650
[2023-12-27 05:39:44,268][00255] Num frames 500...
[2023-12-27 05:39:44,393][00255] Num frames 600...
[2023-12-27 05:39:44,523][00255] Num frames 700...
[2023-12-27 05:39:44,647][00255] Num frames 800...
[2023-12-27 05:39:44,778][00255] Num frames 900...
[2023-12-27 05:39:44,907][00255] Num frames 1000...
[2023-12-27 05:39:45,032][00255] Num frames 1100...
[2023-12-27 05:39:45,158][00255] Num frames 1200...
[2023-12-27 05:39:45,292][00255] Num frames 1300...
[2023-12-27 05:39:45,418][00255] Num frames 1400...
[2023-12-27 05:39:45,548][00255] Num frames 1500...
[2023-12-27 05:39:45,675][00255] Num frames 1600...
[2023-12-27 05:39:45,799][00255] Num frames 1700...
[2023-12-27 05:39:45,929][00255] Num frames 1800...
[2023-12-27 05:39:46,052][00255] Num frames 1900...
[2023-12-27 05:39:46,175][00255] Num frames 2000...
[2023-12-27 05:39:46,276][00255] Avg episode rewards: #0: 23.665, true rewards: #0: 10.165
[2023-12-27 05:39:46,278][00255] Avg episode reward: 23.665, avg true_objective: 10.165
[2023-12-27 05:39:46,374][00255] Num frames 2100...
[2023-12-27 05:39:46,508][00255] Num frames 2200...
[2023-12-27 05:39:46,635][00255] Num frames 2300...
[2023-12-27 05:39:46,757][00255] Num frames 2400...
[2023-12-27 05:39:46,881][00255] Num frames 2500...
[2023-12-27 05:39:47,004][00255] Num frames 2600...
[2023-12-27 05:39:47,133][00255] Num frames 2700...
[2023-12-27 05:39:47,276][00255] Num frames 2800...
[2023-12-27 05:39:47,407][00255] Num frames 2900...
[2023-12-27 05:39:47,501][00255] Avg episode rewards: #0: 23.097, true rewards: #0: 9.763
[2023-12-27 05:39:47,502][00255] Avg episode reward: 23.097, avg true_objective: 9.763
[2023-12-27 05:39:47,595][00255] Num frames 3000...
[2023-12-27 05:39:47,737][00255] Num frames 3100...
[2023-12-27 05:39:47,921][00255] Num frames 3200...
[2023-12-27 05:39:48,102][00255] Num frames 3300...
[2023-12-27 05:39:48,289][00255] Num frames 3400...
[2023-12-27 05:39:48,477][00255] Num frames 3500...
[2023-12-27 05:39:48,665][00255] Avg episode rewards: #0: 21.422, true rewards: #0: 8.922
[2023-12-27 05:39:48,667][00255] Avg episode reward: 21.422, avg true_objective: 8.922
[2023-12-27 05:39:48,728][00255] Num frames 3600...
[2023-12-27 05:39:48,920][00255] Num frames 3700...
[2023-12-27 05:39:49,106][00255] Num frames 3800...
[2023-12-27 05:39:49,283][00255] Num frames 3900...
[2023-12-27 05:39:49,470][00255] Num frames 4000...
[2023-12-27 05:39:49,653][00255] Num frames 4100...
[2023-12-27 05:39:49,833][00255] Num frames 4200...
[2023-12-27 05:39:50,016][00255] Num frames 4300...
[2023-12-27 05:39:50,404][00255] Num frames 4400...
[2023-12-27 05:39:50,582][00255] Num frames 4500...
[2023-12-27 05:39:50,754][00255] Num frames 4600...
[2023-12-27 05:39:50,882][00255] Num frames 4700...
[2023-12-27 05:39:51,079][00255] Num frames 4800...
[2023-12-27 05:39:51,234][00255] Num frames 4900...
[2023-12-27 05:39:51,363][00255] Num frames 5000...
[2023-12-27 05:39:51,499][00255] Num frames 5100...
[2023-12-27 05:39:51,629][00255] Num frames 5200...
[2023-12-27 05:39:51,757][00255] Num frames 5300...
[2023-12-27 05:39:51,882][00255] Num frames 5400...
[2023-12-27 05:39:52,016][00255] Num frames 5500...
[2023-12-27 05:39:52,151][00255] Num frames 5600...
[2023-12-27 05:39:52,299][00255] Avg episode rewards: #0: 28.138, true rewards: #0: 11.338
[2023-12-27 05:39:52,301][00255] Avg episode reward: 28.138, avg true_objective: 11.338
[2023-12-27 05:39:52,354][00255] Num frames 5700...
[2023-12-27 05:39:52,499][00255] Num frames 5800...
[2023-12-27 05:39:52,625][00255] Num frames 5900...
[2023-12-27 05:39:52,750][00255] Num frames 6000...
[2023-12-27 05:39:52,877][00255] Num frames 6100...
[2023-12-27 05:39:53,015][00255] Num frames 6200...
[2023-12-27 05:39:53,143][00255] Num frames 6300...
[2023-12-27 05:39:53,268][00255] Num frames 6400...
[2023-12-27 05:39:53,395][00255] Num frames 6500...
[2023-12-27 05:39:53,535][00255] Num frames 6600...
[2023-12-27 05:39:53,666][00255] Num frames 6700...
[2023-12-27 05:39:53,796][00255] Num frames 6800...
[2023-12-27 05:39:53,926][00255] Num frames 6900...
[2023-12-27 05:39:54,057][00255] Num frames 7000...
[2023-12-27 05:39:54,188][00255] Num frames 7100...
[2023-12-27 05:39:54,321][00255] Num frames 7200...
[2023-12-27 05:39:54,452][00255] Num frames 7300...
[2023-12-27 05:39:54,587][00255] Num frames 7400...
[2023-12-27 05:39:54,720][00255] Num frames 7500...
[2023-12-27 05:39:54,855][00255] Num frames 7600...
[2023-12-27 05:39:54,941][00255] Avg episode rewards: #0: 32.035, true rewards: #0: 12.702
[2023-12-27 05:39:54,942][00255] Avg episode reward: 32.035, avg true_objective: 12.702
[2023-12-27 05:39:55,042][00255] Num frames 7700...
[2023-12-27 05:39:55,167][00255] Num frames 7800...
[2023-12-27 05:39:55,292][00255] Num frames 7900...
[2023-12-27 05:39:55,415][00255] Num frames 8000...
[2023-12-27 05:39:55,538][00255] Num frames 8100...
[2023-12-27 05:39:55,671][00255] Num frames 8200...
[2023-12-27 05:39:55,800][00255] Num frames 8300...
[2023-12-27 05:39:55,927][00255] Num frames 8400...
[2023-12-27 05:39:56,059][00255] Num frames 8500...
[2023-12-27 05:39:56,187][00255] Num frames 8600...
[2023-12-27 05:39:56,261][00255] Avg episode rewards: #0: 30.018, true rewards: #0: 12.304
[2023-12-27 05:39:56,263][00255] Avg episode reward: 30.018, avg true_objective: 12.304
[2023-12-27 05:39:56,382][00255] Num frames 8700...
[2023-12-27 05:39:56,519][00255] Num frames 8800...
[2023-12-27 05:39:56,655][00255] Num frames 8900...
[2023-12-27 05:39:56,780][00255] Num frames 9000...
[2023-12-27 05:39:56,943][00255] Num frames 9100...
[2023-12-27 05:39:57,076][00255] Num frames 9200...
[2023-12-27 05:39:57,218][00255] Num frames 9300...
[2023-12-27 05:39:57,358][00255] Num frames 9400...
[2023-12-27 05:39:57,489][00255] Num frames 9500...
[2023-12-27 05:39:57,559][00255] Avg episode rewards: #0: 28.386, true rewards: #0: 11.886
[2023-12-27 05:39:57,561][00255] Avg episode reward: 28.386, avg true_objective: 11.886
[2023-12-27 05:39:57,695][00255] Num frames 9600...
[2023-12-27 05:39:57,827][00255] Num frames 9700...
[2023-12-27 05:39:57,953][00255] Num frames 9800...
[2023-12-27 05:39:58,078][00255] Num frames 9900...
[2023-12-27 05:39:58,207][00255] Num frames 10000...
[2023-12-27 05:39:58,332][00255] Num frames 10100...
[2023-12-27 05:39:58,411][00255] Avg episode rewards: #0: 26.241, true rewards: #0: 11.241
[2023-12-27 05:39:58,413][00255] Avg episode reward: 26.241, avg true_objective: 11.241
[2023-12-27 05:39:58,526][00255] Num frames 10200...
[2023-12-27 05:39:58,678][00255] Num frames 10300...
[2023-12-27 05:39:58,815][00255] Num frames 10400...
[2023-12-27 05:39:58,940][00255] Num frames 10500...
[2023-12-27 05:39:59,063][00255] Num frames 10600...
[2023-12-27 05:39:59,188][00255] Num frames 10700...
[2023-12-27 05:39:59,317][00255] Num frames 10800...
[2023-12-27 05:39:59,446][00255] Num frames 10900...
[2023-12-27 05:39:59,573][00255] Num frames 11000...
[2023-12-27 05:39:59,707][00255] Num frames 11100...
[2023-12-27 05:39:59,832][00255] Num frames 11200...
[2023-12-27 05:39:59,962][00255] Num frames 11300...
[2023-12-27 05:40:00,090][00255] Num frames 11400...
[2023-12-27 05:40:00,213][00255] Num frames 11500...
[2023-12-27 05:40:00,302][00255] Avg episode rewards: #0: 26.725, true rewards: #0: 11.525
[2023-12-27 05:40:00,303][00255] Avg episode reward: 26.725, avg true_objective: 11.525
[2023-12-27 05:41:15,301][00255] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-12-27 05:43:35,757][00255] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-12-27 05:43:35,759][00255] Overriding arg 'num_workers' with value 1 passed from command line
[2023-12-27 05:43:35,761][00255] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-12-27 05:43:35,763][00255] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-12-27 05:43:35,765][00255] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-12-27 05:43:35,767][00255] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-12-27 05:43:35,769][00255] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-12-27 05:43:35,771][00255] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-12-27 05:43:35,772][00255] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-12-27 05:43:35,774][00255] Adding new argument 'hf_repository'='lorenzreyes/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-12-27 05:43:35,775][00255] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-12-27 05:43:35,776][00255] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-12-27 05:43:35,777][00255] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-12-27 05:43:35,778][00255] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-12-27 05:43:35,779][00255] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-12-27 05:43:35,814][00255] RunningMeanStd input shape: (3, 72, 128)
[2023-12-27 05:43:35,817][00255] RunningMeanStd input shape: (1,)
[2023-12-27 05:43:35,830][00255] ConvEncoder: input_channels=3
[2023-12-27 05:43:35,867][00255] Conv encoder output size: 512
[2023-12-27 05:43:35,868][00255] Policy head output size: 512
[2023-12-27 05:43:35,887][00255] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-12-27 05:43:36,291][00255] Num frames 100...
[2023-12-27 05:43:36,417][00255] Num frames 200...
[2023-12-27 05:43:36,547][00255] Num frames 300...
[2023-12-27 05:43:36,674][00255] Num frames 400...
[2023-12-27 05:43:36,805][00255] Num frames 500...
[2023-12-27 05:43:36,916][00255] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440
[2023-12-27 05:43:36,918][00255] Avg episode reward: 7.440, avg true_objective: 5.440
[2023-12-27 05:43:36,995][00255] Num frames 600...
[2023-12-27 05:43:37,118][00255] Num frames 700...
[2023-12-27 05:43:37,240][00255] Num frames 800...
[2023-12-27 05:43:37,363][00255] Num frames 900...
[2023-12-27 05:43:37,490][00255] Num frames 1000...
[2023-12-27 05:43:37,617][00255] Num frames 1100...
[2023-12-27 05:43:37,748][00255] Num frames 1200...
[2023-12-27 05:43:37,873][00255] Num frames 1300...
[2023-12-27 05:43:38,002][00255] Num frames 1400...
[2023-12-27 05:43:38,123][00255] Num frames 1500...
[2023-12-27 05:43:38,240][00255] Avg episode rewards: #0: 13.725, true rewards: #0: 7.725
[2023-12-27 05:43:38,242][00255] Avg episode reward: 13.725, avg true_objective: 7.725
[2023-12-27 05:43:38,319][00255] Num frames 1600...
[2023-12-27 05:43:38,452][00255] Num frames 1700...
[2023-12-27 05:43:38,579][00255] Num frames 1800...
[2023-12-27 05:43:38,707][00255] Num frames 1900...
[2023-12-27 05:43:38,846][00255] Num frames 2000...
[2023-12-27 05:43:38,972][00255] Num frames 2100...
[2023-12-27 05:43:39,101][00255] Num frames 2200...
[2023-12-27 05:43:39,224][00255] Num frames 2300...
[2023-12-27 05:43:39,354][00255] Num frames 2400...
[2023-12-27 05:43:39,485][00255] Num frames 2500...
[2023-12-27 05:43:39,614][00255] Num frames 2600...
[2023-12-27 05:43:39,743][00255] Num frames 2700...
[2023-12-27 05:43:39,877][00255] Num frames 2800...
[2023-12-27 05:43:40,003][00255] Num frames 2900...
[2023-12-27 05:43:40,129][00255] Num frames 3000...
[2023-12-27 05:43:40,260][00255] Num frames 3100...
[2023-12-27 05:43:40,393][00255] Num frames 3200...
[2023-12-27 05:43:40,523][00255] Avg episode rewards: #0: 21.860, true rewards: #0: 10.860
[2023-12-27 05:43:40,525][00255] Avg episode reward: 21.860, avg true_objective: 10.860
[2023-12-27 05:43:40,585][00255] Num frames 3300...
[2023-12-27 05:43:40,721][00255] Num frames 3400...
[2023-12-27 05:43:40,861][00255] Num frames 3500...
[2023-12-27 05:43:40,991][00255] Num frames 3600...
[2023-12-27 05:43:41,082][00255] Avg episode rewards: #0: 17.570, true rewards: #0: 9.070
[2023-12-27 05:43:41,083][00255] Avg episode reward: 17.570, avg true_objective: 9.070
[2023-12-27 05:43:41,186][00255] Num frames 3700...
[2023-12-27 05:43:41,328][00255] Num frames 3800...
[2023-12-27 05:43:41,469][00255] Num frames 3900...
[2023-12-27 05:43:41,609][00255] Num frames 4000...
[2023-12-27 05:43:41,743][00255] Num frames 4100...
[2023-12-27 05:43:41,881][00255] Num frames 4200...
[2023-12-27 05:43:42,007][00255] Num frames 4300...
[2023-12-27 05:43:42,144][00255] Num frames 4400...
[2023-12-27 05:43:42,307][00255] Avg episode rewards: #0: 17.952, true rewards: #0: 8.952
[2023-12-27 05:43:42,309][00255] Avg episode reward: 17.952, avg true_objective: 8.952
[2023-12-27 05:43:42,346][00255] Num frames 4500...
[2023-12-27 05:43:42,481][00255] Num frames 4600...
[2023-12-27 05:43:42,619][00255] Num frames 4700...
[2023-12-27 05:43:42,746][00255] Num frames 4800...
[2023-12-27 05:43:42,882][00255] Num frames 4900...
[2023-12-27 05:43:43,011][00255] Num frames 5000...
[2023-12-27 05:43:43,137][00255] Num frames 5100...
[2023-12-27 05:43:43,261][00255] Num frames 5200...
[2023-12-27 05:43:43,388][00255] Num frames 5300...
[2023-12-27 05:43:43,513][00255] Num frames 5400...
[2023-12-27 05:43:43,638][00255] Num frames 5500...
[2023-12-27 05:43:43,765][00255] Num frames 5600...
[2023-12-27 05:43:43,864][00255] Avg episode rewards: #0: 20.055, true rewards: #0: 9.388
[2023-12-27 05:43:43,866][00255] Avg episode reward: 20.055, avg true_objective: 9.388
[2023-12-27 05:43:43,962][00255] Num frames 5700...
[2023-12-27 05:43:44,089][00255] Num frames 5800...
[2023-12-27 05:43:44,212][00255] Num frames 5900...
[2023-12-27 05:43:44,340][00255] Num frames 6000...
[2023-12-27 05:43:44,470][00255] Num frames 6100...
[2023-12-27 05:43:44,592][00255] Num frames 6200...
[2023-12-27 05:43:44,712][00255] Num frames 6300...
[2023-12-27 05:43:44,813][00255] Avg episode rewards: #0: 19.481, true rewards: #0: 9.053
[2023-12-27 05:43:44,815][00255] Avg episode reward: 19.481, avg true_objective: 9.053
[2023-12-27 05:43:44,894][00255] Num frames 6400...
[2023-12-27 05:43:45,031][00255] Num frames 6500...
[2023-12-27 05:43:45,161][00255] Num frames 6600...
[2023-12-27 05:43:45,286][00255] Num frames 6700...
[2023-12-27 05:43:45,411][00255] Num frames 6800...
[2023-12-27 05:43:45,577][00255] Num frames 6900...
[2023-12-27 05:43:45,753][00255] Num frames 7000...
[2023-12-27 05:43:45,940][00255] Num frames 7100...
[2023-12-27 05:43:46,119][00255] Num frames 7200...
[2023-12-27 05:43:46,331][00255] Avg episode rewards: #0: 19.486, true rewards: #0: 9.111
[2023-12-27 05:43:46,334][00255] Avg episode reward: 19.486, avg true_objective: 9.111
[2023-12-27 05:43:46,363][00255] Num frames 7300...
[2023-12-27 05:43:46,552][00255] Num frames 7400...
[2023-12-27 05:43:46,744][00255] Num frames 7500...
[2023-12-27 05:43:46,938][00255] Num frames 7600...
[2023-12-27 05:43:47,146][00255] Num frames 7700...
[2023-12-27 05:43:47,334][00255] Num frames 7800...
[2023-12-27 05:43:47,532][00255] Num frames 7900...
[2023-12-27 05:43:47,712][00255] Num frames 8000...
[2023-12-27 05:43:47,896][00255] Num frames 8100...
[2023-12-27 05:43:48,078][00255] Num frames 8200...
[2023-12-27 05:43:48,261][00255] Num frames 8300...
[2023-12-27 05:43:48,439][00255] Num frames 8400...
[2023-12-27 05:43:48,627][00255] Avg episode rewards: #0: 19.981, true rewards: #0: 9.426
[2023-12-27 05:43:48,629][00255] Avg episode reward: 19.981, avg true_objective: 9.426
[2023-12-27 05:43:48,657][00255] Num frames 8500...
[2023-12-27 05:43:48,790][00255] Num frames 8600...
[2023-12-27 05:43:48,921][00255] Num frames 8700...
[2023-12-27 05:43:49,045][00255] Num frames 8800...
[2023-12-27 05:43:49,178][00255] Num frames 8900...
[2023-12-27 05:43:49,299][00255] Num frames 9000...
[2023-12-27 05:43:49,389][00255] Avg episode rewards: #0: 18.727, true rewards: #0: 9.027
[2023-12-27 05:43:49,391][00255] Avg episode reward: 18.727, avg true_objective: 9.027
[2023-12-27 05:44:47,514][00255] Replay video saved to /content/train_dir/default_experiment/replay.mp4!