[2023-10-27 20:49:08,893][00421] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-10-27 20:49:08,895][00421] Rollout worker 0 uses device cpu
[2023-10-27 20:49:08,897][00421] Rollout worker 1 uses device cpu
[2023-10-27 20:49:08,899][00421] Rollout worker 2 uses device cpu
[2023-10-27 20:49:08,901][00421] Rollout worker 3 uses device cpu
[2023-10-27 20:49:08,903][00421] Rollout worker 4 uses device cpu
[2023-10-27 20:49:08,904][00421] Rollout worker 5 uses device cpu
[2023-10-27 20:49:08,905][00421] Rollout worker 6 uses device cpu
[2023-10-27 20:49:08,907][00421] Rollout worker 7 uses device cpu
[2023-10-27 20:49:09,056][00421] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-10-27 20:49:09,058][00421] InferenceWorker_p0-w0: min num requests: 2
[2023-10-27 20:49:09,089][00421] Starting all processes...
[2023-10-27 20:49:09,090][00421] Starting process learner_proc0
[2023-10-27 20:49:09,138][00421] Starting all processes...
[2023-10-27 20:49:09,144][00421] Starting process inference_proc0-0
[2023-10-27 20:49:09,145][00421] Starting process rollout_proc0
[2023-10-27 20:49:09,146][00421] Starting process rollout_proc1
[2023-10-27 20:49:09,147][00421] Starting process rollout_proc2
[2023-10-27 20:49:09,147][00421] Starting process rollout_proc3
[2023-10-27 20:49:09,147][00421] Starting process rollout_proc4
[2023-10-27 20:49:09,147][00421] Starting process rollout_proc5
[2023-10-27 20:49:09,147][00421] Starting process rollout_proc6
[2023-10-27 20:49:09,147][00421] Starting process rollout_proc7
[2023-10-27 20:49:26,248][05149] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-10-27 20:49:26,249][05149] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-10-27 20:49:26,317][05149] Num visible devices: 1
[2023-10-27 20:49:26,354][05136] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-10-27 20:49:26,358][05136] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-10-27 20:49:26,373][05156] Worker 7 uses CPU cores [1]
[2023-10-27 20:49:26,392][05136] Num visible devices: 1
[2023-10-27 20:49:26,411][05154] Worker 4 uses CPU cores [0]
[2023-10-27 20:49:26,419][05136] Starting seed is not provided
[2023-10-27 20:49:26,420][05155] Worker 5 uses CPU cores [1]
[2023-10-27 20:49:26,421][05136] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-10-27 20:49:26,421][05136] Initializing actor-critic model on device cuda:0
[2023-10-27 20:49:26,422][05136] RunningMeanStd input shape: (3, 72, 128)
[2023-10-27 20:49:26,425][05136] RunningMeanStd input shape: (1,)
[2023-10-27 20:49:26,432][05157] Worker 6 uses CPU cores [0]
[2023-10-27 20:49:26,441][05150] Worker 0 uses CPU cores [0]
[2023-10-27 20:49:26,458][05136] ConvEncoder: input_channels=3
[2023-10-27 20:49:26,502][05153] Worker 3 uses CPU cores [1]
[2023-10-27 20:49:26,521][05152] Worker 2 uses CPU cores [0]
[2023-10-27 20:49:26,563][05151] Worker 1 uses CPU cores [1]
[2023-10-27 20:49:26,676][05136] Conv encoder output size: 512
[2023-10-27 20:49:26,677][05136] Policy head output size: 512
[2023-10-27 20:49:26,726][05136] Created Actor Critic model with architecture:
[2023-10-27 20:49:26,727][05136] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-10-27 20:49:27,161][05136] Using optimizer
[2023-10-27 20:49:27,487][05136] No checkpoints found
[2023-10-27 20:49:27,487][05136] Did not load from checkpoint, starting from scratch!
[2023-10-27 20:49:27,488][05136] Initialized policy 0 weights for model version 0
[2023-10-27 20:49:27,491][05136] LearnerWorker_p0 finished initialization!
[2023-10-27 20:49:27,491][05136] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-10-27 20:49:27,670][05149] RunningMeanStd input shape: (3, 72, 128)
[2023-10-27 20:49:27,671][05149] RunningMeanStd input shape: (1,)
[2023-10-27 20:49:27,683][05149] ConvEncoder: input_channels=3
[2023-10-27 20:49:27,779][05149] Conv encoder output size: 512
[2023-10-27 20:49:27,779][05149] Policy head output size: 512
[2023-10-27 20:49:27,839][00421] Inference worker 0-0 is ready!
[2023-10-27 20:49:27,841][00421] All inference workers are ready! Signal rollout workers to start!
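For orientation, the architecture dump above can be reproduced in miniature. The PyTorch sketch below is an assumption-laden stand-in, not Sample Factory's actual implementation: the conv filter sizes are guesses (the log does not print them), and only the interface sizes come from the log itself — a (3, 72, 128) observation, a 512-dim encoder output, a GRU(512, 512) core, a 1-dim critic head, and a 5-way action head.

```python
import torch
from torch import nn

class TinyActorCritic(nn.Module):
    """Sketch of the logged shared-weights actor-critic interface.

    Conv hyperparameters are assumptions; only the input shape
    (3, 72, 128), encoder size 512, GRU(512, 512) core, and the
    1-dim value / 5-dim action heads come from the log above.
    """
    def __init__(self, num_actions: int = 5, hidden: int = 512):
        super().__init__()
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)
        self.critic_linear = nn.Linear(hidden, 1)
        self.distribution_linear = nn.Linear(hidden, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))
        # GRU expects (seq_len, batch, features); run a length-1 sequence
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = TinyActorCritic()
logits, value, state = model(torch.zeros(4, 3, 72, 128))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```

Inferring the flattened size with a dummy forward pass avoids hardcoding a number that silently breaks if the conv stack changes.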
[2023-10-27 20:49:28,046][05156] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:28,047][05151] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:28,043][05155] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:28,045][05153] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:28,067][05150] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:28,061][05154] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:28,070][05152] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:28,075][05157] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 20:49:29,029][00421] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-10-27 20:49:29,048][00421] Heartbeat connected on Batcher_0
[2023-10-27 20:49:29,052][00421] Heartbeat connected on LearnerWorker_p0
[2023-10-27 20:49:29,059][05157] Decorrelating experience for 0 frames...
[2023-10-27 20:49:29,060][05152] Decorrelating experience for 0 frames...
[2023-10-27 20:49:29,089][00421] Heartbeat connected on InferenceWorker_p0-w0
[2023-10-27 20:49:29,423][05151] Decorrelating experience for 0 frames...
[2023-10-27 20:49:29,427][05156] Decorrelating experience for 0 frames...
[2023-10-27 20:49:29,430][05155] Decorrelating experience for 0 frames...
[2023-10-27 20:49:29,435][05153] Decorrelating experience for 0 frames...
[2023-10-27 20:49:30,623][05150] Decorrelating experience for 0 frames...
[2023-10-27 20:49:30,748][05157] Decorrelating experience for 32 frames...
[2023-10-27 20:49:30,750][05152] Decorrelating experience for 32 frames...
[2023-10-27 20:49:31,284][05151] Decorrelating experience for 32 frames...
[2023-10-27 20:49:31,288][05156] Decorrelating experience for 32 frames...
[2023-10-27 20:49:31,293][05153] Decorrelating experience for 32 frames...
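The periodic "Fps is (...)" entries that begin here follow a fixed format, so throughput curves can be recovered from a saved log with a short regex pass. A minimal sketch (the sample entry text is taken from this log; the `parse_stats` helper name is ours, not a Sample Factory API):

```python
import re

# Matches the periodic stats entries emitted by the main process, e.g.
# "Fps is (10 sec: 2048.0, ...). Total num frames: 20480. Throughput: 0: 322.9."
STATS_RE = re.compile(
    r"\[(?P<ts>[\d\- :,]+)\]\[\d+\] Fps is \(10 sec: (?P<fps10>[\d.na]+), "
    r"60 sec: (?P<fps60>[\d.na]+), 300 sec: (?P<fps300>[\d.na]+)\)\. "
    r"Total num frames: (?P<frames>\d+)"
)

def parse_stats(log_text: str):
    """Extract (timestamp, 10-sec FPS, total frames) tuples from a log dump."""
    return [
        (m.group("ts"), float(m.group("fps10")), int(m.group("frames")))
        for m in STATS_RE.finditer(log_text)
    ]

sample = (
    "[2023-10-27 20:49:49,029][00421] Fps is (10 sec: 2048.0, 60 sec: 1024.0, "
    "300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 322.9."
)
print(parse_stats(sample))  # [('2023-10-27 20:49:49,029', 2048.0, 20480)]
```

The `[\d.na]` character class also accepts the `nan` readings printed before the first samples arrive, and `float("nan")` parses them without special casing.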
[2023-10-27 20:49:33,005][05155] Decorrelating experience for 32 frames...
[2023-10-27 20:49:33,508][05154] Decorrelating experience for 0 frames...
[2023-10-27 20:49:33,532][05150] Decorrelating experience for 32 frames...
[2023-10-27 20:49:34,029][00421] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-10-27 20:49:34,113][05157] Decorrelating experience for 64 frames...
[2023-10-27 20:49:34,386][05156] Decorrelating experience for 64 frames...
[2023-10-27 20:49:35,817][05153] Decorrelating experience for 64 frames...
[2023-10-27 20:49:35,994][05154] Decorrelating experience for 32 frames...
[2023-10-27 20:49:36,065][05151] Decorrelating experience for 64 frames...
[2023-10-27 20:49:36,198][05155] Decorrelating experience for 64 frames...
[2023-10-27 20:49:36,532][05152] Decorrelating experience for 64 frames...
[2023-10-27 20:49:36,695][05150] Decorrelating experience for 64 frames...
[2023-10-27 20:49:37,474][05157] Decorrelating experience for 96 frames...
[2023-10-27 20:49:37,822][00421] Heartbeat connected on RolloutWorker_w6
[2023-10-27 20:49:38,155][05156] Decorrelating experience for 96 frames...
[2023-10-27 20:49:38,324][05150] Decorrelating experience for 96 frames...
[2023-10-27 20:49:38,567][00421] Heartbeat connected on RolloutWorker_w7
[2023-10-27 20:49:38,601][05155] Decorrelating experience for 96 frames...
[2023-10-27 20:49:38,645][00421] Heartbeat connected on RolloutWorker_w0
[2023-10-27 20:49:38,856][00421] Heartbeat connected on RolloutWorker_w5
[2023-10-27 20:49:38,995][05151] Decorrelating experience for 96 frames...
[2023-10-27 20:49:39,029][00421] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-10-27 20:49:39,160][00421] Heartbeat connected on RolloutWorker_w1
[2023-10-27 20:49:39,312][05153] Decorrelating experience for 96 frames...
[2023-10-27 20:49:39,485][05154] Decorrelating experience for 64 frames...
[2023-10-27 20:49:39,495][00421] Heartbeat connected on RolloutWorker_w3
[2023-10-27 20:49:40,797][05152] Decorrelating experience for 96 frames...
[2023-10-27 20:49:41,418][00421] Heartbeat connected on RolloutWorker_w2
[2023-10-27 20:49:41,655][05154] Decorrelating experience for 96 frames...
[2023-10-27 20:49:42,201][00421] Heartbeat connected on RolloutWorker_w4
[2023-10-27 20:49:43,573][05136] Signal inference workers to stop experience collection...
[2023-10-27 20:49:43,581][05149] InferenceWorker_p0-w0: stopping experience collection
[2023-10-27 20:49:44,029][00421] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 151.2. Samples: 2268. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-10-27 20:49:44,030][00421] Avg episode reward: [(0, '2.569')]
[2023-10-27 20:49:44,268][05136] Signal inference workers to resume experience collection...
[2023-10-27 20:49:44,269][05149] InferenceWorker_p0-w0: resuming experience collection
[2023-10-27 20:49:49,029][00421] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 322.9. Samples: 6458. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-10-27 20:49:49,031][00421] Avg episode reward: [(0, '3.736')]
[2023-10-27 20:49:54,029][00421] Fps is (10 sec: 3276.8, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 327.2. Samples: 8180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-10-27 20:49:54,031][00421] Avg episode reward: [(0, '3.880')]
[2023-10-27 20:49:56,331][05149] Updated weights for policy 0, policy_version 10 (0.0221)
[2023-10-27 20:49:59,029][00421] Fps is (10 sec: 2457.6, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 45056. Throughput: 0: 401.3. Samples: 12038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:49:59,034][00421] Avg episode reward: [(0, '4.308')]
[2023-10-27 20:50:04,029][00421] Fps is (10 sec: 3276.8, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 503.3. Samples: 17616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:50:04,033][00421] Avg episode reward: [(0, '4.485')]
[2023-10-27 20:50:07,816][05149] Updated weights for policy 0, policy_version 20 (0.0041)
[2023-10-27 20:50:09,029][00421] Fps is (10 sec: 4096.1, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 502.6. Samples: 20106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:50:09,036][00421] Avg episode reward: [(0, '4.388')]
[2023-10-27 20:50:14,029][00421] Fps is (10 sec: 3276.8, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 98304. Throughput: 0: 553.3. Samples: 24898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:50:14,034][00421] Avg episode reward: [(0, '4.266')]
[2023-10-27 20:50:19,029][00421] Fps is (10 sec: 2867.2, 60 sec: 2293.8, 300 sec: 2293.8). Total num frames: 114688. Throughput: 0: 648.2. Samples: 29168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:50:19,034][00421] Avg episode reward: [(0, '4.291')]
[2023-10-27 20:50:19,036][05136] Saving new best policy, reward=4.291!
[2023-10-27 20:50:21,156][05149] Updated weights for policy 0, policy_version 30 (0.0024)
[2023-10-27 20:50:24,029][00421] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 135168. Throughput: 0: 713.5. Samples: 32106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:50:24,030][00421] Avg episode reward: [(0, '4.421')]
[2023-10-27 20:50:24,042][05136] Saving new best policy, reward=4.421!
[2023-10-27 20:50:29,034][00421] Fps is (10 sec: 3684.5, 60 sec: 2525.6, 300 sec: 2525.6). Total num frames: 151552. Throughput: 0: 803.6. Samples: 38434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:50:29,036][00421] Avg episode reward: [(0, '4.483')]
[2023-10-27 20:50:29,041][05136] Saving new best policy, reward=4.483!
[2023-10-27 20:50:33,989][05149] Updated weights for policy 0, policy_version 40 (0.0019)
[2023-10-27 20:50:34,029][00421] Fps is (10 sec: 2867.0, 60 sec: 2730.6, 300 sec: 2520.6). Total num frames: 163840. Throughput: 0: 778.1. Samples: 41474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:50:34,038][00421] Avg episode reward: [(0, '4.361')]
[2023-10-27 20:50:39,029][00421] Fps is (10 sec: 2458.8, 60 sec: 2935.5, 300 sec: 2516.1). Total num frames: 176128. Throughput: 0: 784.8. Samples: 43498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:50:39,035][00421] Avg episode reward: [(0, '4.615')]
[2023-10-27 20:50:39,038][05136] Saving new best policy, reward=4.615!
[2023-10-27 20:50:44,029][00421] Fps is (10 sec: 3686.6, 60 sec: 3345.1, 300 sec: 2676.1). Total num frames: 200704. Throughput: 0: 837.3. Samples: 49718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:50:44,037][00421] Avg episode reward: [(0, '4.780')]
[2023-10-27 20:50:44,045][05136] Saving new best policy, reward=4.780!
[2023-10-27 20:50:44,761][05149] Updated weights for policy 0, policy_version 50 (0.0036)
[2023-10-27 20:50:49,031][00421] Fps is (10 sec: 4095.1, 60 sec: 3276.7, 300 sec: 2713.5). Total num frames: 217088. Throughput: 0: 847.5. Samples: 55754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:50:49,036][00421] Avg episode reward: [(0, '4.484')]
[2023-10-27 20:50:54,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2650.4). Total num frames: 225280. Throughput: 0: 817.1. Samples: 56874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:50:54,032][00421] Avg episode reward: [(0, '4.428')]
[2023-10-27 20:50:59,029][00421] Fps is (10 sec: 2458.1, 60 sec: 3276.8, 300 sec: 2685.1). Total num frames: 241664. Throughput: 0: 786.8. Samples: 60306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:50:59,035][00421] Avg episode reward: [(0, '4.393')]
[2023-10-27 20:50:59,361][05149] Updated weights for policy 0, policy_version 60 (0.0036)
[2023-10-27 20:51:04,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2759.4). Total num frames: 262144. Throughput: 0: 832.0. Samples: 66610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:51:04,035][00421] Avg episode reward: [(0, '4.733')]
[2023-10-27 20:51:04,044][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth...
[2023-10-27 20:51:09,029][00421] Fps is (10 sec: 4096.1, 60 sec: 3276.8, 300 sec: 2826.2). Total num frames: 282624. Throughput: 0: 830.4. Samples: 69474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:51:09,031][00421] Avg episode reward: [(0, '4.841')]
[2023-10-27 20:51:09,033][05136] Saving new best policy, reward=4.841!
[2023-10-27 20:51:11,448][05149] Updated weights for policy 0, policy_version 70 (0.0022)
[2023-10-27 20:51:14,037][00421] Fps is (10 sec: 2864.9, 60 sec: 3208.1, 300 sec: 2769.5). Total num frames: 290816. Throughput: 0: 770.0. Samples: 73086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:51:14,039][00421] Avg episode reward: [(0, '4.709')]
[2023-10-27 20:51:19,031][00421] Fps is (10 sec: 2047.6, 60 sec: 3140.2, 300 sec: 2755.4). Total num frames: 303104. Throughput: 0: 786.2. Samples: 76856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:51:19,032][00421] Avg episode reward: [(0, '4.568')]
[2023-10-27 20:51:24,029][00421] Fps is (10 sec: 2459.6, 60 sec: 3003.7, 300 sec: 2742.5). Total num frames: 315392. Throughput: 0: 781.7. Samples: 78674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:51:24,033][00421] Avg episode reward: [(0, '4.552')]
[2023-10-27 20:51:26,141][05149] Updated weights for policy 0, policy_version 80 (0.0047)
[2023-10-27 20:51:29,035][00421] Fps is (10 sec: 2866.1, 60 sec: 3003.7, 300 sec: 2764.7). Total num frames: 331776. Throughput: 0: 759.5. Samples: 83902. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:51:29,038][00421] Avg episode reward: [(0, '4.481')]
[2023-10-27 20:51:34,036][00421] Fps is (10 sec: 2865.0, 60 sec: 3003.4, 300 sec: 2752.3). Total num frames: 344064. Throughput: 0: 699.2. Samples: 87222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:51:34,043][00421] Avg episode reward: [(0, '4.498')]
[2023-10-27 20:51:39,029][00421] Fps is (10 sec: 2459.1, 60 sec: 3003.7, 300 sec: 2741.2). Total num frames: 356352. Throughput: 0: 704.6. Samples: 88582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:51:39,030][00421] Avg episode reward: [(0, '4.404')]
[2023-10-27 20:51:41,049][05149] Updated weights for policy 0, policy_version 90 (0.0044)
[2023-10-27 20:51:44,029][00421] Fps is (10 sec: 3279.2, 60 sec: 2935.5, 300 sec: 2791.3). Total num frames: 376832. Throughput: 0: 762.7. Samples: 94628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:51:44,031][00421] Avg episode reward: [(0, '4.607')]
[2023-10-27 20:51:49,032][00421] Fps is (10 sec: 4094.7, 60 sec: 3003.7, 300 sec: 2837.9). Total num frames: 397312. Throughput: 0: 738.8. Samples: 99858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 20:51:49,035][00421] Avg episode reward: [(0, '4.704')]
[2023-10-27 20:51:53,738][05149] Updated weights for policy 0, policy_version 100 (0.0029)
[2023-10-27 20:51:54,029][00421] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 2824.8). Total num frames: 409600. Throughput: 0: 718.8. Samples: 101818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:51:54,032][00421] Avg episode reward: [(0, '4.825')]
[2023-10-27 20:51:59,029][00421] Fps is (10 sec: 2458.4, 60 sec: 3003.7, 300 sec: 2812.6). Total num frames: 421888. Throughput: 0: 724.1. Samples: 105664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:51:59,034][00421] Avg episode reward: [(0, '4.743')]
[2023-10-27 20:52:04,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2854.0). Total num frames: 442368. Throughput: 0: 770.2. Samples: 111514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:52:04,035][00421] Avg episode reward: [(0, '4.740')]
[2023-10-27 20:52:05,661][05149] Updated weights for policy 0, policy_version 110 (0.0039)
[2023-10-27 20:52:09,029][00421] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2867.2). Total num frames: 458752. Throughput: 0: 790.0. Samples: 114226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:52:09,034][00421] Avg episode reward: [(0, '4.595')]
[2023-10-27 20:52:14,034][00421] Fps is (10 sec: 2865.7, 60 sec: 3003.9, 300 sec: 2854.7). Total num frames: 471040. Throughput: 0: 760.2. Samples: 118112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:52:14,036][00421] Avg episode reward: [(0, '4.488')]
[2023-10-27 20:52:19,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3003.8, 300 sec: 2843.1). Total num frames: 483328. Throughput: 0: 775.0. Samples: 122090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:52:19,031][00421] Avg episode reward: [(0, '4.535')]
[2023-10-27 20:52:20,090][05149] Updated weights for policy 0, policy_version 120 (0.0038)
[2023-10-27 20:52:24,029][00421] Fps is (10 sec: 3278.5, 60 sec: 3140.3, 300 sec: 2878.9). Total num frames: 503808. Throughput: 0: 815.4. Samples: 125276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:52:24,034][00421] Avg episode reward: [(0, '4.469')]
[2023-10-27 20:52:29,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3208.9, 300 sec: 2912.7). Total num frames: 524288. Throughput: 0: 819.5. Samples: 131504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:52:29,031][00421] Avg episode reward: [(0, '4.638')]
[2023-10-27 20:52:31,553][05149] Updated weights for policy 0, policy_version 130 (0.0024)
[2023-10-27 20:52:34,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.9, 300 sec: 2900.4). Total num frames: 536576. Throughput: 0: 791.7. Samples: 135484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:52:34,031][00421] Avg episode reward: [(0, '4.743')]
[2023-10-27 20:52:39,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2888.8). Total num frames: 548864. Throughput: 0: 780.4. Samples: 136936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:52:39,034][00421] Avg episode reward: [(0, '4.716')]
[2023-10-27 20:52:43,813][05149] Updated weights for policy 0, policy_version 140 (0.0017)
[2023-10-27 20:52:44,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2940.7). Total num frames: 573440. Throughput: 0: 822.7. Samples: 142686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-10-27 20:52:44,035][00421] Avg episode reward: [(0, '4.529')]
[2023-10-27 20:52:49,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3277.0, 300 sec: 2969.6). Total num frames: 593920. Throughput: 0: 832.5. Samples: 148976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:52:49,035][00421] Avg episode reward: [(0, '4.399')]
[2023-10-27 20:52:54,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2957.1). Total num frames: 606208. Throughput: 0: 820.4. Samples: 151142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:52:54,031][00421] Avg episode reward: [(0, '4.339')]
[2023-10-27 20:52:55,866][05149] Updated weights for policy 0, policy_version 150 (0.0022)
[2023-10-27 20:52:59,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2984.2). Total num frames: 626688. Throughput: 0: 836.9. Samples: 155770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:52:59,031][00421] Avg episode reward: [(0, '4.330')]
[2023-10-27 20:53:04,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3010.1). Total num frames: 647168. Throughput: 0: 900.5. Samples: 162614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 20:53:04,033][00421] Avg episode reward: [(0, '4.685')]
[2023-10-27 20:53:04,045][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000158_647168.pth...
[2023-10-27 20:53:05,507][05149] Updated weights for policy 0, policy_version 160 (0.0026)
[2023-10-27 20:53:09,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3034.8). Total num frames: 667648. Throughput: 0: 903.2. Samples: 165918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:53:09,032][00421] Avg episode reward: [(0, '4.684')]
[2023-10-27 20:53:14,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3021.9). Total num frames: 679936. Throughput: 0: 865.4. Samples: 170448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:53:14,030][00421] Avg episode reward: [(0, '4.545')]
[2023-10-27 20:53:18,204][05149] Updated weights for policy 0, policy_version 170 (0.0051)
[2023-10-27 20:53:19,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3027.5). Total num frames: 696320. Throughput: 0: 887.8. Samples: 175436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:53:19,036][00421] Avg episode reward: [(0, '4.542')]
[2023-10-27 20:53:24,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3067.6). Total num frames: 720896. Throughput: 0: 930.4. Samples: 178804. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-10-27 20:53:24,033][00421] Avg episode reward: [(0, '4.510')]
[2023-10-27 20:53:27,164][05149] Updated weights for policy 0, policy_version 180 (0.0024)
[2023-10-27 20:53:29,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3089.1). Total num frames: 741376. Throughput: 0: 952.4. Samples: 185546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:53:29,031][00421] Avg episode reward: [(0, '4.581')]
[2023-10-27 20:53:34,033][00421] Fps is (10 sec: 3684.6, 60 sec: 3686.1, 300 sec: 3092.8). Total num frames: 757760. Throughput: 0: 909.9. Samples: 189924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:53:34,040][00421] Avg episode reward: [(0, '4.688')]
[2023-10-27 20:53:39,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3080.2). Total num frames: 770048. Throughput: 0: 893.1. Samples: 191330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:53:39,035][00421] Avg episode reward: [(0, '4.725')]
[2023-10-27 20:53:40,337][05149] Updated weights for policy 0, policy_version 190 (0.0021)
[2023-10-27 20:53:44,029][00421] Fps is (10 sec: 3688.2, 60 sec: 3686.4, 300 sec: 3116.2). Total num frames: 794624. Throughput: 0: 941.0. Samples: 198116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:53:44,037][00421] Avg episode reward: [(0, '4.965')]
[2023-10-27 20:53:44,051][05136] Saving new best policy, reward=4.965!
[2023-10-27 20:53:49,032][00421] Fps is (10 sec: 4504.1, 60 sec: 3686.2, 300 sec: 3135.0). Total num frames: 815104. Throughput: 0: 927.1. Samples: 204336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:53:49,039][00421] Avg episode reward: [(0, '4.957')]
[2023-10-27 20:53:50,194][05149] Updated weights for policy 0, policy_version 200 (0.0013)
[2023-10-27 20:53:54,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3122.2). Total num frames: 827392. Throughput: 0: 901.5. Samples: 206484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:53:54,030][00421] Avg episode reward: [(0, '4.890')]
[2023-10-27 20:53:59,029][00421] Fps is (10 sec: 2868.1, 60 sec: 3618.1, 300 sec: 3125.1). Total num frames: 843776. Throughput: 0: 906.9. Samples: 211260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 20:53:59,031][00421] Avg episode reward: [(0, '4.679')]
[2023-10-27 20:54:01,856][05149] Updated weights for policy 0, policy_version 210 (0.0028)
[2023-10-27 20:54:04,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3157.6). Total num frames: 868352. Throughput: 0: 943.3. Samples: 217886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 20:54:04,031][00421] Avg episode reward: [(0, '5.000')]
[2023-10-27 20:54:04,039][05136] Saving new best policy, reward=5.000!
[2023-10-27 20:54:09,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3174.4). Total num frames: 888832. Throughput: 0: 938.0. Samples: 221014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:54:09,035][00421] Avg episode reward: [(0, '5.116')]
[2023-10-27 20:54:09,039][05136] Saving new best policy, reward=5.116!
[2023-10-27 20:54:13,371][05149] Updated weights for policy 0, policy_version 220 (0.0012)
[2023-10-27 20:54:14,035][00421] Fps is (10 sec: 3274.8, 60 sec: 3686.0, 300 sec: 3161.8). Total num frames: 901120. Throughput: 0: 887.8. Samples: 225502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:54:14,036][00421] Avg episode reward: [(0, '5.227')]
[2023-10-27 20:54:14,057][05136] Saving new best policy, reward=5.227!
[2023-10-27 20:54:19,029][00421] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3163.8). Total num frames: 917504. Throughput: 0: 904.4. Samples: 230620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:54:19,033][00421] Avg episode reward: [(0, '5.207')]
[2023-10-27 20:54:23,726][05149] Updated weights for policy 0, policy_version 230 (0.0030)
[2023-10-27 20:54:24,029][00421] Fps is (10 sec: 4098.5, 60 sec: 3686.4, 300 sec: 3193.5). Total num frames: 942080. Throughput: 0: 949.9. Samples: 234074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:54:24,034][00421] Avg episode reward: [(0, '5.203')]
[2023-10-27 20:54:29,029][00421] Fps is (10 sec: 4505.8, 60 sec: 3686.4, 300 sec: 3262.9). Total num frames: 962560. Throughput: 0: 942.5. Samples: 240530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:54:29,032][00421] Avg episode reward: [(0, '5.245')]
[2023-10-27 20:54:29,034][05136] Saving new best policy, reward=5.245!
[2023-10-27 20:54:34,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3304.6). Total num frames: 974848. Throughput: 0: 894.3. Samples: 244576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:54:34,031][00421] Avg episode reward: [(0, '5.093')]
[2023-10-27 20:54:36,174][05149] Updated weights for policy 0, policy_version 240 (0.0022)
[2023-10-27 20:54:39,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 896.9. Samples: 246846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:54:39,030][00421] Avg episode reward: [(0, '5.077')]
[2023-10-27 20:54:44,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3374.0). Total num frames: 1015808. Throughput: 0: 940.6. Samples: 253588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:54:44,031][00421] Avg episode reward: [(0, '5.055')]
[2023-10-27 20:54:45,498][05149] Updated weights for policy 0, policy_version 250 (0.0015)
[2023-10-27 20:54:49,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3686.6, 300 sec: 3401.8). Total num frames: 1036288. Throughput: 0: 927.6. Samples: 259626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:54:49,032][00421] Avg episode reward: [(0, '5.263')]
[2023-10-27 20:54:49,037][05136] Saving new best policy, reward=5.263!
[2023-10-27 20:54:54,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3401.8). Total num frames: 1048576. Throughput: 0: 905.9. Samples: 261780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:54:54,036][00421] Avg episode reward: [(0, '5.244')]
[2023-10-27 20:54:58,079][05149] Updated weights for policy 0, policy_version 260 (0.0025)
[2023-10-27 20:54:59,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3387.9). Total num frames: 1064960. Throughput: 0: 914.5. Samples: 266648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:54:59,035][00421] Avg episode reward: [(0, '5.462')]
[2023-10-27 20:54:59,043][05136] Saving new best policy, reward=5.462!
[2023-10-27 20:55:04,029][00421] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3401.8). Total num frames: 1089536. Throughput: 0: 952.8. Samples: 273494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:55:04,034][00421] Avg episode reward: [(0, '5.418')]
[2023-10-27 20:55:04,048][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000266_1089536.pth...
[2023-10-27 20:55:04,151][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000064_262144.pth
[2023-10-27 20:55:07,237][05149] Updated weights for policy 0, policy_version 270 (0.0021)
[2023-10-27 20:55:09,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3429.5). Total num frames: 1110016. Throughput: 0: 949.6. Samples: 276806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:55:09,033][00421] Avg episode reward: [(0, '5.644')]
[2023-10-27 20:55:09,042][05136] Saving new best policy, reward=5.644!
[2023-10-27 20:55:14,030][00421] Fps is (10 sec: 3276.5, 60 sec: 3686.7, 300 sec: 3415.6). Total num frames: 1122304. Throughput: 0: 900.5. Samples: 281054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:55:14,036][00421] Avg episode reward: [(0, '5.790')]
[2023-10-27 20:55:14,049][05136] Saving new best policy, reward=5.790!
[2023-10-27 20:55:19,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3415.6). Total num frames: 1142784. Throughput: 0: 927.9. Samples: 286332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:55:19,030][00421] Avg episode reward: [(0, '5.935')]
[2023-10-27 20:55:19,036][05136] Saving new best policy, reward=5.935!
[2023-10-27 20:55:19,870][05149] Updated weights for policy 0, policy_version 280 (0.0046)
[2023-10-27 20:55:24,029][00421] Fps is (10 sec: 4096.5, 60 sec: 3686.4, 300 sec: 3429.6). Total num frames: 1163264. Throughput: 0: 953.2. Samples: 289738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:55:24,031][00421] Avg episode reward: [(0, '6.244')]
[2023-10-27 20:55:24,043][05136] Saving new best policy, reward=6.244!
[2023-10-27 20:55:29,031][00421] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3457.3). Total num frames: 1183744. Throughput: 0: 946.8. Samples: 296196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:55:29,037][00421] Avg episode reward: [(0, '6.362')]
[2023-10-27 20:55:29,039][05136] Saving new best policy, reward=6.362!
[2023-10-27 20:55:29,674][05149] Updated weights for policy 0, policy_version 290 (0.0037)
[2023-10-27 20:55:34,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3457.3). Total num frames: 1196032. Throughput: 0: 908.6. Samples: 300514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:55:34,035][00421] Avg episode reward: [(0, '6.553')]
[2023-10-27 20:55:34,050][05136] Saving new best policy, reward=6.553!
[2023-10-27 20:55:39,029][00421] Fps is (10 sec: 3277.5, 60 sec: 3754.7, 300 sec: 3443.4). Total num frames: 1216512. Throughput: 0: 908.1. Samples: 302644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:55:39,031][00421] Avg episode reward: [(0, '6.713')]
[2023-10-27 20:55:39,036][05136] Saving new best policy, reward=6.713!
[2023-10-27 20:55:41,410][05149] Updated weights for policy 0, policy_version 300 (0.0026)
[2023-10-27 20:55:44,032][00421] Fps is (10 sec: 4094.8, 60 sec: 3686.2, 300 sec: 3457.3). Total num frames: 1236992. Throughput: 0: 950.9. Samples: 309442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:55:44,036][00421] Avg episode reward: [(0, '6.877')]
[2023-10-27 20:55:44,045][05136] Saving new best policy, reward=6.877!
[2023-10-27 20:55:49,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 1257472. Throughput: 0: 928.9. Samples: 315296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:55:49,037][00421] Avg episode reward: [(0, '6.690')]
[2023-10-27 20:55:52,477][05149] Updated weights for policy 0, policy_version 310 (0.0014)
[2023-10-27 20:55:54,029][00421] Fps is (10 sec: 3277.8, 60 sec: 3686.4, 300 sec: 3485.1). Total num frames: 1269760. Throughput: 0: 902.1. Samples: 317402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:55:54,033][00421] Avg episode reward: [(0, '6.935')]
[2023-10-27 20:55:54,149][05136] Saving new best policy, reward=6.935!
[2023-10-27 20:55:59,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3485.1). Total num frames: 1290240. Throughput: 0: 916.2. Samples: 322280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:55:59,031][00421] Avg episode reward: [(0, '7.176')]
[2023-10-27 20:55:59,033][05136] Saving new best policy, reward=7.176!
[2023-10-27 20:56:03,094][05149] Updated weights for policy 0, policy_version 320 (0.0022)
[2023-10-27 20:56:04,032][00421] Fps is (10 sec: 4504.4, 60 sec: 3754.5, 300 sec: 3498.9). Total num frames: 1314816. Throughput: 0: 949.9. Samples: 329080.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 20:56:04,038][00421] Avg episode reward: [(0, '7.356')] [2023-10-27 20:56:04,044][05136] Saving new best policy, reward=7.356! [2023-10-27 20:56:09,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3526.8). Total num frames: 1331200. Throughput: 0: 944.5. Samples: 332242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-10-27 20:56:09,032][00421] Avg episode reward: [(0, '7.697')] [2023-10-27 20:56:09,041][05136] Saving new best policy, reward=7.697! [2023-10-27 20:56:14,029][00421] Fps is (10 sec: 2868.0, 60 sec: 3686.5, 300 sec: 3526.8). Total num frames: 1343488. Throughput: 0: 893.9. Samples: 336418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 20:56:14,032][00421] Avg episode reward: [(0, '8.344')] [2023-10-27 20:56:14,044][05136] Saving new best policy, reward=8.344! [2023-10-27 20:56:15,740][05149] Updated weights for policy 0, policy_version 330 (0.0032) [2023-10-27 20:56:19,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1363968. Throughput: 0: 919.0. Samples: 341870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 20:56:19,031][00421] Avg episode reward: [(0, '8.489')] [2023-10-27 20:56:19,038][05136] Saving new best policy, reward=8.489! [2023-10-27 20:56:24,029][00421] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 1388544. Throughput: 0: 947.2. Samples: 345268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 20:56:24,031][00421] Avg episode reward: [(0, '8.773')] [2023-10-27 20:56:24,043][05136] Saving new best policy, reward=8.773! [2023-10-27 20:56:24,939][05149] Updated weights for policy 0, policy_version 340 (0.0033) [2023-10-27 20:56:29,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3596.2). Total num frames: 1404928. Throughput: 0: 927.3. Samples: 351168. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 20:56:29,031][00421] Avg episode reward: [(0, '7.933')] [2023-10-27 20:56:34,029][00421] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1417216. Throughput: 0: 892.9. Samples: 355478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 20:56:34,036][00421] Avg episode reward: [(0, '7.850')] [2023-10-27 20:56:37,830][05149] Updated weights for policy 0, policy_version 350 (0.0029) [2023-10-27 20:56:39,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 1437696. Throughput: 0: 899.4. Samples: 357874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-10-27 20:56:39,036][00421] Avg episode reward: [(0, '7.937')] [2023-10-27 20:56:44,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3596.2). Total num frames: 1458176. Throughput: 0: 943.8. Samples: 364752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-10-27 20:56:44,031][00421] Avg episode reward: [(0, '8.281')] [2023-10-27 20:56:47,057][05149] Updated weights for policy 0, policy_version 360 (0.0021) [2023-10-27 20:56:49,031][00421] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 1478656. Throughput: 0: 919.4. Samples: 370454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 20:56:49,033][00421] Avg episode reward: [(0, '8.328')] [2023-10-27 20:56:54,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1490944. Throughput: 0: 897.0. Samples: 372606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 20:56:54,041][00421] Avg episode reward: [(0, '8.096')] [2023-10-27 20:56:59,029][00421] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1511424. Throughput: 0: 916.8. Samples: 377674. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 20:56:59,034][00421] Avg episode reward: [(0, '8.610')] [2023-10-27 20:56:59,524][05149] Updated weights for policy 0, policy_version 370 (0.0033) [2023-10-27 20:57:04,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3686.6, 300 sec: 3651.7). Total num frames: 1536000. Throughput: 0: 948.7. Samples: 384562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-10-27 20:57:04,031][00421] Avg episode reward: [(0, '9.080')] [2023-10-27 20:57:04,047][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000375_1536000.pth... [2023-10-27 20:57:04,184][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000158_647168.pth [2023-10-27 20:57:04,200][05136] Saving new best policy, reward=9.080! [2023-10-27 20:57:09,029][00421] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1552384. Throughput: 0: 938.8. Samples: 387514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 20:57:09,036][00421] Avg episode reward: [(0, '9.258')] [2023-10-27 20:57:09,038][05136] Saving new best policy, reward=9.258! [2023-10-27 20:57:09,994][05149] Updated weights for policy 0, policy_version 380 (0.0030) [2023-10-27 20:57:14,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1564672. Throughput: 0: 901.1. Samples: 391718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-10-27 20:57:14,034][00421] Avg episode reward: [(0, '9.168')] [2023-10-27 20:57:19,029][00421] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1585152. Throughput: 0: 926.6. Samples: 397174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 20:57:19,031][00421] Avg episode reward: [(0, '9.267')] [2023-10-27 20:57:19,037][05136] Saving new best policy, reward=9.267! 
[2023-10-27 20:57:21,425][05149] Updated weights for policy 0, policy_version 390 (0.0028)
[2023-10-27 20:57:24,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1605632. Throughput: 0: 947.2. Samples: 400496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:57:24,033][00421] Avg episode reward: [(0, '9.608')]
[2023-10-27 20:57:24,072][05136] Saving new best policy, reward=9.608!
[2023-10-27 20:57:29,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1626112. Throughput: 0: 929.6. Samples: 406582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:57:29,034][00421] Avg episode reward: [(0, '9.681')]
[2023-10-27 20:57:29,040][05136] Saving new best policy, reward=9.681!
[2023-10-27 20:57:32,775][05149] Updated weights for policy 0, policy_version 400 (0.0018)
[2023-10-27 20:57:34,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1638400. Throughput: 0: 897.2. Samples: 410826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:57:34,035][00421] Avg episode reward: [(0, '10.171')]
[2023-10-27 20:57:34,046][05136] Saving new best policy, reward=10.171!
[2023-10-27 20:57:39,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1658880. Throughput: 0: 906.4. Samples: 413396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:57:39,034][00421] Avg episode reward: [(0, '11.160')]
[2023-10-27 20:57:39,037][05136] Saving new best policy, reward=11.160!
[2023-10-27 20:57:42,914][05149] Updated weights for policy 0, policy_version 410 (0.0014)
[2023-10-27 20:57:44,029][00421] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1683456. Throughput: 0: 945.4. Samples: 420216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 20:57:44,031][00421] Avg episode reward: [(0, '11.626')]
[2023-10-27 20:57:44,041][05136] Saving new best policy, reward=11.626!
[2023-10-27 20:57:49,033][00421] Fps is (10 sec: 4094.3, 60 sec: 3686.3, 300 sec: 3707.2). Total num frames: 1699840. Throughput: 0: 917.9. Samples: 425872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:57:49,035][00421] Avg episode reward: [(0, '13.051')]
[2023-10-27 20:57:49,042][05136] Saving new best policy, reward=13.051!
[2023-10-27 20:57:54,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1712128. Throughput: 0: 899.0. Samples: 427968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:57:54,031][00421] Avg episode reward: [(0, '13.356')]
[2023-10-27 20:57:54,042][05136] Saving new best policy, reward=13.356!
[2023-10-27 20:57:55,590][05149] Updated weights for policy 0, policy_version 420 (0.0020)
[2023-10-27 20:57:59,029][00421] Fps is (10 sec: 3278.2, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1732608. Throughput: 0: 922.0. Samples: 433206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:57:59,035][00421] Avg episode reward: [(0, '13.369')]
[2023-10-27 20:57:59,037][05136] Saving new best policy, reward=13.369!
[2023-10-27 20:58:04,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1757184. Throughput: 0: 951.2. Samples: 439976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:58:04,031][00421] Avg episode reward: [(0, '13.815')]
[2023-10-27 20:58:04,038][05136] Saving new best policy, reward=13.815!
[2023-10-27 20:58:04,742][05149] Updated weights for policy 0, policy_version 430 (0.0026)
[2023-10-27 20:58:09,031][00421] Fps is (10 sec: 4095.1, 60 sec: 3686.3, 300 sec: 3707.2). Total num frames: 1773568. Throughput: 0: 943.2. Samples: 442940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:58:09,033][00421] Avg episode reward: [(0, '13.634')]
[2023-10-27 20:58:14,029][00421] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1789952. Throughput: 0: 902.6. Samples: 447198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:58:14,033][00421] Avg episode reward: [(0, '13.902')]
[2023-10-27 20:58:14,047][05136] Saving new best policy, reward=13.902!
[2023-10-27 20:58:17,365][05149] Updated weights for policy 0, policy_version 440 (0.0028)
[2023-10-27 20:58:19,029][00421] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1806336. Throughput: 0: 932.7. Samples: 452796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:58:19,038][00421] Avg episode reward: [(0, '14.603')]
[2023-10-27 20:58:19,041][05136] Saving new best policy, reward=14.603!
[2023-10-27 20:58:24,029][00421] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1822720. Throughput: 0: 942.5. Samples: 455808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:58:24,030][00421] Avg episode reward: [(0, '13.787')]
[2023-10-27 20:58:29,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 1839104. Throughput: 0: 889.9. Samples: 460262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:58:29,035][00421] Avg episode reward: [(0, '15.259')]
[2023-10-27 20:58:29,037][05136] Saving new best policy, reward=15.259!
[2023-10-27 20:58:30,048][05149] Updated weights for policy 0, policy_version 450 (0.0018)
[2023-10-27 20:58:34,032][00421] Fps is (10 sec: 2866.3, 60 sec: 3549.7, 300 sec: 3665.5). Total num frames: 1851392. Throughput: 0: 841.9. Samples: 463756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:58:34,037][00421] Avg episode reward: [(0, '16.042')]
[2023-10-27 20:58:34,049][05136] Saving new best policy, reward=16.042!
[2023-10-27 20:58:39,029][00421] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3637.8). Total num frames: 1867776. Throughput: 0: 843.4. Samples: 465920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:58:39,032][00421] Avg episode reward: [(0, '15.460')]
[2023-10-27 20:58:42,100][05149] Updated weights for policy 0, policy_version 460 (0.0027)
[2023-10-27 20:58:44,029][00421] Fps is (10 sec: 4097.3, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 1892352. Throughput: 0: 874.3. Samples: 472550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:58:44,030][00421] Avg episode reward: [(0, '16.072')]
[2023-10-27 20:58:44,047][05136] Saving new best policy, reward=16.072!
[2023-10-27 20:58:49,031][00421] Fps is (10 sec: 4095.3, 60 sec: 3481.7, 300 sec: 3665.5). Total num frames: 1908736. Throughput: 0: 847.4. Samples: 478112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:58:49,034][00421] Avg episode reward: [(0, '17.718')]
[2023-10-27 20:58:49,038][05136] Saving new best policy, reward=17.718!
[2023-10-27 20:58:54,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 1921024. Throughput: 0: 826.1. Samples: 480114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:58:54,031][00421] Avg episode reward: [(0, '18.205')]
[2023-10-27 20:58:54,049][05136] Saving new best policy, reward=18.205!
[2023-10-27 20:58:54,397][05149] Updated weights for policy 0, policy_version 470 (0.0027)
[2023-10-27 20:58:59,029][00421] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3637.8). Total num frames: 1941504. Throughput: 0: 839.7. Samples: 484986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:58:59,034][00421] Avg episode reward: [(0, '17.922')]
[2023-10-27 20:59:03,966][05149] Updated weights for policy 0, policy_version 480 (0.0026)
[2023-10-27 20:59:04,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 1966080. Throughput: 0: 870.1. Samples: 491950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 20:59:04,036][00421] Avg episode reward: [(0, '18.024')]
[2023-10-27 20:59:04,052][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000480_1966080.pth...
[2023-10-27 20:59:04,162][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000266_1089536.pth
[2023-10-27 20:59:09,029][00421] Fps is (10 sec: 4095.9, 60 sec: 3481.7, 300 sec: 3665.6). Total num frames: 1982464. Throughput: 0: 872.2. Samples: 495056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 20:59:09,031][00421] Avg episode reward: [(0, '17.957')]
[2023-10-27 20:59:14,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3651.7). Total num frames: 1994752. Throughput: 0: 867.7. Samples: 499310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 20:59:14,038][00421] Avg episode reward: [(0, '17.722')]
[2023-10-27 20:59:16,709][05149] Updated weights for policy 0, policy_version 490 (0.0029)
[2023-10-27 20:59:19,034][00421] Fps is (10 sec: 3275.0, 60 sec: 3481.3, 300 sec: 3637.7). Total num frames: 2015232. Throughput: 0: 912.1. Samples: 504804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 20:59:19,036][00421] Avg episode reward: [(0, '17.357')]
[2023-10-27 20:59:24,029][00421] Fps is (10 sec: 4505.5, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2039808. Throughput: 0: 941.7. Samples: 508298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:59:24,032][00421] Avg episode reward: [(0, '17.824')]
[2023-10-27 20:59:25,531][05149] Updated weights for policy 0, policy_version 500 (0.0016)
[2023-10-27 20:59:29,029][00421] Fps is (10 sec: 4098.3, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2056192. Throughput: 0: 930.6. Samples: 514426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:59:29,031][00421] Avg episode reward: [(0, '17.260')]
[2023-10-27 20:59:34,029][00421] Fps is (10 sec: 3276.9, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 2072576. Throughput: 0: 903.0. Samples: 518746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:59:34,035][00421] Avg episode reward: [(0, '18.780')]
[2023-10-27 20:59:34,046][05136] Saving new best policy, reward=18.780!
[2023-10-27 20:59:38,092][05149] Updated weights for policy 0, policy_version 510 (0.0043)
[2023-10-27 20:59:39,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2093056. Throughput: 0: 917.6. Samples: 521406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:59:39,035][00421] Avg episode reward: [(0, '19.604')]
[2023-10-27 20:59:39,040][05136] Saving new best policy, reward=19.604!
[2023-10-27 20:59:44,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2113536. Throughput: 0: 960.5. Samples: 528210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:59:44,036][00421] Avg episode reward: [(0, '20.345')]
[2023-10-27 20:59:44,046][05136] Saving new best policy, reward=20.345!
[2023-10-27 20:59:47,995][05149] Updated weights for policy 0, policy_version 520 (0.0020)
[2023-10-27 20:59:49,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3665.6). Total num frames: 2129920. Throughput: 0: 926.4. Samples: 533636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 20:59:49,030][00421] Avg episode reward: [(0, '20.549')]
[2023-10-27 20:59:49,039][05136] Saving new best policy, reward=20.549!
[2023-10-27 20:59:54,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2146304. Throughput: 0: 904.1. Samples: 535742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 20:59:54,032][00421] Avg episode reward: [(0, '20.259')]
[2023-10-27 20:59:59,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2166784. Throughput: 0: 928.4. Samples: 541090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-10-27 20:59:59,037][00421] Avg episode reward: [(0, '22.231')]
[2023-10-27 20:59:59,040][05136] Saving new best policy, reward=22.231!
[2023-10-27 20:59:59,684][05149] Updated weights for policy 0, policy_version 530 (0.0017)
[2023-10-27 21:00:04,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2187264. Throughput: 0: 958.3. Samples: 547924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:00:04,036][00421] Avg episode reward: [(0, '20.790')]
[2023-10-27 21:00:09,029][00421] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2203648. Throughput: 0: 942.9. Samples: 550730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:00:09,033][00421] Avg episode reward: [(0, '22.328')]
[2023-10-27 21:00:09,037][05136] Saving new best policy, reward=22.328!
[2023-10-27 21:00:10,959][05149] Updated weights for policy 0, policy_version 540 (0.0018)
[2023-10-27 21:00:14,030][00421] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3651.7). Total num frames: 2220032. Throughput: 0: 901.2. Samples: 554980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:00:14,038][00421] Avg episode reward: [(0, '22.779')]
[2023-10-27 21:00:14,049][05136] Saving new best policy, reward=22.779!
[2023-10-27 21:00:19,029][00421] Fps is (10 sec: 3686.5, 60 sec: 3755.0, 300 sec: 3651.7). Total num frames: 2240512. Throughput: 0: 934.2. Samples: 560784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 21:00:19,030][00421] Avg episode reward: [(0, '23.408')]
[2023-10-27 21:00:19,038][05136] Saving new best policy, reward=23.408!
[2023-10-27 21:00:21,498][05149] Updated weights for policy 0, policy_version 550 (0.0023) [2023-10-27 21:00:24,029][00421] Fps is (10 sec: 4096.5, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2260992. Throughput: 0: 951.4. Samples: 564218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-10-27 21:00:24,035][00421] Avg episode reward: [(0, '22.255')] [2023-10-27 21:00:29,029][00421] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2277376. Throughput: 0: 932.9. Samples: 570192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-10-27 21:00:29,036][00421] Avg episode reward: [(0, '23.276')] [2023-10-27 21:00:33,328][05149] Updated weights for policy 0, policy_version 560 (0.0017) [2023-10-27 21:00:34,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2293760. Throughput: 0: 909.2. Samples: 574552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-10-27 21:00:34,034][00421] Avg episode reward: [(0, '23.160')] [2023-10-27 21:00:39,029][00421] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2314240. Throughput: 0: 926.9. Samples: 577452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-10-27 21:00:39,030][00421] Avg episode reward: [(0, '21.797')] [2023-10-27 21:00:42,792][05149] Updated weights for policy 0, policy_version 570 (0.0014) [2023-10-27 21:00:44,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2338816. Throughput: 0: 961.5. Samples: 584358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:00:44,030][00421] Avg episode reward: [(0, '21.831')] [2023-10-27 21:00:49,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2355200. Throughput: 0: 929.5. Samples: 589752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-10-27 21:00:49,030][00421] Avg episode reward: [(0, '22.438')] [2023-10-27 21:00:54,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3651.7). 
Total num frames: 2367488. Throughput: 0: 913.7. Samples: 591846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-10-27 21:00:54,031][00421] Avg episode reward: [(0, '23.534')] [2023-10-27 21:00:54,053][05136] Saving new best policy, reward=23.534! [2023-10-27 21:00:55,434][05149] Updated weights for policy 0, policy_version 580 (0.0032) [2023-10-27 21:00:59,029][00421] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3651.7). Total num frames: 2392064. Throughput: 0: 944.9. Samples: 597498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-10-27 21:00:59,031][00421] Avg episode reward: [(0, '22.151')] [2023-10-27 21:01:04,029][00421] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2412544. Throughput: 0: 969.1. Samples: 604394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:01:04,034][00421] Avg episode reward: [(0, '23.217')] [2023-10-27 21:01:04,044][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000589_2412544.pth... [2023-10-27 21:01:04,173][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000375_1536000.pth [2023-10-27 21:01:04,279][05149] Updated weights for policy 0, policy_version 590 (0.0029) [2023-10-27 21:01:09,029][00421] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2428928. Throughput: 0: 950.0. Samples: 606966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:01:09,031][00421] Avg episode reward: [(0, '22.214')] [2023-10-27 21:01:14,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2445312. Throughput: 0: 911.4. Samples: 611204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:01:14,031][00421] Avg episode reward: [(0, '21.347')] [2023-10-27 21:01:16,915][05149] Updated weights for policy 0, policy_version 600 (0.0027) [2023-10-27 21:01:19,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2465792. 
Throughput: 0: 948.2. Samples: 617222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:01:19,031][00421] Avg episode reward: [(0, '19.543')] [2023-10-27 21:01:24,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2486272. Throughput: 0: 960.4. Samples: 620672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:01:24,033][00421] Avg episode reward: [(0, '19.954')] [2023-10-27 21:01:26,520][05149] Updated weights for policy 0, policy_version 610 (0.0023) [2023-10-27 21:01:29,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2502656. Throughput: 0: 926.1. Samples: 626032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 21:01:29,031][00421] Avg episode reward: [(0, '19.500')] [2023-10-27 21:01:34,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2519040. Throughput: 0: 900.0. Samples: 630252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 21:01:34,031][00421] Avg episode reward: [(0, '19.226')] [2023-10-27 21:01:38,952][05149] Updated weights for policy 0, policy_version 620 (0.0021) [2023-10-27 21:01:39,029][00421] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 2539520. Throughput: 0: 919.1. Samples: 633206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 21:01:39,031][00421] Avg episode reward: [(0, '21.753')] [2023-10-27 21:01:44,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2560000. Throughput: 0: 943.9. Samples: 639974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-10-27 21:01:44,031][00421] Avg episode reward: [(0, '24.922')] [2023-10-27 21:01:44,039][05136] Saving new best policy, reward=24.922! [2023-10-27 21:01:49,029][00421] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2576384. Throughput: 0: 900.1. Samples: 644898. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-10-27 21:01:49,031][00421] Avg episode reward: [(0, '25.250')] [2023-10-27 21:01:49,033][05136] Saving new best policy, reward=25.250! [2023-10-27 21:01:50,198][05149] Updated weights for policy 0, policy_version 630 (0.0024) [2023-10-27 21:01:54,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2588672. Throughput: 0: 886.8. Samples: 646872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-10-27 21:01:54,038][00421] Avg episode reward: [(0, '24.328')] [2023-10-27 21:01:59,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2609152. Throughput: 0: 904.0. Samples: 651884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:01:59,035][00421] Avg episode reward: [(0, '24.768')] [2023-10-27 21:02:01,571][05149] Updated weights for policy 0, policy_version 640 (0.0019) [2023-10-27 21:02:04,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2629632. Throughput: 0: 917.0. Samples: 658486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:02:04,031][00421] Avg episode reward: [(0, '23.783')] [2023-10-27 21:02:09,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2646016. Throughput: 0: 895.0. Samples: 660948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 21:02:09,032][00421] Avg episode reward: [(0, '23.578')] [2023-10-27 21:02:14,029][00421] Fps is (10 sec: 2867.1, 60 sec: 3549.8, 300 sec: 3637.8). Total num frames: 2658304. Throughput: 0: 856.2. Samples: 664562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-10-27 21:02:14,034][00421] Avg episode reward: [(0, '22.064')] [2023-10-27 21:02:15,328][05149] Updated weights for policy 0, policy_version 650 (0.0041) [2023-10-27 21:02:19,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 2674688. Throughput: 0: 879.6. Samples: 669836. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:02:19,036][00421] Avg episode reward: [(0, '22.019')] [2023-10-27 21:02:24,029][00421] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 2695168. Throughput: 0: 879.9. Samples: 672800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:02:24,033][00421] Avg episode reward: [(0, '21.437')] [2023-10-27 21:02:25,680][05149] Updated weights for policy 0, policy_version 660 (0.0015) [2023-10-27 21:02:29,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3623.9). Total num frames: 2707456. Throughput: 0: 834.9. Samples: 677544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-10-27 21:02:29,031][00421] Avg episode reward: [(0, '20.649')] [2023-10-27 21:02:34,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3596.1). Total num frames: 2719744. Throughput: 0: 809.0. Samples: 681304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:02:34,031][00421] Avg episode reward: [(0, '20.483')] [2023-10-27 21:02:39,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3582.3). Total num frames: 2740224. Throughput: 0: 825.9. Samples: 684036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-10-27 21:02:39,031][00421] Avg episode reward: [(0, '20.417')] [2023-10-27 21:02:39,543][05149] Updated weights for policy 0, policy_version 670 (0.0045) [2023-10-27 21:02:44,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3596.2). Total num frames: 2760704. Throughput: 0: 844.8. Samples: 689898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-10-27 21:02:44,031][00421] Avg episode reward: [(0, '21.530')] [2023-10-27 21:02:49,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3596.1). Total num frames: 2772992. Throughput: 0: 794.0. Samples: 694214. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:02:49,032][00421] Avg episode reward: [(0, '21.091')]
[2023-10-27 21:02:53,117][05149] Updated weights for policy 0, policy_version 680 (0.0035)
[2023-10-27 21:02:54,032][00421] Fps is (10 sec: 2456.9, 60 sec: 3276.6, 300 sec: 3568.3). Total num frames: 2785280. Throughput: 0: 779.6. Samples: 696030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:02:54,037][00421] Avg episode reward: [(0, '23.238')]
[2023-10-27 21:02:59,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3554.5). Total num frames: 2805760. Throughput: 0: 811.6. Samples: 701082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:02:59,035][00421] Avg episode reward: [(0, '24.661')]
[2023-10-27 21:03:03,856][05149] Updated weights for policy 0, policy_version 690 (0.0013)
[2023-10-27 21:03:04,029][00421] Fps is (10 sec: 4097.2, 60 sec: 3276.8, 300 sec: 3568.4). Total num frames: 2826240. Throughput: 0: 828.5. Samples: 707118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:03:04,037][00421] Avg episode reward: [(0, '26.280')]
[2023-10-27 21:03:04,054][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000690_2826240.pth...
[2023-10-27 21:03:04,175][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000480_1966080.pth
[2023-10-27 21:03:04,183][05136] Saving new best policy, reward=26.280!
[2023-10-27 21:03:09,032][00421] Fps is (10 sec: 3275.7, 60 sec: 3208.4, 300 sec: 3554.5). Total num frames: 2838528. Throughput: 0: 807.2. Samples: 709126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:03:09,036][00421] Avg episode reward: [(0, '27.201')]
[2023-10-27 21:03:09,043][05136] Saving new best policy, reward=27.201!
[2023-10-27 21:03:14,029][00421] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 3540.6). Total num frames: 2850816. Throughput: 0: 783.3. Samples: 712792.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:03:14,038][00421] Avg episode reward: [(0, '27.317')]
[2023-10-27 21:03:14,054][05136] Saving new best policy, reward=27.317!
[2023-10-27 21:03:18,258][05149] Updated weights for policy 0, policy_version 700 (0.0052)
[2023-10-27 21:03:19,029][00421] Fps is (10 sec: 2868.1, 60 sec: 3208.5, 300 sec: 3540.6). Total num frames: 2867200. Throughput: 0: 817.5. Samples: 718090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:03:19,033][00421] Avg episode reward: [(0, '27.274')]
[2023-10-27 21:03:24,029][00421] Fps is (10 sec: 3686.6, 60 sec: 3208.5, 300 sec: 3554.5). Total num frames: 2887680. Throughput: 0: 822.7. Samples: 721058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:03:24,031][00421] Avg episode reward: [(0, '27.561')]
[2023-10-27 21:03:24,039][05136] Saving new best policy, reward=27.561!
[2023-10-27 21:03:29,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3554.5). Total num frames: 2899968. Throughput: 0: 794.2. Samples: 725636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:03:29,032][00421] Avg episode reward: [(0, '26.177')]
[2023-10-27 21:03:31,103][05149] Updated weights for policy 0, policy_version 710 (0.0016)
[2023-10-27 21:03:34,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3540.6). Total num frames: 2912256. Throughput: 0: 782.3. Samples: 729416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:03:34,031][00421] Avg episode reward: [(0, '26.127')]
[2023-10-27 21:03:39,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3526.7). Total num frames: 2932736. Throughput: 0: 807.1. Samples: 732346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 21:03:39,031][00421] Avg episode reward: [(0, '26.282')]
[2023-10-27 21:03:42,370][05149] Updated weights for policy 0, policy_version 720 (0.0024)
[2023-10-27 21:03:44,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3540.6).
Total num frames: 2953216. Throughput: 0: 828.0. Samples: 738340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:03:44,031][00421] Avg episode reward: [(0, '26.912')]
[2023-10-27 21:03:49,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3540.6). Total num frames: 2965504. Throughput: 0: 785.1. Samples: 742448. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 21:03:49,031][00421] Avg episode reward: [(0, '26.282')]
[2023-10-27 21:03:54,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3208.7, 300 sec: 3512.8). Total num frames: 2977792. Throughput: 0: 779.9. Samples: 744218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 21:03:54,034][00421] Avg episode reward: [(0, '26.898')]
[2023-10-27 21:03:56,828][05149] Updated weights for policy 0, policy_version 730 (0.0019)
[2023-10-27 21:03:59,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3499.0). Total num frames: 2998272. Throughput: 0: 808.6. Samples: 749178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:03:59,031][00421] Avg episode reward: [(0, '25.512')]
[2023-10-27 21:04:04,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3499.0). Total num frames: 3014656. Throughput: 0: 822.6. Samples: 755106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:04:04,039][00421] Avg episode reward: [(0, '26.042')]
[2023-10-27 21:04:08,784][05149] Updated weights for policy 0, policy_version 740 (0.0017)
[2023-10-27 21:04:09,034][00421] Fps is (10 sec: 3275.1, 60 sec: 3208.4, 300 sec: 3512.8). Total num frames: 3031040. Throughput: 0: 801.5. Samples: 757130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:04:09,037][00421] Avg episode reward: [(0, '25.297')]
[2023-10-27 21:04:14,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3485.1). Total num frames: 3043328. Throughput: 0: 781.6. Samples: 760806.
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 21:04:14,031][00421] Avg episode reward: [(0, '23.899')]
[2023-10-27 21:04:19,029][00421] Fps is (10 sec: 2868.7, 60 sec: 3208.5, 300 sec: 3457.3). Total num frames: 3059712. Throughput: 0: 817.2. Samples: 766188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:04:19,032][00421] Avg episode reward: [(0, '22.232')]
[2023-10-27 21:04:21,260][05149] Updated weights for policy 0, policy_version 750 (0.0019)
[2023-10-27 21:04:24,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3471.2). Total num frames: 3080192. Throughput: 0: 818.8. Samples: 769194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:04:24,036][00421] Avg episode reward: [(0, '21.326')]
[2023-10-27 21:04:29,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3457.3). Total num frames: 3092480. Throughput: 0: 788.3. Samples: 773814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:04:29,042][00421] Avg episode reward: [(0, '21.803')]
[2023-10-27 21:04:34,030][00421] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3443.4). Total num frames: 3108864. Throughput: 0: 781.1. Samples: 777600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:04:34,034][00421] Avg episode reward: [(0, '21.422')]
[2023-10-27 21:04:35,339][05149] Updated weights for policy 0, policy_version 760 (0.0026)
[2023-10-27 21:04:39,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3429.5). Total num frames: 3125248. Throughput: 0: 805.0. Samples: 780442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:04:39,033][00421] Avg episode reward: [(0, '20.842')]
[2023-10-27 21:04:44,033][00421] Fps is (10 sec: 3685.3, 60 sec: 3208.3, 300 sec: 3443.4). Total num frames: 3145728. Throughput: 0: 826.0. Samples: 786352.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-10-27 21:04:44,040][00421] Avg episode reward: [(0, '21.447')]
[2023-10-27 21:04:46,599][05149] Updated weights for policy 0, policy_version 770 (0.0018)
[2023-10-27 21:04:49,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3429.5). Total num frames: 3158016. Throughput: 0: 787.4. Samples: 790540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 21:04:49,033][00421] Avg episode reward: [(0, '22.551')]
[2023-10-27 21:04:54,029][00421] Fps is (10 sec: 2458.6, 60 sec: 3208.5, 300 sec: 3401.8). Total num frames: 3170304. Throughput: 0: 783.3. Samples: 792374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 21:04:54,035][00421] Avg episode reward: [(0, '22.946')]
[2023-10-27 21:04:59,029][00421] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3401.8). Total num frames: 3190784. Throughput: 0: 818.7. Samples: 797646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:04:59,031][00421] Avg episode reward: [(0, '23.384')]
[2023-10-27 21:04:59,412][05149] Updated weights for policy 0, policy_version 780 (0.0023)
[2023-10-27 21:05:04,034][00421] Fps is (10 sec: 4093.8, 60 sec: 3276.5, 300 sec: 3415.6). Total num frames: 3211264. Throughput: 0: 834.0. Samples: 803724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:05:04,037][00421] Avg episode reward: [(0, '24.307')]
[2023-10-27 21:05:04,053][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000784_3211264.pth...
[2023-10-27 21:05:04,193][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000589_2412544.pth
[2023-10-27 21:05:09,029][00421] Fps is (10 sec: 3276.9, 60 sec: 3208.8, 300 sec: 3401.8). Total num frames: 3223552. Throughput: 0: 809.0. Samples: 805598.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:05:09,033][00421] Avg episode reward: [(0, '24.830')]
[2023-10-27 21:05:13,192][05149] Updated weights for policy 0, policy_version 790 (0.0027)
[2023-10-27 21:05:14,029][00421] Fps is (10 sec: 2458.9, 60 sec: 3208.5, 300 sec: 3374.0). Total num frames: 3235840. Throughput: 0: 789.1. Samples: 809324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 21:05:14,034][00421] Avg episode reward: [(0, '24.324')]
[2023-10-27 21:05:19,029][00421] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 3256320. Throughput: 0: 832.7. Samples: 815072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:05:19,031][00421] Avg episode reward: [(0, '23.297')]
[2023-10-27 21:05:23,783][05149] Updated weights for policy 0, policy_version 800 (0.0032)
[2023-10-27 21:05:24,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 3276800. Throughput: 0: 833.7. Samples: 817958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:05:24,031][00421] Avg episode reward: [(0, '23.043')]
[2023-10-27 21:05:29,033][00421] Fps is (10 sec: 3275.5, 60 sec: 3276.6, 300 sec: 3373.9). Total num frames: 3289088. Throughput: 0: 796.0. Samples: 822170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:05:29,036][00421] Avg episode reward: [(0, '22.835')]
[2023-10-27 21:05:34,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3208.6, 300 sec: 3346.2). Total num frames: 3301376. Throughput: 0: 783.4. Samples: 825792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 21:05:34,036][00421] Avg episode reward: [(0, '22.698')]
[2023-10-27 21:05:38,384][05149] Updated weights for policy 0, policy_version 810 (0.0040)
[2023-10-27 21:05:39,029][00421] Fps is (10 sec: 2868.4, 60 sec: 3208.5, 300 sec: 3318.5). Total num frames: 3317760. Throughput: 0: 804.9. Samples: 828594.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:05:39,030][00421] Avg episode reward: [(0, '23.658')]
[2023-10-27 21:05:44,029][00421] Fps is (10 sec: 3686.3, 60 sec: 3208.7, 300 sec: 3332.3). Total num frames: 3338240. Throughput: 0: 812.4. Samples: 834202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:05:44,036][00421] Avg episode reward: [(0, '23.873')]
[2023-10-27 21:05:49,034][00421] Fps is (10 sec: 2865.7, 60 sec: 3140.0, 300 sec: 3318.4). Total num frames: 3346432. Throughput: 0: 760.8. Samples: 837962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 21:05:49,037][00421] Avg episode reward: [(0, '24.067')]
[2023-10-27 21:05:52,625][05149] Updated weights for policy 0, policy_version 820 (0.0019)
[2023-10-27 21:05:54,029][00421] Fps is (10 sec: 2048.1, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 3358720. Throughput: 0: 757.5. Samples: 839686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 21:05:54,031][00421] Avg episode reward: [(0, '24.390')]
[2023-10-27 21:05:59,029][00421] Fps is (10 sec: 3278.5, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 3379200. Throughput: 0: 791.5. Samples: 844940. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 21:05:59,033][00421] Avg episode reward: [(0, '24.209')]
[2023-10-27 21:06:03,592][05149] Updated weights for policy 0, policy_version 830 (0.0025)
[2023-10-27 21:06:04,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3140.5, 300 sec: 3290.7). Total num frames: 3399680. Throughput: 0: 794.0. Samples: 850802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 21:06:04,036][00421] Avg episode reward: [(0, '24.470')]
[2023-10-27 21:06:09,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3276.8). Total num frames: 3411968. Throughput: 0: 770.8. Samples: 852644.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-10-27 21:06:09,033][00421] Avg episode reward: [(0, '24.939')]
[2023-10-27 21:06:14,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3424256. Throughput: 0: 757.3. Samples: 856246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:06:14,031][00421] Avg episode reward: [(0, '23.573')]
[2023-10-27 21:06:17,742][05149] Updated weights for policy 0, policy_version 840 (0.0015)
[2023-10-27 21:06:19,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 3444736. Throughput: 0: 803.9. Samples: 861966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 21:06:19,031][00421] Avg episode reward: [(0, '24.403')]
[2023-10-27 21:06:24,030][00421] Fps is (10 sec: 3686.1, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 3461120. Throughput: 0: 807.0. Samples: 864910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:06:24,037][00421] Avg episode reward: [(0, '23.951')]
[2023-10-27 21:06:29,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3072.2, 300 sec: 3235.1). Total num frames: 3473408. Throughput: 0: 775.9. Samples: 869118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:06:29,032][00421] Avg episode reward: [(0, '24.711')]
[2023-10-27 21:06:30,934][05149] Updated weights for policy 0, policy_version 850 (0.0013)
[2023-10-27 21:06:34,029][00421] Fps is (10 sec: 2867.5, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3489792. Throughput: 0: 779.3. Samples: 873026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:06:34,031][00421] Avg episode reward: [(0, '25.112')]
[2023-10-27 21:06:39,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3510272. Throughput: 0: 808.7. Samples: 876076.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:06:39,031][00421] Avg episode reward: [(0, '25.133')]
[2023-10-27 21:06:41,761][05149] Updated weights for policy 0, policy_version 860 (0.0020)
[2023-10-27 21:06:44,033][00421] Fps is (10 sec: 4094.3, 60 sec: 3208.3, 300 sec: 3235.1). Total num frames: 3530752. Throughput: 0: 826.9. Samples: 882152. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-10-27 21:06:44,039][00421] Avg episode reward: [(0, '25.401')]
[2023-10-27 21:06:49,029][00421] Fps is (10 sec: 3276.7, 60 sec: 3277.1, 300 sec: 3235.1). Total num frames: 3543040. Throughput: 0: 789.2. Samples: 886316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 21:06:49,031][00421] Avg episode reward: [(0, '24.441')]
[2023-10-27 21:06:54,029][00421] Fps is (10 sec: 2458.6, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3555328. Throughput: 0: 790.4. Samples: 888210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 21:06:54,036][00421] Avg episode reward: [(0, '24.809')]
[2023-10-27 21:06:55,544][05149] Updated weights for policy 0, policy_version 870 (0.0013)
[2023-10-27 21:06:59,029][00421] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3575808. Throughput: 0: 833.2. Samples: 893738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:06:59,035][00421] Avg episode reward: [(0, '26.394')]
[2023-10-27 21:07:04,029][00421] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3596288. Throughput: 0: 841.1. Samples: 899816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:07:04,031][00421] Avg episode reward: [(0, '26.548')]
[2023-10-27 21:07:04,047][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000878_3596288.pth...
[2023-10-27 21:07:04,181][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000690_2826240.pth
[2023-10-27 21:07:06,916][05149] Updated weights for policy 0, policy_version 880 (0.0024)
[2023-10-27 21:07:09,029][00421] Fps is (10 sec: 3276.5, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3608576. Throughput: 0: 816.8. Samples: 901664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:07:09,041][00421] Avg episode reward: [(0, '27.498')]
[2023-10-27 21:07:14,029][00421] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3620864. Throughput: 0: 809.2. Samples: 905534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:07:14,037][00421] Avg episode reward: [(0, '27.349')]
[2023-10-27 21:07:19,029][00421] Fps is (10 sec: 3277.1, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3641344. Throughput: 0: 856.7. Samples: 911578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 21:07:19,031][00421] Avg episode reward: [(0, '28.050')]
[2023-10-27 21:07:19,037][05136] Saving new best policy, reward=28.050!
[2023-10-27 21:07:19,290][05149] Updated weights for policy 0, policy_version 890 (0.0022)
[2023-10-27 21:07:24,031][00421] Fps is (10 sec: 4095.2, 60 sec: 3345.0, 300 sec: 3235.1). Total num frames: 3661824. Throughput: 0: 854.1. Samples: 914512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:07:24,034][00421] Avg episode reward: [(0, '29.241')]
[2023-10-27 21:07:24,046][05136] Saving new best policy, reward=29.241!
[2023-10-27 21:07:29,029][00421] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 3674112. Throughput: 0: 815.6. Samples: 918852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:07:29,036][00421] Avg episode reward: [(0, '29.261')]
[2023-10-27 21:07:29,040][05136] Saving new best policy, reward=29.261!
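The recurring `Fps is (10 sec: …, 60 sec: …, 300 sec: …)` lines report throughput averaged over three trailing time windows. The following is a minimal sketch of how such multi-window rates can be derived from `(timestamp, total_frames)` samples; the `WindowedFps` class and its interface are hypothetical illustrations, not Sample Factory's actual implementation.

```python
from collections import deque

class WindowedFps:
    """Track frames-per-second over several trailing time windows."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs

    def record(self, t, total_frames):
        self.samples.append((t, total_frames))
        # Drop samples older than the largest window.
        while self.samples and t - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def rates(self, t):
        """Return {window_seconds: frames_per_second} at time t."""
        out = {}
        for w in self.windows:
            inside = [(ts, f) for ts, f in self.samples if t - ts <= w]
            if len(inside) < 2:
                out[w] = 0.0
                continue
            (t0, f0), (t1, f1) = inside[0], inside[-1]
            out[w] = (f1 - f0) / (t1 - t0) if t1 > t0 else 0.0
        return out
```

Early in a run the 300-second window holds fewer samples than the 10-second one, which is why the three reported rates converge only once training has been going for a while.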
[2023-10-27 21:07:32,958][05149] Updated weights for policy 0, policy_version 900 (0.0014)
[2023-10-27 21:07:34,029][00421] Fps is (10 sec: 2867.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3690496. Throughput: 0: 817.1. Samples: 923086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:07:34,036][00421] Avg episode reward: [(0, '28.605')]
[2023-10-27 21:07:39,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3710976. Throughput: 0: 844.0. Samples: 926188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:07:39,040][00421] Avg episode reward: [(0, '28.204')]
[2023-10-27 21:07:42,854][05149] Updated weights for policy 0, policy_version 910 (0.0018)
[2023-10-27 21:07:44,036][00421] Fps is (10 sec: 3683.7, 60 sec: 3276.6, 300 sec: 3235.1). Total num frames: 3727360. Throughput: 0: 857.9. Samples: 932350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:07:44,039][00421] Avg episode reward: [(0, '26.981')]
[2023-10-27 21:07:49,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 3739648. Throughput: 0: 809.2. Samples: 936230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-10-27 21:07:49,036][00421] Avg episode reward: [(0, '27.501')]
[2023-10-27 21:07:54,029][00421] Fps is (10 sec: 2869.3, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3756032. Throughput: 0: 809.3. Samples: 938084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 21:07:54,031][00421] Avg episode reward: [(0, '27.440')]
[2023-10-27 21:07:56,618][05149] Updated weights for policy 0, policy_version 920 (0.0027)
[2023-10-27 21:07:59,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3776512. Throughput: 0: 855.0. Samples: 944008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:07:59,033][00421] Avg episode reward: [(0, '27.218')]
[2023-10-27 21:08:04,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.2).
Total num frames: 3792896. Throughput: 0: 844.1. Samples: 949564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 21:08:04,031][00421] Avg episode reward: [(0, '26.983')]
[2023-10-27 21:08:09,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 3805184. Throughput: 0: 819.5. Samples: 951388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-10-27 21:08:09,031][00421] Avg episode reward: [(0, '27.986')]
[2023-10-27 21:08:09,382][05149] Updated weights for policy 0, policy_version 930 (0.0018)
[2023-10-27 21:08:14,029][00421] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 3821568. Throughput: 0: 807.4. Samples: 955186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:08:14,031][00421] Avg episode reward: [(0, '27.657')]
[2023-10-27 21:08:19,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 3842048. Throughput: 0: 849.4. Samples: 961308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:08:19,032][00421] Avg episode reward: [(0, '27.238')]
[2023-10-27 21:08:20,689][05149] Updated weights for policy 0, policy_version 940 (0.0029)
[2023-10-27 21:08:24,029][00421] Fps is (10 sec: 3686.2, 60 sec: 3276.9, 300 sec: 3249.0). Total num frames: 3858432. Throughput: 0: 845.7. Samples: 964244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:08:24,033][00421] Avg episode reward: [(0, '27.395')]
[2023-10-27 21:08:29,029][00421] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3870720. Throughput: 0: 798.2. Samples: 968262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:08:29,035][00421] Avg episode reward: [(0, '27.486')]
[2023-10-27 21:08:34,029][00421] Fps is (10 sec: 2867.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3887104. Throughput: 0: 809.8. Samples: 972670.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-10-27 21:08:34,031][00421] Avg episode reward: [(0, '27.077')]
[2023-10-27 21:08:34,608][05149] Updated weights for policy 0, policy_version 950 (0.0013)
[2023-10-27 21:08:39,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3907584. Throughput: 0: 835.7. Samples: 975690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:08:39,033][00421] Avg episode reward: [(0, '27.103')]
[2023-10-27 21:08:44,029][00421] Fps is (10 sec: 3686.4, 60 sec: 3277.2, 300 sec: 3249.0). Total num frames: 3923968. Throughput: 0: 833.3. Samples: 981508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-10-27 21:08:44,031][00421] Avg episode reward: [(0, '27.777')]
[2023-10-27 21:08:45,985][05149] Updated weights for policy 0, policy_version 960 (0.0019)
[2023-10-27 21:08:49,030][00421] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3249.0). Total num frames: 3936256. Throughput: 0: 794.3. Samples: 985308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 21:08:49,037][00421] Avg episode reward: [(0, '27.415')]
[2023-10-27 21:08:54,029][00421] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3952640. Throughput: 0: 796.5. Samples: 987232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-10-27 21:08:54,036][00421] Avg episode reward: [(0, '26.160')]
[2023-10-27 21:08:58,821][05149] Updated weights for policy 0, policy_version 970 (0.0024)
[2023-10-27 21:08:59,029][00421] Fps is (10 sec: 3686.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3973120. Throughput: 0: 839.1. Samples: 992946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-10-27 21:08:59,031][00421] Avg episode reward: [(0, '25.616')]
[2023-10-27 21:09:04,029][00421] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3249.1). Total num frames: 3989504. Throughput: 0: 826.6. Samples: 998504.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-10-27 21:09:04,031][00421] Avg episode reward: [(0, '24.718')]
[2023-10-27 21:09:04,048][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth...
[2023-10-27 21:09:04,229][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000784_3211264.pth
[2023-10-27 21:09:09,033][00421] Fps is (10 sec: 2866.0, 60 sec: 3276.6, 300 sec: 3249.0). Total num frames: 4001792. Throughput: 0: 800.4. Samples: 1000264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-10-27 21:09:09,035][00421] Avg episode reward: [(0, '25.656')]
[2023-10-27 21:09:09,850][05136] Stopping Batcher_0...
[2023-10-27 21:09:09,850][05136] Loop batcher_evt_loop terminating...
[2023-10-27 21:09:09,853][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-10-27 21:09:09,853][00421] Component Batcher_0 stopped!
[2023-10-27 21:09:09,944][00421] Component RolloutWorker_w0 stopped!
[2023-10-27 21:09:09,951][05150] Stopping RolloutWorker_w0...
[2023-10-27 21:09:09,963][00421] Component RolloutWorker_w2 stopped!
[2023-10-27 21:09:09,967][05152] Stopping RolloutWorker_w2...
[2023-10-27 21:09:09,975][05151] Stopping RolloutWorker_w1...
[2023-10-27 21:09:09,972][00421] Component RolloutWorker_w4 stopped!
[2023-10-27 21:09:09,977][05151] Loop rollout_proc1_evt_loop terminating...
[2023-10-27 21:09:09,977][00421] Component RolloutWorker_w1 stopped!
[2023-10-27 21:09:09,980][05154] Stopping RolloutWorker_w4...
[2023-10-27 21:09:09,981][05154] Loop rollout_proc4_evt_loop terminating...
[2023-10-27 21:09:09,993][05153] Stopping RolloutWorker_w3...
[2023-10-27 21:09:09,994][05153] Loop rollout_proc3_evt_loop terminating...
[2023-10-27 21:09:09,998][05156] Stopping RolloutWorker_w7...
[2023-10-27 21:09:09,994][00421] Component RolloutWorker_w3 stopped!
[2023-10-27 21:09:09,999][00421] Component RolloutWorker_w7 stopped!
[2023-10-27 21:09:10,001][00421] Component RolloutWorker_w6 stopped!
[2023-10-27 21:09:10,008][05157] Stopping RolloutWorker_w6...
[2023-10-27 21:09:10,008][05157] Loop rollout_proc6_evt_loop terminating...
[2023-10-27 21:09:09,951][05150] Loop rollout_proc0_evt_loop terminating...
[2023-10-27 21:09:09,967][05152] Loop rollout_proc2_evt_loop terminating...
[2023-10-27 21:09:10,023][05156] Loop rollout_proc7_evt_loop terminating...
[2023-10-27 21:09:10,029][05149] Weights refcount: 2 0
[2023-10-27 21:09:10,040][05149] Stopping InferenceWorker_p0-w0...
[2023-10-27 21:09:10,040][05149] Loop inference_proc0-0_evt_loop terminating...
[2023-10-27 21:09:10,040][00421] Component InferenceWorker_p0-w0 stopped!
[2023-10-27 21:09:10,062][05155] Stopping RolloutWorker_w5...
[2023-10-27 21:09:10,061][00421] Component RolloutWorker_w5 stopped!
[2023-10-27 21:09:10,068][05155] Loop rollout_proc5_evt_loop terminating...
[2023-10-27 21:09:10,085][05136] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000878_3596288.pth
[2023-10-27 21:09:10,097][05136] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-10-27 21:09:10,297][00421] Component LearnerWorker_p0 stopped!
[2023-10-27 21:09:10,300][00421] Waiting for process learner_proc0 to stop...
[2023-10-27 21:09:10,297][05136] Stopping LearnerWorker_p0...
[2023-10-27 21:09:10,310][05136] Loop learner_proc0_evt_loop terminating...
[2023-10-27 21:09:12,216][00421] Waiting for process inference_proc0-0 to join...
[2023-10-27 21:09:12,416][00421] Waiting for process rollout_proc0 to join...
[2023-10-27 21:09:14,453][00421] Waiting for process rollout_proc1 to join...
[2023-10-27 21:09:14,455][00421] Waiting for process rollout_proc2 to join...
[2023-10-27 21:09:14,461][00421] Waiting for process rollout_proc3 to join...
[2023-10-27 21:09:14,463][00421] Waiting for process rollout_proc4 to join...
[2023-10-27 21:09:14,464][00421] Waiting for process rollout_proc5 to join...
[2023-10-27 21:09:14,466][00421] Waiting for process rollout_proc6 to join...
[2023-10-27 21:09:14,467][00421] Waiting for process rollout_proc7 to join...
[2023-10-27 21:09:14,468][00421] Batcher 0 profile tree view:
batching: 27.7779, releasing_batches: 0.0293
[2023-10-27 21:09:14,469][00421] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 527.8604
update_model: 9.0674
  weight_update: 0.0038
one_step: 0.0067
  handle_policy_step: 600.2618
    deserialize: 16.2461, stack: 3.0952, obs_to_device_normalize: 119.8303, forward: 326.5259, send_messages: 28.3808
    prepare_outputs: 77.1705
      to_cpu: 44.5280
[2023-10-27 21:09:14,470][00421] Learner 0 profile tree view:
misc: 0.0059, prepare_batch: 13.3598
train: 73.8575
  epoch_init: 0.0127, minibatch_init: 0.0117, losses_postprocess: 0.6890, kl_divergence: 0.7184, after_optimizer: 34.6959
  calculate_losses: 26.0353
    losses_init: 0.0041, forward_head: 1.2871, bptt_initial: 17.2335, tail: 1.1253, advantages_returns: 0.2831, losses: 3.6392
    bptt: 2.1061
      bptt_forward_core: 2.0050
  update: 11.0790
    clip: 0.9096
[2023-10-27 21:09:14,472][00421] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3076, enqueue_policy_requests: 149.7218, env_step: 886.0197, overhead: 23.0922, complete_rollouts: 7.1232
save_policy_outputs: 21.6387
  split_output_tensors: 10.3707
[2023-10-27 21:09:14,473][00421] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3375, enqueue_policy_requests: 152.6236, env_step: 883.7869, overhead: 23.6649, complete_rollouts: 7.2616
save_policy_outputs: 21.5466
  split_output_tensors: 10.2668
[2023-10-27 21:09:14,474][00421] Loop Runner_EvtLoop terminating...
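Throughout training the log pairs each `Saving …/checkpoint_….pth` entry with a `Removing …` entry for an older file: only the newest few regular checkpoints are kept, while best-policy snapshots are saved separately. A sketch of that keep-last-N rotation pattern, assuming the `checkpoint_<version>_<frames>.pth` naming seen above; `rotate_checkpoints` is a hypothetical helper, not Sample Factory's actual code.

```python
import os
import re

def rotate_checkpoints(ckpt_dir, keep_last=2):
    """Delete all but the newest `keep_last` regular checkpoints.

    Files are ordered by the policy-version number embedded in the
    name, e.g. checkpoint_000000690_2826240.pth -> version 690.
    Returns the list of removed filenames (oldest first).
    """
    pat = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    ckpts = sorted(
        (f for f in os.listdir(ckpt_dir) if pat.search(f)),
        key=lambda f: int(pat.search(f).group(1)),
    )
    removed = []
    for old in ckpts[:-keep_last] if keep_last > 0 else ckpts:
        os.remove(os.path.join(ckpt_dir, old))
        removed.append(old)
    return removed
```

With `keep_last=2` this reproduces the behavior visible in the log: saving version 690 evicts version 480, saving 784 evicts 589, and so on.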
[2023-10-27 21:09:14,476][00421] Runner profile tree view:
main_loop: 1205.3874
[2023-10-27 21:09:14,477][00421] Collected {0: 4005888}, FPS: 3323.3
[2023-10-27 21:09:15,609][00421] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-10-27 21:09:15,611][00421] Overriding arg 'num_workers' with value 1 passed from command line
[2023-10-27 21:09:15,613][00421] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-10-27 21:09:15,615][00421] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-10-27 21:09:15,620][00421] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-10-27 21:09:15,622][00421] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-10-27 21:09:15,623][00421] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-10-27 21:09:15,624][00421] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-10-27 21:09:15,625][00421] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-10-27 21:09:15,630][00421] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-10-27 21:09:15,631][00421] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-10-27 21:09:15,632][00421] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-10-27 21:09:15,633][00421] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-10-27 21:09:15,634][00421] Adding new argument 'enjoy_script'=None that is not in the saved config file!
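The `Overriding arg …` and `Adding new argument … that is not in the saved config file!` messages come from merging command-line arguments into the experiment config loaded from `config.json`. A small sketch of that merge logic, assuming a plain-dict config; `merge_cli_args` is a hypothetical name, not Sample Factory's API.

```python
def merge_cli_args(saved_config, cli_args):
    """Merge command-line overrides into a saved experiment config.

    Returns the merged config plus messages describing each override
    and each argument that was absent from the saved file.
    """
    merged = dict(saved_config)
    messages = []
    for key, value in cli_args.items():
        if key in merged:
            if merged[key] != value:
                messages.append(
                    f"Overriding arg '{key}' with value {value} passed from command line"
                )
        else:
            messages.append(
                f"Adding new argument '{key}'={value} that is not in the saved config file!"
            )
        merged[key] = value
    return merged, messages
```

This explains why evaluation-only options such as `no_render` or `max_num_episodes` are reported as "new": they were never part of the training-time configuration that was saved to disk.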
[2023-10-27 21:09:15,635][00421] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-10-27 21:09:15,693][00421] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-10-27 21:09:15,699][00421] RunningMeanStd input shape: (3, 72, 128)
[2023-10-27 21:09:15,702][00421] RunningMeanStd input shape: (1,)
[2023-10-27 21:09:15,723][00421] ConvEncoder: input_channels=3
[2023-10-27 21:09:15,897][00421] Conv encoder output size: 512
[2023-10-27 21:09:15,900][00421] Policy head output size: 512
[2023-10-27 21:09:16,230][00421] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-10-27 21:09:17,132][00421] Num frames 100...
[2023-10-27 21:09:17,338][00421] Num frames 200...
[2023-10-27 21:09:17,534][00421] Num frames 300...
[2023-10-27 21:09:17,740][00421] Num frames 400...
[2023-10-27 21:09:17,937][00421] Num frames 500...
[2023-10-27 21:09:18,134][00421] Num frames 600...
[2023-10-27 21:09:18,273][00421] Num frames 700...
[2023-10-27 21:09:18,405][00421] Num frames 800...
[2023-10-27 21:09:18,536][00421] Num frames 900...
[2023-10-27 21:09:18,665][00421] Num frames 1000...
[2023-10-27 21:09:18,800][00421] Num frames 1100...
[2023-10-27 21:09:18,914][00421] Avg episode rewards: #0: 26.440, true rewards: #0: 11.440
[2023-10-27 21:09:18,916][00421] Avg episode reward: 26.440, avg true_objective: 11.440
[2023-10-27 21:09:18,998][00421] Num frames 1200...
[2023-10-27 21:09:19,131][00421] Num frames 1300...
[2023-10-27 21:09:19,262][00421] Num frames 1400...
[2023-10-27 21:09:19,402][00421] Num frames 1500...
[2023-10-27 21:09:19,532][00421] Num frames 1600...
[2023-10-27 21:09:19,662][00421] Num frames 1700...
[2023-10-27 21:09:19,800][00421] Num frames 1800...
[2023-10-27 21:09:19,934][00421] Num frames 1900...
[2023-10-27 21:09:20,081][00421] Num frames 2000...
[2023-10-27 21:09:20,158][00421] Avg episode rewards: #0: 24.075, true rewards: #0: 10.075
[2023-10-27 21:09:20,159][00421] Avg episode reward: 24.075, avg true_objective: 10.075
[2023-10-27 21:09:20,270][00421] Num frames 2100...
[2023-10-27 21:09:20,405][00421] Num frames 2200...
[2023-10-27 21:09:20,531][00421] Num frames 2300...
[2023-10-27 21:09:20,670][00421] Num frames 2400...
[2023-10-27 21:09:20,803][00421] Num frames 2500...
[2023-10-27 21:09:20,935][00421] Num frames 2600...
[2023-10-27 21:09:21,068][00421] Num frames 2700...
[2023-10-27 21:09:21,204][00421] Num frames 2800...
[2023-10-27 21:09:21,335][00421] Num frames 2900...
[2023-10-27 21:09:21,491][00421] Num frames 3000...
[2023-10-27 21:09:21,552][00421] Avg episode rewards: #0: 23.674, true rewards: #0: 10.007
[2023-10-27 21:09:21,554][00421] Avg episode reward: 23.674, avg true_objective: 10.007
[2023-10-27 21:09:21,687][00421] Num frames 3100...
[2023-10-27 21:09:21,819][00421] Num frames 3200...
[2023-10-27 21:09:22,004][00421] Num frames 3300...
[2023-10-27 21:09:22,215][00421] Num frames 3400...
[2023-10-27 21:09:22,372][00421] Avg episode rewards: #0: 19.875, true rewards: #0: 8.625
[2023-10-27 21:09:22,375][00421] Avg episode reward: 19.875, avg true_objective: 8.625
[2023-10-27 21:09:22,479][00421] Num frames 3500...
[2023-10-27 21:09:22,671][00421] Num frames 3600...
[2023-10-27 21:09:22,870][00421] Num frames 3700...
[2023-10-27 21:09:23,069][00421] Num frames 3800...
[2023-10-27 21:09:23,253][00421] Avg episode rewards: #0: 17.132, true rewards: #0: 7.732
[2023-10-27 21:09:23,255][00421] Avg episode reward: 17.132, avg true_objective: 7.732
[2023-10-27 21:09:23,325][00421] Num frames 3900...
[2023-10-27 21:09:23,532][00421] Num frames 4000...
[2023-10-27 21:09:23,735][00421] Num frames 4100...
[2023-10-27 21:09:23,936][00421] Num frames 4200...
[2023-10-27 21:09:24,136][00421] Num frames 4300...
[2023-10-27 21:09:24,340][00421] Num frames 4400...
[2023-10-27 21:09:24,540][00421] Num frames 4500...
[2023-10-27 21:09:24,744][00421] Num frames 4600...
[2023-10-27 21:09:24,954][00421] Num frames 4700...
[2023-10-27 21:09:25,159][00421] Num frames 4800...
[2023-10-27 21:09:25,356][00421] Num frames 4900...
[2023-10-27 21:09:25,567][00421] Num frames 5000...
[2023-10-27 21:09:25,749][00421] Num frames 5100...
[2023-10-27 21:09:25,892][00421] Avg episode rewards: #0: 19.777, true rewards: #0: 8.610
[2023-10-27 21:09:25,894][00421] Avg episode reward: 19.777, avg true_objective: 8.610
[2023-10-27 21:09:25,941][00421] Num frames 5200...
[2023-10-27 21:09:26,073][00421] Num frames 5300...
[2023-10-27 21:09:26,207][00421] Num frames 5400...
[2023-10-27 21:09:26,337][00421] Num frames 5500...
[2023-10-27 21:09:26,467][00421] Num frames 5600...
[2023-10-27 21:09:26,594][00421] Num frames 5700...
[2023-10-27 21:09:26,733][00421] Num frames 5800...
[2023-10-27 21:09:26,891][00421] Avg episode rewards: #0: 19.256, true rewards: #0: 8.399
[2023-10-27 21:09:26,893][00421] Avg episode reward: 19.256, avg true_objective: 8.399
[2023-10-27 21:09:26,931][00421] Num frames 5900...
[2023-10-27 21:09:27,066][00421] Num frames 6000...
[2023-10-27 21:09:27,195][00421] Num frames 6100...
[2023-10-27 21:09:27,322][00421] Num frames 6200...
[2023-10-27 21:09:27,458][00421] Num frames 6300...
[2023-10-27 21:09:27,592][00421] Num frames 6400...
[2023-10-27 21:09:27,733][00421] Num frames 6500...
[2023-10-27 21:09:27,866][00421] Num frames 6600...
[2023-10-27 21:09:27,998][00421] Num frames 6700...
[2023-10-27 21:09:28,131][00421] Num frames 6800...
[2023-10-27 21:09:28,268][00421] Num frames 6900...
[2023-10-27 21:09:28,413][00421] Avg episode rewards: #0: 20.459, true rewards: #0: 8.709
[2023-10-27 21:09:28,414][00421] Avg episode reward: 20.459, avg true_objective: 8.709
[2023-10-27 21:09:28,466][00421] Num frames 7000...
[2023-10-27 21:09:28,595][00421] Num frames 7100...
[2023-10-27 21:09:28,738][00421] Num frames 7200...
[2023-10-27 21:09:28,870][00421] Num frames 7300...
[2023-10-27 21:09:29,008][00421] Num frames 7400...
[2023-10-27 21:09:29,145][00421] Num frames 7500...
[2023-10-27 21:09:29,312][00421] Avg episode rewards: #0: 19.317, true rewards: #0: 8.428
[2023-10-27 21:09:29,314][00421] Avg episode reward: 19.317, avg true_objective: 8.428
[2023-10-27 21:09:29,336][00421] Num frames 7600...
[2023-10-27 21:09:29,465][00421] Num frames 7700...
[2023-10-27 21:09:29,603][00421] Num frames 7800...
[2023-10-27 21:09:29,752][00421] Num frames 7900...
[2023-10-27 21:09:29,890][00421] Num frames 8000...
[2023-10-27 21:09:30,025][00421] Num frames 8100...
[2023-10-27 21:09:30,164][00421] Num frames 8200...
[2023-10-27 21:09:30,298][00421] Num frames 8300...
[2023-10-27 21:09:30,440][00421] Num frames 8400...
[2023-10-27 21:09:30,576][00421] Num frames 8500...
[2023-10-27 21:09:30,719][00421] Num frames 8600...
[2023-10-27 21:09:30,854][00421] Num frames 8700...
[2023-10-27 21:09:30,987][00421] Num frames 8800...
[2023-10-27 21:09:31,118][00421] Num frames 8900...
[2023-10-27 21:09:31,245][00421] Num frames 9000...
[2023-10-27 21:09:31,382][00421] Num frames 9100...
[2023-10-27 21:09:31,515][00421] Num frames 9200...
[2023-10-27 21:09:31,642][00421] Num frames 9300...
[2023-10-27 21:09:31,789][00421] Num frames 9400...
[2023-10-27 21:09:31,924][00421] Num frames 9500...
[2023-10-27 21:09:32,073][00421] Avg episode rewards: #0: 22.265, true rewards: #0: 9.565
[2023-10-27 21:09:32,074][00421] Avg episode reward: 22.265, avg true_objective: 9.565
[2023-10-27 21:10:31,031][00421] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-10-27 21:10:31,619][00421] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-10-27 21:10:31,622][00421] Overriding arg 'num_workers' with value 1 passed from command line
[2023-10-27 21:10:31,623][00421] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-10-27 21:10:31,625][00421] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-10-27 21:10:31,627][00421] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-10-27 21:10:31,629][00421] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-10-27 21:10:31,630][00421] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-10-27 21:10:31,632][00421] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-10-27 21:10:31,633][00421] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-10-27 21:10:31,634][00421] Adding new argument 'hf_repository'='AmrMorgado/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-10-27 21:10:31,634][00421] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-10-27 21:10:31,635][00421] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-10-27 21:10:31,636][00421] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-10-27 21:10:31,637][00421] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-10-27 21:10:31,638][00421] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-10-27 21:10:31,683][00421] RunningMeanStd input shape: (3, 72, 128)
[2023-10-27 21:10:31,686][00421] RunningMeanStd input shape: (1,)
[2023-10-27 21:10:31,702][00421] ConvEncoder: input_channels=3
[2023-10-27 21:10:31,776][00421] Conv encoder output size: 512
[2023-10-27 21:10:31,779][00421] Policy head output size: 512
[2023-10-27 21:10:31,814][00421] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-10-27 21:10:32,432][00421] Num frames 100...
[2023-10-27 21:10:32,618][00421] Num frames 200...
[2023-10-27 21:10:32,798][00421] Num frames 300...
[2023-10-27 21:10:32,977][00421] Num frames 400...
[2023-10-27 21:10:33,160][00421] Num frames 500...
[2023-10-27 21:10:33,347][00421] Num frames 600...
[2023-10-27 21:10:33,527][00421] Num frames 700...
[2023-10-27 21:10:33,726][00421] Num frames 800...
[2023-10-27 21:10:33,940][00421] Num frames 900...
[2023-10-27 21:10:34,154][00421] Num frames 1000...
[2023-10-27 21:10:34,391][00421] Num frames 1100...
[2023-10-27 21:10:34,581][00421] Num frames 1200...
[2023-10-27 21:10:34,782][00421] Num frames 1300...
[2023-10-27 21:10:34,984][00421] Num frames 1400...
[2023-10-27 21:10:35,143][00421] Avg episode rewards: #0: 35.530, true rewards: #0: 14.530
[2023-10-27 21:10:35,145][00421] Avg episode reward: 35.530, avg true_objective: 14.530
[2023-10-27 21:10:35,257][00421] Num frames 1500...
[2023-10-27 21:10:35,452][00421] Num frames 1600...
[2023-10-27 21:10:35,648][00421] Num frames 1700...
[2023-10-27 21:10:35,856][00421] Num frames 1800...
[2023-10-27 21:10:36,054][00421] Num frames 1900...
[2023-10-27 21:10:36,240][00421] Num frames 2000...
[2023-10-27 21:10:36,417][00421] Num frames 2100...
[2023-10-27 21:10:36,611][00421] Num frames 2200...
[2023-10-27 21:10:36,816][00421] Num frames 2300...
[2023-10-27 21:10:37,021][00421] Num frames 2400...
[2023-10-27 21:10:37,211][00421] Num frames 2500...
[2023-10-27 21:10:37,397][00421] Num frames 2600...
[2023-10-27 21:10:37,465][00421] Avg episode rewards: #0: 28.525, true rewards: #0: 13.025
[2023-10-27 21:10:37,468][00421] Avg episode reward: 28.525, avg true_objective: 13.025
[2023-10-27 21:10:37,684][00421] Num frames 2700...
[2023-10-27 21:10:37,922][00421] Num frames 2800...
[2023-10-27 21:10:38,143][00421] Num frames 2900...
[2023-10-27 21:10:38,376][00421] Num frames 3000...
[2023-10-27 21:10:38,619][00421] Num frames 3100...
[2023-10-27 21:10:38,837][00421] Num frames 3200...
[2023-10-27 21:10:39,066][00421] Num frames 3300...
[2023-10-27 21:10:39,309][00421] Num frames 3400...
[2023-10-27 21:10:39,535][00421] Num frames 3500...
[2023-10-27 21:10:39,777][00421] Num frames 3600...
[2023-10-27 21:10:40,018][00421] Num frames 3700...
[2023-10-27 21:10:40,228][00421] Num frames 3800...
[2023-10-27 21:10:40,412][00421] Num frames 3900...
[2023-10-27 21:10:40,569][00421] Num frames 4000...
[2023-10-27 21:10:40,711][00421] Num frames 4100...
[2023-10-27 21:10:40,773][00421] Avg episode rewards: #0: 31.343, true rewards: #0: 13.677
[2023-10-27 21:10:40,774][00421] Avg episode reward: 31.343, avg true_objective: 13.677
[2023-10-27 21:10:40,907][00421] Num frames 4200...
[2023-10-27 21:10:41,050][00421] Num frames 4300...
[2023-10-27 21:10:41,143][00421] Avg episode rewards: #0: 24.317, true rewards: #0: 10.817
[2023-10-27 21:10:41,145][00421] Avg episode reward: 24.317, avg true_objective: 10.817
[2023-10-27 21:10:41,241][00421] Num frames 4400...
[2023-10-27 21:10:41,372][00421] Num frames 4500...
[2023-10-27 21:10:41,501][00421] Num frames 4600...
[2023-10-27 21:10:41,630][00421] Num frames 4700...
[2023-10-27 21:10:41,764][00421] Num frames 4800...
[2023-10-27 21:10:41,897][00421] Num frames 4900...
[2023-10-27 21:10:42,025][00421] Num frames 5000...
[2023-10-27 21:10:42,171][00421] Avg episode rewards: #0: 22.326, true rewards: #0: 10.126
[2023-10-27 21:10:42,172][00421] Avg episode reward: 22.326, avg true_objective: 10.126
[2023-10-27 21:10:42,222][00421] Num frames 5100...
[2023-10-27 21:10:42,344][00421] Num frames 5200...
[2023-10-27 21:10:42,468][00421] Num frames 5300...
[2023-10-27 21:10:42,595][00421] Num frames 5400...
[2023-10-27 21:10:42,721][00421] Num frames 5500...
[2023-10-27 21:10:42,851][00421] Num frames 5600...
[2023-10-27 21:10:42,977][00421] Num frames 5700...
[2023-10-27 21:10:43,115][00421] Num frames 5800...
[2023-10-27 21:10:43,239][00421] Num frames 5900...
[2023-10-27 21:10:43,362][00421] Num frames 6000...
[2023-10-27 21:10:43,492][00421] Num frames 6100...
[2023-10-27 21:10:43,621][00421] Num frames 6200...
[2023-10-27 21:10:43,753][00421] Num frames 6300...
[2023-10-27 21:10:43,889][00421] Num frames 6400...
[2023-10-27 21:10:44,016][00421] Num frames 6500...
[2023-10-27 21:10:44,168][00421] Avg episode rewards: #0: 25.117, true rewards: #0: 10.950
[2023-10-27 21:10:44,170][00421] Avg episode reward: 25.117, avg true_objective: 10.950
[2023-10-27 21:10:44,210][00421] Num frames 6600...
[2023-10-27 21:10:44,333][00421] Num frames 6700...
[2023-10-27 21:10:44,459][00421] Num frames 6800...
[2023-10-27 21:10:44,583][00421] Num frames 6900...
[2023-10-27 21:10:44,726][00421] Num frames 7000...
[2023-10-27 21:10:44,859][00421] Num frames 7100...
[2023-10-27 21:10:44,984][00421] Num frames 7200...
[2023-10-27 21:10:45,120][00421] Num frames 7300...
[2023-10-27 21:10:45,200][00421] Avg episode rewards: #0: 23.599, true rewards: #0: 10.456
[2023-10-27 21:10:45,202][00421] Avg episode reward: 23.599, avg true_objective: 10.456
[2023-10-27 21:10:45,311][00421] Num frames 7400...
[2023-10-27 21:10:45,453][00421] Num frames 7500...
[2023-10-27 21:10:45,657][00421] Num frames 7600...
[2023-10-27 21:10:45,830][00421] Num frames 7700...
[2023-10-27 21:10:45,934][00421] Avg episode rewards: #0: 21.419, true rewards: #0: 9.669
[2023-10-27 21:10:45,935][00421] Avg episode reward: 21.419, avg true_objective: 9.669
[2023-10-27 21:10:46,028][00421] Num frames 7800...
[2023-10-27 21:10:46,171][00421] Num frames 7900...
[2023-10-27 21:10:46,300][00421] Num frames 8000...
[2023-10-27 21:10:46,436][00421] Num frames 8100...
[2023-10-27 21:10:46,569][00421] Num frames 8200...
[2023-10-27 21:10:46,703][00421] Num frames 8300...
[2023-10-27 21:10:46,811][00421] Avg episode rewards: #0: 20.374, true rewards: #0: 9.263
[2023-10-27 21:10:46,813][00421] Avg episode reward: 20.374, avg true_objective: 9.263
[2023-10-27 21:10:46,904][00421] Num frames 8400...
[2023-10-27 21:10:47,040][00421] Num frames 8500...
[2023-10-27 21:10:47,184][00421] Num frames 8600...
[2023-10-27 21:10:47,315][00421] Num frames 8700...
[2023-10-27 21:10:47,447][00421] Num frames 8800...
[2023-10-27 21:10:47,575][00421] Num frames 8900...
[2023-10-27 21:10:47,711][00421] Num frames 9000...
[2023-10-27 21:10:47,848][00421] Num frames 9100...
[2023-10-27 21:10:47,980][00421] Num frames 9200...
[2023-10-27 21:10:48,109][00421] Num frames 9300...
[2023-10-27 21:10:48,244][00421] Num frames 9400...
[2023-10-27 21:10:48,374][00421] Num frames 9500...
[2023-10-27 21:10:48,505][00421] Num frames 9600...
[2023-10-27 21:10:48,637][00421] Num frames 9700...
[2023-10-27 21:10:48,766][00421] Num frames 9800...
[2023-10-27 21:10:48,894][00421] Num frames 9900...
[2023-10-27 21:10:49,031][00421] Num frames 10000...
[2023-10-27 21:10:49,163][00421] Num frames 10100...
[2023-10-27 21:10:49,325][00421] Avg episode rewards: #0: 23.777, true rewards: #0: 10.177
[2023-10-27 21:10:49,327][00421] Avg episode reward: 23.777, avg true_objective: 10.177
[2023-10-27 21:11:16,784][00421] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-10-27 21:11:27,017][00421] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-10-27 21:11:27,019][00421] Overriding arg 'num_workers' with value 1 passed from command line
[2023-10-27 21:11:27,020][00421] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-10-27 21:11:27,022][00421] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-10-27 21:11:27,023][00421] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-10-27 21:11:27,024][00421] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-10-27 21:11:27,026][00421] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-10-27 21:11:27,027][00421] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-10-27 21:11:27,028][00421] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-10-27 21:11:27,030][00421] Adding new argument 'hf_repository'='AmrMorgado/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-10-27 21:11:27,031][00421] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-10-27 21:11:27,033][00421] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-10-27 21:11:27,034][00421] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-10-27 21:11:27,035][00421] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-10-27 21:11:27,037][00421] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-10-27 21:11:27,081][00421] RunningMeanStd input shape: (3, 72, 128)
[2023-10-27 21:11:27,083][00421] RunningMeanStd input shape: (1,)
[2023-10-27 21:11:27,097][00421] ConvEncoder: input_channels=3
[2023-10-27 21:11:27,135][00421] Conv encoder output size: 512
[2023-10-27 21:11:27,136][00421] Policy head output size: 512
[2023-10-27 21:11:27,157][00421] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-10-27 21:11:27,579][00421] Num frames 100...
[2023-10-27 21:11:27,727][00421] Num frames 200...
[2023-10-27 21:11:27,872][00421] Num frames 300...
[2023-10-27 21:11:28,017][00421] Num frames 400...
[2023-10-27 21:11:28,146][00421] Num frames 500...
[2023-10-27 21:11:28,287][00421] Num frames 600...
[2023-10-27 21:11:28,355][00421] Avg episode rewards: #0: 11.080, true rewards: #0: 6.080
[2023-10-27 21:11:28,358][00421] Avg episode reward: 11.080, avg true_objective: 6.080
[2023-10-27 21:11:28,478][00421] Num frames 700...
[2023-10-27 21:11:28,611][00421] Num frames 800...
[2023-10-27 21:11:28,743][00421] Num frames 900...
[2023-10-27 21:11:28,877][00421] Num frames 1000...
[2023-10-27 21:11:29,010][00421] Num frames 1100...
[2023-10-27 21:11:29,141][00421] Num frames 1200...
[2023-10-27 21:11:29,284][00421] Num frames 1300...
[2023-10-27 21:11:29,415][00421] Num frames 1400...
[2023-10-27 21:11:29,552][00421] Num frames 1500...
[2023-10-27 21:11:29,687][00421] Num frames 1600...
[2023-10-27 21:11:29,823][00421] Num frames 1700...
[2023-10-27 21:11:29,957][00421] Num frames 1800...
[2023-10-27 21:11:30,093][00421] Num frames 1900...
[2023-10-27 21:11:30,229][00421] Num frames 2000...
[2023-10-27 21:11:30,365][00421] Num frames 2100...
[2023-10-27 21:11:30,499][00421] Num frames 2200...
[2023-10-27 21:11:30,638][00421] Num frames 2300...
[2023-10-27 21:11:30,774][00421] Num frames 2400...
[2023-10-27 21:11:30,907][00421] Num frames 2500...
[2023-10-27 21:11:30,974][00421] Avg episode rewards: #0: 27.025, true rewards: #0: 12.525
[2023-10-27 21:11:30,975][00421] Avg episode reward: 27.025, avg true_objective: 12.525
[2023-10-27 21:11:31,104][00421] Num frames 2600...
[2023-10-27 21:11:31,235][00421] Num frames 2700...
[2023-10-27 21:11:31,371][00421] Num frames 2800...
[2023-10-27 21:11:31,501][00421] Num frames 2900...
[2023-10-27 21:11:31,628][00421] Avg episode rewards: #0: 19.843, true rewards: #0: 9.843
[2023-10-27 21:11:31,630][00421] Avg episode reward: 19.843, avg true_objective: 9.843
[2023-10-27 21:11:31,733][00421] Num frames 3000...
[2023-10-27 21:11:31,928][00421] Num frames 3100...
[2023-10-27 21:11:32,127][00421] Num frames 3200...
[2023-10-27 21:11:32,325][00421] Num frames 3300...
[2023-10-27 21:11:32,515][00421] Num frames 3400...
[2023-10-27 21:11:32,718][00421] Num frames 3500...
[2023-10-27 21:11:32,910][00421] Num frames 3600...
[2023-10-27 21:11:33,105][00421] Num frames 3700...
[2023-10-27 21:11:33,298][00421] Num frames 3800...
[2023-10-27 21:11:33,506][00421] Num frames 3900...
[2023-10-27 21:11:33,698][00421] Num frames 4000...
[2023-10-27 21:11:33,895][00421] Num frames 4100...
[2023-10-27 21:11:34,087][00421] Avg episode rewards: #0: 21.922, true rewards: #0: 10.422
[2023-10-27 21:11:34,090][00421] Avg episode reward: 21.922, avg true_objective: 10.422
[2023-10-27 21:11:34,149][00421] Num frames 4200...
[2023-10-27 21:11:34,337][00421] Num frames 4300...
[2023-10-27 21:11:34,534][00421] Num frames 4400...
[2023-10-27 21:11:34,722][00421] Num frames 4500...
[2023-10-27 21:11:34,911][00421] Num frames 4600...
[2023-10-27 21:11:35,099][00421] Num frames 4700...
[2023-10-27 21:11:35,284][00421] Num frames 4800...
[2023-10-27 21:11:35,461][00421] Num frames 4900...
[2023-10-27 21:11:35,590][00421] Num frames 5000...
[2023-10-27 21:11:35,719][00421] Num frames 5100...
[2023-10-27 21:11:35,847][00421] Num frames 5200...
[2023-10-27 21:11:35,973][00421] Num frames 5300...
[2023-10-27 21:11:36,101][00421] Num frames 5400...
[2023-10-27 21:11:36,193][00421] Avg episode rewards: #0: 23.256, true rewards: #0: 10.856
[2023-10-27 21:11:36,195][00421] Avg episode reward: 23.256, avg true_objective: 10.856
[2023-10-27 21:11:36,289][00421] Num frames 5500...
[2023-10-27 21:11:36,434][00421] Num frames 5600...
[2023-10-27 21:11:36,563][00421] Num frames 5700...
[2023-10-27 21:11:36,689][00421] Num frames 5800...
[2023-10-27 21:11:36,820][00421] Num frames 5900...
[2023-10-27 21:11:36,945][00421] Num frames 6000...
[2023-10-27 21:11:37,078][00421] Num frames 6100...
[2023-10-27 21:11:37,206][00421] Num frames 6200...
[2023-10-27 21:11:37,335][00421] Num frames 6300...
[2023-10-27 21:11:37,465][00421] Num frames 6400...
[2023-10-27 21:11:37,600][00421] Num frames 6500...
[2023-10-27 21:11:37,735][00421] Num frames 6600...
[2023-10-27 21:11:37,867][00421] Num frames 6700...
[2023-10-27 21:11:37,993][00421] Num frames 6800...
[2023-10-27 21:11:38,121][00421] Num frames 6900...
[2023-10-27 21:11:38,248][00421] Num frames 7000...
[2023-10-27 21:11:38,373][00421] Num frames 7100...
[2023-10-27 21:11:38,506][00421] Num frames 7200...
[2023-10-27 21:11:38,640][00421] Num frames 7300...
[2023-10-27 21:11:38,772][00421] Num frames 7400...
[2023-10-27 21:11:38,945][00421] Avg episode rewards: #0: 28.821, true rewards: #0: 12.488
[2023-10-27 21:11:38,947][00421] Avg episode reward: 28.821, avg true_objective: 12.488
[2023-10-27 21:11:38,960][00421] Num frames 7500...
[2023-10-27 21:11:39,087][00421] Num frames 7600...
[2023-10-27 21:11:39,214][00421] Num frames 7700...
[2023-10-27 21:11:39,341][00421] Num frames 7800...
[2023-10-27 21:11:39,468][00421] Num frames 7900...
[2023-10-27 21:11:39,604][00421] Num frames 8000...
[2023-10-27 21:11:39,738][00421] Num frames 8100...
[2023-10-27 21:11:39,868][00421] Num frames 8200...
[2023-10-27 21:11:39,970][00421] Avg episode rewards: #0: 27.184, true rewards: #0: 11.756
[2023-10-27 21:11:39,972][00421] Avg episode reward: 27.184, avg true_objective: 11.756
[2023-10-27 21:11:40,067][00421] Num frames 8300...
[2023-10-27 21:11:40,209][00421] Num frames 8400...
[2023-10-27 21:11:40,338][00421] Num frames 8500...
[2023-10-27 21:11:40,465][00421] Num frames 8600...
[2023-10-27 21:11:40,597][00421] Num frames 8700...
[2023-10-27 21:11:40,726][00421] Num frames 8800...
[2023-10-27 21:11:40,884][00421] Num frames 8900...
[2023-10-27 21:11:41,014][00421] Num frames 9000...
[2023-10-27 21:11:41,144][00421] Num frames 9100...
[2023-10-27 21:11:41,270][00421] Num frames 9200...
[2023-10-27 21:11:41,394][00421] Num frames 9300...
[2023-10-27 21:11:41,553][00421] Num frames 9400...
[2023-10-27 21:11:41,771][00421] Num frames 9500...
[2023-10-27 21:11:41,907][00421] Num frames 9600...
[2023-10-27 21:11:42,046][00421] Num frames 9700...
[2023-10-27 21:11:42,106][00421] Avg episode rewards: #0: 28.251, true rewards: #0: 12.126
[2023-10-27 21:11:42,107][00421] Avg episode reward: 28.251, avg true_objective: 12.126
[2023-10-27 21:11:42,238][00421] Num frames 9800...
[2023-10-27 21:11:42,364][00421] Num frames 9900...
[2023-10-27 21:11:42,492][00421] Num frames 10000...
[2023-10-27 21:11:42,629][00421] Num frames 10100...
[2023-10-27 21:11:42,772][00421] Num frames 10200...
[2023-10-27 21:11:42,906][00421] Num frames 10300...
[2023-10-27 21:11:43,041][00421] Num frames 10400...
[2023-10-27 21:11:43,208][00421] Num frames 10500...
[2023-10-27 21:11:43,352][00421] Num frames 10600...
[2023-10-27 21:11:43,545][00421] Num frames 10700...
[2023-10-27 21:11:43,750][00421] Num frames 10800...
[2023-10-27 21:11:43,879][00421] Num frames 10900...
[2023-10-27 21:11:43,963][00421] Avg episode rewards: #0: 28.019, true rewards: #0: 12.130
[2023-10-27 21:11:43,965][00421] Avg episode reward: 28.019, avg true_objective: 12.130
[2023-10-27 21:11:44,077][00421] Num frames 11000...
[2023-10-27 21:11:44,204][00421] Num frames 11100...
[2023-10-27 21:11:44,337][00421] Num frames 11200...
[2023-10-27 21:11:44,465][00421] Num frames 11300...
[2023-10-27 21:11:44,595][00421] Num frames 11400...
[2023-10-27 21:11:44,735][00421] Num frames 11500...
[2023-10-27 21:11:44,864][00421] Num frames 11600...
[2023-10-27 21:11:44,993][00421] Num frames 11700...
[2023-10-27 21:11:45,131][00421] Num frames 11800...
[2023-10-27 21:11:45,264][00421] Num frames 11900...
[2023-10-27 21:11:45,401][00421] Num frames 12000...
[2023-10-27 21:11:45,584][00421] Num frames 12100...
[2023-10-27 21:11:45,794][00421] Num frames 12200...
[2023-10-27 21:11:45,979][00421] Num frames 12300...
[2023-10-27 21:11:46,167][00421] Num frames 12400...
[2023-10-27 21:11:46,304][00421] Avg episode rewards: #0: 29.342, true rewards: #0: 12.442
[2023-10-27 21:11:46,309][00421] Avg episode reward: 29.342, avg true_objective: 12.442
[2023-10-27 21:13:05,282][00421] Replay video saved to /content/train_dir/default_experiment/replay.mp4!