[2023-02-23 10:36:40,071][01342] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-02-23 10:36:40,073][01342] Rollout worker 0 uses device cpu [2023-02-23 10:36:40,075][01342] Rollout worker 1 uses device cpu [2023-02-23 10:36:40,076][01342] Rollout worker 2 uses device cpu [2023-02-23 10:36:40,077][01342] Rollout worker 3 uses device cpu [2023-02-23 10:36:40,078][01342] Rollout worker 4 uses device cpu [2023-02-23 10:36:40,079][01342] Rollout worker 5 uses device cpu [2023-02-23 10:36:40,080][01342] Rollout worker 6 uses device cpu [2023-02-23 10:36:40,081][01342] Rollout worker 7 uses device cpu [2023-02-23 10:36:40,258][01342] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:36:40,260][01342] InferenceWorker_p0-w0: min num requests: 2 [2023-02-23 10:36:40,293][01342] Starting all processes... [2023-02-23 10:36:40,294][01342] Starting process learner_proc0 [2023-02-23 10:36:40,353][01342] Starting all processes... [2023-02-23 10:36:40,364][01342] Starting process inference_proc0-0 [2023-02-23 10:36:40,364][01342] Starting process rollout_proc0 [2023-02-23 10:36:40,364][01342] Starting process rollout_proc1 [2023-02-23 10:36:40,364][01342] Starting process rollout_proc2 [2023-02-23 10:36:40,364][01342] Starting process rollout_proc3 [2023-02-23 10:36:40,364][01342] Starting process rollout_proc4 [2023-02-23 10:36:40,365][01342] Starting process rollout_proc5 [2023-02-23 10:36:40,365][01342] Starting process rollout_proc6 [2023-02-23 10:36:40,365][01342] Starting process rollout_proc7 [2023-02-23 10:36:50,664][11493] Worker 2 uses CPU cores [0] [2023-02-23 10:36:50,792][11495] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:36:50,793][11495] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-23 10:36:50,913][11491] Worker 1 uses CPU cores [1] [2023-02-23 10:36:50,990][11477] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:36:50,991][11477] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-23 10:36:51,013][11499] Worker 7 uses CPU cores [1] [2023-02-23 10:36:51,125][11497] Worker 4 uses CPU cores [0] [2023-02-23 10:36:51,187][11498] Worker 6 uses CPU cores [0] [2023-02-23 10:36:51,244][11492] Worker 0 uses CPU cores [0] [2023-02-23 10:36:51,328][11494] Worker 3 uses CPU cores [1] [2023-02-23 10:36:51,354][11496] Worker 5 uses CPU cores [1] [2023-02-23 10:36:51,726][11477] Num visible devices: 1 [2023-02-23 10:36:51,726][11495] Num visible devices: 1 [2023-02-23 10:36:51,749][11477] Starting seed is not provided [2023-02-23 10:36:51,749][11477] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:36:51,750][11477] Initializing actor-critic model on device cuda:0 [2023-02-23 10:36:51,750][11477] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 10:36:51,752][11477] RunningMeanStd input shape: (1,) [2023-02-23 10:36:51,764][11477] ConvEncoder: input_channels=3 [2023-02-23 10:36:52,036][11477] Conv encoder output size: 512 [2023-02-23 10:36:52,036][11477] Policy head output size: 512 [2023-02-23 10:36:52,084][11477] Created Actor Critic model with architecture: [2023-02-23 10:36:52,084][11477] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): 
VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-23 10:36:58,910][11477] Using optimizer [2023-02-23 10:36:58,911][11477] No checkpoints found [2023-02-23 10:36:58,911][11477] Did not load from checkpoint, starting from scratch! [2023-02-23 10:36:58,911][11477] Initialized policy 0 weights for model version 0 [2023-02-23 10:36:58,916][11477] LearnerWorker_p0 finished initialization! [2023-02-23 10:36:58,918][11477] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:36:59,170][11495] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 10:36:59,171][11495] RunningMeanStd input shape: (1,) [2023-02-23 10:36:59,189][11495] ConvEncoder: input_channels=3 [2023-02-23 10:36:59,339][11495] Conv encoder output size: 512 [2023-02-23 10:36:59,340][11495] Policy head output size: 512 [2023-02-23 10:37:00,251][01342] Heartbeat connected on Batcher_0 [2023-02-23 10:37:00,255][01342] Heartbeat connected on LearnerWorker_p0 [2023-02-23 10:37:00,273][01342] Heartbeat connected on RolloutWorker_w0 [2023-02-23 10:37:00,275][01342] Heartbeat connected on RolloutWorker_w1 [2023-02-23 10:37:00,277][01342] Heartbeat connected on RolloutWorker_w2 [2023-02-23 10:37:00,285][01342] Heartbeat connected on RolloutWorker_w3 [2023-02-23 10:37:00,290][01342] Heartbeat connected on RolloutWorker_w4 [2023-02-23 10:37:00,292][01342] Heartbeat connected on RolloutWorker_w5 [2023-02-23 10:37:00,296][01342] Heartbeat connected on RolloutWorker_w6 [2023-02-23 10:37:00,298][01342] Heartbeat connected on RolloutWorker_w7 [2023-02-23 10:37:01,178][01342] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 10:37:01,598][01342] Inference worker 0-0 is ready! [2023-02-23 10:37:01,600][01342] All inference workers are ready! Signal rollout workers to start! 
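The module tree printed above can be reproduced shape-for-shape in plain PyTorch. Below is a minimal sketch, assuming Sample Factory's default conv filters for this encoder (32/64/128 channels, kernels 8/4/3, strides 4/2/2), which for the logged (3, 72, 128) observations yield a 128×3×6 feature map that the MLP layer projects to the logged encoder output size of 512; class and variable names here are illustrative, not Sample Factory's own.

```python
import torch
from torch import nn

# Illustrative reconstruction of the logged ActorCriticSharedWeights shapes.
class SketchActorCritic(nn.Module):
    def __init__(self, num_actions: int = 5):   # distribution_linear: 512 -> 5
        super().__init__()
        self.conv_head = nn.Sequential(          # (conv_head) in the log
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        # (72, 128) input -> (17, 31) -> (7, 14) -> (3, 6), so 128*3*6 features
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, 512), nn.ELU())
        self.core = nn.GRU(512, 512)             # ModelCoreRNN: GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)   # value head
        self.action_logits = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs)).unsqueeze(0)  # (1, B, 512)
        x, rnn_state = self.core(x, rnn_state)
        x = x.squeeze(0)
        return self.action_logits(x), self.critic_linear(x), rnn_state

net = SketchActorCritic()
logits, value, h = net(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```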
[2023-02-23 10:37:01,605][01342] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-23 10:37:01,700][11497] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:01,721][11492] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:01,733][11493] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:01,755][11498] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:01,757][11496] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:01,760][11491] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:01,771][11499] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:01,780][11494] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:37:02,915][11494] Decorrelating experience for 0 frames... [2023-02-23 10:37:02,917][11499] Decorrelating experience for 0 frames... [2023-02-23 10:37:02,916][11491] Decorrelating experience for 0 frames... [2023-02-23 10:37:03,192][11497] Decorrelating experience for 0 frames... [2023-02-23 10:37:03,196][11492] Decorrelating experience for 0 frames... [2023-02-23 10:37:03,202][11493] Decorrelating experience for 0 frames... [2023-02-23 10:37:03,216][11498] Decorrelating experience for 0 frames... [2023-02-23 10:37:03,362][11499] Decorrelating experience for 32 frames... [2023-02-23 10:37:04,019][11491] Decorrelating experience for 32 frames... [2023-02-23 10:37:04,022][11496] Decorrelating experience for 0 frames... [2023-02-23 10:37:04,451][11494] Decorrelating experience for 32 frames... [2023-02-23 10:37:04,509][11497] Decorrelating experience for 32 frames... [2023-02-23 10:37:04,511][11492] Decorrelating experience for 32 frames... [2023-02-23 10:37:04,521][11498] Decorrelating experience for 32 frames... [2023-02-23 10:37:04,614][11493] Decorrelating experience for 32 frames... [2023-02-23 10:37:05,178][11499] Decorrelating experience for 64 frames... [2023-02-23 10:37:05,301][11496] Decorrelating experience for 32 frames... [2023-02-23 10:37:05,689][11494] Decorrelating experience for 64 frames... [2023-02-23 10:37:05,950][11497] Decorrelating experience for 64 frames... [2023-02-23 10:37:05,952][11492] Decorrelating experience for 64 frames... [2023-02-23 10:37:05,998][11498] Decorrelating experience for 64 frames... [2023-02-23 10:37:06,178][01342] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 10:37:06,459][11493] Decorrelating experience for 64 frames... [2023-02-23 10:37:06,688][11497] Decorrelating experience for 96 frames... [2023-02-23 10:37:06,707][11491] Decorrelating experience for 64 frames... [2023-02-23 10:37:06,748][11494] Decorrelating experience for 96 frames... [2023-02-23 10:37:07,167][11496] Decorrelating experience for 64 frames... [2023-02-23 10:37:07,243][11498] Decorrelating experience for 96 frames... [2023-02-23 10:37:07,420][11499] Decorrelating experience for 96 frames... [2023-02-23 10:37:07,705][11493] Decorrelating experience for 96 frames... [2023-02-23 10:37:08,041][11496] Decorrelating experience for 96 frames... [2023-02-23 10:37:08,392][11492] Decorrelating experience for 96 frames... [2023-02-23 10:37:08,723][11491] Decorrelating experience for 96 frames... [2023-02-23 10:37:11,179][01342] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 36.2. Samples: 362. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 10:37:11,184][01342] Avg episode reward: [(0, '0.640')] [2023-02-23 10:37:14,611][11477] Signal inference workers to stop experience collection... [2023-02-23 10:37:14,648][11495] InferenceWorker_p0-w0: stopping experience collection [2023-02-23 10:37:16,178][01342] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 131.6. Samples: 1974. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 10:37:16,180][01342] Avg episode reward: [(0, '1.937')] [2023-02-23 10:37:16,871][11477] Signal inference workers to resume experience collection... [2023-02-23 10:37:16,872][11495] InferenceWorker_p0-w0: resuming experience collection [2023-02-23 10:37:21,178][01342] Fps is (10 sec: 2457.9, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 240.0. Samples: 4800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-23 10:37:21,182][01342] Avg episode reward: [(0, '3.608')] [2023-02-23 10:37:24,528][11495] Updated weights for policy 0, policy_version 10 (0.0018) [2023-02-23 10:37:26,178][01342] Fps is (10 sec: 4505.6, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 484.2. Samples: 12106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:37:26,183][01342] Avg episode reward: [(0, '4.356')] [2023-02-23 10:37:31,178][01342] Fps is (10 sec: 3686.4, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 493.7. Samples: 14810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:37:31,184][01342] Avg episode reward: [(0, '4.447')] [2023-02-23 10:37:36,180][01342] Fps is (10 sec: 3276.2, 60 sec: 2223.4, 300 sec: 2223.4). Total num frames: 77824. Throughput: 0: 556.8. Samples: 19488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:37:36,182][01342] Avg episode reward: [(0, '4.406')] [2023-02-23 10:37:36,213][11495] Updated weights for policy 0, policy_version 20 (0.0015) [2023-02-23 10:37:41,178][01342] Fps is (10 sec: 4096.0, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 659.4. Samples: 26376. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-23 10:37:41,181][01342] Avg episode reward: [(0, '4.387')] [2023-02-23 10:37:41,183][11477] Saving new best policy, reward=4.387! [2023-02-23 10:37:44,680][11495] Updated weights for policy 0, policy_version 30 (0.0026) [2023-02-23 10:37:46,178][01342] Fps is (10 sec: 4916.1, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 666.3. Samples: 29984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:37:46,185][01342] Avg episode reward: [(0, '4.540')] [2023-02-23 10:37:46,196][11477] Saving new best policy, reward=4.540! [2023-02-23 10:37:51,178][01342] Fps is (10 sec: 3686.4, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 785.2. Samples: 35332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:37:51,183][01342] Avg episode reward: [(0, '4.382')] [2023-02-23 10:37:56,178][01342] Fps is (10 sec: 3276.8, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 888.6. Samples: 40348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:37:56,180][01342] Avg episode reward: [(0, '4.414')] [2023-02-23 10:37:56,691][11495] Updated weights for policy 0, policy_version 40 (0.0051) [2023-02-23 10:38:01,178][01342] Fps is (10 sec: 4505.6, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 931.9. Samples: 43910. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 10:38:01,183][01342] Avg episode reward: [(0, '4.373')] [2023-02-23 10:38:05,370][11495] Updated weights for policy 0, policy_version 50 (0.0012) [2023-02-23 10:38:06,179][01342] Fps is (10 sec: 4505.2, 60 sec: 3413.3, 300 sec: 3150.7). Total num frames: 204800. Throughput: 0: 1029.3. Samples: 51120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:38:06,187][01342] Avg episode reward: [(0, '4.458')] [2023-02-23 10:38:11,184][01342] Fps is (10 sec: 3684.2, 60 sec: 3686.1, 300 sec: 3159.5). Total num frames: 221184. Throughput: 0: 970.3. Samples: 55774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 10:38:11,187][01342] Avg episode reward: [(0, '4.315')] [2023-02-23 10:38:16,178][01342] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 966.2. Samples: 58290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:38:16,181][01342] Avg episode reward: [(0, '4.134')] [2023-02-23 10:38:16,673][11495] Updated weights for policy 0, policy_version 60 (0.0019) [2023-02-23 10:38:21,178][01342] Fps is (10 sec: 4508.2, 60 sec: 4027.7, 300 sec: 3328.0). Total num frames: 266240. Throughput: 0: 1023.3. Samples: 65534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:38:21,181][01342] Avg episode reward: [(0, '4.249')] [2023-02-23 10:38:26,092][11495] Updated weights for policy 0, policy_version 70 (0.0013) [2023-02-23 10:38:26,178][01342] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 1010.4. Samples: 71846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:38:26,188][01342] Avg episode reward: [(0, '4.405')] [2023-02-23 10:38:31,178][01342] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 981.1. Samples: 74132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 10:38:31,185][01342] Avg episode reward: [(0, '4.398')] [2023-02-23 10:38:36,178][01342] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3406.2). Total num frames: 323584. Throughput: 0: 990.6. Samples: 79908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 10:38:36,184][01342] Avg episode reward: [(0, '4.499')] [2023-02-23 10:38:36,196][11477] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth... [2023-02-23 10:38:36,695][11495] Updated weights for policy 0, policy_version 80 (0.0019) [2023-02-23 10:38:41,178][01342] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 3481.6). Total num frames: 348160. Throughput: 0: 1037.4. Samples: 87032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:38:41,188][01342] Avg episode reward: [(0, '4.405')] [2023-02-23 10:38:46,178][01342] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3471.9). Total num frames: 364544. Throughput: 0: 1025.2. Samples: 90042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 10:38:46,185][01342] Avg episode reward: [(0, '4.329')] [2023-02-23 10:38:46,899][11495] Updated weights for policy 0, policy_version 90 (0.0024) [2023-02-23 10:38:51,178][01342] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3463.0). Total num frames: 380928. Throughput: 0: 968.5. Samples: 94700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:38:51,190][01342] Avg episode reward: [(0, '4.367')] [2023-02-23 10:38:56,178][01342] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3526.1). Total num frames: 405504. Throughput: 0: 1012.7. Samples: 101340. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:38:56,181][01342] Avg episode reward: [(0, '4.592')] [2023-02-23 10:38:56,188][11477] Saving new best policy, reward=4.592! [2023-02-23 10:38:57,065][11495] Updated weights for policy 0, policy_version 100 (0.0022) [2023-02-23 10:39:01,178][01342] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 1033.6. Samples: 104800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 10:39:01,187][01342] Avg episode reward: [(0, '4.459')] [2023-02-23 10:39:06,178][01342] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3538.9). Total num frames: 442368. Throughput: 0: 999.1. Samples: 110494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:39:06,182][01342] Avg episode reward: [(0, '4.382')] [2023-02-23 10:39:08,155][11495] Updated weights for policy 0, policy_version 110 (0.0013) [2023-02-23 10:39:11,178][01342] Fps is (10 sec: 3276.9, 60 sec: 3959.9, 300 sec: 3528.9). Total num frames: 458752. Throughput: 0: 970.4. Samples: 115514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:39:11,186][01342] Avg episode reward: [(0, '4.437')] [2023-02-23 10:39:16,178][01342] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 1002.0. Samples: 119224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:39:16,186][01342] Avg episode reward: [(0, '4.405')] [2023-02-23 10:39:17,288][11495] Updated weights for policy 0, policy_version 120 (0.0030) [2023-02-23 10:39:21,178][01342] Fps is (10 sec: 4915.0, 60 sec: 4027.7, 300 sec: 3627.9). Total num frames: 507904. Throughput: 0: 1033.0. Samples: 126392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:39:21,184][01342] Avg episode reward: [(0, '4.481')] [2023-02-23 10:39:26,180][01342] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3587.5). Total num frames: 520192. Throughput: 0: 978.5. Samples: 131068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:39:26,189][01342] Avg episode reward: [(0, '4.532')] [2023-02-23 10:39:28,995][11495] Updated weights for policy 0, policy_version 130 (0.0016) [2023-02-23 10:39:31,178][01342] Fps is (10 sec: 3276.9, 60 sec: 4027.7, 300 sec: 3604.5). Total num frames: 540672. Throughput: 0: 963.6. Samples: 133404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:39:31,184][01342] Avg episode reward: [(0, '4.411')] [2023-02-23 10:39:36,178][01342] Fps is (10 sec: 4506.6, 60 sec: 4027.7, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 1020.1. Samples: 140606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:39:36,186][01342] Avg episode reward: [(0, '4.316')] [2023-02-23 10:39:37,354][11495] Updated weights for policy 0, policy_version 140 (0.0019) [2023-02-23 10:39:41,178][01342] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3660.8). Total num frames: 585728. Throughput: 0: 1015.4. Samples: 147034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:39:41,185][01342] Avg episode reward: [(0, '4.350')] [2023-02-23 10:39:46,178][01342] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3649.2). Total num frames: 602112. Throughput: 0: 990.2. Samples: 149358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:39:46,184][01342] Avg episode reward: [(0, '4.296')] [2023-02-23 10:39:49,014][11495] Updated weights for policy 0, policy_version 150 (0.0018) [2023-02-23 10:39:51,178][01342] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3662.3). Total num frames: 622592. 
Throughput: 0: 986.9. Samples: 154904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:39:51,180][01342] Avg episode reward: [(0, '4.376')] [2023-02-23 10:39:56,178][01342] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3674.7). Total num frames: 643072. Throughput: 0: 1026.7. Samples: 161716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:39:56,181][01342] Avg episode reward: [(0, '4.556')] [2023-02-23 10:39:58,431][11495] Updated weights for policy 0, policy_version 160 (0.0016) [2023-02-23 10:40:01,178][01342] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3663.6). Total num frames: 659456. Throughput: 0: 1006.3. Samples: 164508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:40:01,181][01342] Avg episode reward: [(0, '4.536')] [2023-02-23 10:40:06,179][01342] Fps is (10 sec: 3276.4, 60 sec: 3891.1, 300 sec: 3653.2). Total num frames: 675840. Throughput: 0: 942.3. Samples: 168796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:40:06,182][01342] Avg episode reward: [(0, '4.478')] [2023-02-23 10:40:10,360][11495] Updated weights for policy 0, policy_version 170 (0.0030) [2023-02-23 10:40:11,178][01342] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3664.8). Total num frames: 696320. Throughput: 0: 976.8. Samples: 175024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:40:11,181][01342] Avg episode reward: [(0, '4.554')] [2023-02-23 10:40:16,178][01342] Fps is (10 sec: 4506.2, 60 sec: 3959.5, 300 sec: 3696.9). Total num frames: 720896. Throughput: 0: 999.6. Samples: 178388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:40:16,181][01342] Avg episode reward: [(0, '4.600')] [2023-02-23 10:40:16,192][11477] Saving new best policy, reward=4.600! [2023-02-23 10:40:21,178][01342] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.9). Total num frames: 733184. Throughput: 0: 954.1. Samples: 183540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:40:21,184][01342] Avg episode reward: [(0, '4.603')] [2023-02-23 10:40:21,187][11477] Saving new best policy, reward=4.603! [2023-02-23 10:40:21,561][11495] Updated weights for policy 0, policy_version 180 (0.0015) [2023-02-23 10:40:26,178][01342] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3676.4). Total num frames: 753664. Throughput: 0: 923.0. Samples: 188568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:40:26,181][01342] Avg episode reward: [(0, '4.564')] [2023-02-23 10:40:31,043][11495] Updated weights for policy 0, policy_version 190 (0.0026) [2023-02-23 10:40:31,178][01342] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3705.9). Total num frames: 778240. Throughput: 0: 952.4. Samples: 192216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:40:31,180][01342] Avg episode reward: [(0, '4.505')] [2023-02-23 10:40:36,186][01342] Fps is (10 sec: 4502.1, 60 sec: 3890.7, 300 sec: 3714.8). Total num frames: 798720. Throughput: 0: 988.6. Samples: 199398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:40:36,193][01342] Avg episode reward: [(0, '4.675')] [2023-02-23 10:40:36,203][11477] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth... [2023-02-23 10:40:36,337][11477] Saving new best policy, reward=4.675! [2023-02-23 10:40:41,179][01342] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3686.4). Total num frames: 811008. Throughput: 0: 937.3. Samples: 203894. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:40:41,182][01342] Avg episode reward: [(0, '4.652')] [2023-02-23 10:40:42,480][11495] Updated weights for policy 0, policy_version 200 (0.0028) [2023-02-23 10:40:46,178][01342] Fps is (10 sec: 3689.3, 60 sec: 3891.2, 300 sec: 3713.7). Total num frames: 835584. Throughput: 0: 930.2. Samples: 206366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:40:46,181][01342] Avg episode reward: [(0, '4.496')] [2023-02-23 10:40:51,176][11495] Updated weights for policy 0, policy_version 210 (0.0012) [2023-02-23 10:40:51,178][01342] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3722.0). Total num frames: 856064. Throughput: 0: 997.4. Samples: 213676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:40:51,181][01342] Avg episode reward: [(0, '4.499')] [2023-02-23 10:40:56,178][01342] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3730.0). Total num frames: 876544. Throughput: 0: 995.4. Samples: 219816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:40:56,183][01342] Avg episode reward: [(0, '4.591')] [2023-02-23 10:41:01,178][01342] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3720.5). Total num frames: 892928. Throughput: 0: 970.8. Samples: 222076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:41:01,183][01342] Avg episode reward: [(0, '4.522')] [2023-02-23 10:41:03,092][11495] Updated weights for policy 0, policy_version 220 (0.0025) [2023-02-23 10:41:06,178][01342] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3728.2). Total num frames: 913408. Throughput: 0: 982.9. Samples: 227772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:41:06,185][01342] Avg episode reward: [(0, '4.285')] [2023-02-23 10:41:11,178][01342] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3751.9). Total num frames: 937984. Throughput: 0: 1030.9. Samples: 234960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:41:11,180][01342] Avg episode reward: [(0, '4.296')] [2023-02-23 10:41:11,551][11495] Updated weights for policy 0, policy_version 230 (0.0013) [2023-02-23 10:41:16,178][01342] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3742.6). Total num frames: 954368. Throughput: 0: 1014.9. Samples: 237888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:41:16,182][01342] Avg episode reward: [(0, '4.392')] [2023-02-23 10:41:21,178][01342] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 3733.7). Total num frames: 970752. Throughput: 0: 958.3. Samples: 242516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:41:21,185][01342] Avg episode reward: [(0, '4.368')] [2023-02-23 10:41:23,035][11495] Updated weights for policy 0, policy_version 240 (0.0026) [2023-02-23 10:41:26,178][01342] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3756.0). Total num frames: 995328. Throughput: 0: 1011.6. Samples: 249416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:41:26,180][01342] Avg episode reward: [(0, '4.462')] [2023-02-23 10:41:31,178][01342] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3777.4). Total num frames: 1019904. Throughput: 0: 1037.8. Samples: 253066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:41:31,182][01342] Avg episode reward: [(0, '4.620')] [2023-02-23 10:41:32,025][11495] Updated weights for policy 0, policy_version 250 (0.0013) [2023-02-23 10:41:36,178][01342] Fps is (10 sec: 3686.4, 60 sec: 3891.7, 300 sec: 3753.4). Total num frames: 1032192. Throughput: 0: 995.5. Samples: 258474. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:41:36,180][01342] Avg episode reward: [(0, '4.652')] [2023-02-23 10:41:41,178][01342] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 3759.5). Total num frames: 1052672. Throughput: 0: 976.0. Samples: 263734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:41:41,185][01342] Avg episode reward: [(0, '4.452')] [2023-02-23 10:41:43,221][11495] Updated weights for policy 0, policy_version 260 (0.0011) [2023-02-23 10:41:46,178][01342] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3779.8). Total num frames: 1077248. Throughput: 0: 1006.0. Samples: 267344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:41:46,183][01342] Avg episode reward: [(0, '4.580')] [2023-02-23 10:41:51,178][01342] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3785.3). Total num frames: 1097728. Throughput: 0: 1036.7. Samples: 274424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:41:51,181][01342] Avg episode reward: [(0, '4.569')] [2023-02-23 10:41:53,093][11495] Updated weights for policy 0, policy_version 270 (0.0013) [2023-02-23 10:41:56,179][01342] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3776.6). Total num frames: 1114112. Throughput: 0: 977.8. Samples: 278962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:41:56,184][01342] Avg episode reward: [(0, '4.471')] [2023-02-23 10:42:01,178][01342] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1134592. Throughput: 0: 971.5. Samples: 281606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:42:01,185][01342] Avg episode reward: [(0, '4.647')] [2023-02-23 10:42:03,333][11495] Updated weights for policy 0, policy_version 280 (0.0026) [2023-02-23 10:42:06,178][01342] Fps is (10 sec: 4505.9, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 1159168. Throughput: 0: 1027.7. Samples: 288764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:42:06,185][01342] Avg episode reward: [(0, '4.502')] [2023-02-23 10:42:11,178][01342] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1175552. Throughput: 0: 1008.8. Samples: 294812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 10:42:11,180][01342] Avg episode reward: [(0, '4.515')] [2023-02-23 10:42:13,829][11495] Updated weights for policy 0, policy_version 290 (0.0013) [2023-02-23 10:42:16,179][01342] Fps is (10 sec: 3276.6, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 1191936. Throughput: 0: 980.3. Samples: 297182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-23 10:42:16,188][01342] Avg episode reward: [(0, '4.586')] [2023-02-23 10:42:16,600][01342] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 1342], exiting... [2023-02-23 10:42:16,613][11477] Stopping Batcher_0... [2023-02-23 10:42:16,613][11477] Loop batcher_evt_loop terminating... [2023-02-23 10:42:16,614][11477] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth... 
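The Ctrl-C above triggers an orderly shutdown: the runner saves one final checkpoint before the rollout workers, which are blocked inside ViZDoom's make_action, surface the SignalException traceback below. The checkpoint filename encodes the two counters the learner restores on resume, and in this run each policy version corresponds to 4,096 environment frames (compare the earlier checkpoints: 79 × 4096 = 323,584 and 195 × 4096 = 798,720). A quick check, with all numbers taken from this log:

```python
# The checkpoint name encodes (train_step, env_steps); values from this log.
name = "checkpoint_000000292_1196032.pth"
train_step, env_steps = (int(x) for x in name[len("checkpoint_"):-len(".pth")].split("_"))
assert (train_step, env_steps) == (292, 1196032)
assert env_steps == train_step * 4096   # 4096 frames per policy version in this run

# The runner summary below reports the same total at ~3556 FPS overall:
print(env_steps / 336.3199)             # -> 3556.2..., matching "FPS: 3556.2"
```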
[2023-02-23 10:42:16,612][01342] Runner profile tree view:
main_loop: 336.3199
[2023-02-23 10:42:16,623][01342] Collected {0: 1196032}, FPS: 3556.2
[2023-02-23 10:42:16,632][11499] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(1, 0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 384, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 88, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 319, in step
    return self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-02-23 10:42:16,667][11499] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
[2023-02-23 10:42:16,665][11496] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(1, 0)
[2023-02-23 10:42:16,676][11494] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(0, 0)
[2023-02-23 10:42:16,685][11494] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
[2023-02-23 10:42:16,681][11496] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
[2023-02-23 10:42:16,692][11495] Weights refcount: 2 0
[2023-02-23 10:42:16,662][11491] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0)
[2023-02-23 10:42:16,694][11491] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
[2023-02-23 10:42:16,713][11495] Stopping InferenceWorker_p0-w0...
[2023-02-23 10:42:16,739][11495] Loop inference_proc0-0_evt_loop terminating...
[2023-02-23 10:42:16,731][11497] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(0, 0)
[2023-02-23 10:42:16,810][11497] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
[2023-02-23 10:42:16,768][11492] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(0, 0)
[2023-02-23 10:42:16,874][11492] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
[2023-02-23 10:42:16,762][11498] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0)
[2023-02-23 10:42:16,914][11498] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
[2023-02-23 10:42:16,881][11493] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(0, 0)
[2023-02-23 10:42:16,992][11493] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop
[2023-02-23 10:42:17,049][11477] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_323584.pth
[2023-02-23 10:42:17,064][11477] Stopping LearnerWorker_p0...
[2023-02-23 10:42:17,072][11477] Loop learner_proc0_evt_loop terminating...
[2023-02-23 10:42:17,095][01342] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 10:42:17,098][01342] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-23 10:42:17,100][01342] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-23 10:42:17,102][01342] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-23 10:42:17,104][01342] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 10:42:17,105][01342] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-23 10:42:17,107][01342] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 10:42:17,110][01342] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-23 10:42:17,111][01342] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-23 10:42:17,113][01342] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-23 10:42:17,114][01342] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-23 10:42:17,117][01342] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-23 10:42:17,118][01342] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 10:42:17,119][01342] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 10:42:17,121][01342] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 10:42:17,163][01342] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:42:17,166][01342] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 10:42:17,173][01342] RunningMeanStd input shape: (1,) [2023-02-23 10:42:17,215][01342] ConvEncoder: input_channels=3 [2023-02-23 10:42:17,668][01342] Conv encoder output size: 512 [2023-02-23 10:42:17,671][01342] Policy head output size: 512 [2023-02-23 10:56:18,410][18366] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-02-23 10:56:18,414][18366] Rollout worker 0 uses device cpu [2023-02-23 10:56:18,416][18366] Rollout worker 1 uses device cpu [2023-02-23 10:56:18,417][18366] Rollout worker 2 uses device cpu [2023-02-23 10:56:18,419][18366] Rollout worker 3 uses device cpu [2023-02-23 10:56:18,420][18366] Rollout worker 4 uses device cpu [2023-02-23 10:56:18,422][18366] Rollout worker 5 uses device cpu [2023-02-23 10:56:18,423][18366] Rollout worker 6 uses device cpu [2023-02-23 10:56:18,425][18366] Rollout worker 7 uses device cpu [2023-02-23 10:56:18,568][18366] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:56:18,570][18366] InferenceWorker_p0-w0: min num requests: 2 [2023-02-23 10:56:18,602][18366] Starting all processes... [2023-02-23 10:56:18,604][18366] Starting process learner_proc0 [2023-02-23 10:56:18,668][18366] Starting all processes... 
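Before the fresh run above spins up its processes, it is worth noting how the evaluation pass at 10:42:17 rebuilt its configuration: it loaded the saved config.json and layered the command-line overrides logged there ('num_workers', 'no_render', 'save_video', 'max_num_episodes', and so on). Sample Factory performs this merge inside its evaluation entry point; the sketch below only illustrates the effect, with the path and keys taken verbatim from the log:

```python
import json

# Illustrative only: the config merge the evaluation run logs at 10:42:17.
with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

cfg.update({
    "num_workers": 1,            # "Overriding arg 'num_workers' with value 1"
    "no_render": True,           # the remaining keys were "added" because they
    "save_video": True,          # were absent from the saved config file
    "max_num_episodes": 10,
    "push_to_hub": False,
    "hf_repository": None,
    "policy_index": 0,
    "eval_deterministic": False,
})
```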
[2023-02-23 10:56:18,677][18366] Starting process inference_proc0-0 [2023-02-23 10:56:18,677][18366] Starting process rollout_proc0 [2023-02-23 10:56:18,679][18366] Starting process rollout_proc1 [2023-02-23 10:56:18,680][18366] Starting process rollout_proc2 [2023-02-23 10:56:18,680][18366] Starting process rollout_proc3 [2023-02-23 10:56:18,680][18366] Starting process rollout_proc4 [2023-02-23 10:56:18,680][18366] Starting process rollout_proc5 [2023-02-23 10:56:18,680][18366] Starting process rollout_proc6 [2023-02-23 10:56:18,680][18366] Starting process rollout_proc7 [2023-02-23 10:56:31,236][18654] Worker 0 uses CPU cores [0] [2023-02-23 10:56:31,329][18657] Worker 7 uses CPU cores [1] [2023-02-23 10:56:31,491][18639] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:56:31,493][18639] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-23 10:56:31,562][18655] Worker 1 uses CPU cores [1] [2023-02-23 10:56:31,593][18653] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:56:31,594][18653] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-23 10:56:31,643][18661] Worker 4 uses CPU cores [0] [2023-02-23 10:56:31,680][18658] Worker 5 uses CPU cores [1] [2023-02-23 10:56:31,868][18656] Worker 3 uses CPU cores [1] [2023-02-23 10:56:31,966][18659] Worker 6 uses CPU cores [0] [2023-02-23 10:56:32,205][18660] Worker 2 uses CPU cores [0] [2023-02-23 10:56:32,347][18653] Num visible devices: 1 [2023-02-23 10:56:32,351][18639] Num visible devices: 1 [2023-02-23 10:56:32,375][18639] Starting seed is not provided [2023-02-23 10:56:32,376][18639] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:56:32,377][18639] Initializing actor-critic model on device cuda:0 [2023-02-23 10:56:32,378][18639] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 10:56:32,379][18639] RunningMeanStd input shape: (1,) [2023-02-23 10:56:32,416][18639] ConvEncoder: input_channels=3 [2023-02-23 10:56:32,693][18639] Conv encoder output size: 512 [2023-02-23 10:56:32,695][18639] Policy head output size: 512 [2023-02-23 10:56:32,720][18639] Created Actor Critic model with architecture: [2023-02-23 10:56:32,721][18639] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-23 10:56:37,348][18639] Using optimizer [2023-02-23 10:56:37,354][18639] 
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth... [2023-02-23 10:56:37,459][18639] Loading model from checkpoint [2023-02-23 10:56:37,483][18639] Loaded experiment state at self.train_step=292, self.env_steps=1196032 [2023-02-23 10:56:37,483][18639] Initialized policy 0 weights for model version 292 [2023-02-23 10:56:37,500][18639] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 10:56:37,523][18639] LearnerWorker_p0 finished initialization! [2023-02-23 10:56:38,273][18653] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 10:56:38,281][18653] RunningMeanStd input shape: (1,) [2023-02-23 10:56:38,380][18653] ConvEncoder: input_channels=3 [2023-02-23 10:56:38,564][18366] Heartbeat connected on LearnerWorker_p0 [2023-02-23 10:56:38,566][18366] Heartbeat connected on Batcher_0 [2023-02-23 10:56:38,583][18366] Heartbeat connected on RolloutWorker_w1 [2023-02-23 10:56:38,588][18366] Heartbeat connected on RolloutWorker_w2 [2023-02-23 10:56:38,599][18366] Heartbeat connected on RolloutWorker_w0 [2023-02-23 10:56:38,608][18366] Heartbeat connected on RolloutWorker_w3 [2023-02-23 10:56:38,609][18366] Heartbeat connected on RolloutWorker_w4 [2023-02-23 10:56:38,613][18366] Heartbeat connected on RolloutWorker_w5 [2023-02-23 10:56:38,647][18366] Heartbeat connected on RolloutWorker_w6 [2023-02-23 10:56:38,648][18366] Heartbeat connected on RolloutWorker_w7 [2023-02-23 10:56:38,932][18653] Conv encoder output size: 512 [2023-02-23 10:56:38,933][18653] Policy head output size: 512 [2023-02-23 10:56:40,187][18366] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1196032. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 10:56:41,720][18366] Inference worker 0-0 is ready! [2023-02-23 10:56:41,721][18366] All inference workers are ready! Signal rollout workers to start! [2023-02-23 10:56:41,728][18366] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-23 10:56:41,818][18661] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:41,821][18660] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:41,823][18659] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:41,826][18654] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:41,837][18655] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:41,839][18656] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:41,840][18657] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:41,836][18658] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 10:56:42,292][18656] Decorrelating experience for 0 frames... [2023-02-23 10:56:42,649][18656] Decorrelating experience for 32 frames... [2023-02-23 10:56:43,051][18656] Decorrelating experience for 64 frames... [2023-02-23 10:56:43,198][18661] Decorrelating experience for 0 frames... [2023-02-23 10:56:43,201][18659] Decorrelating experience for 0 frames... [2023-02-23 10:56:43,205][18660] Decorrelating experience for 0 frames... [2023-02-23 10:56:43,207][18654] Decorrelating experience for 0 frames... [2023-02-23 10:56:44,251][18660] Decorrelating experience for 32 frames... [2023-02-23 10:56:44,254][18661] Decorrelating experience for 32 frames... [2023-02-23 10:56:44,257][18659] Decorrelating experience for 32 frames... [2023-02-23 10:56:44,298][18657] Decorrelating experience for 0 frames... 
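Unlike the first session, the learner here does not start from scratch: it picks up checkpoint_000000292_1196032.pth, whose filename encodes the policy version (292) and the environment-step count (1,196,032), which is why the first "Total num frames" report above resumes at 1196032 rather than 0. A minimal resume sketch follows, assuming a PyTorch checkpoint dict with "model", "optimizer", "train_step" and "env_steps" keys; the exact key layout of Sample Factory's .pth files may differ.

import torch

def resume_from_checkpoint(path, model, optimizer, device="cuda:0"):
    # Key names below are assumptions for illustration, not the verified
    # Sample Factory checkpoint schema.
    state = torch.load(path, map_location=device)
    model.load_state_dict(state["model"])          # policy weights, version 292 here
    optimizer.load_state_dict(state["optimizer"])  # optimizer state, so training resumes smoothly
    return state.get("train_step", 0), state.get("env_steps", 0)  # (292, 1196032) in this run

# train_step, env_steps = resume_from_checkpoint(
#     "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth",
#     model, optimizer)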
[2023-02-23 10:56:44,299][18658] Decorrelating experience for 0 frames... [2023-02-23 10:56:44,307][18656] Decorrelating experience for 96 frames... [2023-02-23 10:56:45,187][18366] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1196032. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 10:56:45,496][18655] Decorrelating experience for 0 frames... [2023-02-23 10:56:45,506][18658] Decorrelating experience for 32 frames... [2023-02-23 10:56:45,888][18654] Decorrelating experience for 32 frames... [2023-02-23 10:56:46,060][18661] Decorrelating experience for 64 frames... [2023-02-23 10:56:46,067][18659] Decorrelating experience for 64 frames... [2023-02-23 10:56:46,960][18654] Decorrelating experience for 64 frames... [2023-02-23 10:56:47,003][18659] Decorrelating experience for 96 frames... [2023-02-23 10:56:47,285][18655] Decorrelating experience for 32 frames... [2023-02-23 10:56:47,380][18657] Decorrelating experience for 32 frames... [2023-02-23 10:56:47,739][18658] Decorrelating experience for 64 frames... [2023-02-23 10:56:48,185][18654] Decorrelating experience for 96 frames... [2023-02-23 10:56:49,036][18660] Decorrelating experience for 64 frames... [2023-02-23 10:56:49,893][18661] Decorrelating experience for 96 frames... [2023-02-23 10:56:50,188][18366] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1196032. Throughput: 0: 54.4. Samples: 544. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 10:56:50,190][18366] Avg episode reward: [(0, '3.147')] [2023-02-23 10:56:50,369][18655] Decorrelating experience for 64 frames... [2023-02-23 10:56:50,470][18657] Decorrelating experience for 64 frames... [2023-02-23 10:56:50,853][18658] Decorrelating experience for 96 frames... [2023-02-23 10:56:52,451][18639] Signal inference workers to stop experience collection... [2023-02-23 10:56:52,473][18653] InferenceWorker_p0-w0: stopping experience collection [2023-02-23 10:56:52,699][18660] Decorrelating experience for 96 frames... [2023-02-23 10:56:52,997][18639] Signal inference workers to resume experience collection... [2023-02-23 10:56:52,998][18653] InferenceWorker_p0-w0: resuming experience collection [2023-02-23 10:56:55,187][18366] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 1200128. Throughput: 0: 174.3. Samples: 2614. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-02-23 10:56:55,190][18366] Avg episode reward: [(0, '3.254')] [2023-02-23 10:56:55,470][18655] Decorrelating experience for 96 frames... [2023-02-23 10:56:55,663][18657] Decorrelating experience for 96 frames... [2023-02-23 10:57:00,187][18366] Fps is (10 sec: 2048.1, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 1216512. Throughput: 0: 311.8. Samples: 6236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:57:00,192][18366] Avg episode reward: [(0, '3.496')] [2023-02-23 10:57:04,020][18653] Updated weights for policy 0, policy_version 302 (0.0042) [2023-02-23 10:57:05,187][18366] Fps is (10 sec: 4096.0, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 1241088. Throughput: 0: 378.2. Samples: 9456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:57:05,190][18366] Avg episode reward: [(0, '4.136')] [2023-02-23 10:57:10,188][18366] Fps is (10 sec: 4915.1, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 1265664. Throughput: 0: 549.7. Samples: 16490. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:57:10,191][18366] Avg episode reward: [(0, '4.622')] [2023-02-23 10:57:13,871][18653] Updated weights for policy 0, policy_version 312 (0.0022) [2023-02-23 10:57:15,187][18366] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 1282048. Throughput: 0: 620.7. Samples: 21726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:57:15,190][18366] Avg episode reward: [(0, '4.519')] [2023-02-23 10:57:20,188][18366] Fps is (10 sec: 2867.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 1294336. Throughput: 0: 598.2. Samples: 23928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:57:20,190][18366] Avg episode reward: [(0, '4.480')] [2023-02-23 10:57:24,843][18653] Updated weights for policy 0, policy_version 322 (0.0015) [2023-02-23 10:57:25,187][18366] Fps is (10 sec: 3686.4, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 1318912. Throughput: 0: 667.0. Samples: 30016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:57:25,189][18366] Avg episode reward: [(0, '4.379')] [2023-02-23 10:57:30,191][18366] Fps is (10 sec: 4913.3, 60 sec: 2948.9, 300 sec: 2948.9). Total num frames: 1343488. Throughput: 0: 826.4. Samples: 37190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:57:30,194][18366] Avg episode reward: [(0, '4.396')] [2023-02-23 10:57:35,187][18366] Fps is (10 sec: 3686.4, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 1355776. Throughput: 0: 868.4. Samples: 39620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:57:35,190][18366] Avg episode reward: [(0, '4.363')] [2023-02-23 10:57:35,297][18653] Updated weights for policy 0, policy_version 332 (0.0019) [2023-02-23 10:57:40,187][18366] Fps is (10 sec: 2868.4, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 1372160. Throughput: 0: 921.6. Samples: 44086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:57:40,190][18366] Avg episode reward: [(0, '4.453')] [2023-02-23 10:57:45,188][18366] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3087.7). Total num frames: 1396736. Throughput: 0: 987.9. Samples: 50690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:57:45,193][18366] Avg episode reward: [(0, '4.676')] [2023-02-23 10:57:45,203][18639] Saving new best policy, reward=4.676! [2023-02-23 10:57:45,764][18653] Updated weights for policy 0, policy_version 342 (0.0031) [2023-02-23 10:57:50,188][18366] Fps is (10 sec: 4915.0, 60 sec: 3754.7, 300 sec: 3218.3). Total num frames: 1421312. Throughput: 0: 991.5. Samples: 54074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:57:50,190][18366] Avg episode reward: [(0, '4.643')] [2023-02-23 10:57:55,187][18366] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3167.6). Total num frames: 1433600. Throughput: 0: 956.5. Samples: 59534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 10:57:55,199][18366] Avg episode reward: [(0, '4.741')] [2023-02-23 10:57:55,212][18639] Saving new best policy, reward=4.741! [2023-02-23 10:57:57,172][18653] Updated weights for policy 0, policy_version 352 (0.0012) [2023-02-23 10:58:00,187][18366] Fps is (10 sec: 2867.3, 60 sec: 3891.2, 300 sec: 3174.4). Total num frames: 1449984. Throughput: 0: 938.4. Samples: 63952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:58:00,195][18366] Avg episode reward: [(0, '4.551')] [2023-02-23 10:58:05,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 1474560. Throughput: 0: 966.4. 
Samples: 67416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:58:05,189][18366] Avg episode reward: [(0, '4.483')] [2023-02-23 10:58:06,891][18653] Updated weights for policy 0, policy_version 362 (0.0029) [2023-02-23 10:58:10,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3322.3). Total num frames: 1495040. Throughput: 0: 991.1. Samples: 74614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:58:10,193][18366] Avg episode reward: [(0, '4.748')] [2023-02-23 10:58:10,209][18639] Saving new best policy, reward=4.748! [2023-02-23 10:58:15,188][18366] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3319.9). Total num frames: 1511424. Throughput: 0: 942.3. Samples: 79588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 10:58:15,194][18366] Avg episode reward: [(0, '4.772')] [2023-02-23 10:58:15,207][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000369_1511424.pth... [2023-02-23 10:58:15,384][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth [2023-02-23 10:58:15,396][18639] Saving new best policy, reward=4.772! [2023-02-23 10:58:19,580][18653] Updated weights for policy 0, policy_version 372 (0.0045) [2023-02-23 10:58:20,187][18366] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 1523712. Throughput: 0: 928.9. Samples: 81422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:58:20,195][18366] Avg episode reward: [(0, '4.863')] [2023-02-23 10:58:20,199][18639] Saving new best policy, reward=4.863! [2023-02-23 10:58:25,187][18366] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3354.8). Total num frames: 1548288. Throughput: 0: 959.9. Samples: 87280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:58:25,189][18366] Avg episode reward: [(0, '4.938')] [2023-02-23 10:58:25,209][18639] Saving new best policy, reward=4.938! [2023-02-23 10:58:28,402][18653] Updated weights for policy 0, policy_version 382 (0.0015) [2023-02-23 10:58:30,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3754.9, 300 sec: 3388.5). Total num frames: 1568768. Throughput: 0: 970.7. Samples: 94372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:58:30,190][18366] Avg episode reward: [(0, '4.778')] [2023-02-23 10:58:35,194][18366] Fps is (10 sec: 3684.0, 60 sec: 3822.5, 300 sec: 3383.5). Total num frames: 1585152. Throughput: 0: 944.0. Samples: 96558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:58:35,197][18366] Avg episode reward: [(0, '4.811')] [2023-02-23 10:58:40,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3379.2). Total num frames: 1601536. Throughput: 0: 922.9. Samples: 101064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:58:40,190][18366] Avg episode reward: [(0, '4.979')] [2023-02-23 10:58:40,197][18639] Saving new best policy, reward=4.979! [2023-02-23 10:58:40,627][18653] Updated weights for policy 0, policy_version 392 (0.0022) [2023-02-23 10:58:45,187][18366] Fps is (10 sec: 4098.7, 60 sec: 3823.0, 300 sec: 3440.6). Total num frames: 1626112. Throughput: 0: 977.1. Samples: 107920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 10:58:45,190][18366] Avg episode reward: [(0, '5.290')] [2023-02-23 10:58:45,201][18639] Saving new best policy, reward=5.290! [2023-02-23 10:58:49,551][18653] Updated weights for policy 0, policy_version 402 (0.0019) [2023-02-23 10:58:50,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3465.8). Total num frames: 1646592. 
Throughput: 0: 975.9. Samples: 111332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:58:50,193][18366] Avg episode reward: [(0, '5.149')] [2023-02-23 10:58:55,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3458.8). Total num frames: 1662976. Throughput: 0: 930.6. Samples: 116492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:58:55,195][18366] Avg episode reward: [(0, '4.972')] [2023-02-23 10:59:00,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3452.3). Total num frames: 1679360. Throughput: 0: 924.2. Samples: 121176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:59:00,195][18366] Avg episode reward: [(0, '4.872')] [2023-02-23 10:59:01,825][18653] Updated weights for policy 0, policy_version 412 (0.0030) [2023-02-23 10:59:05,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3474.5). Total num frames: 1699840. Throughput: 0: 961.1. Samples: 124672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:59:05,189][18366] Avg episode reward: [(0, '5.138')] [2023-02-23 10:59:10,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3522.6). Total num frames: 1724416. Throughput: 0: 986.6. Samples: 131678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:59:10,192][18366] Avg episode reward: [(0, '5.497')] [2023-02-23 10:59:10,204][18639] Saving new best policy, reward=5.497! [2023-02-23 10:59:11,076][18653] Updated weights for policy 0, policy_version 422 (0.0018) [2023-02-23 10:59:15,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3488.2). Total num frames: 1736704. Throughput: 0: 932.1. Samples: 136318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 10:59:15,190][18366] Avg episode reward: [(0, '5.408')] [2023-02-23 10:59:20,187][18366] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 1753088. Throughput: 0: 933.4. Samples: 138554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 10:59:20,190][18366] Avg episode reward: [(0, '5.391')] [2023-02-23 10:59:22,769][18653] Updated weights for policy 0, policy_version 432 (0.0023) [2023-02-23 10:59:25,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3525.0). Total num frames: 1777664. Throughput: 0: 974.0. Samples: 144894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:59:25,193][18366] Avg episode reward: [(0, '5.010')] [2023-02-23 10:59:30,189][18366] Fps is (10 sec: 4504.9, 60 sec: 3822.8, 300 sec: 3541.8). Total num frames: 1798144. Throughput: 0: 972.8. Samples: 151698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 10:59:30,194][18366] Avg episode reward: [(0, '4.844')] [2023-02-23 10:59:33,007][18653] Updated weights for policy 0, policy_version 442 (0.0018) [2023-02-23 10:59:35,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3534.3). Total num frames: 1814528. Throughput: 0: 947.8. Samples: 153982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:59:35,190][18366] Avg episode reward: [(0, '4.945')] [2023-02-23 10:59:40,188][18366] Fps is (10 sec: 3277.0, 60 sec: 3822.9, 300 sec: 3527.1). Total num frames: 1830912. Throughput: 0: 932.9. Samples: 158472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:59:40,191][18366] Avg episode reward: [(0, '4.973')] [2023-02-23 10:59:43,778][18653] Updated weights for policy 0, policy_version 452 (0.0036) [2023-02-23 10:59:45,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3564.6). Total num frames: 1855488. Throughput: 0: 986.4. Samples: 165566. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 10:59:45,190][18366] Avg episode reward: [(0, '5.144')] [2023-02-23 10:59:50,187][18366] Fps is (10 sec: 4506.0, 60 sec: 3822.9, 300 sec: 3578.6). Total num frames: 1875968. Throughput: 0: 989.2. Samples: 169186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 10:59:50,194][18366] Avg episode reward: [(0, '5.189')] [2023-02-23 10:59:54,671][18653] Updated weights for policy 0, policy_version 462 (0.0028) [2023-02-23 10:59:55,188][18366] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3570.9). Total num frames: 1892352. Throughput: 0: 944.7. Samples: 174188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 10:59:55,193][18366] Avg episode reward: [(0, '5.065')] [2023-02-23 11:00:00,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3563.5). Total num frames: 1908736. Throughput: 0: 946.2. Samples: 178898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:00:00,195][18366] Avg episode reward: [(0, '5.092')] [2023-02-23 11:00:04,863][18653] Updated weights for policy 0, policy_version 472 (0.0028) [2023-02-23 11:00:05,187][18366] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3596.5). Total num frames: 1933312. Throughput: 0: 974.8. Samples: 182418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:00:05,190][18366] Avg episode reward: [(0, '4.806')] [2023-02-23 11:00:10,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3608.4). Total num frames: 1953792. Throughput: 0: 993.4. Samples: 189598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:00:10,190][18366] Avg episode reward: [(0, '5.132')] [2023-02-23 11:00:15,195][18366] Fps is (10 sec: 3683.6, 60 sec: 3890.7, 300 sec: 3600.5). Total num frames: 1970176. Throughput: 0: 942.4. Samples: 194110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:00:15,202][18366] Avg episode reward: [(0, '5.459')] [2023-02-23 11:00:15,223][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000481_1970176.pth... [2023-02-23 11:00:15,382][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth [2023-02-23 11:00:16,260][18653] Updated weights for policy 0, policy_version 482 (0.0015) [2023-02-23 11:00:20,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3593.3). Total num frames: 1986560. Throughput: 0: 940.4. Samples: 196298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:00:20,189][18366] Avg episode reward: [(0, '5.709')] [2023-02-23 11:00:20,195][18639] Saving new best policy, reward=5.709! [2023-02-23 11:00:25,187][18366] Fps is (10 sec: 4099.1, 60 sec: 3891.2, 300 sec: 3622.7). Total num frames: 2011136. Throughput: 0: 986.6. Samples: 202868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:00:25,189][18366] Avg episode reward: [(0, '5.462')] [2023-02-23 11:00:25,919][18653] Updated weights for policy 0, policy_version 492 (0.0024) [2023-02-23 11:00:30,188][18366] Fps is (10 sec: 4505.4, 60 sec: 3891.3, 300 sec: 3633.0). Total num frames: 2031616. Throughput: 0: 975.3. Samples: 209454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:00:30,194][18366] Avg episode reward: [(0, '5.204')] [2023-02-23 11:00:35,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3625.4). Total num frames: 2048000. Throughput: 0: 943.5. Samples: 211644. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 11:00:35,191][18366] Avg episode reward: [(0, '5.325')] [2023-02-23 11:00:37,954][18653] Updated weights for policy 0, policy_version 502 (0.0017) [2023-02-23 11:00:40,187][18366] Fps is (10 sec: 3276.9, 60 sec: 3891.3, 300 sec: 3618.1). Total num frames: 2064384. Throughput: 0: 935.5. Samples: 216286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 11:00:40,189][18366] Avg episode reward: [(0, '5.156')] [2023-02-23 11:00:45,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3644.6). Total num frames: 2088960. Throughput: 0: 991.6. Samples: 223518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 11:00:45,196][18366] Avg episode reward: [(0, '5.178')] [2023-02-23 11:00:46,663][18653] Updated weights for policy 0, policy_version 512 (0.0017) [2023-02-23 11:00:50,189][18366] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 3653.6). Total num frames: 2109440. Throughput: 0: 992.4. Samples: 227076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:00:50,192][18366] Avg episode reward: [(0, '5.111')] [2023-02-23 11:00:55,190][18366] Fps is (10 sec: 3276.0, 60 sec: 3822.8, 300 sec: 3630.1). Total num frames: 2121728. Throughput: 0: 936.3. Samples: 231736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:00:55,201][18366] Avg episode reward: [(0, '5.029')] [2023-02-23 11:00:59,225][18653] Updated weights for policy 0, policy_version 522 (0.0014) [2023-02-23 11:01:00,187][18366] Fps is (10 sec: 3277.3, 60 sec: 3891.2, 300 sec: 3639.1). Total num frames: 2142208. Throughput: 0: 949.5. Samples: 236832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:01:00,190][18366] Avg episode reward: [(0, '5.020')] [2023-02-23 11:01:05,187][18366] Fps is (10 sec: 4506.7, 60 sec: 3891.2, 300 sec: 3663.2). Total num frames: 2166784. Throughput: 0: 980.0. Samples: 240400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:01:05,195][18366] Avg episode reward: [(0, '5.351')] [2023-02-23 11:01:07,611][18653] Updated weights for policy 0, policy_version 532 (0.0013) [2023-02-23 11:01:10,190][18366] Fps is (10 sec: 4094.9, 60 sec: 3822.8, 300 sec: 3656.0). Total num frames: 2183168. Throughput: 0: 986.4. Samples: 247258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:01:10,192][18366] Avg episode reward: [(0, '5.347')] [2023-02-23 11:01:15,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3823.4, 300 sec: 3649.2). Total num frames: 2199552. Throughput: 0: 939.7. Samples: 251740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:01:15,192][18366] Avg episode reward: [(0, '5.329')] [2023-02-23 11:01:19,986][18653] Updated weights for policy 0, policy_version 542 (0.0020) [2023-02-23 11:01:20,187][18366] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3657.1). Total num frames: 2220032. Throughput: 0: 942.0. Samples: 254034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:01:20,195][18366] Avg episode reward: [(0, '5.457')] [2023-02-23 11:01:25,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3679.2). Total num frames: 2244608. Throughput: 0: 996.6. Samples: 261132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:01:25,190][18366] Avg episode reward: [(0, '5.524')] [2023-02-23 11:01:29,014][18653] Updated weights for policy 0, policy_version 552 (0.0038) [2023-02-23 11:01:30,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3672.3). Total num frames: 2260992. Throughput: 0: 975.3. Samples: 267408. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:01:30,192][18366] Avg episode reward: [(0, '5.545')] [2023-02-23 11:01:35,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2277376. Throughput: 0: 946.8. Samples: 269682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:01:35,197][18366] Avg episode reward: [(0, '5.399')] [2023-02-23 11:01:40,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 2297856. Throughput: 0: 959.4. Samples: 274908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:01:40,189][18366] Avg episode reward: [(0, '5.256')] [2023-02-23 11:01:40,535][18653] Updated weights for policy 0, policy_version 562 (0.0021) [2023-02-23 11:01:45,188][18366] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2322432. Throughput: 0: 1005.6. Samples: 282084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:01:45,194][18366] Avg episode reward: [(0, '5.189')] [2023-02-23 11:01:50,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 2338816. Throughput: 0: 1001.4. Samples: 285464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:01:50,191][18366] Avg episode reward: [(0, '5.317')] [2023-02-23 11:01:50,328][18653] Updated weights for policy 0, policy_version 572 (0.0012) [2023-02-23 11:01:55,190][18366] Fps is (10 sec: 3276.1, 60 sec: 3891.2, 300 sec: 3859.9). Total num frames: 2355200. Throughput: 0: 949.3. Samples: 289976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:01:55,194][18366] Avg episode reward: [(0, '5.458')] [2023-02-23 11:02:00,188][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2375680. Throughput: 0: 971.9. Samples: 295476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:02:00,193][18366] Avg episode reward: [(0, '5.494')] [2023-02-23 11:02:01,398][18653] Updated weights for policy 0, policy_version 582 (0.0019) [2023-02-23 11:02:05,187][18366] Fps is (10 sec: 4506.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2400256. Throughput: 0: 999.4. Samples: 299008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:02:05,194][18366] Avg episode reward: [(0, '5.618')] [2023-02-23 11:02:10,187][18366] Fps is (10 sec: 4096.1, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 2416640. Throughput: 0: 988.0. Samples: 305590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:02:10,190][18366] Avg episode reward: [(0, '5.878')] [2023-02-23 11:02:10,192][18639] Saving new best policy, reward=5.878! [2023-02-23 11:02:11,791][18653] Updated weights for policy 0, policy_version 592 (0.0012) [2023-02-23 11:02:15,188][18366] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2433024. Throughput: 0: 946.6. Samples: 310006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:02:15,193][18366] Avg episode reward: [(0, '6.393')] [2023-02-23 11:02:15,217][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000594_2433024.pth... [2023-02-23 11:02:15,445][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000369_1511424.pth [2023-02-23 11:02:15,464][18639] Saving new best policy, reward=6.393! [2023-02-23 11:02:20,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2453504. Throughput: 0: 949.0. Samples: 312386. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:02:20,190][18366] Avg episode reward: [(0, '6.393')] [2023-02-23 11:02:22,497][18653] Updated weights for policy 0, policy_version 602 (0.0024) [2023-02-23 11:02:25,187][18366] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2478080. Throughput: 0: 991.7. Samples: 319536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:02:25,190][18366] Avg episode reward: [(0, '6.274')] [2023-02-23 11:02:30,192][18366] Fps is (10 sec: 4094.3, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 2494464. Throughput: 0: 966.9. Samples: 325598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:02:30,199][18366] Avg episode reward: [(0, '5.954')] [2023-02-23 11:02:33,226][18653] Updated weights for policy 0, policy_version 612 (0.0016) [2023-02-23 11:02:35,188][18366] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2510848. Throughput: 0: 942.0. Samples: 327856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:02:35,196][18366] Avg episode reward: [(0, '6.170')] [2023-02-23 11:02:40,187][18366] Fps is (10 sec: 3687.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2531328. Throughput: 0: 962.4. Samples: 333280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:02:40,190][18366] Avg episode reward: [(0, '6.383')] [2023-02-23 11:02:43,017][18653] Updated weights for policy 0, policy_version 622 (0.0015) [2023-02-23 11:02:45,187][18366] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2555904. Throughput: 0: 1002.6. Samples: 340592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:02:45,190][18366] Avg episode reward: [(0, '6.151')] [2023-02-23 11:02:50,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2572288. Throughput: 0: 995.5. Samples: 343804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:02:50,190][18366] Avg episode reward: [(0, '5.783')] [2023-02-23 11:02:54,323][18653] Updated weights for policy 0, policy_version 632 (0.0035) [2023-02-23 11:02:55,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 2588672. Throughput: 0: 948.9. Samples: 348292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:02:55,198][18366] Avg episode reward: [(0, '5.634')] [2023-02-23 11:03:00,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2609152. Throughput: 0: 981.2. Samples: 354160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:03:00,189][18366] Avg episode reward: [(0, '5.551')] [2023-02-23 11:03:03,791][18653] Updated weights for policy 0, policy_version 642 (0.0013) [2023-02-23 11:03:05,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2633728. Throughput: 0: 1008.4. Samples: 357762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:03:05,190][18366] Avg episode reward: [(0, '5.792')] [2023-02-23 11:03:10,192][18366] Fps is (10 sec: 4094.1, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 2650112. Throughput: 0: 987.6. Samples: 363982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:03:10,195][18366] Avg episode reward: [(0, '6.294')] [2023-02-23 11:03:15,188][18366] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2666496. Throughput: 0: 951.4. Samples: 368408. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:03:15,191][18366] Avg episode reward: [(0, '6.221')] [2023-02-23 11:03:15,792][18653] Updated weights for policy 0, policy_version 652 (0.0012) [2023-02-23 11:03:20,187][18366] Fps is (10 sec: 3688.1, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2686976. Throughput: 0: 964.3. Samples: 371250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:03:20,190][18366] Avg episode reward: [(0, '6.020')] [2023-02-23 11:03:24,625][18653] Updated weights for policy 0, policy_version 662 (0.0017) [2023-02-23 11:03:25,187][18366] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2711552. Throughput: 0: 1002.0. Samples: 378372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:03:25,190][18366] Avg episode reward: [(0, '6.422')] [2023-02-23 11:03:25,205][18639] Saving new best policy, reward=6.422! [2023-02-23 11:03:30,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3873.9). Total num frames: 2727936. Throughput: 0: 964.4. Samples: 383992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:03:30,190][18366] Avg episode reward: [(0, '6.459')] [2023-02-23 11:03:30,197][18639] Saving new best policy, reward=6.459! [2023-02-23 11:03:35,188][18366] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2744320. Throughput: 0: 941.3. Samples: 386162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:03:35,191][18366] Avg episode reward: [(0, '6.720')] [2023-02-23 11:03:35,215][18639] Saving new best policy, reward=6.720! [2023-02-23 11:03:36,971][18653] Updated weights for policy 0, policy_version 672 (0.0017) [2023-02-23 11:03:40,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2764800. Throughput: 0: 969.1. Samples: 391902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:03:40,193][18366] Avg episode reward: [(0, '6.746')] [2023-02-23 11:03:40,222][18639] Saving new best policy, reward=6.746! [2023-02-23 11:03:45,188][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2789376. Throughput: 0: 997.9. Samples: 399064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:03:45,194][18366] Avg episode reward: [(0, '7.082')] [2023-02-23 11:03:45,206][18639] Saving new best policy, reward=7.082! [2023-02-23 11:03:45,515][18653] Updated weights for policy 0, policy_version 682 (0.0013) [2023-02-23 11:03:50,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2805760. Throughput: 0: 978.6. Samples: 401798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:03:50,193][18366] Avg episode reward: [(0, '6.955')] [2023-02-23 11:03:55,187][18366] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2822144. Throughput: 0: 939.3. Samples: 406246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:03:55,190][18366] Avg episode reward: [(0, '7.043')] [2023-02-23 11:03:57,651][18653] Updated weights for policy 0, policy_version 692 (0.0015) [2023-02-23 11:04:00,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2842624. Throughput: 0: 982.4. Samples: 412616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:04:00,196][18366] Avg episode reward: [(0, '7.316')] [2023-02-23 11:04:00,307][18639] Saving new best policy, reward=7.316! [2023-02-23 11:04:05,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2867200. 
Throughput: 0: 997.4. Samples: 416132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:04:05,195][18366] Avg episode reward: [(0, '7.154')] [2023-02-23 11:04:06,520][18653] Updated weights for policy 0, policy_version 702 (0.0020) [2023-02-23 11:04:10,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3887.7). Total num frames: 2883584. Throughput: 0: 970.3. Samples: 422036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:04:10,190][18366] Avg episode reward: [(0, '7.887')] [2023-02-23 11:04:10,198][18639] Saving new best policy, reward=7.887! [2023-02-23 11:04:15,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2899968. Throughput: 0: 944.2. Samples: 426480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:04:15,195][18366] Avg episode reward: [(0, '8.004')] [2023-02-23 11:04:15,216][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000708_2899968.pth... [2023-02-23 11:04:15,467][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000481_1970176.pth [2023-02-23 11:04:15,483][18639] Saving new best policy, reward=8.004! [2023-02-23 11:04:18,703][18653] Updated weights for policy 0, policy_version 712 (0.0020) [2023-02-23 11:04:20,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2920448. Throughput: 0: 964.1. Samples: 429546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:04:20,194][18366] Avg episode reward: [(0, '8.189')] [2023-02-23 11:04:20,198][18639] Saving new best policy, reward=8.189! [2023-02-23 11:04:25,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2945024. Throughput: 0: 991.4. Samples: 436514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:04:25,194][18366] Avg episode reward: [(0, '7.926')] [2023-02-23 11:04:28,470][18653] Updated weights for policy 0, policy_version 722 (0.0012) [2023-02-23 11:04:30,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2961408. Throughput: 0: 950.4. Samples: 441834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:04:30,193][18366] Avg episode reward: [(0, '7.550')] [2023-02-23 11:04:35,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2977792. Throughput: 0: 938.5. Samples: 444032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:04:35,190][18366] Avg episode reward: [(0, '7.569')] [2023-02-23 11:04:39,627][18653] Updated weights for policy 0, policy_version 732 (0.0012) [2023-02-23 11:04:40,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2998272. Throughput: 0: 974.5. Samples: 450100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:04:40,193][18366] Avg episode reward: [(0, '7.834')] [2023-02-23 11:04:45,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3022848. Throughput: 0: 995.7. Samples: 457422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:04:45,193][18366] Avg episode reward: [(0, '9.052')] [2023-02-23 11:04:45,204][18639] Saving new best policy, reward=9.052! [2023-02-23 11:04:49,534][18653] Updated weights for policy 0, policy_version 742 (0.0020) [2023-02-23 11:04:50,189][18366] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 3039232. Throughput: 0: 972.7. Samples: 459906. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:04:50,195][18366] Avg episode reward: [(0, '10.136')] [2023-02-23 11:04:50,197][18639] Saving new best policy, reward=10.136! [2023-02-23 11:04:55,188][18366] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3055616. Throughput: 0: 940.7. Samples: 464370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:04:55,192][18366] Avg episode reward: [(0, '10.219')] [2023-02-23 11:04:55,208][18639] Saving new best policy, reward=10.219! [2023-02-23 11:05:00,187][18366] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3076096. Throughput: 0: 983.6. Samples: 470744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:05:00,195][18366] Avg episode reward: [(0, '10.340')] [2023-02-23 11:05:00,199][18639] Saving new best policy, reward=10.340! [2023-02-23 11:05:00,481][18653] Updated weights for policy 0, policy_version 752 (0.0018) [2023-02-23 11:05:05,187][18366] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3100672. Throughput: 0: 993.8. Samples: 474266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:05:05,190][18366] Avg episode reward: [(0, '9.585')] [2023-02-23 11:05:10,188][18366] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3887.8). Total num frames: 3117056. Throughput: 0: 965.4. Samples: 479956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:05:10,191][18366] Avg episode reward: [(0, '10.289')] [2023-02-23 11:05:11,298][18653] Updated weights for policy 0, policy_version 762 (0.0025) [2023-02-23 11:05:15,187][18366] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3129344. Throughput: 0: 947.9. Samples: 484488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:05:15,190][18366] Avg episode reward: [(0, '10.592')] [2023-02-23 11:05:15,268][18639] Saving new best policy, reward=10.592! [2023-02-23 11:05:20,187][18366] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3153920. Throughput: 0: 975.0. Samples: 487906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:05:20,194][18366] Avg episode reward: [(0, '10.672')] [2023-02-23 11:05:20,198][18639] Saving new best policy, reward=10.672! [2023-02-23 11:05:21,178][18653] Updated weights for policy 0, policy_version 772 (0.0016) [2023-02-23 11:05:25,191][18366] Fps is (10 sec: 4913.3, 60 sec: 3890.9, 300 sec: 3887.7). Total num frames: 3178496. Throughput: 0: 999.6. Samples: 495088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:05:25,198][18366] Avg episode reward: [(0, '10.708')] [2023-02-23 11:05:25,210][18639] Saving new best policy, reward=10.708! [2023-02-23 11:05:30,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3194880. Throughput: 0: 952.0. Samples: 500260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:05:30,193][18366] Avg episode reward: [(0, '9.931')] [2023-02-23 11:05:32,610][18653] Updated weights for policy 0, policy_version 782 (0.0012) [2023-02-23 11:05:35,188][18366] Fps is (10 sec: 2868.3, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3207168. Throughput: 0: 944.4. Samples: 502402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:05:35,190][18366] Avg episode reward: [(0, '10.419')] [2023-02-23 11:05:40,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3231744. Throughput: 0: 985.0. Samples: 508694. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:05:40,190][18366] Avg episode reward: [(0, '10.323')] [2023-02-23 11:05:42,015][18653] Updated weights for policy 0, policy_version 792 (0.0014) [2023-02-23 11:05:45,191][18366] Fps is (10 sec: 4913.5, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 3256320. Throughput: 0: 1003.7. Samples: 515912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:05:45,197][18366] Avg episode reward: [(0, '10.970')] [2023-02-23 11:05:45,210][18639] Saving new best policy, reward=10.970! [2023-02-23 11:05:50,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 3272704. Throughput: 0: 975.7. Samples: 518172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:05:50,193][18366] Avg episode reward: [(0, '10.835')] [2023-02-23 11:05:53,906][18653] Updated weights for policy 0, policy_version 802 (0.0023) [2023-02-23 11:05:55,191][18366] Fps is (10 sec: 3276.9, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 3289088. Throughput: 0: 949.0. Samples: 522664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:05:55,193][18366] Avg episode reward: [(0, '11.373')] [2023-02-23 11:05:55,211][18639] Saving new best policy, reward=11.373! [2023-02-23 11:06:00,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3313664. Throughput: 0: 996.6. Samples: 529336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:06:00,191][18366] Avg episode reward: [(0, '11.977')] [2023-02-23 11:06:00,194][18639] Saving new best policy, reward=11.977! [2023-02-23 11:06:02,946][18653] Updated weights for policy 0, policy_version 812 (0.0029) [2023-02-23 11:06:05,187][18366] Fps is (10 sec: 4507.1, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3334144. Throughput: 0: 998.9. Samples: 532858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:06:05,190][18366] Avg episode reward: [(0, '12.220')] [2023-02-23 11:06:05,199][18639] Saving new best policy, reward=12.220! [2023-02-23 11:06:10,194][18366] Fps is (10 sec: 3684.0, 60 sec: 3890.8, 300 sec: 3901.5). Total num frames: 3350528. Throughput: 0: 957.6. Samples: 538184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:06:10,197][18366] Avg episode reward: [(0, '12.345')] [2023-02-23 11:06:10,201][18639] Saving new best policy, reward=12.345! [2023-02-23 11:06:15,187][18366] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3362816. Throughput: 0: 945.8. Samples: 542822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:06:15,190][18366] Avg episode reward: [(0, '12.522')] [2023-02-23 11:06:15,212][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000822_3366912.pth... [2023-02-23 11:06:15,218][18653] Updated weights for policy 0, policy_version 822 (0.0028) [2023-02-23 11:06:15,339][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000594_2433024.pth [2023-02-23 11:06:15,372][18639] Saving new best policy, reward=12.522! [2023-02-23 11:06:20,188][18366] Fps is (10 sec: 3688.7, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3387392. Throughput: 0: 974.0. Samples: 546232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:06:20,194][18366] Avg episode reward: [(0, '12.104')] [2023-02-23 11:06:23,897][18653] Updated weights for policy 0, policy_version 832 (0.0014) [2023-02-23 11:06:25,190][18366] Fps is (10 sec: 4914.0, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 3411968. 
Throughput: 0: 992.6. Samples: 553362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:06:25,192][18366] Avg episode reward: [(0, '11.827')] [2023-02-23 11:06:30,187][18366] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3428352. Throughput: 0: 943.0. Samples: 558344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:06:30,193][18366] Avg episode reward: [(0, '11.744')] [2023-02-23 11:06:35,187][18366] Fps is (10 sec: 3277.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3444736. Throughput: 0: 943.7. Samples: 560640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:06:35,194][18366] Avg episode reward: [(0, '10.662')] [2023-02-23 11:06:35,970][18653] Updated weights for policy 0, policy_version 842 (0.0023) [2023-02-23 11:06:40,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3465216. Throughput: 0: 986.7. Samples: 567064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:06:40,196][18366] Avg episode reward: [(0, '11.506')] [2023-02-23 11:06:44,488][18653] Updated weights for policy 0, policy_version 852 (0.0012) [2023-02-23 11:06:45,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.4, 300 sec: 3901.6). Total num frames: 3489792. Throughput: 0: 999.4. Samples: 574310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:06:45,190][18366] Avg episode reward: [(0, '12.124')] [2023-02-23 11:06:50,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3506176. Throughput: 0: 972.3. Samples: 576610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:06:50,196][18366] Avg episode reward: [(0, '12.701')] [2023-02-23 11:06:50,199][18639] Saving new best policy, reward=12.701! [2023-02-23 11:06:55,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 3522560. Throughput: 0: 954.7. Samples: 581140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 11:06:55,196][18366] Avg episode reward: [(0, '12.827')] [2023-02-23 11:06:55,206][18639] Saving new best policy, reward=12.827! [2023-02-23 11:06:56,768][18653] Updated weights for policy 0, policy_version 862 (0.0017) [2023-02-23 11:07:00,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3543040. Throughput: 0: 1000.0. Samples: 587820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:07:00,189][18366] Avg episode reward: [(0, '12.951')] [2023-02-23 11:07:00,208][18639] Saving new best policy, reward=12.951! [2023-02-23 11:07:05,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3567616. Throughput: 0: 1001.6. Samples: 591304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:07:05,191][18366] Avg episode reward: [(0, '12.834')] [2023-02-23 11:07:05,960][18653] Updated weights for policy 0, policy_version 872 (0.0019) [2023-02-23 11:07:10,189][18366] Fps is (10 sec: 4095.4, 60 sec: 3891.5, 300 sec: 3901.6). Total num frames: 3584000. Throughput: 0: 962.1. Samples: 596654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:07:10,194][18366] Avg episode reward: [(0, '13.262')] [2023-02-23 11:07:10,199][18639] Saving new best policy, reward=13.262! [2023-02-23 11:07:15,188][18366] Fps is (10 sec: 2867.1, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3596288. Throughput: 0: 952.0. Samples: 601186. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:07:15,196][18366] Avg episode reward: [(0, '12.860')] [2023-02-23 11:07:17,708][18653] Updated weights for policy 0, policy_version 882 (0.0018) [2023-02-23 11:07:20,187][18366] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3624960. Throughput: 0: 980.3. Samples: 604752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:07:20,190][18366] Avg episode reward: [(0, '12.264')] [2023-02-23 11:07:25,187][18366] Fps is (10 sec: 4915.3, 60 sec: 3891.4, 300 sec: 3901.7). Total num frames: 3645440. Throughput: 0: 1002.6. Samples: 612180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:07:25,190][18366] Avg episode reward: [(0, '13.810')] [2023-02-23 11:07:25,204][18639] Saving new best policy, reward=13.810! [2023-02-23 11:07:27,250][18653] Updated weights for policy 0, policy_version 892 (0.0024) [2023-02-23 11:07:30,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3661824. Throughput: 0: 948.4. Samples: 616988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:07:30,190][18366] Avg episode reward: [(0, '15.064')] [2023-02-23 11:07:30,200][18639] Saving new best policy, reward=15.064! [2023-02-23 11:07:35,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3678208. Throughput: 0: 946.0. Samples: 619180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:07:35,192][18366] Avg episode reward: [(0, '14.659')] [2023-02-23 11:07:38,316][18653] Updated weights for policy 0, policy_version 902 (0.0019) [2023-02-23 11:07:40,188][18366] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 3702784. Throughput: 0: 991.7. Samples: 625768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:07:40,196][18366] Avg episode reward: [(0, '15.117')] [2023-02-23 11:07:40,201][18639] Saving new best policy, reward=15.117! [2023-02-23 11:07:45,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3723264. Throughput: 0: 997.7. Samples: 632718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:07:45,197][18366] Avg episode reward: [(0, '13.915')] [2023-02-23 11:07:48,300][18653] Updated weights for policy 0, policy_version 912 (0.0014) [2023-02-23 11:07:50,187][18366] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3739648. Throughput: 0: 971.5. Samples: 635020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:07:50,195][18366] Avg episode reward: [(0, '12.890')] [2023-02-23 11:07:55,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3756032. Throughput: 0: 954.3. Samples: 639598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:07:55,190][18366] Avg episode reward: [(0, '12.515')] [2023-02-23 11:07:59,110][18653] Updated weights for policy 0, policy_version 922 (0.0019) [2023-02-23 11:08:00,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3780608. Throughput: 0: 1010.9. Samples: 646676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:08:00,190][18366] Avg episode reward: [(0, '12.705')] [2023-02-23 11:08:05,188][18366] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3901.7). Total num frames: 3801088. Throughput: 0: 1010.5. Samples: 650226. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:08:05,195][18366] Avg episode reward: [(0, '12.988')] [2023-02-23 11:08:09,330][18653] Updated weights for policy 0, policy_version 932 (0.0023) [2023-02-23 11:08:10,188][18366] Fps is (10 sec: 3686.2, 60 sec: 3891.3, 300 sec: 3901.6). Total num frames: 3817472. Throughput: 0: 959.7. Samples: 655368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:08:10,190][18366] Avg episode reward: [(0, '13.841')] [2023-02-23 11:08:15,188][18366] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3833856. Throughput: 0: 962.7. Samples: 660310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:08:15,192][18366] Avg episode reward: [(0, '14.894')] [2023-02-23 11:08:15,205][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000936_3833856.pth... [2023-02-23 11:08:15,339][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000708_2899968.pth [2023-02-23 11:08:19,609][18653] Updated weights for policy 0, policy_version 942 (0.0012) [2023-02-23 11:08:20,187][18366] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3858432. Throughput: 0: 993.2. Samples: 663876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:08:20,195][18366] Avg episode reward: [(0, '15.834')] [2023-02-23 11:08:20,199][18639] Saving new best policy, reward=15.834! [2023-02-23 11:08:25,191][18366] Fps is (10 sec: 4503.9, 60 sec: 3891.0, 300 sec: 3901.6). Total num frames: 3878912. Throughput: 0: 1004.2. Samples: 670962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:08:25,198][18366] Avg episode reward: [(0, '16.019')] [2023-02-23 11:08:25,211][18639] Saving new best policy, reward=16.019! [2023-02-23 11:08:30,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3895296. Throughput: 0: 949.6. Samples: 675450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:08:30,192][18366] Avg episode reward: [(0, '16.637')] [2023-02-23 11:08:30,194][18639] Saving new best policy, reward=16.637! [2023-02-23 11:08:31,101][18653] Updated weights for policy 0, policy_version 952 (0.0018) [2023-02-23 11:08:35,188][18366] Fps is (10 sec: 3278.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3911680. Throughput: 0: 947.6. Samples: 677664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:08:35,190][18366] Avg episode reward: [(0, '16.451')] [2023-02-23 11:08:40,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3936256. Throughput: 0: 995.5. Samples: 684394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:08:40,190][18366] Avg episode reward: [(0, '16.265')] [2023-02-23 11:08:40,601][18653] Updated weights for policy 0, policy_version 962 (0.0013) [2023-02-23 11:08:45,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3956736. Throughput: 0: 990.6. Samples: 691254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:08:45,193][18366] Avg episode reward: [(0, '16.097')] [2023-02-23 11:08:50,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3973120. Throughput: 0: 963.2. Samples: 693568. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:08:50,194][18366] Avg episode reward: [(0, '15.923')] [2023-02-23 11:08:52,107][18653] Updated weights for policy 0, policy_version 972 (0.0023) [2023-02-23 11:08:55,188][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3989504. Throughput: 0: 951.8. Samples: 698200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:08:55,189][18366] Avg episode reward: [(0, '15.840')] [2023-02-23 11:09:00,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4014080. Throughput: 0: 997.1. Samples: 705178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:09:00,194][18366] Avg episode reward: [(0, '16.145')] [2023-02-23 11:09:01,600][18653] Updated weights for policy 0, policy_version 982 (0.0012) [2023-02-23 11:09:05,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4034560. Throughput: 0: 994.9. Samples: 708646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:09:05,190][18366] Avg episode reward: [(0, '16.923')] [2023-02-23 11:09:05,212][18639] Saving new best policy, reward=16.923! [2023-02-23 11:09:10,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4050944. Throughput: 0: 951.0. Samples: 713752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:09:10,200][18366] Avg episode reward: [(0, '16.592')] [2023-02-23 11:09:13,502][18653] Updated weights for policy 0, policy_version 992 (0.0018) [2023-02-23 11:09:15,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4067328. Throughput: 0: 959.3. Samples: 718618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:09:15,191][18366] Avg episode reward: [(0, '17.132')] [2023-02-23 11:09:15,199][18639] Saving new best policy, reward=17.132! [2023-02-23 11:09:20,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4091904. Throughput: 0: 987.9. Samples: 722118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:09:20,189][18366] Avg episode reward: [(0, '18.430')] [2023-02-23 11:09:20,193][18639] Saving new best policy, reward=18.430! [2023-02-23 11:09:22,308][18653] Updated weights for policy 0, policy_version 1002 (0.0028) [2023-02-23 11:09:25,188][18366] Fps is (10 sec: 4505.5, 60 sec: 3891.4, 300 sec: 3901.6). Total num frames: 4112384. Throughput: 0: 998.6. Samples: 729330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:09:25,193][18366] Avg episode reward: [(0, '16.935')] [2023-02-23 11:09:30,188][18366] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4128768. Throughput: 0: 947.9. Samples: 733910. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-23 11:09:30,192][18366] Avg episode reward: [(0, '18.118')] [2023-02-23 11:09:34,546][18653] Updated weights for policy 0, policy_version 1012 (0.0023) [2023-02-23 11:09:35,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4145152. Throughput: 0: 946.7. Samples: 736168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:09:35,189][18366] Avg episode reward: [(0, '19.359')] [2023-02-23 11:09:35,206][18639] Saving new best policy, reward=19.359! [2023-02-23 11:09:40,187][18366] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4169728. Throughput: 0: 991.3. Samples: 742810. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:09:40,193][18366] Avg episode reward: [(0, '18.277')] [2023-02-23 11:09:43,091][18653] Updated weights for policy 0, policy_version 1022 (0.0014) [2023-02-23 11:09:45,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4190208. Throughput: 0: 989.7. Samples: 749714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:09:45,193][18366] Avg episode reward: [(0, '19.584')] [2023-02-23 11:09:45,213][18639] Saving new best policy, reward=19.584! [2023-02-23 11:09:50,193][18366] Fps is (10 sec: 3684.4, 60 sec: 3890.8, 300 sec: 3901.5). Total num frames: 4206592. Throughput: 0: 961.1. Samples: 751902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:09:50,204][18366] Avg episode reward: [(0, '20.144')] [2023-02-23 11:09:50,208][18639] Saving new best policy, reward=20.144! [2023-02-23 11:09:55,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4222976. Throughput: 0: 944.7. Samples: 756264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:09:55,190][18366] Avg episode reward: [(0, '19.706')] [2023-02-23 11:09:55,424][18653] Updated weights for policy 0, policy_version 1032 (0.0019) [2023-02-23 11:10:00,187][18366] Fps is (10 sec: 4098.3, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4247552. Throughput: 0: 996.3. Samples: 763450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:10:00,190][18366] Avg episode reward: [(0, '18.667')] [2023-02-23 11:10:04,129][18653] Updated weights for policy 0, policy_version 1042 (0.0019) [2023-02-23 11:10:05,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4268032. Throughput: 0: 998.8. Samples: 767066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:10:05,190][18366] Avg episode reward: [(0, '19.479')] [2023-02-23 11:10:10,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 4284416. Throughput: 0: 950.0. Samples: 772080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:10:10,190][18366] Avg episode reward: [(0, '18.808')] [2023-02-23 11:10:15,188][18366] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4300800. Throughput: 0: 960.6. Samples: 777138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:10:15,191][18366] Avg episode reward: [(0, '19.075')] [2023-02-23 11:10:15,204][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001050_4300800.pth... [2023-02-23 11:10:15,346][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000822_3366912.pth [2023-02-23 11:10:16,294][18653] Updated weights for policy 0, policy_version 1052 (0.0022) [2023-02-23 11:10:20,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 4325376. Throughput: 0: 989.2. Samples: 780684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:10:20,189][18366] Avg episode reward: [(0, '19.928')] [2023-02-23 11:10:25,187][18366] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4345856. Throughput: 0: 1002.7. Samples: 787930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:10:25,191][18366] Avg episode reward: [(0, '19.489')] [2023-02-23 11:10:25,372][18653] Updated weights for policy 0, policy_version 1062 (0.0016) [2023-02-23 11:10:30,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 4362240. 
Throughput: 0: 949.2. Samples: 792428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:10:30,191][18366] Avg episode reward: [(0, '18.690')] [2023-02-23 11:10:35,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 4382720. Throughput: 0: 954.4. Samples: 794844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:10:35,189][18366] Avg episode reward: [(0, '18.922')] [2023-02-23 11:10:36,757][18653] Updated weights for policy 0, policy_version 1072 (0.0016) [2023-02-23 11:10:40,196][18366] Fps is (10 sec: 4501.7, 60 sec: 3958.9, 300 sec: 3901.5). Total num frames: 4407296. Throughput: 0: 1010.8. Samples: 801760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:10:40,205][18366] Avg episode reward: [(0, '19.072')] [2023-02-23 11:10:45,189][18366] Fps is (10 sec: 4504.8, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 4427776. Throughput: 0: 1000.0. Samples: 808450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:10:45,191][18366] Avg episode reward: [(0, '19.200')] [2023-02-23 11:10:46,342][18653] Updated weights for policy 0, policy_version 1082 (0.0028) [2023-02-23 11:10:50,187][18366] Fps is (10 sec: 3689.6, 60 sec: 3959.8, 300 sec: 3915.5). Total num frames: 4444160. Throughput: 0: 970.3. Samples: 810728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:10:50,190][18366] Avg episode reward: [(0, '19.688')] [2023-02-23 11:10:55,187][18366] Fps is (10 sec: 3277.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 4460544. Throughput: 0: 961.5. Samples: 815348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:10:55,194][18366] Avg episode reward: [(0, '20.297')] [2023-02-23 11:10:55,206][18639] Saving new best policy, reward=20.297! [2023-02-23 11:10:57,603][18653] Updated weights for policy 0, policy_version 1092 (0.0012) [2023-02-23 11:11:00,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 4485120. Throughput: 0: 1006.2. Samples: 822418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:11:00,194][18366] Avg episode reward: [(0, '19.853')] [2023-02-23 11:11:05,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.6). Total num frames: 4505600. Throughput: 0: 1004.6. Samples: 825890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:11:05,190][18366] Avg episode reward: [(0, '19.656')] [2023-02-23 11:11:07,559][18653] Updated weights for policy 0, policy_version 1102 (0.0025) [2023-02-23 11:11:10,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 4517888. Throughput: 0: 951.8. Samples: 830762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:11:10,193][18366] Avg episode reward: [(0, '20.290')] [2023-02-23 11:11:15,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 4538368. Throughput: 0: 967.4. Samples: 835962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:11:15,195][18366] Avg episode reward: [(0, '20.463')] [2023-02-23 11:11:15,208][18639] Saving new best policy, reward=20.463! [2023-02-23 11:11:18,224][18653] Updated weights for policy 0, policy_version 1112 (0.0012) [2023-02-23 11:11:20,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 4562944. Throughput: 0: 991.9. Samples: 839478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:11:20,195][18366] Avg episode reward: [(0, '21.202')] [2023-02-23 11:11:20,202][18639] Saving new best policy, reward=21.202! 
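The checkpoint traffic above follows two separate retention rules: periodic checkpoints are rotated, with the oldest file removed once a newer one lands, while a separate best-policy snapshot is written whenever the average episode reward sets a new high. A minimal sketch of that bookkeeping, with hypothetical helper names ('on_periodic_save', 'on_new_reward', 'save_best' are illustrative, not Sample Factory's API):

```python
import os
from collections import deque

def make_checkpoint_tracker(keep_last=2):
    """Rotate periodic checkpoints and track the best average episode reward.

    Mirrors the log's 'Saving .../checkpoint_...pth' / 'Removing ...' pairs and
    the 'Saving new best policy, reward=...!' messages. A sketch, not Sample
    Factory's actual implementation.
    """
    recent = deque()               # periodic checkpoint paths, oldest first
    best_reward = float("-inf")

    def on_periodic_save(path):
        recent.append(path)
        while len(recent) > keep_last:
            old = recent.popleft()
            print(f"Removing {old}")
            os.remove(old)         # best-policy files live outside this rotation

    def on_new_reward(avg_reward, save_best):
        nonlocal best_reward
        if avg_reward > best_reward:
            best_reward = avg_reward
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
            save_best()

    return on_periodic_save, on_new_reward
```

With keep_last=2, saving checkpoint_000000936 would trigger removal of checkpoint_000000708, matching the save/remove pairs in the log.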
[2023-02-23 11:11:25,189][18366] Fps is (10 sec: 4504.9, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 4583424. Throughput: 0: 993.1. Samples: 846444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:11:25,194][18366] Avg episode reward: [(0, '20.362')] [2023-02-23 11:11:28,988][18653] Updated weights for policy 0, policy_version 1122 (0.0012) [2023-02-23 11:11:30,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4595712. Throughput: 0: 945.9. Samples: 851012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:11:30,190][18366] Avg episode reward: [(0, '20.506')] [2023-02-23 11:11:35,187][18366] Fps is (10 sec: 3277.3, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4616192. Throughput: 0: 946.4. Samples: 853314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:11:35,190][18366] Avg episode reward: [(0, '19.995')] [2023-02-23 11:11:39,229][18653] Updated weights for policy 0, policy_version 1132 (0.0016) [2023-02-23 11:11:40,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.8, 300 sec: 3901.6). Total num frames: 4640768. Throughput: 0: 997.2. Samples: 860220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:11:40,193][18366] Avg episode reward: [(0, '20.356')] [2023-02-23 11:11:45,189][18366] Fps is (10 sec: 4504.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 4661248. Throughput: 0: 986.1. Samples: 866796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:11:45,192][18366] Avg episode reward: [(0, '19.936')] [2023-02-23 11:11:50,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 4673536. Throughput: 0: 960.6. Samples: 869118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:11:50,195][18366] Avg episode reward: [(0, '20.238')] [2023-02-23 11:11:50,305][18653] Updated weights for policy 0, policy_version 1142 (0.0015) [2023-02-23 11:11:55,187][18366] Fps is (10 sec: 3277.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4694016. Throughput: 0: 960.4. Samples: 873982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:11:55,194][18366] Avg episode reward: [(0, '20.794')] [2023-02-23 11:11:59,731][18653] Updated weights for policy 0, policy_version 1152 (0.0024) [2023-02-23 11:12:00,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4718592. Throughput: 0: 1006.9. Samples: 881272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:12:00,194][18366] Avg episode reward: [(0, '22.710')] [2023-02-23 11:12:00,197][18639] Saving new best policy, reward=22.710! [2023-02-23 11:12:05,194][18366] Fps is (10 sec: 4502.7, 60 sec: 3890.8, 300 sec: 3915.4). Total num frames: 4739072. Throughput: 0: 1005.8. Samples: 884746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:12:05,197][18366] Avg episode reward: [(0, '22.440')] [2023-02-23 11:12:10,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 4755456. Throughput: 0: 954.4. Samples: 889392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:12:10,190][18366] Avg episode reward: [(0, '23.217')] [2023-02-23 11:12:10,194][18639] Saving new best policy, reward=23.217! [2023-02-23 11:12:11,605][18653] Updated weights for policy 0, policy_version 1162 (0.0030) [2023-02-23 11:12:15,187][18366] Fps is (10 sec: 3278.9, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4771840. Throughput: 0: 972.0. Samples: 894750. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:12:15,192][18366] Avg episode reward: [(0, '23.239')] [2023-02-23 11:12:15,207][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001165_4771840.pth... [2023-02-23 11:12:15,342][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000936_3833856.pth [2023-02-23 11:12:15,353][18639] Saving new best policy, reward=23.239! [2023-02-23 11:12:20,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4796416. Throughput: 0: 997.5. Samples: 898202. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:12:20,194][18366] Avg episode reward: [(0, '21.409')] [2023-02-23 11:12:20,665][18653] Updated weights for policy 0, policy_version 1172 (0.0016) [2023-02-23 11:12:25,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3915.5). Total num frames: 4816896. Throughput: 0: 997.9. Samples: 905124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:12:25,195][18366] Avg episode reward: [(0, '19.815')] [2023-02-23 11:12:30,191][18366] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 3915.4). Total num frames: 4833280. Throughput: 0: 951.0. Samples: 909592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:12:30,197][18366] Avg episode reward: [(0, '20.326')] [2023-02-23 11:12:32,839][18653] Updated weights for policy 0, policy_version 1182 (0.0020) [2023-02-23 11:12:35,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 4849664. Throughput: 0: 951.0. Samples: 911912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:12:35,189][18366] Avg episode reward: [(0, '19.024')] [2023-02-23 11:12:40,187][18366] Fps is (10 sec: 4097.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4874240. Throughput: 0: 1000.9. Samples: 919024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:12:40,190][18366] Avg episode reward: [(0, '20.321')] [2023-02-23 11:12:41,323][18653] Updated weights for policy 0, policy_version 1192 (0.0015) [2023-02-23 11:12:45,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3915.5). Total num frames: 4894720. Throughput: 0: 981.6. Samples: 925442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:12:45,195][18366] Avg episode reward: [(0, '20.418')] [2023-02-23 11:12:50,189][18366] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 3915.5). Total num frames: 4911104. Throughput: 0: 954.4. Samples: 927688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:12:50,194][18366] Avg episode reward: [(0, '21.384')] [2023-02-23 11:12:53,539][18653] Updated weights for policy 0, policy_version 1202 (0.0018) [2023-02-23 11:12:55,189][18366] Fps is (10 sec: 3276.3, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 4927488. Throughput: 0: 964.0. Samples: 932772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:12:55,192][18366] Avg episode reward: [(0, '22.798')] [2023-02-23 11:13:00,187][18366] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 4952064. Throughput: 0: 1005.8. Samples: 940010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:13:00,194][18366] Avg episode reward: [(0, '23.147')] [2023-02-23 11:13:02,080][18653] Updated weights for policy 0, policy_version 1212 (0.0014) [2023-02-23 11:13:05,187][18366] Fps is (10 sec: 4506.3, 60 sec: 3891.6, 300 sec: 3915.5). Total num frames: 4972544. Throughput: 0: 1004.5. Samples: 943404. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:13:05,194][18366] Avg episode reward: [(0, '23.087')] [2023-02-23 11:13:10,188][18366] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3915.5). Total num frames: 4988928. Throughput: 0: 951.1. Samples: 947926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:13:10,191][18366] Avg episode reward: [(0, '23.296')] [2023-02-23 11:13:10,195][18639] Saving new best policy, reward=23.296! [2023-02-23 11:13:14,064][18653] Updated weights for policy 0, policy_version 1222 (0.0027) [2023-02-23 11:13:15,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 5009408. Throughput: 0: 978.8. Samples: 953632. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:13:15,191][18366] Avg episode reward: [(0, '23.119')] [2023-02-23 11:13:20,187][18366] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3901.7). Total num frames: 5029888. Throughput: 0: 1004.1. Samples: 957096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:13:20,190][18366] Avg episode reward: [(0, '21.705')] [2023-02-23 11:13:22,949][18653] Updated weights for policy 0, policy_version 1232 (0.0014) [2023-02-23 11:13:25,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5050368. Throughput: 0: 993.3. Samples: 963724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:13:25,193][18366] Avg episode reward: [(0, '21.515')] [2023-02-23 11:13:30,188][18366] Fps is (10 sec: 3686.1, 60 sec: 3891.4, 300 sec: 3915.5). Total num frames: 5066752. Throughput: 0: 953.7. Samples: 968360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:13:30,193][18366] Avg episode reward: [(0, '21.385')] [2023-02-23 11:13:34,765][18653] Updated weights for policy 0, policy_version 1242 (0.0012) [2023-02-23 11:13:35,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 5087232. Throughput: 0: 963.3. Samples: 971034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 11:13:35,190][18366] Avg episode reward: [(0, '21.362')] [2023-02-23 11:13:40,187][18366] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5111808. Throughput: 0: 1010.7. Samples: 978252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:13:40,189][18366] Avg episode reward: [(0, '21.477')] [2023-02-23 11:13:44,023][18653] Updated weights for policy 0, policy_version 1252 (0.0015) [2023-02-23 11:13:45,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5128192. Throughput: 0: 983.8. Samples: 984282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:13:45,189][18366] Avg episode reward: [(0, '22.617')] [2023-02-23 11:13:50,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3915.5). Total num frames: 5144576. Throughput: 0: 958.9. Samples: 986556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 11:13:50,197][18366] Avg episode reward: [(0, '23.284')] [2023-02-23 11:13:55,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 5165056. Throughput: 0: 975.9. Samples: 991840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:13:55,190][18366] Avg episode reward: [(0, '23.870')] [2023-02-23 11:13:55,204][18639] Saving new best policy, reward=23.870! [2023-02-23 11:13:55,488][18653] Updated weights for policy 0, policy_version 1262 (0.0015) [2023-02-23 11:14:00,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). 
Total num frames: 5189632. Throughput: 0: 1007.9. Samples: 998986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:14:00,192][18366] Avg episode reward: [(0, '23.235')] [2023-02-23 11:14:05,188][18366] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5206016. Throughput: 0: 1002.2. Samples: 1002194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:14:05,195][18366] Avg episode reward: [(0, '22.885')] [2023-02-23 11:14:05,205][18653] Updated weights for policy 0, policy_version 1272 (0.0014) [2023-02-23 11:14:10,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3915.5). Total num frames: 5222400. Throughput: 0: 955.2. Samples: 1006706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:14:10,189][18366] Avg episode reward: [(0, '21.761')] [2023-02-23 11:14:15,187][18366] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 5242880. Throughput: 0: 980.7. Samples: 1012492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:14:15,197][18366] Avg episode reward: [(0, '21.559')] [2023-02-23 11:14:15,205][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001280_5242880.pth... [2023-02-23 11:14:15,318][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001050_4300800.pth [2023-02-23 11:14:16,288][18653] Updated weights for policy 0, policy_version 1282 (0.0024) [2023-02-23 11:14:20,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5267456. Throughput: 0: 999.9. Samples: 1016028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:14:20,195][18366] Avg episode reward: [(0, '21.533')] [2023-02-23 11:14:25,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5283840. Throughput: 0: 983.3. Samples: 1022500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:14:25,193][18366] Avg episode reward: [(0, '21.770')] [2023-02-23 11:14:26,627][18653] Updated weights for policy 0, policy_version 1292 (0.0012) [2023-02-23 11:14:30,188][18366] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5300224. Throughput: 0: 949.6. Samples: 1027014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 11:14:30,197][18366] Avg episode reward: [(0, '21.546')] [2023-02-23 11:14:35,188][18366] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 5320704. Throughput: 0: 958.2. Samples: 1029676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:14:35,190][18366] Avg episode reward: [(0, '21.325')] [2023-02-23 11:14:36,927][18653] Updated weights for policy 0, policy_version 1302 (0.0026) [2023-02-23 11:14:40,187][18366] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5345280. Throughput: 0: 1001.8. Samples: 1036920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:14:40,192][18366] Avg episode reward: [(0, '22.246')] [2023-02-23 11:14:45,187][18366] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3929.5). Total num frames: 5365760. Throughput: 0: 975.9. Samples: 1042900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:14:45,191][18366] Avg episode reward: [(0, '22.589')] [2023-02-23 11:14:47,572][18653] Updated weights for policy 0, policy_version 1312 (0.0017) [2023-02-23 11:14:50,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5378048. Throughput: 0: 952.7. Samples: 1045066. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:14:50,194][18366] Avg episode reward: [(0, '22.731')] [2023-02-23 11:14:55,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5402624. Throughput: 0: 978.2. Samples: 1050724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:14:55,194][18366] Avg episode reward: [(0, '24.170')] [2023-02-23 11:14:55,206][18639] Saving new best policy, reward=24.170! [2023-02-23 11:14:57,598][18653] Updated weights for policy 0, policy_version 1322 (0.0012) [2023-02-23 11:15:00,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5423104. Throughput: 0: 1008.2. Samples: 1057862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:15:00,189][18366] Avg episode reward: [(0, '24.077')] [2023-02-23 11:15:05,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 5443584. Throughput: 0: 994.3. Samples: 1060772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:15:05,192][18366] Avg episode reward: [(0, '22.789')] [2023-02-23 11:15:09,176][18653] Updated weights for policy 0, policy_version 1332 (0.0012) [2023-02-23 11:15:10,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5455872. Throughput: 0: 949.4. Samples: 1065222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:15:10,191][18366] Avg episode reward: [(0, '22.698')] [2023-02-23 11:15:15,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5480448. Throughput: 0: 983.7. Samples: 1071280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:15:15,190][18366] Avg episode reward: [(0, '22.164')] [2023-02-23 11:15:18,480][18653] Updated weights for policy 0, policy_version 1342 (0.0024) [2023-02-23 11:15:20,187][18366] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 5505024. Throughput: 0: 1002.7. Samples: 1074796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:15:20,190][18366] Avg episode reward: [(0, '22.483')] [2023-02-23 11:15:25,189][18366] Fps is (10 sec: 4095.3, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 5521408. Throughput: 0: 978.5. Samples: 1080952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:15:25,191][18366] Avg episode reward: [(0, '21.419')] [2023-02-23 11:15:30,187][18366] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 5533696. Throughput: 0: 948.3. Samples: 1085572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:15:30,191][18366] Avg episode reward: [(0, '21.601')] [2023-02-23 11:15:30,389][18653] Updated weights for policy 0, policy_version 1352 (0.0023) [2023-02-23 11:15:35,187][18366] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3901.7). Total num frames: 5558272. Throughput: 0: 968.8. Samples: 1088662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:15:35,194][18366] Avg episode reward: [(0, '22.809')] [2023-02-23 11:15:39,027][18653] Updated weights for policy 0, policy_version 1362 (0.0013) [2023-02-23 11:15:40,187][18366] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5582848. Throughput: 0: 1003.3. Samples: 1095874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:15:40,195][18366] Avg episode reward: [(0, '20.834')] [2023-02-23 11:15:45,188][18366] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5599232. Throughput: 0: 968.8. Samples: 1101456. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:15:45,194][18366] Avg episode reward: [(0, '20.759')] [2023-02-23 11:15:50,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5615616. Throughput: 0: 955.1. Samples: 1103750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:15:50,192][18366] Avg episode reward: [(0, '22.905')] [2023-02-23 11:15:51,112][18653] Updated weights for policy 0, policy_version 1372 (0.0015) [2023-02-23 11:15:55,187][18366] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 5636096. Throughput: 0: 985.9. Samples: 1109588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:15:55,190][18366] Avg episode reward: [(0, '23.199')] [2023-02-23 11:15:59,798][18653] Updated weights for policy 0, policy_version 1382 (0.0013) [2023-02-23 11:16:00,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5660672. Throughput: 0: 1011.1. Samples: 1116780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:16:00,189][18366] Avg episode reward: [(0, '22.856')] [2023-02-23 11:16:05,188][18366] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 5677056. Throughput: 0: 992.4. Samples: 1119456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:16:05,192][18366] Avg episode reward: [(0, '23.560')] [2023-02-23 11:16:10,188][18366] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5693440. Throughput: 0: 955.2. Samples: 1123934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:16:10,197][18366] Avg episode reward: [(0, '24.160')] [2023-02-23 11:16:11,966][18653] Updated weights for policy 0, policy_version 1392 (0.0011) [2023-02-23 11:16:15,187][18366] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 5713920. Throughput: 0: 994.0. Samples: 1130300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:16:15,194][18366] Avg episode reward: [(0, '23.128')] [2023-02-23 11:16:15,209][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001395_5713920.pth... [2023-02-23 11:16:15,322][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001165_4771840.pth [2023-02-23 11:16:20,187][18366] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5738496. Throughput: 0: 1003.6. Samples: 1133822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:16:20,190][18366] Avg episode reward: [(0, '23.554')] [2023-02-23 11:16:20,537][18653] Updated weights for policy 0, policy_version 1402 (0.0021) [2023-02-23 11:16:25,187][18366] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 5754880. Throughput: 0: 974.5. Samples: 1139726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:16:25,191][18366] Avg episode reward: [(0, '22.641')] [2023-02-23 11:16:30,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5771264. Throughput: 0: 953.2. Samples: 1144348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:16:30,195][18366] Avg episode reward: [(0, '22.647')] [2023-02-23 11:16:32,644][18653] Updated weights for policy 0, policy_version 1412 (0.0014) [2023-02-23 11:16:35,187][18366] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 5791744. Throughput: 0: 972.1. Samples: 1147494. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:16:35,196][18366] Avg episode reward: [(0, '22.690')] [2023-02-23 11:16:40,187][18366] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5816320. Throughput: 0: 1001.5. Samples: 1154654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:16:40,190][18366] Avg episode reward: [(0, '21.842')] [2023-02-23 11:16:41,569][18653] Updated weights for policy 0, policy_version 1422 (0.0012) [2023-02-23 11:16:45,189][18366] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3929.4). Total num frames: 5832704. Throughput: 0: 963.6. Samples: 1160144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 11:16:45,192][18366] Avg episode reward: [(0, '21.743')] [2023-02-23 11:16:50,187][18366] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 5849088. Throughput: 0: 954.9. Samples: 1162428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:16:50,193][18366] Avg episode reward: [(0, '21.398')] [2023-02-23 11:16:53,277][18653] Updated weights for policy 0, policy_version 1432 (0.0026) [2023-02-23 11:16:55,187][18366] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5873664. Throughput: 0: 988.2. Samples: 1168402. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 11:16:55,190][18366] Avg episode reward: [(0, '21.938')] [2023-02-23 11:17:00,187][18366] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3929.5). Total num frames: 5898240. Throughput: 0: 1008.4. Samples: 1175676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:17:00,193][18366] Avg episode reward: [(0, '21.780')] [2023-02-23 11:17:02,708][18653] Updated weights for policy 0, policy_version 1442 (0.0016) [2023-02-23 11:17:05,189][18366] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3915.5). Total num frames: 5910528. Throughput: 0: 986.6. Samples: 1178220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 11:17:05,192][18366] Avg episode reward: [(0, '20.551')] [2023-02-23 11:17:10,189][18366] Fps is (10 sec: 2866.7, 60 sec: 3891.1, 300 sec: 3915.5). Total num frames: 5926912. Throughput: 0: 956.2. Samples: 1182756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 11:17:10,192][18366] Avg episode reward: [(0, '20.604')] [2023-02-23 11:17:14,091][18653] Updated weights for policy 0, policy_version 1452 (0.0026) [2023-02-23 11:17:15,187][18366] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 5951488. Throughput: 0: 998.5. Samples: 1189280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 11:17:15,190][18366] Avg episode reward: [(0, '20.561')] [2023-02-23 11:17:20,187][18366] Fps is (10 sec: 4916.1, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 5976064. Throughput: 0: 1007.8. Samples: 1192844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 11:17:20,190][18366] Avg episode reward: [(0, '19.867')] [2023-02-23 11:17:23,801][18653] Updated weights for policy 0, policy_version 1462 (0.0019) [2023-02-23 11:17:25,188][18366] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 5992448. Throughput: 0: 978.8. Samples: 1198702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 11:17:25,201][18366] Avg episode reward: [(0, '20.498')] [2023-02-23 11:17:28,987][18639] Stopping Batcher_0... [2023-02-23 11:17:28,987][18639] Loop batcher_evt_loop terminating... [2023-02-23 11:17:28,988][18366] Component Batcher_0 stopped! 
[2023-02-23 11:17:28,999][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... [2023-02-23 11:17:29,052][18656] Stopping RolloutWorker_w3... [2023-02-23 11:17:29,052][18366] Component RolloutWorker_w3 stopped! [2023-02-23 11:17:29,054][18656] Loop rollout_proc3_evt_loop terminating... [2023-02-23 11:17:29,068][18366] Component RolloutWorker_w1 stopped! [2023-02-23 11:17:29,067][18655] Stopping RolloutWorker_w1... [2023-02-23 11:17:29,072][18655] Loop rollout_proc1_evt_loop terminating... [2023-02-23 11:17:29,092][18366] Component RolloutWorker_w2 stopped! [2023-02-23 11:17:29,097][18660] Stopping RolloutWorker_w2... [2023-02-23 11:17:29,107][18366] Component RolloutWorker_w0 stopped! [2023-02-23 11:17:29,109][18366] Component RolloutWorker_w5 stopped! [2023-02-23 11:17:29,109][18658] Stopping RolloutWorker_w5... [2023-02-23 11:17:29,113][18654] Stopping RolloutWorker_w0... [2023-02-23 11:17:29,114][18654] Loop rollout_proc0_evt_loop terminating... [2023-02-23 11:17:29,117][18653] Weights refcount: 2 0 [2023-02-23 11:17:29,113][18658] Loop rollout_proc5_evt_loop terminating... [2023-02-23 11:17:29,098][18660] Loop rollout_proc2_evt_loop terminating... [2023-02-23 11:17:29,139][18366] Component InferenceWorker_p0-w0 stopped! [2023-02-23 11:17:29,143][18653] Stopping InferenceWorker_p0-w0... [2023-02-23 11:17:29,145][18653] Loop inference_proc0-0_evt_loop terminating... [2023-02-23 11:17:29,152][18366] Component RolloutWorker_w6 stopped! [2023-02-23 11:17:29,156][18659] Stopping RolloutWorker_w6... [2023-02-23 11:17:29,157][18659] Loop rollout_proc6_evt_loop terminating... [2023-02-23 11:17:29,169][18366] Component RolloutWorker_w4 stopped! [2023-02-23 11:17:29,173][18661] Stopping RolloutWorker_w4... [2023-02-23 11:17:29,174][18661] Loop rollout_proc4_evt_loop terminating... [2023-02-23 11:17:29,228][18639] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001280_5242880.pth [2023-02-23 11:17:29,236][18366] Component RolloutWorker_w7 stopped! [2023-02-23 11:17:29,236][18657] Stopping RolloutWorker_w7... [2023-02-23 11:17:29,240][18657] Loop rollout_proc7_evt_loop terminating... [2023-02-23 11:17:29,252][18639] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... [2023-02-23 11:17:29,584][18366] Component LearnerWorker_p0 stopped! [2023-02-23 11:17:29,589][18366] Waiting for process learner_proc0 to stop... [2023-02-23 11:17:29,592][18639] Stopping LearnerWorker_p0... [2023-02-23 11:17:29,593][18639] Loop learner_proc0_evt_loop terminating... [2023-02-23 11:17:31,360][18366] Waiting for process inference_proc0-0 to join... [2023-02-23 11:17:31,705][18366] Waiting for process rollout_proc0 to join... [2023-02-23 11:17:32,081][18366] Waiting for process rollout_proc1 to join... [2023-02-23 11:17:32,225][18366] Waiting for process rollout_proc2 to join... [2023-02-23 11:17:32,229][18366] Waiting for process rollout_proc3 to join... [2023-02-23 11:17:32,233][18366] Waiting for process rollout_proc4 to join... [2023-02-23 11:17:32,235][18366] Waiting for process rollout_proc5 to join... [2023-02-23 11:17:32,236][18366] Waiting for process rollout_proc6 to join... [2023-02-23 11:17:32,237][18366] Waiting for process rollout_proc7 to join... 
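The teardown above is the usual two-phase pattern for multiprocessing pipelines: each component's event loop is first signalled to stop (the messages interleave in whatever order the processes flush them), and only then does the runner join every child process. A generic sketch of the pattern; Sample Factory's signal/slot event loops are more involved:

```python
import multiprocessing as mp

def worker(stop_event, name):
    # Stand-in for a rollout / inference / learner event loop.
    while not stop_event.wait(timeout=0.1):
        pass                        # real work would happen between stop checks
    print(f"Loop {name}_evt_loop terminating...")

if __name__ == "__main__":
    stop = mp.Event()
    procs = [mp.Process(target=worker, args=(stop, f"rollout_proc{i}")) for i in range(8)]
    for p in procs:
        p.start()
    stop.set()                      # phase 1: "Stopping RolloutWorker_w..." messages
    for p in procs:                 # phase 2: "Waiting for process ... to join..."
        p.join()
```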
[2023-02-23 11:17:32,238][18366] Batcher 0 profile tree view:
batching: 30.0857, releasing_batches: 0.0319
[2023-02-23 11:17:32,240][18366] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 592.4960
update_model: 9.0124
  weight_update: 0.0031
one_step: 0.0168
  handle_policy_step: 599.0080
    deserialize: 17.4692, stack: 3.4186, obs_to_device_normalize: 135.6011, forward: 283.6011, send_messages: 31.3426
    prepare_outputs: 97.2349
      to_cpu: 61.0531
[2023-02-23 11:17:32,241][18366] Learner 0 profile tree view:
misc: 0.0089, prepare_batch: 19.6194
train: 95.3883
  epoch_init: 0.0070, minibatch_init: 0.0076, losses_postprocess: 0.6667, kl_divergence: 0.6946, after_optimizer: 4.0055
  calculate_losses: 31.9788
    losses_init: 0.0064, forward_head: 1.9995, bptt_initial: 21.0775, tail: 1.3029, advantages_returns: 0.3512, losses: 4.2806
    bptt: 2.5908
      bptt_forward_core: 2.4339
  update: 57.2644
    clip: 1.6880
[2023-02-23 11:17:32,242][18366] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3220, enqueue_policy_requests: 155.3707, env_step: 948.6785, overhead: 23.4526, complete_rollouts: 8.5426
save_policy_outputs: 23.1487
  split_output_tensors: 11.2928
[2023-02-23 11:17:32,244][18366] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3696, enqueue_policy_requests: 155.7486, env_step: 945.5572, overhead: 22.8681, complete_rollouts: 7.9880
save_policy_outputs: 23.0281
  split_output_tensors: 11.6300
[2023-02-23 11:17:32,245][18366] Loop Runner_EvtLoop terminating...
[2023-02-23 11:17:32,247][18366] Runner profile tree view:
main_loop: 1273.6448
[2023-02-23 11:17:32,248][18366] Collected {0: 6004736}, FPS: 3775.5
[2023-02-23 11:17:32,341][18366] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 11:17:32,343][18366] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-23 11:17:32,345][18366] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-23 11:17:32,347][18366] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-23 11:17:32,349][18366] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 11:17:32,351][18366] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-23 11:17:32,353][18366] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 11:17:32,354][18366] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-23 11:17:32,355][18366] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-23 11:17:32,356][18366] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-23 11:17:32,357][18366] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-23 11:17:32,358][18366] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-23 11:17:32,359][18366] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-23 11:17:32,360][18366] Adding new argument 'enjoy_script'=None that is not in the saved config file!
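The 'Overriding arg ...' and 'Adding new argument ...' messages just above describe a simple merge for evaluation: the saved training config is the base, command-line values win, and evaluation-only keys missing from the saved file are appended. A sketch of those merge semantics as the messages imply them (the real argument handling lives in Sample Factory's config code):

```python
import json

def load_eval_config(config_path, cli_args):
    """Merge evaluation CLI args over a saved training config."""
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

# e.g. load_eval_config("/content/train_dir/default_experiment/config.json",
#                       {"num_workers": 1, "no_render": True, "save_video": True})
```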
[2023-02-23 11:17:32,362][18366] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 11:17:32,391][18366] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 11:17:32,394][18366] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 11:17:32,398][18366] RunningMeanStd input shape: (1,) [2023-02-23 11:17:32,412][18366] ConvEncoder: input_channels=3 [2023-02-23 11:17:33,119][18366] Conv encoder output size: 512 [2023-02-23 11:17:33,121][18366] Policy head output size: 512 [2023-02-23 11:17:35,459][18366] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... [2023-02-23 11:17:36,837][18366] Num frames 100... [2023-02-23 11:17:36,956][18366] Num frames 200... [2023-02-23 11:17:37,066][18366] Num frames 300... [2023-02-23 11:17:37,186][18366] Num frames 400... [2023-02-23 11:17:37,300][18366] Num frames 500... [2023-02-23 11:17:37,421][18366] Num frames 600... [2023-02-23 11:17:37,535][18366] Num frames 700... [2023-02-23 11:17:37,649][18366] Num frames 800... [2023-02-23 11:17:37,764][18366] Num frames 900... [2023-02-23 11:17:37,849][18366] Avg episode rewards: #0: 19.250, true rewards: #0: 9.250 [2023-02-23 11:17:37,854][18366] Avg episode reward: 19.250, avg true_objective: 9.250 [2023-02-23 11:17:37,937][18366] Num frames 1000... [2023-02-23 11:17:38,051][18366] Num frames 1100... [2023-02-23 11:17:38,178][18366] Num frames 1200... [2023-02-23 11:17:38,293][18366] Num frames 1300... [2023-02-23 11:17:38,407][18366] Num frames 1400... [2023-02-23 11:17:38,524][18366] Num frames 1500... [2023-02-23 11:17:38,645][18366] Num frames 1600... [2023-02-23 11:17:38,773][18366] Avg episode rewards: #0: 17.805, true rewards: #0: 8.305 [2023-02-23 11:17:38,775][18366] Avg episode reward: 17.805, avg true_objective: 8.305 [2023-02-23 11:17:38,829][18366] Num frames 1700... [2023-02-23 11:17:38,940][18366] Num frames 1800... [2023-02-23 11:17:39,060][18366] Num frames 1900... [2023-02-23 11:17:39,186][18366] Num frames 2000... [2023-02-23 11:17:39,299][18366] Num frames 2100... [2023-02-23 11:17:39,422][18366] Num frames 2200... [2023-02-23 11:17:39,533][18366] Num frames 2300... [2023-02-23 11:17:39,650][18366] Num frames 2400... [2023-02-23 11:17:39,802][18366] Num frames 2500... [2023-02-23 11:17:39,970][18366] Num frames 2600... [2023-02-23 11:17:40,089][18366] Num frames 2700... [2023-02-23 11:17:40,235][18366] Num frames 2800... [2023-02-23 11:17:40,395][18366] Num frames 2900... [2023-02-23 11:17:40,554][18366] Num frames 3000... [2023-02-23 11:17:40,708][18366] Num frames 3100... [2023-02-23 11:17:40,867][18366] Num frames 3200... [2023-02-23 11:17:41,028][18366] Num frames 3300... [2023-02-23 11:17:41,222][18366] Avg episode rewards: #0: 27.280, true rewards: #0: 11.280 [2023-02-23 11:17:41,224][18366] Avg episode reward: 27.280, avg true_objective: 11.280 [2023-02-23 11:17:41,252][18366] Num frames 3400... [2023-02-23 11:17:41,409][18366] Num frames 3500... [2023-02-23 11:17:41,567][18366] Num frames 3600... [2023-02-23 11:17:41,722][18366] Num frames 3700... [2023-02-23 11:17:41,877][18366] Num frames 3800... [2023-02-23 11:17:42,039][18366] Num frames 3900... [2023-02-23 11:17:42,199][18366] Num frames 4000... [2023-02-23 11:17:42,375][18366] Num frames 4100... [2023-02-23 11:17:42,544][18366] Num frames 4200... [2023-02-23 11:17:42,706][18366] Num frames 4300... [2023-02-23 11:17:42,865][18366] Num frames 4400... [2023-02-23 11:17:43,025][18366] Num frames 4500... 
[2023-02-23 11:17:43,185][18366] Num frames 4600... [2023-02-23 11:17:43,354][18366] Num frames 4700... [2023-02-23 11:17:43,514][18366] Num frames 4800... [2023-02-23 11:17:43,674][18366] Num frames 4900... [2023-02-23 11:17:43,797][18366] Num frames 5000... [2023-02-23 11:17:43,917][18366] Num frames 5100... [2023-02-23 11:17:44,031][18366] Num frames 5200... [2023-02-23 11:17:44,149][18366] Num frames 5300... [2023-02-23 11:17:44,266][18366] Num frames 5400... [2023-02-23 11:17:44,420][18366] Avg episode rewards: #0: 33.677, true rewards: #0: 13.678 [2023-02-23 11:17:44,422][18366] Avg episode reward: 33.677, avg true_objective: 13.678 [2023-02-23 11:17:44,457][18366] Num frames 5500... [2023-02-23 11:17:44,566][18366] Num frames 5600... [2023-02-23 11:17:44,675][18366] Num frames 5700... [2023-02-23 11:17:44,786][18366] Num frames 5800... [2023-02-23 11:17:44,899][18366] Num frames 5900... [2023-02-23 11:17:45,019][18366] Num frames 6000... [2023-02-23 11:17:45,137][18366] Num frames 6100... [2023-02-23 11:17:45,357][18366] Num frames 6200... [2023-02-23 11:17:45,421][18366] Avg episode rewards: #0: 30.204, true rewards: #0: 12.404 [2023-02-23 11:17:45,424][18366] Avg episode reward: 30.204, avg true_objective: 12.404 [2023-02-23 11:17:45,599][18366] Num frames 6300... [2023-02-23 11:17:45,802][18366] Num frames 6400... [2023-02-23 11:17:45,964][18366] Num frames 6500... [2023-02-23 11:17:46,210][18366] Num frames 6600... [2023-02-23 11:17:46,449][18366] Num frames 6700... [2023-02-23 11:17:46,627][18366] Num frames 6800... [2023-02-23 11:17:46,825][18366] Num frames 6900... [2023-02-23 11:17:47,051][18366] Num frames 7000... [2023-02-23 11:17:47,232][18366] Num frames 7100... [2023-02-23 11:17:47,425][18366] Num frames 7200... [2023-02-23 11:17:47,607][18366] Num frames 7300... [2023-02-23 11:17:47,794][18366] Num frames 7400... [2023-02-23 11:17:48,029][18366] Num frames 7500... [2023-02-23 11:17:48,212][18366] Num frames 7600... [2023-02-23 11:17:48,418][18366] Num frames 7700... [2023-02-23 11:17:48,585][18366] Num frames 7800... [2023-02-23 11:17:48,749][18366] Num frames 7900... [2023-02-23 11:17:48,915][18366] Num frames 8000... [2023-02-23 11:17:49,133][18366] Num frames 8100... [2023-02-23 11:17:49,369][18366] Avg episode rewards: #0: 32.756, true rewards: #0: 13.590 [2023-02-23 11:17:49,376][18366] Avg episode reward: 32.756, avg true_objective: 13.590 [2023-02-23 11:17:49,500][18366] Num frames 8200... [2023-02-23 11:17:49,695][18366] Num frames 8300... [2023-02-23 11:17:49,895][18366] Num frames 8400... [2023-02-23 11:17:50,202][18366] Num frames 8500... [2023-02-23 11:17:50,468][18366] Num frames 8600... [2023-02-23 11:17:50,748][18366] Num frames 8700... [2023-02-23 11:17:50,968][18366] Num frames 8800... [2023-02-23 11:17:51,207][18366] Avg episode rewards: #0: 29.654, true rewards: #0: 12.654 [2023-02-23 11:17:51,210][18366] Avg episode reward: 29.654, avg true_objective: 12.654 [2023-02-23 11:17:51,366][18366] Num frames 8900... [2023-02-23 11:17:51,500][18366] Num frames 9000... [2023-02-23 11:17:51,613][18366] Num frames 9100... [2023-02-23 11:17:51,724][18366] Num frames 9200... [2023-02-23 11:17:51,845][18366] Num frames 9300... [2023-02-23 11:17:51,909][18366] Avg episode rewards: #0: 26.632, true rewards: #0: 11.633 [2023-02-23 11:17:51,910][18366] Avg episode reward: 26.632, avg true_objective: 11.633 [2023-02-23 11:17:52,020][18366] Num frames 9400... [2023-02-23 11:17:52,132][18366] Num frames 9500... [2023-02-23 11:17:52,243][18366] Num frames 9600... 
[2023-02-23 11:17:52,361][18366] Num frames 9700... [2023-02-23 11:17:52,480][18366] Num frames 9800... [2023-02-23 11:17:52,588][18366] Avg episode rewards: #0: 25.164, true rewards: #0: 10.942 [2023-02-23 11:17:52,589][18366] Avg episode reward: 25.164, avg true_objective: 10.942 [2023-02-23 11:17:52,652][18366] Num frames 9900... [2023-02-23 11:17:52,772][18366] Num frames 10000... [2023-02-23 11:17:52,885][18366] Num frames 10100... [2023-02-23 11:17:52,997][18366] Num frames 10200... [2023-02-23 11:17:53,121][18366] Num frames 10300... [2023-02-23 11:17:53,235][18366] Num frames 10400... [2023-02-23 11:17:53,351][18366] Num frames 10500... [2023-02-23 11:17:53,474][18366] Num frames 10600... [2023-02-23 11:17:53,557][18366] Avg episode rewards: #0: 24.016, true rewards: #0: 10.616 [2023-02-23 11:17:53,558][18366] Avg episode reward: 24.016, avg true_objective: 10.616 [2023-02-23 11:18:57,364][18366] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-23 11:18:58,012][18366] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 11:18:58,015][18366] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-23 11:18:58,016][18366] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-23 11:18:58,020][18366] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-23 11:18:58,021][18366] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 11:18:58,025][18366] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-23 11:18:58,028][18366] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-23 11:18:58,030][18366] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-23 11:18:58,032][18366] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-23 11:18:58,034][18366] Adding new argument 'hf_repository'='keshan/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-23 11:18:58,039][18366] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-23 11:18:58,040][18366] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-23 11:18:58,041][18366] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 11:18:58,042][18366] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 11:18:58,046][18366] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 11:18:58,076][18366] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 11:18:58,079][18366] RunningMeanStd input shape: (1,) [2023-02-23 11:18:58,102][18366] ConvEncoder: input_channels=3 [2023-02-23 11:18:58,163][18366] Conv encoder output size: 512 [2023-02-23 11:18:58,166][18366] Policy head output size: 512 [2023-02-23 11:18:58,193][18366] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... [2023-02-23 11:18:58,862][18366] Num frames 100... [2023-02-23 11:18:59,022][18366] Num frames 200... [2023-02-23 11:18:59,181][18366] Num frames 300... [2023-02-23 11:18:59,344][18366] Num frames 400... [2023-02-23 11:18:59,494][18366] Num frames 500... [2023-02-23 11:18:59,649][18366] Num frames 600... [2023-02-23 11:18:59,814][18366] Num frames 700... 
[2023-02-23 11:18:59,967][18366] Num frames 800... [2023-02-23 11:19:00,121][18366] Num frames 900... [2023-02-23 11:19:00,278][18366] Num frames 1000... [2023-02-23 11:19:00,433][18366] Num frames 1100... [2023-02-23 11:19:00,586][18366] Num frames 1200... [2023-02-23 11:19:00,740][18366] Num frames 1300... [2023-02-23 11:19:00,907][18366] Num frames 1400... [2023-02-23 11:19:01,072][18366] Num frames 1500... [2023-02-23 11:19:01,236][18366] Num frames 1600... [2023-02-23 11:19:01,405][18366] Num frames 1700... [2023-02-23 11:19:01,625][18366] Num frames 1800... [2023-02-23 11:19:01,785][18366] Num frames 1900... [2023-02-23 11:19:01,954][18366] Avg episode rewards: #0: 52.549, true rewards: #0: 19.550 [2023-02-23 11:19:01,957][18366] Avg episode reward: 52.549, avg true_objective: 19.550 [2023-02-23 11:19:02,056][18366] Num frames 2000... [2023-02-23 11:19:02,220][18366] Num frames 2100... [2023-02-23 11:19:02,374][18366] Num frames 2200... [2023-02-23 11:19:02,531][18366] Num frames 2300... [2023-02-23 11:19:02,682][18366] Num frames 2400... [2023-02-23 11:19:02,888][18366] Avg episode rewards: #0: 30.495, true rewards: #0: 12.495 [2023-02-23 11:19:02,891][18366] Avg episode reward: 30.495, avg true_objective: 12.495 [2023-02-23 11:19:02,897][18366] Num frames 2500... [2023-02-23 11:19:03,047][18366] Num frames 2600... [2023-02-23 11:19:03,203][18366] Num frames 2700... [2023-02-23 11:19:03,347][18366] Num frames 2800... [2023-02-23 11:19:03,467][18366] Num frames 2900... [2023-02-23 11:19:03,593][18366] Num frames 3000... [2023-02-23 11:19:03,705][18366] Num frames 3100... [2023-02-23 11:19:03,823][18366] Num frames 3200... [2023-02-23 11:19:03,934][18366] Num frames 3300... [2023-02-23 11:19:04,050][18366] Num frames 3400... [2023-02-23 11:19:04,164][18366] Num frames 3500... [2023-02-23 11:19:04,308][18366] Num frames 3600... [2023-02-23 11:19:04,473][18366] Num frames 3700... [2023-02-23 11:19:04,641][18366] Num frames 3800... [2023-02-23 11:19:04,797][18366] Num frames 3900... [2023-02-23 11:19:04,955][18366] Num frames 4000... [2023-02-23 11:19:05,072][18366] Avg episode rewards: #0: 31.450, true rewards: #0: 13.450 [2023-02-23 11:19:05,076][18366] Avg episode reward: 31.450, avg true_objective: 13.450 [2023-02-23 11:19:05,179][18366] Num frames 4100... [2023-02-23 11:19:05,340][18366] Num frames 4200... [2023-02-23 11:19:05,494][18366] Num frames 4300... [2023-02-23 11:19:05,649][18366] Num frames 4400... [2023-02-23 11:19:05,805][18366] Num frames 4500... [2023-02-23 11:19:05,958][18366] Num frames 4600... [2023-02-23 11:19:06,118][18366] Num frames 4700... [2023-02-23 11:19:06,287][18366] Num frames 4800... [2023-02-23 11:19:06,449][18366] Num frames 4900... [2023-02-23 11:19:06,621][18366] Num frames 5000... [2023-02-23 11:19:06,788][18366] Num frames 5100... [2023-02-23 11:19:06,948][18366] Num frames 5200... [2023-02-23 11:19:07,115][18366] Num frames 5300... [2023-02-23 11:19:07,275][18366] Num frames 5400... [2023-02-23 11:19:07,443][18366] Num frames 5500... [2023-02-23 11:19:07,572][18366] Avg episode rewards: #0: 31.847, true rewards: #0: 13.847 [2023-02-23 11:19:07,574][18366] Avg episode reward: 31.847, avg true_objective: 13.847 [2023-02-23 11:19:07,689][18366] Num frames 5600... [2023-02-23 11:19:07,804][18366] Num frames 5700... [2023-02-23 11:19:07,930][18366] Num frames 5800... [2023-02-23 11:19:08,043][18366] Num frames 5900... [2023-02-23 11:19:08,153][18366] Num frames 6000... [2023-02-23 11:19:08,265][18366] Num frames 6100... 
[2023-02-23 11:19:08,377][18366] Num frames 6200... [2023-02-23 11:19:08,490][18366] Num frames 6300... [2023-02-23 11:19:08,650][18366] Avg episode rewards: #0: 29.790, true rewards: #0: 12.790 [2023-02-23 11:19:08,653][18366] Avg episode reward: 29.790, avg true_objective: 12.790 [2023-02-23 11:19:08,662][18366] Num frames 6400... [2023-02-23 11:19:08,773][18366] Num frames 6500... [2023-02-23 11:19:08,892][18366] Num frames 6600... [2023-02-23 11:19:09,003][18366] Num frames 6700... [2023-02-23 11:19:09,115][18366] Num frames 6800... [2023-02-23 11:19:09,218][18366] Avg episode rewards: #0: 25.738, true rewards: #0: 11.405 [2023-02-23 11:19:09,220][18366] Avg episode reward: 25.738, avg true_objective: 11.405 [2023-02-23 11:19:09,286][18366] Num frames 6900... [2023-02-23 11:19:09,398][18366] Num frames 7000... [2023-02-23 11:19:09,515][18366] Num frames 7100... [2023-02-23 11:19:09,643][18366] Num frames 7200... [2023-02-23 11:19:09,754][18366] Num frames 7300... [2023-02-23 11:19:09,868][18366] Num frames 7400... [2023-02-23 11:19:09,978][18366] Num frames 7500... [2023-02-23 11:19:10,092][18366] Num frames 7600... [2023-02-23 11:19:10,205][18366] Num frames 7700... [2023-02-23 11:19:10,272][18366] Avg episode rewards: #0: 24.867, true rewards: #0: 11.010 [2023-02-23 11:19:10,276][18366] Avg episode reward: 24.867, avg true_objective: 11.010 [2023-02-23 11:19:10,381][18366] Num frames 7800... [2023-02-23 11:19:10,495][18366] Num frames 7900... [2023-02-23 11:19:10,606][18366] Num frames 8000... [2023-02-23 11:19:10,724][18366] Num frames 8100... [2023-02-23 11:19:10,839][18366] Num frames 8200... [2023-02-23 11:19:10,955][18366] Num frames 8300... [2023-02-23 11:19:11,068][18366] Num frames 8400... [2023-02-23 11:19:11,182][18366] Num frames 8500... [2023-02-23 11:19:11,300][18366] Num frames 8600... [2023-02-23 11:19:11,427][18366] Num frames 8700... [2023-02-23 11:19:11,539][18366] Num frames 8800... [2023-02-23 11:19:11,659][18366] Num frames 8900... [2023-02-23 11:19:11,774][18366] Num frames 9000... [2023-02-23 11:19:11,909][18366] Num frames 9100... [2023-02-23 11:19:12,032][18366] Num frames 9200... [2023-02-23 11:19:12,147][18366] Num frames 9300... [2023-02-23 11:19:12,263][18366] Num frames 9400... [2023-02-23 11:19:12,379][18366] Num frames 9500... [2023-02-23 11:19:12,495][18366] Num frames 9600... [2023-02-23 11:19:12,598][18366] Avg episode rewards: #0: 28.426, true rewards: #0: 12.051 [2023-02-23 11:19:12,600][18366] Avg episode reward: 28.426, avg true_objective: 12.051 [2023-02-23 11:19:12,667][18366] Num frames 9700... [2023-02-23 11:19:12,787][18366] Num frames 9800... [2023-02-23 11:19:12,902][18366] Num frames 9900... [2023-02-23 11:19:13,014][18366] Num frames 10000... [2023-02-23 11:19:13,129][18366] Num frames 10100... [2023-02-23 11:19:13,242][18366] Num frames 10200... [2023-02-23 11:19:13,357][18366] Num frames 10300... [2023-02-23 11:19:13,468][18366] Num frames 10400... [2023-02-23 11:19:13,581][18366] Num frames 10500... [2023-02-23 11:19:13,704][18366] Num frames 10600... [2023-02-23 11:19:13,819][18366] Num frames 10700... [2023-02-23 11:19:13,930][18366] Num frames 10800... [2023-02-23 11:19:14,047][18366] Num frames 10900... [2023-02-23 11:19:14,162][18366] Num frames 11000... [2023-02-23 11:19:14,276][18366] Num frames 11100... 
[2023-02-23 11:19:14,370][18366] Avg episode rewards: #0: 29.814, true rewards: #0: 12.370 [2023-02-23 11:19:14,371][18366] Avg episode reward: 29.814, avg true_objective: 12.370 [2023-02-23 11:19:14,448][18366] Num frames 11200... [2023-02-23 11:19:14,560][18366] Num frames 11300... [2023-02-23 11:19:14,681][18366] Num frames 11400... [2023-02-23 11:19:14,799][18366] Num frames 11500... [2023-02-23 11:19:14,917][18366] Num frames 11600... [2023-02-23 11:19:15,031][18366] Num frames 11700... [2023-02-23 11:19:15,147][18366] Num frames 11800... [2023-02-23 11:19:15,260][18366] Num frames 11900... [2023-02-23 11:19:15,373][18366] Num frames 12000... [2023-02-23 11:19:15,465][18366] Avg episode rewards: #0: 28.432, true rewards: #0: 12.032 [2023-02-23 11:19:15,466][18366] Avg episode reward: 28.432, avg true_objective: 12.032 [2023-02-23 11:20:25,051][18366] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
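The second evaluation pass was launched with push_to_hub=True and hf_repository='keshan/rl_course_vizdoom_health_gathering_supreme', so once replay.mp4 is saved the experiment directory (config, latest checkpoint, video) is uploaded to the Hugging Face Hub. A rough sketch of what that upload amounts to, using huggingface_hub directly; Sample Factory's own helper also generates a model card and may differ in details:

```python
from huggingface_hub import HfApi

def push_experiment(train_dir, repo_id):
    """Upload a finished experiment directory (config.json, checkpoint_p0/, replay.mp4)."""
    api = HfApi()
    api.create_repo(repo_id=repo_id, exist_ok=True)   # no-op if the repo already exists
    api.upload_folder(
        repo_id=repo_id,
        folder_path=train_dir,
        commit_message="Upload trained VizDoom policy and replay video",
    )

# push_experiment("/content/train_dir/default_experiment",
#                 "keshan/rl_course_vizdoom_health_gathering_supreme")
```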