[2023-02-23 00:31:49,359][05422] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-23 00:31:49,362][05422] Rollout worker 0 uses device cpu
[2023-02-23 00:31:49,364][05422] Rollout worker 1 uses device cpu
[2023-02-23 00:31:49,366][05422] Rollout worker 2 uses device cpu
[2023-02-23 00:31:49,367][05422] Rollout worker 3 uses device cpu
[2023-02-23 00:31:49,369][05422] Rollout worker 4 uses device cpu
[2023-02-23 00:31:49,370][05422] Rollout worker 5 uses device cpu
[2023-02-23 00:31:49,371][05422] Rollout worker 6 uses device cpu
[2023-02-23 00:31:49,373][05422] Rollout worker 7 uses device cpu
[2023-02-23 00:31:49,553][05422] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:31:49,555][05422] InferenceWorker_p0-w0: min num requests: 2
[2023-02-23 00:31:49,586][05422] Starting all processes...
[2023-02-23 00:31:49,588][05422] Starting process learner_proc0
[2023-02-23 00:31:49,641][05422] Starting all processes...
[2023-02-23 00:31:49,652][05422] Starting process inference_proc0-0
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc0
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc1
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc2
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc3
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc4
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc5
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc6
[2023-02-23 00:31:49,654][05422] Starting process rollout_proc7
[2023-02-23 00:32:01,167][11215] Worker 0 uses CPU cores [0]
[2023-02-23 00:32:01,315][11223] Worker 7 uses CPU cores [1]
[2023-02-23 00:32:01,400][11219] Worker 3 uses CPU cores [1]
[2023-02-23 00:32:01,406][11201] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:32:01,406][11201] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-23 00:32:01,410][11222] Worker 6 uses CPU cores [0]
[2023-02-23 00:32:01,465][11221] Worker 5 uses CPU cores [1]
[2023-02-23 00:32:01,478][11217] Worker 1 uses CPU cores [1]
[2023-02-23 00:32:01,488][11220] Worker 4 uses CPU cores [0]
[2023-02-23 00:32:01,535][11216] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:32:01,535][11216] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-23 00:32:01,561][11218] Worker 2 uses CPU cores [0]
[2023-02-23 00:32:02,049][11201] Num visible devices: 1
[2023-02-23 00:32:02,049][11216] Num visible devices: 1
[2023-02-23 00:32:02,059][11201] Starting seed is not provided
[2023-02-23 00:32:02,059][11201] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:32:02,059][11201] Initializing actor-critic model on device cuda:0
[2023-02-23 00:32:02,060][11201] RunningMeanStd input shape: (3, 72, 128)
[2023-02-23 00:32:02,061][11201] RunningMeanStd input shape: (1,)
[2023-02-23 00:32:02,073][11201] ConvEncoder: input_channels=3
[2023-02-23 00:32:02,335][11201] Conv encoder output size: 512
[2023-02-23 00:32:02,336][11201] Policy head output size: 512
[2023-02-23 00:32:02,381][11201] Created Actor Critic model with architecture:
[2023-02-23 00:32:02,381][11201] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-23 00:32:08,692][11201] Using optimizer
[2023-02-23 00:32:08,694][11201] No checkpoints found
[2023-02-23 00:32:08,694][11201] Did not load from checkpoint, starting from scratch!
[2023-02-23 00:32:08,695][11201] Initialized policy 0 weights for model version 0
[2023-02-23 00:32:08,698][11201] LearnerWorker_p0 finished initialization!
[2023-02-23 00:32:08,699][11201] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:32:08,916][11216] RunningMeanStd input shape: (3, 72, 128)
[2023-02-23 00:32:08,917][11216] RunningMeanStd input shape: (1,)
[2023-02-23 00:32:08,929][11216] ConvEncoder: input_channels=3
[2023-02-23 00:32:09,023][11216] Conv encoder output size: 512
[2023-02-23 00:32:09,024][11216] Policy head output size: 512
[2023-02-23 00:32:09,546][05422] Heartbeat connected on Batcher_0
[2023-02-23 00:32:09,550][05422] Heartbeat connected on LearnerWorker_p0
[2023-02-23 00:32:09,565][05422] Heartbeat connected on RolloutWorker_w0
[2023-02-23 00:32:09,569][05422] Heartbeat connected on RolloutWorker_w1
[2023-02-23 00:32:09,573][05422] Heartbeat connected on RolloutWorker_w2
[2023-02-23 00:32:09,578][05422] Heartbeat connected on RolloutWorker_w3
[2023-02-23 00:32:09,579][05422] Heartbeat connected on RolloutWorker_w5
[2023-02-23 00:32:09,582][05422] Heartbeat connected on RolloutWorker_w4
[2023-02-23 00:32:09,585][05422] Heartbeat connected on RolloutWorker_w6
[2023-02-23 00:32:09,589][05422] Heartbeat connected on RolloutWorker_w7
[2023-02-23 00:32:10,488][05422] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:32:11,793][05422] Inference worker 0-0 is ready!
[2023-02-23 00:32:11,798][05422] All inference workers are ready! Signal rollout workers to start!
[2023-02-23 00:32:11,800][05422] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-23 00:32:11,907][11220] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:11,939][11222] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:11,963][11215] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:11,970][11218] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:12,038][11223] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:12,067][11221] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:12,082][11217] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:12,213][11219] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:32:13,698][11223] Decorrelating experience for 0 frames...
[2023-02-23 00:32:13,699][11221] Decorrelating experience for 0 frames...
[2023-02-23 00:32:13,707][11222] Decorrelating experience for 0 frames...
[2023-02-23 00:32:13,707][11220] Decorrelating experience for 0 frames...
[2023-02-23 00:32:13,722][11215] Decorrelating experience for 0 frames...
[2023-02-23 00:32:13,745][11218] Decorrelating experience for 0 frames...
[2023-02-23 00:32:14,804][11223] Decorrelating experience for 32 frames...
[2023-02-23 00:32:14,810][11221] Decorrelating experience for 32 frames...
[2023-02-23 00:32:14,938][11219] Decorrelating experience for 0 frames...
[2023-02-23 00:32:15,271][11220] Decorrelating experience for 32 frames...
[2023-02-23 00:32:15,276][11222] Decorrelating experience for 32 frames...
[2023-02-23 00:32:15,281][11215] Decorrelating experience for 32 frames...
[2023-02-23 00:32:15,488][05422] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:32:16,252][11219] Decorrelating experience for 32 frames...
[2023-02-23 00:32:16,264][11217] Decorrelating experience for 0 frames...
[2023-02-23 00:32:16,325][11218] Decorrelating experience for 32 frames...
[2023-02-23 00:32:16,395][11223] Decorrelating experience for 64 frames...
[2023-02-23 00:32:16,562][11215] Decorrelating experience for 64 frames...
[2023-02-23 00:32:17,203][11217] Decorrelating experience for 32 frames...
[2023-02-23 00:32:17,403][11223] Decorrelating experience for 96 frames...
[2023-02-23 00:32:17,774][11222] Decorrelating experience for 64 frames...
[2023-02-23 00:32:17,880][11220] Decorrelating experience for 64 frames...
[2023-02-23 00:32:18,067][11218] Decorrelating experience for 64 frames...
[2023-02-23 00:32:18,157][11221] Decorrelating experience for 64 frames...
[2023-02-23 00:32:18,271][11215] Decorrelating experience for 96 frames...
[2023-02-23 00:32:18,770][11217] Decorrelating experience for 64 frames...
[2023-02-23 00:32:18,850][11219] Decorrelating experience for 64 frames...
[2023-02-23 00:32:19,265][11217] Decorrelating experience for 96 frames...
[2023-02-23 00:32:19,475][11220] Decorrelating experience for 96 frames...
[2023-02-23 00:32:19,573][11222] Decorrelating experience for 96 frames...
[2023-02-23 00:32:19,837][11219] Decorrelating experience for 96 frames...
[2023-02-23 00:32:20,153][11221] Decorrelating experience for 96 frames...
[2023-02-23 00:32:20,359][11218] Decorrelating experience for 96 frames...
[2023-02-23 00:32:20,488][05422] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:32:23,539][11201] Signal inference workers to stop experience collection...
[2023-02-23 00:32:23,562][11216] InferenceWorker_p0-w0: stopping experience collection
[2023-02-23 00:32:25,488][05422] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 148.5. Samples: 2228. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:32:25,490][05422] Avg episode reward: [(0, '2.063')]
[2023-02-23 00:32:26,047][11201] Signal inference workers to resume experience collection...
[2023-02-23 00:32:26,049][11216] InferenceWorker_p0-w0: resuming experience collection
[2023-02-23 00:32:30,488][05422] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 214.1. Samples: 4282. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-02-23 00:32:30,493][05422] Avg episode reward: [(0, '3.157')]
[2023-02-23 00:32:35,488][05422] Fps is (10 sec: 3276.8, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 259.7. Samples: 6492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-23 00:32:35,490][05422] Avg episode reward: [(0, '3.782')]
[2023-02-23 00:32:37,105][11216] Updated weights for policy 0, policy_version 10 (0.0012)
[2023-02-23 00:32:40,488][05422] Fps is (10 sec: 3686.4, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 53248. Throughput: 0: 424.9. Samples: 12748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:32:40,490][05422] Avg episode reward: [(0, '4.381')]
[2023-02-23 00:32:45,491][05422] Fps is (10 sec: 4504.3, 60 sec: 2223.4, 300 sec: 2223.4). Total num frames: 77824. Throughput: 0: 573.8. Samples: 20086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:32:45,498][05422] Avg episode reward: [(0, '4.412')]
[2023-02-23 00:32:45,755][11216] Updated weights for policy 0, policy_version 20 (0.0015)
[2023-02-23 00:32:50,490][05422] Fps is (10 sec: 4095.2, 60 sec: 2355.1, 300 sec: 2355.1). Total num frames: 94208. Throughput: 0: 563.2. Samples: 22530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:32:50,496][05422] Avg episode reward: [(0, '4.195')]
[2023-02-23 00:32:55,488][05422] Fps is (10 sec: 3277.7, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 110592. Throughput: 0: 601.8. Samples: 27082. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-23 00:32:55,491][05422] Avg episode reward: [(0, '4.247')]
[2023-02-23 00:32:55,497][11201] Saving new best policy, reward=4.247!
[2023-02-23 00:32:57,855][11216] Updated weights for policy 0, policy_version 30 (0.0014)
[2023-02-23 00:33:00,488][05422] Fps is (10 sec: 4096.7, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 748.2. Samples: 33670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:33:00,496][05422] Avg episode reward: [(0, '4.374')]
[2023-02-23 00:33:00,499][11201] Saving new best policy, reward=4.374!
[2023-02-23 00:33:05,499][05422] Fps is (10 sec: 4500.7, 60 sec: 2829.4, 300 sec: 2829.4). Total num frames: 155648. Throughput: 0: 827.9. Samples: 37266. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-23 00:33:05,507][05422] Avg episode reward: [(0, '4.275')]
[2023-02-23 00:33:07,245][11216] Updated weights for policy 0, policy_version 40 (0.0015)
[2023-02-23 00:33:10,488][05422] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 898.7. Samples: 42668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:33:10,491][05422] Avg episode reward: [(0, '4.472')]
[2023-02-23 00:33:10,493][11201] Saving new best policy, reward=4.472!
[2023-02-23 00:33:15,488][05422] Fps is (10 sec: 3280.4, 60 sec: 3140.3, 300 sec: 2898.7). Total num frames: 188416. Throughput: 0: 959.1. Samples: 47442. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:33:15,490][05422] Avg episode reward: [(0, '4.576')]
[2023-02-23 00:33:15,497][11201] Saving new best policy, reward=4.576!
[2023-02-23 00:33:18,395][11216] Updated weights for policy 0, policy_version 50 (0.0025)
[2023-02-23 00:33:20,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 986.8. Samples: 50898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:33:20,490][05422] Avg episode reward: [(0, '4.417')]
[2023-02-23 00:33:25,488][05422] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3112.9). Total num frames: 233472. Throughput: 0: 1010.6. Samples: 58226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:33:25,490][05422] Avg episode reward: [(0, '4.368')]
[2023-02-23 00:33:28,194][11216] Updated weights for policy 0, policy_version 60 (0.0040)
[2023-02-23 00:33:30,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 953.9. Samples: 63010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:33:30,493][05422] Avg episode reward: [(0, '4.546')]
[2023-02-23 00:33:35,488][05422] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 951.6. Samples: 65350. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-23 00:33:35,489][05422] Avg episode reward: [(0, '4.477')]
[2023-02-23 00:33:39,020][11216] Updated weights for policy 0, policy_version 70 (0.0012)
[2023-02-23 00:33:40,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3231.3). Total num frames: 290816. Throughput: 0: 998.8. Samples: 72028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:33:40,496][05422] Avg episode reward: [(0, '4.382')]
[2023-02-23 00:33:45,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.7, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 1007.9. Samples: 79026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:33:45,494][05422] Avg episode reward: [(0, '4.430')]
[2023-02-23 00:33:45,510][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth...
[2023-02-23 00:33:49,562][11216] Updated weights for policy 0, policy_version 80 (0.0022)
[2023-02-23 00:33:50,489][05422] Fps is (10 sec: 3686.0, 60 sec: 3891.3, 300 sec: 3276.8). Total num frames: 327680. Throughput: 0: 976.9. Samples: 81216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:33:50,494][05422] Avg episode reward: [(0, '4.358')]
[2023-02-23 00:33:55,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 959.4. Samples: 85840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:33:55,494][05422] Avg episode reward: [(0, '4.352')]
[2023-02-23 00:33:59,485][11216] Updated weights for policy 0, policy_version 90 (0.0011)
[2023-02-23 00:34:00,488][05422] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 3388.5). Total num frames: 372736. Throughput: 0: 1012.9. Samples: 93024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:34:00,490][05422] Avg episode reward: [(0, '4.369')]
[2023-02-23 00:34:05,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3960.2, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 1018.1. Samples: 96714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:34:05,499][05422] Avg episode reward: [(0, '4.411')]
[2023-02-23 00:34:10,448][11216] Updated weights for policy 0, policy_version 100 (0.0032)
[2023-02-23 00:34:10,488][05422] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 964.9. Samples: 101648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:34:10,493][05422] Avg episode reward: [(0, '4.520')]
[2023-02-23 00:34:15,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3407.9). Total num frames: 425984. Throughput: 0: 976.7. Samples: 106962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:34:15,494][05422] Avg episode reward: [(0, '4.604')]
[2023-02-23 00:34:15,499][11201] Saving new best policy, reward=4.604!
[2023-02-23 00:34:19,961][11216] Updated weights for policy 0, policy_version 110 (0.0020)
[2023-02-23 00:34:20,488][05422] Fps is (10 sec: 4096.3, 60 sec: 3959.5, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 1004.7. Samples: 110560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:34:20,493][05422] Avg episode reward: [(0, '4.377')]
[2023-02-23 00:34:25,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3489.2). Total num frames: 471040. Throughput: 0: 1010.4. Samples: 117494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:34:25,494][05422] Avg episode reward: [(0, '4.520')]
[2023-02-23 00:34:30,488][05422] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3481.6). Total num frames: 487424. Throughput: 0: 955.6. Samples: 122026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:34:30,490][05422] Avg episode reward: [(0, '4.542')]
[2023-02-23 00:34:31,594][11216] Updated weights for policy 0, policy_version 120 (0.0033)
[2023-02-23 00:34:35,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3474.5). Total num frames: 503808. Throughput: 0: 958.4. Samples: 124342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:34:35,494][05422] Avg episode reward: [(0, '4.557')]
[2023-02-23 00:34:40,488][05422] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3522.6). Total num frames: 528384. Throughput: 0: 1011.8. Samples: 131370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:34:40,495][05422] Avg episode reward: [(0, '4.442')]
[2023-02-23 00:34:40,766][11216] Updated weights for policy 0, policy_version 130 (0.0025)
[2023-02-23 00:34:45,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3541.1). Total num frames: 548864. Throughput: 0: 995.8. Samples: 137836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:34:45,493][05422] Avg episode reward: [(0, '4.619')]
[2023-02-23 00:34:45,506][11201] Saving new best policy, reward=4.619!
[2023-02-23 00:34:50,489][05422] Fps is (10 sec: 3686.0, 60 sec: 3959.5, 300 sec: 3532.8). Total num frames: 565248. Throughput: 0: 963.6. Samples: 140078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:34:50,491][05422] Avg episode reward: [(0, '4.621')]
[2023-02-23 00:34:50,497][11201] Saving new best policy, reward=4.621!
[2023-02-23 00:34:52,760][11216] Updated weights for policy 0, policy_version 140 (0.0034)
[2023-02-23 00:34:55,488][05422] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3549.9). Total num frames: 585728. Throughput: 0: 966.3. Samples: 145132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:34:55,496][05422] Avg episode reward: [(0, '4.546')]
[2023-02-23 00:35:00,488][05422] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3565.9). Total num frames: 606208. Throughput: 0: 1007.3. Samples: 152290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:35:00,495][05422] Avg episode reward: [(0, '4.507')]
[2023-02-23 00:35:01,334][11216] Updated weights for policy 0, policy_version 150 (0.0015)
[2023-02-23 00:35:05,489][05422] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3581.0). Total num frames: 626688. Throughput: 0: 1005.4. Samples: 155804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:35:05,501][05422] Avg episode reward: [(0, '4.651')]
[2023-02-23 00:35:05,518][11201] Saving new best policy, reward=4.651!
[2023-02-23 00:35:10,488][05422] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 950.2. Samples: 160252. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:35:10,495][05422] Avg episode reward: [(0, '4.751')]
[2023-02-23 00:35:10,496][11201] Saving new best policy, reward=4.751!
[2023-02-23 00:35:13,208][11216] Updated weights for policy 0, policy_version 160 (0.0021)
[2023-02-23 00:35:15,488][05422] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 3586.8). Total num frames: 663552. Throughput: 0: 983.3. Samples: 166274. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:35:15,495][05422] Avg episode reward: [(0, '4.696')]
[2023-02-23 00:35:20,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3621.7). Total num frames: 688128. Throughput: 0: 1012.9. Samples: 169924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:35:20,490][05422] Avg episode reward: [(0, '4.607')]
[2023-02-23 00:35:21,694][11216] Updated weights for policy 0, policy_version 170 (0.0019)
[2023-02-23 00:35:25,488][05422] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3612.9). Total num frames: 704512. Throughput: 0: 999.3. Samples: 176338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:35:25,495][05422] Avg episode reward: [(0, '4.675')]
[2023-02-23 00:35:30,488][05422] Fps is (10 sec: 3276.5, 60 sec: 3891.2, 300 sec: 3604.5). Total num frames: 720896. Throughput: 0: 956.0. Samples: 180856. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:35:30,493][05422] Avg episode reward: [(0, '4.637')]
[2023-02-23 00:35:33,682][11216] Updated weights for policy 0, policy_version 180 (0.0019)
[2023-02-23 00:35:35,488][05422] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3636.4). Total num frames: 745472. Throughput: 0: 973.8. Samples: 183900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:35:35,495][05422] Avg episode reward: [(0, '4.954')]
[2023-02-23 00:35:35,506][11201] Saving new best policy, reward=4.954!
[2023-02-23 00:35:40,488][05422] Fps is (10 sec: 4506.0, 60 sec: 3959.5, 300 sec: 3647.4). Total num frames: 765952. Throughput: 0: 1023.1. Samples: 191172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:35:40,494][05422] Avg episode reward: [(0, '5.142')]
[2023-02-23 00:35:40,501][11201] Saving new best policy, reward=5.142!
[2023-02-23 00:35:42,661][11216] Updated weights for policy 0, policy_version 190 (0.0015)
[2023-02-23 00:35:45,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3657.8). Total num frames: 786432. Throughput: 0: 991.7. Samples: 196916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:35:45,493][05422] Avg episode reward: [(0, '5.108')]
[2023-02-23 00:35:45,510][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth...
[2023-02-23 00:35:50,489][05422] Fps is (10 sec: 3276.5, 60 sec: 3891.2, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 963.5. Samples: 199160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:35:50,493][05422] Avg episode reward: [(0, '5.375')]
[2023-02-23 00:35:50,500][11201] Saving new best policy, reward=5.375!
[2023-02-23 00:35:54,078][11216] Updated weights for policy 0, policy_version 200 (0.0020)
[2023-02-23 00:35:55,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3659.1). Total num frames: 823296. Throughput: 0: 995.6. Samples: 205056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-23 00:35:55,494][05422] Avg episode reward: [(0, '5.607')]
[2023-02-23 00:35:55,507][11201] Saving new best policy, reward=5.607!
[2023-02-23 00:36:00,492][05422] Fps is (10 sec: 4913.3, 60 sec: 4027.4, 300 sec: 3686.3). Total num frames: 847872. Throughput: 0: 1021.7. Samples: 212254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:36:00,503][05422] Avg episode reward: [(0, '5.957')]
[2023-02-23 00:36:00,508][11201] Saving new best policy, reward=5.957!
[2023-02-23 00:36:03,691][11216] Updated weights for policy 0, policy_version 210 (0.0013)
[2023-02-23 00:36:05,488][05422] Fps is (10 sec: 4095.8, 60 sec: 3959.5, 300 sec: 3677.7). Total num frames: 864256. Throughput: 0: 1000.2. Samples: 214932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:36:05,491][05422] Avg episode reward: [(0, '6.104')]
[2023-02-23 00:36:05,504][11201] Saving new best policy, reward=6.104!
[2023-02-23 00:36:10,488][05422] Fps is (10 sec: 3278.3, 60 sec: 3959.5, 300 sec: 3669.3). Total num frames: 880640. Throughput: 0: 959.5. Samples: 219514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:36:10,493][05422] Avg episode reward: [(0, '6.021')]
[2023-02-23 00:36:14,516][11216] Updated weights for policy 0, policy_version 220 (0.0018)
[2023-02-23 00:36:15,488][05422] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3694.8). Total num frames: 905216. Throughput: 0: 1009.1. Samples: 226264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:36:15,490][05422] Avg episode reward: [(0, '6.272')]
[2023-02-23 00:36:15,502][11201] Saving new best policy, reward=6.272!
[2023-02-23 00:36:20,488][05422] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3702.8). Total num frames: 925696. Throughput: 0: 1020.8. Samples: 229836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:36:20,494][05422] Avg episode reward: [(0, '6.347')]
[2023-02-23 00:36:20,526][11201] Saving new best policy, reward=6.347!
[2023-02-23 00:36:24,524][11216] Updated weights for policy 0, policy_version 230 (0.0012)
[2023-02-23 00:36:25,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3694.4). Total num frames: 942080. Throughput: 0: 985.8. Samples: 235532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:36:25,493][05422] Avg episode reward: [(0, '6.051')]
[2023-02-23 00:36:30,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 958464. Throughput: 0: 960.1. Samples: 240122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:36:30,494][05422] Avg episode reward: [(0, '5.714')]
[2023-02-23 00:36:34,909][11216] Updated weights for policy 0, policy_version 240 (0.0012)
[2023-02-23 00:36:35,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3709.6). Total num frames: 983040. Throughput: 0: 987.2. Samples: 243582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:36:35,495][05422] Avg episode reward: [(0, '5.960')]
[2023-02-23 00:36:40,488][05422] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 3731.9). Total num frames: 1007616. Throughput: 0: 1023.1. Samples: 251094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:36:40,490][05422] Avg episode reward: [(0, '6.455')]
[2023-02-23 00:36:40,497][11201] Saving new best policy, reward=6.455!
[2023-02-23 00:36:45,067][11216] Updated weights for policy 0, policy_version 250 (0.0014)
[2023-02-23 00:36:45,490][05422] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 3723.6). Total num frames: 1024000. Throughput: 0: 977.9. Samples: 256256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:36:45,493][05422] Avg episode reward: [(0, '6.378')]
[2023-02-23 00:36:50,488][05422] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 3715.7). Total num frames: 1040384. Throughput: 0: 967.9. Samples: 258488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:36:50,491][05422] Avg episode reward: [(0, '6.737')]
[2023-02-23 00:36:50,497][11201] Saving new best policy, reward=6.737!
[2023-02-23 00:36:55,171][11216] Updated weights for policy 0, policy_version 260 (0.0017)
[2023-02-23 00:36:55,488][05422] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3736.7). Total num frames: 1064960. Throughput: 0: 1011.8. Samples: 265044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:36:55,491][05422] Avg episode reward: [(0, '6.783')]
[2023-02-23 00:36:55,505][11201] Saving new best policy, reward=6.783!
[2023-02-23 00:37:00,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.8, 300 sec: 3742.9). Total num frames: 1085440. Throughput: 0: 1019.7. Samples: 272152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-23 00:37:00,493][05422] Avg episode reward: [(0, '6.669')]
[2023-02-23 00:37:05,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 991.0. Samples: 274430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:37:05,493][05422] Avg episode reward: [(0, '7.199')]
[2023-02-23 00:37:05,503][11201] Saving new best policy, reward=7.199!
[2023-02-23 00:37:06,276][11216] Updated weights for policy 0, policy_version 270 (0.0012)
[2023-02-23 00:37:10,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 967.2. Samples: 279056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:37:10,491][05422] Avg episode reward: [(0, '7.036')]
[2023-02-23 00:37:15,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 1022.8. Samples: 286146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:37:15,490][05422] Avg episode reward: [(0, '6.957')]
[2023-02-23 00:37:15,823][11216] Updated weights for policy 0, policy_version 280 (0.0029)
[2023-02-23 00:37:20,489][05422] Fps is (10 sec: 4914.4, 60 sec: 4027.6, 300 sec: 3957.1). Total num frames: 1167360. Throughput: 0: 1026.9. Samples: 289794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:37:20,495][05422] Avg episode reward: [(0, '6.942')]
[2023-02-23 00:37:25,491][05422] Fps is (10 sec: 3685.1, 60 sec: 3959.2, 300 sec: 3943.2). Total num frames: 1179648. Throughput: 0: 977.7. Samples: 295092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:37:25,497][05422] Avg episode reward: [(0, '7.381')]
[2023-02-23 00:37:25,511][11201] Saving new best policy, reward=7.381!
[2023-02-23 00:37:27,081][11216] Updated weights for policy 0, policy_version 290 (0.0020)
[2023-02-23 00:37:30,488][05422] Fps is (10 sec: 3277.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1200128. Throughput: 0: 969.9. Samples: 299898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:37:30,494][05422] Avg episode reward: [(0, '7.528')]
[2023-02-23 00:37:30,499][11201] Saving new best policy, reward=7.528!
[2023-02-23 00:37:35,488][05422] Fps is (10 sec: 4507.2, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1224704. Throughput: 0: 1001.2. Samples: 303542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:37:35,490][05422] Avg episode reward: [(0, '8.936')]
[2023-02-23 00:37:35,502][11201] Saving new best policy, reward=8.936!
[2023-02-23 00:37:36,302][11216] Updated weights for policy 0, policy_version 300 (0.0017)
[2023-02-23 00:37:40,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1245184. Throughput: 0: 1018.3. Samples: 310866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-23 00:37:40,493][05422] Avg episode reward: [(0, '9.182')]
[2023-02-23 00:37:40,498][11201] Saving new best policy, reward=9.182!
[2023-02-23 00:37:45,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3957.2). Total num frames: 1261568. Throughput: 0: 965.8. Samples: 315614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:37:45,497][05422] Avg episode reward: [(0, '9.474')]
[2023-02-23 00:37:45,511][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth...
[2023-02-23 00:37:45,644][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth
[2023-02-23 00:37:45,662][11201] Saving new best policy, reward=9.474!
[2023-02-23 00:37:47,925][11216] Updated weights for policy 0, policy_version 310 (0.0024)
[2023-02-23 00:37:50,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1277952. Throughput: 0: 964.8. Samples: 317844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:37:50,498][05422] Avg episode reward: [(0, '8.774')]
[2023-02-23 00:37:55,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1302528. Throughput: 0: 1016.0. Samples: 324776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:37:55,495][05422] Avg episode reward: [(0, '8.957')]
[2023-02-23 00:37:56,645][11216] Updated weights for policy 0, policy_version 320 (0.0020)
[2023-02-23 00:38:00,491][05422] Fps is (10 sec: 4504.2, 60 sec: 3959.3, 300 sec: 3957.3). Total num frames: 1323008. Throughput: 0: 1007.8. Samples: 331502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:38:00,493][05422] Avg episode reward: [(0, '8.858')]
[2023-02-23 00:38:05,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1339392. Throughput: 0: 979.5. Samples: 333868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:38:05,493][05422] Avg episode reward: [(0, '8.496')]
[2023-02-23 00:38:08,601][11216] Updated weights for policy 0, policy_version 330 (0.0025)
[2023-02-23 00:38:10,488][05422] Fps is (10 sec: 3687.5, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1359872. Throughput: 0: 969.6. Samples: 338722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:38:10,490][05422] Avg episode reward: [(0, '8.330')]
[2023-02-23 00:38:15,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1384448. Throughput: 0: 1026.4. Samples: 346088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:38:15,490][05422] Avg episode reward: [(0, '8.186')]
[2023-02-23 00:38:16,936][11216] Updated weights for policy 0, policy_version 340 (0.0012)
[2023-02-23 00:38:20,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 1404928. Throughput: 0: 1026.8. Samples: 349748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-23 00:38:20,490][05422] Avg episode reward: [(0, '8.229')]
[2023-02-23 00:38:25,488][05422] Fps is (10 sec: 3686.4, 60 sec: 4028.0, 300 sec: 3971.0). Total num frames: 1421312. Throughput: 0: 971.2. Samples: 354568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:38:25,494][05422] Avg episode reward: [(0, '8.576')]
[2023-02-23 00:38:29,049][11216] Updated weights for policy 0, policy_version 350 (0.0021)
[2023-02-23 00:38:30,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1437696. Throughput: 0: 985.1. Samples: 359942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:38:30,490][05422] Avg episode reward: [(0, '9.087')]
[2023-02-23 00:38:35,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1462272. Throughput: 0: 1015.7. Samples: 363552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:38:35,490][05422] Avg episode reward: [(0, '10.211')]
[2023-02-23 00:38:35,502][11201] Saving new best policy, reward=10.211!
[2023-02-23 00:38:37,517][11216] Updated weights for policy 0, policy_version 360 (0.0013)
[2023-02-23 00:38:40,491][05422] Fps is (10 sec: 4504.3, 60 sec: 3959.3, 300 sec: 3957.1). Total num frames: 1482752. Throughput: 0: 1015.9. Samples: 370496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-23 00:38:40,495][05422] Avg episode reward: [(0, '9.502')]
[2023-02-23 00:38:45,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 1499136. Throughput: 0: 968.0. Samples: 375060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:38:45,493][05422] Avg episode reward: [(0, '9.801')]
[2023-02-23 00:38:49,731][11216] Updated weights for policy 0, policy_version 370 (0.0022)
[2023-02-23 00:38:50,488][05422] Fps is (10 sec: 3277.7, 60 sec: 3959.5, 300 sec: 3957.1). Total num frames: 1515520. Throughput: 0: 967.6. Samples: 377408.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:38:50,493][05422] Avg episode reward: [(0, '9.484')] [2023-02-23 00:38:55,488][05422] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 1540096. Throughput: 0: 1019.0. Samples: 384578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:38:55,491][05422] Avg episode reward: [(0, '10.178')] [2023-02-23 00:38:58,013][11216] Updated weights for policy 0, policy_version 380 (0.0020) [2023-02-23 00:39:00,488][05422] Fps is (10 sec: 4505.7, 60 sec: 3959.7, 300 sec: 3957.2). Total num frames: 1560576. Throughput: 0: 1000.4. Samples: 391106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:39:00,491][05422] Avg episode reward: [(0, '9.981')] [2023-02-23 00:39:05,490][05422] Fps is (10 sec: 3685.8, 60 sec: 3959.3, 300 sec: 3957.1). Total num frames: 1576960. Throughput: 0: 969.4. Samples: 393372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:39:05,492][05422] Avg episode reward: [(0, '9.917')] [2023-02-23 00:39:10,109][11216] Updated weights for policy 0, policy_version 390 (0.0028) [2023-02-23 00:39:10,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1597440. Throughput: 0: 974.7. Samples: 398430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:39:10,490][05422] Avg episode reward: [(0, '10.314')] [2023-02-23 00:39:10,492][11201] Saving new best policy, reward=10.314! [2023-02-23 00:39:15,488][05422] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1622016. Throughput: 0: 1019.5. Samples: 405818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:39:15,490][05422] Avg episode reward: [(0, '10.825')] [2023-02-23 00:39:15,499][11201] Saving new best policy, reward=10.825! [2023-02-23 00:39:18,731][11216] Updated weights for policy 0, policy_version 400 (0.0012) [2023-02-23 00:39:20,488][05422] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3971.0). 
Total num frames: 1642496. Throughput: 0: 1019.0. Samples: 409408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:39:20,490][05422] Avg episode reward: [(0, '10.868')] [2023-02-23 00:39:20,493][11201] Saving new best policy, reward=10.868! [2023-02-23 00:39:25,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1654784. Throughput: 0: 967.8. Samples: 414044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:39:25,493][05422] Avg episode reward: [(0, '11.587')] [2023-02-23 00:39:25,513][11201] Saving new best policy, reward=11.587! [2023-02-23 00:39:30,488][05422] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1675264. Throughput: 0: 990.6. Samples: 419638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:39:30,490][05422] Avg episode reward: [(0, '10.919')] [2023-02-23 00:39:30,504][11216] Updated weights for policy 0, policy_version 410 (0.0031) [2023-02-23 00:39:35,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1699840. Throughput: 0: 1017.4. Samples: 423192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:39:35,490][05422] Avg episode reward: [(0, '11.477')] [2023-02-23 00:39:39,961][11216] Updated weights for policy 0, policy_version 420 (0.0012) [2023-02-23 00:39:40,489][05422] Fps is (10 sec: 4505.1, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 1720320. Throughput: 0: 1005.7. Samples: 429836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:39:40,494][05422] Avg episode reward: [(0, '11.722')] [2023-02-23 00:39:40,501][11201] Saving new best policy, reward=11.722! [2023-02-23 00:39:45,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 1736704. Throughput: 0: 961.0. Samples: 434352. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:39:45,492][05422] Avg episode reward: [(0, '12.240')] [2023-02-23 00:39:45,503][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth... [2023-02-23 00:39:45,635][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth [2023-02-23 00:39:45,649][11201] Saving new best policy, reward=12.240! [2023-02-23 00:39:50,488][05422] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1757184. Throughput: 0: 965.8. Samples: 436830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:39:50,492][05422] Avg episode reward: [(0, '12.671')] [2023-02-23 00:39:50,495][11201] Saving new best policy, reward=12.671! [2023-02-23 00:39:51,275][11216] Updated weights for policy 0, policy_version 430 (0.0021) [2023-02-23 00:39:55,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 1781760. Throughput: 0: 1012.2. Samples: 443980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:39:55,490][05422] Avg episode reward: [(0, '13.277')] [2023-02-23 00:39:55,501][11201] Saving new best policy, reward=13.277! [2023-02-23 00:40:00,489][05422] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1798144. Throughput: 0: 985.1. Samples: 450148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:40:00,494][05422] Avg episode reward: [(0, '13.504')] [2023-02-23 00:40:00,504][11201] Saving new best policy, reward=13.504! [2023-02-23 00:40:01,156][11216] Updated weights for policy 0, policy_version 440 (0.0013) [2023-02-23 00:40:05,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 1814528. Throughput: 0: 954.6. Samples: 452366. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:40:05,490][05422] Avg episode reward: [(0, '13.061')] [2023-02-23 00:40:10,488][05422] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1835008. Throughput: 0: 971.9. Samples: 457778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:40:10,494][05422] Avg episode reward: [(0, '13.560')] [2023-02-23 00:40:10,500][11201] Saving new best policy, reward=13.560! [2023-02-23 00:40:11,713][11216] Updated weights for policy 0, policy_version 450 (0.0025) [2023-02-23 00:40:15,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1859584. Throughput: 0: 1011.8. Samples: 465170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:40:15,494][05422] Avg episode reward: [(0, '14.121')] [2023-02-23 00:40:15,504][11201] Saving new best policy, reward=14.121! [2023-02-23 00:40:20,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1875968. Throughput: 0: 1004.6. Samples: 468398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:40:20,490][05422] Avg episode reward: [(0, '14.239')] [2023-02-23 00:40:20,508][11201] Saving new best policy, reward=14.239! [2023-02-23 00:40:22,068][11216] Updated weights for policy 0, policy_version 460 (0.0027) [2023-02-23 00:40:25,488][05422] Fps is (10 sec: 3276.6, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1892352. Throughput: 0: 958.7. Samples: 472978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:40:25,493][05422] Avg episode reward: [(0, '14.583')] [2023-02-23 00:40:25,506][11201] Saving new best policy, reward=14.583! [2023-02-23 00:40:30,488][05422] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 1916928. Throughput: 0: 992.9. Samples: 479034. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:40:30,490][05422] Avg episode reward: [(0, '14.893')] [2023-02-23 00:40:30,495][11201] Saving new best policy, reward=14.893! [2023-02-23 00:40:32,149][11216] Updated weights for policy 0, policy_version 470 (0.0023) [2023-02-23 00:40:35,488][05422] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1941504. Throughput: 0: 1018.8. Samples: 482674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:40:35,491][05422] Avg episode reward: [(0, '15.143')] [2023-02-23 00:40:35,501][11201] Saving new best policy, reward=15.143! [2023-02-23 00:40:40,489][05422] Fps is (10 sec: 4095.3, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 1957888. Throughput: 0: 998.5. Samples: 488916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:40:40,496][05422] Avg episode reward: [(0, '16.094')] [2023-02-23 00:40:40,500][11201] Saving new best policy, reward=16.094! [2023-02-23 00:40:43,111][11216] Updated weights for policy 0, policy_version 480 (0.0014) [2023-02-23 00:40:45,488][05422] Fps is (10 sec: 2867.3, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 1970176. Throughput: 0: 963.7. Samples: 493514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:40:45,491][05422] Avg episode reward: [(0, '16.300')] [2023-02-23 00:40:45,570][11201] Saving new best policy, reward=16.300! [2023-02-23 00:40:50,488][05422] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1994752. Throughput: 0: 980.2. Samples: 496476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:40:50,490][05422] Avg episode reward: [(0, '17.453')] [2023-02-23 00:40:50,492][11201] Saving new best policy, reward=17.453! [2023-02-23 00:40:52,640][11216] Updated weights for policy 0, policy_version 490 (0.0017) [2023-02-23 00:40:55,488][05422] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 2019328. Throughput: 0: 1021.2. Samples: 503732. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:40:55,490][05422] Avg episode reward: [(0, '18.803')] [2023-02-23 00:40:55,507][11201] Saving new best policy, reward=18.803! [2023-02-23 00:41:00,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 2035712. Throughput: 0: 985.2. Samples: 509502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:41:00,494][05422] Avg episode reward: [(0, '19.180')] [2023-02-23 00:41:00,496][11201] Saving new best policy, reward=19.180! [2023-02-23 00:41:03,966][11216] Updated weights for policy 0, policy_version 500 (0.0012) [2023-02-23 00:41:05,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2052096. Throughput: 0: 962.9. Samples: 511728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:41:05,491][05422] Avg episode reward: [(0, '18.818')] [2023-02-23 00:41:10,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2072576. Throughput: 0: 990.5. Samples: 517550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:41:10,494][05422] Avg episode reward: [(0, '17.612')] [2023-02-23 00:41:13,063][11216] Updated weights for policy 0, policy_version 510 (0.0025) [2023-02-23 00:41:15,491][05422] Fps is (10 sec: 4504.3, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 2097152. Throughput: 0: 1019.5. Samples: 524916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:41:15,493][05422] Avg episode reward: [(0, '17.544')] [2023-02-23 00:41:20,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2113536. Throughput: 0: 1002.8. Samples: 527802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:41:20,496][05422] Avg episode reward: [(0, '18.016')] [2023-02-23 00:41:24,731][11216] Updated weights for policy 0, policy_version 520 (0.0024) [2023-02-23 00:41:25,488][05422] Fps is (10 sec: 3277.8, 60 sec: 3959.5, 300 sec: 3971.0). 
Total num frames: 2129920. Throughput: 0: 963.3. Samples: 532262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:41:25,493][05422] Avg episode reward: [(0, '19.215')] [2023-02-23 00:41:25,504][11201] Saving new best policy, reward=19.215! [2023-02-23 00:41:30,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2154496. Throughput: 0: 1007.1. Samples: 538834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:41:30,493][05422] Avg episode reward: [(0, '19.725')] [2023-02-23 00:41:30,496][11201] Saving new best policy, reward=19.725! [2023-02-23 00:41:33,618][11216] Updated weights for policy 0, policy_version 530 (0.0021) [2023-02-23 00:41:35,488][05422] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2179072. Throughput: 0: 1021.9. Samples: 542462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:41:35,493][05422] Avg episode reward: [(0, '19.342')] [2023-02-23 00:41:40,488][05422] Fps is (10 sec: 4096.1, 60 sec: 3959.6, 300 sec: 3971.1). Total num frames: 2195456. Throughput: 0: 989.3. Samples: 548252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:41:40,491][05422] Avg episode reward: [(0, '21.316')] [2023-02-23 00:41:40,498][11201] Saving new best policy, reward=21.316! [2023-02-23 00:41:45,488][05422] Fps is (10 sec: 2867.0, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 2207744. Throughput: 0: 964.0. Samples: 552882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:41:45,491][05422] Avg episode reward: [(0, '20.291')] [2023-02-23 00:41:45,573][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000540_2211840.pth... 
[2023-02-23 00:41:45,575][11216] Updated weights for policy 0, policy_version 540 (0.0017) [2023-02-23 00:41:45,682][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth [2023-02-23 00:41:50,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2232320. Throughput: 0: 986.5. Samples: 556122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:41:50,495][05422] Avg episode reward: [(0, '21.022')] [2023-02-23 00:41:54,083][11216] Updated weights for policy 0, policy_version 550 (0.0011) [2023-02-23 00:41:55,493][05422] Fps is (10 sec: 4912.9, 60 sec: 3959.1, 300 sec: 3971.0). Total num frames: 2256896. Throughput: 0: 1019.5. Samples: 563434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:41:55,495][05422] Avg episode reward: [(0, '22.037')] [2023-02-23 00:41:55,508][11201] Saving new best policy, reward=22.037! [2023-02-23 00:42:00,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2273280. Throughput: 0: 972.7. Samples: 568684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:42:00,490][05422] Avg episode reward: [(0, '22.558')] [2023-02-23 00:42:00,495][11201] Saving new best policy, reward=22.558! [2023-02-23 00:42:05,488][05422] Fps is (10 sec: 3278.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2289664. Throughput: 0: 956.8. Samples: 570858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:42:05,495][05422] Avg episode reward: [(0, '21.289')] [2023-02-23 00:42:06,350][11216] Updated weights for policy 0, policy_version 560 (0.0012) [2023-02-23 00:42:10,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2310144. Throughput: 0: 998.1. Samples: 577178. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:42:10,493][05422] Avg episode reward: [(0, '20.907')] [2023-02-23 00:42:14,583][11216] Updated weights for policy 0, policy_version 570 (0.0014) [2023-02-23 00:42:15,489][05422] Fps is (10 sec: 4504.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2334720. Throughput: 0: 1016.0. Samples: 584554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:42:15,492][05422] Avg episode reward: [(0, '22.423')] [2023-02-23 00:42:20,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 2351104. Throughput: 0: 989.3. Samples: 586982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:42:20,499][05422] Avg episode reward: [(0, '22.625')] [2023-02-23 00:42:20,507][11201] Saving new best policy, reward=22.625! [2023-02-23 00:42:25,488][05422] Fps is (10 sec: 3277.3, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 2367488. Throughput: 0: 964.0. Samples: 591632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:42:25,491][05422] Avg episode reward: [(0, '22.469')] [2023-02-23 00:42:26,595][11216] Updated weights for policy 0, policy_version 580 (0.0015) [2023-02-23 00:42:30,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2392064. Throughput: 0: 1014.0. Samples: 598510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:42:30,493][05422] Avg episode reward: [(0, '21.748')] [2023-02-23 00:42:34,766][11216] Updated weights for policy 0, policy_version 590 (0.0018) [2023-02-23 00:42:35,489][05422] Fps is (10 sec: 4914.8, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2416640. Throughput: 0: 1024.7. Samples: 602234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:42:35,495][05422] Avg episode reward: [(0, '21.653')] [2023-02-23 00:42:40,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2433024. Throughput: 0: 983.1. Samples: 607668. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:42:40,499][05422] Avg episode reward: [(0, '21.643')] [2023-02-23 00:42:45,488][05422] Fps is (10 sec: 3277.1, 60 sec: 4027.8, 300 sec: 3971.0). Total num frames: 2449408. Throughput: 0: 972.4. Samples: 612442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:42:45,495][05422] Avg episode reward: [(0, '21.726')] [2023-02-23 00:42:46,908][11216] Updated weights for policy 0, policy_version 600 (0.0030) [2023-02-23 00:42:50,488][05422] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2473984. Throughput: 0: 1003.8. Samples: 616030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:42:50,495][05422] Avg episode reward: [(0, '23.192')] [2023-02-23 00:42:50,501][11201] Saving new best policy, reward=23.192! [2023-02-23 00:42:55,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.8, 300 sec: 3971.1). Total num frames: 2494464. Throughput: 0: 1027.0. Samples: 623394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:42:55,498][05422] Avg episode reward: [(0, '24.235')] [2023-02-23 00:42:55,512][11201] Saving new best policy, reward=24.235! [2023-02-23 00:42:55,528][11216] Updated weights for policy 0, policy_version 610 (0.0018) [2023-02-23 00:43:00,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2510848. Throughput: 0: 969.6. Samples: 628184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:43:00,489][05422] Avg episode reward: [(0, '24.888')] [2023-02-23 00:43:00,497][11201] Saving new best policy, reward=24.888! [2023-02-23 00:43:05,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2527232. Throughput: 0: 966.0. Samples: 630452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:43:05,496][05422] Avg episode reward: [(0, '26.120')] [2023-02-23 00:43:05,514][11201] Saving new best policy, reward=26.120! 
[2023-02-23 00:43:07,368][11216] Updated weights for policy 0, policy_version 620 (0.0022) [2023-02-23 00:43:10,488][05422] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2551808. Throughput: 0: 1011.7. Samples: 637156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:43:10,493][05422] Avg episode reward: [(0, '27.034')] [2023-02-23 00:43:10,497][11201] Saving new best policy, reward=27.034! [2023-02-23 00:43:15,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3957.2). Total num frames: 2572288. Throughput: 0: 1013.5. Samples: 644116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:43:15,493][05422] Avg episode reward: [(0, '25.373')] [2023-02-23 00:43:16,731][11216] Updated weights for policy 0, policy_version 630 (0.0018) [2023-02-23 00:43:20,488][05422] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2588672. Throughput: 0: 982.0. Samples: 646422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:43:20,496][05422] Avg episode reward: [(0, '25.465')] [2023-02-23 00:43:25,488][05422] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2609152. Throughput: 0: 965.9. Samples: 651134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:43:25,495][05422] Avg episode reward: [(0, '24.594')] [2023-02-23 00:43:27,739][11216] Updated weights for policy 0, policy_version 640 (0.0017) [2023-02-23 00:43:30,488][05422] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2633728. Throughput: 0: 1022.6. Samples: 658458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:43:30,491][05422] Avg episode reward: [(0, '23.370')] [2023-02-23 00:43:35,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 2654208. Throughput: 0: 1024.0. Samples: 662112. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:43:35,495][05422] Avg episode reward: [(0, '21.802')] [2023-02-23 00:43:37,699][11216] Updated weights for policy 0, policy_version 650 (0.0012) [2023-02-23 00:43:40,495][05422] Fps is (10 sec: 3683.8, 60 sec: 3959.0, 300 sec: 3970.9). Total num frames: 2670592. Throughput: 0: 971.5. Samples: 667118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:43:40,497][05422] Avg episode reward: [(0, '21.880')] [2023-02-23 00:43:45,490][05422] Fps is (10 sec: 3276.2, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 2686976. Throughput: 0: 979.4. Samples: 672260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:43:45,493][05422] Avg episode reward: [(0, '21.545')] [2023-02-23 00:43:45,502][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000656_2686976.pth... [2023-02-23 00:43:45,618][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth [2023-02-23 00:43:48,320][11216] Updated weights for policy 0, policy_version 660 (0.0019) [2023-02-23 00:43:50,488][05422] Fps is (10 sec: 4098.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2711552. Throughput: 0: 1006.6. Samples: 675750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:43:50,491][05422] Avg episode reward: [(0, '20.230')] [2023-02-23 00:43:55,488][05422] Fps is (10 sec: 4506.2, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2732032. Throughput: 0: 1021.6. Samples: 683128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:43:55,494][05422] Avg episode reward: [(0, '20.512')] [2023-02-23 00:43:58,353][11216] Updated weights for policy 0, policy_version 670 (0.0011) [2023-02-23 00:44:00,488][05422] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 2748416. Throughput: 0: 967.0. Samples: 687632. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:44:00,494][05422] Avg episode reward: [(0, '20.971')] [2023-02-23 00:44:05,488][05422] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2768896. Throughput: 0: 968.5. Samples: 690004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:44:05,491][05422] Avg episode reward: [(0, '20.765')] [2023-02-23 00:44:08,967][11216] Updated weights for policy 0, policy_version 680 (0.0014) [2023-02-23 00:44:10,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2789376. Throughput: 0: 1016.5. Samples: 696876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:44:10,490][05422] Avg episode reward: [(0, '19.482')] [2023-02-23 00:44:15,489][05422] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 2809856. Throughput: 0: 1003.0. Samples: 703594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:44:15,495][05422] Avg episode reward: [(0, '19.989')] [2023-02-23 00:44:19,360][11216] Updated weights for policy 0, policy_version 690 (0.0020) [2023-02-23 00:44:20,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2826240. Throughput: 0: 973.0. Samples: 705898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:44:20,490][05422] Avg episode reward: [(0, '22.313')] [2023-02-23 00:44:25,488][05422] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2846720. Throughput: 0: 973.1. Samples: 710900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:44:25,489][05422] Avg episode reward: [(0, '22.552')] [2023-02-23 00:44:29,102][11216] Updated weights for policy 0, policy_version 700 (0.0040) [2023-02-23 00:44:30,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2871296. Throughput: 0: 1020.3. Samples: 718172. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:44:30,495][05422] Avg episode reward: [(0, '24.520')] [2023-02-23 00:44:35,490][05422] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 2891776. Throughput: 0: 1024.7. Samples: 721862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:44:35,492][05422] Avg episode reward: [(0, '24.967')] [2023-02-23 00:44:40,048][11216] Updated weights for policy 0, policy_version 710 (0.0011) [2023-02-23 00:44:40,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3971.0). Total num frames: 2908160. Throughput: 0: 967.5. Samples: 726664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:44:40,491][05422] Avg episode reward: [(0, '25.739')] [2023-02-23 00:44:45,488][05422] Fps is (10 sec: 3687.1, 60 sec: 4027.9, 300 sec: 3971.0). Total num frames: 2928640. Throughput: 0: 988.5. Samples: 732114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:44:45,492][05422] Avg episode reward: [(0, '26.508')] [2023-02-23 00:44:49,539][11216] Updated weights for policy 0, policy_version 720 (0.0011) [2023-02-23 00:44:50,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2953216. Throughput: 0: 1017.1. Samples: 735774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:44:50,494][05422] Avg episode reward: [(0, '26.296')] [2023-02-23 00:44:55,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 2973696. Throughput: 0: 1021.1. Samples: 742824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:44:55,492][05422] Avg episode reward: [(0, '27.919')] [2023-02-23 00:44:55,508][11201] Saving new best policy, reward=27.919! [2023-02-23 00:45:00,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2985984. Throughput: 0: 971.8. Samples: 747324. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:45:00,490][05422] Avg episode reward: [(0, '26.484')] [2023-02-23 00:45:00,675][11216] Updated weights for policy 0, policy_version 730 (0.0013) [2023-02-23 00:45:05,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3006464. Throughput: 0: 974.4. Samples: 749746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:45:05,493][05422] Avg episode reward: [(0, '25.442')] [2023-02-23 00:45:10,230][11216] Updated weights for policy 0, policy_version 740 (0.0015) [2023-02-23 00:45:10,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3031040. Throughput: 0: 1017.3. Samples: 756678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:45:10,490][05422] Avg episode reward: [(0, '25.029')] [2023-02-23 00:45:15,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 3051520. Throughput: 0: 1000.8. Samples: 763210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:45:15,490][05422] Avg episode reward: [(0, '25.032')] [2023-02-23 00:45:20,488][05422] Fps is (10 sec: 3276.7, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 3063808. Throughput: 0: 970.6. Samples: 765536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:45:20,491][05422] Avg episode reward: [(0, '24.345')] [2023-02-23 00:45:21,838][11216] Updated weights for policy 0, policy_version 750 (0.0025) [2023-02-23 00:45:25,488][05422] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3088384. Throughput: 0: 976.4. Samples: 770600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:45:25,496][05422] Avg episode reward: [(0, '22.208')] [2023-02-23 00:45:30,483][11216] Updated weights for policy 0, policy_version 760 (0.0018) [2023-02-23 00:45:30,494][05422] Fps is (10 sec: 4912.4, 60 sec: 4027.3, 300 sec: 3971.0). Total num frames: 3112960. Throughput: 0: 1017.3. Samples: 777900. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:45:30,516][05422] Avg episode reward: [(0, '21.786')] [2023-02-23 00:45:35,488][05422] Fps is (10 sec: 4095.9, 60 sec: 3959.6, 300 sec: 3971.1). Total num frames: 3129344. Throughput: 0: 1017.3. Samples: 781552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:45:35,497][05422] Avg episode reward: [(0, '21.745')] [2023-02-23 00:45:40,490][05422] Fps is (10 sec: 3278.0, 60 sec: 3959.3, 300 sec: 3984.9). Total num frames: 3145728. Throughput: 0: 959.8. Samples: 786018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:45:40,500][05422] Avg episode reward: [(0, '21.269')] [2023-02-23 00:45:42,494][11216] Updated weights for policy 0, policy_version 770 (0.0046) [2023-02-23 00:45:45,488][05422] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3166208. Throughput: 0: 986.9. Samples: 791736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:45:45,493][05422] Avg episode reward: [(0, '19.545')] [2023-02-23 00:45:45,507][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000773_3166208.pth... [2023-02-23 00:45:45,634][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000540_2211840.pth [2023-02-23 00:45:50,488][05422] Fps is (10 sec: 4506.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3190784. Throughput: 0: 1013.1. Samples: 795336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:45:50,495][05422] Avg episode reward: [(0, '20.559')] [2023-02-23 00:45:50,915][11216] Updated weights for policy 0, policy_version 780 (0.0025) [2023-02-23 00:45:55,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3211264. Throughput: 0: 1010.6. Samples: 802154. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:45:55,493][05422] Avg episode reward: [(0, '22.321')] [2023-02-23 00:46:00,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3223552. Throughput: 0: 967.1. Samples: 806728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:46:00,493][05422] Avg episode reward: [(0, '22.366')] [2023-02-23 00:46:03,037][11216] Updated weights for policy 0, policy_version 790 (0.0037) [2023-02-23 00:46:05,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3244032. Throughput: 0: 973.2. Samples: 809328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:46:05,490][05422] Avg episode reward: [(0, '21.878')] [2023-02-23 00:46:10,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 3268608. Throughput: 0: 1019.9. Samples: 816496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:46:10,490][05422] Avg episode reward: [(0, '24.348')] [2023-02-23 00:46:11,547][11216] Updated weights for policy 0, policy_version 800 (0.0023) [2023-02-23 00:46:15,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3289088. Throughput: 0: 997.4. Samples: 822778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:46:15,493][05422] Avg episode reward: [(0, '25.191')] [2023-02-23 00:46:20,488][05422] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 3305472. Throughput: 0: 967.5. Samples: 825088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:46:20,496][05422] Avg episode reward: [(0, '24.570')] [2023-02-23 00:46:23,352][11216] Updated weights for policy 0, policy_version 810 (0.0023) [2023-02-23 00:46:25,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3325952. Throughput: 0: 990.2. Samples: 830576. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:46:25,495][05422] Avg episode reward: [(0, '24.721')] [2023-02-23 00:46:30,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.9, 300 sec: 3971.0). Total num frames: 3350528. Throughput: 0: 1024.2. Samples: 837824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:46:30,490][05422] Avg episode reward: [(0, '25.579')] [2023-02-23 00:46:31,814][11216] Updated weights for policy 0, policy_version 820 (0.0015) [2023-02-23 00:46:35,490][05422] Fps is (10 sec: 4095.2, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 3366912. Throughput: 0: 1020.1. Samples: 841242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:46:35,492][05422] Avg episode reward: [(0, '22.954')] [2023-02-23 00:46:40,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 3984.9). Total num frames: 3383296. Throughput: 0: 968.2. Samples: 845724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:46:40,494][05422] Avg episode reward: [(0, '22.674')] [2023-02-23 00:46:43,747][11216] Updated weights for policy 0, policy_version 830 (0.0021) [2023-02-23 00:46:45,488][05422] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3407872. Throughput: 0: 1002.8. Samples: 851856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:46:45,490][05422] Avg episode reward: [(0, '22.881')] [2023-02-23 00:46:50,488][05422] Fps is (10 sec: 4915.3, 60 sec: 4027.7, 300 sec: 3985.0). Total num frames: 3432448. Throughput: 0: 1026.2. Samples: 855506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:46:50,495][05422] Avg episode reward: [(0, '22.059')] [2023-02-23 00:46:52,219][11216] Updated weights for policy 0, policy_version 840 (0.0016) [2023-02-23 00:46:55,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3448832. Throughput: 0: 1006.8. Samples: 861800. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:46:55,491][05422] Avg episode reward: [(0, '21.616')] [2023-02-23 00:47:00,488][05422] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3465216. Throughput: 0: 970.4. Samples: 866448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:47:00,494][05422] Avg episode reward: [(0, '21.640')] [2023-02-23 00:47:04,158][11216] Updated weights for policy 0, policy_version 850 (0.0038) [2023-02-23 00:47:05,488][05422] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3485696. Throughput: 0: 986.5. Samples: 869480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:47:05,490][05422] Avg episode reward: [(0, '22.373')] [2023-02-23 00:47:10,488][05422] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3510272. Throughput: 0: 1027.3. Samples: 876804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:47:10,491][05422] Avg episode reward: [(0, '23.329')] [2023-02-23 00:47:13,071][11216] Updated weights for policy 0, policy_version 860 (0.0012) [2023-02-23 00:47:15,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3526656. Throughput: 0: 990.8. Samples: 882412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:47:15,497][05422] Avg episode reward: [(0, '24.037')] [2023-02-23 00:47:20,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 3543040. Throughput: 0: 966.4. Samples: 884730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:47:20,495][05422] Avg episode reward: [(0, '25.005')] [2023-02-23 00:47:24,422][11216] Updated weights for policy 0, policy_version 870 (0.0014) [2023-02-23 00:47:25,488][05422] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3567616. Throughput: 0: 999.3. Samples: 890692. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:47:25,490][05422] Avg episode reward: [(0, '25.160')] [2023-02-23 00:47:30,488][05422] Fps is (10 sec: 4915.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3592192. Throughput: 0: 1026.9. Samples: 898066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:47:30,490][05422] Avg episode reward: [(0, '26.276')] [2023-02-23 00:47:33,976][11216] Updated weights for policy 0, policy_version 880 (0.0022) [2023-02-23 00:47:35,490][05422] Fps is (10 sec: 4095.2, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3608576. Throughput: 0: 1009.0. Samples: 900912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:47:35,492][05422] Avg episode reward: [(0, '25.671')] [2023-02-23 00:47:40,491][05422] Fps is (10 sec: 2866.1, 60 sec: 3959.2, 300 sec: 3971.0). Total num frames: 3620864. Throughput: 0: 973.1. Samples: 905594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:47:40,495][05422] Avg episode reward: [(0, '25.055')] [2023-02-23 00:47:44,734][11216] Updated weights for policy 0, policy_version 890 (0.0027) [2023-02-23 00:47:45,488][05422] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3645440. Throughput: 0: 1013.0. Samples: 912032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:47:45,494][05422] Avg episode reward: [(0, '23.571')] [2023-02-23 00:47:45,591][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000891_3649536.pth... [2023-02-23 00:47:45,742][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000656_2686976.pth [2023-02-23 00:47:50,488][05422] Fps is (10 sec: 4917.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3670016. Throughput: 0: 1025.8. Samples: 915642. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:47:50,497][05422] Avg episode reward: [(0, '23.293')] [2023-02-23 00:47:54,420][11216] Updated weights for policy 0, policy_version 900 (0.0023) [2023-02-23 00:47:55,489][05422] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 3686400. Throughput: 0: 994.9. Samples: 921576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:47:55,491][05422] Avg episode reward: [(0, '22.794')] [2023-02-23 00:48:00,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3702784. Throughput: 0: 973.7. Samples: 926228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:48:00,497][05422] Avg episode reward: [(0, '21.682')] [2023-02-23 00:48:05,113][11216] Updated weights for policy 0, policy_version 910 (0.0016) [2023-02-23 00:48:05,488][05422] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3727360. Throughput: 0: 995.2. Samples: 929512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:48:05,490][05422] Avg episode reward: [(0, '23.644')] [2023-02-23 00:48:10,488][05422] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3751936. Throughput: 0: 1024.5. Samples: 936796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:48:10,490][05422] Avg episode reward: [(0, '25.304')] [2023-02-23 00:48:15,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3764224. Throughput: 0: 976.0. Samples: 941986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:48:15,496][05422] Avg episode reward: [(0, '25.331')] [2023-02-23 00:48:15,734][11216] Updated weights for policy 0, policy_version 920 (0.0018) [2023-02-23 00:48:20,488][05422] Fps is (10 sec: 2867.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3780608. Throughput: 0: 963.7. Samples: 944276. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:48:20,490][05422] Avg episode reward: [(0, '25.845')] [2023-02-23 00:48:25,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3805184. Throughput: 0: 999.9. Samples: 950584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:48:25,491][05422] Avg episode reward: [(0, '25.904')] [2023-02-23 00:48:25,688][11216] Updated weights for policy 0, policy_version 930 (0.0016) [2023-02-23 00:48:30,490][05422] Fps is (10 sec: 4914.2, 60 sec: 3959.3, 300 sec: 3984.9). Total num frames: 3829760. Throughput: 0: 1018.9. Samples: 957884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:48:30,493][05422] Avg episode reward: [(0, '24.940')] [2023-02-23 00:48:35,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3985.0). Total num frames: 3846144. Throughput: 0: 994.8. Samples: 960408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:48:35,500][05422] Avg episode reward: [(0, '24.071')] [2023-02-23 00:48:36,229][11216] Updated weights for policy 0, policy_version 940 (0.0021) [2023-02-23 00:48:40,488][05422] Fps is (10 sec: 3277.5, 60 sec: 4028.0, 300 sec: 3984.9). Total num frames: 3862528. Throughput: 0: 965.2. Samples: 965010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:48:40,495][05422] Avg episode reward: [(0, '24.124')] [2023-02-23 00:48:45,488][05422] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3887104. Throughput: 0: 1012.1. Samples: 971772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:48:45,491][05422] Avg episode reward: [(0, '22.902')] [2023-02-23 00:48:46,356][11216] Updated weights for policy 0, policy_version 950 (0.0016) [2023-02-23 00:48:50,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3907584. Throughput: 0: 1019.3. Samples: 975380. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:48:50,493][05422] Avg episode reward: [(0, '23.587')] [2023-02-23 00:48:55,488][05422] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3923968. Throughput: 0: 980.4. Samples: 980916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:48:55,489][05422] Avg episode reward: [(0, '22.551')] [2023-02-23 00:48:57,271][11216] Updated weights for policy 0, policy_version 960 (0.0011) [2023-02-23 00:49:00,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3940352. Throughput: 0: 970.1. Samples: 985640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:00,496][05422] Avg episode reward: [(0, '24.621')] [2023-02-23 00:49:05,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3964928. Throughput: 0: 998.8. Samples: 989222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:05,495][05422] Avg episode reward: [(0, '23.496')] [2023-02-23 00:49:06,532][11216] Updated weights for policy 0, policy_version 970 (0.0026) [2023-02-23 00:49:10,488][05422] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3989504. Throughput: 0: 1019.2. Samples: 996446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:49:10,493][05422] Avg episode reward: [(0, '25.519')] [2023-02-23 00:49:15,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 4001792. Throughput: 0: 964.0. Samples: 1001264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:15,496][05422] Avg episode reward: [(0, '26.034')] [2023-02-23 00:49:18,357][11216] Updated weights for policy 0, policy_version 980 (0.0015) [2023-02-23 00:49:20,488][05422] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 4022272. Throughput: 0: 958.6. Samples: 1003546. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:20,489][05422] Avg episode reward: [(0, '26.995')] [2023-02-23 00:49:25,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4042752. Throughput: 0: 1003.7. Samples: 1010176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:25,490][05422] Avg episode reward: [(0, '29.030')] [2023-02-23 00:49:25,526][11201] Saving new best policy, reward=29.030! [2023-02-23 00:49:27,281][11216] Updated weights for policy 0, policy_version 990 (0.0016) [2023-02-23 00:49:30,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3984.9). Total num frames: 4067328. Throughput: 0: 1010.3. Samples: 1017234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:49:30,491][05422] Avg episode reward: [(0, '29.094')] [2023-02-23 00:49:30,494][11201] Saving new best policy, reward=29.094! [2023-02-23 00:49:35,491][05422] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3984.9). Total num frames: 4083712. Throughput: 0: 979.7. Samples: 1019470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:49:35,495][05422] Avg episode reward: [(0, '28.854')] [2023-02-23 00:49:39,179][11216] Updated weights for policy 0, policy_version 1000 (0.0013) [2023-02-23 00:49:40,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4100096. Throughput: 0: 962.3. Samples: 1024218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:40,495][05422] Avg episode reward: [(0, '27.556')] [2023-02-23 00:49:45,488][05422] Fps is (10 sec: 4097.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4124672. Throughput: 0: 1012.3. Samples: 1031192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:49:45,490][05422] Avg episode reward: [(0, '27.241')] [2023-02-23 00:49:45,500][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001007_4124672.pth... 
[2023-02-23 00:49:45,613][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000773_3166208.pth [2023-02-23 00:49:47,918][11216] Updated weights for policy 0, policy_version 1010 (0.0025) [2023-02-23 00:49:50,490][05422] Fps is (10 sec: 4504.4, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 4145152. Throughput: 0: 1012.3. Samples: 1034780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:50,510][05422] Avg episode reward: [(0, '26.702')] [2023-02-23 00:49:55,491][05422] Fps is (10 sec: 3685.1, 60 sec: 3959.2, 300 sec: 3984.9). Total num frames: 4161536. Throughput: 0: 966.7. Samples: 1039952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:49:55,497][05422] Avg episode reward: [(0, '27.615')] [2023-02-23 00:49:59,985][11216] Updated weights for policy 0, policy_version 1020 (0.0011) [2023-02-23 00:50:00,488][05422] Fps is (10 sec: 3277.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4177920. Throughput: 0: 971.2. Samples: 1044968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:50:00,491][05422] Avg episode reward: [(0, '27.115')] [2023-02-23 00:50:05,488][05422] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4202496. Throughput: 0: 1000.0. Samples: 1048544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:50:05,494][05422] Avg episode reward: [(0, '26.556')] [2023-02-23 00:50:08,347][11216] Updated weights for policy 0, policy_version 1030 (0.0013) [2023-02-23 00:50:10,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 4222976. Throughput: 0: 1014.4. Samples: 1055822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:50:10,495][05422] Avg episode reward: [(0, '27.447')] [2023-02-23 00:50:15,492][05422] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 3984.9). Total num frames: 4239360. Throughput: 0: 957.4. Samples: 1060320. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:50:15,499][05422] Avg episode reward: [(0, '28.444')] [2023-02-23 00:50:20,404][11216] Updated weights for policy 0, policy_version 1040 (0.0018) [2023-02-23 00:50:20,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4259840. Throughput: 0: 959.3. Samples: 1062634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:50:20,490][05422] Avg episode reward: [(0, '28.743')] [2023-02-23 00:50:25,488][05422] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 4280320. Throughput: 0: 1010.2. Samples: 1069676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:50:25,491][05422] Avg episode reward: [(0, '27.220')] [2023-02-23 00:50:29,145][11216] Updated weights for policy 0, policy_version 1050 (0.0013) [2023-02-23 00:50:30,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 4304896. Throughput: 0: 1001.7. Samples: 1076270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:50:30,493][05422] Avg episode reward: [(0, '27.395')] [2023-02-23 00:50:35,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3971.1). Total num frames: 4317184. Throughput: 0: 974.8. Samples: 1078644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:50:35,495][05422] Avg episode reward: [(0, '27.487')] [2023-02-23 00:50:40,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4337664. Throughput: 0: 970.8. Samples: 1083634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:50:40,490][05422] Avg episode reward: [(0, '28.467')] [2023-02-23 00:50:40,891][11216] Updated weights for policy 0, policy_version 1060 (0.0013) [2023-02-23 00:50:45,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4362240. Throughput: 0: 1018.0. Samples: 1090778. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:50:45,490][05422] Avg episode reward: [(0, '26.105')] [2023-02-23 00:50:50,066][11216] Updated weights for policy 0, policy_version 1070 (0.0011) [2023-02-23 00:50:50,488][05422] Fps is (10 sec: 4505.5, 60 sec: 3959.6, 300 sec: 3971.0). Total num frames: 4382720. Throughput: 0: 1018.8. Samples: 1094390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:50:50,498][05422] Avg episode reward: [(0, '26.765')] [2023-02-23 00:50:55,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3984.9). Total num frames: 4399104. Throughput: 0: 959.9. Samples: 1099016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:50:55,495][05422] Avg episode reward: [(0, '26.238')] [2023-02-23 00:51:00,488][05422] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 4419584. Throughput: 0: 986.9. Samples: 1104726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:51:00,491][05422] Avg episode reward: [(0, '27.297')] [2023-02-23 00:51:01,413][11216] Updated weights for policy 0, policy_version 1080 (0.0022) [2023-02-23 00:51:05,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 4444160. Throughput: 0: 1018.0. Samples: 1108446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:51:05,497][05422] Avg episode reward: [(0, '25.537')] [2023-02-23 00:51:10,488][05422] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4460544. Throughput: 0: 1007.6. Samples: 1115018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:51:10,494][05422] Avg episode reward: [(0, '25.350')] [2023-02-23 00:51:11,244][11216] Updated weights for policy 0, policy_version 1090 (0.0022) [2023-02-23 00:51:15,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.7, 300 sec: 3971.0). Total num frames: 4476928. Throughput: 0: 964.6. Samples: 1119676. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:51:15,493][05422] Avg episode reward: [(0, '24.688')] [2023-02-23 00:51:20,488][05422] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4497408. Throughput: 0: 969.1. Samples: 1122252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 00:51:20,490][05422] Avg episode reward: [(0, '24.854')] [2023-02-23 00:51:21,649][11216] Updated weights for policy 0, policy_version 1100 (0.0034) [2023-02-23 00:51:25,488][05422] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 4521984. Throughput: 0: 1020.4. Samples: 1129552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:51:25,495][05422] Avg episode reward: [(0, '22.971')] [2023-02-23 00:51:30,493][05422] Fps is (10 sec: 4503.4, 60 sec: 3959.1, 300 sec: 3984.9). Total num frames: 4542464. Throughput: 0: 995.6. Samples: 1135584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:51:30,495][05422] Avg episode reward: [(0, '23.899')] [2023-02-23 00:51:31,709][11216] Updated weights for policy 0, policy_version 1110 (0.0014) [2023-02-23 00:51:35,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4554752. Throughput: 0: 968.0. Samples: 1137950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:51:35,491][05422] Avg episode reward: [(0, '25.080')] [2023-02-23 00:51:40,488][05422] Fps is (10 sec: 3688.2, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 4579328. Throughput: 0: 989.1. Samples: 1143524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:51:40,490][05422] Avg episode reward: [(0, '26.435')] [2023-02-23 00:51:42,102][11216] Updated weights for policy 0, policy_version 1120 (0.0030) [2023-02-23 00:51:45,488][05422] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 4603904. Throughput: 0: 1027.7. Samples: 1150972. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:51:45,491][05422] Avg episode reward: [(0, '25.343')] [2023-02-23 00:51:45,502][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001124_4603904.pth... [2023-02-23 00:51:45,627][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000891_3649536.pth [2023-02-23 00:51:50,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4620288. Throughput: 0: 1011.5. Samples: 1153964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:51:50,495][05422] Avg episode reward: [(0, '24.393')] [2023-02-23 00:51:52,842][11216] Updated weights for policy 0, policy_version 1130 (0.0012) [2023-02-23 00:51:55,488][05422] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 4632576. Throughput: 0: 966.8. Samples: 1158522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:51:55,496][05422] Avg episode reward: [(0, '24.711')] [2023-02-23 00:52:00,488][05422] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4657152. Throughput: 0: 1002.0. Samples: 1164764. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:52:00,495][05422] Avg episode reward: [(0, '23.945')] [2023-02-23 00:52:02,373][11216] Updated weights for policy 0, policy_version 1140 (0.0026) [2023-02-23 00:52:05,488][05422] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4681728. Throughput: 0: 1026.0. Samples: 1168424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:52:05,494][05422] Avg episode reward: [(0, '25.402')] [2023-02-23 00:52:10,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4698112. Throughput: 0: 997.4. Samples: 1174436. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:52:10,494][05422] Avg episode reward: [(0, '24.969')] [2023-02-23 00:52:13,560][11216] Updated weights for policy 0, policy_version 1150 (0.0016) [2023-02-23 00:52:15,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4714496. Throughput: 0: 967.0. Samples: 1179092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:52:15,495][05422] Avg episode reward: [(0, '26.152')] [2023-02-23 00:52:20,488][05422] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 4739072. Throughput: 0: 983.6. Samples: 1182214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:52:20,497][05422] Avg episode reward: [(0, '26.900')] [2023-02-23 00:52:22,933][11216] Updated weights for policy 0, policy_version 1160 (0.0023) [2023-02-23 00:52:25,488][05422] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 4759552. Throughput: 0: 1021.6. Samples: 1189498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:52:25,494][05422] Avg episode reward: [(0, '29.178')] [2023-02-23 00:52:25,509][11201] Saving new best policy, reward=29.178! [2023-02-23 00:52:30,488][05422] Fps is (10 sec: 3686.3, 60 sec: 3891.5, 300 sec: 3957.2). Total num frames: 4775936. Throughput: 0: 979.1. Samples: 1195032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 00:52:30,493][05422] Avg episode reward: [(0, '29.569')] [2023-02-23 00:52:30,500][11201] Saving new best policy, reward=29.569! [2023-02-23 00:52:34,859][11216] Updated weights for policy 0, policy_version 1170 (0.0022) [2023-02-23 00:52:35,488][05422] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 4792320. Throughput: 0: 961.5. Samples: 1197230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:52:35,493][05422] Avg episode reward: [(0, '29.787')] [2023-02-23 00:52:35,507][11201] Saving new best policy, reward=29.787! 
[2023-02-23 00:52:40,488][05422] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4816896. Throughput: 0: 995.2. Samples: 1203308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:52:40,495][05422] Avg episode reward: [(0, '29.138')] [2023-02-23 00:52:43,437][11216] Updated weights for policy 0, policy_version 1180 (0.0014) [2023-02-23 00:52:45,488][05422] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4841472. Throughput: 0: 1021.6. Samples: 1210736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:52:45,494][05422] Avg episode reward: [(0, '29.574')] [2023-02-23 00:52:50,488][05422] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4857856. Throughput: 0: 994.9. Samples: 1213196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:52:50,491][05422] Avg episode reward: [(0, '30.228')] [2023-02-23 00:52:50,493][11201] Saving new best policy, reward=30.228! [2023-02-23 00:52:55,416][11216] Updated weights for policy 0, policy_version 1190 (0.0015) [2023-02-23 00:52:55,488][05422] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 4874240. Throughput: 0: 963.3. Samples: 1217784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:52:55,496][05422] Avg episode reward: [(0, '30.731')] [2023-02-23 00:52:55,507][11201] Saving new best policy, reward=30.731! [2023-02-23 00:53:00,488][05422] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 4898816. Throughput: 0: 1009.6. Samples: 1224526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:53:00,490][05422] Avg episode reward: [(0, '28.819')] [2023-02-23 00:53:03,842][11216] Updated weights for policy 0, policy_version 1200 (0.0016) [2023-02-23 00:53:05,494][05422] Fps is (10 sec: 4502.8, 60 sec: 3959.1, 300 sec: 3957.1). Total num frames: 4919296. Throughput: 0: 1021.1. Samples: 1228172. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:53:05,497][05422] Avg episode reward: [(0, '28.916')] [2023-02-23 00:53:10,488][05422] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 4935680. Throughput: 0: 983.2. Samples: 1233744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:53:10,493][05422] Avg episode reward: [(0, '29.610')] [2023-02-23 00:53:15,493][05422] Fps is (10 sec: 3277.2, 60 sec: 3959.1, 300 sec: 3971.0). Total num frames: 4952064. Throughput: 0: 965.7. Samples: 1238492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:53:15,495][05422] Avg episode reward: [(0, '28.477')] [2023-02-23 00:53:15,604][11216] Updated weights for policy 0, policy_version 1210 (0.0015) [2023-02-23 00:53:20,488][05422] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4976640. Throughput: 0: 996.0. Samples: 1242050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:53:20,496][05422] Avg episode reward: [(0, '26.695')] [2023-02-23 00:53:24,181][11216] Updated weights for policy 0, policy_version 1220 (0.0011) [2023-02-23 00:53:25,491][05422] Fps is (10 sec: 4916.2, 60 sec: 4027.5, 300 sec: 3971.0). Total num frames: 5001216. Throughput: 0: 1024.4. Samples: 1249408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:53:25,497][05422] Avg episode reward: [(0, '25.489')] [2023-02-23 00:53:26,659][11201] Stopping Batcher_0... [2023-02-23 00:53:26,659][11201] Loop batcher_evt_loop terminating... [2023-02-23 00:53:26,661][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2023-02-23 00:53:26,660][05422] Component Batcher_0 stopped! [2023-02-23 00:53:26,719][11216] Weights refcount: 2 0 [2023-02-23 00:53:26,736][11216] Stopping InferenceWorker_p0-w0... [2023-02-23 00:53:26,736][05422] Component InferenceWorker_p0-w0 stopped! [2023-02-23 00:53:26,742][11216] Loop inference_proc0-0_evt_loop terminating... 
[2023-02-23 00:53:26,773][11223] Stopping RolloutWorker_w7... [2023-02-23 00:53:26,774][05422] Component RolloutWorker_w7 stopped! [2023-02-23 00:53:26,779][11221] Stopping RolloutWorker_w5... [2023-02-23 00:53:26,782][11217] Stopping RolloutWorker_w1... [2023-02-23 00:53:26,788][11219] Stopping RolloutWorker_w3... [2023-02-23 00:53:26,780][05422] Component RolloutWorker_w5 stopped! [2023-02-23 00:53:26,790][05422] Component RolloutWorker_w1 stopped! [2023-02-23 00:53:26,791][05422] Component RolloutWorker_w3 stopped! [2023-02-23 00:53:26,780][11221] Loop rollout_proc5_evt_loop terminating... [2023-02-23 00:53:26,798][11219] Loop rollout_proc3_evt_loop terminating... [2023-02-23 00:53:26,778][11223] Loop rollout_proc7_evt_loop terminating... [2023-02-23 00:53:26,783][11217] Loop rollout_proc1_evt_loop terminating... [2023-02-23 00:53:26,825][11218] Stopping RolloutWorker_w2... [2023-02-23 00:53:26,825][11218] Loop rollout_proc2_evt_loop terminating... [2023-02-23 00:53:26,825][05422] Component RolloutWorker_w2 stopped! [2023-02-23 00:53:26,852][11201] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001007_4124672.pth [2023-02-23 00:53:26,856][05422] Component RolloutWorker_w6 stopped! [2023-02-23 00:53:26,856][11222] Stopping RolloutWorker_w6... [2023-02-23 00:53:26,862][11222] Loop rollout_proc6_evt_loop terminating... [2023-02-23 00:53:26,869][11220] Stopping RolloutWorker_w4... [2023-02-23 00:53:26,869][11220] Loop rollout_proc4_evt_loop terminating... [2023-02-23 00:53:26,869][05422] Component RolloutWorker_w4 stopped! [2023-02-23 00:53:26,877][11201] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2023-02-23 00:53:26,918][11215] Stopping RolloutWorker_w0... [2023-02-23 00:53:26,919][11215] Loop rollout_proc0_evt_loop terminating... [2023-02-23 00:53:26,918][05422] Component RolloutWorker_w0 stopped! [2023-02-23 00:53:27,207][11201] Stopping LearnerWorker_p0... 
[2023-02-23 00:53:27,208][11201] Loop learner_proc0_evt_loop terminating... [2023-02-23 00:53:27,208][05422] Component LearnerWorker_p0 stopped! [2023-02-23 00:53:27,210][05422] Waiting for process learner_proc0 to stop... [2023-02-23 00:53:29,603][05422] Waiting for process inference_proc0-0 to join... [2023-02-23 00:53:30,165][05422] Waiting for process rollout_proc0 to join... [2023-02-23 00:53:30,715][05422] Waiting for process rollout_proc1 to join... [2023-02-23 00:53:30,718][05422] Waiting for process rollout_proc2 to join... [2023-02-23 00:53:30,720][05422] Waiting for process rollout_proc3 to join... [2023-02-23 00:53:30,724][05422] Waiting for process rollout_proc4 to join... [2023-02-23 00:53:30,726][05422] Waiting for process rollout_proc5 to join... [2023-02-23 00:53:30,728][05422] Waiting for process rollout_proc6 to join... [2023-02-23 00:53:30,733][05422] Waiting for process rollout_proc7 to join... [2023-02-23 00:53:30,734][05422] Batcher 0 profile tree view: batching: 31.8950, releasing_batches: 0.0278 [2023-02-23 00:53:30,736][05422] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 629.9773 update_model: 8.8445 weight_update: 0.0011 one_step: 0.0119 handle_policy_step: 588.5976 deserialize: 17.1312, stack: 3.4701, obs_to_device_normalize: 132.6656, forward: 279.8045, send_messages: 30.6976 prepare_outputs: 95.2883 to_cpu: 60.5109 [2023-02-23 00:53:30,737][05422] Learner 0 profile tree view: misc: 0.0076, prepare_batch: 18.7716 train: 92.7187 epoch_init: 0.0230, minibatch_init: 0.0122, losses_postprocess: 0.7026, kl_divergence: 0.6492, after_optimizer: 41.1845 calculate_losses: 32.8179 losses_init: 0.0042, forward_head: 1.9767, bptt_initial: 21.9087, tail: 1.2376, advantages_returns: 0.3497, losses: 4.2405 bptt: 2.6913 bptt_forward_core: 2.6027 update: 16.6400 clip: 1.6279 [2023-02-23 00:53:30,738][05422] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.4551, enqueue_policy_requests: 169.2354, env_step: 
962.0005, overhead: 22.5413, complete_rollouts: 8.2889 save_policy_outputs: 22.7289 split_output_tensors: 10.9194 [2023-02-23 00:53:30,739][05422] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.4209, enqueue_policy_requests: 162.3719, env_step: 970.4761, overhead: 22.6631, complete_rollouts: 8.4823 save_policy_outputs: 23.0393 split_output_tensors: 11.4559 [2023-02-23 00:53:30,741][05422] Loop Runner_EvtLoop terminating... [2023-02-23 00:53:30,744][05422] Runner profile tree view: main_loop: 1301.1577 [2023-02-23 00:53:30,745][05422] Collected {0: 5005312}, FPS: 3846.8 [2023-02-23 00:53:30,809][05422] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 00:53:30,810][05422] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-23 00:53:30,812][05422] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-23 00:53:30,814][05422] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-23 00:53:30,816][05422] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 00:53:30,819][05422] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-23 00:53:30,821][05422] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 00:53:30,822][05422] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-23 00:53:30,823][05422] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-23 00:53:30,824][05422] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-23 00:53:30,826][05422] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-23 00:53:30,827][05422] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2023-02-23 00:53:30,828][05422] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 00:53:30,829][05422] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 00:53:30,830][05422] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 00:53:30,854][05422] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:53:30,859][05422] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:53:30,862][05422] RunningMeanStd input shape: (1,) [2023-02-23 00:53:30,878][05422] ConvEncoder: input_channels=3 [2023-02-23 00:53:31,562][05422] Conv encoder output size: 512 [2023-02-23 00:53:31,567][05422] Policy head output size: 512 [2023-02-23 00:53:33,921][05422] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2023-02-23 00:53:35,134][05422] Num frames 100... [2023-02-23 00:53:35,245][05422] Num frames 200... [2023-02-23 00:53:35,361][05422] Num frames 300... [2023-02-23 00:53:35,470][05422] Num frames 400... [2023-02-23 00:53:35,613][05422] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800 [2023-02-23 00:53:35,615][05422] Avg episode reward: 6.800, avg true_objective: 4.800 [2023-02-23 00:53:35,641][05422] Num frames 500... [2023-02-23 00:53:35,754][05422] Num frames 600... [2023-02-23 00:53:35,861][05422] Num frames 700... [2023-02-23 00:53:35,970][05422] Num frames 800... [2023-02-23 00:53:36,081][05422] Num frames 900... [2023-02-23 00:53:36,190][05422] Num frames 1000... [2023-02-23 00:53:36,305][05422] Num frames 1100... [2023-02-23 00:53:36,413][05422] Num frames 1200... [2023-02-23 00:53:36,524][05422] Num frames 1300... [2023-02-23 00:53:36,644][05422] Num frames 1400... 
[2023-02-23 00:53:36,750][05422] Avg episode rewards: #0: 14.220, true rewards: #0: 7.220 [2023-02-23 00:53:36,751][05422] Avg episode reward: 14.220, avg true_objective: 7.220 [2023-02-23 00:53:36,814][05422] Num frames 1500... [2023-02-23 00:53:36,930][05422] Num frames 1600... [2023-02-23 00:53:37,041][05422] Num frames 1700... [2023-02-23 00:53:37,152][05422] Num frames 1800... [2023-02-23 00:53:37,263][05422] Num frames 1900... [2023-02-23 00:53:37,377][05422] Num frames 2000... [2023-02-23 00:53:37,496][05422] Num frames 2100... [2023-02-23 00:53:37,611][05422] Num frames 2200... [2023-02-23 00:53:37,713][05422] Avg episode rewards: #0: 15.143, true rewards: #0: 7.477 [2023-02-23 00:53:37,716][05422] Avg episode reward: 15.143, avg true_objective: 7.477 [2023-02-23 00:53:37,780][05422] Num frames 2300... [2023-02-23 00:53:37,889][05422] Num frames 2400... [2023-02-23 00:53:37,998][05422] Num frames 2500... [2023-02-23 00:53:38,109][05422] Num frames 2600... [2023-02-23 00:53:38,218][05422] Num frames 2700... [2023-02-23 00:53:38,333][05422] Num frames 2800... [2023-02-23 00:53:38,441][05422] Num frames 2900... [2023-02-23 00:53:38,548][05422] Num frames 3000... [2023-02-23 00:53:38,654][05422] Num frames 3100... [2023-02-23 00:53:38,770][05422] Num frames 3200... [2023-02-23 00:53:38,880][05422] Num frames 3300... [2023-02-23 00:53:39,038][05422] Avg episode rewards: #0: 17.488, true rewards: #0: 8.487 [2023-02-23 00:53:39,040][05422] Avg episode reward: 17.488, avg true_objective: 8.487 [2023-02-23 00:53:39,049][05422] Num frames 3400... [2023-02-23 00:53:39,160][05422] Num frames 3500... [2023-02-23 00:53:39,271][05422] Num frames 3600... [2023-02-23 00:53:39,386][05422] Num frames 3700... [2023-02-23 00:53:39,494][05422] Num frames 3800... 
[2023-02-23 00:53:39,563][05422] Avg episode rewards: #0: 15.622, true rewards: #0: 7.622 [2023-02-23 00:53:39,564][05422] Avg episode reward: 15.622, avg true_objective: 7.622 [2023-02-23 00:53:39,663][05422] Num frames 3900... [2023-02-23 00:53:39,770][05422] Num frames 4000... [2023-02-23 00:53:39,879][05422] Num frames 4100... [2023-02-23 00:53:39,986][05422] Num frames 4200... [2023-02-23 00:53:40,138][05422] Avg episode rewards: #0: 14.152, true rewards: #0: 7.152 [2023-02-23 00:53:40,141][05422] Avg episode reward: 14.152, avg true_objective: 7.152 [2023-02-23 00:53:40,154][05422] Num frames 4300... [2023-02-23 00:53:40,263][05422] Num frames 4400... [2023-02-23 00:53:40,375][05422] Num frames 4500... [2023-02-23 00:53:40,482][05422] Num frames 4600... [2023-02-23 00:53:40,589][05422] Num frames 4700... [2023-02-23 00:53:40,700][05422] Num frames 4800... [2023-02-23 00:53:40,863][05422] Num frames 4900... [2023-02-23 00:53:41,016][05422] Avg episode rewards: #0: 14.090, true rewards: #0: 7.090 [2023-02-23 00:53:41,018][05422] Avg episode reward: 14.090, avg true_objective: 7.090 [2023-02-23 00:53:41,076][05422] Num frames 5000... [2023-02-23 00:53:41,227][05422] Num frames 5100... [2023-02-23 00:53:41,383][05422] Num frames 5200... [2023-02-23 00:53:41,532][05422] Num frames 5300... [2023-02-23 00:53:41,683][05422] Num frames 5400... [2023-02-23 00:53:41,833][05422] Num frames 5500... [2023-02-23 00:53:41,901][05422] Avg episode rewards: #0: 13.509, true rewards: #0: 6.884 [2023-02-23 00:53:41,903][05422] Avg episode reward: 13.509, avg true_objective: 6.884 [2023-02-23 00:53:42,043][05422] Num frames 5600... [2023-02-23 00:53:42,196][05422] Num frames 5700... [2023-02-23 00:53:42,351][05422] Num frames 5800... [2023-02-23 00:53:42,553][05422] Avg episode rewards: #0: 12.768, true rewards: #0: 6.546 [2023-02-23 00:53:42,555][05422] Avg episode reward: 12.768, avg true_objective: 6.546 [2023-02-23 00:53:42,572][05422] Num frames 5900... 
[2023-02-23 00:53:42,725][05422] Num frames 6000... [2023-02-23 00:53:42,881][05422] Num frames 6100... [2023-02-23 00:53:43,040][05422] Num frames 6200... [2023-02-23 00:53:43,199][05422] Num frames 6300... [2023-02-23 00:53:43,356][05422] Num frames 6400... [2023-02-23 00:53:43,515][05422] Num frames 6500... [2023-02-23 00:53:43,669][05422] Num frames 6600... [2023-02-23 00:53:43,830][05422] Avg episode rewards: #0: 13.266, true rewards: #0: 6.666 [2023-02-23 00:53:43,833][05422] Avg episode reward: 13.266, avg true_objective: 6.666 [2023-02-23 00:54:21,822][05422] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-23 00:54:22,095][05422] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 00:54:22,097][05422] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-23 00:54:22,099][05422] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-23 00:54:22,101][05422] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-23 00:54:22,103][05422] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 00:54:22,104][05422] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-23 00:54:22,105][05422] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-23 00:54:22,107][05422] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-23 00:54:22,108][05422] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-23 00:54:22,109][05422] Adding new argument 'hf_repository'='saikiranp/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-23 00:54:22,110][05422] Adding new argument 'policy_index'=0 that is not in the saved config file! 
[2023-02-23 00:54:22,111][05422] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-23 00:54:22,112][05422] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 00:54:22,113][05422] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 00:54:22,114][05422] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 00:54:22,135][05422] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:54:22,138][05422] RunningMeanStd input shape: (1,) [2023-02-23 00:54:22,156][05422] ConvEncoder: input_channels=3 [2023-02-23 00:54:22,214][05422] Conv encoder output size: 512 [2023-02-23 00:54:22,216][05422] Policy head output size: 512 [2023-02-23 00:54:22,243][05422] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2023-02-23 00:54:23,012][05422] Num frames 100... [2023-02-23 00:54:23,159][05422] Num frames 200... [2023-02-23 00:54:23,303][05422] Num frames 300... [2023-02-23 00:54:23,511][05422] Num frames 400... [2023-02-23 00:54:23,724][05422] Avg episode rewards: #0: 7.800, true rewards: #0: 4.800 [2023-02-23 00:54:23,727][05422] Avg episode reward: 7.800, avg true_objective: 4.800 [2023-02-23 00:54:23,771][05422] Num frames 500... [2023-02-23 00:54:23,929][05422] Num frames 600... [2023-02-23 00:54:24,119][05422] Num frames 700... [2023-02-23 00:54:24,300][05422] Num frames 800... [2023-02-23 00:54:24,472][05422] Num frames 900... [2023-02-23 00:54:24,652][05422] Num frames 1000... [2023-02-23 00:54:24,818][05422] Num frames 1100... [2023-02-23 00:54:24,980][05422] Num frames 1200... [2023-02-23 00:54:25,149][05422] Num frames 1300... [2023-02-23 00:54:25,315][05422] Num frames 1400... [2023-02-23 00:54:25,461][05422] Num frames 1500... [2023-02-23 00:54:25,647][05422] Num frames 1600... 
[2023-02-23 00:54:25,702][05422] Avg episode rewards: #0: 17.000, true rewards: #0: 8.000 [2023-02-23 00:54:25,705][05422] Avg episode reward: 17.000, avg true_objective: 8.000 [2023-02-23 00:54:25,892][05422] Num frames 1700... [2023-02-23 00:54:26,113][05422] Num frames 1800... [2023-02-23 00:54:26,309][05422] Num frames 1900... [2023-02-23 00:54:26,501][05422] Num frames 2000... [2023-02-23 00:54:26,690][05422] Num frames 2100... [2023-02-23 00:54:26,874][05422] Num frames 2200... [2023-02-23 00:54:27,065][05422] Num frames 2300... [2023-02-23 00:54:27,267][05422] Num frames 2400... [2023-02-23 00:54:27,448][05422] Num frames 2500... [2023-02-23 00:54:27,634][05422] Num frames 2600... [2023-02-23 00:54:27,819][05422] Num frames 2700... [2023-02-23 00:54:27,985][05422] Num frames 2800... [2023-02-23 00:54:28,139][05422] Num frames 2900... [2023-02-23 00:54:28,303][05422] Avg episode rewards: #0: 20.897, true rewards: #0: 9.897 [2023-02-23 00:54:28,305][05422] Avg episode reward: 20.897, avg true_objective: 9.897 [2023-02-23 00:54:28,341][05422] Num frames 3000... [2023-02-23 00:54:28,449][05422] Num frames 3100... [2023-02-23 00:54:28,557][05422] Num frames 3200... [2023-02-23 00:54:28,668][05422] Num frames 3300... [2023-02-23 00:54:28,785][05422] Num frames 3400... [2023-02-23 00:54:28,896][05422] Num frames 3500... [2023-02-23 00:54:29,007][05422] Num frames 3600... [2023-02-23 00:54:29,116][05422] Num frames 3700... [2023-02-23 00:54:29,225][05422] Num frames 3800... [2023-02-23 00:54:29,336][05422] Num frames 3900... [2023-02-23 00:54:29,444][05422] Num frames 4000... [2023-02-23 00:54:29,552][05422] Num frames 4100... [2023-02-23 00:54:29,661][05422] Num frames 4200... [2023-02-23 00:54:29,778][05422] Num frames 4300... [2023-02-23 00:54:29,889][05422] Num frames 4400... 
[2023-02-23 00:54:30,048][05422] Avg episode rewards: #0: 24.990, true rewards: #0: 11.240 [2023-02-23 00:54:30,049][05422] Avg episode reward: 24.990, avg true_objective: 11.240 [2023-02-23 00:54:30,059][05422] Num frames 4500... [2023-02-23 00:54:30,167][05422] Num frames 4600... [2023-02-23 00:54:30,275][05422] Num frames 4700... [2023-02-23 00:54:30,385][05422] Num frames 4800... [2023-02-23 00:54:30,494][05422] Num frames 4900... [2023-02-23 00:54:30,608][05422] Num frames 5000... [2023-02-23 00:54:30,724][05422] Num frames 5100... [2023-02-23 00:54:30,785][05422] Avg episode rewards: #0: 22.608, true rewards: #0: 10.208 [2023-02-23 00:54:30,788][05422] Avg episode reward: 22.608, avg true_objective: 10.208 [2023-02-23 00:54:30,894][05422] Num frames 5200... [2023-02-23 00:54:31,000][05422] Num frames 5300... [2023-02-23 00:54:31,108][05422] Num frames 5400... [2023-02-23 00:54:31,216][05422] Num frames 5500... [2023-02-23 00:54:31,326][05422] Num frames 5600... [2023-02-23 00:54:31,442][05422] Num frames 5700... [2023-02-23 00:54:31,566][05422] Num frames 5800... [2023-02-23 00:54:31,688][05422] Num frames 5900... [2023-02-23 00:54:31,805][05422] Num frames 6000... [2023-02-23 00:54:31,922][05422] Num frames 6100... [2023-02-23 00:54:32,053][05422] Num frames 6200... [2023-02-23 00:54:32,169][05422] Num frames 6300... [2023-02-23 00:54:32,284][05422] Num frames 6400... [2023-02-23 00:54:32,393][05422] Num frames 6500... [2023-02-23 00:54:32,502][05422] Num frames 6600... [2023-02-23 00:54:32,613][05422] Num frames 6700... [2023-02-23 00:54:32,724][05422] Num frames 6800... [2023-02-23 00:54:32,839][05422] Num frames 6900... [2023-02-23 00:54:32,957][05422] Num frames 7000... [2023-02-23 00:54:33,068][05422] Num frames 7100... [2023-02-23 00:54:33,177][05422] Num frames 7200... 
[2023-02-23 00:54:33,238][05422] Avg episode rewards: #0: 29.173, true rewards: #0: 12.007 [2023-02-23 00:54:33,240][05422] Avg episode reward: 29.173, avg true_objective: 12.007 [2023-02-23 00:54:33,345][05422] Num frames 7300... [2023-02-23 00:54:33,453][05422] Num frames 7400... [2023-02-23 00:54:33,561][05422] Num frames 7500... [2023-02-23 00:54:33,669][05422] Num frames 7600... [2023-02-23 00:54:33,782][05422] Num frames 7700... [2023-02-23 00:54:33,892][05422] Num frames 7800... [2023-02-23 00:54:33,999][05422] Num frames 7900... [2023-02-23 00:54:34,106][05422] Num frames 8000... [2023-02-23 00:54:34,214][05422] Num frames 8100... [2023-02-23 00:54:34,304][05422] Avg episode rewards: #0: 27.760, true rewards: #0: 11.617 [2023-02-23 00:54:34,306][05422] Avg episode reward: 27.760, avg true_objective: 11.617 [2023-02-23 00:54:34,383][05422] Num frames 8200... [2023-02-23 00:54:34,492][05422] Num frames 8300... [2023-02-23 00:54:34,601][05422] Num frames 8400... [2023-02-23 00:54:34,709][05422] Num frames 8500... [2023-02-23 00:54:34,822][05422] Num frames 8600... [2023-02-23 00:54:34,931][05422] Num frames 8700... [2023-02-23 00:54:35,042][05422] Num frames 8800... [2023-02-23 00:54:35,157][05422] Num frames 8900... [2023-02-23 00:54:35,248][05422] Avg episode rewards: #0: 26.165, true rewards: #0: 11.165 [2023-02-23 00:54:35,249][05422] Avg episode reward: 26.165, avg true_objective: 11.165 [2023-02-23 00:54:35,327][05422] Num frames 9000... [2023-02-23 00:54:35,436][05422] Num frames 9100... [2023-02-23 00:54:35,545][05422] Num frames 9200... [2023-02-23 00:54:35,655][05422] Num frames 9300... [2023-02-23 00:54:35,764][05422] Num frames 9400... [2023-02-23 00:54:35,876][05422] Num frames 9500... [2023-02-23 00:54:35,984][05422] Num frames 9600... [2023-02-23 00:54:36,097][05422] Num frames 9700... [2023-02-23 00:54:36,207][05422] Num frames 9800... [2023-02-23 00:54:36,316][05422] Num frames 9900... [2023-02-23 00:54:36,425][05422] Num frames 10000... 
[2023-02-23 00:54:36,538][05422] Num frames 10100... [2023-02-23 00:54:36,656][05422] Num frames 10200... [2023-02-23 00:54:36,769][05422] Num frames 10300... [2023-02-23 00:54:36,887][05422] Num frames 10400... [2023-02-23 00:54:36,996][05422] Num frames 10500... [2023-02-23 00:54:37,146][05422] Avg episode rewards: #0: 27.874, true rewards: #0: 11.763 [2023-02-23 00:54:37,148][05422] Avg episode reward: 27.874, avg true_objective: 11.763 [2023-02-23 00:54:37,165][05422] Num frames 10600... [2023-02-23 00:54:37,272][05422] Num frames 10700... [2023-02-23 00:54:37,380][05422] Num frames 10800... [2023-02-23 00:54:37,486][05422] Num frames 10900... [2023-02-23 00:54:37,595][05422] Num frames 11000... [2023-02-23 00:54:37,704][05422] Num frames 11100... [2023-02-23 00:54:37,864][05422] Num frames 11200... [2023-02-23 00:54:38,019][05422] Num frames 11300... [2023-02-23 00:54:38,175][05422] Num frames 11400... [2023-02-23 00:54:38,366][05422] Avg episode rewards: #0: 27.083, true rewards: #0: 11.483 [2023-02-23 00:54:38,371][05422] Avg episode reward: 27.083, avg true_objective: 11.483 [2023-02-23 00:55:44,973][05422] Replay video saved to /content/train_dir/default_experiment/replay.mp4!