[2023-02-22 23:31:48,812][05631] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-22 23:31:48,815][05631] Rollout worker 0 uses device cpu
[2023-02-22 23:31:48,817][05631] Rollout worker 1 uses device cpu
[2023-02-22 23:31:48,819][05631] Rollout worker 2 uses device cpu
[2023-02-22 23:31:48,820][05631] Rollout worker 3 uses device cpu
[2023-02-22 23:31:48,822][05631] Rollout worker 4 uses device cpu
[2023-02-22 23:31:48,823][05631] Rollout worker 5 uses device cpu
[2023-02-22 23:31:48,825][05631] Rollout worker 6 uses device cpu
[2023-02-22 23:31:48,827][05631] Rollout worker 7 uses device cpu
[2023-02-22 23:31:49,015][05631] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 23:31:49,017][05631] InferenceWorker_p0-w0: min num requests: 2
[2023-02-22 23:31:49,058][05631] Starting all processes...
[2023-02-22 23:31:49,061][05631] Starting process learner_proc0
[2023-02-22 23:31:49,117][05631] Starting all processes...
[2023-02-22 23:31:49,125][05631] Starting process inference_proc0-0
[2023-02-22 23:31:49,126][05631] Starting process rollout_proc0
[2023-02-22 23:31:49,126][05631] Starting process rollout_proc1
[2023-02-22 23:31:49,126][05631] Starting process rollout_proc2
[2023-02-22 23:31:49,126][05631] Starting process rollout_proc3
[2023-02-22 23:31:49,126][05631] Starting process rollout_proc4
[2023-02-22 23:31:49,126][05631] Starting process rollout_proc5
[2023-02-22 23:31:49,127][05631] Starting process rollout_proc6
[2023-02-22 23:31:49,127][05631] Starting process rollout_proc7
[2023-02-22 23:32:00,965][11388] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 23:32:00,965][11388] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-22 23:32:02,486][11411] Worker 6 uses CPU cores [0]
[2023-02-22 23:32:02,621][11408] Worker 4 uses CPU cores [0]
[2023-02-22 23:32:02,756][11388] Num visible devices: 1
[2023-02-22 23:32:02,784][11388] Starting seed is not provided
[2023-02-22 23:32:02,785][11388] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 23:32:02,785][11388] Initializing actor-critic model on device cuda:0
[2023-02-22 23:32:02,785][11388] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 23:32:02,788][11388] RunningMeanStd input shape: (1,)
[2023-02-22 23:32:02,890][11388] ConvEncoder: input_channels=3
[2023-02-22 23:32:02,916][11406] Worker 1 uses CPU cores [1]
[2023-02-22 23:32:03,032][11412] Worker 7 uses CPU cores [1]
[2023-02-22 23:32:03,035][11407] Worker 2 uses CPU cores [0]
[2023-02-22 23:32:03,123][11410] Worker 5 uses CPU cores [1]
[2023-02-22 23:32:03,129][11402] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 23:32:03,129][11402] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-22 23:32:03,149][11402] Num visible devices: 1
[2023-02-22 23:32:03,143][11409] Worker 3 uses CPU cores [1]
[2023-02-22 23:32:03,203][11403] Worker 0 uses CPU cores [0]
[2023-02-22 23:32:03,390][11388] Conv encoder output size: 512
[2023-02-22 23:32:03,390][11388] Policy head output size: 512
[2023-02-22 23:32:03,459][11388] Created Actor Critic model with architecture:
[2023-02-22 23:32:03,460][11388] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-22 23:32:09,008][05631] Heartbeat connected on Batcher_0
[2023-02-22 23:32:09,017][05631] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-22 23:32:09,033][05631] Heartbeat connected on RolloutWorker_w0
[2023-02-22 23:32:09,036][05631] Heartbeat connected on RolloutWorker_w1
[2023-02-22 23:32:09,040][05631] Heartbeat connected on RolloutWorker_w2
[2023-02-22 23:32:09,043][05631] Heartbeat connected on RolloutWorker_w3
[2023-02-22 23:32:09,046][05631] Heartbeat connected on RolloutWorker_w4
[2023-02-22 23:32:09,050][05631] Heartbeat connected on RolloutWorker_w5
[2023-02-22 23:32:09,061][05631] Heartbeat connected on RolloutWorker_w6
[2023-02-22 23:32:09,062][05631] Heartbeat connected on RolloutWorker_w7
[2023-02-22 23:32:11,237][11388] Using optimizer
[2023-02-22 23:32:11,238][11388] No checkpoints found
[2023-02-22 23:32:11,238][11388] Did not load from checkpoint, starting from scratch!
[2023-02-22 23:32:11,239][11388] Initialized policy 0 weights for model version 0
[2023-02-22 23:32:11,243][11388] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 23:32:11,254][11388] LearnerWorker_p0 finished initialization!
[2023-02-22 23:32:11,270][05631] Heartbeat connected on LearnerWorker_p0
[2023-02-22 23:32:11,358][11402] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 23:32:11,360][11402] RunningMeanStd input shape: (1,)
[2023-02-22 23:32:11,373][11402] ConvEncoder: input_channels=3
[2023-02-22 23:32:11,474][11402] Conv encoder output size: 512
[2023-02-22 23:32:11,475][11402] Policy head output size: 512
[2023-02-22 23:32:13,904][05631] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 23:32:14,636][05631] Inference worker 0-0 is ready!
[2023-02-22 23:32:14,639][05631] All inference workers are ready! Signal rollout workers to start!
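For readers who want to map the printed module tree to concrete layers, here is a minimal PyTorch sketch of the network shape implied by the log. The conv kernel sizes and strides are assumptions (three Conv2d/ELU pairs, as in Sample Factory's default simple conv net); the log itself only confirms the layer types, the (3, 72, 128) observation shape, the 512-unit encoder and policy head, the GRU(512, 512) core, and the 1-unit critic and 5-way action heads.

# Sketch only: layer sizes marked as assumptions are not confirmed by the log.
import torch
from torch import nn

class SketchActorCritic(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        # conv_head: kernel/stride values are assumptions, not read from the log
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size from a dummy observation
            flat = self.conv_head(torch.zeros(1, *obs_shape)).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                       # ModelCoreRNN
        self.critic_linear = nn.Linear(hidden, 1)                # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs)).unsqueeze(0)    # sequence length 1
        x, rnn_state = self.core(x, rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state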
[2023-02-22 23:32:14,738][11410] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:14,780][11403] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:14,781][11411] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:14,805][11409] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:14,803][11407] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:14,882][11412] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:14,891][11406] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:14,939][11408] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 23:32:16,434][11410] Decorrelating experience for 0 frames...
[2023-02-22 23:32:16,443][11409] Decorrelating experience for 0 frames...
[2023-02-22 23:32:16,450][11411] Decorrelating experience for 0 frames...
[2023-02-22 23:32:16,460][11403] Decorrelating experience for 0 frames...
[2023-02-22 23:32:16,471][11407] Decorrelating experience for 0 frames...
[2023-02-22 23:32:16,503][11408] Decorrelating experience for 0 frames...
[2023-02-22 23:32:17,551][11406] Decorrelating experience for 0 frames...
[2023-02-22 23:32:17,588][11412] Decorrelating experience for 0 frames...
[2023-02-22 23:32:17,598][11410] Decorrelating experience for 32 frames...
[2023-02-22 23:32:17,776][11407] Decorrelating experience for 32 frames...
[2023-02-22 23:32:17,844][11408] Decorrelating experience for 32 frames...
[2023-02-22 23:32:17,846][11411] Decorrelating experience for 32 frames...
[2023-02-22 23:32:18,810][11403] Decorrelating experience for 32 frames...
[2023-02-22 23:32:18,833][11406] Decorrelating experience for 32 frames...
[2023-02-22 23:32:18,904][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 23:32:18,926][11409] Decorrelating experience for 32 frames...
[2023-02-22 23:32:19,230][11407] Decorrelating experience for 64 frames...
[2023-02-22 23:32:19,251][11412] Decorrelating experience for 32 frames...
[2023-02-22 23:32:19,273][11410] Decorrelating experience for 64 frames...
[2023-02-22 23:32:19,305][11408] Decorrelating experience for 64 frames...
[2023-02-22 23:32:19,849][11411] Decorrelating experience for 64 frames...
[2023-02-22 23:32:19,994][11403] Decorrelating experience for 64 frames...
[2023-02-22 23:32:20,371][11411] Decorrelating experience for 96 frames...
[2023-02-22 23:32:20,530][11406] Decorrelating experience for 64 frames...
[2023-02-22 23:32:20,551][11409] Decorrelating experience for 64 frames...
[2023-02-22 23:32:20,641][11410] Decorrelating experience for 96 frames...
[2023-02-22 23:32:20,956][11412] Decorrelating experience for 64 frames...
[2023-02-22 23:32:21,609][11408] Decorrelating experience for 96 frames...
[2023-02-22 23:32:21,601][11412] Decorrelating experience for 96 frames...
[2023-02-22 23:32:21,869][11403] Decorrelating experience for 96 frames...
[2023-02-22 23:32:21,993][11407] Decorrelating experience for 96 frames...
[2023-02-22 23:32:22,342][11409] Decorrelating experience for 96 frames...
[2023-02-22 23:32:22,683][11406] Decorrelating experience for 96 frames...
[2023-02-22 23:32:23,904][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 23:32:26,753][11388] Signal inference workers to stop experience collection...
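The "Decorrelating experience for N frames" lines above show each rollout worker stepping its environments by a different number of frames (0, 32, 64, 96) before regular collection starts, so the eight workers do not feed the learner synchronized, near-identical trajectories. A rough sketch of the idea, assuming a Gymnasium-style env API (decorrelate is a hypothetical helper, not Sample Factory's actual implementation):

import random

def decorrelate(env, max_frames=96, frames_per_step=32):
    """Step the env a worker-specific number of random-action frames before
    collecting, so parallel workers start from different episode states."""
    target = random.randrange(0, max_frames + 1, frames_per_step)  # 0/32/64/96
    obs, _ = env.reset()
    for _ in range(target):
        obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, _ = env.reset()
    return obs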
[2023-02-22 23:32:26,775][11402] InferenceWorker_p0-w0: stopping experience collection
[2023-02-22 23:32:28,904][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 143.6. Samples: 2154. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 23:32:28,907][05631] Avg episode reward: [(0, '1.807')]
[2023-02-22 23:32:29,578][11388] Signal inference workers to resume experience collection...
[2023-02-22 23:32:29,579][11402] InferenceWorker_p0-w0: resuming experience collection
[2023-02-22 23:32:33,904][05631] Fps is (10 sec: 1228.8, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 12288. Throughput: 0: 168.9. Samples: 3378. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-02-22 23:32:33,912][05631] Avg episode reward: [(0, '3.096')]
[2023-02-22 23:32:38,904][05631] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 329.7. Samples: 8242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 23:32:38,906][05631] Avg episode reward: [(0, '3.630')]
[2023-02-22 23:32:40,942][11402] Updated weights for policy 0, policy_version 10 (0.0017)
[2023-02-22 23:32:43,904][05631] Fps is (10 sec: 3686.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 378.3. Samples: 11348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:32:43,907][05631] Avg episode reward: [(0, '4.235')]
[2023-02-22 23:32:48,904][05631] Fps is (10 sec: 3686.4, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 475.6. Samples: 16646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:32:48,910][05631] Avg episode reward: [(0, '4.272')]
[2023-02-22 23:32:53,907][05631] Fps is (10 sec: 2866.3, 60 sec: 1945.4, 300 sec: 1945.4). Total num frames: 77824. Throughput: 0: 507.3. Samples: 20294. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:32:53,910][05631] Avg episode reward: [(0, '4.315')]
[2023-02-22 23:32:54,330][11402] Updated weights for policy 0, policy_version 20 (0.0018)
[2023-02-22 23:32:58,904][05631] Fps is (10 sec: 2867.2, 60 sec: 2093.5, 300 sec: 2093.5). Total num frames: 94208. Throughput: 0: 496.9. Samples: 22362. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 23:32:58,906][05631] Avg episode reward: [(0, '4.348')]
[2023-02-22 23:33:03,904][05631] Fps is (10 sec: 3687.6, 60 sec: 2293.8, 300 sec: 2293.8). Total num frames: 114688. Throughput: 0: 633.5. Samples: 28508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:33:03,906][05631] Avg episode reward: [(0, '4.401')]
[2023-02-22 23:33:03,916][11388] Saving new best policy, reward=4.401!
[2023-02-22 23:33:05,132][11402] Updated weights for policy 0, policy_version 30 (0.0035)
[2023-02-22 23:33:08,904][05631] Fps is (10 sec: 3686.4, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 131072. Throughput: 0: 733.0. Samples: 32984. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 23:33:08,906][05631] Avg episode reward: [(0, '4.430')]
[2023-02-22 23:33:08,914][11388] Saving new best policy, reward=4.430!
[2023-02-22 23:33:13,904][05631] Fps is (10 sec: 2048.0, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 135168. Throughput: 0: 703.6. Samples: 33818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-22 23:33:13,910][05631] Avg episode reward: [(0, '4.269')]
[2023-02-22 23:33:18,904][05631] Fps is (10 sec: 2048.0, 60 sec: 2525.9, 300 sec: 2331.6). Total num frames: 151552. Throughput: 0: 757.2. Samples: 37450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:33:18,906][05631] Avg episode reward: [(0, '4.371')]
[2023-02-22 23:33:21,443][11402] Updated weights for policy 0, policy_version 40 (0.0034)
[2023-02-22 23:33:23,904][05631] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2457.6). Total num frames: 172032. Throughput: 0: 790.8. Samples: 43828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:33:23,910][05631] Avg episode reward: [(0, '4.504')]
[2023-02-22 23:33:23,921][11388] Saving new best policy, reward=4.504!
[2023-02-22 23:33:28,907][05631] Fps is (10 sec: 4094.7, 60 sec: 3208.4, 300 sec: 2566.7). Total num frames: 192512. Throughput: 0: 791.6. Samples: 46972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:33:28,918][05631] Avg episode reward: [(0, '4.443')]
[2023-02-22 23:33:32,988][11402] Updated weights for policy 0, policy_version 50 (0.0038)
[2023-02-22 23:33:33,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2560.0). Total num frames: 204800. Throughput: 0: 774.2. Samples: 51484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:33:33,912][05631] Avg episode reward: [(0, '4.449')]
[2023-02-22 23:33:38,904][05631] Fps is (10 sec: 2458.4, 60 sec: 3140.3, 300 sec: 2554.0). Total num frames: 217088. Throughput: 0: 780.3. Samples: 55406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:33:38,912][05631] Avg episode reward: [(0, '4.264')]
[2023-02-22 23:33:43,905][05631] Fps is (10 sec: 3276.7, 60 sec: 3140.3, 300 sec: 2639.6). Total num frames: 237568. Throughput: 0: 801.0. Samples: 58406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:33:43,911][05631] Avg episode reward: [(0, '4.287')]
[2023-02-22 23:33:43,922][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth...
[2023-02-22 23:33:45,151][11402] Updated weights for policy 0, policy_version 60 (0.0028)
[2023-02-22 23:33:48,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2716.3). Total num frames: 258048. Throughput: 0: 799.5. Samples: 64484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:33:48,911][05631] Avg episode reward: [(0, '4.361')]
[2023-02-22 23:33:53,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3208.7, 300 sec: 2703.4). Total num frames: 270336. Throughput: 0: 795.9. Samples: 68798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:33:53,907][05631] Avg episode reward: [(0, '4.393')]
[2023-02-22 23:33:58,642][11402] Updated weights for policy 0, policy_version 70 (0.0013)
[2023-02-22 23:33:58,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2730.7). Total num frames: 286720. Throughput: 0: 820.9. Samples: 70760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:33:58,911][05631] Avg episode reward: [(0, '4.411')]
[2023-02-22 23:34:03,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3140.3, 300 sec: 2755.5). Total num frames: 303104. Throughput: 0: 854.9. Samples: 75922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:34:03,912][05631] Avg episode reward: [(0, '4.501')]
[2023-02-22 23:34:08,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2813.8). Total num frames: 323584. Throughput: 0: 848.8. Samples: 82024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:34:08,907][05631] Avg episode reward: [(0, '4.479')]
[2023-02-22 23:34:09,146][11402] Updated weights for policy 0, policy_version 80 (0.0015)
[2023-02-22 23:34:13,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2833.1). Total num frames: 339968. Throughput: 0: 828.7. Samples: 84260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:34:13,910][05631] Avg episode reward: [(0, '4.301')]
[2023-02-22 23:34:18,904][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 2818.0). Total num frames: 352256. Throughput: 0: 813.3. Samples: 88082. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:34:18,913][05631] Avg episode reward: [(0, '4.291')]
[2023-02-22 23:34:23,336][11402] Updated weights for policy 0, policy_version 90 (0.0024)
[2023-02-22 23:34:23,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2835.7). Total num frames: 368640. Throughput: 0: 835.7. Samples: 93014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:34:23,907][05631] Avg episode reward: [(0, '4.328')]
[2023-02-22 23:34:28,904][05631] Fps is (10 sec: 3686.5, 60 sec: 3277.0, 300 sec: 2882.4). Total num frames: 389120. Throughput: 0: 831.8. Samples: 95838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:34:28,912][05631] Avg episode reward: [(0, '4.479')]
[2023-02-22 23:34:33,905][05631] Fps is (10 sec: 3276.4, 60 sec: 3276.7, 300 sec: 2867.2). Total num frames: 401408. Throughput: 0: 810.8. Samples: 100972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:34:33,912][05631] Avg episode reward: [(0, '4.441')]
[2023-02-22 23:34:36,124][11402] Updated weights for policy 0, policy_version 100 (0.0018)
[2023-02-22 23:34:38,904][05631] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 2853.1). Total num frames: 413696. Throughput: 0: 795.0. Samples: 104574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:34:38,906][05631] Avg episode reward: [(0, '4.471')]
[2023-02-22 23:34:43,904][05631] Fps is (10 sec: 2867.6, 60 sec: 3208.6, 300 sec: 2867.2). Total num frames: 430080. Throughput: 0: 794.9. Samples: 106532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:34:43,907][05631] Avg episode reward: [(0, '4.549')]
[2023-02-22 23:34:43,916][11388] Saving new best policy, reward=4.549!
[2023-02-22 23:34:48,274][11402] Updated weights for policy 0, policy_version 110 (0.0015)
[2023-02-22 23:34:48,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2906.8). Total num frames: 450560. Throughput: 0: 815.6. Samples: 112626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:34:48,906][05631] Avg episode reward: [(0, '4.612')]
[2023-02-22 23:34:48,910][11388] Saving new best policy, reward=4.612!
[2023-02-22 23:34:53,904][05631] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 2918.4). Total num frames: 466944. Throughput: 0: 799.5. Samples: 118000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 23:34:53,911][05631] Avg episode reward: [(0, '4.579')]
[2023-02-22 23:34:58,907][05631] Fps is (10 sec: 3275.7, 60 sec: 3276.6, 300 sec: 2929.2). Total num frames: 483328. Throughput: 0: 792.9. Samples: 119944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:34:58,911][05631] Avg episode reward: [(0, '4.527')]
[2023-02-22 23:35:01,929][11402] Updated weights for policy 0, policy_version 120 (0.0032)
[2023-02-22 23:35:03,904][05631] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 2915.4). Total num frames: 495616. Throughput: 0: 794.7. Samples: 123844. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 23:35:03,911][05631] Avg episode reward: [(0, '4.660')]
[2023-02-22 23:35:03,921][11388] Saving new best policy, reward=4.660!
[2023-02-22 23:35:08,904][05631] Fps is (10 sec: 3277.8, 60 sec: 3208.5, 300 sec: 2949.1). Total num frames: 516096. Throughput: 0: 819.7. Samples: 129900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:35:08,912][05631] Avg episode reward: [(0, '4.703')]
[2023-02-22 23:35:08,915][11388] Saving new best policy, reward=4.703!
[2023-02-22 23:35:12,427][11402] Updated weights for policy 0, policy_version 130 (0.0019)
[2023-02-22 23:35:13,906][05631] Fps is (10 sec: 4095.4, 60 sec: 3276.7, 300 sec: 2981.0). Total num frames: 536576. Throughput: 0: 822.8. Samples: 132864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:35:13,916][05631] Avg episode reward: [(0, '4.449')]
[2023-02-22 23:35:18,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 2966.8). Total num frames: 548864. Throughput: 0: 803.5. Samples: 137128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:35:18,908][05631] Avg episode reward: [(0, '4.510')]
[2023-02-22 23:35:23,904][05631] Fps is (10 sec: 2458.0, 60 sec: 3208.5, 300 sec: 2953.4). Total num frames: 561152. Throughput: 0: 811.9. Samples: 141108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:35:23,907][05631] Avg episode reward: [(0, '4.519')]
[2023-02-22 23:35:26,269][11402] Updated weights for policy 0, policy_version 140 (0.0036)
[2023-02-22 23:35:28,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2982.7). Total num frames: 581632. Throughput: 0: 839.2. Samples: 144294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:35:28,911][05631] Avg episode reward: [(0, '4.480')]
[2023-02-22 23:35:33,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3010.6). Total num frames: 602112. Throughput: 0: 844.0. Samples: 150606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:35:33,908][05631] Avg episode reward: [(0, '4.208')]
[2023-02-22 23:35:37,322][11402] Updated weights for policy 0, policy_version 150 (0.0013)
[2023-02-22 23:35:38,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2997.1). Total num frames: 614400. Throughput: 0: 822.1. Samples: 154992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:35:38,907][05631] Avg episode reward: [(0, '4.311')]
[2023-02-22 23:35:43,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3003.7). Total num frames: 630784. Throughput: 0: 822.1. Samples: 156936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:35:43,914][05631] Avg episode reward: [(0, '4.271')]
[2023-02-22 23:35:43,927][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000154_630784.pth...
[2023-02-22 23:35:48,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3010.1). Total num frames: 647168. Throughput: 0: 849.2. Samples: 162058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:35:48,908][05631] Avg episode reward: [(0, '4.508')]
[2023-02-22 23:35:49,942][11402] Updated weights for policy 0, policy_version 160 (0.0023)
[2023-02-22 23:35:53,904][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.4, 300 sec: 3053.4). Total num frames: 671744. Throughput: 0: 853.9. Samples: 168324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:35:53,907][05631] Avg episode reward: [(0, '4.452')]
[2023-02-22 23:35:58,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3040.1). Total num frames: 684032. Throughput: 0: 840.2. Samples: 170670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:35:58,910][05631] Avg episode reward: [(0, '4.558')]
[2023-02-22 23:36:02,437][11402] Updated weights for policy 0, policy_version 170 (0.0040)
[2023-02-22 23:36:03,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3045.3). Total num frames: 700416. Throughput: 0: 834.4. Samples: 174678. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:36:03,911][05631] Avg episode reward: [(0, '4.447')]
[2023-02-22 23:36:08,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3050.2). Total num frames: 716800. Throughput: 0: 860.3. Samples: 179820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:36:08,907][05631] Avg episode reward: [(0, '4.720')]
[2023-02-22 23:36:08,914][11388] Saving new best policy, reward=4.720!
[2023-02-22 23:36:13,457][11402] Updated weights for policy 0, policy_version 180 (0.0018)
[2023-02-22 23:36:13,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3072.0). Total num frames: 737280. Throughput: 0: 857.5. Samples: 182882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 23:36:13,907][05631] Avg episode reward: [(0, '4.898')]
[2023-02-22 23:36:13,921][11388] Saving new best policy, reward=4.898!
[2023-02-22 23:36:18,906][05631] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3076.2). Total num frames: 753664. Throughput: 0: 840.3. Samples: 188420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:36:18,909][05631] Avg episode reward: [(0, '4.718')]
[2023-02-22 23:36:23,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3063.8). Total num frames: 765952. Throughput: 0: 830.3. Samples: 192356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:36:23,910][05631] Avg episode reward: [(0, '4.706')]
[2023-02-22 23:36:27,095][11402] Updated weights for policy 0, policy_version 190 (0.0024)
[2023-02-22 23:36:28,904][05631] Fps is (10 sec: 2867.8, 60 sec: 3345.1, 300 sec: 3068.0). Total num frames: 782336. Throughput: 0: 836.9. Samples: 194598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:36:28,911][05631] Avg episode reward: [(0, '4.807')]
[2023-02-22 23:36:33,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3087.8). Total num frames: 802816. Throughput: 0: 859.1. Samples: 200716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:36:33,907][05631] Avg episode reward: [(0, '4.739')]
[2023-02-22 23:36:37,987][11402] Updated weights for policy 0, policy_version 200 (0.0014)
[2023-02-22 23:36:38,908][05631] Fps is (10 sec: 3685.1, 60 sec: 3413.1, 300 sec: 3091.3). Total num frames: 819200. Throughput: 0: 833.4. Samples: 205828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:36:38,911][05631] Avg episode reward: [(0, '4.627')]
[2023-02-22 23:36:43,904][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3079.6). Total num frames: 831488. Throughput: 0: 823.5. Samples: 207726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:36:43,911][05631] Avg episode reward: [(0, '4.542')]
[2023-02-22 23:36:48,904][05631] Fps is (10 sec: 2868.2, 60 sec: 3345.1, 300 sec: 3083.2). Total num frames: 847872. Throughput: 0: 821.2. Samples: 211632. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:36:48,912][05631] Avg episode reward: [(0, '4.506')]
[2023-02-22 23:36:51,466][11402] Updated weights for policy 0, policy_version 210 (0.0042)
[2023-02-22 23:36:53,904][05631] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3101.3). Total num frames: 868352. Throughput: 0: 839.3. Samples: 217588. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:36:53,908][05631] Avg episode reward: [(0, '4.638')]
[2023-02-22 23:36:58,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3104.3). Total num frames: 884736. Throughput: 0: 841.8. Samples: 220762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:36:58,909][05631] Avg episode reward: [(0, '4.493')]
[2023-02-22 23:37:03,911][05631] Fps is (10 sec: 2865.4, 60 sec: 3276.4, 300 sec: 3093.1). Total num frames: 897024. Throughput: 0: 811.6. Samples: 224946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:37:03,917][05631] Avg episode reward: [(0, '4.574')]
[2023-02-22 23:37:03,962][11402] Updated weights for policy 0, policy_version 220 (0.0019)
[2023-02-22 23:37:08,904][05631] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3096.3). Total num frames: 913408. Throughput: 0: 811.8. Samples: 228886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:37:08,912][05631] Avg episode reward: [(0, '4.607')]
[2023-02-22 23:37:13,904][05631] Fps is (10 sec: 3688.8, 60 sec: 3276.8, 300 sec: 3165.7). Total num frames: 933888. Throughput: 0: 831.1. Samples: 231996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:37:13,907][05631] Avg episode reward: [(0, '4.605')]
[2023-02-22 23:37:15,709][11402] Updated weights for policy 0, policy_version 230 (0.0019)
[2023-02-22 23:37:18,905][05631] Fps is (10 sec: 4095.5, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 954368. Throughput: 0: 832.8. Samples: 238194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:37:18,908][05631] Avg episode reward: [(0, '4.485')]
[2023-02-22 23:37:23,905][05631] Fps is (10 sec: 3276.4, 60 sec: 3345.0, 300 sec: 3276.8). Total num frames: 966656. Throughput: 0: 814.0. Samples: 242456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:37:23,912][05631] Avg episode reward: [(0, '4.660')]
[2023-02-22 23:37:28,904][05631] Fps is (10 sec: 2457.9, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 978944. Throughput: 0: 814.9. Samples: 244396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:37:28,914][05631] Avg episode reward: [(0, '4.772')]
[2023-02-22 23:37:29,448][11402] Updated weights for policy 0, policy_version 240 (0.0015)
[2023-02-22 23:37:33,904][05631] Fps is (10 sec: 3277.2, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 999424. Throughput: 0: 842.6. Samples: 249548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:37:33,913][05631] Avg episode reward: [(0, '4.759')]
[2023-02-22 23:37:38,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.3, 300 sec: 3290.7). Total num frames: 1019904. Throughput: 0: 848.9. Samples: 255788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:37:38,907][05631] Avg episode reward: [(0, '4.585')]
[2023-02-22 23:37:39,355][11402] Updated weights for policy 0, policy_version 250 (0.0018)
[2023-02-22 23:37:43,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 1032192. Throughput: 0: 828.9. Samples: 258062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:37:43,917][05631] Avg episode reward: [(0, '4.611')]
[2023-02-22 23:37:43,931][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_1032192.pth...
[2023-02-22 23:37:44,111][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth
[2023-02-22 23:37:48,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1048576. Throughput: 0: 821.2. Samples: 261896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:37:48,911][05631] Avg episode reward: [(0, '4.767')]
[2023-02-22 23:37:53,274][11402] Updated weights for policy 0, policy_version 260 (0.0058)
[2023-02-22 23:37:53,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 1064960. Throughput: 0: 849.5. Samples: 267114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:37:53,911][05631] Avg episode reward: [(0, '4.750')]
[2023-02-22 23:37:58,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3290.7). Total num frames: 1085440. Throughput: 0: 850.6. Samples: 270272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:37:58,912][05631] Avg episode reward: [(0, '4.480')]
[2023-02-22 23:38:03,904][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.7, 300 sec: 3290.7). Total num frames: 1101824. Throughput: 0: 831.2. Samples: 275596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:38:03,911][05631] Avg episode reward: [(0, '4.690')]
[2023-02-22 23:38:04,835][11402] Updated weights for policy 0, policy_version 270 (0.0013)
[2023-02-22 23:38:08,908][05631] Fps is (10 sec: 2866.0, 60 sec: 3344.8, 300 sec: 3318.4). Total num frames: 1114112. Throughput: 0: 821.8. Samples: 279438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:38:08,910][05631] Avg episode reward: [(0, '4.660')]
[2023-02-22 23:38:13,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3318.5). Total num frames: 1130496. Throughput: 0: 822.2. Samples: 281396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:38:13,912][05631] Avg episode reward: [(0, '4.739')]
[2023-02-22 23:38:17,611][11402] Updated weights for policy 0, policy_version 280 (0.0028)
[2023-02-22 23:38:18,904][05631] Fps is (10 sec: 3687.9, 60 sec: 3276.9, 300 sec: 3318.5). Total num frames: 1150976. Throughput: 0: 842.6. Samples: 287464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:38:18,912][05631] Avg episode reward: [(0, '4.893')]
[2023-02-22 23:38:23,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 1167360. Throughput: 0: 828.3. Samples: 293062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:38:23,909][05631] Avg episode reward: [(0, '4.942')]
[2023-02-22 23:38:23,927][11388] Saving new best policy, reward=4.942!
[2023-02-22 23:38:28,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3318.5). Total num frames: 1183744. Throughput: 0: 821.6. Samples: 295036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:38:28,914][05631] Avg episode reward: [(0, '4.825')]
[2023-02-22 23:38:30,292][11402] Updated weights for policy 0, policy_version 290 (0.0016)
[2023-02-22 23:38:33,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3318.5). Total num frames: 1196032. Throughput: 0: 824.9. Samples: 299016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:38:33,906][05631] Avg episode reward: [(0, '4.933')]
[2023-02-22 23:38:38,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3318.5). Total num frames: 1216512. Throughput: 0: 849.0. Samples: 305320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 23:38:38,910][05631] Avg episode reward: [(0, '4.834')]
[2023-02-22 23:38:41,134][11402] Updated weights for policy 0, policy_version 300 (0.0022)
[2023-02-22 23:38:43,906][05631] Fps is (10 sec: 4095.1, 60 sec: 3413.2, 300 sec: 3318.4). Total num frames: 1236992. Throughput: 0: 847.9. Samples: 308430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:38:43,909][05631] Avg episode reward: [(0, '4.849')]
[2023-02-22 23:38:48,908][05631] Fps is (10 sec: 3275.4, 60 sec: 3344.8, 300 sec: 3318.4). Total num frames: 1249280. Throughput: 0: 824.5. Samples: 312702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:38:48,916][05631] Avg episode reward: [(0, '4.683')]
[2023-02-22 23:38:53,904][05631] Fps is (10 sec: 2458.1, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 1261568. Throughput: 0: 829.0. Samples: 316738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:38:53,906][05631] Avg episode reward: [(0, '4.638')]
[2023-02-22 23:38:55,000][11402] Updated weights for policy 0, policy_version 310 (0.0015)
[2023-02-22 23:38:58,904][05631] Fps is (10 sec: 3688.0, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 1286144. Throughput: 0: 854.5. Samples: 319850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:38:58,909][05631] Avg episode reward: [(0, '4.625')]
[2023-02-22 23:39:03,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 1302528. Throughput: 0: 859.3. Samples: 326134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:39:03,907][05631] Avg episode reward: [(0, '4.734')]
[2023-02-22 23:39:05,450][11402] Updated weights for policy 0, policy_version 320 (0.0023)
[2023-02-22 23:39:08,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.6, 300 sec: 3318.5). Total num frames: 1318912. Throughput: 0: 828.1. Samples: 330326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:39:08,911][05631] Avg episode reward: [(0, '4.796')]
[2023-02-22 23:39:13,907][05631] Fps is (10 sec: 2866.5, 60 sec: 3344.9, 300 sec: 3318.4). Total num frames: 1331200. Throughput: 0: 827.8. Samples: 332290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:39:13,911][05631] Avg episode reward: [(0, '4.773')]
[2023-02-22 23:39:18,545][11402] Updated weights for policy 0, policy_version 330 (0.0024)
[2023-02-22 23:39:18,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 1351680. Throughput: 0: 856.8. Samples: 337570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:39:18,907][05631] Avg episode reward: [(0, '4.517')]
[2023-02-22 23:39:23,904][05631] Fps is (10 sec: 4097.0, 60 sec: 3413.3, 300 sec: 3332.3). Total num frames: 1372160. Throughput: 0: 857.0. Samples: 343884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 23:39:23,911][05631] Avg episode reward: [(0, '4.435')]
[2023-02-22 23:39:28,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3332.4). Total num frames: 1384448. Throughput: 0: 833.1. Samples: 345916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:39:28,911][05631] Avg episode reward: [(0, '4.540')]
[2023-02-22 23:39:30,901][11402] Updated weights for policy 0, policy_version 340 (0.0023)
[2023-02-22 23:39:33,904][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 1396736. Throughput: 0: 821.3. Samples: 349656. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-22 23:39:33,913][05631] Avg episode reward: [(0, '4.400')]
[2023-02-22 23:39:38,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 1417216. Throughput: 0: 853.6. Samples: 355150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:39:38,911][05631] Avg episode reward: [(0, '4.516')]
[2023-02-22 23:39:42,399][11402] Updated weights for policy 0, policy_version 350 (0.0026)
[2023-02-22 23:39:43,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3346.2). Total num frames: 1437696. Throughput: 0: 856.7. Samples: 358400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-22 23:39:43,912][05631] Avg episode reward: [(0, '4.631')]
[2023-02-22 23:39:43,925][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000351_1437696.pth...
[2023-02-22 23:39:44,048][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000154_630784.pth
[2023-02-22 23:39:48,915][05631] Fps is (10 sec: 3682.5, 60 sec: 3413.0, 300 sec: 3346.1). Total num frames: 1454080. Throughput: 0: 828.7. Samples: 363436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 23:39:48,919][05631] Avg episode reward: [(0, '4.724')]
[2023-02-22 23:39:53,905][05631] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3332.4). Total num frames: 1466368. Throughput: 0: 823.1. Samples: 367368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 23:39:53,913][05631] Avg episode reward: [(0, '4.694')]
[2023-02-22 23:39:56,151][11402] Updated weights for policy 0, policy_version 360 (0.0023)
[2023-02-22 23:39:58,904][05631] Fps is (10 sec: 2870.3, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 1482752. Throughput: 0: 835.6. Samples: 369890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-22 23:39:58,911][05631] Avg episode reward: [(0, '4.647')]
[2023-02-22 23:40:03,904][05631] Fps is (10 sec: 4096.3, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 1507328. Throughput: 0: 861.2. Samples: 376324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 23:40:03,906][05631] Avg episode reward: [(0, '4.388')]
[2023-02-22 23:40:05,705][11402] Updated weights for policy 0, policy_version 370 (0.0014)
[2023-02-22 23:40:08,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1523712. Throughput: 0: 833.5. Samples: 381390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:40:08,911][05631] Avg episode reward: [(0, '4.430')]
[2023-02-22 23:40:13,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.5, 300 sec: 3346.2). Total num frames: 1536000. Throughput: 0: 834.8. Samples: 383482. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:40:13,907][05631] Avg episode reward: [(0, '4.272')]
[2023-02-22 23:40:18,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 1552384. Throughput: 0: 852.4. Samples: 388016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:40:18,911][05631] Avg episode reward: [(0, '4.473')]
[2023-02-22 23:40:19,142][11402] Updated weights for policy 0, policy_version 380 (0.0020)
[2023-02-22 23:40:23,906][05631] Fps is (10 sec: 3685.6, 60 sec: 3344.9, 300 sec: 3360.1). Total num frames: 1572864. Throughput: 0: 872.3. Samples: 394404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:40:23,910][05631] Avg episode reward: [(0, '4.602')]
[2023-02-22 23:40:28,906][05631] Fps is (10 sec: 4095.1, 60 sec: 3481.5, 300 sec: 3360.1). Total num frames: 1593344. Throughput: 0: 865.5. Samples: 397348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:40:28,913][05631] Avg episode reward: [(0, '4.487')]
[2023-02-22 23:40:30,651][11402] Updated weights for policy 0, policy_version 390 (0.0033)
[2023-02-22 23:40:33,904][05631] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 1605632. Throughput: 0: 840.4. Samples: 401244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 23:40:33,908][05631] Avg episode reward: [(0, '4.482')]
[2023-02-22 23:40:38,904][05631] Fps is (10 sec: 2867.8, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 1622016. Throughput: 0: 859.3. Samples: 406034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 23:40:38,907][05631] Avg episode reward: [(0, '4.531')]
[2023-02-22 23:40:42,462][11402] Updated weights for policy 0, policy_version 400 (0.0013)
[2023-02-22 23:40:43,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 1642496. Throughput: 0: 871.7. Samples: 409118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:40:43,912][05631] Avg episode reward: [(0, '4.513')]
[2023-02-22 23:40:48,906][05631] Fps is (10 sec: 3685.6, 60 sec: 3413.8, 300 sec: 3346.2). Total num frames: 1658880. Throughput: 0: 860.2. Samples: 415034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:40:48,909][05631] Avg episode reward: [(0, '4.554')]
[2023-02-22 23:40:53,906][05631] Fps is (10 sec: 2866.7, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1671168. Throughput: 0: 836.1. Samples: 419014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:40:53,909][05631] Avg episode reward: [(0, '4.590')]
[2023-02-22 23:40:55,803][11402] Updated weights for policy 0, policy_version 410 (0.0015)
[2023-02-22 23:40:58,904][05631] Fps is (10 sec: 2867.9, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1687552. Throughput: 0: 834.1. Samples: 421016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:40:58,910][05631] Avg episode reward: [(0, '4.548')]
[2023-02-22 23:41:03,904][05631] Fps is (10 sec: 3687.1, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 1708032. Throughput: 0: 870.4. Samples: 427182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:41:03,910][05631] Avg episode reward: [(0, '4.711')]
[2023-02-22 23:41:05,799][11402] Updated weights for policy 0, policy_version 420 (0.0017)
[2023-02-22 23:41:08,907][05631] Fps is (10 sec: 4094.6, 60 sec: 3413.1, 300 sec: 3360.1). Total num frames: 1728512. Throughput: 0: 856.7. Samples: 432956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:41:08,915][05631] Avg episode reward: [(0, '4.743')]
[2023-02-22 23:41:13,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 1740800. Throughput: 0: 835.7. Samples: 434952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:41:13,907][05631] Avg episode reward: [(0, '4.858')]
[2023-02-22 23:41:18,904][05631] Fps is (10 sec: 2868.1, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 1757184. Throughput: 0: 838.3. Samples: 438966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:41:18,910][05631] Avg episode reward: [(0, '4.774')]
[2023-02-22 23:41:19,510][11402] Updated weights for policy 0, policy_version 430 (0.0014)
[2023-02-22 23:41:23,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 1777664. Throughput: 0: 870.1. Samples: 445190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:41:23,907][05631] Avg episode reward: [(0, '4.701')]
[2023-02-22 23:41:28,904][05631] Fps is (10 sec: 4095.9, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 1798144. Throughput: 0: 871.1. Samples: 448316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:41:28,911][05631] Avg episode reward: [(0, '4.835')]
[2023-02-22 23:41:30,401][11402] Updated weights for policy 0, policy_version 440 (0.0022)
[2023-02-22 23:41:33,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 1810432. Throughput: 0: 836.6. Samples: 452680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:41:33,907][05631] Avg episode reward: [(0, '4.692')]
[2023-02-22 23:41:38,904][05631] Fps is (10 sec: 2457.7, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 1822720. Throughput: 0: 839.2. Samples: 456776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:41:38,907][05631] Avg episode reward: [(0, '4.623')]
[2023-02-22 23:41:43,114][11402] Updated weights for policy 0, policy_version 450 (0.0033)
[2023-02-22 23:41:43,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 1843200. Throughput: 0: 864.2. Samples: 459904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:41:43,912][05631] Avg episode reward: [(0, '4.584')]
[2023-02-22 23:41:43,926][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000450_1843200.pth...
[2023-02-22 23:41:44,058][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_1032192.pth
[2023-02-22 23:41:48,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 1863680. Throughput: 0: 863.0. Samples: 466018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:41:48,911][05631] Avg episode reward: [(0, '4.599')]
[2023-02-22 23:41:53,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3360.1). Total num frames: 1875968. Throughput: 0: 829.9. Samples: 470300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:41:53,914][05631] Avg episode reward: [(0, '4.603')]
[2023-02-22 23:41:55,621][11402] Updated weights for policy 0, policy_version 460 (0.0017)
[2023-02-22 23:41:58,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.1). Total num frames: 1892352. Throughput: 0: 830.8. Samples: 472336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:41:58,911][05631] Avg episode reward: [(0, '4.639')]
[2023-02-22 23:42:03,907][05631] Fps is (10 sec: 3685.2, 60 sec: 3413.1, 300 sec: 3387.8). Total num frames: 1912832. Throughput: 0: 863.2. Samples: 477814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:42:03,914][05631] Avg episode reward: [(0, '4.939')]
[2023-02-22 23:42:06,642][11402] Updated weights for policy 0, policy_version 470 (0.0025)
[2023-02-22 23:42:08,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3387.9). Total num frames: 1933312. Throughput: 0: 862.4. Samples: 484000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:42:08,908][05631] Avg episode reward: [(0, '4.880')]
[2023-02-22 23:42:13,904][05631] Fps is (10 sec: 3277.8, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 1945600. Throughput: 0: 837.9. Samples: 486022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:42:13,909][05631] Avg episode reward: [(0, '4.641')]
[2023-02-22 23:42:18,904][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 1957888. Throughput: 0: 827.0. Samples: 489896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 23:42:18,908][05631] Avg episode reward: [(0, '4.552')]
[2023-02-22 23:42:20,442][11402] Updated weights for policy 0, policy_version 480 (0.0035)
[2023-02-22 23:42:23,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 1978368. Throughput: 0: 862.1. Samples: 495572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:42:23,911][05631] Avg episode reward: [(0, '4.688')]
[2023-02-22 23:42:28,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 1998848. Throughput: 0: 862.6. Samples: 498720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:42:28,912][05631] Avg episode reward: [(0, '4.598')]
[2023-02-22 23:42:30,591][11402] Updated weights for policy 0, policy_version 490 (0.0014)
[2023-02-22 23:42:33,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 2015232. Throughput: 0: 838.8. Samples: 503764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 23:42:33,907][05631] Avg episode reward: [(0, '4.445')]
[2023-02-22 23:42:38,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 2027520. Throughput: 0: 833.2. Samples: 507794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:42:38,910][05631] Avg episode reward: [(0, '4.582')]
[2023-02-22 23:42:43,621][11402] Updated weights for policy 0, policy_version 500 (0.0021)
[2023-02-22 23:42:43,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 2048000. Throughput: 0: 847.2. Samples: 510462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:42:43,910][05631] Avg episode reward: [(0, '4.807')]
[2023-02-22 23:42:48,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 2068480. Throughput: 0: 861.9. Samples: 516596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:42:48,907][05631] Avg episode reward: [(0, '4.648')]
[2023-02-22 23:42:53,905][05631] Fps is (10 sec: 3276.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 2080768. Throughput: 0: 831.8. Samples: 521434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:42:53,908][05631] Avg episode reward: [(0, '4.616')]
[2023-02-22 23:42:55,679][11402] Updated weights for policy 0, policy_version 510 (0.0019)
[2023-02-22 23:42:58,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 2097152. Throughput: 0: 830.8. Samples: 523410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 23:42:58,907][05631] Avg episode reward: [(0, '4.531')]
[2023-02-22 23:43:03,904][05631] Fps is (10 sec: 3277.2, 60 sec: 3345.2, 300 sec: 3387.9). Total num frames: 2113536. Throughput: 0: 856.4. Samples: 528432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:43:03,911][05631] Avg episode reward: [(0, '4.480')]
[2023-02-22 23:43:06,912][11402] Updated weights for policy 0, policy_version 520 (0.0022)
[2023-02-22 23:43:08,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 2138112. Throughput: 0: 874.0. Samples: 534902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:43:08,915][05631] Avg episode reward: [(0, '4.669')]
[2023-02-22 23:43:13,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 2150400. Throughput: 0: 863.7. Samples: 537586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 23:43:13,911][05631] Avg episode reward: [(0, '4.741')]
[2023-02-22 23:43:18,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 2166784. Throughput: 0: 839.6. Samples: 541544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 23:43:18,912][05631] Avg episode reward: [(0, '4.730')]
[2023-02-22 23:43:20,233][11402] Updated weights for policy 0, policy_version 530 (0.0020)
[2023-02-22 23:43:23,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 2183168. Throughput: 0: 867.4. Samples: 546826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 23:43:23,906][05631] Avg episode reward: [(0, '4.878')]
[2023-02-22 23:43:28,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 2203648. Throughput: 0: 879.0. Samples: 550018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:43:28,911][05631] Avg episode reward: [(0, '4.842')]
[2023-02-22 23:43:30,009][11402] Updated weights for policy 0, policy_version 540 (0.0025)
[2023-02-22 23:43:33,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 2224128. Throughput: 0: 871.0. Samples: 555792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:43:33,911][05631] Avg episode reward: [(0, '4.430')]
[2023-02-22 23:43:38,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 2236416. Throughput: 0: 853.9. Samples: 559860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:43:38,910][05631] Avg episode reward: [(0, '4.467')]
[2023-02-22 23:43:43,349][11402] Updated weights for policy 0, policy_version 550 (0.0027)
[2023-02-22 23:43:43,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 2252800. Throughput: 0: 855.5. Samples: 561908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:43:43,906][05631] Avg episode reward: [(0, '4.584')]
[2023-02-22 23:43:43,925][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth...
[2023-02-22 23:43:44,049][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000351_1437696.pth
[2023-02-22 23:43:48,904][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2273280. Throughput: 0: 886.1. Samples: 568308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:43:48,915][05631] Avg episode reward: [(0, '4.476')]
[2023-02-22 23:43:53,846][11402] Updated weights for policy 0, policy_version 560 (0.0014)
[2023-02-22 23:43:53,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3415.6). Total num frames: 2293760. Throughput: 0: 866.8. Samples: 573906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:43:53,909][05631] Avg episode reward: [(0, '4.371')]
[2023-02-22 23:43:58,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 2306048. Throughput: 0: 851.0. Samples: 575882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:43:58,908][05631] Avg episode reward: [(0, '4.574')]
[2023-02-22 23:44:03,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 2322432. Throughput: 0: 863.2. Samples: 580386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:44:03,913][05631] Avg episode reward: [(0, '4.903')]
[2023-02-22 23:44:06,103][11402] Updated weights for policy 0, policy_version 570 (0.0042)
[2023-02-22 23:44:08,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 2342912. Throughput: 0: 890.2. Samples: 586884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 23:44:08,906][05631] Avg episode reward: [(0, '4.780')]
[2023-02-22 23:44:13,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 2363392. Throughput: 0: 891.2. Samples: 590122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:44:13,909][05631] Avg episode reward: [(0, '4.532')]
[2023-02-22 23:44:17,852][11402] Updated weights for policy 0, policy_version 580 (0.0020)
[2023-02-22 23:44:18,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 2375680. Throughput: 0: 858.3. Samples: 594414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:44:18,915][05631] Avg episode reward: [(0, '4.614')]
[2023-02-22 23:44:23,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 2392064. Throughput: 0: 865.5. Samples: 598806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:44:23,912][05631] Avg episode reward: [(0, '4.504')]
[2023-02-22 23:44:28,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2412544. Throughput: 0: 887.5. Samples: 601844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:44:28,912][05631] Avg episode reward: [(0, '4.419')]
[2023-02-22 23:44:29,552][11402] Updated weights for policy 0, policy_version 590 (0.0032)
[2023-02-22 23:44:33,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2433024. Throughput: 0: 881.3. Samples: 607966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:44:33,910][05631] Avg episode reward: [(0, '4.540')]
[2023-02-22 23:44:38,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 2445312. Throughput: 0: 849.0. Samples: 612112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:44:38,907][05631] Avg episode reward: [(0, '4.639')]
[2023-02-22 23:44:42,762][11402] Updated weights for policy 0, policy_version 600 (0.0012)
[2023-02-22 23:44:43,904][05631] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3401.9). Total num frames: 2457600. Throughput: 0: 851.8. Samples: 614212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:44:43,907][05631] Avg episode reward: [(0, '4.761')]
[2023-02-22 23:44:48,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 2478080. Throughput: 0: 877.8. Samples: 619886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:44:48,907][05631] Avg episode reward: [(0, '4.784')]
[2023-02-22 23:44:52,863][11402] Updated weights for policy 0, policy_version 610 (0.0016)
[2023-02-22 23:44:53,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2498560. Throughput: 0: 868.6. Samples: 625970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:44:53,918][05631] Avg episode reward: [(0, '4.622')]
[2023-02-22 23:44:58,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 2514944. Throughput: 0: 841.2. Samples: 627978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:44:58,907][05631] Avg episode reward: [(0, '4.478')]
[2023-02-22 23:45:03,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 2527232. Throughput: 0: 834.9. Samples: 631984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:03,907][05631] Avg episode reward: [(0, '4.492')]
[2023-02-22 23:45:06,441][11402] Updated weights for policy 0, policy_version 620 (0.0033)
[2023-02-22 23:45:08,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2547712. Throughput: 0: 866.0. Samples: 637776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:08,913][05631] Avg episode reward: [(0, '4.801')]
[2023-02-22 23:45:13,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2568192. Throughput: 0: 867.9. Samples: 640898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:13,912][05631] Avg episode reward: [(0, '4.862')]
[2023-02-22 23:45:17,445][11402] Updated weights for policy 0, policy_version 630 (0.0020)
[2023-02-22 23:45:18,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 2580480. Throughput: 0: 839.6. Samples: 645750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:18,909][05631] Avg episode reward: [(0, '4.891')]
[2023-02-22 23:45:23,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 2596864. Throughput: 0: 836.0. Samples: 649732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:23,912][05631] Avg episode reward: [(0, '4.774')]
[2023-02-22 23:45:28,904][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2617344. Throughput: 0: 850.8. Samples: 652498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:45:28,909][05631] Avg episode reward: [(0, '4.810')]
[2023-02-22 23:45:29,912][11402] Updated weights for policy 0, policy_version 640 (0.0017)
[2023-02-22 23:45:33,905][05631] Fps is (10 sec: 4095.9, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2637824. Throughput: 0: 868.7. Samples: 658980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:33,911][05631] Avg episode reward: [(0, '4.690')]
[2023-02-22 23:45:38,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 2650112. Throughput: 0: 844.4. Samples: 663968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:45:38,908][05631] Avg episode reward: [(0, '4.590')]
[2023-02-22 23:45:41,971][11402] Updated weights for policy 0, policy_version 650 (0.0025)
[2023-02-22 23:45:43,905][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 2666496. Throughput: 0: 845.6. Samples: 666030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:45:43,912][05631] Avg episode reward: [(0, '4.603')]
[2023-02-22 23:45:43,924][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000651_2666496.pth...
[2023-02-22 23:45:44,067][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000450_1843200.pth
[2023-02-22 23:45:48,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2686976. Throughput: 0: 866.5. Samples: 670978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:45:48,912][05631] Avg episode reward: [(0, '4.608')]
[2023-02-22 23:45:52,715][11402] Updated weights for policy 0, policy_version 660 (0.0013)
[2023-02-22 23:45:53,904][05631] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2707456. Throughput: 0: 878.6. Samples: 677312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:53,909][05631] Avg episode reward: [(0, '4.865')]
[2023-02-22 23:45:58,904][05631] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2723840. Throughput: 0: 868.8. Samples: 679996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:45:58,908][05631] Avg episode reward: [(0, '4.826')]
[2023-02-22 23:46:03,904][05631] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 2736128. Throughput: 0: 853.9. Samples: 684176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:46:03,915][05631] Avg episode reward: [(0, '4.655')]
[2023-02-22 23:46:05,964][11402] Updated weights for policy 0, policy_version 670 (0.0042)
[2023-02-22 23:46:08,904][05631] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 2752512. Throughput: 0: 881.3. Samples: 689390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 23:46:08,907][05631] Avg episode reward: [(0, '4.633')]
[2023-02-22 23:46:13,904][05631] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2777088. Throughput: 0: 890.3. Samples: 692562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:46:13,907][05631] Avg episode reward: [(0, '4.646')]
[2023-02-22 23:46:15,831][11402] Updated weights for policy 0, policy_version 680 (0.0031)
[2023-02-22 23:46:18,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3443.4). Total num frames: 2793472. Throughput: 0: 874.2. Samples: 698318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 23:46:18,910][05631] Avg episode reward: [(0, '4.606')]
[2023-02-22 23:46:23,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 2805760. Throughput: 0: 853.6. Samples: 702380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:46:23,911][05631] Avg episode reward: [(0, '4.538')]
[2023-02-22 23:46:28,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 2822144. Throughput: 0: 852.0. Samples: 704370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:46:28,907][05631] Avg episode reward: [(0, '4.635')]
[2023-02-22 23:46:29,261][11402] Updated weights for policy 0, policy_version 690 (0.0040)
[2023-02-22 23:46:33,904][05631] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2846720. Throughput: 0: 888.7. Samples: 710968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 23:46:33,906][05631] Avg episode reward: [(0, '4.670')]
[2023-02-22 23:46:38,912][05631] Fps is (10 sec: 4092.7, 60 sec: 3549.4, 300 sec: 3457.2). Total num frames: 2863104. Throughput: 0: 875.5. Samples: 716716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 23:46:38,922][05631] Avg episode reward: [(0, '4.600')]
[2023-02-22 23:46:39,711][11402] Updated weights for policy 0, policy_version 700 (0.0013)
[2023-02-22 23:46:43,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 2875392. Throughput: 0: 861.5. Samples: 718762.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:46:43,907][05631] Avg episode reward: [(0, '4.558')] [2023-02-22 23:46:48,904][05631] Fps is (10 sec: 2869.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2891776. Throughput: 0: 866.3. Samples: 723160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 23:46:48,906][05631] Avg episode reward: [(0, '4.860')] [2023-02-22 23:46:51,973][11402] Updated weights for policy 0, policy_version 710 (0.0023) [2023-02-22 23:46:53,904][05631] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2916352. Throughput: 0: 889.6. Samples: 729424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:46:53,912][05631] Avg episode reward: [(0, '5.049')] [2023-02-22 23:46:53,923][11388] Saving new best policy, reward=5.049! [2023-02-22 23:46:58,906][05631] Fps is (10 sec: 4095.4, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 2932736. Throughput: 0: 886.9. Samples: 732474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:46:58,911][05631] Avg episode reward: [(0, '4.561')] [2023-02-22 23:47:03,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 2945024. Throughput: 0: 852.5. Samples: 736682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:47:03,911][05631] Avg episode reward: [(0, '4.560')] [2023-02-22 23:47:04,193][11402] Updated weights for policy 0, policy_version 720 (0.0017) [2023-02-22 23:47:08,904][05631] Fps is (10 sec: 2867.6, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 2961408. Throughput: 0: 862.0. Samples: 741172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 23:47:08,912][05631] Avg episode reward: [(0, '4.552')] [2023-02-22 23:47:13,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2981888. Throughput: 0: 889.1. Samples: 744378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:47:13,907][05631] Avg episode reward: [(0, '4.634')] [2023-02-22 23:47:15,075][11402] Updated weights for policy 0, policy_version 730 (0.0040) [2023-02-22 23:47:18,911][05631] Fps is (10 sec: 4093.0, 60 sec: 3481.2, 300 sec: 3471.1). Total num frames: 3002368. Throughput: 0: 888.3. Samples: 750948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:47:18,914][05631] Avg episode reward: [(0, '4.812')] [2023-02-22 23:47:23,904][05631] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 3014656. Throughput: 0: 849.1. Samples: 754920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:47:23,910][05631] Avg episode reward: [(0, '4.914')] [2023-02-22 23:47:28,529][11402] Updated weights for policy 0, policy_version 740 (0.0019) [2023-02-22 23:47:28,904][05631] Fps is (10 sec: 2869.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 3031040. Throughput: 0: 849.5. Samples: 756990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:47:28,914][05631] Avg episode reward: [(0, '4.916')] [2023-02-22 23:47:33,904][05631] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3051520. Throughput: 0: 880.3. Samples: 762772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:47:33,911][05631] Avg episode reward: [(0, '4.705')] [2023-02-22 23:47:38,005][11402] Updated weights for policy 0, policy_version 750 (0.0015) [2023-02-22 23:47:38,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3482.1, 300 sec: 3471.2). Total num frames: 3072000. Throughput: 0: 881.2. Samples: 769076. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 23:47:38,915][05631] Avg episode reward: [(0, '4.615')] [2023-02-22 23:47:43,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 3088384. Throughput: 0: 858.9. Samples: 771122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:47:43,912][05631] Avg episode reward: [(0, '4.800')] [2023-02-22 23:47:43,930][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000754_3088384.pth... [2023-02-22 23:47:44,111][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth [2023-02-22 23:47:48,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3100672. Throughput: 0: 853.4. Samples: 775086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 23:47:48,910][05631] Avg episode reward: [(0, '4.830')] [2023-02-22 23:47:51,515][11402] Updated weights for policy 0, policy_version 760 (0.0023) [2023-02-22 23:47:53,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3121152. Throughput: 0: 883.5. Samples: 780928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:47:53,907][05631] Avg episode reward: [(0, '4.883')] [2023-02-22 23:47:58,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 3141632. Throughput: 0: 883.7. Samples: 784146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:47:58,907][05631] Avg episode reward: [(0, '4.331')] [2023-02-22 23:48:02,593][11402] Updated weights for policy 0, policy_version 770 (0.0023) [2023-02-22 23:48:03,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 3153920. Throughput: 0: 847.1. Samples: 789062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:48:03,914][05631] Avg episode reward: [(0, '4.278')] [2023-02-22 23:48:08,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3170304. Throughput: 0: 849.1. Samples: 793128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:48:08,910][05631] Avg episode reward: [(0, '4.467')] [2023-02-22 23:48:13,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3190784. Throughput: 0: 867.5. Samples: 796028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:48:13,911][05631] Avg episode reward: [(0, '4.844')] [2023-02-22 23:48:14,624][11402] Updated weights for policy 0, policy_version 780 (0.0019) [2023-02-22 23:48:18,904][05631] Fps is (10 sec: 4096.1, 60 sec: 3482.0, 300 sec: 3485.1). Total num frames: 3211264. Throughput: 0: 880.3. Samples: 802384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:48:18,907][05631] Avg episode reward: [(0, '4.708')] [2023-02-22 23:48:23,904][05631] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3223552. Throughput: 0: 846.2. Samples: 807156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 23:48:23,912][05631] Avg episode reward: [(0, '4.558')] [2023-02-22 23:48:27,131][11402] Updated weights for policy 0, policy_version 790 (0.0021) [2023-02-22 23:48:28,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 3239936. Throughput: 0: 844.6. Samples: 809130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:48:28,907][05631] Avg episode reward: [(0, '4.451')] [2023-02-22 23:48:33,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). 
Total num frames: 3256320. Throughput: 0: 870.0. Samples: 814236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:48:33,907][05631] Avg episode reward: [(0, '4.528')] [2023-02-22 23:48:37,962][11402] Updated weights for policy 0, policy_version 800 (0.0016) [2023-02-22 23:48:38,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 3280896. Throughput: 0: 880.7. Samples: 820558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:48:38,910][05631] Avg episode reward: [(0, '4.724')] [2023-02-22 23:48:43,904][05631] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3293184. Throughput: 0: 866.5. Samples: 823140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:48:43,907][05631] Avg episode reward: [(0, '4.668')] [2023-02-22 23:48:48,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 3309568. Throughput: 0: 849.0. Samples: 827266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:48:48,910][05631] Avg episode reward: [(0, '4.877')] [2023-02-22 23:48:51,318][11402] Updated weights for policy 0, policy_version 810 (0.0019) [2023-02-22 23:48:53,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3325952. Throughput: 0: 873.2. Samples: 832422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:48:53,910][05631] Avg episode reward: [(0, '4.767')] [2023-02-22 23:48:58,908][05631] Fps is (10 sec: 3685.1, 60 sec: 3413.1, 300 sec: 3471.1). Total num frames: 3346432. Throughput: 0: 878.1. Samples: 835546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:48:58,910][05631] Avg episode reward: [(0, '4.604')] [2023-02-22 23:49:01,075][11402] Updated weights for policy 0, policy_version 820 (0.0034) [2023-02-22 23:49:03,910][05631] Fps is (10 sec: 3684.3, 60 sec: 3481.3, 300 sec: 3457.2). Total num frames: 3362816. Throughput: 0: 863.6. Samples: 841250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:49:03,913][05631] Avg episode reward: [(0, '4.576')] [2023-02-22 23:49:08,904][05631] Fps is (10 sec: 3278.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 3379200. Throughput: 0: 846.5. Samples: 845250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:49:08,906][05631] Avg episode reward: [(0, '4.498')] [2023-02-22 23:49:13,904][05631] Fps is (10 sec: 3278.7, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3395584. Throughput: 0: 849.2. Samples: 847342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:49:13,909][05631] Avg episode reward: [(0, '4.506')] [2023-02-22 23:49:14,827][11402] Updated weights for policy 0, policy_version 830 (0.0030) [2023-02-22 23:49:18,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3416064. Throughput: 0: 875.2. Samples: 853618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:49:18,906][05631] Avg episode reward: [(0, '4.693')] [2023-02-22 23:49:23,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3432448. Throughput: 0: 857.6. Samples: 859148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:49:23,909][05631] Avg episode reward: [(0, '4.597')] [2023-02-22 23:49:26,309][11402] Updated weights for policy 0, policy_version 840 (0.0017) [2023-02-22 23:49:28,904][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 3444736. Throughput: 0: 839.8. Samples: 860930. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:49:28,910][05631] Avg episode reward: [(0, '4.558')] [2023-02-22 23:49:33,904][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3457024. Throughput: 0: 831.7. Samples: 864694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:49:33,907][05631] Avg episode reward: [(0, '4.570')] [2023-02-22 23:49:38,756][11402] Updated weights for policy 0, policy_version 850 (0.0015) [2023-02-22 23:49:38,904][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 3481600. Throughput: 0: 854.2. Samples: 870862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 23:49:38,907][05631] Avg episode reward: [(0, '4.601')] [2023-02-22 23:49:43,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3497984. Throughput: 0: 853.9. Samples: 873968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:49:43,912][05631] Avg episode reward: [(0, '4.768')] [2023-02-22 23:49:43,927][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth... [2023-02-22 23:49:44,062][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000651_2666496.pth [2023-02-22 23:49:48,906][05631] Fps is (10 sec: 2866.6, 60 sec: 3344.9, 300 sec: 3429.5). Total num frames: 3510272. Throughput: 0: 819.9. Samples: 878142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 23:49:48,918][05631] Avg episode reward: [(0, '4.804')] [2023-02-22 23:49:52,321][11402] Updated weights for policy 0, policy_version 860 (0.0027) [2023-02-22 23:49:53,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3526656. Throughput: 0: 822.6. Samples: 882266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:49:53,907][05631] Avg episode reward: [(0, '4.853')] [2023-02-22 23:49:58,904][05631] Fps is (10 sec: 3687.3, 60 sec: 3345.3, 300 sec: 3457.3). Total num frames: 3547136. Throughput: 0: 843.9. Samples: 885318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:49:58,910][05631] Avg episode reward: [(0, '5.042')] [2023-02-22 23:50:02,647][11402] Updated weights for policy 0, policy_version 870 (0.0026) [2023-02-22 23:50:03,907][05631] Fps is (10 sec: 4094.9, 60 sec: 3413.5, 300 sec: 3457.3). Total num frames: 3567616. Throughput: 0: 843.4. Samples: 891572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:50:03,913][05631] Avg episode reward: [(0, '5.002')] [2023-02-22 23:50:08,907][05631] Fps is (10 sec: 3275.9, 60 sec: 3344.9, 300 sec: 3429.5). Total num frames: 3579904. Throughput: 0: 811.4. Samples: 895662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:50:08,914][05631] Avg episode reward: [(0, '4.822')] [2023-02-22 23:50:13,904][05631] Fps is (10 sec: 2458.2, 60 sec: 3276.8, 300 sec: 3429.5). Total num frames: 3592192. Throughput: 0: 814.8. Samples: 897598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 23:50:13,912][05631] Avg episode reward: [(0, '4.515')] [2023-02-22 23:50:16,845][11402] Updated weights for policy 0, policy_version 880 (0.0021) [2023-02-22 23:50:18,904][05631] Fps is (10 sec: 3277.7, 60 sec: 3276.8, 300 sec: 3443.4). Total num frames: 3612672. Throughput: 0: 842.8. Samples: 902618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:50:18,907][05631] Avg episode reward: [(0, '4.446')] [2023-02-22 23:50:23,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3443.4). 
Total num frames: 3633152. Throughput: 0: 842.9. Samples: 908794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:50:23,910][05631] Avg episode reward: [(0, '4.407')] [2023-02-22 23:50:27,654][11402] Updated weights for policy 0, policy_version 890 (0.0018) [2023-02-22 23:50:28,905][05631] Fps is (10 sec: 3276.6, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3645440. Throughput: 0: 825.5. Samples: 911116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:50:28,915][05631] Avg episode reward: [(0, '4.415')] [2023-02-22 23:50:33,904][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3657728. Throughput: 0: 822.1. Samples: 915134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 23:50:33,909][05631] Avg episode reward: [(0, '4.589')] [2023-02-22 23:50:38,904][05631] Fps is (10 sec: 3277.0, 60 sec: 3276.8, 300 sec: 3429.5). Total num frames: 3678208. Throughput: 0: 848.9. Samples: 920466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:50:38,907][05631] Avg episode reward: [(0, '4.657')] [2023-02-22 23:50:40,411][11402] Updated weights for policy 0, policy_version 900 (0.0014) [2023-02-22 23:50:43,904][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3698688. Throughput: 0: 851.6. Samples: 923638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:50:43,910][05631] Avg episode reward: [(0, '4.567')] [2023-02-22 23:50:48,907][05631] Fps is (10 sec: 3685.5, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3715072. Throughput: 0: 835.2. Samples: 929158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:50:48,915][05631] Avg episode reward: [(0, '4.529')] [2023-02-22 23:50:52,614][11402] Updated weights for policy 0, policy_version 910 (0.0015) [2023-02-22 23:50:53,905][05631] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 3401.8). Total num frames: 3727360. Throughput: 0: 833.5. Samples: 933168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:50:53,909][05631] Avg episode reward: [(0, '4.433')] [2023-02-22 23:50:58,904][05631] Fps is (10 sec: 3277.6, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3747840. Throughput: 0: 837.4. Samples: 935280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:50:58,912][05631] Avg episode reward: [(0, '4.549')] [2023-02-22 23:51:03,904][05631] Fps is (10 sec: 3686.7, 60 sec: 3276.9, 300 sec: 3429.5). Total num frames: 3764224. Throughput: 0: 863.8. Samples: 941488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:51:03,913][05631] Avg episode reward: [(0, '4.351')] [2023-02-22 23:51:03,933][11402] Updated weights for policy 0, policy_version 920 (0.0028) [2023-02-22 23:51:08,904][05631] Fps is (10 sec: 3686.5, 60 sec: 3413.5, 300 sec: 3415.6). Total num frames: 3784704. Throughput: 0: 847.2. Samples: 946918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:51:08,910][05631] Avg episode reward: [(0, '4.382')] [2023-02-22 23:51:13,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 3796992. Throughput: 0: 838.7. Samples: 948858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:51:13,907][05631] Avg episode reward: [(0, '4.390')] [2023-02-22 23:51:17,222][11402] Updated weights for policy 0, policy_version 930 (0.0019) [2023-02-22 23:51:18,904][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3813376. Throughput: 0: 846.1. Samples: 953210. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:51:18,906][05631] Avg episode reward: [(0, '4.371')] [2023-02-22 23:51:23,904][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3833856. Throughput: 0: 867.4. Samples: 959498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:51:23,906][05631] Avg episode reward: [(0, '4.449')] [2023-02-22 23:51:27,444][11402] Updated weights for policy 0, policy_version 940 (0.0024) [2023-02-22 23:51:28,907][05631] Fps is (10 sec: 3685.2, 60 sec: 3413.2, 300 sec: 3401.7). Total num frames: 3850240. Throughput: 0: 864.0. Samples: 962520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:51:28,913][05631] Avg episode reward: [(0, '4.694')] [2023-02-22 23:51:33,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.9). Total num frames: 3866624. Throughput: 0: 831.0. Samples: 966550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:51:33,907][05631] Avg episode reward: [(0, '4.934')] [2023-02-22 23:51:38,904][05631] Fps is (10 sec: 2868.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 3878912. Throughput: 0: 836.6. Samples: 970816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:51:38,911][05631] Avg episode reward: [(0, '4.935')] [2023-02-22 23:51:41,154][11402] Updated weights for policy 0, policy_version 950 (0.0016) [2023-02-22 23:51:43,904][05631] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3415.6). Total num frames: 3899392. Throughput: 0: 857.5. Samples: 973868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:51:43,908][05631] Avg episode reward: [(0, '4.654')] [2023-02-22 23:51:43,921][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000952_3899392.pth... [2023-02-22 23:51:44,097][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000754_3088384.pth [2023-02-22 23:51:48,911][05631] Fps is (10 sec: 4093.0, 60 sec: 3413.1, 300 sec: 3401.7). Total num frames: 3919872. Throughput: 0: 854.8. Samples: 979962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 23:51:48,919][05631] Avg episode reward: [(0, '4.479')] [2023-02-22 23:51:52,951][11402] Updated weights for policy 0, policy_version 960 (0.0019) [2023-02-22 23:51:53,904][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 3932160. Throughput: 0: 823.4. Samples: 983970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:51:53,907][05631] Avg episode reward: [(0, '4.549')] [2023-02-22 23:51:58,904][05631] Fps is (10 sec: 2459.3, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 3944448. Throughput: 0: 821.8. Samples: 985838. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 23:51:58,907][05631] Avg episode reward: [(0, '4.631')] [2023-02-22 23:52:03,904][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 3964928. Throughput: 0: 847.0. Samples: 991324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:52:03,907][05631] Avg episode reward: [(0, '4.554')] [2023-02-22 23:52:05,063][11402] Updated weights for policy 0, policy_version 970 (0.0025) [2023-02-22 23:52:08,904][05631] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 3985408. Throughput: 0: 847.3. Samples: 997626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:52:08,918][05631] Avg episode reward: [(0, '4.466')] [2023-02-22 23:52:13,904][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3388.0). 
Total num frames: 4001792. Throughput: 0: 823.7. Samples: 999584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:52:13,907][05631] Avg episode reward: [(0, '4.563')] [2023-02-22 23:52:15,232][11388] Stopping Batcher_0... [2023-02-22 23:52:15,232][11388] Loop batcher_evt_loop terminating... [2023-02-22 23:52:15,235][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 23:52:15,233][05631] Component Batcher_0 stopped! [2023-02-22 23:52:15,282][11402] Weights refcount: 2 0 [2023-02-22 23:52:15,288][11402] Stopping InferenceWorker_p0-w0... [2023-02-22 23:52:15,290][05631] Component InferenceWorker_p0-w0 stopped! [2023-02-22 23:52:15,299][11402] Loop inference_proc0-0_evt_loop terminating... [2023-02-22 23:52:15,311][05631] Component RolloutWorker_w7 stopped! [2023-02-22 23:52:15,325][11406] Stopping RolloutWorker_w1... [2023-02-22 23:52:15,326][11406] Loop rollout_proc1_evt_loop terminating... [2023-02-22 23:52:15,325][05631] Component RolloutWorker_w3 stopped! [2023-02-22 23:52:15,330][05631] Component RolloutWorker_w1 stopped! [2023-02-22 23:52:15,311][11412] Stopping RolloutWorker_w7... [2023-02-22 23:52:15,332][11412] Loop rollout_proc7_evt_loop terminating... [2023-02-22 23:52:15,320][11409] Stopping RolloutWorker_w3... [2023-02-22 23:52:15,336][11409] Loop rollout_proc3_evt_loop terminating... [2023-02-22 23:52:15,348][11410] Stopping RolloutWorker_w5... [2023-02-22 23:52:15,349][05631] Component RolloutWorker_w5 stopped! [2023-02-22 23:52:15,349][11410] Loop rollout_proc5_evt_loop terminating... [2023-02-22 23:52:15,426][05631] Component RolloutWorker_w6 stopped! [2023-02-22 23:52:15,433][11411] Stopping RolloutWorker_w6... [2023-02-22 23:52:15,437][11388] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth [2023-02-22 23:52:15,449][11388] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 23:52:15,461][05631] Component RolloutWorker_w4 stopped! [2023-02-22 23:52:15,463][11408] Stopping RolloutWorker_w4... [2023-02-22 23:52:15,464][11408] Loop rollout_proc4_evt_loop terminating... [2023-02-22 23:52:15,434][11411] Loop rollout_proc6_evt_loop terminating... [2023-02-22 23:52:15,484][05631] Component RolloutWorker_w2 stopped! [2023-02-22 23:52:15,486][11407] Stopping RolloutWorker_w2... [2023-02-22 23:52:15,487][11407] Loop rollout_proc2_evt_loop terminating... [2023-02-22 23:52:15,495][05631] Component RolloutWorker_w0 stopped! [2023-02-22 23:52:15,501][11403] Stopping RolloutWorker_w0... [2023-02-22 23:52:15,502][11403] Loop rollout_proc0_evt_loop terminating... [2023-02-22 23:52:15,806][05631] Component LearnerWorker_p0 stopped! [2023-02-22 23:52:15,809][05631] Waiting for process learner_proc0 to stop... [2023-02-22 23:52:15,811][11388] Stopping LearnerWorker_p0... [2023-02-22 23:52:15,812][11388] Loop learner_proc0_evt_loop terminating... [2023-02-22 23:52:18,154][05631] Waiting for process inference_proc0-0 to join... [2023-02-22 23:52:18,644][05631] Waiting for process rollout_proc0 to join... [2023-02-22 23:52:19,367][05631] Waiting for process rollout_proc1 to join... [2023-02-22 23:52:19,369][05631] Waiting for process rollout_proc2 to join... [2023-02-22 23:52:19,373][05631] Waiting for process rollout_proc3 to join... [2023-02-22 23:52:19,376][05631] Waiting for process rollout_proc4 to join... [2023-02-22 23:52:19,379][05631] Waiting for process rollout_proc5 to join... 
[2023-02-22 23:52:19,381][05631] Waiting for process rollout_proc6 to join... [2023-02-22 23:52:19,387][05631] Waiting for process rollout_proc7 to join... [2023-02-22 23:52:19,389][05631] Batcher 0 profile tree view: batching: 27.4365, releasing_batches: 0.0255 [2023-02-22 23:52:19,392][05631] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0010 wait_policy_total: 587.6101 update_model: 8.2518 weight_update: 0.0021 one_step: 0.0078 handle_policy_step: 559.1051 deserialize: 16.1116, stack: 3.1038, obs_to_device_normalize: 120.2302, forward: 276.0293, send_messages: 27.5800 prepare_outputs: 88.2934 to_cpu: 54.9625 [2023-02-22 23:52:19,394][05631] Learner 0 profile tree view: misc: 0.0066, prepare_batch: 17.7100 train: 78.4534 epoch_init: 0.0058, minibatch_init: 0.0336, losses_postprocess: 0.5964, kl_divergence: 0.6001, after_optimizer: 33.1416 calculate_losses: 28.4165 losses_init: 0.0044, forward_head: 1.8369, bptt_initial: 18.7620, tail: 1.2395, advantages_returns: 0.2741, losses: 3.4618 bptt: 2.4634 bptt_forward_core: 2.3453 update: 14.9834 clip: 1.4365 [2023-02-22 23:52:19,397][05631] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2966, enqueue_policy_requests: 164.4713, env_step: 895.9777, overhead: 24.4791, complete_rollouts: 7.8818 save_policy_outputs: 22.8584 split_output_tensors: 11.0051 [2023-02-22 23:52:19,399][05631] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3409, enqueue_policy_requests: 166.1919, env_step: 894.0748, overhead: 24.5370, complete_rollouts: 7.6446 save_policy_outputs: 22.6258 split_output_tensors: 10.7994 [2023-02-22 23:52:19,404][05631] Loop Runner_EvtLoop terminating... [2023-02-22 23:52:19,406][05631] Runner profile tree view: main_loop: 1230.3498 [2023-02-22 23:52:19,408][05631] Collected {0: 4005888}, FPS: 3255.9 [2023-02-22 23:52:19,456][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-22 23:52:19,457][05631] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-22 23:52:19,458][05631] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-22 23:52:19,460][05631] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-22 23:52:19,461][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 23:52:19,462][05631] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-22 23:52:19,463][05631] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 23:52:19,465][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-22 23:52:19,466][05631] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-22 23:52:19,467][05631] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-22 23:52:19,468][05631] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-22 23:52:19,469][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-22 23:52:19,471][05631] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-22 23:52:19,472][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
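Note on the "Overriding arg" / "Adding new argument" messages above: the evaluation entry point loads the saved config.json as the base configuration, replaces any key that was also passed on the command line, and appends keys the saved file does not contain. The following minimal sketch reproduces that merge behavior; it is an illustration, not Sample Factory's actual loader, and the argument dict is abridged from the log.

```python
import json

def load_with_overrides(config_path, cli_args):
    # Saved config is the base; CLI values win; unknown keys are appended.
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
        cfg[key] = value
    return cfg

# Abridged from the evaluation run above.
cfg = load_with_overrides(
    "/content/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True, "max_num_episodes": 10},
)
```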
[2023-02-22 23:52:19,473][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-22 23:52:19,503][05631] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:52:19,507][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 23:52:19,510][05631] RunningMeanStd input shape: (1,) [2023-02-22 23:52:19,529][05631] ConvEncoder: input_channels=3 [2023-02-22 23:52:20,207][05631] Conv encoder output size: 512 [2023-02-22 23:52:20,209][05631] Policy head output size: 512 [2023-02-22 23:52:22,551][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 23:52:23,838][05631] Num frames 100... [2023-02-22 23:52:23,950][05631] Num frames 200... [2023-02-22 23:52:24,064][05631] Num frames 300... [2023-02-22 23:52:24,213][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-22 23:52:24,215][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-22 23:52:24,239][05631] Num frames 400... [2023-02-22 23:52:24,352][05631] Num frames 500... [2023-02-22 23:52:24,466][05631] Num frames 600... [2023-02-22 23:52:24,581][05631] Num frames 700... [2023-02-22 23:52:24,714][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-22 23:52:24,716][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-22 23:52:24,759][05631] Num frames 800... [2023-02-22 23:52:24,889][05631] Num frames 900... [2023-02-22 23:52:25,004][05631] Num frames 1000... [2023-02-22 23:52:25,126][05631] Num frames 1100... [2023-02-22 23:52:25,239][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-22 23:52:25,242][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-22 23:52:25,301][05631] Num frames 1200... [2023-02-22 23:52:25,423][05631] Num frames 1300... [2023-02-22 23:52:25,542][05631] Num frames 1400... [2023-02-22 23:52:25,673][05631] Num frames 1500... [2023-02-22 23:52:25,775][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-22 23:52:25,777][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-22 23:52:25,876][05631] Num frames 1600... [2023-02-22 23:52:25,993][05631] Num frames 1700... [2023-02-22 23:52:26,108][05631] Num frames 1800... [2023-02-22 23:52:26,224][05631] Num frames 1900... [2023-02-22 23:52:26,309][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-22 23:52:26,312][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-22 23:52:26,415][05631] Num frames 2000... [2023-02-22 23:52:26,545][05631] Num frames 2100... [2023-02-22 23:52:26,667][05631] Num frames 2200... [2023-02-22 23:52:26,789][05631] Num frames 2300... [2023-02-22 23:52:26,886][05631] Avg episode rewards: #0: 4.060, true rewards: #0: 3.893 [2023-02-22 23:52:26,888][05631] Avg episode reward: 4.060, avg true_objective: 3.893 [2023-02-22 23:52:26,969][05631] Num frames 2400... [2023-02-22 23:52:27,088][05631] Num frames 2500... [2023-02-22 23:52:27,216][05631] Num frames 2600... [2023-02-22 23:52:27,331][05631] Num frames 2700... [2023-02-22 23:52:27,409][05631] Avg episode rewards: #0: 4.029, true rewards: #0: 3.886 [2023-02-22 23:52:27,411][05631] Avg episode reward: 4.029, avg true_objective: 3.886 [2023-02-22 23:52:27,509][05631] Num frames 2800... [2023-02-22 23:52:27,626][05631] Num frames 2900... [2023-02-22 23:52:27,739][05631] Num frames 3000... [2023-02-22 23:52:27,859][05631] Num frames 3100... 
[2023-02-22 23:52:27,996][05631] Avg episode rewards: #0: 4.210, true rewards: #0: 3.960 [2023-02-22 23:52:27,998][05631] Avg episode reward: 4.210, avg true_objective: 3.960 [2023-02-22 23:52:28,037][05631] Num frames 3200... [2023-02-22 23:52:28,158][05631] Num frames 3300... [2023-02-22 23:52:28,270][05631] Num frames 3400... [2023-02-22 23:52:28,384][05631] Num frames 3500... [2023-02-22 23:52:28,506][05631] Num frames 3600... [2023-02-22 23:52:28,627][05631] Num frames 3700... [2023-02-22 23:52:28,697][05631] Avg episode rewards: #0: 4.569, true rewards: #0: 4.124 [2023-02-22 23:52:28,699][05631] Avg episode reward: 4.569, avg true_objective: 4.124 [2023-02-22 23:52:28,801][05631] Num frames 3800... [2023-02-22 23:52:28,926][05631] Num frames 3900... [2023-02-22 23:52:29,037][05631] Num frames 4000... [2023-02-22 23:52:29,190][05631] Num frames 4100... [2023-02-22 23:52:29,359][05631] Num frames 4200... [2023-02-22 23:52:29,457][05631] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224 [2023-02-22 23:52:29,459][05631] Avg episode reward: 4.824, avg true_objective: 4.224 [2023-02-22 23:52:51,161][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-22 23:52:51,297][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-22 23:52:51,299][05631] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-22 23:52:51,301][05631] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-22 23:52:51,303][05631] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-22 23:52:51,305][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 23:52:51,307][05631] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-22 23:52:51,309][05631] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-22 23:52:51,310][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-22 23:52:51,311][05631] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-22 23:52:51,312][05631] Adding new argument 'hf_repository'='pittawat/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-22 23:52:51,314][05631] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-22 23:52:51,315][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-22 23:52:51,316][05631] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-22 23:52:51,317][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-22 23:52:51,318][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-22 23:52:51,337][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 23:52:51,339][05631] RunningMeanStd input shape: (1,) [2023-02-22 23:52:51,357][05631] ConvEncoder: input_channels=3 [2023-02-22 23:52:51,435][05631] Conv encoder output size: 512 [2023-02-22 23:52:51,438][05631] Policy head output size: 512 [2023-02-22 23:52:51,468][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 23:52:52,136][05631] Num frames 100... [2023-02-22 23:52:52,298][05631] Num frames 200... [2023-02-22 23:52:52,460][05631] Num frames 300... 
[2023-02-22 23:52:52,653][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-22 23:52:52,656][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-22 23:52:52,684][05631] Num frames 400... [2023-02-22 23:52:52,839][05631] Num frames 500... [2023-02-22 23:52:52,990][05631] Num frames 600... [2023-02-22 23:52:53,100][05631] Num frames 700... [2023-02-22 23:52:53,224][05631] Num frames 800... [2023-02-22 23:52:53,315][05631] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2023-02-22 23:52:53,316][05631] Avg episode reward: 4.660, avg true_objective: 4.160 [2023-02-22 23:52:53,406][05631] Num frames 900... [2023-02-22 23:52:53,524][05631] Num frames 1000... [2023-02-22 23:52:53,654][05631] Num frames 1100... [2023-02-22 23:52:53,776][05631] Num frames 1200... [2023-02-22 23:52:53,861][05631] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2023-02-22 23:52:53,864][05631] Avg episode reward: 4.387, avg true_objective: 4.053 [2023-02-22 23:52:53,963][05631] Num frames 1300... [2023-02-22 23:52:54,081][05631] Num frames 1400... [2023-02-22 23:52:54,204][05631] Num frames 1500... [2023-02-22 23:52:54,338][05631] Num frames 1600... [2023-02-22 23:52:54,390][05631] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 [2023-02-22 23:52:54,392][05631] Avg episode reward: 4.250, avg true_objective: 4.000 [2023-02-22 23:52:54,512][05631] Num frames 1700... [2023-02-22 23:52:54,627][05631] Num frames 1800... [2023-02-22 23:52:54,747][05631] Num frames 1900... [2023-02-22 23:52:54,895][05631] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 [2023-02-22 23:52:54,897][05631] Avg episode reward: 4.168, avg true_objective: 3.968 [2023-02-22 23:52:54,921][05631] Num frames 2000... [2023-02-22 23:52:55,042][05631] Num frames 2100... [2023-02-22 23:52:55,164][05631] Num frames 2200... [2023-02-22 23:52:55,290][05631] Num frames 2300... [2023-02-22 23:52:55,414][05631] Num frames 2400... [2023-02-22 23:52:55,507][05631] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2023-02-22 23:52:55,509][05631] Avg episode reward: 4.387, avg true_objective: 4.053 [2023-02-22 23:52:55,596][05631] Num frames 2500... [2023-02-22 23:52:55,727][05631] Num frames 2600... [2023-02-22 23:52:55,858][05631] Num frames 2700... [2023-02-22 23:52:55,984][05631] Num frames 2800... [2023-02-22 23:52:56,100][05631] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069 [2023-02-22 23:52:56,101][05631] Avg episode reward: 4.497, avg true_objective: 4.069 [2023-02-22 23:52:56,175][05631] Num frames 2900... [2023-02-22 23:52:56,313][05631] Num frames 3000... [2023-02-22 23:52:56,446][05631] Num frames 3100... [2023-02-22 23:52:56,570][05631] Num frames 3200... [2023-02-22 23:52:56,623][05631] Avg episode rewards: #0: 4.625, true rewards: #0: 4.000 [2023-02-22 23:52:56,624][05631] Avg episode reward: 4.625, avg true_objective: 4.000 [2023-02-22 23:52:56,757][05631] Num frames 3300... [2023-02-22 23:52:56,876][05631] Num frames 3400... [2023-02-22 23:52:57,002][05631] Num frames 3500... [2023-02-22 23:52:57,125][05631] Num frames 3600... [2023-02-22 23:52:57,236][05631] Avg episode rewards: #0: 4.720, true rewards: #0: 4.053 [2023-02-22 23:52:57,239][05631] Avg episode reward: 4.720, avg true_objective: 4.053 [2023-02-22 23:52:57,337][05631] Num frames 3700... [2023-02-22 23:52:57,498][05631] Num frames 3800... [2023-02-22 23:52:57,664][05631] Num frames 3900... [2023-02-22 23:52:57,831][05631] Num frames 4000... 
[2023-02-22 23:52:57,939][05631] Avg episode rewards: #0: 4.632, true rewards: #0: 4.032 [2023-02-22 23:52:57,942][05631] Avg episode reward: 4.632, avg true_objective: 4.032 [2023-02-22 23:53:18,579][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-22 23:53:24,477][05631] The model has been pushed to https://huggingface.co/pittawat/rl_course_vizdoom_health_gathering_supreme [2023-02-22 23:56:04,373][05631] Environment doom_basic already registered, overwriting... [2023-02-22 23:56:04,375][05631] Environment doom_two_colors_easy already registered, overwriting... [2023-02-22 23:56:04,377][05631] Environment doom_two_colors_hard already registered, overwriting... [2023-02-22 23:56:04,379][05631] Environment doom_dm already registered, overwriting... [2023-02-22 23:56:04,386][05631] Environment doom_dwango5 already registered, overwriting... [2023-02-22 23:56:04,394][05631] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-22 23:56:04,403][05631] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-22 23:56:04,404][05631] Environment doom_my_way_home already registered, overwriting... [2023-02-22 23:56:04,405][05631] Environment doom_deadly_corridor already registered, overwriting... [2023-02-22 23:56:04,406][05631] Environment doom_defend_the_center already registered, overwriting... [2023-02-22 23:56:04,423][05631] Environment doom_defend_the_line already registered, overwriting... [2023-02-22 23:56:04,428][05631] Environment doom_health_gathering already registered, overwriting... [2023-02-22 23:56:04,436][05631] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-22 23:56:04,438][05631] Environment doom_battle already registered, overwriting... [2023-02-22 23:56:04,441][05631] Environment doom_battle2 already registered, overwriting... [2023-02-22 23:56:04,445][05631] Environment doom_duel_bots already registered, overwriting... [2023-02-22 23:56:04,451][05631] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-22 23:56:04,455][05631] Environment doom_duel already registered, overwriting... [2023-02-22 23:56:04,463][05631] Environment doom_deathmatch_full already registered, overwriting... [2023-02-22 23:56:04,470][05631] Environment doom_benchmark already registered, overwriting... [2023-02-22 23:56:04,473][05631] register_encoder_factory: [2023-02-22 23:56:04,505][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-22 23:56:04,518][05631] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line [2023-02-22 23:56:04,524][05631] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-22 23:56:04,528][05631] Resuming existing experiment from /content/train_dir/default_experiment... 
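The run of "already registered, overwriting" notices above comes from re-running the environment registration step before resuming training; re-registering an environment is treated as an overwrite with a notice rather than an error. A hypothetical sketch of that registry behavior (names here are illustrative, not Sample Factory's internals):

```python
from typing import Callable, Dict

_ENV_REGISTRY: Dict[str, Callable] = {}

def register_env(name: str, make_env_func: Callable) -> None:
    # Second and later registrations replace the first, with a notice.
    if name in _ENV_REGISTRY:
        print(f"Environment {name} already registered, overwriting...")
    _ENV_REGISTRY[name] = make_env_func

register_env("doom_health_gathering_supreme", lambda cfg, env_cfg: None)
register_env("doom_health_gathering_supreme", lambda cfg, env_cfg: None)  # prints the notice
```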
[2023-02-22 23:56:04,536][05631] Weights and Biases integration disabled [2023-02-22 23:56:04,550][05631] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-22 23:56:06,759][05631] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=10000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-22 23:56:06,762][05631] Saving configuration to /content/train_dir/default_experiment/config.json... 
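The configuration dump above is the same flat key/value set written to config.json, so a resumed run can be inspected offline. Below is a small sketch (assuming that flat JSON layout) that pulls out the settings behind the status lines. With env_frameskip=4, a throughput of roughly 840 samples/s is consistent with the ~3,300 FPS figures reported earlier in the log, since each policy sample accounts for four environment frames; keep_checkpoints=2 likewise explains the paired Saving/Removing checkpoint messages during training.

```python
import json

with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

# Settings that drive the throughput/FPS numbers and checkpoint churn above.
for key in ("num_workers", "num_envs_per_worker", "rollout", "batch_size",
            "env_frameskip", "keep_checkpoints", "save_every_sec",
            "train_for_env_steps"):
    print(key, "=", cfg[key])

# FPS ~= throughput (samples/s) * env_frameskip: 840 * 4 ~= 3360,
# matching the "Fps is (...)" status lines seen during training.
```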
[2023-02-22 23:56:06,767][05631] Rollout worker 0 uses device cpu [2023-02-22 23:56:06,769][05631] Rollout worker 1 uses device cpu [2023-02-22 23:56:06,775][05631] Rollout worker 2 uses device cpu [2023-02-22 23:56:06,776][05631] Rollout worker 3 uses device cpu [2023-02-22 23:56:06,777][05631] Rollout worker 4 uses device cpu [2023-02-22 23:56:06,778][05631] Rollout worker 5 uses device cpu [2023-02-22 23:56:06,779][05631] Rollout worker 6 uses device cpu [2023-02-22 23:56:06,781][05631] Rollout worker 7 uses device cpu [2023-02-22 23:56:06,927][05631] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 23:56:06,932][05631] InferenceWorker_p0-w0: min num requests: 2 [2023-02-22 23:56:06,975][05631] Starting all processes... [2023-02-22 23:56:06,979][05631] Starting process learner_proc0 [2023-02-22 23:56:07,170][05631] Starting all processes... [2023-02-22 23:56:07,183][05631] Starting process inference_proc0-0 [2023-02-22 23:56:07,299][05631] Starting process rollout_proc0 [2023-02-22 23:56:07,304][05631] Starting process rollout_proc1 [2023-02-22 23:56:07,304][05631] Starting process rollout_proc2 [2023-02-22 23:56:07,304][05631] Starting process rollout_proc3 [2023-02-22 23:56:07,304][05631] Starting process rollout_proc4 [2023-02-22 23:56:07,304][05631] Starting process rollout_proc5 [2023-02-22 23:56:07,305][05631] Starting process rollout_proc6 [2023-02-22 23:56:07,305][05631] Starting process rollout_proc7 [2023-02-22 23:56:17,924][20332] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 23:56:17,924][20332] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-22 23:56:17,957][20332] Num visible devices: 1 [2023-02-22 23:56:17,986][20332] Starting seed is not provided [2023-02-22 23:56:17,987][20332] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 23:56:17,988][20332] Initializing actor-critic model on device cuda:0 [2023-02-22 23:56:17,989][20332] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 23:56:17,990][20332] RunningMeanStd input shape: (1,) [2023-02-22 23:56:18,029][20332] ConvEncoder: input_channels=3 [2023-02-22 23:56:18,746][20346] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 23:56:18,748][20346] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-22 23:56:18,818][20346] Num visible devices: 1 [2023-02-22 23:56:18,948][20348] Worker 2 uses CPU cores [0] [2023-02-22 23:56:19,060][20332] Conv encoder output size: 512 [2023-02-22 23:56:19,064][20332] Policy head output size: 512 [2023-02-22 23:56:19,187][20332] Created Actor Critic model with architecture: [2023-02-22 23:56:19,191][20332] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): 
RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-22 23:56:19,351][20350] Worker 0 uses CPU cores [0] [2023-02-22 23:56:19,498][20347] Worker 1 uses CPU cores [1] [2023-02-22 23:56:19,976][20357] Worker 3 uses CPU cores [1] [2023-02-22 23:56:20,061][20359] Worker 4 uses CPU cores [0] [2023-02-22 23:56:20,178][20369] Worker 6 uses CPU cores [0] [2023-02-22 23:56:20,209][20361] Worker 5 uses CPU cores [1] [2023-02-22 23:56:20,256][20367] Worker 7 uses CPU cores [1] [2023-02-22 23:56:23,348][20332] Using optimizer [2023-02-22 23:56:23,350][20332] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 23:56:23,386][20332] Loading model from checkpoint [2023-02-22 23:56:23,392][20332] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2023-02-22 23:56:23,393][20332] Initialized policy 0 weights for model version 978 [2023-02-22 23:56:23,403][20332] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 23:56:23,412][20332] LearnerWorker_p0 finished initialization! [2023-02-22 23:56:23,621][20346] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 23:56:23,623][20346] RunningMeanStd input shape: (1,) [2023-02-22 23:56:23,643][20346] ConvEncoder: input_channels=3 [2023-02-22 23:56:23,806][20346] Conv encoder output size: 512 [2023-02-22 23:56:23,807][20346] Policy head output size: 512 [2023-02-22 23:56:24,550][05631] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 23:56:26,645][05631] Inference worker 0-0 is ready! [2023-02-22 23:56:26,648][05631] All inference workers are ready! Signal rollout workers to start! [2023-02-22 23:56:26,777][20350] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,777][20359] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,787][20348] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,789][20369] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,798][20361] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,813][20357] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,835][20347] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,840][20367] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 23:56:26,917][05631] Heartbeat connected on Batcher_0 [2023-02-22 23:56:26,925][05631] Heartbeat connected on LearnerWorker_p0 [2023-02-22 23:56:26,971][05631] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-22 23:56:27,680][20367] Decorrelating experience for 0 frames... [2023-02-22 23:56:27,684][20347] Decorrelating experience for 0 frames... [2023-02-22 23:56:28,022][20347] Decorrelating experience for 32 frames... [2023-02-22 23:56:28,222][20348] Decorrelating experience for 0 frames... [2023-02-22 23:56:28,225][20359] Decorrelating experience for 0 frames... [2023-02-22 23:56:28,227][20350] Decorrelating experience for 0 frames... [2023-02-22 23:56:28,231][20369] Decorrelating experience for 0 frames... 
[2023-02-22 23:56:29,301][20357] Decorrelating experience for 0 frames... [2023-02-22 23:56:29,343][20347] Decorrelating experience for 64 frames... [2023-02-22 23:56:29,363][20367] Decorrelating experience for 32 frames... [2023-02-22 23:56:29,550][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 23:56:29,633][20350] Decorrelating experience for 32 frames... [2023-02-22 23:56:29,642][20359] Decorrelating experience for 32 frames... [2023-02-22 23:56:29,645][20348] Decorrelating experience for 32 frames... [2023-02-22 23:56:29,662][20369] Decorrelating experience for 32 frames... [2023-02-22 23:56:30,434][20361] Decorrelating experience for 0 frames... [2023-02-22 23:56:30,678][20350] Decorrelating experience for 64 frames... [2023-02-22 23:56:30,697][20369] Decorrelating experience for 64 frames... [2023-02-22 23:56:30,880][20357] Decorrelating experience for 32 frames... [2023-02-22 23:56:30,957][20367] Decorrelating experience for 64 frames... [2023-02-22 23:56:31,425][20359] Decorrelating experience for 64 frames... [2023-02-22 23:56:31,499][20350] Decorrelating experience for 96 frames... [2023-02-22 23:56:31,668][05631] Heartbeat connected on RolloutWorker_w0 [2023-02-22 23:56:31,916][20347] Decorrelating experience for 96 frames... [2023-02-22 23:56:31,928][20361] Decorrelating experience for 32 frames... [2023-02-22 23:56:32,222][05631] Heartbeat connected on RolloutWorker_w1 [2023-02-22 23:56:32,578][20367] Decorrelating experience for 96 frames... [2023-02-22 23:56:32,634][20357] Decorrelating experience for 64 frames... [2023-02-22 23:56:32,739][20369] Decorrelating experience for 96 frames... [2023-02-22 23:56:32,874][05631] Heartbeat connected on RolloutWorker_w7 [2023-02-22 23:56:33,067][05631] Heartbeat connected on RolloutWorker_w6 [2023-02-22 23:56:33,391][20361] Decorrelating experience for 64 frames... [2023-02-22 23:56:34,188][20357] Decorrelating experience for 96 frames... [2023-02-22 23:56:34,550][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 2.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 23:56:34,553][05631] Avg episode reward: [(0, '0.800')] [2023-02-22 23:56:34,668][05631] Heartbeat connected on RolloutWorker_w3 [2023-02-22 23:56:35,037][20359] Decorrelating experience for 96 frames... [2023-02-22 23:56:35,360][20348] Decorrelating experience for 64 frames... [2023-02-22 23:56:35,692][05631] Heartbeat connected on RolloutWorker_w4 [2023-02-22 23:56:37,603][20361] Decorrelating experience for 96 frames... [2023-02-22 23:56:38,355][05631] Heartbeat connected on RolloutWorker_w5 [2023-02-22 23:56:39,226][20332] Signal inference workers to stop experience collection... [2023-02-22 23:56:39,233][20346] InferenceWorker_p0-w0: stopping experience collection [2023-02-22 23:56:39,550][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 153.6. Samples: 2304. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 23:56:39,571][05631] Avg episode reward: [(0, '3.198')] [2023-02-22 23:56:39,689][20348] Decorrelating experience for 96 frames... [2023-02-22 23:56:39,812][05631] Heartbeat connected on RolloutWorker_w2 [2023-02-22 23:56:41,256][20332] Signal inference workers to resume experience collection... 
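From this point on the log is dominated by the status line emitted every 5 seconds. The following is a rough, illustrative sketch (not sample-factory's actual implementation) of how the three sliding-window FPS figures in those lines can be computed: keep timestamped frame counts and difference the counter over 10/60/300-second windows.

import time
from collections import deque

WINDOWS = (10, 60, 300)      # seconds, as in the log lines
history = deque(maxlen=600)  # (monotonic_time, total_frames); covers the 300 s window

def record_and_report(total_frames: int) -> str:
    now = time.monotonic()
    history.append((now, total_frames))
    parts = []
    for w in WINDOWS:
        # oldest sample that still falls inside this window
        in_window = [(t, f) for t, f in history if now - t <= w]
        t0, f0 = in_window[0]
        fps = (total_frames - f0) / (now - t0) if now > t0 else float("nan")
        parts.append(f"{w} sec: {fps:.1f}")
    return f"Fps is ({', '.join(parts)}). Total num frames: {total_frames}."

On the first call each window holds a single sample, so every figure is nan, matching the first report after the resume; the real reporter additionally prints throughput, sample counts, and policy lag.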
[2023-02-22 23:56:41,258][20346] InferenceWorker_p0-w0: resuming experience collection [2023-02-22 23:56:44,550][05631] Fps is (10 sec: 1228.8, 60 sec: 614.4, 300 sec: 614.4). Total num frames: 4018176. Throughput: 0: 135.7. Samples: 2714. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-02-22 23:56:44,554][05631] Avg episode reward: [(0, '3.559')] [2023-02-22 23:56:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 4034560. Throughput: 0: 263.9. Samples: 6598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:56:49,553][05631] Avg episode reward: [(0, '4.147')] [2023-02-22 23:56:51,767][20346] Updated weights for policy 0, policy_version 988 (0.0018) [2023-02-22 23:56:54,550][05631] Fps is (10 sec: 3686.5, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4055040. Throughput: 0: 426.9. Samples: 12808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 23:56:54,553][05631] Avg episode reward: [(0, '4.484')] [2023-02-22 23:56:59,556][05631] Fps is (10 sec: 3684.3, 60 sec: 1872.2, 300 sec: 1872.2). Total num frames: 4071424. Throughput: 0: 439.5. Samples: 15384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:56:59,565][05631] Avg episode reward: [(0, '4.521')] [2023-02-22 23:57:04,552][05631] Fps is (10 sec: 2866.6, 60 sec: 1945.5, 300 sec: 1945.5). Total num frames: 4083712. Throughput: 0: 481.0. Samples: 19240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:04,556][05631] Avg episode reward: [(0, '4.622')] [2023-02-22 23:57:05,519][20346] Updated weights for policy 0, policy_version 998 (0.0019) [2023-02-22 23:57:09,550][05631] Fps is (10 sec: 2868.8, 60 sec: 2093.5, 300 sec: 2093.5). Total num frames: 4100096. Throughput: 0: 529.8. Samples: 23840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:57:09,553][05631] Avg episode reward: [(0, '4.583')] [2023-02-22 23:57:14,550][05631] Fps is (10 sec: 3687.2, 60 sec: 2293.8, 300 sec: 2293.8). Total num frames: 4120576. Throughput: 0: 593.8. Samples: 26722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:14,555][05631] Avg episode reward: [(0, '4.683')] [2023-02-22 23:57:16,081][20346] Updated weights for policy 0, policy_version 1008 (0.0016) [2023-02-22 23:57:19,550][05631] Fps is (10 sec: 3686.3, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 4136960. Throughput: 0: 727.4. Samples: 32754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:57:19,559][05631] Avg episode reward: [(0, '4.449')] [2023-02-22 23:57:24,550][05631] Fps is (10 sec: 2867.2, 60 sec: 2389.3, 300 sec: 2389.3). Total num frames: 4149248. Throughput: 0: 763.0. Samples: 36638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:57:24,557][05631] Avg episode reward: [(0, '4.424')] [2023-02-22 23:57:29,550][05631] Fps is (10 sec: 2867.3, 60 sec: 2662.4, 300 sec: 2457.6). Total num frames: 4165632. Throughput: 0: 798.6. Samples: 38652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:29,553][05631] Avg episode reward: [(0, '4.537')] [2023-02-22 23:57:29,954][20346] Updated weights for policy 0, policy_version 1018 (0.0033) [2023-02-22 23:57:34,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2574.6). Total num frames: 4186112. Throughput: 0: 840.4. Samples: 44416. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 23:57:34,553][05631] Avg episode reward: [(0, '4.629')] [2023-02-22 23:57:39,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 2676.1). Total num frames: 4206592. 
Throughput: 0: 832.5. Samples: 50272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:39,558][05631] Avg episode reward: [(0, '4.695')] [2023-02-22 23:57:41,200][20346] Updated weights for policy 0, policy_version 1028 (0.0020) [2023-02-22 23:57:44,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2662.4). Total num frames: 4218880. Throughput: 0: 819.3. Samples: 52248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:44,555][05631] Avg episode reward: [(0, '4.754')] [2023-02-22 23:57:49,553][05631] Fps is (10 sec: 2456.8, 60 sec: 3276.6, 300 sec: 2650.3). Total num frames: 4231168. Throughput: 0: 823.9. Samples: 56316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:49,563][05631] Avg episode reward: [(0, '4.916')] [2023-02-22 23:57:53,445][20346] Updated weights for policy 0, policy_version 1038 (0.0013) [2023-02-22 23:57:54,550][05631] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 2776.2). Total num frames: 4255744. Throughput: 0: 856.3. Samples: 62374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:54,553][05631] Avg episode reward: [(0, '4.714')] [2023-02-22 23:57:59,550][05631] Fps is (10 sec: 4097.3, 60 sec: 3345.4, 300 sec: 2802.5). Total num frames: 4272128. Throughput: 0: 863.5. Samples: 65578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:57:59,553][05631] Avg episode reward: [(0, '4.606')] [2023-02-22 23:58:04,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.5, 300 sec: 2826.2). Total num frames: 4288512. Throughput: 0: 833.8. Samples: 70274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:58:04,556][05631] Avg episode reward: [(0, '4.606')] [2023-02-22 23:58:04,572][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001047_4288512.pth... [2023-02-22 23:58:04,790][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000952_3899392.pth [2023-02-22 23:58:05,866][20346] Updated weights for policy 0, policy_version 1048 (0.0024) [2023-02-22 23:58:09,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2808.7). Total num frames: 4300800. Throughput: 0: 834.4. Samples: 74188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:58:09,552][05631] Avg episode reward: [(0, '4.619')] [2023-02-22 23:58:14,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 4321280. Throughput: 0: 852.8. Samples: 77028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:58:14,558][05631] Avg episode reward: [(0, '4.619')] [2023-02-22 23:58:16,727][20346] Updated weights for policy 0, policy_version 1058 (0.0025) [2023-02-22 23:58:19,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 2920.6). Total num frames: 4341760. Throughput: 0: 867.6. Samples: 83456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:58:19,553][05631] Avg episode reward: [(0, '4.566')] [2023-02-22 23:58:24,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2935.5). Total num frames: 4358144. Throughput: 0: 842.4. Samples: 88182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:58:24,557][05631] Avg episode reward: [(0, '4.674')] [2023-02-22 23:58:29,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 2916.4). Total num frames: 4370432. Throughput: 0: 843.0. Samples: 90182. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:58:29,555][05631] Avg episode reward: [(0, '4.674')] [2023-02-22 23:58:30,278][20346] Updated weights for policy 0, policy_version 1068 (0.0012) [2023-02-22 23:58:34,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2961.7). Total num frames: 4390912. Throughput: 0: 859.4. Samples: 94988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 23:58:34,556][05631] Avg episode reward: [(0, '4.850')] [2023-02-22 23:58:39,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3003.7). Total num frames: 4411392. Throughput: 0: 864.6. Samples: 101282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 23:58:39,561][05631] Avg episode reward: [(0, '4.769')] [2023-02-22 23:58:40,365][20346] Updated weights for policy 0, policy_version 1078 (0.0012) [2023-02-22 23:58:44,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2984.2). Total num frames: 4423680. Throughput: 0: 854.4. Samples: 104028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:58:44,565][05631] Avg episode reward: [(0, '4.715')] [2023-02-22 23:58:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 2994.3). Total num frames: 4440064. Throughput: 0: 839.8. Samples: 108064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 23:58:49,559][05631] Avg episode reward: [(0, '4.653')] [2023-02-22 23:58:53,834][20346] Updated weights for policy 0, policy_version 1088 (0.0017) [2023-02-22 23:58:54,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3003.7). Total num frames: 4456448. Throughput: 0: 863.9. Samples: 113062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:58:54,555][05631] Avg episode reward: [(0, '4.650')] [2023-02-22 23:58:59,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3065.4). Total num frames: 4481024. Throughput: 0: 871.6. Samples: 116252. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 23:58:59,558][05631] Avg episode reward: [(0, '4.682')] [2023-02-22 23:59:04,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3046.4). Total num frames: 4493312. Throughput: 0: 857.3. Samples: 122034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 23:59:04,555][05631] Avg episode reward: [(0, '4.670')] [2023-02-22 23:59:04,645][20346] Updated weights for policy 0, policy_version 1098 (0.0022) [2023-02-22 23:59:09,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3028.6). Total num frames: 4505600. Throughput: 0: 839.5. Samples: 125958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 23:59:09,556][05631] Avg episode reward: [(0, '4.845')] [2023-02-22 23:59:14,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3035.9). Total num frames: 4521984. Throughput: 0: 838.8. Samples: 127930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 23:59:14,556][05631] Avg episode reward: [(0, '4.713')] [2023-02-22 23:59:17,554][20346] Updated weights for policy 0, policy_version 1108 (0.0032) [2023-02-22 23:59:19,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3089.6). Total num frames: 4546560. Throughput: 0: 864.8. Samples: 133904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 23:59:19,559][05631] Avg episode reward: [(0, '4.482')] [2023-02-22 23:59:24,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3094.8). Total num frames: 4562944. Throughput: 0: 852.2. Samples: 139630. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:59:24,558][05631] Avg episode reward: [(0, '4.616')] [2023-02-22 23:59:29,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3077.5). Total num frames: 4575232. Throughput: 0: 830.9. Samples: 141418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:59:29,557][05631] Avg episode reward: [(0, '4.603')] [2023-02-22 23:59:30,861][20346] Updated weights for policy 0, policy_version 1118 (0.0027) [2023-02-22 23:59:34,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3061.2). Total num frames: 4587520. Throughput: 0: 821.4. Samples: 145026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 23:59:34,552][05631] Avg episode reward: [(0, '4.675')] [2023-02-22 23:59:39,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3108.8). Total num frames: 4612096. Throughput: 0: 846.0. Samples: 151134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:59:39,554][05631] Avg episode reward: [(0, '4.533')] [2023-02-22 23:59:41,452][20346] Updated weights for policy 0, policy_version 1128 (0.0012) [2023-02-22 23:59:44,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3113.0). Total num frames: 4628480. Throughput: 0: 844.3. Samples: 154244. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-22 23:59:44,558][05631] Avg episode reward: [(0, '4.554')] [2023-02-22 23:59:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3097.0). Total num frames: 4640768. Throughput: 0: 815.2. Samples: 158716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-22 23:59:49,552][05631] Avg episode reward: [(0, '4.432')] [2023-02-22 23:59:54,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3101.3). Total num frames: 4657152. Throughput: 0: 816.3. Samples: 162692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 23:59:54,552][05631] Avg episode reward: [(0, '4.620')] [2023-02-22 23:59:55,245][20346] Updated weights for policy 0, policy_version 1138 (0.0019) [2023-02-22 23:59:59,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3124.4). Total num frames: 4677632. Throughput: 0: 842.7. Samples: 165852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 23:59:59,557][05631] Avg episode reward: [(0, '4.670')] [2023-02-23 00:00:04,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3146.5). Total num frames: 4698112. Throughput: 0: 849.0. Samples: 172108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:00:04,553][05631] Avg episode reward: [(0, '4.743')] [2023-02-23 00:00:04,579][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001147_4698112.pth... [2023-02-23 00:00:04,774][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2023-02-23 00:00:05,895][20346] Updated weights for policy 0, policy_version 1148 (0.0022) [2023-02-23 00:00:09,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3131.2). Total num frames: 4710400. Throughput: 0: 816.0. Samples: 176352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:00:09,555][05631] Avg episode reward: [(0, '4.640')] [2023-02-23 00:00:14,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3116.5). Total num frames: 4722688. Throughput: 0: 820.4. Samples: 178334. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:00:14,561][05631] Avg episode reward: [(0, '4.667')] [2023-02-23 00:00:19,119][20346] Updated weights for policy 0, policy_version 1158 (0.0023) [2023-02-23 00:00:19,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3137.4). Total num frames: 4743168. Throughput: 0: 854.7. Samples: 183486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:00:19,559][05631] Avg episode reward: [(0, '4.531')] [2023-02-23 00:00:24,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3157.3). Total num frames: 4763648. Throughput: 0: 859.8. Samples: 189826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:00:24,563][05631] Avg episode reward: [(0, '4.561')] [2023-02-23 00:00:29,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3143.1). Total num frames: 4775936. Throughput: 0: 841.6. Samples: 192118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:00:29,553][05631] Avg episode reward: [(0, '4.643')] [2023-02-23 00:00:31,191][20346] Updated weights for policy 0, policy_version 1168 (0.0030) [2023-02-23 00:00:34,551][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3145.7). Total num frames: 4792320. Throughput: 0: 832.8. Samples: 196192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:00:34,562][05631] Avg episode reward: [(0, '4.525')] [2023-02-23 00:00:39,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3164.4). Total num frames: 4812800. Throughput: 0: 861.9. Samples: 201478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:00:39,552][05631] Avg episode reward: [(0, '4.685')] [2023-02-23 00:00:42,314][20346] Updated weights for policy 0, policy_version 1178 (0.0013) [2023-02-23 00:00:44,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3182.3). Total num frames: 4833280. Throughput: 0: 861.6. Samples: 204622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:00:44,561][05631] Avg episode reward: [(0, '4.800')] [2023-02-23 00:00:49,550][05631] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3184.1). Total num frames: 4849664. Throughput: 0: 845.0. Samples: 210132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:00:49,562][05631] Avg episode reward: [(0, '4.596')] [2023-02-23 00:00:54,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3170.6). Total num frames: 4861952. Throughput: 0: 838.7. Samples: 214092. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:00:54,560][05631] Avg episode reward: [(0, '4.559')] [2023-02-23 00:00:56,005][20346] Updated weights for policy 0, policy_version 1188 (0.0018) [2023-02-23 00:00:59,550][05631] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3172.5). Total num frames: 4878336. Throughput: 0: 842.4. Samples: 216242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:00:59,560][05631] Avg episode reward: [(0, '4.467')] [2023-02-23 00:01:04,551][05631] Fps is (10 sec: 3686.1, 60 sec: 3345.0, 300 sec: 3189.0). Total num frames: 4898816. Throughput: 0: 862.8. Samples: 222312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:01:04,554][05631] Avg episode reward: [(0, '4.771')] [2023-02-23 00:01:06,164][20346] Updated weights for policy 0, policy_version 1198 (0.0013) [2023-02-23 00:01:09,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3190.6). Total num frames: 4915200. Throughput: 0: 835.1. Samples: 227406. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:01:09,555][05631] Avg episode reward: [(0, '4.942')] [2023-02-23 00:01:14,550][05631] Fps is (10 sec: 2867.4, 60 sec: 3413.3, 300 sec: 3177.9). Total num frames: 4927488. Throughput: 0: 827.6. Samples: 229360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:01:14,553][05631] Avg episode reward: [(0, '4.753')] [2023-02-23 00:01:19,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3179.6). Total num frames: 4943872. Throughput: 0: 827.6. Samples: 233436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:01:19,554][05631] Avg episode reward: [(0, '4.772')] [2023-02-23 00:01:20,197][20346] Updated weights for policy 0, policy_version 1208 (0.0032) [2023-02-23 00:01:24,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 4964352. Throughput: 0: 849.5. Samples: 239706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:01:24,552][05631] Avg episode reward: [(0, '4.636')] [2023-02-23 00:01:29,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3304.6). Total num frames: 4980736. Throughput: 0: 848.1. Samples: 242786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:01:29,553][05631] Avg episode reward: [(0, '4.608')] [2023-02-23 00:01:31,680][20346] Updated weights for policy 0, policy_version 1218 (0.0012) [2023-02-23 00:01:34,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 4993024. Throughput: 0: 817.2. Samples: 246904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:01:34,553][05631] Avg episode reward: [(0, '4.551')] [2023-02-23 00:01:39,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 5009408. Throughput: 0: 823.4. Samples: 251146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:01:39,553][05631] Avg episode reward: [(0, '4.496')] [2023-02-23 00:01:44,091][20346] Updated weights for policy 0, policy_version 1228 (0.0018) [2023-02-23 00:01:44,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 5029888. Throughput: 0: 844.6. Samples: 254250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:01:44,556][05631] Avg episode reward: [(0, '4.489')] [2023-02-23 00:01:49,552][05631] Fps is (10 sec: 3685.5, 60 sec: 3276.7, 300 sec: 3360.1). Total num frames: 5046272. Throughput: 0: 847.0. Samples: 260430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:01:49,555][05631] Avg episode reward: [(0, '4.496')] [2023-02-23 00:01:54,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3360.2). Total num frames: 5062656. Throughput: 0: 822.0. Samples: 264396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:01:54,553][05631] Avg episode reward: [(0, '4.575')] [2023-02-23 00:01:57,329][20346] Updated weights for policy 0, policy_version 1238 (0.0023) [2023-02-23 00:01:59,550][05631] Fps is (10 sec: 2867.9, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 5074944. Throughput: 0: 822.0. Samples: 266352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:01:59,553][05631] Avg episode reward: [(0, '4.633')] [2023-02-23 00:02:04,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 5095424. Throughput: 0: 853.4. Samples: 271840. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:02:04,553][05631] Avg episode reward: [(0, '4.712')] [2023-02-23 00:02:04,567][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001244_5095424.pth... [2023-02-23 00:02:04,745][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001047_4288512.pth [2023-02-23 00:02:07,883][20346] Updated weights for policy 0, policy_version 1248 (0.0021) [2023-02-23 00:02:09,550][05631] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5115904. Throughput: 0: 846.3. Samples: 277790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:02:09,560][05631] Avg episode reward: [(0, '4.587')] [2023-02-23 00:02:14,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 5128192. Throughput: 0: 821.4. Samples: 279750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:02:14,555][05631] Avg episode reward: [(0, '4.632')] [2023-02-23 00:02:19,550][05631] Fps is (10 sec: 2457.7, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 5140480. Throughput: 0: 817.4. Samples: 283688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:02:19,557][05631] Avg episode reward: [(0, '4.535')] [2023-02-23 00:02:21,629][20346] Updated weights for policy 0, policy_version 1258 (0.0014) [2023-02-23 00:02:24,551][05631] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 5160960. Throughput: 0: 848.1. Samples: 289310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:02:24,553][05631] Avg episode reward: [(0, '4.831')] [2023-02-23 00:02:29,551][05631] Fps is (10 sec: 4095.5, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 5181440. Throughput: 0: 847.5. Samples: 292388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:02:29,555][05631] Avg episode reward: [(0, '4.835')] [2023-02-23 00:02:33,127][20346] Updated weights for policy 0, policy_version 1268 (0.0012) [2023-02-23 00:02:34,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 5193728. Throughput: 0: 819.1. Samples: 297288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:02:34,558][05631] Avg episode reward: [(0, '4.697')] [2023-02-23 00:02:39,551][05631] Fps is (10 sec: 2867.4, 60 sec: 3345.0, 300 sec: 3360.1). Total num frames: 5210112. Throughput: 0: 817.1. Samples: 301166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:02:39,557][05631] Avg episode reward: [(0, '4.401')] [2023-02-23 00:02:44,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 5230592. Throughput: 0: 832.3. Samples: 303806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-23 00:02:44,553][05631] Avg episode reward: [(0, '4.512')] [2023-02-23 00:02:45,441][20346] Updated weights for policy 0, policy_version 1278 (0.0026) [2023-02-23 00:02:49,550][05631] Fps is (10 sec: 4096.3, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 5251072. Throughput: 0: 853.4. Samples: 310244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:02:49,552][05631] Avg episode reward: [(0, '4.730')] [2023-02-23 00:02:54,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 5263360. Throughput: 0: 833.7. Samples: 315306. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:02:54,558][05631] Avg episode reward: [(0, '4.705')] [2023-02-23 00:02:57,585][20346] Updated weights for policy 0, policy_version 1288 (0.0020) [2023-02-23 00:02:59,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 5279744. Throughput: 0: 834.5. Samples: 317302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:02:59,553][05631] Avg episode reward: [(0, '4.640')] [2023-02-23 00:03:04,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5296128. Throughput: 0: 847.8. Samples: 321838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:03:04,553][05631] Avg episode reward: [(0, '4.826')] [2023-02-23 00:03:09,067][20346] Updated weights for policy 0, policy_version 1298 (0.0019) [2023-02-23 00:03:09,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5316608. Throughput: 0: 857.7. Samples: 327906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:03:09,559][05631] Avg episode reward: [(0, '4.892')] [2023-02-23 00:03:14,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 5332992. Throughput: 0: 853.2. Samples: 330782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:03:14,557][05631] Avg episode reward: [(0, '4.608')] [2023-02-23 00:03:19,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 5345280. Throughput: 0: 831.3. Samples: 334698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 00:03:19,554][05631] Avg episode reward: [(0, '4.631')] [2023-02-23 00:03:22,896][20346] Updated weights for policy 0, policy_version 1308 (0.0019) [2023-02-23 00:03:24,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 5361664. Throughput: 0: 847.8. Samples: 339316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:03:24,558][05631] Avg episode reward: [(0, '4.570')] [2023-02-23 00:03:29,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 5382144. Throughput: 0: 859.7. Samples: 342494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:03:29,556][05631] Avg episode reward: [(0, '4.385')] [2023-02-23 00:03:32,607][20346] Updated weights for policy 0, policy_version 1318 (0.0019) [2023-02-23 00:03:34,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 5402624. Throughput: 0: 853.7. Samples: 348662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:03:34,554][05631] Avg episode reward: [(0, '4.452')] [2023-02-23 00:03:39,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3360.1). Total num frames: 5414912. Throughput: 0: 829.1. Samples: 352616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:03:39,556][05631] Avg episode reward: [(0, '4.533')] [2023-02-23 00:03:44,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.0, 300 sec: 3360.1). Total num frames: 5431296. Throughput: 0: 827.7. Samples: 354550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:03:44,553][05631] Avg episode reward: [(0, '4.619')] [2023-02-23 00:03:46,418][20346] Updated weights for policy 0, policy_version 1328 (0.0025) [2023-02-23 00:03:49,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5451776. Throughput: 0: 852.6. Samples: 360206. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:03:49,553][05631] Avg episode reward: [(0, '4.738')] [2023-02-23 00:03:54,550][05631] Fps is (10 sec: 3686.6, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 5468160. Throughput: 0: 846.7. Samples: 366008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:03:54,558][05631] Avg episode reward: [(0, '4.729')] [2023-02-23 00:03:58,585][20346] Updated weights for policy 0, policy_version 1338 (0.0016) [2023-02-23 00:03:59,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 5480448. Throughput: 0: 827.2. Samples: 368004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:03:59,561][05631] Avg episode reward: [(0, '4.712')] [2023-02-23 00:04:04,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 5496832. Throughput: 0: 829.1. Samples: 372008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:04:04,552][05631] Avg episode reward: [(0, '4.584')] [2023-02-23 00:04:04,576][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001342_5496832.pth... [2023-02-23 00:04:04,750][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001147_4698112.pth [2023-02-23 00:04:09,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5517312. Throughput: 0: 854.9. Samples: 377786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:04:09,556][05631] Avg episode reward: [(0, '4.547')] [2023-02-23 00:04:10,198][20346] Updated weights for policy 0, policy_version 1348 (0.0017) [2023-02-23 00:04:14,550][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 5537792. Throughput: 0: 855.1. Samples: 380974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:04:14,554][05631] Avg episode reward: [(0, '4.556')] [2023-02-23 00:04:19,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 5550080. Throughput: 0: 828.9. Samples: 385962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:04:19,557][05631] Avg episode reward: [(0, '4.594')] [2023-02-23 00:04:23,148][20346] Updated weights for policy 0, policy_version 1358 (0.0016) [2023-02-23 00:04:24,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 5562368. Throughput: 0: 830.4. Samples: 389982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:04:24,557][05631] Avg episode reward: [(0, '4.687')] [2023-02-23 00:04:29,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5582848. Throughput: 0: 848.0. Samples: 392710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:04:29,560][05631] Avg episode reward: [(0, '4.624')] [2023-02-23 00:04:33,791][20346] Updated weights for policy 0, policy_version 1368 (0.0022) [2023-02-23 00:04:34,550][05631] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 5603328. Throughput: 0: 856.8. Samples: 398762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:04:34,552][05631] Avg episode reward: [(0, '4.377')] [2023-02-23 00:04:39,553][05631] Fps is (10 sec: 3685.2, 60 sec: 3413.1, 300 sec: 3360.1). Total num frames: 5619712. Throughput: 0: 830.0. Samples: 403360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:04:39,556][05631] Avg episode reward: [(0, '4.492')] [2023-02-23 00:04:44,551][05631] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 3360.1). 
Total num frames: 5632000. Throughput: 0: 831.6. Samples: 405428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:04:44,553][05631] Avg episode reward: [(0, '4.584')] [2023-02-23 00:04:47,455][20346] Updated weights for policy 0, policy_version 1378 (0.0022) [2023-02-23 00:04:49,550][05631] Fps is (10 sec: 3277.9, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5652480. Throughput: 0: 850.9. Samples: 410300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:04:49,557][05631] Avg episode reward: [(0, '4.574')] [2023-02-23 00:04:54,550][05631] Fps is (10 sec: 4096.5, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 5672960. Throughput: 0: 864.8. Samples: 416702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:04:54,552][05631] Avg episode reward: [(0, '4.609')] [2023-02-23 00:04:58,250][20346] Updated weights for policy 0, policy_version 1388 (0.0023) [2023-02-23 00:04:59,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 5685248. Throughput: 0: 854.0. Samples: 419402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:04:59,556][05631] Avg episode reward: [(0, '4.708')] [2023-02-23 00:05:04,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 5701632. Throughput: 0: 831.0. Samples: 423358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:05:04,556][05631] Avg episode reward: [(0, '4.778')] [2023-02-23 00:05:09,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5718016. Throughput: 0: 853.5. Samples: 428390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:05:09,553][05631] Avg episode reward: [(0, '4.709')] [2023-02-23 00:05:10,913][20346] Updated weights for policy 0, policy_version 1398 (0.0024) [2023-02-23 00:05:14,557][05631] Fps is (10 sec: 3683.6, 60 sec: 3344.7, 300 sec: 3373.9). Total num frames: 5738496. Throughput: 0: 862.6. Samples: 431534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:05:14,561][05631] Avg episode reward: [(0, '4.585')] [2023-02-23 00:05:19,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 5754880. Throughput: 0: 853.5. Samples: 437168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:05:19,555][05631] Avg episode reward: [(0, '4.646')] [2023-02-23 00:05:23,050][20346] Updated weights for policy 0, policy_version 1408 (0.0024) [2023-02-23 00:05:24,550][05631] Fps is (10 sec: 3279.2, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 5771264. Throughput: 0: 841.6. Samples: 441230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:05:24,556][05631] Avg episode reward: [(0, '4.601')] [2023-02-23 00:05:29,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 5787648. Throughput: 0: 839.3. Samples: 443194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:05:29,560][05631] Avg episode reward: [(0, '4.772')] [2023-02-23 00:05:34,268][20346] Updated weights for policy 0, policy_version 1418 (0.0029) [2023-02-23 00:05:34,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 5808128. Throughput: 0: 869.2. Samples: 449414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:05:34,560][05631] Avg episode reward: [(0, '4.747')] [2023-02-23 00:05:39,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3360.1). Total num frames: 5824512. Throughput: 0: 850.8. Samples: 454990. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:05:39,556][05631] Avg episode reward: [(0, '4.744')] [2023-02-23 00:05:44,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.4, 300 sec: 3346.2). Total num frames: 5836800. Throughput: 0: 834.4. Samples: 456952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:05:44,561][05631] Avg episode reward: [(0, '4.806')] [2023-02-23 00:05:47,939][20346] Updated weights for policy 0, policy_version 1428 (0.0026) [2023-02-23 00:05:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 5853184. Throughput: 0: 837.7. Samples: 461056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:05:49,553][05631] Avg episode reward: [(0, '4.606')] [2023-02-23 00:05:54,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5873664. Throughput: 0: 868.8. Samples: 467484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:05:54,553][05631] Avg episode reward: [(0, '4.802')] [2023-02-23 00:05:57,691][20346] Updated weights for policy 0, policy_version 1438 (0.0023) [2023-02-23 00:05:59,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 5894144. Throughput: 0: 867.9. Samples: 470582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:05:59,553][05631] Avg episode reward: [(0, '4.815')] [2023-02-23 00:06:04,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 5906432. Throughput: 0: 840.8. Samples: 475004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:06:04,553][05631] Avg episode reward: [(0, '4.737')] [2023-02-23 00:06:04,573][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001442_5906432.pth... [2023-02-23 00:06:04,731][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001244_5095424.pth [2023-02-23 00:06:09,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 5922816. Throughput: 0: 840.8. Samples: 479064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:06:09,552][05631] Avg episode reward: [(0, '4.726')] [2023-02-23 00:06:11,225][20346] Updated weights for policy 0, policy_version 1448 (0.0014) [2023-02-23 00:06:14,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.8, 300 sec: 3387.9). Total num frames: 5943296. Throughput: 0: 866.1. Samples: 482168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:06:14,553][05631] Avg episode reward: [(0, '4.481')] [2023-02-23 00:06:19,554][05631] Fps is (10 sec: 4094.2, 60 sec: 3481.3, 300 sec: 3387.8). Total num frames: 5963776. Throughput: 0: 870.9. Samples: 488606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:06:19,566][05631] Avg episode reward: [(0, '4.615')] [2023-02-23 00:06:22,499][20346] Updated weights for policy 0, policy_version 1458 (0.0018) [2023-02-23 00:06:24,552][05631] Fps is (10 sec: 3276.1, 60 sec: 3413.2, 300 sec: 3374.0). Total num frames: 5976064. Throughput: 0: 842.9. Samples: 492924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:06:24,558][05631] Avg episode reward: [(0, '4.693')] [2023-02-23 00:06:29,550][05631] Fps is (10 sec: 2458.7, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 5988352. Throughput: 0: 844.6. Samples: 494960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:06:29,552][05631] Avg episode reward: [(0, '4.759')] [2023-02-23 00:06:34,550][05631] Fps is (10 sec: 3277.6, 60 sec: 3345.1, 300 sec: 3387.9). 
Total num frames: 6008832. Throughput: 0: 872.8. Samples: 500334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:06:34,558][05631] Avg episode reward: [(0, '4.592')] [2023-02-23 00:06:34,578][20346] Updated weights for policy 0, policy_version 1468 (0.0031) [2023-02-23 00:06:39,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 6029312. Throughput: 0: 868.1. Samples: 506550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:06:39,553][05631] Avg episode reward: [(0, '4.625')] [2023-02-23 00:06:44,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 6045696. Throughput: 0: 848.2. Samples: 508750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:06:44,558][05631] Avg episode reward: [(0, '4.656')] [2023-02-23 00:06:47,096][20346] Updated weights for policy 0, policy_version 1478 (0.0013) [2023-02-23 00:06:49,551][05631] Fps is (10 sec: 2866.9, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 6057984. Throughput: 0: 842.9. Samples: 512936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:06:49,562][05631] Avg episode reward: [(0, '5.050')] [2023-02-23 00:06:54,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 6078464. Throughput: 0: 876.6. Samples: 518510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:06:54,552][05631] Avg episode reward: [(0, '4.948')] [2023-02-23 00:06:57,628][20346] Updated weights for policy 0, policy_version 1488 (0.0019) [2023-02-23 00:06:59,550][05631] Fps is (10 sec: 4506.0, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 6103040. Throughput: 0: 880.2. Samples: 521776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:06:59,552][05631] Avg episode reward: [(0, '4.591')] [2023-02-23 00:07:04,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 6115328. Throughput: 0: 853.7. Samples: 527018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:07:04,553][05631] Avg episode reward: [(0, '4.626')] [2023-02-23 00:07:09,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 6127616. Throughput: 0: 846.8. Samples: 531030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:07:09,558][05631] Avg episode reward: [(0, '4.606')] [2023-02-23 00:07:11,311][20346] Updated weights for policy 0, policy_version 1498 (0.0021) [2023-02-23 00:07:14,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 6148096. Throughput: 0: 854.6. Samples: 533416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:07:14,553][05631] Avg episode reward: [(0, '4.572')] [2023-02-23 00:07:19,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.6, 300 sec: 3415.7). Total num frames: 6168576. Throughput: 0: 874.3. Samples: 539676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:07:19,553][05631] Avg episode reward: [(0, '4.795')] [2023-02-23 00:07:21,462][20346] Updated weights for policy 0, policy_version 1508 (0.0016) [2023-02-23 00:07:24,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 6180864. Throughput: 0: 849.6. Samples: 544782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:07:24,557][05631] Avg episode reward: [(0, '4.955')] [2023-02-23 00:07:29,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 6197248. Throughput: 0: 846.3. Samples: 546834. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:07:29,552][05631] Avg episode reward: [(0, '4.892')] [2023-02-23 00:07:34,551][05631] Fps is (10 sec: 3276.4, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 6213632. Throughput: 0: 851.8. Samples: 551266. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-23 00:07:34,556][05631] Avg episode reward: [(0, '4.991')] [2023-02-23 00:07:34,676][20346] Updated weights for policy 0, policy_version 1518 (0.0043) [2023-02-23 00:07:39,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 6234112. Throughput: 0: 866.8. Samples: 557516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:07:39,553][05631] Avg episode reward: [(0, '4.851')] [2023-02-23 00:07:44,550][05631] Fps is (10 sec: 3686.9, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 6250496. Throughput: 0: 860.0. Samples: 560478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:07:44,553][05631] Avg episode reward: [(0, '4.687')] [2023-02-23 00:07:46,632][20346] Updated weights for policy 0, policy_version 1528 (0.0013) [2023-02-23 00:07:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 6262784. Throughput: 0: 831.0. Samples: 564412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:07:49,559][05631] Avg episode reward: [(0, '4.607')] [2023-02-23 00:07:54,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 6283264. Throughput: 0: 845.7. Samples: 569086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:07:54,564][05631] Avg episode reward: [(0, '4.788')] [2023-02-23 00:07:58,320][20346] Updated weights for policy 0, policy_version 1538 (0.0027) [2023-02-23 00:07:59,550][05631] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 6303744. Throughput: 0: 863.1. Samples: 572254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:07:59,553][05631] Avg episode reward: [(0, '4.945')] [2023-02-23 00:08:04,553][05631] Fps is (10 sec: 3685.1, 60 sec: 3413.1, 300 sec: 3401.7). Total num frames: 6320128. Throughput: 0: 858.0. Samples: 578288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:08:04,559][05631] Avg episode reward: [(0, '5.220')] [2023-02-23 00:08:04,579][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001543_6320128.pth... [2023-02-23 00:08:04,802][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001342_5496832.pth [2023-02-23 00:08:04,839][20332] Saving new best policy, reward=5.220! [2023-02-23 00:08:09,550][05631] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 6332416. Throughput: 0: 830.6. Samples: 582158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:08:09,556][05631] Avg episode reward: [(0, '5.416')] [2023-02-23 00:08:09,564][20332] Saving new best policy, reward=5.416! [2023-02-23 00:08:11,878][20346] Updated weights for policy 0, policy_version 1548 (0.0018) [2023-02-23 00:08:14,550][05631] Fps is (10 sec: 2868.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 6348800. Throughput: 0: 828.2. Samples: 584104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:08:14,553][05631] Avg episode reward: [(0, '5.240')] [2023-02-23 00:08:19,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 6369280. Throughput: 0: 855.5. Samples: 589764. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:08:19,553][05631] Avg episode reward: [(0, '5.246')] [2023-02-23 00:08:22,158][20346] Updated weights for policy 0, policy_version 1558 (0.0023) [2023-02-23 00:08:24,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3401.8). Total num frames: 6385664. Throughput: 0: 849.7. Samples: 595754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:08:24,554][05631] Avg episode reward: [(0, '5.237')] [2023-02-23 00:08:29,551][05631] Fps is (10 sec: 3276.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 6402048. Throughput: 0: 828.5. Samples: 597762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:08:29,556][05631] Avg episode reward: [(0, '5.200')] [2023-02-23 00:08:34,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 6414336. Throughput: 0: 828.8. Samples: 601706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:08:34,554][05631] Avg episode reward: [(0, '5.333')] [2023-02-23 00:08:35,888][20346] Updated weights for policy 0, policy_version 1568 (0.0021) [2023-02-23 00:08:39,550][05631] Fps is (10 sec: 3277.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 6434816. Throughput: 0: 858.9. Samples: 607736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:08:39,555][05631] Avg episode reward: [(0, '5.066')] [2023-02-23 00:08:44,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 6455296. Throughput: 0: 859.2. Samples: 610918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:08:44,561][05631] Avg episode reward: [(0, '4.606')] [2023-02-23 00:08:46,584][20346] Updated weights for policy 0, policy_version 1578 (0.0015) [2023-02-23 00:08:49,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 6467584. Throughput: 0: 829.7. Samples: 615622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:08:49,553][05631] Avg episode reward: [(0, '4.648')] [2023-02-23 00:08:54,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 6483968. Throughput: 0: 830.1. Samples: 619512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:08:54,559][05631] Avg episode reward: [(0, '4.725')] [2023-02-23 00:08:59,489][20346] Updated weights for policy 0, policy_version 1588 (0.0018) [2023-02-23 00:08:59,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 6504448. Throughput: 0: 850.7. Samples: 622386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:08:59,553][05631] Avg episode reward: [(0, '4.713')] [2023-02-23 00:09:04,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3415.6). Total num frames: 6524928. Throughput: 0: 864.0. Samples: 628642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:09:04,552][05631] Avg episode reward: [(0, '4.744')] [2023-02-23 00:09:09,556][05631] Fps is (10 sec: 3274.8, 60 sec: 3413.0, 300 sec: 3387.8). Total num frames: 6537216. Throughput: 0: 834.4. Samples: 633306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:09:09,566][05631] Avg episode reward: [(0, '4.822')] [2023-02-23 00:09:11,968][20346] Updated weights for policy 0, policy_version 1598 (0.0012) [2023-02-23 00:09:14,551][05631] Fps is (10 sec: 2457.3, 60 sec: 3345.0, 300 sec: 3387.9). Total num frames: 6549504. Throughput: 0: 835.5. Samples: 635358. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:09:14,556][05631] Avg episode reward: [(0, '4.815')] [2023-02-23 00:09:19,550][05631] Fps is (10 sec: 3278.8, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 6569984. Throughput: 0: 856.2. Samples: 640234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:09:19,553][05631] Avg episode reward: [(0, '4.969')] [2023-02-23 00:09:23,025][20346] Updated weights for policy 0, policy_version 1608 (0.0017) [2023-02-23 00:09:24,550][05631] Fps is (10 sec: 4096.5, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 6590464. Throughput: 0: 860.7. Samples: 646466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:09:24,553][05631] Avg episode reward: [(0, '5.020')] [2023-02-23 00:09:29,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 6602752. Throughput: 0: 841.2. Samples: 648774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:09:29,561][05631] Avg episode reward: [(0, '4.957')] [2023-02-23 00:09:34,551][05631] Fps is (10 sec: 2457.4, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 6615040. Throughput: 0: 819.4. Samples: 652494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:09:34,561][05631] Avg episode reward: [(0, '4.896')] [2023-02-23 00:09:37,051][20346] Updated weights for policy 0, policy_version 1618 (0.0022) [2023-02-23 00:09:39,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 6635520. Throughput: 0: 844.2. Samples: 657500. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:09:39,560][05631] Avg episode reward: [(0, '4.907')] [2023-02-23 00:09:44,551][05631] Fps is (10 sec: 4096.2, 60 sec: 3345.0, 300 sec: 3401.8). Total num frames: 6656000. Throughput: 0: 850.1. Samples: 660640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:09:44,560][05631] Avg episode reward: [(0, '4.669')] [2023-02-23 00:09:48,070][20346] Updated weights for policy 0, policy_version 1628 (0.0013) [2023-02-23 00:09:49,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 6668288. Throughput: 0: 828.9. Samples: 665942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:09:49,553][05631] Avg episode reward: [(0, '4.817')] [2023-02-23 00:09:54,550][05631] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 6684672. Throughput: 0: 813.2. Samples: 669894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:09:54,553][05631] Avg episode reward: [(0, '4.900')] [2023-02-23 00:09:59,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 6701056. Throughput: 0: 816.5. Samples: 672098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:09:59,552][05631] Avg episode reward: [(0, '5.044')] [2023-02-23 00:10:00,807][20346] Updated weights for policy 0, policy_version 1638 (0.0030) [2023-02-23 00:10:04,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3401.8). Total num frames: 6721536. Throughput: 0: 850.1. Samples: 678490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:10:04,553][05631] Avg episode reward: [(0, '4.944')] [2023-02-23 00:10:04,584][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001642_6725632.pth... [2023-02-23 00:10:04,720][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001442_5906432.pth [2023-02-23 00:10:09,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.7, 300 sec: 3401.8). 
Total num frames: 6742016. Throughput: 0: 833.4. Samples: 683970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:10:09,554][05631] Avg episode reward: [(0, '4.623')] [2023-02-23 00:10:12,737][20346] Updated weights for policy 0, policy_version 1648 (0.0012) [2023-02-23 00:10:14,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 6754304. Throughput: 0: 826.0. Samples: 685946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:10:14,558][05631] Avg episode reward: [(0, '4.478')] [2023-02-23 00:10:19,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 6770688. Throughput: 0: 830.3. Samples: 689858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:10:19,558][05631] Avg episode reward: [(0, '4.628')] [2023-02-23 00:10:24,403][20346] Updated weights for policy 0, policy_version 1658 (0.0019) [2023-02-23 00:10:24,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 6791168. Throughput: 0: 858.6. Samples: 696136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:10:24,553][05631] Avg episode reward: [(0, '4.876')] [2023-02-23 00:10:29,550][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 6807552. Throughput: 0: 859.5. Samples: 699318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:10:29,555][05631] Avg episode reward: [(0, '4.753')] [2023-02-23 00:10:34,552][05631] Fps is (10 sec: 2866.5, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 6819840. Throughput: 0: 833.7. Samples: 703460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:10:34,555][05631] Avg episode reward: [(0, '4.770')] [2023-02-23 00:10:38,158][20346] Updated weights for policy 0, policy_version 1668 (0.0041) [2023-02-23 00:10:39,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 6836224. Throughput: 0: 837.2. Samples: 707570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:10:39,552][05631] Avg episode reward: [(0, '4.691')] [2023-02-23 00:10:44,550][05631] Fps is (10 sec: 3687.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 6856704. Throughput: 0: 858.0. Samples: 710708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:10:44,553][05631] Avg episode reward: [(0, '4.632')] [2023-02-23 00:10:48,107][20346] Updated weights for policy 0, policy_version 1678 (0.0029) [2023-02-23 00:10:49,554][05631] Fps is (10 sec: 3685.1, 60 sec: 3413.1, 300 sec: 3387.8). Total num frames: 6873088. Throughput: 0: 855.3. Samples: 716982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:10:49,562][05631] Avg episode reward: [(0, '4.465')] [2023-02-23 00:10:54,551][05631] Fps is (10 sec: 3276.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 6889472. Throughput: 0: 827.0. Samples: 721186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:10:54,556][05631] Avg episode reward: [(0, '4.593')] [2023-02-23 00:10:59,550][05631] Fps is (10 sec: 2868.2, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 6901760. Throughput: 0: 827.0. Samples: 723162. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:10:59,553][05631] Avg episode reward: [(0, '4.829')] [2023-02-23 00:11:01,732][20346] Updated weights for policy 0, policy_version 1688 (0.0016) [2023-02-23 00:11:04,550][05631] Fps is (10 sec: 3277.2, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 6922240. Throughput: 0: 858.4. Samples: 728486. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:11:04,557][05631] Avg episode reward: [(0, '4.940')] [2023-02-23 00:11:09,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 6942720. Throughput: 0: 858.4. Samples: 734766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:11:09,553][05631] Avg episode reward: [(0, '4.687')] [2023-02-23 00:11:13,068][20346] Updated weights for policy 0, policy_version 1698 (0.0015) [2023-02-23 00:11:14,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3345.0, 300 sec: 3360.2). Total num frames: 6955008. Throughput: 0: 835.1. Samples: 736898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:11:14,553][05631] Avg episode reward: [(0, '4.729')] [2023-02-23 00:11:19,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 6971392. Throughput: 0: 831.1. Samples: 740860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:11:19,557][05631] Avg episode reward: [(0, '4.664')] [2023-02-23 00:11:24,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 6991872. Throughput: 0: 862.0. Samples: 746360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:11:24,558][05631] Avg episode reward: [(0, '4.733')] [2023-02-23 00:11:25,412][20346] Updated weights for policy 0, policy_version 1708 (0.0020) [2023-02-23 00:11:29,550][05631] Fps is (10 sec: 4096.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7012352. Throughput: 0: 862.5. Samples: 749522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:11:29,552][05631] Avg episode reward: [(0, '4.813')] [2023-02-23 00:11:34,553][05631] Fps is (10 sec: 3685.2, 60 sec: 3481.5, 300 sec: 3387.8). Total num frames: 7028736. Throughput: 0: 844.0. Samples: 754964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:11:34,556][05631] Avg episode reward: [(0, '4.683')] [2023-02-23 00:11:37,776][20346] Updated weights for policy 0, policy_version 1718 (0.0024) [2023-02-23 00:11:39,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 7041024. Throughput: 0: 839.9. Samples: 758980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:11:39,554][05631] Avg episode reward: [(0, '4.530')] [2023-02-23 00:11:44,550][05631] Fps is (10 sec: 2868.1, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7057408. Throughput: 0: 848.7. Samples: 761354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:11:44,561][05631] Avg episode reward: [(0, '4.524')] [2023-02-23 00:11:48,822][20346] Updated weights for policy 0, policy_version 1728 (0.0013) [2023-02-23 00:11:49,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3387.9). Total num frames: 7077888. Throughput: 0: 865.5. Samples: 767432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:11:49,556][05631] Avg episode reward: [(0, '4.634')] [2023-02-23 00:11:54,552][05631] Fps is (10 sec: 3685.7, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 7094272. Throughput: 0: 837.9. Samples: 772474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:11:54,555][05631] Avg episode reward: [(0, '4.665')] [2023-02-23 00:11:59,552][05631] Fps is (10 sec: 2866.5, 60 sec: 3413.2, 300 sec: 3360.1). Total num frames: 7106560. Throughput: 0: 833.7. Samples: 774416. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:11:59,558][05631] Avg episode reward: [(0, '4.534')] [2023-02-23 00:12:02,783][20346] Updated weights for policy 0, policy_version 1738 (0.0029) [2023-02-23 00:12:04,550][05631] Fps is (10 sec: 3277.5, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7127040. Throughput: 0: 843.3. Samples: 778806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:12:04,552][05631] Avg episode reward: [(0, '4.577')] [2023-02-23 00:12:04,569][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001740_7127040.pth... [2023-02-23 00:12:04,697][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001543_6320128.pth [2023-02-23 00:12:09,550][05631] Fps is (10 sec: 4097.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7147520. Throughput: 0: 863.0. Samples: 785196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:12:09,553][05631] Avg episode reward: [(0, '4.826')] [2023-02-23 00:12:12,538][20346] Updated weights for policy 0, policy_version 1748 (0.0012) [2023-02-23 00:12:14,551][05631] Fps is (10 sec: 3685.9, 60 sec: 3481.5, 300 sec: 3374.0). Total num frames: 7163904. Throughput: 0: 863.5. Samples: 788380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:12:14,553][05631] Avg episode reward: [(0, '4.838')] [2023-02-23 00:12:19,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 7176192. Throughput: 0: 829.7. Samples: 792298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:12:19,554][05631] Avg episode reward: [(0, '4.798')] [2023-02-23 00:12:24,550][05631] Fps is (10 sec: 2867.6, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 7192576. Throughput: 0: 837.0. Samples: 796644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:12:24,563][05631] Avg episode reward: [(0, '4.717')] [2023-02-23 00:12:26,256][20346] Updated weights for policy 0, policy_version 1758 (0.0017) [2023-02-23 00:12:29,550][05631] Fps is (10 sec: 3686.6, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7213056. Throughput: 0: 853.5. Samples: 799762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:12:29,553][05631] Avg episode reward: [(0, '4.653')] [2023-02-23 00:12:34,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3374.0). Total num frames: 7229440. Throughput: 0: 856.9. Samples: 805994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:12:34,555][05631] Avg episode reward: [(0, '4.714')] [2023-02-23 00:12:38,093][20346] Updated weights for policy 0, policy_version 1768 (0.0014) [2023-02-23 00:12:39,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 7241728. Throughput: 0: 831.7. Samples: 809900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:12:39,558][05631] Avg episode reward: [(0, '4.777')] [2023-02-23 00:12:44,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 7258112. Throughput: 0: 833.4. Samples: 811916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:12:44,560][05631] Avg episode reward: [(0, '4.729')] [2023-02-23 00:12:49,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 7278592. Throughput: 0: 859.3. Samples: 817476. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:12:49,557][05631] Avg episode reward: [(0, '4.659')] [2023-02-23 00:12:49,933][20346] Updated weights for policy 0, policy_version 1778 (0.0021) [2023-02-23 00:12:54,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 7299072. Throughput: 0: 855.4. Samples: 823688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:12:54,558][05631] Avg episode reward: [(0, '4.428')] [2023-02-23 00:12:59,555][05631] Fps is (10 sec: 3275.1, 60 sec: 3413.2, 300 sec: 3360.1). Total num frames: 7311360. Throughput: 0: 829.2. Samples: 825698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:12:59,559][05631] Avg episode reward: [(0, '4.599')] [2023-02-23 00:13:03,182][20346] Updated weights for policy 0, policy_version 1788 (0.0013) [2023-02-23 00:13:04,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 7327744. Throughput: 0: 832.9. Samples: 829776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 00:13:04,552][05631] Avg episode reward: [(0, '4.744')] [2023-02-23 00:13:09,550][05631] Fps is (10 sec: 3688.3, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7348224. Throughput: 0: 862.1. Samples: 835438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:13:09,552][05631] Avg episode reward: [(0, '4.843')] [2023-02-23 00:13:13,295][20346] Updated weights for policy 0, policy_version 1798 (0.0013) [2023-02-23 00:13:14,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 7368704. Throughput: 0: 861.7. Samples: 838540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:13:14,561][05631] Avg episode reward: [(0, '4.630')] [2023-02-23 00:13:19,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 7380992. Throughput: 0: 831.0. Samples: 843390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:13:19,557][05631] Avg episode reward: [(0, '4.444')] [2023-02-23 00:13:24,551][05631] Fps is (10 sec: 2457.2, 60 sec: 3345.0, 300 sec: 3360.1). Total num frames: 7393280. Throughput: 0: 831.4. Samples: 847312. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:13:24,556][05631] Avg episode reward: [(0, '4.395')] [2023-02-23 00:13:27,300][20346] Updated weights for policy 0, policy_version 1808 (0.0034) [2023-02-23 00:13:29,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7413760. Throughput: 0: 844.0. Samples: 849898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:13:29,553][05631] Avg episode reward: [(0, '4.697')] [2023-02-23 00:13:34,550][05631] Fps is (10 sec: 4096.6, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7434240. Throughput: 0: 862.6. Samples: 856294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:13:34,553][05631] Avg episode reward: [(0, '4.761')] [2023-02-23 00:13:37,678][20346] Updated weights for policy 0, policy_version 1818 (0.0018) [2023-02-23 00:13:39,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 7450624. Throughput: 0: 836.2. Samples: 861318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:13:39,556][05631] Avg episode reward: [(0, '4.732')] [2023-02-23 00:13:44,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 7462912. Throughput: 0: 837.4. Samples: 863376. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:13:44,555][05631] Avg episode reward: [(0, '4.705')] [2023-02-23 00:13:49,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7483392. Throughput: 0: 851.7. Samples: 868104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:13:49,553][05631] Avg episode reward: [(0, '4.663')] [2023-02-23 00:13:50,369][20346] Updated weights for policy 0, policy_version 1828 (0.0024) [2023-02-23 00:13:54,550][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7503872. Throughput: 0: 868.0. Samples: 874498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:13:54,560][05631] Avg episode reward: [(0, '4.542')] [2023-02-23 00:13:59,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.9, 300 sec: 3374.0). Total num frames: 7520256. Throughput: 0: 863.0. Samples: 877374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:13:59,554][05631] Avg episode reward: [(0, '4.485')] [2023-02-23 00:14:02,340][20346] Updated weights for policy 0, policy_version 1838 (0.0014) [2023-02-23 00:14:04,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.1). Total num frames: 7532544. Throughput: 0: 845.5. Samples: 881438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:14:04,554][05631] Avg episode reward: [(0, '4.444')] [2023-02-23 00:14:04,579][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001839_7532544.pth... [2023-02-23 00:14:04,761][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001642_6725632.pth [2023-02-23 00:14:09,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7553024. Throughput: 0: 867.9. Samples: 886366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:14:09,552][05631] Avg episode reward: [(0, '4.507')] [2023-02-23 00:14:13,538][20346] Updated weights for policy 0, policy_version 1848 (0.0013) [2023-02-23 00:14:14,550][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7573504. Throughput: 0: 880.9. Samples: 889540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:14:14,553][05631] Avg episode reward: [(0, '4.652')] [2023-02-23 00:14:19,552][05631] Fps is (10 sec: 3276.2, 60 sec: 3413.2, 300 sec: 3374.0). Total num frames: 7585792. Throughput: 0: 866.3. Samples: 895280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:14:19,558][05631] Avg episode reward: [(0, '4.794')] [2023-02-23 00:14:24,553][05631] Fps is (10 sec: 2866.2, 60 sec: 3481.5, 300 sec: 3387.8). Total num frames: 7602176. Throughput: 0: 840.3. Samples: 899136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:14:24,556][05631] Avg episode reward: [(0, '4.724')] [2023-02-23 00:14:27,629][20346] Updated weights for policy 0, policy_version 1858 (0.0042) [2023-02-23 00:14:29,550][05631] Fps is (10 sec: 2867.7, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7614464. Throughput: 0: 836.7. Samples: 901028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:14:29,562][05631] Avg episode reward: [(0, '4.653')] [2023-02-23 00:14:34,550][05631] Fps is (10 sec: 3277.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7634944. Throughput: 0: 856.9. Samples: 906664. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:14:34,553][05631] Avg episode reward: [(0, '4.515')] [2023-02-23 00:14:38,134][20346] Updated weights for policy 0, policy_version 1868 (0.0013) [2023-02-23 00:14:39,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 7651328. Throughput: 0: 839.9. Samples: 912292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:14:39,554][05631] Avg episode reward: [(0, '4.396')] [2023-02-23 00:14:44,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 7667712. Throughput: 0: 820.2. Samples: 914284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:14:44,560][05631] Avg episode reward: [(0, '4.508')] [2023-02-23 00:14:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 7680000. Throughput: 0: 818.6. Samples: 918276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:14:49,559][05631] Avg episode reward: [(0, '4.580')] [2023-02-23 00:14:51,617][20346] Updated weights for policy 0, policy_version 1878 (0.0014) [2023-02-23 00:14:54,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 7704576. Throughput: 0: 847.6. Samples: 924506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:14:54,553][05631] Avg episode reward: [(0, '4.569')] [2023-02-23 00:14:59,551][05631] Fps is (10 sec: 4095.5, 60 sec: 3345.0, 300 sec: 3387.9). Total num frames: 7720960. Throughput: 0: 847.9. Samples: 927698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:14:59,558][05631] Avg episode reward: [(0, '4.678')] [2023-02-23 00:15:02,777][20346] Updated weights for policy 0, policy_version 1888 (0.0019) [2023-02-23 00:15:04,552][05631] Fps is (10 sec: 3276.1, 60 sec: 3413.2, 300 sec: 3374.0). Total num frames: 7737344. Throughput: 0: 821.5. Samples: 932246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:15:04,555][05631] Avg episode reward: [(0, '4.664')] [2023-02-23 00:15:09,551][05631] Fps is (10 sec: 2867.4, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 7749632. Throughput: 0: 822.8. Samples: 936160. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:15:09,561][05631] Avg episode reward: [(0, '4.795')] [2023-02-23 00:15:14,550][05631] Fps is (10 sec: 3277.4, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 7770112. Throughput: 0: 846.2. Samples: 939106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:15:14,552][05631] Avg episode reward: [(0, '4.709')] [2023-02-23 00:15:15,088][20346] Updated weights for policy 0, policy_version 1898 (0.0017) [2023-02-23 00:15:19,552][05631] Fps is (10 sec: 4095.2, 60 sec: 3413.3, 300 sec: 3387.8). Total num frames: 7790592. Throughput: 0: 856.2. Samples: 945196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:15:19,555][05631] Avg episode reward: [(0, '4.722')] [2023-02-23 00:15:24,551][05631] Fps is (10 sec: 3276.4, 60 sec: 3345.2, 300 sec: 3374.0). Total num frames: 7802880. Throughput: 0: 825.9. Samples: 949458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:15:24,555][05631] Avg episode reward: [(0, '4.616')] [2023-02-23 00:15:28,716][20346] Updated weights for policy 0, policy_version 1908 (0.0027) [2023-02-23 00:15:29,551][05631] Fps is (10 sec: 2458.0, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 7815168. Throughput: 0: 827.0. Samples: 951498. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:15:29,558][05631] Avg episode reward: [(0, '4.583')] [2023-02-23 00:15:34,550][05631] Fps is (10 sec: 3277.3, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7835648. Throughput: 0: 851.8. Samples: 956608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:15:34,558][05631] Avg episode reward: [(0, '4.504')] [2023-02-23 00:15:38,810][20346] Updated weights for policy 0, policy_version 1918 (0.0014) [2023-02-23 00:15:39,550][05631] Fps is (10 sec: 4096.3, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7856128. Throughput: 0: 854.7. Samples: 962966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:15:39,553][05631] Avg episode reward: [(0, '4.470')] [2023-02-23 00:15:44,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7872512. Throughput: 0: 838.5. Samples: 965430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:15:44,553][05631] Avg episode reward: [(0, '4.574')] [2023-02-23 00:15:49,551][05631] Fps is (10 sec: 2866.9, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 7884800. Throughput: 0: 825.4. Samples: 969390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:15:49,555][05631] Avg episode reward: [(0, '4.418')] [2023-02-23 00:15:52,719][20346] Updated weights for policy 0, policy_version 1928 (0.0032) [2023-02-23 00:15:54,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 7901184. Throughput: 0: 846.2. Samples: 974240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:15:54,561][05631] Avg episode reward: [(0, '4.475')] [2023-02-23 00:15:59,550][05631] Fps is (10 sec: 3686.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 7921664. Throughput: 0: 848.9. Samples: 977306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:15:59,561][05631] Avg episode reward: [(0, '4.648')] [2023-02-23 00:16:03,830][20346] Updated weights for policy 0, policy_version 1938 (0.0031) [2023-02-23 00:16:04,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3374.0). Total num frames: 7938048. Throughput: 0: 837.4. Samples: 982878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:16:04,560][05631] Avg episode reward: [(0, '4.642')] [2023-02-23 00:16:04,574][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001938_7938048.pth... [2023-02-23 00:16:04,764][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001740_7127040.pth [2023-02-23 00:16:09,553][05631] Fps is (10 sec: 2866.5, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 7950336. Throughput: 0: 829.3. Samples: 986776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:16:09,557][05631] Avg episode reward: [(0, '4.710')] [2023-02-23 00:16:14,554][05631] Fps is (10 sec: 2866.0, 60 sec: 3276.6, 300 sec: 3373.9). Total num frames: 7966720. Throughput: 0: 828.7. Samples: 988794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:16:14,559][05631] Avg episode reward: [(0, '4.632')] [2023-02-23 00:16:16,622][20346] Updated weights for policy 0, policy_version 1948 (0.0023) [2023-02-23 00:16:19,550][05631] Fps is (10 sec: 4097.0, 60 sec: 3345.2, 300 sec: 3387.9). Total num frames: 7991296. Throughput: 0: 850.7. Samples: 994890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:16:19,552][05631] Avg episode reward: [(0, '4.715')] [2023-02-23 00:16:24,550][05631] Fps is (10 sec: 3688.0, 60 sec: 3345.1, 300 sec: 3360.1). 
Total num frames: 8003584. Throughput: 0: 834.4. Samples: 1000512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:16:24,553][05631] Avg episode reward: [(0, '4.703')] [2023-02-23 00:16:29,047][20346] Updated weights for policy 0, policy_version 1958 (0.0012) [2023-02-23 00:16:29,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3360.1). Total num frames: 8019968. Throughput: 0: 824.0. Samples: 1002508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:16:29,557][05631] Avg episode reward: [(0, '4.941')] [2023-02-23 00:16:34,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 8036352. Throughput: 0: 825.8. Samples: 1006550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:16:34,561][05631] Avg episode reward: [(0, '4.926')] [2023-02-23 00:16:39,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 8056832. Throughput: 0: 852.5. Samples: 1012604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:16:39,553][05631] Avg episode reward: [(0, '4.666')] [2023-02-23 00:16:40,407][20346] Updated weights for policy 0, policy_version 1968 (0.0013) [2023-02-23 00:16:44,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 8073216. Throughput: 0: 854.3. Samples: 1015750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:16:44,554][05631] Avg episode reward: [(0, '4.445')] [2023-02-23 00:16:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 8085504. Throughput: 0: 831.5. Samples: 1020296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:16:49,553][05631] Avg episode reward: [(0, '4.485')] [2023-02-23 00:16:54,184][20346] Updated weights for policy 0, policy_version 1978 (0.0016) [2023-02-23 00:16:54,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 8101888. Throughput: 0: 832.3. Samples: 1024228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:16:54,553][05631] Avg episode reward: [(0, '4.399')] [2023-02-23 00:16:59,552][05631] Fps is (10 sec: 3685.7, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 8122368. Throughput: 0: 856.1. Samples: 1027318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:16:59,561][05631] Avg episode reward: [(0, '4.410')] [2023-02-23 00:17:03,955][20346] Updated weights for policy 0, policy_version 1988 (0.0013) [2023-02-23 00:17:04,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 8142848. Throughput: 0: 860.2. Samples: 1033598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:17:04,555][05631] Avg episode reward: [(0, '4.532')] [2023-02-23 00:17:09,550][05631] Fps is (10 sec: 3277.4, 60 sec: 3413.5, 300 sec: 3360.1). Total num frames: 8155136. Throughput: 0: 835.7. Samples: 1038118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:17:09,556][05631] Avg episode reward: [(0, '4.676')] [2023-02-23 00:17:14,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.6, 300 sec: 3374.0). Total num frames: 8171520. Throughput: 0: 836.6. Samples: 1040156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:17:14,555][05631] Avg episode reward: [(0, '4.729')] [2023-02-23 00:17:17,513][20346] Updated weights for policy 0, policy_version 1998 (0.0052) [2023-02-23 00:17:19,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 8192000. Throughput: 0: 859.5. Samples: 1045228. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:17:19,552][05631] Avg episode reward: [(0, '4.773')] [2023-02-23 00:17:24,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 8212480. Throughput: 0: 861.3. Samples: 1051362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:17:24,552][05631] Avg episode reward: [(0, '4.655')] [2023-02-23 00:17:28,833][20346] Updated weights for policy 0, policy_version 2008 (0.0012) [2023-02-23 00:17:29,552][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 8224768. Throughput: 0: 845.8. Samples: 1053810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:17:29,555][05631] Avg episode reward: [(0, '4.536')] [2023-02-23 00:17:34,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 8237056. Throughput: 0: 833.5. Samples: 1057802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:17:34,554][05631] Avg episode reward: [(0, '4.410')] [2023-02-23 00:17:39,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 8257536. Throughput: 0: 860.0. Samples: 1062930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:17:39,553][05631] Avg episode reward: [(0, '4.517')] [2023-02-23 00:17:41,098][20346] Updated weights for policy 0, policy_version 2018 (0.0015) [2023-02-23 00:17:44,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 8278016. Throughput: 0: 861.8. Samples: 1066096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:17:44,559][05631] Avg episode reward: [(0, '4.732')] [2023-02-23 00:17:49,551][05631] Fps is (10 sec: 3685.9, 60 sec: 3481.5, 300 sec: 3374.0). Total num frames: 8294400. Throughput: 0: 844.7. Samples: 1071612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:17:49,553][05631] Avg episode reward: [(0, '4.768')] [2023-02-23 00:17:53,748][20346] Updated weights for policy 0, policy_version 2028 (0.0012) [2023-02-23 00:17:54,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.1). Total num frames: 8306688. Throughput: 0: 833.7. Samples: 1075634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:17:54,558][05631] Avg episode reward: [(0, '4.772')] [2023-02-23 00:17:59,550][05631] Fps is (10 sec: 2867.6, 60 sec: 3345.2, 300 sec: 3374.0). Total num frames: 8323072. Throughput: 0: 834.8. Samples: 1077724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:17:59,553][05631] Avg episode reward: [(0, '4.700')] [2023-02-23 00:18:04,480][20346] Updated weights for policy 0, policy_version 2038 (0.0016) [2023-02-23 00:18:04,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 8347648. Throughput: 0: 863.8. Samples: 1084098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:18:04,558][05631] Avg episode reward: [(0, '4.493')] [2023-02-23 00:18:04,571][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002038_8347648.pth... [2023-02-23 00:18:04,734][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001839_7532544.pth [2023-02-23 00:18:09,552][05631] Fps is (10 sec: 4095.1, 60 sec: 3481.5, 300 sec: 3374.0). Total num frames: 8364032. Throughput: 0: 850.4. Samples: 1089630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:18:09,559][05631] Avg episode reward: [(0, '4.641')] [2023-02-23 00:18:14,551][05631] Fps is (10 sec: 2866.8, 60 sec: 3413.3, 300 sec: 3374.0). 
Total num frames: 8376320. Throughput: 0: 841.6. Samples: 1091684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:18:14,563][05631] Avg episode reward: [(0, '4.670')] [2023-02-23 00:18:17,976][20346] Updated weights for policy 0, policy_version 2048 (0.0033) [2023-02-23 00:18:19,550][05631] Fps is (10 sec: 2867.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 8392704. Throughput: 0: 844.2. Samples: 1095792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:18:19,558][05631] Avg episode reward: [(0, '4.470')] [2023-02-23 00:18:24,550][05631] Fps is (10 sec: 3686.9, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 8413184. Throughput: 0: 869.2. Samples: 1102042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:18:24,552][05631] Avg episode reward: [(0, '4.387')] [2023-02-23 00:18:27,863][20346] Updated weights for policy 0, policy_version 2058 (0.0012) [2023-02-23 00:18:29,552][05631] Fps is (10 sec: 4095.4, 60 sec: 3481.5, 300 sec: 3387.9). Total num frames: 8433664. Throughput: 0: 868.7. Samples: 1105190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:18:29,563][05631] Avg episode reward: [(0, '4.624')] [2023-02-23 00:18:34,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 8445952. Throughput: 0: 844.7. Samples: 1109622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:18:34,556][05631] Avg episode reward: [(0, '4.728')] [2023-02-23 00:18:39,550][05631] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 8462336. Throughput: 0: 848.4. Samples: 1113812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:18:39,557][05631] Avg episode reward: [(0, '4.679')] [2023-02-23 00:18:41,498][20346] Updated weights for policy 0, policy_version 2068 (0.0022) [2023-02-23 00:18:44,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 8482816. Throughput: 0: 868.7. Samples: 1116814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:18:44,553][05631] Avg episode reward: [(0, '4.642')] [2023-02-23 00:18:49,551][05631] Fps is (10 sec: 3686.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 8499200. Throughput: 0: 864.7. Samples: 1123010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:18:49,558][05631] Avg episode reward: [(0, '4.780')] [2023-02-23 00:18:53,247][20346] Updated weights for policy 0, policy_version 2078 (0.0019) [2023-02-23 00:18:54,553][05631] Fps is (10 sec: 2866.4, 60 sec: 3413.2, 300 sec: 3360.1). Total num frames: 8511488. Throughput: 0: 835.7. Samples: 1127236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:18:54,561][05631] Avg episode reward: [(0, '4.694')] [2023-02-23 00:18:59,550][05631] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 8527872. Throughput: 0: 833.0. Samples: 1129168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:18:59,557][05631] Avg episode reward: [(0, '4.570')] [2023-02-23 00:19:04,550][05631] Fps is (10 sec: 3687.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 8548352. Throughput: 0: 862.8. Samples: 1134618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:19:04,556][05631] Avg episode reward: [(0, '4.572')] [2023-02-23 00:19:05,215][20346] Updated weights for policy 0, policy_version 2088 (0.0025) [2023-02-23 00:19:09,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 8568832. Throughput: 0: 867.5. Samples: 1141080. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:19:09,553][05631] Avg episode reward: [(0, '4.726')] [2023-02-23 00:19:14,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 8581120. Throughput: 0: 846.0. Samples: 1143260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:19:14,554][05631] Avg episode reward: [(0, '4.600')] [2023-02-23 00:19:17,903][20346] Updated weights for policy 0, policy_version 2098 (0.0033) [2023-02-23 00:19:19,552][05631] Fps is (10 sec: 2866.7, 60 sec: 3413.2, 300 sec: 3374.0). Total num frames: 8597504. Throughput: 0: 838.9. Samples: 1147372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:19:19,555][05631] Avg episode reward: [(0, '4.735')] [2023-02-23 00:19:24,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 8613888. Throughput: 0: 866.7. Samples: 1152814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:19:24,553][05631] Avg episode reward: [(0, '4.916')] [2023-02-23 00:19:28,713][20346] Updated weights for policy 0, policy_version 2108 (0.0019) [2023-02-23 00:19:29,550][05631] Fps is (10 sec: 3687.1, 60 sec: 3345.2, 300 sec: 3387.9). Total num frames: 8634368. Throughput: 0: 864.3. Samples: 1155706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:19:29,552][05631] Avg episode reward: [(0, '4.967')] [2023-02-23 00:19:34,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 8650752. Throughput: 0: 838.4. Samples: 1160736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:19:34,553][05631] Avg episode reward: [(0, '4.814')] [2023-02-23 00:19:39,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 8663040. Throughput: 0: 834.9. Samples: 1164804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:19:39,554][05631] Avg episode reward: [(0, '4.634')] [2023-02-23 00:19:42,131][20346] Updated weights for policy 0, policy_version 2118 (0.0013) [2023-02-23 00:19:44,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 8683520. Throughput: 0: 849.1. Samples: 1167376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:19:44,552][05631] Avg episode reward: [(0, '4.521')] [2023-02-23 00:19:49,550][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 8704000. Throughput: 0: 870.7. Samples: 1173800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:19:49,553][05631] Avg episode reward: [(0, '4.511')] [2023-02-23 00:19:52,691][20346] Updated weights for policy 0, policy_version 2128 (0.0016) [2023-02-23 00:19:54,551][05631] Fps is (10 sec: 3685.9, 60 sec: 3481.7, 300 sec: 3387.9). Total num frames: 8720384. Throughput: 0: 838.5. Samples: 1178812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:19:54,554][05631] Avg episode reward: [(0, '4.582')] [2023-02-23 00:19:59,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 8732672. Throughput: 0: 835.2. Samples: 1180844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:19:59,559][05631] Avg episode reward: [(0, '4.830')] [2023-02-23 00:20:04,551][05631] Fps is (10 sec: 3277.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 8753152. Throughput: 0: 844.2. Samples: 1185360. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:20:04,554][05631] Avg episode reward: [(0, '4.874')] [2023-02-23 00:20:04,570][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002137_8753152.pth... [2023-02-23 00:20:04,717][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001938_7938048.pth [2023-02-23 00:20:05,622][20346] Updated weights for policy 0, policy_version 2138 (0.0018) [2023-02-23 00:20:09,550][05631] Fps is (10 sec: 4095.9, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 8773632. Throughput: 0: 861.7. Samples: 1191590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:20:09,553][05631] Avg episode reward: [(0, '4.805')] [2023-02-23 00:20:14,556][05631] Fps is (10 sec: 3684.4, 60 sec: 3481.2, 300 sec: 3387.8). Total num frames: 8790016. Throughput: 0: 868.0. Samples: 1194770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:20:14,561][05631] Avg episode reward: [(0, '4.688')] [2023-02-23 00:20:17,543][20346] Updated weights for policy 0, policy_version 2148 (0.0026) [2023-02-23 00:20:19,550][05631] Fps is (10 sec: 2867.3, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 8802304. Throughput: 0: 844.8. Samples: 1198754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:20:19,552][05631] Avg episode reward: [(0, '4.813')] [2023-02-23 00:20:24,550][05631] Fps is (10 sec: 2869.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 8818688. Throughput: 0: 855.7. Samples: 1203312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:20:24,560][05631] Avg episode reward: [(0, '4.604')] [2023-02-23 00:20:28,842][20346] Updated weights for policy 0, policy_version 2158 (0.0013) [2023-02-23 00:20:29,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 8839168. Throughput: 0: 866.3. Samples: 1206358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:20:29,561][05631] Avg episode reward: [(0, '4.641')] [2023-02-23 00:20:34,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 8855552. Throughput: 0: 866.2. Samples: 1212778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:20:34,553][05631] Avg episode reward: [(0, '4.731')] [2023-02-23 00:20:39,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 8871936. Throughput: 0: 845.0. Samples: 1216834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:20:39,557][05631] Avg episode reward: [(0, '4.659')] [2023-02-23 00:20:42,116][20346] Updated weights for policy 0, policy_version 2168 (0.0028) [2023-02-23 00:20:44,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 8888320. Throughput: 0: 844.7. Samples: 1218854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:20:44,553][05631] Avg episode reward: [(0, '4.657')] [2023-02-23 00:20:49,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8908800. Throughput: 0: 871.8. Samples: 1224590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:20:49,553][05631] Avg episode reward: [(0, '4.884')] [2023-02-23 00:20:52,075][20346] Updated weights for policy 0, policy_version 2178 (0.0013) [2023-02-23 00:20:54,552][05631] Fps is (10 sec: 4095.3, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 8929280. Throughput: 0: 871.2. Samples: 1230794. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:20:54,565][05631] Avg episode reward: [(0, '4.982')] [2023-02-23 00:20:59,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 8941568. Throughput: 0: 842.9. Samples: 1232694. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:20:59,558][05631] Avg episode reward: [(0, '4.819')] [2023-02-23 00:21:04,550][05631] Fps is (10 sec: 2458.0, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 8953856. Throughput: 0: 843.6. Samples: 1236718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:21:04,553][05631] Avg episode reward: [(0, '4.813')] [2023-02-23 00:21:05,718][20346] Updated weights for policy 0, policy_version 2188 (0.0013) [2023-02-23 00:21:09,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 8974336. Throughput: 0: 872.8. Samples: 1242586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:21:09,553][05631] Avg episode reward: [(0, '4.942')] [2023-02-23 00:21:14,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.7, 300 sec: 3401.8). Total num frames: 8994816. Throughput: 0: 873.4. Samples: 1245660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:21:14,555][05631] Avg episode reward: [(0, '4.893')] [2023-02-23 00:21:16,942][20346] Updated weights for policy 0, policy_version 2198 (0.0022) [2023-02-23 00:21:19,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9007104. Throughput: 0: 837.9. Samples: 1250484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:21:19,556][05631] Avg episode reward: [(0, '4.819')] [2023-02-23 00:21:24,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9023488. Throughput: 0: 837.3. Samples: 1254512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:21:24,559][05631] Avg episode reward: [(0, '4.550')] [2023-02-23 00:21:29,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9039872. Throughput: 0: 849.7. Samples: 1257090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:21:29,559][05631] Avg episode reward: [(0, '4.495')] [2023-02-23 00:21:29,739][20346] Updated weights for policy 0, policy_version 2208 (0.0031) [2023-02-23 00:21:34,552][05631] Fps is (10 sec: 4095.1, 60 sec: 3481.5, 300 sec: 3415.6). Total num frames: 9064448. Throughput: 0: 858.6. Samples: 1263230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:21:34,555][05631] Avg episode reward: [(0, '4.616')] [2023-02-23 00:21:39,551][05631] Fps is (10 sec: 3686.1, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9076736. Throughput: 0: 827.4. Samples: 1268026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:21:39,558][05631] Avg episode reward: [(0, '4.694')] [2023-02-23 00:21:42,122][20346] Updated weights for policy 0, policy_version 2218 (0.0026) [2023-02-23 00:21:44,550][05631] Fps is (10 sec: 2458.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9089024. Throughput: 0: 829.0. Samples: 1269998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:21:44,553][05631] Avg episode reward: [(0, '4.667')] [2023-02-23 00:21:49,550][05631] Fps is (10 sec: 3277.1, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 9109504. Throughput: 0: 844.4. Samples: 1274718. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:21:49,553][05631] Avg episode reward: [(0, '5.077')] [2023-02-23 00:21:53,036][20346] Updated weights for policy 0, policy_version 2228 (0.0016) [2023-02-23 00:21:54,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3415.7). Total num frames: 9129984. Throughput: 0: 857.6. Samples: 1281180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:21:54,559][05631] Avg episode reward: [(0, '4.911')] [2023-02-23 00:21:59,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3401.8). Total num frames: 9146368. Throughput: 0: 849.3. Samples: 1283880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:21:59,554][05631] Avg episode reward: [(0, '4.898')] [2023-02-23 00:22:04,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9158656. Throughput: 0: 831.3. Samples: 1287892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:22:04,552][05631] Avg episode reward: [(0, '4.683')] [2023-02-23 00:22:04,571][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002236_9158656.pth... [2023-02-23 00:22:04,835][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002038_8347648.pth [2023-02-23 00:22:06,923][20346] Updated weights for policy 0, policy_version 2238 (0.0021) [2023-02-23 00:22:09,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9175040. Throughput: 0: 850.0. Samples: 1292764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:22:09,553][05631] Avg episode reward: [(0, '4.525')] [2023-02-23 00:22:14,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9195520. Throughput: 0: 862.9. Samples: 1295922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:22:14,553][05631] Avg episode reward: [(0, '4.572')] [2023-02-23 00:22:16,447][20346] Updated weights for policy 0, policy_version 2248 (0.0019) [2023-02-23 00:22:19,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 9216000. Throughput: 0: 859.2. Samples: 1301890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:22:19,560][05631] Avg episode reward: [(0, '4.762')] [2023-02-23 00:22:24,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9228288. Throughput: 0: 841.7. Samples: 1305904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:22:24,562][05631] Avg episode reward: [(0, '4.788')] [2023-02-23 00:22:29,550][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 9244672. Throughput: 0: 841.0. Samples: 1307844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:22:29,554][05631] Avg episode reward: [(0, '5.008')] [2023-02-23 00:22:30,279][20346] Updated weights for policy 0, policy_version 2258 (0.0012) [2023-02-23 00:22:34,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3415.6). Total num frames: 9265152. Throughput: 0: 866.8. Samples: 1313726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:22:34,553][05631] Avg episode reward: [(0, '4.812')] [2023-02-23 00:22:39,550][05631] Fps is (10 sec: 3686.5, 60 sec: 3413.4, 300 sec: 3401.8). Total num frames: 9281536. Throughput: 0: 849.6. Samples: 1319414. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:22:39,553][05631] Avg episode reward: [(0, '4.714')] [2023-02-23 00:22:41,736][20346] Updated weights for policy 0, policy_version 2268 (0.0013) [2023-02-23 00:22:44,550][05631] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 9293824. Throughput: 0: 834.3. Samples: 1321422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:22:44,556][05631] Avg episode reward: [(0, '4.571')] [2023-02-23 00:22:49,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9310208. Throughput: 0: 836.0. Samples: 1325510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:22:49,553][05631] Avg episode reward: [(0, '4.592')] [2023-02-23 00:22:53,824][20346] Updated weights for policy 0, policy_version 2278 (0.0023) [2023-02-23 00:22:54,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 9330688. Throughput: 0: 863.7. Samples: 1331630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:22:54,553][05631] Avg episode reward: [(0, '4.844')] [2023-02-23 00:22:59,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9351168. Throughput: 0: 861.4. Samples: 1334686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:22:59,552][05631] Avg episode reward: [(0, '4.676')] [2023-02-23 00:23:04,552][05631] Fps is (10 sec: 3276.2, 60 sec: 3413.2, 300 sec: 3387.9). Total num frames: 9363456. Throughput: 0: 829.9. Samples: 1339238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:23:04,560][05631] Avg episode reward: [(0, '4.641')] [2023-02-23 00:23:06,742][20346] Updated weights for policy 0, policy_version 2288 (0.0013) [2023-02-23 00:23:09,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 9375744. Throughput: 0: 830.7. Samples: 1343284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:23:09,555][05631] Avg episode reward: [(0, '4.677')] [2023-02-23 00:23:14,550][05631] Fps is (10 sec: 3277.4, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9396224. Throughput: 0: 851.6. Samples: 1346164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:23:14,552][05631] Avg episode reward: [(0, '4.701')] [2023-02-23 00:23:17,597][20346] Updated weights for policy 0, policy_version 2298 (0.0014) [2023-02-23 00:23:19,554][05631] Fps is (10 sec: 4094.3, 60 sec: 3344.8, 300 sec: 3401.7). Total num frames: 9416704. Throughput: 0: 856.9. Samples: 1352288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:23:19,557][05631] Avg episode reward: [(0, '4.431')] [2023-02-23 00:23:24,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 9433088. Throughput: 0: 829.8. Samples: 1356756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:23:24,560][05631] Avg episode reward: [(0, '4.568')] [2023-02-23 00:23:29,550][05631] Fps is (10 sec: 2868.4, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 9445376. Throughput: 0: 828.3. Samples: 1358694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:23:29,556][05631] Avg episode reward: [(0, '4.460')] [2023-02-23 00:23:31,384][20346] Updated weights for policy 0, policy_version 2308 (0.0015) [2023-02-23 00:23:34,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9465856. Throughput: 0: 849.6. Samples: 1363740. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:23:34,552][05631] Avg episode reward: [(0, '4.759')] [2023-02-23 00:23:39,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9486336. Throughput: 0: 851.6. Samples: 1369950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:23:39,553][05631] Avg episode reward: [(0, '4.737')] [2023-02-23 00:23:41,990][20346] Updated weights for policy 0, policy_version 2318 (0.0014) [2023-02-23 00:23:44,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 9498624. Throughput: 0: 840.6. Samples: 1372514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:23:44,556][05631] Avg episode reward: [(0, '4.778')] [2023-02-23 00:23:49,551][05631] Fps is (10 sec: 2457.3, 60 sec: 3345.0, 300 sec: 3387.9). Total num frames: 9510912. Throughput: 0: 826.9. Samples: 1376450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:23:49,554][05631] Avg episode reward: [(0, '4.816')] [2023-02-23 00:23:54,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9531392. Throughput: 0: 847.1. Samples: 1381404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:23:54,553][05631] Avg episode reward: [(0, '4.618')] [2023-02-23 00:23:55,309][20346] Updated weights for policy 0, policy_version 2328 (0.0020) [2023-02-23 00:23:59,550][05631] Fps is (10 sec: 4096.4, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9551872. Throughput: 0: 849.2. Samples: 1384378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:23:59,559][05631] Avg episode reward: [(0, '4.606')] [2023-02-23 00:24:04,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 9564160. Throughput: 0: 835.8. Samples: 1389894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:24:04,559][05631] Avg episode reward: [(0, '4.633')] [2023-02-23 00:24:04,574][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002335_9564160.pth... [2023-02-23 00:24:04,823][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002137_8753152.pth [2023-02-23 00:24:07,895][20346] Updated weights for policy 0, policy_version 2338 (0.0018) [2023-02-23 00:24:09,552][05631] Fps is (10 sec: 2866.6, 60 sec: 3413.2, 300 sec: 3387.9). Total num frames: 9580544. Throughput: 0: 821.5. Samples: 1393726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:24:09,558][05631] Avg episode reward: [(0, '4.744')] [2023-02-23 00:24:14,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 9596928. Throughput: 0: 822.6. Samples: 1395712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 00:24:14,552][05631] Avg episode reward: [(0, '4.597')] [2023-02-23 00:24:19,250][20346] Updated weights for policy 0, policy_version 2348 (0.0013) [2023-02-23 00:24:19,550][05631] Fps is (10 sec: 3687.3, 60 sec: 3345.3, 300 sec: 3401.8). Total num frames: 9617408. Throughput: 0: 846.7. Samples: 1401842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:24:19,553][05631] Avg episode reward: [(0, '4.468')] [2023-02-23 00:24:24,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 9633792. Throughput: 0: 834.8. Samples: 1407516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:24:24,553][05631] Avg episode reward: [(0, '4.419')] [2023-02-23 00:24:29,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3374.0). 
Total num frames: 9646080. Throughput: 0: 820.8. Samples: 1409448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:24:29,560][05631] Avg episode reward: [(0, '4.531')] [2023-02-23 00:24:33,175][20346] Updated weights for policy 0, policy_version 2358 (0.0029) [2023-02-23 00:24:34,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 9662464. Throughput: 0: 818.9. Samples: 1413300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:24:34,553][05631] Avg episode reward: [(0, '4.514')] [2023-02-23 00:24:39,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 9682944. Throughput: 0: 842.6. Samples: 1419320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:24:39,563][05631] Avg episode reward: [(0, '4.534')] [2023-02-23 00:24:42,964][20346] Updated weights for policy 0, policy_version 2368 (0.0024) [2023-02-23 00:24:44,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 9703424. Throughput: 0: 846.8. Samples: 1422482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:24:44,558][05631] Avg episode reward: [(0, '4.342')] [2023-02-23 00:24:49,550][05631] Fps is (10 sec: 3276.7, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 9715712. Throughput: 0: 826.3. Samples: 1427076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:24:49,558][05631] Avg episode reward: [(0, '4.595')] [2023-02-23 00:24:54,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 9728000. Throughput: 0: 830.2. Samples: 1431084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:24:54,553][05631] Avg episode reward: [(0, '4.671')] [2023-02-23 00:24:56,694][20346] Updated weights for policy 0, policy_version 2378 (0.0020) [2023-02-23 00:24:59,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 9748480. Throughput: 0: 855.1. Samples: 1434192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:24:59,564][05631] Avg episode reward: [(0, '4.548')] [2023-02-23 00:25:04,550][05631] Fps is (10 sec: 4095.9, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 9768960. Throughput: 0: 856.5. Samples: 1440386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:25:04,564][05631] Avg episode reward: [(0, '4.483')] [2023-02-23 00:25:07,949][20346] Updated weights for policy 0, policy_version 2388 (0.0015) [2023-02-23 00:25:09,559][05631] Fps is (10 sec: 3683.1, 60 sec: 3413.0, 300 sec: 3374.0). Total num frames: 9785344. Throughput: 0: 831.5. Samples: 1444942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:25:09,566][05631] Avg episode reward: [(0, '4.498')] [2023-02-23 00:25:14,552][05631] Fps is (10 sec: 2866.6, 60 sec: 3344.9, 300 sec: 3374.0). Total num frames: 9797632. Throughput: 0: 835.3. Samples: 1447038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 00:25:14,556][05631] Avg episode reward: [(0, '4.677')] [2023-02-23 00:25:19,550][05631] Fps is (10 sec: 3279.5, 60 sec: 3345.0, 300 sec: 3387.9). Total num frames: 9818112. Throughput: 0: 860.7. Samples: 1452032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:25:19,556][05631] Avg episode reward: [(0, '4.564')] [2023-02-23 00:25:20,274][20346] Updated weights for policy 0, policy_version 2398 (0.0013) [2023-02-23 00:25:24,550][05631] Fps is (10 sec: 4096.9, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 9838592. Throughput: 0: 863.4. Samples: 1458174. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:25:24,553][05631] Avg episode reward: [(0, '4.700')] [2023-02-23 00:25:29,550][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 9850880. Throughput: 0: 848.2. Samples: 1460650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:25:29,561][05631] Avg episode reward: [(0, '4.704')] [2023-02-23 00:25:33,325][20346] Updated weights for policy 0, policy_version 2408 (0.0019) [2023-02-23 00:25:34,550][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 9863168. Throughput: 0: 832.0. Samples: 1464514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:25:34,555][05631] Avg episode reward: [(0, '4.750')] [2023-02-23 00:25:39,550][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 9883648. Throughput: 0: 852.1. Samples: 1469430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:25:39,552][05631] Avg episode reward: [(0, '4.673')] [2023-02-23 00:25:44,169][20346] Updated weights for policy 0, policy_version 2418 (0.0020) [2023-02-23 00:25:44,550][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 9904128. Throughput: 0: 852.4. Samples: 1472552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:25:44,559][05631] Avg episode reward: [(0, '4.552')] [2023-02-23 00:25:49,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3360.1). Total num frames: 9920512. Throughput: 0: 835.5. Samples: 1477984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:25:49,556][05631] Avg episode reward: [(0, '4.541')] [2023-02-23 00:25:54,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 9932800. Throughput: 0: 820.6. Samples: 1481860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:25:54,559][05631] Avg episode reward: [(0, '4.661')] [2023-02-23 00:25:58,327][20346] Updated weights for policy 0, policy_version 2428 (0.0030) [2023-02-23 00:25:59,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 9949184. Throughput: 0: 819.3. Samples: 1483906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:25:59,553][05631] Avg episode reward: [(0, '4.710')] [2023-02-23 00:26:04,550][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 9969664. Throughput: 0: 840.1. Samples: 1489838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 00:26:04,553][05631] Avg episode reward: [(0, '4.772')] [2023-02-23 00:26:04,574][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002434_9969664.pth... [2023-02-23 00:26:04,701][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002236_9158656.pth [2023-02-23 00:26:08,983][20346] Updated weights for policy 0, policy_version 2438 (0.0020) [2023-02-23 00:26:09,550][05631] Fps is (10 sec: 3686.3, 60 sec: 3345.6, 300 sec: 3360.1). Total num frames: 9986048. Throughput: 0: 827.5. Samples: 1495410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:26:09,554][05631] Avg episode reward: [(0, '4.747')] [2023-02-23 00:26:14,550][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.2, 300 sec: 3360.1). Total num frames: 9998336. Throughput: 0: 816.5. Samples: 1497392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:26:14,559][05631] Avg episode reward: [(0, '4.726')] [2023-02-23 00:26:16,747][20332] Stopping Batcher_0... 
[2023-02-23 00:26:16,748][20332] Loop batcher_evt_loop terminating... [2023-02-23 00:26:16,748][05631] Component Batcher_0 stopped! [2023-02-23 00:26:16,760][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-23 00:26:16,819][20346] Weights refcount: 2 0 [2023-02-23 00:26:16,824][05631] Component InferenceWorker_p0-w0 stopped! [2023-02-23 00:26:16,830][20346] Stopping InferenceWorker_p0-w0... [2023-02-23 00:26:16,832][20346] Loop inference_proc0-0_evt_loop terminating... [2023-02-23 00:26:16,918][20369] Stopping RolloutWorker_w6... [2023-02-23 00:26:16,919][20369] Loop rollout_proc6_evt_loop terminating... [2023-02-23 00:26:16,920][05631] Component RolloutWorker_w6 stopped! [2023-02-23 00:26:16,938][20350] Stopping RolloutWorker_w0... [2023-02-23 00:26:16,939][20350] Loop rollout_proc0_evt_loop terminating... [2023-02-23 00:26:16,938][05631] Component RolloutWorker_w0 stopped! [2023-02-23 00:26:16,955][20359] Stopping RolloutWorker_w4... [2023-02-23 00:26:16,955][05631] Component RolloutWorker_w4 stopped! [2023-02-23 00:26:16,963][20348] Stopping RolloutWorker_w2... [2023-02-23 00:26:16,957][20359] Loop rollout_proc4_evt_loop terminating... [2023-02-23 00:26:16,964][20348] Loop rollout_proc2_evt_loop terminating... [2023-02-23 00:26:16,963][05631] Component RolloutWorker_w2 stopped! [2023-02-23 00:26:17,003][05631] Component RolloutWorker_w3 stopped! [2023-02-23 00:26:17,007][20357] Stopping RolloutWorker_w3... [2023-02-23 00:26:17,013][05631] Component RolloutWorker_w5 stopped! [2023-02-23 00:26:17,017][20361] Stopping RolloutWorker_w5... [2023-02-23 00:26:17,038][20357] Loop rollout_proc3_evt_loop terminating... [2023-02-23 00:26:17,018][20361] Loop rollout_proc5_evt_loop terminating... [2023-02-23 00:26:17,059][05631] Component RolloutWorker_w7 stopped! [2023-02-23 00:26:17,065][20367] Stopping RolloutWorker_w7... [2023-02-23 00:26:17,066][20367] Loop rollout_proc7_evt_loop terminating... [2023-02-23 00:26:17,078][05631] Component RolloutWorker_w1 stopped! [2023-02-23 00:26:17,084][20347] Stopping RolloutWorker_w1... [2023-02-23 00:26:17,092][20332] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002335_9564160.pth [2023-02-23 00:26:17,108][20347] Loop rollout_proc1_evt_loop terminating... [2023-02-23 00:26:17,125][20332] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-23 00:26:17,423][20332] Stopping LearnerWorker_p0... [2023-02-23 00:26:17,423][05631] Component LearnerWorker_p0 stopped! [2023-02-23 00:26:17,425][05631] Waiting for process learner_proc0 to stop... [2023-02-23 00:26:17,438][20332] Loop learner_proc0_evt_loop terminating... [2023-02-23 00:26:19,855][05631] Waiting for process inference_proc0-0 to join... [2023-02-23 00:26:19,950][05631] Waiting for process rollout_proc0 to join... [2023-02-23 00:26:19,953][05631] Waiting for process rollout_proc1 to join... [2023-02-23 00:26:20,088][05631] Waiting for process rollout_proc2 to join... [2023-02-23 00:26:20,091][05631] Waiting for process rollout_proc3 to join... [2023-02-23 00:26:20,092][05631] Waiting for process rollout_proc4 to join... [2023-02-23 00:26:20,093][05631] Waiting for process rollout_proc5 to join... [2023-02-23 00:26:20,095][05631] Waiting for process rollout_proc6 to join... [2023-02-23 00:26:20,097][05631] Waiting for process rollout_proc7 to join... 
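The Saving/Removing pairs above are the learner's checkpoint rotation: with keep_checkpoints=2 (visible in the configuration dump later in this log), only the two newest checkpoint files survive each save. A minimal sketch of that policy, assuming nothing beyond the file names the log shows; this is illustrative, not Sample Factory's actual implementation:

from pathlib import Path

def rotate_checkpoints(checkpoint_dir: str, keep: int = 2) -> None:
    # checkpoint_000002443_10006528.pth-style names zero-pad the policy
    # version, so a lexicographic sort puts the oldest checkpoints first.
    checkpoints = sorted(Path(checkpoint_dir).glob("checkpoint_*.pth"))
    for stale in checkpoints[:-keep]:
        print(f"Removing {stale}")
        stale.unlink()

rotate_checkpoints("/content/train_dir/default_experiment/checkpoint_p0")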
[2023-02-23 00:26:20,099][05631] Batcher 0 profile tree view:
batching: 41.0052, releasing_batches: 0.0383
[2023-02-23 00:26:20,101][05631] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0045
  wait_policy_total: 873.6751
update_model: 11.9382
  weight_update: 0.0014
one_step: 0.0174
  handle_policy_step: 837.9301
    deserialize: 23.5857, stack: 4.7781, obs_to_device_normalize: 181.7927, forward: 411.5190, send_messages: 40.9359
    prepare_outputs: 133.1880
      to_cpu: 82.7740
[2023-02-23 00:26:20,103][05631] Learner 0 profile tree view:
misc: 0.0115, prepare_batch: 25.4264
train: 123.5324
  epoch_init: 0.0115, minibatch_init: 0.0210, losses_postprocess: 0.9462, kl_divergence: 0.8872, after_optimizer: 4.9524
  calculate_losses: 42.0782
    losses_init: 0.0179, forward_head: 2.8687, bptt_initial: 27.5621, tail: 1.8407, advantages_returns: 0.5740, losses: 5.1063
    bptt: 3.5571
      bptt_forward_core: 3.4021
  update: 73.4774
    clip: 2.2143
[2023-02-23 00:26:20,104][05631] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5945, enqueue_policy_requests: 246.9817, env_step: 1341.2903, overhead: 37.3601, complete_rollouts: 11.7287
save_policy_outputs: 34.9184
  split_output_tensors: 16.8515
[2023-02-23 00:26:20,105][05631] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4761, enqueue_policy_requests: 247.1433, env_step: 1345.4331, overhead: 35.6691, complete_rollouts: 10.7025
save_policy_outputs: 33.2625
  split_output_tensors: 15.9364
[2023-02-23 00:26:20,107][05631] Loop Runner_EvtLoop terminating...
[2023-02-23 00:26:20,108][05631] Runner profile tree view:
main_loop: 1813.1331
[2023-02-23 00:26:20,110][05631] Collected {0: 10006528}, FPS: 3309.5
[2023-02-23 00:26:20,166][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 00:26:20,167][05631] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-23 00:26:20,168][05631] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-23 00:26:20,171][05631] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-23 00:26:20,172][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 00:26:20,173][05631] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-23 00:26:20,175][05631] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 00:26:20,176][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-23 00:26:20,177][05631] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-23 00:26:20,178][05631] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-23 00:26:20,180][05631] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-23 00:26:20,181][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-23 00:26:20,182][05631] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-23 00:26:20,184][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-23 00:26:20,185][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 00:26:20,217][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:26:20,218][05631] RunningMeanStd input shape: (1,) [2023-02-23 00:26:20,236][05631] ConvEncoder: input_channels=3 [2023-02-23 00:26:20,286][05631] Conv encoder output size: 512 [2023-02-23 00:26:20,288][05631] Policy head output size: 512 [2023-02-23 00:26:20,316][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-23 00:26:20,867][05631] Num frames 100... [2023-02-23 00:26:21,004][05631] Num frames 200... [2023-02-23 00:26:21,127][05631] Num frames 300... [2023-02-23 00:26:21,239][05631] Num frames 400... [2023-02-23 00:26:21,314][05631] Avg episode rewards: #0: 4.160, true rewards: #0: 4.160 [2023-02-23 00:26:21,315][05631] Avg episode reward: 4.160, avg true_objective: 4.160 [2023-02-23 00:26:21,428][05631] Num frames 500... [2023-02-23 00:26:21,552][05631] Num frames 600... [2023-02-23 00:26:21,669][05631] Num frames 700... [2023-02-23 00:26:21,783][05631] Num frames 800... [2023-02-23 00:26:21,835][05631] Avg episode rewards: #0: 4.000, true rewards: #0: 4.000 [2023-02-23 00:26:21,837][05631] Avg episode reward: 4.000, avg true_objective: 4.000 [2023-02-23 00:26:21,964][05631] Num frames 900... [2023-02-23 00:26:22,098][05631] Num frames 1000... [2023-02-23 00:26:22,221][05631] Num frames 1100... [2023-02-23 00:26:22,345][05631] Num frames 1200... [2023-02-23 00:26:22,423][05631] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2023-02-23 00:26:22,424][05631] Avg episode reward: 4.387, avg true_objective: 4.053 [2023-02-23 00:26:22,527][05631] Num frames 1300... [2023-02-23 00:26:22,643][05631] Num frames 1400... [2023-02-23 00:26:22,761][05631] Num frames 1500... [2023-02-23 00:26:22,882][05631] Num frames 1600... [2023-02-23 00:26:22,934][05631] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 [2023-02-23 00:26:22,936][05631] Avg episode reward: 4.250, avg true_objective: 4.000 [2023-02-23 00:26:23,065][05631] Num frames 1700... [2023-02-23 00:26:23,188][05631] Num frames 1800... [2023-02-23 00:26:23,316][05631] Avg episode rewards: #0: 3.912, true rewards: #0: 3.712 [2023-02-23 00:26:23,319][05631] Avg episode reward: 3.912, avg true_objective: 3.712 [2023-02-23 00:26:23,374][05631] Num frames 1900... [2023-02-23 00:26:23,493][05631] Num frames 2000... [2023-02-23 00:26:23,611][05631] Num frames 2100... [2023-02-23 00:26:23,683][05631] Avg episode rewards: #0: 3.687, true rewards: #0: 3.520 [2023-02-23 00:26:23,684][05631] Avg episode reward: 3.687, avg true_objective: 3.520 [2023-02-23 00:26:23,791][05631] Num frames 2200... [2023-02-23 00:26:23,914][05631] Num frames 2300... [2023-02-23 00:26:24,032][05631] Num frames 2400... [2023-02-23 00:26:24,151][05631] Num frames 2500... [2023-02-23 00:26:24,239][05631] Avg episode rewards: #0: 3.897, true rewards: #0: 3.611 [2023-02-23 00:26:24,241][05631] Avg episode reward: 3.897, avg true_objective: 3.611 [2023-02-23 00:26:24,340][05631] Num frames 2600... [2023-02-23 00:26:24,456][05631] Num frames 2700... [2023-02-23 00:26:24,571][05631] Num frames 2800... [2023-02-23 00:26:24,684][05631] Num frames 2900... [2023-02-23 00:26:24,752][05631] Avg episode rewards: #0: 4.261, true rewards: #0: 3.636 [2023-02-23 00:26:24,754][05631] Avg episode reward: 4.261, avg true_objective: 3.636 [2023-02-23 00:26:24,865][05631] Num frames 3000... 
[2023-02-23 00:26:24,981][05631] Num frames 3100... [2023-02-23 00:26:25,119][05631] Avg episode rewards: #0: 4.072, true rewards: #0: 3.517 [2023-02-23 00:26:25,121][05631] Avg episode reward: 4.072, avg true_objective: 3.517 [2023-02-23 00:26:25,166][05631] Num frames 3200... [2023-02-23 00:26:25,288][05631] Num frames 3300... [2023-02-23 00:26:25,411][05631] Num frames 3400... [2023-02-23 00:26:25,533][05631] Num frames 3500... [2023-02-23 00:26:25,645][05631] Avg episode rewards: #0: 4.049, true rewards: #0: 3.549 [2023-02-23 00:26:25,647][05631] Avg episode reward: 4.049, avg true_objective: 3.549 [2023-02-23 00:26:45,517][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-23 00:26:45,677][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 00:26:45,679][05631] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-23 00:26:45,682][05631] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-23 00:26:45,684][05631] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-23 00:26:45,686][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 00:26:45,688][05631] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-23 00:26:45,690][05631] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-23 00:26:45,692][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-23 00:26:45,694][05631] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-23 00:26:45,695][05631] Adding new argument 'hf_repository'='pittawat/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-23 00:26:45,696][05631] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-23 00:26:45,697][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-23 00:26:45,698][05631] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 00:26:45,699][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 00:26:45,700][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 00:26:45,734][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:26:45,737][05631] RunningMeanStd input shape: (1,) [2023-02-23 00:26:45,762][05631] ConvEncoder: input_channels=3 [2023-02-23 00:26:45,846][05631] Conv encoder output size: 512 [2023-02-23 00:26:45,848][05631] Policy head output size: 512 [2023-02-23 00:26:45,882][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-23 00:26:46,683][05631] Num frames 100... [2023-02-23 00:26:46,875][05631] Num frames 200... [2023-02-23 00:26:47,065][05631] Num frames 300... [2023-02-23 00:26:47,271][05631] Num frames 400... [2023-02-23 00:26:47,443][05631] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2023-02-23 00:26:47,445][05631] Avg episode reward: 5.480, avg true_objective: 4.480 [2023-02-23 00:26:47,554][05631] Num frames 500... [2023-02-23 00:26:47,751][05631] Num frames 600... [2023-02-23 00:26:47,950][05631] Num frames 700... [2023-02-23 00:26:48,129][05631] Num frames 800... 
[2023-02-23 00:26:48,297][05631] Avg episode rewards: #0: 4.820, true rewards: #0: 4.320 [2023-02-23 00:26:48,300][05631] Avg episode reward: 4.820, avg true_objective: 4.320 [2023-02-23 00:26:48,369][05631] Num frames 900... [2023-02-23 00:26:48,498][05631] Num frames 1000... [2023-02-23 00:26:48,614][05631] Num frames 1100... [2023-02-23 00:26:48,732][05631] Num frames 1200... [2023-02-23 00:26:48,840][05631] Avg episode rewards: #0: 4.493, true rewards: #0: 4.160 [2023-02-23 00:26:48,842][05631] Avg episode reward: 4.493, avg true_objective: 4.160 [2023-02-23 00:26:48,905][05631] Num frames 1300... [2023-02-23 00:26:49,017][05631] Num frames 1400... [2023-02-23 00:26:49,133][05631] Num frames 1500... [2023-02-23 00:26:49,244][05631] Num frames 1600... [2023-02-23 00:26:49,337][05631] Avg episode rewards: #0: 4.330, true rewards: #0: 4.080 [2023-02-23 00:26:49,339][05631] Avg episode reward: 4.330, avg true_objective: 4.080 [2023-02-23 00:26:49,426][05631] Num frames 1700... [2023-02-23 00:26:49,548][05631] Num frames 1800... [2023-02-23 00:26:49,664][05631] Num frames 1900... [2023-02-23 00:26:49,783][05631] Num frames 2000... [2023-02-23 00:26:49,858][05631] Avg episode rewards: #0: 4.232, true rewards: #0: 4.032 [2023-02-23 00:26:49,861][05631] Avg episode reward: 4.232, avg true_objective: 4.032 [2023-02-23 00:26:49,963][05631] Num frames 2100... [2023-02-23 00:26:50,090][05631] Num frames 2200... [2023-02-23 00:26:50,210][05631] Num frames 2300... [2023-02-23 00:26:50,335][05631] Num frames 2400... [2023-02-23 00:26:50,450][05631] Num frames 2500... [2023-02-23 00:26:50,577][05631] Num frames 2600... [2023-02-23 00:26:50,663][05631] Avg episode rewards: #0: 5.040, true rewards: #0: 4.373 [2023-02-23 00:26:50,665][05631] Avg episode reward: 5.040, avg true_objective: 4.373 [2023-02-23 00:26:50,758][05631] Num frames 2700... [2023-02-23 00:26:50,901][05631] Num frames 2800... [2023-02-23 00:26:51,019][05631] Num frames 2900... [2023-02-23 00:26:51,133][05631] Num frames 3000... [2023-02-23 00:26:51,199][05631] Avg episode rewards: #0: 4.869, true rewards: #0: 4.297 [2023-02-23 00:26:51,201][05631] Avg episode reward: 4.869, avg true_objective: 4.297 [2023-02-23 00:26:51,318][05631] Num frames 3100... [2023-02-23 00:26:51,442][05631] Num frames 3200... [2023-02-23 00:26:51,570][05631] Num frames 3300... [2023-02-23 00:26:51,731][05631] Avg episode rewards: #0: 4.740, true rewards: #0: 4.240 [2023-02-23 00:26:51,733][05631] Avg episode reward: 4.740, avg true_objective: 4.240 [2023-02-23 00:26:51,749][05631] Num frames 3400... [2023-02-23 00:26:51,877][05631] Num frames 3500... [2023-02-23 00:26:52,003][05631] Num frames 3600... [2023-02-23 00:26:52,125][05631] Num frames 3700... [2023-02-23 00:26:52,247][05631] Num frames 3800... [2023-02-23 00:26:52,368][05631] Num frames 3900... [2023-02-23 00:26:52,429][05631] Avg episode rewards: #0: 5.116, true rewards: #0: 4.338 [2023-02-23 00:26:52,432][05631] Avg episode reward: 5.116, avg true_objective: 4.338 [2023-02-23 00:26:52,553][05631] Num frames 4000... [2023-02-23 00:26:52,670][05631] Num frames 4100... [2023-02-23 00:26:52,789][05631] Num frames 4200... [2023-02-23 00:26:52,907][05631] Num frames 4300... [2023-02-23 00:26:52,988][05631] Avg episode rewards: #0: 5.020, true rewards: #0: 4.320 [2023-02-23 00:26:52,991][05631] Avg episode reward: 5.020, avg true_objective: 4.320 [2023-02-23 00:27:13,979][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
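This second evaluation pass (replay video saved above, Hub push confirmed just below) corresponds to Sample Factory's enjoy entry point run with the extra arguments the log lists ('no_render', 'save_video', 'push_to_hub', 'hf_repository', ...). A hedged reconstruction of that invocation; the flags are taken from the "Adding new argument" lines, while the sf_examples module path is an assumption based on Sample Factory 2.x conventions, not something this log records:

import subprocess

# Flags mirror the "Adding new argument ..." lines above; module path is assumed.
subprocess.run(
    [
        "python", "-m", "sf_examples.vizdoom.enjoy_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=pittawat/rl_course_vizdoom_health_gathering_supreme",
    ],
    check=True,
)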
[2023-02-23 00:27:19,636][05631] The model has been pushed to https://huggingface.co/pittawat/rl_course_vizdoom_health_gathering_supreme [2023-02-23 00:38:29,240][05631] Environment doom_basic already registered, overwriting... [2023-02-23 00:38:29,244][05631] Environment doom_two_colors_easy already registered, overwriting... [2023-02-23 00:38:29,246][05631] Environment doom_two_colors_hard already registered, overwriting... [2023-02-23 00:38:29,248][05631] Environment doom_dm already registered, overwriting... [2023-02-23 00:38:29,250][05631] Environment doom_dwango5 already registered, overwriting... [2023-02-23 00:38:29,251][05631] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-23 00:38:29,253][05631] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-23 00:38:29,254][05631] Environment doom_my_way_home already registered, overwriting... [2023-02-23 00:38:29,256][05631] Environment doom_deadly_corridor already registered, overwriting... [2023-02-23 00:38:29,258][05631] Environment doom_defend_the_center already registered, overwriting... [2023-02-23 00:38:29,259][05631] Environment doom_defend_the_line already registered, overwriting... [2023-02-23 00:38:29,261][05631] Environment doom_health_gathering already registered, overwriting... [2023-02-23 00:38:29,262][05631] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-23 00:38:29,263][05631] Environment doom_battle already registered, overwriting... [2023-02-23 00:38:29,265][05631] Environment doom_battle2 already registered, overwriting... [2023-02-23 00:38:29,266][05631] Environment doom_duel_bots already registered, overwriting... [2023-02-23 00:38:29,268][05631] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-23 00:38:29,269][05631] Environment doom_duel already registered, overwriting... [2023-02-23 00:38:29,271][05631] Environment doom_deathmatch_full already registered, overwriting... [2023-02-23 00:38:29,272][05631] Environment doom_benchmark already registered, overwriting... [2023-02-23 00:38:29,274][05631] register_encoder_factory: [2023-02-23 00:38:29,308][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 00:38:29,311][05631] Overriding arg 'gamma' with value 0.98 passed from command line [2023-02-23 00:38:29,317][05631] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-23 00:38:29,319][05631] Resuming existing experiment from /content/train_dir/default_experiment... 
[2023-02-23 00:38:29,322][05631] Weights and Biases integration disabled [2023-02-23 00:38:29,326][05631] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-23 00:38:32,154][05631] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.98 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=10000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-23 00:38:32,160][05631] Saving configuration to /content/train_dir/default_experiment/config.json... 
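The command_line and cli_args fields above, together with the "Overriding arg 'gamma' with value 0.98 passed from command line" line, pin down how this resume was launched: the same experiment directory is reused (restart_behavior=resume) with one hyperparameter changed on the CLI. A sketch under the same assumption as before, that the sf_examples training entry point is being used:

import subprocess

# restart_behavior=resume picks up /content/train_dir/default_experiment;
# only gamma differs from the saved config.
subprocess.run(
    [
        "python", "-m", "sf_examples.vizdoom.train_vizdoom",  # assumed entry point
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=4000000",
        "--gamma=0.98",
        "--train_dir=/content/train_dir",
    ],
    check=True,
)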
[2023-02-23 00:38:32,167][05631] Rollout worker 0 uses device cpu [2023-02-23 00:38:32,172][05631] Rollout worker 1 uses device cpu [2023-02-23 00:38:32,174][05631] Rollout worker 2 uses device cpu [2023-02-23 00:38:32,177][05631] Rollout worker 3 uses device cpu [2023-02-23 00:38:32,180][05631] Rollout worker 4 uses device cpu [2023-02-23 00:38:32,182][05631] Rollout worker 5 uses device cpu [2023-02-23 00:38:32,184][05631] Rollout worker 6 uses device cpu [2023-02-23 00:38:32,193][05631] Rollout worker 7 uses device cpu [2023-02-23 00:38:32,318][05631] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:38:32,323][05631] InferenceWorker_p0-w0: min num requests: 2 [2023-02-23 00:38:32,354][05631] Starting all processes... [2023-02-23 00:38:32,359][05631] Starting process learner_proc0 [2023-02-23 00:38:32,505][05631] Starting all processes... [2023-02-23 00:38:32,516][05631] Starting process inference_proc0-0 [2023-02-23 00:38:32,517][05631] Starting process rollout_proc0 [2023-02-23 00:38:32,521][05631] Starting process rollout_proc1 [2023-02-23 00:38:32,521][05631] Starting process rollout_proc2 [2023-02-23 00:38:32,521][05631] Starting process rollout_proc3 [2023-02-23 00:38:32,521][05631] Starting process rollout_proc4 [2023-02-23 00:38:32,521][05631] Starting process rollout_proc5 [2023-02-23 00:38:32,521][05631] Starting process rollout_proc6 [2023-02-23 00:38:32,521][05631] Starting process rollout_proc7 [2023-02-23 00:38:44,016][34379] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:38:44,017][34379] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-23 00:38:44,094][34379] Num visible devices: 1 [2023-02-23 00:38:44,127][34379] Starting seed is not provided [2023-02-23 00:38:44,127][34379] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:38:44,128][34379] Initializing actor-critic model on device cuda:0 [2023-02-23 00:38:44,129][34379] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:38:44,136][34379] RunningMeanStd input shape: (1,) [2023-02-23 00:38:44,260][34379] ConvEncoder: input_channels=3 [2023-02-23 00:38:44,635][34394] Worker 0 uses CPU cores [0] [2023-02-23 00:38:44,821][34393] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:38:44,822][34393] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-23 00:38:44,894][34393] Num visible devices: 1 [2023-02-23 00:38:45,181][34379] Conv encoder output size: 512 [2023-02-23 00:38:45,182][34379] Policy head output size: 512 [2023-02-23 00:38:45,290][34379] Created Actor Critic model with architecture: [2023-02-23 00:38:45,292][34395] Worker 2 uses CPU cores [0] [2023-02-23 00:38:45,291][34379] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): 
RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-23 00:38:45,460][34404] Worker 4 uses CPU cores [0] [2023-02-23 00:38:45,507][34400] Worker 1 uses CPU cores [1] [2023-02-23 00:38:45,837][34406] Worker 3 uses CPU cores [1] [2023-02-23 00:38:45,918][34415] Worker 5 uses CPU cores [1] [2023-02-23 00:38:46,026][34408] Worker 6 uses CPU cores [0] [2023-02-23 00:38:46,052][34416] Worker 7 uses CPU cores [1] [2023-02-23 00:38:49,594][34379] Using optimizer [2023-02-23 00:38:49,595][34379] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-23 00:38:49,629][34379] Loading model from checkpoint [2023-02-23 00:38:49,634][34379] Loaded experiment state at self.train_step=2443, self.env_steps=10006528 [2023-02-23 00:38:49,635][34379] Initialized policy 0 weights for model version 2443 [2023-02-23 00:38:49,637][34379] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:38:49,644][34379] LearnerWorker_p0 finished initialization! [2023-02-23 00:38:49,753][34393] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:38:49,754][34393] RunningMeanStd input shape: (1,) [2023-02-23 00:38:49,767][34393] ConvEncoder: input_channels=3 [2023-02-23 00:38:49,868][34393] Conv encoder output size: 512 [2023-02-23 00:38:49,869][34393] Policy head output size: 512 [2023-02-23 00:38:52,129][05631] Inference worker 0-0 is ready! [2023-02-23 00:38:52,131][05631] All inference workers are ready! Signal rollout workers to start! [2023-02-23 00:38:52,234][34406] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,236][34408] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,235][34400] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,237][34404] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,232][34416] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,237][34394] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,232][34395] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,239][34415] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-23 00:38:52,310][05631] Heartbeat connected on Batcher_0 [2023-02-23 00:38:52,315][05631] Heartbeat connected on LearnerWorker_p0 [2023-02-23 00:38:52,366][05631] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-23 00:38:52,742][34406] Decorrelating experience for 0 frames... [2023-02-23 00:38:53,295][34415] Decorrelating experience for 0 frames... [2023-02-23 00:38:53,847][34395] Decorrelating experience for 0 frames... [2023-02-23 00:38:53,851][34408] Decorrelating experience for 0 frames... [2023-02-23 00:38:53,857][34404] Decorrelating experience for 0 frames... [2023-02-23 00:38:53,859][34394] Decorrelating experience for 0 frames... [2023-02-23 00:38:54,328][05631] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 00:38:55,102][34404] Decorrelating experience for 32 frames... 
[2023-02-23 00:38:55,113][34394] Decorrelating experience for 32 frames... [2023-02-23 00:38:55,146][34395] Decorrelating experience for 32 frames... [2023-02-23 00:38:55,194][34400] Decorrelating experience for 0 frames... [2023-02-23 00:38:55,186][34416] Decorrelating experience for 0 frames... [2023-02-23 00:38:56,646][34408] Decorrelating experience for 32 frames... [2023-02-23 00:38:56,994][34416] Decorrelating experience for 32 frames... [2023-02-23 00:38:57,000][34400] Decorrelating experience for 32 frames... [2023-02-23 00:38:57,003][34406] Decorrelating experience for 32 frames... [2023-02-23 00:38:57,016][34394] Decorrelating experience for 64 frames... [2023-02-23 00:38:57,018][34395] Decorrelating experience for 64 frames... [2023-02-23 00:38:58,783][34404] Decorrelating experience for 64 frames... [2023-02-23 00:38:58,802][34415] Decorrelating experience for 32 frames... [2023-02-23 00:38:59,076][34408] Decorrelating experience for 64 frames... [2023-02-23 00:38:59,141][34406] Decorrelating experience for 64 frames... [2023-02-23 00:38:59,160][34400] Decorrelating experience for 64 frames... [2023-02-23 00:38:59,261][34395] Decorrelating experience for 96 frames... [2023-02-23 00:38:59,275][34394] Decorrelating experience for 96 frames... [2023-02-23 00:38:59,327][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 00:38:59,623][05631] Heartbeat connected on RolloutWorker_w2 [2023-02-23 00:38:59,671][05631] Heartbeat connected on RolloutWorker_w0 [2023-02-23 00:39:00,510][34404] Decorrelating experience for 96 frames... [2023-02-23 00:39:00,708][05631] Heartbeat connected on RolloutWorker_w4 [2023-02-23 00:39:00,726][34408] Decorrelating experience for 96 frames... [2023-02-23 00:39:00,911][05631] Heartbeat connected on RolloutWorker_w6 [2023-02-23 00:39:01,609][34415] Decorrelating experience for 64 frames... [2023-02-23 00:39:01,635][34416] Decorrelating experience for 64 frames... [2023-02-23 00:39:01,747][34406] Decorrelating experience for 96 frames... [2023-02-23 00:39:01,758][34400] Decorrelating experience for 96 frames... [2023-02-23 00:39:02,043][05631] Heartbeat connected on RolloutWorker_w3 [2023-02-23 00:39:02,062][05631] Heartbeat connected on RolloutWorker_w1 [2023-02-23 00:39:04,220][34415] Decorrelating experience for 96 frames... [2023-02-23 00:39:04,249][34416] Decorrelating experience for 96 frames... [2023-02-23 00:39:04,327][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10006528. Throughput: 0: 166.6. Samples: 1666. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-23 00:39:04,329][05631] Avg episode reward: [(0, '1.925')] [2023-02-23 00:39:04,908][05631] Heartbeat connected on RolloutWorker_w5 [2023-02-23 00:39:04,955][05631] Heartbeat connected on RolloutWorker_w7 [2023-02-23 00:39:05,985][34379] Signal inference workers to stop experience collection... [2023-02-23 00:39:06,010][34393] InferenceWorker_p0-w0: stopping experience collection [2023-02-23 00:39:07,046][34379] Signal inference workers to resume experience collection... [2023-02-23 00:39:07,049][34393] InferenceWorker_p0-w0: resuming experience collection [2023-02-23 00:39:07,049][34379] Stopping Batcher_0... [2023-02-23 00:39:07,052][34379] Loop batcher_evt_loop terminating... [2023-02-23 00:39:07,050][05631] Component Batcher_0 stopped! [2023-02-23 00:39:07,084][34406] Stopping RolloutWorker_w3... 
[2023-02-23 00:39:07,085][34406] Loop rollout_proc3_evt_loop terminating... [2023-02-23 00:39:07,084][05631] Component RolloutWorker_w3 stopped! [2023-02-23 00:39:07,091][34415] Stopping RolloutWorker_w5... [2023-02-23 00:39:07,092][34400] Stopping RolloutWorker_w1... [2023-02-23 00:39:07,093][34415] Loop rollout_proc5_evt_loop terminating... [2023-02-23 00:39:07,094][34400] Loop rollout_proc1_evt_loop terminating... [2023-02-23 00:39:07,091][05631] Component RolloutWorker_w5 stopped! [2023-02-23 00:39:07,097][34416] Stopping RolloutWorker_w7... [2023-02-23 00:39:07,098][34416] Loop rollout_proc7_evt_loop terminating... [2023-02-23 00:39:07,098][05631] Component RolloutWorker_w1 stopped! [2023-02-23 00:39:07,105][05631] Component RolloutWorker_w7 stopped! [2023-02-23 00:39:07,143][05631] Component RolloutWorker_w0 stopped! [2023-02-23 00:39:07,150][34394] Stopping RolloutWorker_w0... [2023-02-23 00:39:07,150][34394] Loop rollout_proc0_evt_loop terminating... [2023-02-23 00:39:07,160][05631] Component RolloutWorker_w4 stopped! [2023-02-23 00:39:07,166][34404] Stopping RolloutWorker_w4... [2023-02-23 00:39:07,166][34404] Loop rollout_proc4_evt_loop terminating... [2023-02-23 00:39:07,159][34393] Weights refcount: 2 0 [2023-02-23 00:39:07,182][05631] Component InferenceWorker_p0-w0 stopped! [2023-02-23 00:39:07,191][34408] Stopping RolloutWorker_w6... [2023-02-23 00:39:07,191][05631] Component RolloutWorker_w6 stopped! [2023-02-23 00:39:07,193][34393] Stopping InferenceWorker_p0-w0... [2023-02-23 00:39:07,194][34393] Loop inference_proc0-0_evt_loop terminating... [2023-02-23 00:39:07,200][34408] Loop rollout_proc6_evt_loop terminating... [2023-02-23 00:39:07,204][05631] Component RolloutWorker_w2 stopped! [2023-02-23 00:39:07,206][34395] Stopping RolloutWorker_w2... [2023-02-23 00:39:07,208][34395] Loop rollout_proc2_evt_loop terminating... [2023-02-23 00:39:09,166][34379] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... [2023-02-23 00:39:09,278][34379] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002434_9969664.pth [2023-02-23 00:39:09,286][34379] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... [2023-02-23 00:39:09,425][05631] Component LearnerWorker_p0 stopped! [2023-02-23 00:39:09,431][05631] Waiting for process learner_proc0 to stop... [2023-02-23 00:39:09,426][34379] Stopping LearnerWorker_p0... [2023-02-23 00:39:09,436][34379] Loop learner_proc0_evt_loop terminating... [2023-02-23 00:39:10,754][05631] Waiting for process inference_proc0-0 to join... [2023-02-23 00:39:10,758][05631] Waiting for process rollout_proc0 to join... [2023-02-23 00:39:10,763][05631] Waiting for process rollout_proc1 to join... [2023-02-23 00:39:10,765][05631] Waiting for process rollout_proc2 to join... [2023-02-23 00:39:10,770][05631] Waiting for process rollout_proc3 to join... [2023-02-23 00:39:10,774][05631] Waiting for process rollout_proc4 to join... [2023-02-23 00:39:10,778][05631] Waiting for process rollout_proc5 to join... [2023-02-23 00:39:10,781][05631] Waiting for process rollout_proc6 to join... [2023-02-23 00:39:10,783][05631] Waiting for process rollout_proc7 to join... 
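This resumed run shuts down within seconds of starting, and the reason is visible in the numbers the log has already printed: the checkpoint being resumed holds more environment steps than either step budget in play, so the termination condition is met almost immediately. A worked check, with every value copied from the log:

# Values copied from the log above.
env_steps_at_resume = 10_006_528   # "Loaded experiment state at ... env_steps=10006528"
saved_budget        = 10_000_000   # train_for_env_steps in the saved config dump
cli_budget          =  4_000_000   # train_for_env_steps in cli_args
collected_at_exit   = 10_014_720   # "Collected {0: 10014720}"

# Both budgets were already exceeded before training restarted, so the run
# stops right after its first burst of rollouts is batched and trained on.
assert env_steps_at_resume > max(saved_budget, cli_budget)
print(collected_at_exit - env_steps_at_resume)  # 8192 frames collected this session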
[2023-02-23 00:39:10,787][05631] Batcher 0 profile tree view:
batching: 0.0630, releasing_batches: 0.0022
[2023-02-23 00:39:10,793][05631] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0221
wait_policy: 0.0000
  wait_policy_total: 9.8845
one_step: 0.0029
  handle_policy_step: 3.6858
    deserialize: 0.0487, stack: 0.0120, obs_to_device_normalize: 0.3783, forward: 2.8836, send_messages: 0.0690
    prepare_outputs: 0.2014
      to_cpu: 0.1193
[2023-02-23 00:39:10,795][05631] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 5.0141
train: 0.6550
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0004, after_optimizer: 0.0048
  calculate_losses: 0.1413
    losses_init: 0.0000, forward_head: 0.1155, bptt_initial: 0.0169, tail: 0.0014, advantages_returns: 0.0011, losses: 0.0034
    bptt: 0.0026
      bptt_forward_core: 0.0025
  update: 0.5072
    clip: 0.0054
[2023-02-23 00:39:10,796][05631] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0233, enqueue_policy_requests: 0.9685, env_step: 3.1668, overhead: 0.0812, complete_rollouts: 0.0138
save_policy_outputs: 0.0673
  split_output_tensors: 0.0201
[2023-02-23 00:39:10,797][05631] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.1870, env_step: 0.7674, overhead: 0.0097, complete_rollouts: 0.0001
save_policy_outputs: 0.0074
  split_output_tensors: 0.0037
[2023-02-23 00:39:10,799][05631] Loop Runner_EvtLoop terminating...
[2023-02-23 00:39:10,800][05631] Runner profile tree view:
main_loop: 38.4461
[2023-02-23 00:39:10,801][05631] Collected {0: 10014720}, FPS: 213.1
[2023-02-23 00:39:10,846][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 00:39:10,850][05631] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-23 00:39:10,854][05631] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-23 00:39:10,856][05631] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-23 00:39:10,858][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 00:39:10,859][05631] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-23 00:39:10,865][05631] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 00:39:10,867][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-23 00:39:10,872][05631] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-23 00:39:10,874][05631] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-23 00:39:10,875][05631] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-23 00:39:10,877][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-23 00:39:10,878][05631] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-23 00:39:10,880][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-23 00:39:10,883][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 00:39:10,923][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:39:10,926][05631] RunningMeanStd input shape: (1,) [2023-02-23 00:39:10,947][05631] ConvEncoder: input_channels=3 [2023-02-23 00:39:11,007][05631] Conv encoder output size: 512 [2023-02-23 00:39:11,009][05631] Policy head output size: 512 [2023-02-23 00:39:11,044][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... [2023-02-23 00:39:11,890][05631] Num frames 100... [2023-02-23 00:39:12,066][05631] Num frames 200... [2023-02-23 00:39:12,256][05631] Num frames 300... [2023-02-23 00:39:12,465][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-23 00:39:12,468][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-23 00:39:12,504][05631] Num frames 400... [2023-02-23 00:39:12,689][05631] Num frames 500... [2023-02-23 00:39:12,859][05631] Num frames 600... [2023-02-23 00:39:13,033][05631] Num frames 700... [2023-02-23 00:39:13,213][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-23 00:39:13,216][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-23 00:39:13,278][05631] Num frames 800... [2023-02-23 00:39:13,449][05631] Num frames 900... [2023-02-23 00:39:13,626][05631] Num frames 1000... [2023-02-23 00:39:13,797][05631] Num frames 1100... [2023-02-23 00:39:13,944][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-23 00:39:13,947][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-23 00:39:14,040][05631] Num frames 1200... [2023-02-23 00:39:14,212][05631] Num frames 1300... [2023-02-23 00:39:14,396][05631] Num frames 1400... [2023-02-23 00:39:14,571][05631] Num frames 1500... [2023-02-23 00:39:14,703][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-23 00:39:14,706][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-23 00:39:14,832][05631] Num frames 1600... [2023-02-23 00:39:14,955][05631] Num frames 1700... [2023-02-23 00:39:15,076][05631] Num frames 1800... [2023-02-23 00:39:15,196][05631] Num frames 1900... [2023-02-23 00:39:15,312][05631] Num frames 2000... [2023-02-23 00:39:15,443][05631] Num frames 2100... [2023-02-23 00:39:15,570][05631] Num frames 2200... [2023-02-23 00:39:15,676][05631] Avg episode rewards: #0: 5.280, true rewards: #0: 4.480 [2023-02-23 00:39:15,678][05631] Avg episode reward: 5.280, avg true_objective: 4.480 [2023-02-23 00:39:15,760][05631] Num frames 2300... [2023-02-23 00:39:15,888][05631] Num frames 2400... [2023-02-23 00:39:16,008][05631] Num frames 2500... [2023-02-23 00:39:16,129][05631] Num frames 2600... [2023-02-23 00:39:16,262][05631] Avg episode rewards: #0: 5.260, true rewards: #0: 4.427 [2023-02-23 00:39:16,264][05631] Avg episode reward: 5.260, avg true_objective: 4.427 [2023-02-23 00:39:16,324][05631] Num frames 2700... [2023-02-23 00:39:16,453][05631] Num frames 2800... [2023-02-23 00:39:16,577][05631] Num frames 2900... [2023-02-23 00:39:16,695][05631] Num frames 3000... [2023-02-23 00:39:16,797][05631] Avg episode rewards: #0: 5.057, true rewards: #0: 4.343 [2023-02-23 00:39:16,799][05631] Avg episode reward: 5.057, avg true_objective: 4.343 [2023-02-23 00:39:16,880][05631] Num frames 3100... [2023-02-23 00:39:17,017][05631] Num frames 3200... [2023-02-23 00:39:17,146][05631] Num frames 3300... 
[2023-02-23 00:39:17,277][05631] Num frames 3400... [2023-02-23 00:39:17,451][05631] Avg episode rewards: #0: 5.110, true rewards: #0: 4.360 [2023-02-23 00:39:17,454][05631] Avg episode reward: 5.110, avg true_objective: 4.360 [2023-02-23 00:39:17,476][05631] Num frames 3500... [2023-02-23 00:39:17,595][05631] Num frames 3600... [2023-02-23 00:39:17,733][05631] Num frames 3700... [2023-02-23 00:39:17,839][05631] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 [2023-02-23 00:39:17,842][05631] Avg episode reward: 4.827, avg true_objective: 4.160 [2023-02-23 00:39:17,915][05631] Num frames 3800... [2023-02-23 00:39:18,037][05631] Num frames 3900... [2023-02-23 00:39:18,159][05631] Num frames 4000... [2023-02-23 00:39:18,282][05631] Num frames 4100... [2023-02-23 00:39:18,412][05631] Avg episode rewards: #0: 4.860, true rewards: #0: 4.160 [2023-02-23 00:39:18,415][05631] Avg episode reward: 4.860, avg true_objective: 4.160 [2023-02-23 00:39:41,642][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-23 00:39:41,852][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 00:39:41,854][05631] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-23 00:39:41,857][05631] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-23 00:39:41,859][05631] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-23 00:39:41,865][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 00:39:41,866][05631] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-23 00:39:41,868][05631] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-23 00:39:41,869][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-23 00:39:41,870][05631] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-23 00:39:41,871][05631] Adding new argument 'hf_repository'='pittawat/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-23 00:39:41,873][05631] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-23 00:39:41,877][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-23 00:39:41,878][05631] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 00:39:41,880][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 00:39:41,881][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 00:39:41,906][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:39:41,909][05631] RunningMeanStd input shape: (1,) [2023-02-23 00:39:41,928][05631] ConvEncoder: input_channels=3 [2023-02-23 00:39:41,987][05631] Conv encoder output size: 512 [2023-02-23 00:39:41,990][05631] Policy head output size: 512 [2023-02-23 00:39:42,019][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... [2023-02-23 00:39:42,823][05631] Num frames 100... [2023-02-23 00:39:43,007][05631] Num frames 200... [2023-02-23 00:39:43,184][05631] Num frames 300... 
[2023-02-23 00:39:43,391][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2023-02-23 00:39:43,394][05631] Avg episode reward: 3.840, avg true_objective: 3.840 [2023-02-23 00:39:43,429][05631] Num frames 400... [2023-02-23 00:39:43,622][05631] Num frames 500... [2023-02-23 00:39:43,820][05631] Num frames 600... [2023-02-23 00:39:44,016][05631] Num frames 700... [2023-02-23 00:39:44,226][05631] Num frames 800... [2023-02-23 00:39:44,433][05631] Num frames 900... [2023-02-23 00:39:44,553][05631] Avg episode rewards: #0: 5.640, true rewards: #0: 4.640 [2023-02-23 00:39:44,555][05631] Avg episode reward: 5.640, avg true_objective: 4.640 [2023-02-23 00:39:44,709][05631] Num frames 1000... [2023-02-23 00:39:44,901][05631] Num frames 1100... [2023-02-23 00:39:45,072][05631] Num frames 1200... [2023-02-23 00:39:45,252][05631] Num frames 1300... [2023-02-23 00:39:45,337][05631] Avg episode rewards: #0: 5.040, true rewards: #0: 4.373 [2023-02-23 00:39:45,340][05631] Avg episode reward: 5.040, avg true_objective: 4.373 [2023-02-23 00:39:45,500][05631] Num frames 1400... [2023-02-23 00:39:45,630][05631] Num frames 1500... [2023-02-23 00:39:45,751][05631] Num frames 1600... [2023-02-23 00:39:45,867][05631] Num frames 1700... [2023-02-23 00:39:45,955][05631] Avg episode rewards: #0: 5.070, true rewards: #0: 4.320 [2023-02-23 00:39:45,963][05631] Avg episode reward: 5.070, avg true_objective: 4.320 [2023-02-23 00:39:46,059][05631] Num frames 1800... [2023-02-23 00:39:46,186][05631] Num frames 1900... [2023-02-23 00:39:46,324][05631] Num frames 2000... [2023-02-23 00:39:46,443][05631] Num frames 2100... [2023-02-23 00:39:46,551][05631] Avg episode rewards: #0: 5.088, true rewards: #0: 4.288 [2023-02-23 00:39:46,553][05631] Avg episode reward: 5.088, avg true_objective: 4.288 [2023-02-23 00:39:46,628][05631] Num frames 2200... [2023-02-23 00:39:46,751][05631] Num frames 2300... [2023-02-23 00:39:46,870][05631] Num frames 2400... [2023-02-23 00:39:46,990][05631] Num frames 2500... [2023-02-23 00:39:47,080][05631] Avg episode rewards: #0: 4.880, true rewards: #0: 4.213 [2023-02-23 00:39:47,082][05631] Avg episode reward: 4.880, avg true_objective: 4.213 [2023-02-23 00:39:47,182][05631] Num frames 2600... [2023-02-23 00:39:47,318][05631] Num frames 2700... [2023-02-23 00:39:47,455][05631] Num frames 2800... [2023-02-23 00:39:47,581][05631] Num frames 2900... [2023-02-23 00:39:47,727][05631] Avg episode rewards: #0: 4.966, true rewards: #0: 4.251 [2023-02-23 00:39:47,729][05631] Avg episode reward: 4.966, avg true_objective: 4.251 [2023-02-23 00:39:47,764][05631] Num frames 3000... [2023-02-23 00:39:47,890][05631] Num frames 3100... [2023-02-23 00:39:48,018][05631] Num frames 3200... [2023-02-23 00:39:48,138][05631] Num frames 3300... [2023-02-23 00:39:48,265][05631] Avg episode rewards: #0: 4.825, true rewards: #0: 4.200 [2023-02-23 00:39:48,267][05631] Avg episode reward: 4.825, avg true_objective: 4.200 [2023-02-23 00:39:48,322][05631] Num frames 3400... [2023-02-23 00:39:48,453][05631] Num frames 3500... [2023-02-23 00:39:48,573][05631] Num frames 3600... [2023-02-23 00:39:48,693][05631] Num frames 3700... [2023-02-23 00:39:48,808][05631] Avg episode rewards: #0: 4.716, true rewards: #0: 4.160 [2023-02-23 00:39:48,811][05631] Avg episode reward: 4.716, avg true_objective: 4.160 [2023-02-23 00:39:48,888][05631] Num frames 3800... [2023-02-23 00:39:49,016][05631] Num frames 3900... [2023-02-23 00:39:49,139][05631] Num frames 4000... 
[2023-02-23 00:39:49,195][05631] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 [2023-02-23 00:39:49,198][05631] Avg episode reward: 4.500, avg true_objective: 4.000 [2023-02-23 00:40:09,146][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-23 00:40:14,495][05631] The model has been pushed to https://huggingface.co/pittawat/rl_course_vizdoom_health_gathering_supreme [2023-02-23 00:44:45,344][05631] Environment doom_basic already registered, overwriting... [2023-02-23 00:44:45,347][05631] Environment doom_two_colors_easy already registered, overwriting... [2023-02-23 00:44:45,351][05631] Environment doom_two_colors_hard already registered, overwriting... [2023-02-23 00:44:45,352][05631] Environment doom_dm already registered, overwriting... [2023-02-23 00:44:45,353][05631] Environment doom_dwango5 already registered, overwriting... [2023-02-23 00:44:45,356][05631] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-23 00:44:45,357][05631] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-23 00:44:45,360][05631] Environment doom_my_way_home already registered, overwriting... [2023-02-23 00:44:45,361][05631] Environment doom_deadly_corridor already registered, overwriting... [2023-02-23 00:44:45,362][05631] Environment doom_defend_the_center already registered, overwriting... [2023-02-23 00:44:45,366][05631] Environment doom_defend_the_line already registered, overwriting... [2023-02-23 00:44:45,367][05631] Environment doom_health_gathering already registered, overwriting... [2023-02-23 00:44:45,368][05631] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-23 00:44:45,369][05631] Environment doom_battle already registered, overwriting... [2023-02-23 00:44:45,371][05631] Environment doom_battle2 already registered, overwriting... [2023-02-23 00:44:45,373][05631] Environment doom_duel_bots already registered, overwriting... [2023-02-23 00:44:45,376][05631] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-23 00:44:45,377][05631] Environment doom_duel already registered, overwriting... [2023-02-23 00:44:45,378][05631] Environment doom_deathmatch_full already registered, overwriting... [2023-02-23 00:44:45,381][05631] Environment doom_benchmark already registered, overwriting... [2023-02-23 00:44:45,382][05631] register_encoder_factory: [2023-02-23 00:44:45,417][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 00:44:45,419][05631] Overriding arg 'gamma' with value 0.97 passed from command line [2023-02-23 00:44:45,426][05631] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-23 00:44:45,428][05631] Resuming existing experiment from /content/train_dir/default_experiment... 
[2023-02-23 00:44:45,429][05631] Weights and Biases integration disabled [2023-02-23 00:44:45,438][05631] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-23 00:44:47,591][05631] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.97 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=10000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-23 00:44:47,598][05631] Saving configuration to /content/train_dir/default_experiment/config.json... 
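Each push earlier in the log makes the checkpoint, config, and replay video available on the Hub, so a run like this can be reproduced elsewhere. A sketch of pulling the repository back down with Sample Factory's Hugging Face helper; the module path and flags follow its documented HF integration and are an assumption, not something this log shows:

import subprocess

# Download pittawat/rl_course_vizdoom_health_gathering_supreme into ./train_dir
# (module path and flags per Sample Factory's HF integration docs; assumed here).
subprocess.run(
    [
        "python", "-m", "sample_factory.huggingface.load_from_hub",
        "-r", "pittawat/rl_course_vizdoom_health_gathering_supreme",
        "-d", "train_dir",
    ],
    check=True,
)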
[2023-02-23 00:44:47,602][05631] Rollout worker 0 uses device cpu [2023-02-23 00:44:47,603][05631] Rollout worker 1 uses device cpu [2023-02-23 00:44:47,606][05631] Rollout worker 2 uses device cpu [2023-02-23 00:44:47,608][05631] Rollout worker 3 uses device cpu [2023-02-23 00:44:47,609][05631] Rollout worker 4 uses device cpu [2023-02-23 00:44:47,610][05631] Rollout worker 5 uses device cpu [2023-02-23 00:44:47,612][05631] Rollout worker 6 uses device cpu [2023-02-23 00:44:47,613][05631] Rollout worker 7 uses device cpu [2023-02-23 00:44:47,732][05631] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:44:47,734][05631] InferenceWorker_p0-w0: min num requests: 2 [2023-02-23 00:44:47,890][05631] Starting all processes... [2023-02-23 00:44:47,893][05631] Starting process learner_proc0 [2023-02-23 00:44:48,025][05631] Starting all processes... [2023-02-23 00:44:48,033][05631] Starting process inference_proc0-0 [2023-02-23 00:44:48,033][05631] Starting process rollout_proc0 [2023-02-23 00:44:48,035][05631] Starting process rollout_proc1 [2023-02-23 00:44:48,035][05631] Starting process rollout_proc2 [2023-02-23 00:44:48,035][05631] Starting process rollout_proc3 [2023-02-23 00:44:48,035][05631] Starting process rollout_proc4 [2023-02-23 00:44:48,035][05631] Starting process rollout_proc5 [2023-02-23 00:44:48,035][05631] Starting process rollout_proc6 [2023-02-23 00:44:48,035][05631] Starting process rollout_proc7 [2023-02-23 00:44:56,168][39829] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:44:56,168][39829] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-23 00:44:56,202][39829] Num visible devices: 1 [2023-02-23 00:44:56,239][39829] Starting seed is not provided [2023-02-23 00:44:56,240][39829] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-23 00:44:56,241][39829] Initializing actor-critic model on device cuda:0 [2023-02-23 00:44:56,242][39829] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 00:44:56,245][39829] RunningMeanStd input shape: (1,) [2023-02-23 00:44:56,288][39829] ConvEncoder: input_channels=3 [2023-02-23 00:44:57,277][39829] Conv encoder output size: 512 [2023-02-23 00:44:57,280][39829] Policy head output size: 512 [2023-02-23 00:44:57,450][39829] Created Actor Critic model with architecture: [2023-02-23 00:44:57,468][39829] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, 
[2023-02-23 00:44:57,531][39843] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:44:57,537][39843] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-23 00:44:57,689][39843] Num visible devices: 1
[2023-02-23 00:44:58,288][39844] Worker 1 uses CPU cores [1]
[2023-02-23 00:44:58,437][39850] Worker 0 uses CPU cores [0]
[2023-02-23 00:44:58,552][39852] Worker 3 uses CPU cores [1]
[2023-02-23 00:44:58,941][39854] Worker 2 uses CPU cores [0]
[2023-02-23 00:44:59,465][39864] Worker 5 uses CPU cores [1]
[2023-02-23 00:44:59,470][39857] Worker 4 uses CPU cores [0]
[2023-02-23 00:44:59,564][39860] Worker 6 uses CPU cores [0]
[2023-02-23 00:44:59,766][39866] Worker 7 uses CPU cores [1]
[2023-02-23 00:45:02,236][39829] Using optimizer
[2023-02-23 00:45:02,236][39829] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
[2023-02-23 00:45:02,270][39829] Loading model from checkpoint
[2023-02-23 00:45:02,274][39829] Loaded experiment state at self.train_step=2445, self.env_steps=10014720
[2023-02-23 00:45:02,275][39829] Initialized policy 0 weights for model version 2445
[2023-02-23 00:45:02,278][39829] LearnerWorker_p0 finished initialization!
[2023-02-23 00:45:02,280][39829] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:45:02,390][39843] RunningMeanStd input shape: (3, 72, 128)
[2023-02-23 00:45:02,392][39843] RunningMeanStd input shape: (1,)
[2023-02-23 00:45:02,405][39843] ConvEncoder: input_channels=3
[2023-02-23 00:45:02,507][39843] Conv encoder output size: 512
[2023-02-23 00:45:02,507][39843] Policy head output size: 512
[2023-02-23 00:45:04,694][05631] Inference worker 0-0 is ready!
[2023-02-23 00:45:04,696][05631] All inference workers are ready! Signal rollout workers to start!
[2023-02-23 00:45:04,798][39844] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:04,799][39852] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:04,790][39857] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:04,792][39860] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:04,797][39866] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:04,795][39854] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:04,799][39850] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:04,795][39864] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:45:05,438][05631] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10014720. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:45:05,630][39864] Decorrelating experience for 0 frames...
[2023-02-23 00:45:05,633][39866] Decorrelating experience for 0 frames...
[2023-02-23 00:45:05,940][39854] Decorrelating experience for 0 frames...
[2023-02-23 00:45:05,942][39860] Decorrelating experience for 0 frames...
[2023-02-23 00:45:05,945][39857] Decorrelating experience for 0 frames...
[2023-02-23 00:45:06,265][39844] Decorrelating experience for 0 frames...
[2023-02-23 00:45:06,275][39860] Decorrelating experience for 32 frames...
[2023-02-23 00:45:06,729][39854] Decorrelating experience for 32 frames...
[2023-02-23 00:45:07,336][39864] Decorrelating experience for 32 frames...
[2023-02-23 00:45:07,354][39866] Decorrelating experience for 32 frames...
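Note that this run did not start from scratch: the learner restored checkpoint_000002445_10014720.pth and resumed at train_step=2445 / env_steps=10014720. A minimal sketch of inspecting such a checkpoint offline, assuming it is an ordinary PyTorch pickle whose keys match the fields echoed in the log (an assumption, not a documented layout):

```python
# Minimal sketch: peek inside a Sample Factory checkpoint with plain PyTorch.
# Key names ("train_step", "env_steps", "model") are assumptions inferred from
# the fields echoed above, not a documented format.
import torch

ckpt_path = ("/content/train_dir/default_experiment/"
             "checkpoint_p0/checkpoint_000002445_10014720.pth")
checkpoint = torch.load(ckpt_path, map_location="cpu")

print(checkpoint.get("train_step"))  # expected: 2445
print(checkpoint.get("env_steps"))   # expected: 10014720
# Weights would then be restored with something like:
#   actor_critic.load_state_dict(checkpoint["model"])
```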
[2023-02-23 00:45:07,407][39852] Decorrelating experience for 0 frames...
[2023-02-23 00:45:07,509][39860] Decorrelating experience for 64 frames...
[2023-02-23 00:45:07,686][39854] Decorrelating experience for 64 frames...
[2023-02-23 00:45:07,711][39844] Decorrelating experience for 32 frames...
[2023-02-23 00:45:07,725][05631] Heartbeat connected on Batcher_0
[2023-02-23 00:45:07,728][05631] Heartbeat connected on LearnerWorker_p0
[2023-02-23 00:45:07,759][05631] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-23 00:45:08,275][39850] Decorrelating experience for 0 frames...
[2023-02-23 00:45:08,280][39857] Decorrelating experience for 32 frames...
[2023-02-23 00:45:08,857][39852] Decorrelating experience for 32 frames...
[2023-02-23 00:45:09,043][39866] Decorrelating experience for 64 frames...
[2023-02-23 00:45:09,122][39860] Decorrelating experience for 96 frames...
[2023-02-23 00:45:09,189][39864] Decorrelating experience for 64 frames...
[2023-02-23 00:45:09,271][39857] Decorrelating experience for 64 frames...
[2023-02-23 00:45:09,351][05631] Heartbeat connected on RolloutWorker_w6
[2023-02-23 00:45:09,472][39844] Decorrelating experience for 64 frames...
[2023-02-23 00:45:09,955][39850] Decorrelating experience for 32 frames...
[2023-02-23 00:45:10,106][39854] Decorrelating experience for 96 frames...
[2023-02-23 00:45:10,365][05631] Heartbeat connected on RolloutWorker_w2
[2023-02-23 00:45:10,438][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10014720. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:45:10,596][39852] Decorrelating experience for 64 frames...
[2023-02-23 00:45:10,741][39864] Decorrelating experience for 96 frames...
[2023-02-23 00:45:10,986][05631] Heartbeat connected on RolloutWorker_w5
[2023-02-23 00:45:11,290][39844] Decorrelating experience for 96 frames...
[2023-02-23 00:45:11,683][05631] Heartbeat connected on RolloutWorker_w1
[2023-02-23 00:45:12,330][39857] Decorrelating experience for 96 frames...
[2023-02-23 00:45:12,849][05631] Heartbeat connected on RolloutWorker_w4
[2023-02-23 00:45:13,241][39852] Decorrelating experience for 96 frames...
[2023-02-23 00:45:13,738][05631] Heartbeat connected on RolloutWorker_w3
[2023-02-23 00:45:15,355][39866] Decorrelating experience for 96 frames...
[2023-02-23 00:45:15,439][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10014720. Throughput: 0: 137.8. Samples: 1378. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:45:15,444][05631] Avg episode reward: [(0, '2.665')]
[2023-02-23 00:45:16,721][05631] Heartbeat connected on RolloutWorker_w7
[2023-02-23 00:45:16,893][39829] Signal inference workers to stop experience collection...
[2023-02-23 00:45:16,943][39843] InferenceWorker_p0-w0: stopping experience collection
[2023-02-23 00:45:17,440][39850] Decorrelating experience for 64 frames...
[2023-02-23 00:45:18,212][39850] Decorrelating experience for 96 frames...
[2023-02-23 00:45:18,373][05631] Heartbeat connected on RolloutWorker_w0
[2023-02-23 00:45:18,423][39829] Signal inference workers to resume experience collection...
[2023-02-23 00:45:18,424][39843] InferenceWorker_p0-w0: resuming experience collection
[2023-02-23 00:45:18,435][39829] Stopping Batcher_0...
[2023-02-23 00:45:18,436][39829] Loop batcher_evt_loop terminating...
[2023-02-23 00:45:18,437][05631] Component Batcher_0 stopped!
[2023-02-23 00:45:18,467][05631] Component RolloutWorker_w7 stopped!
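The "Decorrelating experience for N frames..." lines above show each worker warming up its env splits by staggered amounts: 0, 32, 64, and 96 frames, i.e. multiples of the rollout length (rollout=32 in this config), so the workers do not hit episode boundaries in lockstep. An illustrative sketch of that offset pattern (the visible pattern only, not necessarily Sample Factory's exact scheme):

```python
# Illustrative sketch of staggered warm-up offsets, mirroring the
# 0/32/64/96-frame decorrelation steps visible in the log.
ROLLOUT = 32   # rollout length from the experiment config
SPLITS = 4     # four distinct offsets appear in the log: 0, 32, 64, 96

def decorrelation_frames(split_idx: int) -> int:
    # Each env split idles for one extra rollout's worth of frames.
    return split_idx * ROLLOUT

for split in range(SPLITS):
    print(f"split {split}: decorrelate for {decorrelation_frames(split)} frames")
```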
[2023-02-23 00:45:18,489][05631] Component RolloutWorker_w1 stopped!
[2023-02-23 00:45:18,472][39866] Stopping RolloutWorker_w7...
[2023-02-23 00:45:18,496][39866] Loop rollout_proc7_evt_loop terminating...
[2023-02-23 00:45:18,494][39844] Stopping RolloutWorker_w1...
[2023-02-23 00:45:18,505][05631] Component RolloutWorker_w5 stopped!
[2023-02-23 00:45:18,511][39864] Stopping RolloutWorker_w5...
[2023-02-23 00:45:18,511][39864] Loop rollout_proc5_evt_loop terminating...
[2023-02-23 00:45:18,517][05631] Component RolloutWorker_w3 stopped!
[2023-02-23 00:45:18,522][39852] Stopping RolloutWorker_w3...
[2023-02-23 00:45:18,523][39852] Loop rollout_proc3_evt_loop terminating...
[2023-02-23 00:45:18,526][39844] Loop rollout_proc1_evt_loop terminating...
[2023-02-23 00:45:18,534][39857] Stopping RolloutWorker_w4...
[2023-02-23 00:45:18,534][05631] Component RolloutWorker_w4 stopped!
[2023-02-23 00:45:18,545][39857] Loop rollout_proc4_evt_loop terminating...
[2023-02-23 00:45:18,557][39850] Stopping RolloutWorker_w0...
[2023-02-23 00:45:18,557][39850] Loop rollout_proc0_evt_loop terminating...
[2023-02-23 00:45:18,557][05631] Component RolloutWorker_w0 stopped!
[2023-02-23 00:45:18,585][39854] Stopping RolloutWorker_w2...
[2023-02-23 00:45:18,585][05631] Component RolloutWorker_w2 stopped!
[2023-02-23 00:45:18,591][39860] Stopping RolloutWorker_w6...
[2023-02-23 00:45:18,592][39860] Loop rollout_proc6_evt_loop terminating...
[2023-02-23 00:45:18,586][39854] Loop rollout_proc2_evt_loop terminating...
[2023-02-23 00:45:18,591][05631] Component RolloutWorker_w6 stopped!
[2023-02-23 00:45:18,613][39843] Weights refcount: 2 0
[2023-02-23 00:45:18,629][05631] Component InferenceWorker_p0-w0 stopped!
[2023-02-23 00:45:18,633][39843] Stopping InferenceWorker_p0-w0...
[2023-02-23 00:45:18,633][39843] Loop inference_proc0-0_evt_loop terminating...
[2023-02-23 00:45:21,008][39829] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2023-02-23 00:45:21,173][39829] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth
[2023-02-23 00:45:21,183][39829] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2023-02-23 00:45:21,420][39829] Stopping LearnerWorker_p0...
[2023-02-23 00:45:21,421][05631] Component LearnerWorker_p0 stopped!
[2023-02-23 00:45:21,421][39829] Loop learner_proc0_evt_loop terminating...
[2023-02-23 00:45:21,424][05631] Waiting for process learner_proc0 to stop...
[2023-02-23 00:45:22,957][05631] Waiting for process inference_proc0-0 to join...
[2023-02-23 00:45:22,959][05631] Waiting for process rollout_proc0 to join...
[2023-02-23 00:45:22,962][05631] Waiting for process rollout_proc1 to join...
[2023-02-23 00:45:23,075][05631] Waiting for process rollout_proc2 to join...
[2023-02-23 00:45:23,077][05631] Waiting for process rollout_proc3 to join...
[2023-02-23 00:45:23,083][05631] Waiting for process rollout_proc4 to join...
[2023-02-23 00:45:23,085][05631] Waiting for process rollout_proc5 to join...
[2023-02-23 00:45:23,088][05631] Waiting for process rollout_proc6 to join...
[2023-02-23 00:45:23,089][05631] Waiting for process rollout_proc7 to join...
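On shutdown the learner wrote checkpoint_000002447_10022912.pth and deleted checkpoint_000002443_10006528.pth, consistent with keep_checkpoints=2 in the config. A sketch of that rotation policy, for illustration only (Sample Factory's own bookkeeping may differ):

```python
# Sketch of a keep-the-newest-N checkpoint rotation, matching the
# save/remove pair in the log above. Illustrative, not SF's actual code.
from pathlib import Path

def rotate_checkpoints(ckpt_dir: Path, keep: int = 2) -> None:
    # Names like checkpoint_000002447_10022912.pth sort chronologically as
    # strings because the train step is zero-padded.
    checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for stale in checkpoints[:-keep]:
        print(f"Removing {stale}")
        stale.unlink()

rotate_checkpoints(Path("/content/train_dir/default_experiment/checkpoint_p0"))
```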
[2023-02-23 00:45:23,093][05631] Batcher 0 profile tree view:
batching: 0.0454, releasing_batches: 0.0004
[2023-02-23 00:45:23,095][05631] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 8.5334
update_model: 0.0442
  weight_update: 0.0259
one_step: 0.1178
  handle_policy_step: 3.6239
    deserialize: 0.0496, stack: 0.0089, obs_to_device_normalize: 0.3427, forward: 2.7631, send_messages: 0.1124
    prepare_outputs: 0.2459
      to_cpu: 0.1354
[2023-02-23 00:45:23,097][05631] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 6.3831
train: 0.7580
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0005, after_optimizer: 0.0041
  calculate_losses: 0.1495
    losses_init: 0.0000, forward_head: 0.1162, bptt_initial: 0.0202, tail: 0.0017, advantages_returns: 0.0010, losses: 0.0060
    bptt: 0.0038
      bptt_forward_core: 0.0037
  update: 0.6026
    clip: 0.0081
[2023-02-23 00:45:23,101][05631] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0006
[2023-02-23 00:45:23,104][05631] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0175, env_step: 0.2402, overhead: 0.0012, complete_rollouts: 0.0000
save_policy_outputs: 0.0009
  split_output_tensors: 0.0004
[2023-02-23 00:45:23,105][05631] Loop Runner_EvtLoop terminating...
[2023-02-23 00:45:23,107][05631] Runner profile tree view:
main_loop: 35.2166
[2023-02-23 00:45:23,108][05631] Collected {0: 10022912}, FPS: 232.6
[2023-02-23 00:51:07,960][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 00:51:07,965][05631] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-23 00:51:07,967][05631] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-23 00:51:07,970][05631] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-23 00:51:07,972][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 00:51:07,974][05631] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-23 00:51:07,976][05631] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 00:51:07,977][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-23 00:51:07,978][05631] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-23 00:51:07,979][05631] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-23 00:51:07,980][05631] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-23 00:51:07,981][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-23 00:51:07,983][05631] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-23 00:51:07,984][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file!
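The shutdown summary above is internally consistent: this short run collected 10022912 − 10014720 = 8192 env frames, and dividing by the Runner's main_loop time of 35.2166 s gives exactly the reported 232.6 FPS. Quick check:

```python
# Verify the "Collected {0: 10022912}, FPS: 232.6" line from the profile data.
frames_collected = 10022912 - 10014720   # end minus start env_steps = 8192
main_loop_seconds = 35.2166              # "main_loop" from the Runner profile

print(f"{frames_collected / main_loop_seconds:.1f}")  # -> 232.6
```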
[2023-02-23 00:51:07,985][05631] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-23 00:51:08,013][05631] RunningMeanStd input shape: (3, 72, 128)
[2023-02-23 00:51:08,016][05631] RunningMeanStd input shape: (1,)
[2023-02-23 00:51:08,038][05631] ConvEncoder: input_channels=3
[2023-02-23 00:51:08,096][05631] Conv encoder output size: 512
[2023-02-23 00:51:08,100][05631] Policy head output size: 512
[2023-02-23 00:51:08,130][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2023-02-23 00:51:08,807][05631] Num frames 100...
[2023-02-23 00:51:08,941][05631] Num frames 200...
[2023-02-23 00:51:09,054][05631] Num frames 300...
[2023-02-23 00:51:09,207][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-02-23 00:51:09,209][05631] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-02-23 00:51:09,233][05631] Num frames 400...
[2023-02-23 00:51:09,366][05631] Num frames 500...
[2023-02-23 00:51:09,487][05631] Num frames 600...
[2023-02-23 00:51:09,611][05631] Num frames 700...
[2023-02-23 00:51:09,743][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-02-23 00:51:09,746][05631] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-02-23 00:51:09,788][05631] Num frames 800...
[2023-02-23 00:51:09,912][05631] Num frames 900...
[2023-02-23 00:51:10,031][05631] Num frames 1000...
[2023-02-23 00:51:10,159][05631] Num frames 1100...
[2023-02-23 00:51:10,318][05631] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947
[2023-02-23 00:51:10,322][05631] Avg episode reward: 4.280, avg true_objective: 3.947
[2023-02-23 00:51:10,352][05631] Num frames 1200...
[2023-02-23 00:51:10,477][05631] Num frames 1300...
[2023-02-23 00:51:10,596][05631] Num frames 1400...
[2023-02-23 00:51:10,710][05631] Num frames 1500...
[2023-02-23 00:51:10,824][05631] Num frames 1600...
[2023-02-23 00:51:10,877][05631] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000
[2023-02-23 00:51:10,879][05631] Avg episode reward: 4.500, avg true_objective: 4.000
[2023-02-23 00:51:11,019][05631] Num frames 1700...
[2023-02-23 00:51:11,141][05631] Num frames 1800...
[2023-02-23 00:51:11,273][05631] Avg episode rewards: #0: 4.112, true rewards: #0: 3.712
[2023-02-23 00:51:11,275][05631] Avg episode reward: 4.112, avg true_objective: 3.712
[2023-02-23 00:51:11,332][05631] Num frames 1900...
[2023-02-23 00:51:11,459][05631] Num frames 2000...
[2023-02-23 00:51:11,582][05631] Num frames 2100...
[2023-02-23 00:51:11,713][05631] Num frames 2200...
[2023-02-23 00:51:11,817][05631] Avg episode rewards: #0: 4.067, true rewards: #0: 3.733
[2023-02-23 00:51:11,820][05631] Avg episode reward: 4.067, avg true_objective: 3.733
[2023-02-23 00:51:11,896][05631] Num frames 2300...
[2023-02-23 00:51:12,021][05631] Num frames 2400...
[2023-02-23 00:51:12,141][05631] Num frames 2500...
[2023-02-23 00:51:12,259][05631] Num frames 2600...
[2023-02-23 00:51:12,342][05631] Avg episode rewards: #0: 4.034, true rewards: #0: 3.749
[2023-02-23 00:51:12,344][05631] Avg episode reward: 4.034, avg true_objective: 3.749
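The evaluation loop prints a running mean over all episodes finished so far: after two episodes of 3.840 the average is still 3.840, and the jump to 4.280 after episode three implies that episode scored 3 × 4.280 − 2 × 3.840 = 5.16. A sketch of the same bookkeeping:

```python
# Sketch of the running "Avg episode rewards" bookkeeping. With episode
# rewards 3.84, 3.84, 5.16 the running mean goes 3.840 -> 3.840 -> 4.280,
# matching the log above.
episode_rewards: list[float] = []

def report(reward: float) -> None:
    episode_rewards.append(reward)
    avg = sum(episode_rewards) / len(episode_rewards)
    print(f"Avg episode rewards: #0: {avg:.3f}")

for r in (3.84, 3.84, 5.16):
    report(r)
```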
[2023-02-23 00:51:12,452][05631] Num frames 2700...
[2023-02-23 00:51:12,567][05631] Num frames 2800...
[2023-02-23 00:51:12,689][05631] Num frames 2900...
[2023-02-23 00:51:12,803][05631] Num frames 3000...
[2023-02-23 00:51:12,932][05631] Num frames 3100...
[2023-02-23 00:51:12,993][05631] Avg episode rewards: #0: 4.255, true rewards: #0: 3.880
[2023-02-23 00:51:12,995][05631] Avg episode reward: 4.255, avg true_objective: 3.880
[2023-02-23 00:51:13,113][05631] Num frames 3200...
[2023-02-23 00:51:13,236][05631] Num frames 3300...
[2023-02-23 00:51:13,369][05631] Num frames 3400...
[2023-02-23 00:51:13,526][05631] Avg episode rewards: #0: 4.209, true rewards: #0: 3.876
[2023-02-23 00:51:13,528][05631] Avg episode reward: 4.209, avg true_objective: 3.876
[2023-02-23 00:51:13,554][05631] Num frames 3500...
[2023-02-23 00:51:13,678][05631] Num frames 3600...
[2023-02-23 00:51:13,801][05631] Num frames 3700...
[2023-02-23 00:51:13,924][05631] Num frames 3800...
[2023-02-23 00:51:14,038][05631] Num frames 3900...
[2023-02-23 00:51:14,135][05631] Avg episode rewards: #0: 4.336, true rewards: #0: 3.936
[2023-02-23 00:51:14,137][05631] Avg episode reward: 4.336, avg true_objective: 3.936
[2023-02-23 00:51:35,049][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-23 00:51:35,237][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 00:51:35,239][05631] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-23 00:51:35,246][05631] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-23 00:51:35,248][05631] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-23 00:51:35,250][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 00:51:35,252][05631] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-23 00:51:35,257][05631] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-23 00:51:35,258][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-23 00:51:35,259][05631] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-23 00:51:35,261][05631] Adding new argument 'hf_repository'='pittawat/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-23 00:51:35,262][05631] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-23 00:51:35,263][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-23 00:51:35,269][05631] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-23 00:51:35,270][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-23 00:51:35,271][05631] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-23 00:51:35,306][05631] RunningMeanStd input shape: (3, 72, 128)
[2023-02-23 00:51:35,308][05631] RunningMeanStd input shape: (1,)
[2023-02-23 00:51:35,336][05631] ConvEncoder: input_channels=3
[2023-02-23 00:51:35,400][05631] Conv encoder output size: 512
[2023-02-23 00:51:35,402][05631] Policy head output size: 512
[2023-02-23 00:51:35,434][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
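This second evaluation pass re-runs the enjoy script with push_to_hub=True, an episode/frame cap, and a target repository. A hedged sketch of the corresponding invocation; the entry-point module is an assumption (this log matches the Hugging Face Deep RL course VizDoom setup), while the argument values are copied from the config echo above:

```python
# Hedged sketch of the "evaluate, record video, push to hub" invocation.
# Only the flag values are taken from the log; the entry point below is an
# assumed example, not confirmed by this log.
import sys

sys.argv = [
    "enjoy",
    "--env=doom_health_gathering_supreme",
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_frames=100000",
    "--max_num_episodes=10",
    "--push_to_hub",
    "--hf_repository=pittawat/rl_course_vizdoom_health_gathering_supreme",
]
# from sf_examples.vizdoom.enjoy_vizdoom import main  # assumed entry point
# main()
```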
[2023-02-23 00:51:36,290][05631] Num frames 100...
[2023-02-23 00:51:36,467][05631] Num frames 200...
[2023-02-23 00:51:36,651][05631] Num frames 300...
[2023-02-23 00:51:36,865][05631] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-02-23 00:51:36,868][05631] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-02-23 00:51:36,908][05631] Num frames 400...
[2023-02-23 00:51:37,124][05631] Num frames 500...
[2023-02-23 00:51:37,321][05631] Num frames 600...
[2023-02-23 00:51:37,519][05631] Num frames 700...
[2023-02-23 00:51:37,712][05631] Num frames 800...
[2023-02-23 00:51:37,896][05631] Num frames 900...
[2023-02-23 00:51:38,009][05631] Avg episode rewards: #0: 5.640, true rewards: #0: 4.640
[2023-02-23 00:51:38,012][05631] Avg episode reward: 5.640, avg true_objective: 4.640
[2023-02-23 00:51:38,141][05631] Num frames 1000...
[2023-02-23 00:51:38,301][05631] Num frames 1100...
[2023-02-23 00:51:38,469][05631] Num frames 1200...
[2023-02-23 00:51:38,603][05631] Num frames 1300...
[2023-02-23 00:51:38,674][05631] Avg episode rewards: #0: 5.040, true rewards: #0: 4.373
[2023-02-23 00:51:38,675][05631] Avg episode reward: 5.040, avg true_objective: 4.373
[2023-02-23 00:51:38,785][05631] Num frames 1400...
[2023-02-23 00:51:38,906][05631] Num frames 1500...
[2023-02-23 00:51:39,030][05631] Num frames 1600...
[2023-02-23 00:51:39,208][05631] Avg episode rewards: #0: 4.740, true rewards: #0: 4.240
[2023-02-23 00:51:39,210][05631] Avg episode reward: 4.740, avg true_objective: 4.240
[2023-02-23 00:51:39,221][05631] Num frames 1700...
[2023-02-23 00:51:39,334][05631] Num frames 1800...
[2023-02-23 00:51:39,452][05631] Num frames 1900...
[2023-02-23 00:51:39,574][05631] Num frames 2000...
[2023-02-23 00:51:39,721][05631] Avg episode rewards: #0: 4.560, true rewards: #0: 4.160
[2023-02-23 00:51:39,723][05631] Avg episode reward: 4.560, avg true_objective: 4.160
[2023-02-23 00:51:39,752][05631] Num frames 2100...
[2023-02-23 00:51:39,874][05631] Num frames 2200...
[2023-02-23 00:51:39,995][05631] Num frames 2300...
[2023-02-23 00:51:40,113][05631] Num frames 2400...
[2023-02-23 00:51:40,283][05631] Avg episode rewards: #0: 4.493, true rewards: #0: 4.160
[2023-02-23 00:51:40,286][05631] Avg episode reward: 4.493, avg true_objective: 4.160
[2023-02-23 00:51:40,295][05631] Num frames 2500...
[2023-02-23 00:51:40,408][05631] Num frames 2600...
[2023-02-23 00:51:40,526][05631] Num frames 2700...
[2023-02-23 00:51:40,641][05631] Num frames 2800...
[2023-02-23 00:51:40,787][05631] Avg episode rewards: #0: 4.400, true rewards: #0: 4.114
[2023-02-23 00:51:40,789][05631] Avg episode reward: 4.400, avg true_objective: 4.114
[2023-02-23 00:51:40,818][05631] Num frames 2900...
[2023-02-23 00:51:40,943][05631] Num frames 3000...
[2023-02-23 00:51:41,058][05631] Num frames 3100...
[2023-02-23 00:51:41,172][05631] Num frames 3200...
[2023-02-23 00:51:41,286][05631] Num frames 3300...
[2023-02-23 00:51:41,458][05631] Avg episode rewards: #0: 4.740, true rewards: #0: 4.240
[2023-02-23 00:51:41,459][05631] Avg episode reward: 4.740, avg true_objective: 4.240
[2023-02-23 00:51:41,473][05631] Num frames 3400...
[2023-02-23 00:51:41,594][05631] Num frames 3500...
[2023-02-23 00:51:41,710][05631] Num frames 3600...
[2023-02-23 00:51:41,824][05631] Num frames 3700...
[2023-02-23 00:51:41,944][05631] Num frames 3800...
[2023-02-23 00:51:42,010][05631] Avg episode rewards: #0: 4.787, true rewards: #0: 4.231
[2023-02-23 00:51:42,013][05631] Avg episode reward: 4.787, avg true_objective: 4.231
[2023-02-23 00:51:42,125][05631] Num frames 3900...
[2023-02-23 00:51:42,253][05631] Num frames 4000...
[2023-02-23 00:51:42,370][05631] Num frames 4100...
[2023-02-23 00:51:42,528][05631] Avg episode rewards: #0: 4.692, true rewards: #0: 4.192
[2023-02-23 00:51:42,532][05631] Avg episode reward: 4.692, avg true_objective: 4.192
[2023-02-23 00:52:02,869][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-23 00:52:08,657][05631] The model has been pushed to https://huggingface.co/pittawat/rl_course_vizdoom_health_gathering_supreme
[2023-02-23 00:54:19,819][05631] Environment doom_basic already registered, overwriting...
[2023-02-23 00:54:19,822][05631] Environment doom_two_colors_easy already registered, overwriting...
[2023-02-23 00:54:19,825][05631] Environment doom_two_colors_hard already registered, overwriting...
[2023-02-23 00:54:19,827][05631] Environment doom_dm already registered, overwriting...
[2023-02-23 00:54:19,829][05631] Environment doom_dwango5 already registered, overwriting...
[2023-02-23 00:54:19,832][05631] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-02-23 00:54:19,833][05631] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-02-23 00:54:19,834][05631] Environment doom_my_way_home already registered, overwriting...
[2023-02-23 00:54:19,835][05631] Environment doom_deadly_corridor already registered, overwriting...
[2023-02-23 00:54:19,836][05631] Environment doom_defend_the_center already registered, overwriting...
[2023-02-23 00:54:19,838][05631] Environment doom_defend_the_line already registered, overwriting...
[2023-02-23 00:54:19,840][05631] Environment doom_health_gathering already registered, overwriting...
[2023-02-23 00:54:19,841][05631] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-02-23 00:54:19,842][05631] Environment doom_battle already registered, overwriting...
[2023-02-23 00:54:19,844][05631] Environment doom_battle2 already registered, overwriting...
[2023-02-23 00:54:19,845][05631] Environment doom_duel_bots already registered, overwriting...
[2023-02-23 00:54:19,847][05631] Environment doom_deathmatch_bots already registered, overwriting...
[2023-02-23 00:54:19,848][05631] Environment doom_duel already registered, overwriting...
[2023-02-23 00:54:19,849][05631] Environment doom_deathmatch_full already registered, overwriting...
[2023-02-23 00:54:19,850][05631] Environment doom_benchmark already registered, overwriting...
[2023-02-23 00:54:19,852][05631] register_encoder_factory:
[2023-02-23 00:54:19,883][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 00:54:19,884][05631] Overriding arg 'gamma' with value 0.96 passed from command line
[2023-02-23 00:54:19,886][05631] Overriding arg 'train_for_env_steps' with value 15000000 passed from command line
[2023-02-23 00:54:19,889][05631] Experiment dir /content/train_dir/default_experiment already exists!
[2023-02-23 00:54:19,890][05631] Resuming existing experiment from /content/train_dir/default_experiment...
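With restart_behavior=resume, the runner re-registers the Doom environments, reloads the saved config.json, and applies only the args explicitly overridden on the command line (here gamma=0.96 and train_for_env_steps=15000000). An illustrative sketch of that merge, not Sample Factory's actual implementation:

```python
# Illustrative sketch of resume-with-overrides: the saved config.json is the
# base, and only explicitly passed args replace saved values.
import json

with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

overrides = {"gamma": 0.96, "train_for_env_steps": 15_000_000}
for key, value in overrides.items():
    print(f"Overriding arg {key!r} with value {value} passed from command line")
    cfg[key] = value
```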
[2023-02-23 00:54:19,891][05631] Weights and Biases integration disabled
[2023-02-23 00:54:19,895][05631] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-02-23 00:54:22,147][05631] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.96
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=15000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2023-02-23 00:54:22,151][05631] Saving configuration to /content/train_dir/default_experiment/config.json...
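Several values in this configuration determine one another: with num_workers=8, num_envs_per_worker=4 and rollout=32, one sampling round yields 8 × 4 × 32 = 1024 transitions, exactly batch_size; and since env_frameskip=4, each train step of 1024 samples advances the env-frame counter by 4096. Quick check:

```python
# Sanity-check the batch geometry implied by the config dump above.
num_workers = 8
num_envs_per_worker = 4
rollout = 32
env_frameskip = 4

samples_per_round = num_workers * num_envs_per_worker * rollout
print(samples_per_round)                  # 1024 == batch_size
print(samples_per_round * env_frameskip)  # 4096 env frames per train step
```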
[2023-02-23 00:54:22,156][05631] Rollout worker 0 uses device cpu
[2023-02-23 00:54:22,158][05631] Rollout worker 1 uses device cpu
[2023-02-23 00:54:22,161][05631] Rollout worker 2 uses device cpu
[2023-02-23 00:54:22,162][05631] Rollout worker 3 uses device cpu
[2023-02-23 00:54:22,163][05631] Rollout worker 4 uses device cpu
[2023-02-23 00:54:22,172][05631] Rollout worker 5 uses device cpu
[2023-02-23 00:54:22,175][05631] Rollout worker 6 uses device cpu
[2023-02-23 00:54:22,177][05631] Rollout worker 7 uses device cpu
[2023-02-23 00:54:22,353][05631] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:54:22,356][05631] InferenceWorker_p0-w0: min num requests: 2
[2023-02-23 00:54:22,397][05631] Starting all processes...
[2023-02-23 00:54:22,400][05631] Starting process learner_proc0
[2023-02-23 00:54:22,591][05631] Starting all processes...
[2023-02-23 00:54:22,608][05631] Starting process inference_proc0-0
[2023-02-23 00:54:22,609][05631] Starting process rollout_proc0
[2023-02-23 00:54:22,612][05631] Starting process rollout_proc1
[2023-02-23 00:54:22,760][05631] Starting process rollout_proc2
[2023-02-23 00:54:22,761][05631] Starting process rollout_proc3
[2023-02-23 00:54:22,762][05631] Starting process rollout_proc4
[2023-02-23 00:54:22,762][05631] Starting process rollout_proc5
[2023-02-23 00:54:22,762][05631] Starting process rollout_proc6
[2023-02-23 00:54:22,762][05631] Starting process rollout_proc7
[2023-02-23 00:54:31,419][45637] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:54:31,421][45637] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-23 00:54:31,476][45637] Num visible devices: 1
[2023-02-23 00:54:31,516][45637] Starting seed is not provided
[2023-02-23 00:54:31,517][45637] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:54:31,518][45637] Initializing actor-critic model on device cuda:0
[2023-02-23 00:54:31,519][45637] RunningMeanStd input shape: (3, 72, 128)
[2023-02-23 00:54:31,521][45637] RunningMeanStd input shape: (1,)
[2023-02-23 00:54:31,653][45637] ConvEncoder: input_channels=3
[2023-02-23 00:54:32,511][45651] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:54:32,515][45651] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-23 00:54:32,580][45637] Conv encoder output size: 512
[2023-02-23 00:54:32,587][45637] Policy head output size: 512
[2023-02-23 00:54:32,594][45651] Num visible devices: 1
[2023-02-23 00:54:32,712][45637] Created Actor Critic model with architecture:
[2023-02-23 00:54:32,716][45637] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-23 00:54:32,847][45652] Worker 0 uses CPU cores [0]
[2023-02-23 00:54:33,106][45653] Worker 2 uses CPU cores [0]
[2023-02-23 00:54:33,432][45660] Worker 1 uses CPU cores [1]
[2023-02-23 00:54:33,794][45670] Worker 6 uses CPU cores [0]
[2023-02-23 00:54:33,902][45662] Worker 3 uses CPU cores [1]
[2023-02-23 00:54:34,056][45672] Worker 7 uses CPU cores [1]
[2023-02-23 00:54:34,067][45664] Worker 4 uses CPU cores [0]
[2023-02-23 00:54:34,145][45674] Worker 5 uses CPU cores [1]
[2023-02-23 00:54:36,782][45637] Using optimizer
[2023-02-23 00:54:36,783][45637] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
[2023-02-23 00:54:36,825][45637] Loading model from checkpoint
[2023-02-23 00:54:36,832][45637] Loaded experiment state at self.train_step=2447, self.env_steps=10022912
[2023-02-23 00:54:36,833][45637] Initialized policy 0 weights for model version 2447
[2023-02-23 00:54:36,842][45637] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-23 00:54:36,853][45637] LearnerWorker_p0 finished initialization!
[2023-02-23 00:54:37,027][45651] RunningMeanStd input shape: (3, 72, 128)
[2023-02-23 00:54:37,029][45651] RunningMeanStd input shape: (1,)
[2023-02-23 00:54:37,047][45651] ConvEncoder: input_channels=3
[2023-02-23 00:54:37,203][45651] Conv encoder output size: 512
[2023-02-23 00:54:37,205][45651] Policy head output size: 512
[2023-02-23 00:54:39,653][05631] Inference worker 0-0 is ready!
[2023-02-23 00:54:39,655][05631] All inference workers are ready! Signal rollout workers to start!
[2023-02-23 00:54:39,750][45664] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,754][45670] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,752][45653] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,754][45652] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,759][45674] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,762][45662] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,768][45672] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,761][45660] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-23 00:54:39,895][05631] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10022912. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:54:40,581][45670] Decorrelating experience for 0 frames...
[2023-02-23 00:54:40,585][45653] Decorrelating experience for 0 frames...
[2023-02-23 00:54:40,964][45653] Decorrelating experience for 32 frames...
[2023-02-23 00:54:41,036][45662] Decorrelating experience for 0 frames...
[2023-02-23 00:54:41,039][45672] Decorrelating experience for 0 frames...
[2023-02-23 00:54:41,041][45674] Decorrelating experience for 0 frames...
[2023-02-23 00:54:41,708][45653] Decorrelating experience for 64 frames...
[2023-02-23 00:54:41,761][45670] Decorrelating experience for 32 frames...
[2023-02-23 00:54:41,967][45660] Decorrelating experience for 0 frames...
[2023-02-23 00:54:42,343][05631] Heartbeat connected on Batcher_0
[2023-02-23 00:54:42,350][05631] Heartbeat connected on LearnerWorker_p0
[2023-02-23 00:54:42,369][45674] Decorrelating experience for 32 frames...
[2023-02-23 00:54:42,376][45662] Decorrelating experience for 32 frames...
[2023-02-23 00:54:42,384][45672] Decorrelating experience for 32 frames...
[2023-02-23 00:54:42,383][05631] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-23 00:54:42,431][45670] Decorrelating experience for 64 frames...
[2023-02-23 00:54:42,949][45664] Decorrelating experience for 0 frames...
[2023-02-23 00:54:43,386][45660] Decorrelating experience for 32 frames...
[2023-02-23 00:54:43,521][45653] Decorrelating experience for 96 frames...
[2023-02-23 00:54:43,664][45670] Decorrelating experience for 96 frames...
[2023-02-23 00:54:43,713][05631] Heartbeat connected on RolloutWorker_w2
[2023-02-23 00:54:43,951][05631] Heartbeat connected on RolloutWorker_w6
[2023-02-23 00:54:43,969][45674] Decorrelating experience for 64 frames...
[2023-02-23 00:54:43,972][45662] Decorrelating experience for 64 frames...
[2023-02-23 00:54:43,975][45672] Decorrelating experience for 64 frames...
[2023-02-23 00:54:44,217][45664] Decorrelating experience for 32 frames...
[2023-02-23 00:54:44,895][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10022912. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:54:45,587][45660] Decorrelating experience for 64 frames...
[2023-02-23 00:54:45,723][45672] Decorrelating experience for 96 frames...
[2023-02-23 00:54:45,737][45662] Decorrelating experience for 96 frames...
[2023-02-23 00:54:45,740][45674] Decorrelating experience for 96 frames...
[2023-02-23 00:54:45,899][45664] Decorrelating experience for 64 frames...
[2023-02-23 00:54:46,056][05631] Heartbeat connected on RolloutWorker_w3
[2023-02-23 00:54:46,061][05631] Heartbeat connected on RolloutWorker_w7
[2023-02-23 00:54:46,066][05631] Heartbeat connected on RolloutWorker_w5
[2023-02-23 00:54:48,065][45660] Decorrelating experience for 96 frames...
[2023-02-23 00:54:49,137][05631] Heartbeat connected on RolloutWorker_w1
[2023-02-23 00:54:49,569][45652] Decorrelating experience for 0 frames...
[2023-02-23 00:54:49,895][05631] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10022912. Throughput: 0: 191.0. Samples: 1910. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-23 00:54:49,904][05631] Avg episode reward: [(0, '2.787')]
[2023-02-23 00:54:50,044][45664] Decorrelating experience for 96 frames...
[2023-02-23 00:54:50,400][45637] Signal inference workers to stop experience collection...
[2023-02-23 00:54:50,437][45651] InferenceWorker_p0-w0: stopping experience collection
[2023-02-23 00:54:50,762][05631] Heartbeat connected on RolloutWorker_w4
[2023-02-23 00:54:51,399][45652] Decorrelating experience for 32 frames...
[2023-02-23 00:54:51,728][45637] Signal inference workers to resume experience collection...
[2023-02-23 00:54:51,728][45651] InferenceWorker_p0-w0: resuming experience collection
[2023-02-23 00:54:54,895][05631] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 10035200. Throughput: 0: 219.2. Samples: 3288. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-23 00:54:54,900][05631] Avg episode reward: [(0, '3.027')]
[2023-02-23 00:54:55,514][45652] Decorrelating experience for 64 frames...
[2023-02-23 00:54:58,813][45652] Decorrelating experience for 96 frames...
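The periodic "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report throughput over three trailing windows, which is why the first reports after a (re)start show nan or 0.0: the windows are still empty. An illustrative sketch of windowed FPS bookkeeping, in the spirit of these lines rather than Sample Factory's exact code:

```python
# Illustrative trailing-window FPS, in the spirit of the
# "(10 sec: ..., 60 sec: ..., 300 sec: ...)" reports.
import collections
import time

samples = collections.deque()  # (timestamp, total_env_frames) pairs

def record(total_frames: int, now: float | None = None) -> None:
    now = time.monotonic() if now is None else now
    samples.append((now, total_frames))
    while samples and now - samples[0][0] > 300:  # keep 5 minutes of history
        samples.popleft()

def fps(window_sec: float) -> float:
    if len(samples) < 2:
        return float("nan")  # no history yet, like the first report above
    now, frames = samples[-1]
    past = [(t, f) for t, f in samples if now - t <= window_sec]
    t0, f0 = past[0]
    return (frames - f0) / (now - t0) if now > t0 else 0.0
```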
[2023-02-23 00:54:59,251][05631] Heartbeat connected on RolloutWorker_w0
[2023-02-23 00:54:59,895][05631] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 10047488. Throughput: 0: 256.6. Samples: 5132. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2023-02-23 00:54:59,902][05631] Avg episode reward: [(0, '3.913')]
[2023-02-23 00:55:03,567][45651] Updated weights for policy 0, policy_version 2457 (0.0654)
[2023-02-23 00:55:04,895][05631] Fps is (10 sec: 3276.7, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 10067968. Throughput: 0: 433.6. Samples: 10840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-23 00:55:04,898][05631] Avg episode reward: [(0, '4.593')]
[2023-02-23 00:55:09,895][05631] Fps is (10 sec: 3686.4, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 10084352. Throughput: 0: 551.1. Samples: 16534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-23 00:55:09,898][05631] Avg episode reward: [(0, '4.883')]
[2023-02-23 00:55:14,895][05631] Fps is (10 sec: 2867.2, 60 sec: 2106.5, 300 sec: 2106.5). Total num frames: 10096640. Throughput: 0: 527.7. Samples: 18470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:55:14,903][05631] Avg episode reward: [(0, '4.908')]
[2023-02-23 00:55:16,754][45651] Updated weights for policy 0, policy_version 2467 (0.0030)
[2023-02-23 00:55:19,895][05631] Fps is (10 sec: 2867.2, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 10113024. Throughput: 0: 558.4. Samples: 22336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:55:19,902][05631] Avg episode reward: [(0, '4.924')]
[2023-02-23 00:55:24,895][05631] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 10133504. Throughput: 0: 629.9. Samples: 28344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:55:24,897][05631] Avg episode reward: [(0, '4.619')]
[2023-02-23 00:55:27,532][45651] Updated weights for policy 0, policy_version 2477 (0.0027)
[2023-02-23 00:55:29,895][05631] Fps is (10 sec: 4096.0, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 10153984. Throughput: 0: 698.1. Samples: 31416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:55:29,898][05631] Avg episode reward: [(0, '4.531')]
[2023-02-23 00:55:34,895][05631] Fps is (10 sec: 3276.8, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 10166272. Throughput: 0: 756.6. Samples: 35958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:55:34,906][05631] Avg episode reward: [(0, '4.622')]
[2023-02-23 00:55:39,896][05631] Fps is (10 sec: 2457.3, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 10178560. Throughput: 0: 814.6. Samples: 39948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:55:39,899][05631] Avg episode reward: [(0, '4.828')]
[2023-02-23 00:55:41,409][45651] Updated weights for policy 0, policy_version 2487 (0.0034)
[2023-02-23 00:55:44,895][05631] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2709.7). Total num frames: 10199040. Throughput: 0: 837.3. Samples: 42810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:55:44,898][05631] Avg episode reward: [(0, '4.735')]
[2023-02-23 00:55:49,895][05631] Fps is (10 sec: 4096.5, 60 sec: 3276.8, 300 sec: 2808.7). Total num frames: 10219520. Throughput: 0: 848.6. Samples: 49028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:55:49,898][05631] Avg episode reward: [(0, '4.727')]
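The "Updated weights for policy 0, policy_version ..." messages arrive roughly every 10 versions. Since each train step consumes batch_size=1024 samples at env_frameskip=4, ten versions correspond to about 40960 env frames; spread over the ~13.2 s between the version 2457 and 2467 messages, that is roughly 3100 FPS, in line with the surrounding "Fps is" reports. Quick check:

```python
# Cross-check throughput from the weight-update cadence: versions 2457 -> 2467
# span 00:55:03.567 to 00:55:16.754, i.e. about 13.19 seconds.
versions = 2467 - 2457
frames_per_version = 1024 * 4  # batch_size * env_frameskip
elapsed_sec = 13.187

print(versions * frames_per_version / elapsed_sec)  # ~3100 env frames/sec
```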
[2023-02-23 00:55:51,857][45651] Updated weights for policy 0, policy_version 2497 (0.0015)
[2023-02-23 00:55:54,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2785.3). Total num frames: 10231808. Throughput: 0: 825.7. Samples: 53690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:55:54,898][05631] Avg episode reward: [(0, '4.731')]
[2023-02-23 00:55:59,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2816.0). Total num frames: 10248192. Throughput: 0: 825.6. Samples: 55620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:55:59,898][05631] Avg episode reward: [(0, '4.853')]
[2023-02-23 00:56:04,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2843.1). Total num frames: 10264576. Throughput: 0: 846.6. Samples: 60432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:56:04,898][05631] Avg episode reward: [(0, '4.780')]
[2023-02-23 00:56:05,187][45651] Updated weights for policy 0, policy_version 2507 (0.0020)
[2023-02-23 00:56:09,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2912.7). Total num frames: 10285056. Throughput: 0: 852.0. Samples: 66686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:56:09,902][05631] Avg episode reward: [(0, '4.848')]
[2023-02-23 00:56:14,895][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 2931.9). Total num frames: 10301440. Throughput: 0: 841.4. Samples: 69280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:56:14,899][05631] Avg episode reward: [(0, '4.763')]
[2023-02-23 00:56:17,335][45651] Updated weights for policy 0, policy_version 2517 (0.0028)
[2023-02-23 00:56:19,897][05631] Fps is (10 sec: 2866.7, 60 sec: 3345.0, 300 sec: 2908.1). Total num frames: 10313728. Throughput: 0: 827.7. Samples: 73204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-23 00:56:19,900][05631] Avg episode reward: [(0, '4.814')]
[2023-02-23 00:56:19,915][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002518_10313728.pth...
[2023-02-23 00:56:20,104][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth
[2023-02-23 00:56:24,895][05631] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 2925.7). Total num frames: 10330112. Throughput: 0: 844.7. Samples: 77960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:56:24,897][05631] Avg episode reward: [(0, '4.862')]
[2023-02-23 00:56:29,237][45651] Updated weights for policy 0, policy_version 2527 (0.0033)
[2023-02-23 00:56:29,895][05631] Fps is (10 sec: 3687.1, 60 sec: 3276.8, 300 sec: 2978.9). Total num frames: 10350592. Throughput: 0: 848.4. Samples: 80988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:56:29,898][05631] Avg episode reward: [(0, '4.880')]
[2023-02-23 00:56:34,900][05631] Fps is (10 sec: 3684.7, 60 sec: 3344.8, 300 sec: 2991.7). Total num frames: 10366976. Throughput: 0: 836.6. Samples: 86678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:56:34,902][05631] Avg episode reward: [(0, '4.507')]
[2023-02-23 00:56:39,903][05631] Fps is (10 sec: 3274.1, 60 sec: 3412.9, 300 sec: 3003.5). Total num frames: 10383360. Throughput: 0: 821.3. Samples: 90656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:56:39,907][05631] Avg episode reward: [(0, '4.497')]
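Checkpoint filenames encode both progress counters: checkpoint_000002518_10313728.pth is train step 2518 at env step 10313728, and 2518 × 4096 = 10313728 (each train step of 1024 samples covers 4096 env frames at frameskip 4). A parsing sketch:

```python
# Parse the counters out of a checkpoint filename and verify the
# 4096-env-frames-per-train-step relationship seen throughout this log.
import re

name = "checkpoint_000002518_10313728.pth"
match = re.match(r"checkpoint_(\d+)_(\d+)\.pth", name)
train_step, env_steps = map(int, match.groups())

assert env_steps == train_step * 1024 * 4  # batch_size * env_frameskip
print(train_step, env_steps)               # 2518 10313728
```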
[2023-02-23 00:56:42,485][45651] Updated weights for policy 0, policy_version 2537 (0.0020)
[2023-02-23 00:56:44,895][05631] Fps is (10 sec: 3278.3, 60 sec: 3345.1, 300 sec: 3014.7). Total num frames: 10399744. Throughput: 0: 822.2. Samples: 92620. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-23 00:56:44,902][05631] Avg episode reward: [(0, '4.669')]
[2023-02-23 00:56:49,895][05631] Fps is (10 sec: 3689.5, 60 sec: 3345.1, 300 sec: 3056.2). Total num frames: 10420224. Throughput: 0: 850.7. Samples: 98712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:56:49,902][05631] Avg episode reward: [(0, '4.774')]
[2023-02-23 00:56:52,595][45651] Updated weights for policy 0, policy_version 2547 (0.0013)
[2023-02-23 00:56:54,895][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3064.4). Total num frames: 10436608. Throughput: 0: 838.9. Samples: 104436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:56:54,898][05631] Avg episode reward: [(0, '4.482')]
[2023-02-23 00:56:59,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3042.7). Total num frames: 10448896. Throughput: 0: 823.1. Samples: 106320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:56:59,904][05631] Avg episode reward: [(0, '4.475')]
[2023-02-23 00:57:04,896][05631] Fps is (10 sec: 2867.0, 60 sec: 3345.0, 300 sec: 3050.8). Total num frames: 10465280. Throughput: 0: 822.2. Samples: 110200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:57:04,899][05631] Avg episode reward: [(0, '4.676')]
[2023-02-23 00:57:06,618][45651] Updated weights for policy 0, policy_version 2557 (0.0020)
[2023-02-23 00:57:09,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3085.7). Total num frames: 10485760. Throughput: 0: 850.8. Samples: 116248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:57:09,902][05631] Avg episode reward: [(0, '4.709')]
[2023-02-23 00:57:14,895][05631] Fps is (10 sec: 3686.7, 60 sec: 3345.1, 300 sec: 3091.8). Total num frames: 10502144. Throughput: 0: 851.7. Samples: 119314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:57:14,900][05631] Avg episode reward: [(0, '4.603')]
[2023-02-23 00:57:18,298][45651] Updated weights for policy 0, policy_version 2567 (0.0020)
[2023-02-23 00:57:19,898][05631] Fps is (10 sec: 2866.5, 60 sec: 3345.0, 300 sec: 3072.0). Total num frames: 10514432. Throughput: 0: 822.2. Samples: 123674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:57:19,903][05631] Avg episode reward: [(0, '4.601')]
[2023-02-23 00:57:24,895][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.0, 300 sec: 3078.2). Total num frames: 10530816. Throughput: 0: 820.0. Samples: 127548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:57:24,898][05631] Avg episode reward: [(0, '4.578')]
[2023-02-23 00:57:29,895][05631] Fps is (10 sec: 3687.3, 60 sec: 3345.1, 300 sec: 3108.1). Total num frames: 10551296. Throughput: 0: 842.8. Samples: 130544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:57:29,904][05631] Avg episode reward: [(0, '4.698')]
[2023-02-23 00:57:30,882][45651] Updated weights for policy 0, policy_version 2577 (0.0031)
[2023-02-23 00:57:34,895][05631] Fps is (10 sec: 4096.2, 60 sec: 3413.6, 300 sec: 3136.4). Total num frames: 10571776. Throughput: 0: 842.9. Samples: 136642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:57:34,898][05631] Avg episode reward: [(0, '4.803')]
[2023-02-23 00:57:39,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.5, 300 sec: 3117.5). Total num frames: 10584064. Throughput: 0: 814.7. Samples: 141096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:57:39,900][05631] Avg episode reward: [(0, '4.791')]
[2023-02-23 00:57:44,034][45651] Updated weights for policy 0, policy_version 2587 (0.0025)
[2023-02-23 00:57:44,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3099.7). Total num frames: 10596352. Throughput: 0: 815.3. Samples: 143008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:57:44,899][05631] Avg episode reward: [(0, '4.920')]
[2023-02-23 00:57:49,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3125.9). Total num frames: 10616832. Throughput: 0: 842.1. Samples: 148092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:57:49,898][05631] Avg episode reward: [(0, '5.085')]
[2023-02-23 00:57:54,539][45651] Updated weights for policy 0, policy_version 2597 (0.0016)
[2023-02-23 00:57:54,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3150.8). Total num frames: 10637312. Throughput: 0: 846.8. Samples: 154356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-23 00:57:54,903][05631] Avg episode reward: [(0, '4.804')]
[2023-02-23 00:57:59,901][05631] Fps is (10 sec: 3274.7, 60 sec: 3344.7, 300 sec: 3133.3). Total num frames: 10649600. Throughput: 0: 830.9. Samples: 156708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:57:59,905][05631] Avg episode reward: [(0, '4.581')]
[2023-02-23 00:58:04,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3136.9). Total num frames: 10665984. Throughput: 0: 818.8. Samples: 160520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:58:04,898][05631] Avg episode reward: [(0, '4.692')]
[2023-02-23 00:58:08,407][45651] Updated weights for policy 0, policy_version 2607 (0.0031)
[2023-02-23 00:58:09,895][05631] Fps is (10 sec: 3278.9, 60 sec: 3276.8, 300 sec: 3140.3). Total num frames: 10682368. Throughput: 0: 850.8. Samples: 165832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:58:09,904][05631] Avg episode reward: [(0, '4.884')]
[2023-02-23 00:58:14,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3162.5). Total num frames: 10702848. Throughput: 0: 852.1. Samples: 168890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:58:14,897][05631] Avg episode reward: [(0, '4.921')]
[2023-02-23 00:58:19,350][45651] Updated weights for policy 0, policy_version 2617 (0.0013)
[2023-02-23 00:58:19,895][05631] Fps is (10 sec: 3686.3, 60 sec: 3413.5, 300 sec: 3165.1). Total num frames: 10719232. Throughput: 0: 838.7. Samples: 174386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:58:19,899][05631] Avg episode reward: [(0, '4.714')]
[2023-02-23 00:58:19,911][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002617_10719232.pth...
[2023-02-23 00:58:20,116][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth
[2023-02-23 00:58:24,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3149.4). Total num frames: 10731520. Throughput: 0: 826.2. Samples: 178274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-23 00:58:24,902][05631] Avg episode reward: [(0, '4.534')]
[2023-02-23 00:58:29,895][05631] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3152.1). Total num frames: 10747904. Throughput: 0: 832.3. Samples: 180462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:58:29,898][05631] Avg episode reward: [(0, '4.472')]
[2023-02-23 00:58:32,043][45651] Updated weights for policy 0, policy_version 2627 (0.0013)
[2023-02-23 00:58:34,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3172.2). Total num frames: 10768384. Throughput: 0: 857.9. Samples: 186696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:58:34,903][05631] Avg episode reward: [(0, '4.644')]
[2023-02-23 00:58:39,897][05631] Fps is (10 sec: 4095.2, 60 sec: 3413.2, 300 sec: 3191.4). Total num frames: 10788864. Throughput: 0: 837.3. Samples: 192036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 00:58:39,905][05631] Avg episode reward: [(0, '4.651')]
[2023-02-23 00:58:44,281][45651] Updated weights for policy 0, policy_version 2637 (0.0014)
[2023-02-23 00:58:44,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3176.5). Total num frames: 10801152. Throughput: 0: 829.3. Samples: 194020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:58:44,904][05631] Avg episode reward: [(0, '4.730')]
[2023-02-23 00:58:49,895][05631] Fps is (10 sec: 2867.8, 60 sec: 3345.1, 300 sec: 3178.5). Total num frames: 10817536. Throughput: 0: 838.8. Samples: 198268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:58:49,897][05631] Avg episode reward: [(0, '5.068')]
[2023-02-23 00:58:54,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3196.5). Total num frames: 10838016. Throughput: 0: 857.2. Samples: 204404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:58:54,898][05631] Avg episode reward: [(0, '5.087')]
[2023-02-23 00:58:55,545][45651] Updated weights for policy 0, policy_version 2647 (0.0021)
[2023-02-23 00:58:59,898][05631] Fps is (10 sec: 3685.4, 60 sec: 3413.5, 300 sec: 3198.0). Total num frames: 10854400. Throughput: 0: 858.9. Samples: 207542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-23 00:58:59,902][05631] Avg episode reward: [(0, '4.912')]
[2023-02-23 00:59:04,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3199.5). Total num frames: 10870784. Throughput: 0: 829.3. Samples: 211706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 00:59:04,900][05631] Avg episode reward: [(0, '4.666')]
[2023-02-23 00:59:09,375][45651] Updated weights for policy 0, policy_version 2657 (0.0023)
[2023-02-23 00:59:09,895][05631] Fps is (10 sec: 2868.0, 60 sec: 3345.1, 300 sec: 3185.8). Total num frames: 10883072. Throughput: 0: 837.0. Samples: 215940. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-23 00:59:09,903][05631] Avg episode reward: [(0, '4.553')]
[2023-02-23 00:59:14,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3202.3). Total num frames: 10903552. Throughput: 0: 857.7. Samples: 219060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-23 00:59:14,901][05631] Avg episode reward: [(0, '4.504')]
[2023-02-23 00:59:19,232][45651] Updated weights for policy 0, policy_version 2667 (0.0038)
[2023-02-23 00:59:19,901][05631] Fps is (10 sec: 4093.4, 60 sec: 3413.0, 300 sec: 3218.2). Total num frames: 10924032. Throughput: 0: 859.9. Samples: 225396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:59:19,904][05631] Avg episode reward: [(0, '4.636')]
[2023-02-23 00:59:24,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3204.9). Total num frames: 10936320. Throughput: 0: 830.3. Samples: 229398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-23 00:59:24,900][05631] Avg episode reward: [(0, '4.745')]
[2023-02-23 00:59:24,900][05631] Avg episode reward: [(0, '4.745')] [2023-02-23 00:59:29,895][05631] Fps is (10 sec: 2459.2, 60 sec: 3345.1, 300 sec: 3192.1). Total num frames: 10948608. Throughput: 0: 827.3. Samples: 231248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 00:59:29,901][05631] Avg episode reward: [(0, '4.467')] [2023-02-23 00:59:33,285][45651] Updated weights for policy 0, policy_version 2677 (0.0033) [2023-02-23 00:59:34,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 10969088. Throughput: 0: 847.7. Samples: 236416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:59:34,904][05631] Avg episode reward: [(0, '4.595')] [2023-02-23 00:59:39,897][05631] Fps is (10 sec: 4095.4, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 10989568. Throughput: 0: 851.6. Samples: 242726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 00:59:39,904][05631] Avg episode reward: [(0, '4.756')] [2023-02-23 00:59:44,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3318.5). Total num frames: 11001856. Throughput: 0: 824.9. Samples: 244662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:59:44,900][05631] Avg episode reward: [(0, '4.875')] [2023-02-23 00:59:45,305][45651] Updated weights for policy 0, policy_version 2687 (0.0017) [2023-02-23 00:59:49,895][05631] Fps is (10 sec: 2458.0, 60 sec: 3276.8, 300 sec: 3318.5). Total num frames: 11014144. Throughput: 0: 818.7. Samples: 248546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:59:49,899][05631] Avg episode reward: [(0, '4.856')] [2023-02-23 00:59:54,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11038720. Throughput: 0: 850.3. Samples: 254202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 00:59:54,897][05631] Avg episode reward: [(0, '4.858')] [2023-02-23 00:59:56,934][45651] Updated weights for policy 0, policy_version 2697 (0.0013) [2023-02-23 00:59:59,895][05631] Fps is (10 sec: 4505.6, 60 sec: 3413.5, 300 sec: 3360.1). Total num frames: 11059200. Throughput: 0: 850.9. Samples: 257352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 00:59:59,899][05631] Avg episode reward: [(0, '4.786')] [2023-02-23 01:00:04,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 11071488. Throughput: 0: 819.4. Samples: 262264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:00:04,898][05631] Avg episode reward: [(0, '4.617')] [2023-02-23 01:00:09,897][05631] Fps is (10 sec: 2457.2, 60 sec: 3345.0, 300 sec: 3346.2). Total num frames: 11083776. Throughput: 0: 819.7. Samples: 266286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:00:09,906][05631] Avg episode reward: [(0, '4.651')] [2023-02-23 01:00:10,279][45651] Updated weights for policy 0, policy_version 2707 (0.0020) [2023-02-23 01:00:14,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11104256. Throughput: 0: 837.0. Samples: 268914. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:00:14,900][05631] Avg episode reward: [(0, '4.596')] [2023-02-23 01:00:19,895][05631] Fps is (10 sec: 4096.6, 60 sec: 3345.4, 300 sec: 3360.1). Total num frames: 11124736. Throughput: 0: 860.4. Samples: 275136.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:00:19,903][05631] Avg episode reward: [(0, '4.642')] [2023-02-23 01:00:19,917][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002716_11124736.pth... [2023-02-23 01:00:20,080][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002518_10313728.pth [2023-02-23 01:00:20,708][45651] Updated weights for policy 0, policy_version 2717 (0.0014) [2023-02-23 01:00:24,898][05631] Fps is (10 sec: 3275.7, 60 sec: 3344.9, 300 sec: 3332.3). Total num frames: 11137024. Throughput: 0: 825.5. Samples: 279874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:00:24,901][05631] Avg episode reward: [(0, '4.615')] [2023-02-23 01:00:29,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 11153408. Throughput: 0: 826.8. Samples: 281866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:00:29,898][05631] Avg episode reward: [(0, '4.647')] [2023-02-23 01:00:34,604][45651] Updated weights for policy 0, policy_version 2727 (0.0024) [2023-02-23 01:00:34,895][05631] Fps is (10 sec: 3277.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11169792. Throughput: 0: 839.7. Samples: 286334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:00:34,898][05631] Avg episode reward: [(0, '4.733')] [2023-02-23 01:00:39,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.2, 300 sec: 3360.1). Total num frames: 11190272. Throughput: 0: 854.0. Samples: 292632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:00:39,898][05631] Avg episode reward: [(0, '4.762')] [2023-02-23 01:00:44,897][05631] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3346.2). Total num frames: 11206656. Throughput: 0: 847.3. Samples: 295480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:00:44,903][05631] Avg episode reward: [(0, '4.719')] [2023-02-23 01:00:45,871][45651] Updated weights for policy 0, policy_version 2737 (0.0032) [2023-02-23 01:00:49,897][05631] Fps is (10 sec: 2866.5, 60 sec: 3413.2, 300 sec: 3346.2). Total num frames: 11218944. Throughput: 0: 824.6. Samples: 299374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:00:49,904][05631] Avg episode reward: [(0, '4.637')] [2023-02-23 01:00:54,896][05631] Fps is (10 sec: 2867.5, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 11235328. Throughput: 0: 844.2. Samples: 304274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:00:54,899][05631] Avg episode reward: [(0, '4.677')] [2023-02-23 01:00:58,096][45651] Updated weights for policy 0, policy_version 2747 (0.0013) [2023-02-23 01:00:59,895][05631] Fps is (10 sec: 3687.2, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 11255808. Throughput: 0: 855.7. Samples: 307420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:00:59,903][05631] Avg episode reward: [(0, '4.830')] [2023-02-23 01:01:04,897][05631] Fps is (10 sec: 3685.8, 60 sec: 3344.9, 300 sec: 3346.2). Total num frames: 11272192. Throughput: 0: 843.7. Samples: 313104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:01:04,900][05631] Avg episode reward: [(0, '4.840')] [2023-02-23 01:01:09,895][05631] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3332.3). Total num frames: 11284480. Throughput: 0: 828.4. Samples: 317148. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:01:09,898][05631] Avg episode reward: [(0, '4.770')] [2023-02-23 01:01:11,463][45651] Updated weights for policy 0, policy_version 2757 (0.0043) [2023-02-23 01:01:14,895][05631] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3346.2). Total num frames: 11300864. Throughput: 0: 826.7. Samples: 319066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-23 01:01:14,899][05631] Avg episode reward: [(0, '4.703')] [2023-02-23 01:01:19,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3360.1). Total num frames: 11321344. Throughput: 0: 860.7. Samples: 325066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:01:19,900][05631] Avg episode reward: [(0, '4.784')] [2023-02-23 01:01:21,790][45651] Updated weights for policy 0, policy_version 2767 (0.0017) [2023-02-23 01:01:24,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3360.1). Total num frames: 11341824. Throughput: 0: 851.3. Samples: 330942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:01:24,904][05631] Avg episode reward: [(0, '4.672')] [2023-02-23 01:01:29,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3346.3). Total num frames: 11354112. Throughput: 0: 831.9. Samples: 332914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:01:29,897][05631] Avg episode reward: [(0, '4.539')] [2023-02-23 01:01:34,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3346.3). Total num frames: 11370496. Throughput: 0: 833.2. Samples: 336864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:01:34,903][05631] Avg episode reward: [(0, '4.577')] [2023-02-23 01:01:35,597][45651] Updated weights for policy 0, policy_version 2777 (0.0013) [2023-02-23 01:01:39,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11390976. Throughput: 0: 858.8. Samples: 342918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:01:39,902][05631] Avg episode reward: [(0, '4.821')] [2023-02-23 01:01:44,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3360.1). Total num frames: 11411456. Throughput: 0: 858.8. Samples: 346066. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:01:44,898][05631] Avg episode reward: [(0, '4.698')] [2023-02-23 01:01:46,031][45651] Updated weights for policy 0, policy_version 2787 (0.0013) [2023-02-23 01:01:49,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3346.2). Total num frames: 11423744. Throughput: 0: 833.9. Samples: 350628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:01:49,898][05631] Avg episode reward: [(0, '4.570')] [2023-02-23 01:01:54,895][05631] Fps is (10 sec: 2457.5, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 11436032. Throughput: 0: 831.9. Samples: 354584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 01:01:54,904][05631] Avg episode reward: [(0, '4.489')] [2023-02-23 01:01:59,243][45651] Updated weights for policy 0, policy_version 2797 (0.0017) [2023-02-23 01:01:59,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11456512. Throughput: 0: 853.5. Samples: 357474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:01:59,902][05631] Avg episode reward: [(0, '4.342')] [2023-02-23 01:02:04,895][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.5, 300 sec: 3360.1). Total num frames: 11476992. Throughput: 0: 859.9. Samples: 363762. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:02:04,898][05631] Avg episode reward: [(0, '4.529')] [2023-02-23 01:02:09,896][05631] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 11493376. Throughput: 0: 828.3. Samples: 368218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:02:09,899][05631] Avg episode reward: [(0, '4.388')] [2023-02-23 01:02:11,359][45651] Updated weights for policy 0, policy_version 2807 (0.0019) [2023-02-23 01:02:14,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 11505664. Throughput: 0: 828.4. Samples: 370190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:02:14,897][05631] Avg episode reward: [(0, '4.362')] [2023-02-23 01:02:19,895][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 11526144. Throughput: 0: 855.6. Samples: 375366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:02:19,901][05631] Avg episode reward: [(0, '4.558')] [2023-02-23 01:02:19,917][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002814_11526144.pth... [2023-02-23 01:02:20,064][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002617_10719232.pth [2023-02-23 01:02:22,779][45651] Updated weights for policy 0, policy_version 2817 (0.0020) [2023-02-23 01:02:24,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 11546624. Throughput: 0: 856.6. Samples: 381466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:02:24,902][05631] Avg episode reward: [(0, '4.469')] [2023-02-23 01:02:29,895][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3346.2). Total num frames: 11558912. Throughput: 0: 839.2. Samples: 383832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:02:29,901][05631] Avg episode reward: [(0, '4.539')] [2023-02-23 01:02:34,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3346.2). Total num frames: 11571200. Throughput: 0: 824.6. Samples: 387734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:02:34,898][05631] Avg episode reward: [(0, '4.643')] [2023-02-23 01:02:36,713][45651] Updated weights for policy 0, policy_version 2827 (0.0026) [2023-02-23 01:02:39,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 11591680. Throughput: 0: 847.9. Samples: 392740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:02:39,898][05631] Avg episode reward: [(0, '4.634')] [2023-02-23 01:02:44,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 11612160. Throughput: 0: 853.1. Samples: 395862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:02:44,899][05631] Avg episode reward: [(0, '4.616')] [2023-02-23 01:02:46,528][45651] Updated weights for policy 0, policy_version 2837 (0.0015) [2023-02-23 01:02:49,900][05631] Fps is (10 sec: 3684.6, 60 sec: 3413.0, 300 sec: 3360.0). Total num frames: 11628544. Throughput: 0: 840.3. Samples: 401582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:02:49,903][05631] Avg episode reward: [(0, '4.600')] [2023-02-23 01:02:54,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3360.2). Total num frames: 11640832. Throughput: 0: 828.6. Samples: 405504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:02:54,902][05631] Avg episode reward: [(0, '4.534')] [2023-02-23 01:02:59,895][05631] Fps is (10 sec: 2868.6, 60 sec: 3345.0, 300 sec: 3360.1). 
Total num frames: 11657216. Throughput: 0: 831.1. Samples: 407592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:02:59,908][05631] Avg episode reward: [(0, '4.532')] [2023-02-23 01:03:00,189][45651] Updated weights for policy 0, policy_version 2847 (0.0018) [2023-02-23 01:03:04,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 11677696. Throughput: 0: 855.7. Samples: 413874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:03:04,904][05631] Avg episode reward: [(0, '4.745')] [2023-02-23 01:03:09,896][05631] Fps is (10 sec: 4095.6, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 11698176. Throughput: 0: 839.2. Samples: 419232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:03:09,907][05631] Avg episode reward: [(0, '4.788')] [2023-02-23 01:03:11,512][45651] Updated weights for policy 0, policy_version 2857 (0.0020) [2023-02-23 01:03:14,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 11710464. Throughput: 0: 830.9. Samples: 421224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:03:14,901][05631] Avg episode reward: [(0, '4.857')] [2023-02-23 01:03:19,895][05631] Fps is (10 sec: 2867.6, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 11726848. Throughput: 0: 837.6. Samples: 425426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:03:19,898][05631] Avg episode reward: [(0, '4.875')] [2023-02-23 01:03:23,691][45651] Updated weights for policy 0, policy_version 2867 (0.0019) [2023-02-23 01:03:24,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 11747328. Throughput: 0: 864.4. Samples: 431640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:03:24,898][05631] Avg episode reward: [(0, '4.924')] [2023-02-23 01:03:29,897][05631] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3374.0). Total num frames: 11763712. Throughput: 0: 866.8. Samples: 434868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:03:29,902][05631] Avg episode reward: [(0, '5.046')] [2023-02-23 01:03:34,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3360.1). Total num frames: 11780096. Throughput: 0: 832.3. Samples: 439032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:03:34,898][05631] Avg episode reward: [(0, '5.050')] [2023-02-23 01:03:36,605][45651] Updated weights for policy 0, policy_version 2877 (0.0014) [2023-02-23 01:03:39,895][05631] Fps is (10 sec: 2867.7, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11792384. Throughput: 0: 839.5. Samples: 443282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:03:39,898][05631] Avg episode reward: [(0, '5.342')] [2023-02-23 01:03:44,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 11812864. Throughput: 0: 863.5. Samples: 446450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:03:44,902][05631] Avg episode reward: [(0, '5.616')] [2023-02-23 01:03:44,908][45637] Saving new best policy, reward=5.616! [2023-02-23 01:03:47,080][45651] Updated weights for policy 0, policy_version 2887 (0.0024) [2023-02-23 01:03:49,895][05631] Fps is (10 sec: 4095.9, 60 sec: 3413.6, 300 sec: 3374.0). Total num frames: 11833344. Throughput: 0: 862.2. Samples: 452674. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:03:49,902][05631] Avg episode reward: [(0, '5.501')] [2023-02-23 01:03:54,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 11845632. 
Throughput: 0: 835.4. Samples: 456826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:03:54,903][05631] Avg episode reward: [(0, '5.344')] [2023-02-23 01:03:59,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 11862016. Throughput: 0: 836.2. Samples: 458852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:03:59,897][05631] Avg episode reward: [(0, '5.217')] [2023-02-23 01:04:00,791][45651] Updated weights for policy 0, policy_version 2897 (0.0033) [2023-02-23 01:04:04,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 11882496. Throughput: 0: 866.1. Samples: 464400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:04:04,902][05631] Avg episode reward: [(0, '5.303')] [2023-02-23 01:04:09,900][05631] Fps is (10 sec: 4094.9, 60 sec: 3413.2, 300 sec: 3387.8). Total num frames: 11902976. Throughput: 0: 870.2. Samples: 470800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:04:09,904][05631] Avg episode reward: [(0, '5.586')] [2023-02-23 01:04:10,841][45651] Updated weights for policy 0, policy_version 2907 (0.0013) [2023-02-23 01:04:14,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3360.2). Total num frames: 11915264. Throughput: 0: 844.0. Samples: 472848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:04:14,901][05631] Avg episode reward: [(0, '5.629')] [2023-02-23 01:04:14,909][45637] Saving new best policy, reward=5.629! [2023-02-23 01:04:19,895][05631] Fps is (10 sec: 2458.3, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11927552. Throughput: 0: 838.3. Samples: 476756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:04:19,898][05631] Avg episode reward: [(0, '5.828')] [2023-02-23 01:04:19,916][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002912_11927552.pth... [2023-02-23 01:04:20,164][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002716_11124736.pth [2023-02-23 01:04:20,188][45637] Saving new best policy, reward=5.828! [2023-02-23 01:04:24,270][45651] Updated weights for policy 0, policy_version 2917 (0.0023) [2023-02-23 01:04:24,895][05631] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 11948032. Throughput: 0: 866.8. Samples: 482288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-23 01:04:24,898][05631] Avg episode reward: [(0, '5.835')] [2023-02-23 01:04:24,902][45637] Saving new best policy, reward=5.835! [2023-02-23 01:04:29,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3387.9). Total num frames: 11968512. Throughput: 0: 857.6. Samples: 485044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:04:29,903][05631] Avg episode reward: [(0, '6.089')] [2023-02-23 01:04:29,915][45637] Saving new best policy, reward=6.089! [2023-02-23 01:04:34,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11980800. Throughput: 0: 820.9. Samples: 489616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:04:34,900][05631] Avg episode reward: [(0, '6.125')] [2023-02-23 01:04:34,903][45637] Saving new best policy, reward=6.125! [2023-02-23 01:04:37,466][45651] Updated weights for policy 0, policy_version 2927 (0.0018) [2023-02-23 01:04:39,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 11993088. Throughput: 0: 814.0. Samples: 493458. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:04:39,898][05631] Avg episode reward: [(0, '5.904')] [2023-02-23 01:04:44,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 12013568. Throughput: 0: 824.9. Samples: 495974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:04:44,898][05631] Avg episode reward: [(0, '6.066')] [2023-02-23 01:04:48,763][45651] Updated weights for policy 0, policy_version 2937 (0.0016) [2023-02-23 01:04:49,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 12034048. Throughput: 0: 843.7. Samples: 502366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:04:49,898][05631] Avg episode reward: [(0, '6.302')] [2023-02-23 01:04:49,910][45637] Saving new best policy, reward=6.302! [2023-02-23 01:04:54,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3360.1). Total num frames: 12050432. Throughput: 0: 815.5. Samples: 507494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:04:54,902][05631] Avg episode reward: [(0, '6.597')] [2023-02-23 01:04:54,906][45637] Saving new best policy, reward=6.597! [2023-02-23 01:04:59,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 12062720. Throughput: 0: 813.8. Samples: 509468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:04:59,907][05631] Avg episode reward: [(0, '6.749')] [2023-02-23 01:04:59,919][45637] Saving new best policy, reward=6.749! [2023-02-23 01:05:02,360][45651] Updated weights for policy 0, policy_version 2947 (0.0020) [2023-02-23 01:05:04,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 12079104. Throughput: 0: 825.2. Samples: 513892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:05:04,903][05631] Avg episode reward: [(0, '6.564')] [2023-02-23 01:05:09,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3277.0, 300 sec: 3374.0). Total num frames: 12099584. Throughput: 0: 844.4. Samples: 520286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:05:09,903][05631] Avg episode reward: [(0, '7.006')] [2023-02-23 01:05:09,914][45637] Saving new best policy, reward=7.006! [2023-02-23 01:05:12,011][45651] Updated weights for policy 0, policy_version 2957 (0.0018) [2023-02-23 01:05:14,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3360.1). Total num frames: 12115968. Throughput: 0: 847.9. Samples: 523200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:05:14,901][05631] Avg episode reward: [(0, '6.761')] [2023-02-23 01:05:19,896][05631] Fps is (10 sec: 3276.6, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 12132352. Throughput: 0: 834.5. Samples: 527170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:05:19,902][05631] Avg episode reward: [(0, '6.529')] [2023-02-23 01:05:24,897][05631] Fps is (10 sec: 3276.3, 60 sec: 3345.0, 300 sec: 3374.0). Total num frames: 12148736. Throughput: 0: 855.2. Samples: 531944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:05:24,903][05631] Avg episode reward: [(0, '6.062')] [2023-02-23 01:05:25,769][45651] Updated weights for policy 0, policy_version 2967 (0.0013) [2023-02-23 01:05:29,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 12169216. Throughput: 0: 872.1. Samples: 535220. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:05:29,898][05631] Avg episode reward: [(0, '5.930')] [2023-02-23 01:05:34,895][05631] Fps is (10 sec: 3687.0, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 12185600. Throughput: 0: 864.0. Samples: 541244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:05:34,900][05631] Avg episode reward: [(0, '5.824')] [2023-02-23 01:05:36,978][45651] Updated weights for policy 0, policy_version 2977 (0.0023) [2023-02-23 01:05:39,895][05631] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 12201984. Throughput: 0: 838.9. Samples: 545246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:05:39,901][05631] Avg episode reward: [(0, '5.892')] [2023-02-23 01:05:44,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3374.0). Total num frames: 12214272. Throughput: 0: 840.6. Samples: 547294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:05:44,902][05631] Avg episode reward: [(0, '5.849')] [2023-02-23 01:05:48,954][45651] Updated weights for policy 0, policy_version 2987 (0.0017) [2023-02-23 01:05:49,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 12234752. Throughput: 0: 877.6. Samples: 553382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:05:49,897][05631] Avg episode reward: [(0, '5.986')] [2023-02-23 01:05:54,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 12255232. Throughput: 0: 866.7. Samples: 559288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:05:54,901][05631] Avg episode reward: [(0, '6.156')] [2023-02-23 01:05:59,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3387.9). Total num frames: 12271616. Throughput: 0: 845.7. Samples: 561258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:05:59,898][05631] Avg episode reward: [(0, '6.426')] [2023-02-23 01:06:01,587][45651] Updated weights for policy 0, policy_version 2997 (0.0019) [2023-02-23 01:06:04,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 12283904. Throughput: 0: 847.8. Samples: 565322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:06:04,898][05631] Avg episode reward: [(0, '6.678')] [2023-02-23 01:06:09,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12304384. Throughput: 0: 874.3. Samples: 571286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:06:09,898][05631] Avg episode reward: [(0, '6.951')] [2023-02-23 01:06:12,362][45651] Updated weights for policy 0, policy_version 3007 (0.0020) [2023-02-23 01:06:14,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 12324864. Throughput: 0: 868.7. Samples: 574310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:06:14,901][05631] Avg episode reward: [(0, '6.966')] [2023-02-23 01:06:19,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3374.0). Total num frames: 12337152. Throughput: 0: 842.2. Samples: 579144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:06:19,904][05631] Avg episode reward: [(0, '6.444')] [2023-02-23 01:06:19,915][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003012_12337152.pth... [2023-02-23 01:06:20,190][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002814_11526144.pth [2023-02-23 01:06:24,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3374.0). 
Total num frames: 12349440. Throughput: 0: 840.2. Samples: 583054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:06:24,902][05631] Avg episode reward: [(0, '6.206')] [2023-02-23 01:06:26,156][45651] Updated weights for policy 0, policy_version 3017 (0.0022) [2023-02-23 01:06:29,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 12369920. Throughput: 0: 859.2. Samples: 585956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:06:29,901][05631] Avg episode reward: [(0, '6.142')] [2023-02-23 01:06:34,895][05631] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 12394496. Throughput: 0: 865.6. Samples: 592334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:06:34,902][05631] Avg episode reward: [(0, '6.648')] [2023-02-23 01:06:35,942][45651] Updated weights for policy 0, policy_version 3027 (0.0015) [2023-02-23 01:06:39,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 12406784. Throughput: 0: 839.0. Samples: 597044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:06:39,899][05631] Avg episode reward: [(0, '6.811')] [2023-02-23 01:06:44,896][05631] Fps is (10 sec: 2457.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 12419072. Throughput: 0: 838.7. Samples: 599002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:06:44,907][05631] Avg episode reward: [(0, '6.693')] [2023-02-23 01:06:49,224][45651] Updated weights for policy 0, policy_version 3037 (0.0047) [2023-02-23 01:06:49,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12439552. Throughput: 0: 864.9. Samples: 604244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:06:49,900][05631] Avg episode reward: [(0, '6.716')] [2023-02-23 01:06:54,895][05631] Fps is (10 sec: 4505.9, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 12464128. Throughput: 0: 874.9. Samples: 610658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:06:54,902][05631] Avg episode reward: [(0, '7.022')] [2023-02-23 01:06:54,905][45637] Saving new best policy, reward=7.022! [2023-02-23 01:06:59,896][05631] Fps is (10 sec: 3686.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 12476416. Throughput: 0: 862.4. Samples: 613118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:06:59,899][05631] Avg episode reward: [(0, '7.130')] [2023-02-23 01:06:59,921][45637] Saving new best policy, reward=7.130! [2023-02-23 01:07:00,611][45651] Updated weights for policy 0, policy_version 3047 (0.0013) [2023-02-23 01:07:04,897][05631] Fps is (10 sec: 2866.6, 60 sec: 3481.5, 300 sec: 3387.9). Total num frames: 12492800. Throughput: 0: 846.6. Samples: 617244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:07:04,900][05631] Avg episode reward: [(0, '7.680')] [2023-02-23 01:07:04,906][45637] Saving new best policy, reward=7.680! [2023-02-23 01:07:09,895][05631] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12509184. Throughput: 0: 873.8. Samples: 622376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:07:09,900][05631] Avg episode reward: [(0, '7.844')] [2023-02-23 01:07:09,916][45637] Saving new best policy, reward=7.844! [2023-02-23 01:07:12,387][45651] Updated weights for policy 0, policy_version 3057 (0.0016) [2023-02-23 01:07:14,895][05631] Fps is (10 sec: 3687.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12529664. Throughput: 0: 877.9. Samples: 625460. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:07:14,898][05631] Avg episode reward: [(0, '8.428')] [2023-02-23 01:07:14,901][45637] Saving new best policy, reward=8.428! [2023-02-23 01:07:19,900][05631] Fps is (10 sec: 3684.8, 60 sec: 3481.3, 300 sec: 3387.8). Total num frames: 12546048. Throughput: 0: 859.8. Samples: 631028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:07:19,906][05631] Avg episode reward: [(0, '8.101')] [2023-02-23 01:07:24,899][05631] Fps is (10 sec: 2866.0, 60 sec: 3481.4, 300 sec: 3387.8). Total num frames: 12558336. Throughput: 0: 845.1. Samples: 635078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:07:24,902][05631] Avg episode reward: [(0, '8.047')] [2023-02-23 01:07:25,298][45651] Updated weights for policy 0, policy_version 3067 (0.0019) [2023-02-23 01:07:29,897][05631] Fps is (10 sec: 3277.5, 60 sec: 3481.5, 300 sec: 3415.6). Total num frames: 12578816. Throughput: 0: 848.9. Samples: 637204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:07:29,904][05631] Avg episode reward: [(0, '7.663')] [2023-02-23 01:07:34,895][05631] Fps is (10 sec: 4097.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 12599296. Throughput: 0: 875.3. Samples: 643634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:07:34,901][05631] Avg episode reward: [(0, '7.817')] [2023-02-23 01:07:35,568][45651] Updated weights for policy 0, policy_version 3077 (0.0013) [2023-02-23 01:07:39,895][05631] Fps is (10 sec: 3687.2, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 12615680. Throughput: 0: 856.8. Samples: 649212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:07:39,898][05631] Avg episode reward: [(0, '8.678')] [2023-02-23 01:07:39,906][45637] Saving new best policy, reward=8.678! [2023-02-23 01:07:44,898][05631] Fps is (10 sec: 2866.5, 60 sec: 3481.5, 300 sec: 3387.9). Total num frames: 12627968. Throughput: 0: 845.1. Samples: 651150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:07:44,904][05631] Avg episode reward: [(0, '9.092')] [2023-02-23 01:07:44,907][45637] Saving new best policy, reward=9.092! [2023-02-23 01:07:49,266][45651] Updated weights for policy 0, policy_version 3087 (0.0031) [2023-02-23 01:07:49,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12644352. Throughput: 0: 844.3. Samples: 655234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:07:49,898][05631] Avg episode reward: [(0, '9.950')] [2023-02-23 01:07:49,906][45637] Saving new best policy, reward=9.950! [2023-02-23 01:07:54,895][05631] Fps is (10 sec: 3687.4, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 12664832. Throughput: 0: 870.3. Samples: 661540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:07:54,897][05631] Avg episode reward: [(0, '9.684')] [2023-02-23 01:07:59,184][45651] Updated weights for policy 0, policy_version 3097 (0.0015) [2023-02-23 01:07:59,900][05631] Fps is (10 sec: 4093.8, 60 sec: 3481.3, 300 sec: 3415.6). Total num frames: 12685312. Throughput: 0: 874.9. Samples: 664834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:07:59,907][05631] Avg episode reward: [(0, '10.341')] [2023-02-23 01:07:59,920][45637] Saving new best policy, reward=10.341! [2023-02-23 01:08:04,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3387.9). Total num frames: 12697600. Throughput: 0: 847.3. Samples: 669152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
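
The `Policy #0 lag: (min: ..., avg: ..., max: ...)` triple attached to every report measures how stale the training data is: each rollout is tagged with the `policy_version` that generated it, and the learner compares those tags with its current version when it assembles a batch. A hedged sketch of that statistic (the function and the sample batch are illustrative, not taken from the library):

```python
# Hedged sketch of the "Policy #0 lag" statistic: how many policy versions
# behind the learner each rollout in the current batch was collected.
def policy_lag_stats(current_version, rollout_versions):
    lags = [current_version - v for v in rollout_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# With the learner at version 3087 and a batch collected by versions
# 3085-3087, the report would read (min: 0.0, avg: 0.8, max: 2.0):
print(policy_lag_stats(3087, [3087, 3087, 3086, 3086, 3085]))  # (0, 0.8, 2)
```

Small lags like the ones logged here (max 2) indicate the rollout workers are staying close to the learner's latest weights, which is the healthy regime for an asynchronous setup like this one.
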
[2023-02-23 01:08:04,901][05631] Avg episode reward: [(0, '10.326')] [2023-02-23 01:08:09,895][05631] Fps is (10 sec: 2868.7, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12713984. Throughput: 0: 851.5. Samples: 673390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:08:09,904][05631] Avg episode reward: [(0, '10.086')] [2023-02-23 01:08:12,637][45651] Updated weights for policy 0, policy_version 3107 (0.0029) [2023-02-23 01:08:14,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 12734464. Throughput: 0: 873.8. Samples: 676524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:08:14,897][05631] Avg episode reward: [(0, '10.051')] [2023-02-23 01:08:19,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.9, 300 sec: 3415.6). Total num frames: 12754944. Throughput: 0: 875.3. Samples: 683022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:08:19,902][05631] Avg episode reward: [(0, '10.714')] [2023-02-23 01:08:19,919][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003114_12754944.pth... [2023-02-23 01:08:20,139][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002912_11927552.pth [2023-02-23 01:08:20,165][45637] Saving new best policy, reward=10.714! [2023-02-23 01:08:24,021][45651] Updated weights for policy 0, policy_version 3117 (0.0024) [2023-02-23 01:08:24,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3401.8). Total num frames: 12767232. Throughput: 0: 842.6. Samples: 687128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:08:24,903][05631] Avg episode reward: [(0, '10.647')] [2023-02-23 01:08:29,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.2, 300 sec: 3387.9). Total num frames: 12779520. Throughput: 0: 846.3. Samples: 689232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:08:29,900][05631] Avg episode reward: [(0, '10.786')] [2023-02-23 01:08:29,925][45637] Saving new best policy, reward=10.786! [2023-02-23 01:08:34,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 12800000. Throughput: 0: 873.2. Samples: 694530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:08:34,898][05631] Avg episode reward: [(0, '11.436')] [2023-02-23 01:08:34,969][45637] Saving new best policy, reward=11.436! [2023-02-23 01:08:36,076][45651] Updated weights for policy 0, policy_version 3127 (0.0017) [2023-02-23 01:08:39,895][05631] Fps is (10 sec: 4505.7, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 12824576. Throughput: 0: 874.1. Samples: 700876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:08:39,897][05631] Avg episode reward: [(0, '11.847')] [2023-02-23 01:08:39,912][45637] Saving new best policy, reward=11.847! [2023-02-23 01:08:44,896][05631] Fps is (10 sec: 3686.0, 60 sec: 3481.7, 300 sec: 3401.7). Total num frames: 12836864. Throughput: 0: 844.7. Samples: 702842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:08:44,902][05631] Avg episode reward: [(0, '13.454')] [2023-02-23 01:08:44,905][45637] Saving new best policy, reward=13.454! [2023-02-23 01:08:49,166][45651] Updated weights for policy 0, policy_version 3137 (0.0021) [2023-02-23 01:08:49,896][05631] Fps is (10 sec: 2457.4, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12849152. Throughput: 0: 836.4. Samples: 706792.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:08:49,899][05631] Avg episode reward: [(0, '13.977')] [2023-02-23 01:08:49,909][45637] Saving new best policy, reward=13.977! [2023-02-23 01:08:54,895][05631] Fps is (10 sec: 3277.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 12869632. Throughput: 0: 863.2. Samples: 712234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:08:54,898][05631] Avg episode reward: [(0, '15.807')] [2023-02-23 01:08:54,903][45637] Saving new best policy, reward=15.807! [2023-02-23 01:08:59,610][45651] Updated weights for policy 0, policy_version 3147 (0.0014) [2023-02-23 01:08:59,896][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.6, 300 sec: 3415.6). Total num frames: 12890112. Throughput: 0: 863.8. Samples: 715396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:08:59,900][05631] Avg episode reward: [(0, '16.315')] [2023-02-23 01:08:59,916][45637] Saving new best policy, reward=16.315! [2023-02-23 01:09:04,899][05631] Fps is (10 sec: 3275.7, 60 sec: 3413.1, 300 sec: 3387.9). Total num frames: 12902400. Throughput: 0: 833.6. Samples: 720538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:09:04,901][05631] Avg episode reward: [(0, '18.288')] [2023-02-23 01:09:04,997][45637] Saving new best policy, reward=18.288! [2023-02-23 01:09:09,895][05631] Fps is (10 sec: 2867.4, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12918784. Throughput: 0: 831.2. Samples: 724534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:09:09,898][05631] Avg episode reward: [(0, '18.817')] [2023-02-23 01:09:09,907][45637] Saving new best policy, reward=18.817! [2023-02-23 01:09:13,324][45651] Updated weights for policy 0, policy_version 3157 (0.0029) [2023-02-23 01:09:14,895][05631] Fps is (10 sec: 3278.0, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 12935168. Throughput: 0: 835.9. Samples: 726846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:09:14,900][05631] Avg episode reward: [(0, '18.814')] [2023-02-23 01:09:19,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 12955648. Throughput: 0: 861.4. Samples: 733294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:09:19,898][05631] Avg episode reward: [(0, '18.579')] [2023-02-23 01:09:23,718][45651] Updated weights for policy 0, policy_version 3167 (0.0015) [2023-02-23 01:09:24,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 12972032. Throughput: 0: 837.8. Samples: 738576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:09:24,902][05631] Avg episode reward: [(0, '19.201')] [2023-02-23 01:09:24,909][45637] Saving new best policy, reward=19.201! [2023-02-23 01:09:29,896][05631] Fps is (10 sec: 2866.8, 60 sec: 3413.3, 300 sec: 3401.7). Total num frames: 12984320. Throughput: 0: 835.1. Samples: 740420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:09:29,906][05631] Avg episode reward: [(0, '18.018')] [2023-02-23 01:09:34,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 13004800. Throughput: 0: 843.6. Samples: 744752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:09:34,897][05631] Avg episode reward: [(0, '17.767')] [2023-02-23 01:09:36,633][45651] Updated weights for policy 0, policy_version 3177 (0.0022) [2023-02-23 01:09:39,895][05631] Fps is (10 sec: 4096.6, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 13025280. Throughput: 0: 865.9. Samples: 751200. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:09:39,900][05631] Avg episode reward: [(0, '18.498')] [2023-02-23 01:09:44,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3415.6). Total num frames: 13041664. Throughput: 0: 863.7. Samples: 754264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:09:44,898][05631] Avg episode reward: [(0, '19.606')] [2023-02-23 01:09:44,902][45637] Saving new best policy, reward=19.606! [2023-02-23 01:09:48,603][45651] Updated weights for policy 0, policy_version 3187 (0.0021) [2023-02-23 01:09:49,895][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.4, 300 sec: 3401.8). Total num frames: 13053952. Throughput: 0: 839.9. Samples: 758332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:09:49,902][05631] Avg episode reward: [(0, '19.044')] [2023-02-23 01:09:54,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 13070336. Throughput: 0: 856.6. Samples: 763082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:09:54,902][05631] Avg episode reward: [(0, '19.874')] [2023-02-23 01:09:54,990][45637] Saving new best policy, reward=19.874! [2023-02-23 01:09:59,866][45651] Updated weights for policy 0, policy_version 3197 (0.0024) [2023-02-23 01:09:59,895][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.4, 300 sec: 3443.4). Total num frames: 13094912. Throughput: 0: 874.7. Samples: 766206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:09:59,898][05631] Avg episode reward: [(0, '22.045')] [2023-02-23 01:09:59,917][45637] Saving new best policy, reward=22.045! [2023-02-23 01:10:04,897][05631] Fps is (10 sec: 4095.4, 60 sec: 3481.7, 300 sec: 3429.5). Total num frames: 13111296. Throughput: 0: 866.6. Samples: 772292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:10:04,902][05631] Avg episode reward: [(0, '22.070')] [2023-02-23 01:10:04,906][45637] Saving new best policy, reward=22.070! [2023-02-23 01:10:09,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 13123584. Throughput: 0: 837.5. Samples: 776262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:10:09,898][05631] Avg episode reward: [(0, '22.871')] [2023-02-23 01:10:09,912][45637] Saving new best policy, reward=22.871! [2023-02-23 01:10:13,613][45651] Updated weights for policy 0, policy_version 3207 (0.0020) [2023-02-23 01:10:14,895][05631] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 13139968. Throughput: 0: 839.8. Samples: 778208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:10:14,902][05631] Avg episode reward: [(0, '23.593')] [2023-02-23 01:10:14,905][45637] Saving new best policy, reward=23.593! [2023-02-23 01:10:19,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 13160448. Throughput: 0: 870.4. Samples: 783920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:10:19,897][05631] Avg episode reward: [(0, '24.125')] [2023-02-23 01:10:19,920][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003213_13160448.pth... [2023-02-23 01:10:20,090][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003012_12337152.pth [2023-02-23 01:10:20,102][45637] Saving new best policy, reward=24.125! [2023-02-23 01:10:23,671][45651] Updated weights for policy 0, policy_version 3217 (0.0025) [2023-02-23 01:10:24,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 13176832. 
Throughput: 0: 860.6. Samples: 789926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:10:24,908][05631] Avg episode reward: [(0, '24.725')] [2023-02-23 01:10:24,943][45637] Saving new best policy, reward=24.725! [2023-02-23 01:10:29,897][05631] Fps is (10 sec: 3276.1, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 13193216. Throughput: 0: 835.6. Samples: 791868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:10:29,904][05631] Avg episode reward: [(0, '24.600')] [2023-02-23 01:10:34,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 13205504. Throughput: 0: 835.9. Samples: 795948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:10:34,898][05631] Avg episode reward: [(0, '24.329')] [2023-02-23 01:10:37,271][45651] Updated weights for policy 0, policy_version 3227 (0.0025) [2023-02-23 01:10:39,895][05631] Fps is (10 sec: 3277.5, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 13225984. Throughput: 0: 864.1. Samples: 801966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:10:39,902][05631] Avg episode reward: [(0, '24.278')] [2023-02-23 01:10:44,895][05631] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 13250560. Throughput: 0: 865.9. Samples: 805172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:10:44,903][05631] Avg episode reward: [(0, '23.675')] [2023-02-23 01:10:47,937][45651] Updated weights for policy 0, policy_version 3237 (0.0057) [2023-02-23 01:10:49,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 13262848. Throughput: 0: 838.4. Samples: 810020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:10:49,902][05631] Avg episode reward: [(0, '24.430')] [2023-02-23 01:10:54,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 13275136. Throughput: 0: 839.5. Samples: 814038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:10:54,905][05631] Avg episode reward: [(0, '24.569')] [2023-02-23 01:10:59,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 13295616. Throughput: 0: 862.8. Samples: 817032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:10:59,898][05631] Avg episode reward: [(0, '23.907')] [2023-02-23 01:11:00,034][45651] Updated weights for policy 0, policy_version 3247 (0.0024) [2023-02-23 01:11:04,895][05631] Fps is (10 sec: 4505.7, 60 sec: 3481.7, 300 sec: 3443.4). Total num frames: 13320192. Throughput: 0: 879.6. Samples: 823504. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-23 01:11:04,897][05631] Avg episode reward: [(0, '24.881')] [2023-02-23 01:11:04,905][45637] Saving new best policy, reward=24.881! [2023-02-23 01:11:09,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 13332480. Throughput: 0: 847.1. Samples: 828046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:11:09,898][05631] Avg episode reward: [(0, '25.242')] [2023-02-23 01:11:09,919][45637] Saving new best policy, reward=25.242! [2023-02-23 01:11:12,994][45651] Updated weights for policy 0, policy_version 3257 (0.0012) [2023-02-23 01:11:14,897][05631] Fps is (10 sec: 2457.0, 60 sec: 3413.2, 300 sec: 3415.6). Total num frames: 13344768. Throughput: 0: 846.6. Samples: 829964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:11:14,900][05631] Avg episode reward: [(0, '25.806')] [2023-02-23 01:11:14,903][45637] Saving new best policy, reward=25.806! 
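
The `Saving new best policy, reward=...!` lines mark new highs of the running average episode reward, which climbs from roughly 4.8 at the top of this excerpt into the mid-20s here. A minimal sketch of that keep-the-best rule, assuming a simple comparison against the best value seen so far (the class is illustrative; the real saver also writes a checkpoint file):

```python
# Minimal sketch (an assumption, for illustration) of the "Saving new best
# policy" logic: persist the model only when the running average episode
# reward improves on the best value seen so far.
class BestPolicySaver:
    def __init__(self):
        self.best_reward = float("-inf")

    def maybe_save(self, avg_episode_reward, save_fn, log=print):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            save_fn()  # stand-in for torch.save(model.state_dict(), ...)
            log(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
            return True
        return False
```

Because the check is strictly greater-than, a stretch of flat or declining rewards produces no saves, which matches the long runs above where the reward hovers without triggering the message.
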
[2023-02-23 01:11:19,898][05631] Fps is (10 sec: 2867.0, 60 sec: 3345.0, 300 sec: 3429.5). Total num frames: 13361152. Throughput: 0: 867.5. Samples: 834984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:11:19,901][05631] Avg episode reward: [(0, '25.705')] [2023-02-23 01:11:23,720][45651] Updated weights for policy 0, policy_version 3267 (0.0013) [2023-02-23 01:11:24,895][05631] Fps is (10 sec: 4096.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 13385728. Throughput: 0: 874.7. Samples: 841326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:11:24,901][05631] Avg episode reward: [(0, '24.753')] [2023-02-23 01:11:29,895][05631] Fps is (10 sec: 3686.7, 60 sec: 3413.5, 300 sec: 3401.8). Total num frames: 13398016. Throughput: 0: 861.4. Samples: 843934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:11:29,903][05631] Avg episode reward: [(0, '25.665')] [2023-02-23 01:11:34,896][05631] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 13414400. Throughput: 0: 845.4. Samples: 848062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:11:34,903][05631] Avg episode reward: [(0, '25.659')] [2023-02-23 01:11:37,075][45651] Updated weights for policy 0, policy_version 3277 (0.0019) [2023-02-23 01:11:39,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 13430784. Throughput: 0: 871.9. Samples: 853272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:11:39,898][05631] Avg episode reward: [(0, '25.147')] [2023-02-23 01:11:44,895][05631] Fps is (10 sec: 4096.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 13455360. Throughput: 0: 877.0. Samples: 856496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:11:44,898][05631] Avg episode reward: [(0, '25.970')] [2023-02-23 01:11:44,902][45637] Saving new best policy, reward=25.970! [2023-02-23 01:11:46,587][45651] Updated weights for policy 0, policy_version 3287 (0.0020) [2023-02-23 01:11:49,900][05631] Fps is (10 sec: 4093.8, 60 sec: 3481.3, 300 sec: 3415.6). Total num frames: 13471744. Throughput: 0: 856.5. Samples: 862052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:11:49,903][05631] Avg episode reward: [(0, '26.362')] [2023-02-23 01:11:49,922][45637] Saving new best policy, reward=26.362! [2023-02-23 01:11:54,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 13484032. Throughput: 0: 843.0. Samples: 865980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:11:54,901][05631] Avg episode reward: [(0, '25.805')] [2023-02-23 01:11:59,895][05631] Fps is (10 sec: 2868.7, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 13500416. Throughput: 0: 849.7. Samples: 868198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:11:59,897][05631] Avg episode reward: [(0, '26.384')] [2023-02-23 01:11:59,909][45637] Saving new best policy, reward=26.384! [2023-02-23 01:12:00,454][45651] Updated weights for policy 0, policy_version 3297 (0.0037) [2023-02-23 01:12:04,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 13520896. Throughput: 0: 878.9. Samples: 874534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:12:04,897][05631] Avg episode reward: [(0, '25.979')] [2023-02-23 01:12:09,900][05631] Fps is (10 sec: 3684.5, 60 sec: 3413.0, 300 sec: 3415.6). Total num frames: 13537280. Throughput: 0: 858.0. Samples: 879940. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:12:09,903][05631] Avg episode reward: [(0, '25.287')] [2023-02-23 01:12:11,565][45651] Updated weights for policy 0, policy_version 3307 (0.0013) [2023-02-23 01:12:14,896][05631] Fps is (10 sec: 3276.4, 60 sec: 3481.7, 300 sec: 3415.7). Total num frames: 13553664. Throughput: 0: 846.1. Samples: 882010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:12:14,901][05631] Avg episode reward: [(0, '24.655')] [2023-02-23 01:12:19,895][05631] Fps is (10 sec: 3278.6, 60 sec: 3481.6, 300 sec: 3429.6). Total num frames: 13570048. Throughput: 0: 850.3. Samples: 886326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:12:19,904][05631] Avg episode reward: [(0, '24.571')] [2023-02-23 01:12:19,915][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003313_13570048.pth... [2023-02-23 01:12:20,102][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003114_12754944.pth [2023-02-23 01:12:23,583][45651] Updated weights for policy 0, policy_version 3317 (0.0034) [2023-02-23 01:12:24,895][05631] Fps is (10 sec: 3686.8, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 13590528. Throughput: 0: 873.6. Samples: 892586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:12:24,907][05631] Avg episode reward: [(0, '22.477')] [2023-02-23 01:12:29,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 13606912. Throughput: 0: 873.0. Samples: 895780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:12:29,903][05631] Avg episode reward: [(0, '22.109')] [2023-02-23 01:12:34,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 13623296. Throughput: 0: 840.4. Samples: 899864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:12:34,901][05631] Avg episode reward: [(0, '21.858')] [2023-02-23 01:12:36,210][45651] Updated weights for policy 0, policy_version 3327 (0.0015) [2023-02-23 01:12:39,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 13635584. Throughput: 0: 852.7. Samples: 904350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:12:39,897][05631] Avg episode reward: [(0, '24.157')] [2023-02-23 01:12:44,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 13660160. Throughput: 0: 875.8. Samples: 907610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:12:44,902][05631] Avg episode reward: [(0, '23.659')] [2023-02-23 01:12:46,831][45651] Updated weights for policy 0, policy_version 3337 (0.0018) [2023-02-23 01:12:49,895][05631] Fps is (10 sec: 4095.8, 60 sec: 3413.6, 300 sec: 3429.5). Total num frames: 13676544. Throughput: 0: 874.8. Samples: 913900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:12:49,903][05631] Avg episode reward: [(0, '24.030')] [2023-02-23 01:12:54,898][05631] Fps is (10 sec: 2866.3, 60 sec: 3413.1, 300 sec: 3401.8). Total num frames: 13688832. Throughput: 0: 844.8. Samples: 917954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:12:54,901][05631] Avg episode reward: [(0, '24.643')] [2023-02-23 01:12:59,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 13705216. Throughput: 0: 843.9. Samples: 919984. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:12:59,898][05631] Avg episode reward: [(0, '26.372')] [2023-02-23 01:13:00,374][45651] Updated weights for policy 0, policy_version 3347 (0.0023) [2023-02-23 01:13:04,897][05631] Fps is (10 sec: 3686.9, 60 sec: 3413.2, 300 sec: 3429.5). Total num frames: 13725696. Throughput: 0: 875.9. Samples: 925742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:13:04,900][05631] Avg episode reward: [(0, '27.486')] [2023-02-23 01:13:04,905][45637] Saving new best policy, reward=27.486! [2023-02-23 01:13:09,787][45651] Updated weights for policy 0, policy_version 3357 (0.0013) [2023-02-23 01:13:09,895][05631] Fps is (10 sec: 4505.7, 60 sec: 3550.2, 300 sec: 3443.4). Total num frames: 13750272. Throughput: 0: 879.8. Samples: 932176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:13:09,901][05631] Avg episode reward: [(0, '27.197')] [2023-02-23 01:13:14,897][05631] Fps is (10 sec: 3686.2, 60 sec: 3481.5, 300 sec: 3415.6). Total num frames: 13762560. Throughput: 0: 854.6. Samples: 934240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:13:14,900][05631] Avg episode reward: [(0, '26.746')] [2023-02-23 01:13:19,898][05631] Fps is (10 sec: 2457.0, 60 sec: 3413.2, 300 sec: 3415.6). Total num frames: 13774848. Throughput: 0: 852.4. Samples: 938222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:13:19,908][05631] Avg episode reward: [(0, '28.616')] [2023-02-23 01:13:19,923][45637] Saving new best policy, reward=28.616! [2023-02-23 01:13:23,491][45651] Updated weights for policy 0, policy_version 3367 (0.0018) [2023-02-23 01:13:24,895][05631] Fps is (10 sec: 3277.6, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 13795328. Throughput: 0: 880.3. Samples: 943964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:13:24,899][05631] Avg episode reward: [(0, '29.309')] [2023-02-23 01:13:24,906][45637] Saving new best policy, reward=29.309! [2023-02-23 01:13:29,895][05631] Fps is (10 sec: 4097.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 13815808. Throughput: 0: 877.4. Samples: 947094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:13:29,902][05631] Avg episode reward: [(0, '28.912')] [2023-02-23 01:13:34,721][45651] Updated weights for policy 0, policy_version 3377 (0.0014) [2023-02-23 01:13:34,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 13832192. Throughput: 0: 851.3. Samples: 952210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:13:34,898][05631] Avg episode reward: [(0, '28.374')] [2023-02-23 01:13:39,897][05631] Fps is (10 sec: 2866.6, 60 sec: 3481.5, 300 sec: 3415.6). Total num frames: 13844480. Throughput: 0: 849.4. Samples: 956176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:13:39,905][05631] Avg episode reward: [(0, '29.335')] [2023-02-23 01:13:39,920][45637] Saving new best policy, reward=29.335! [2023-02-23 01:13:44,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 13864960. Throughput: 0: 859.2. Samples: 958648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:13:44,898][05631] Avg episode reward: [(0, '29.085')] [2023-02-23 01:13:46,994][45651] Updated weights for policy 0, policy_version 3387 (0.0019) [2023-02-23 01:13:49,895][05631] Fps is (10 sec: 4096.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 13885440. Throughput: 0: 872.9. Samples: 965022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
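
The periodic checkpoint lines (another pair follows just below) implement a keep-the-latest rotation: each `Saving .../checkpoint_<version>_<frames>.pth...` is paired with a `Removing ...` of the oldest file, so disk usage stays bounded while training runs for hours. A sketch under assumptions, with the zero-padded filename format inferred from the log and the keep count of two a guess rather than confirmed configuration:

```python
# Assumed sketch of the checkpoint rotation visible in the log: names like
# checkpoint_000003414_13983744.pth encode (policy_version, env_frames), and
# only the newest `keep_last` checkpoints are kept on disk.
import os
import glob

def rotate_checkpoints(save_fn, checkpoint_dir, policy_version, env_frames,
                       keep_last=2):
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    path = os.path.join(checkpoint_dir, name)
    save_fn(path)  # stand-in for torch.save(checkpoint_dict, path)
    # zero-padded version numbers make lexicographic order chronological
    ckpts = sorted(glob.glob(os.path.join(checkpoint_dir, "checkpoint_*.pth")))
    for old in ckpts[:-keep_last]:
        os.remove(old)
    return path
```

The `<frames>` suffix doubles as a resume marker: restarting the run can parse the newest filename to recover both the policy version and the env-frame count without any extra metadata.
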
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:13:49,897][05631] Avg episode reward: [(0, '28.127')] [2023-02-23 01:13:54,906][05631] Fps is (10 sec: 3273.2, 60 sec: 3481.1, 300 sec: 3415.5). Total num frames: 13897728. Throughput: 0: 842.1. Samples: 970080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:13:54,915][05631] Avg episode reward: [(0, '26.479')] [2023-02-23 01:13:59,719][45651] Updated weights for policy 0, policy_version 3397 (0.0013) [2023-02-23 01:13:59,897][05631] Fps is (10 sec: 2866.8, 60 sec: 3481.5, 300 sec: 3429.6). Total num frames: 13914112. Throughput: 0: 840.0. Samples: 972040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:13:59,906][05631] Avg episode reward: [(0, '25.509')] [2023-02-23 01:14:04,895][05631] Fps is (10 sec: 3280.4, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 13930496. Throughput: 0: 856.0. Samples: 976740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:14:04,898][05631] Avg episode reward: [(0, '24.994')] [2023-02-23 01:14:09,895][05631] Fps is (10 sec: 3687.0, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 13950976. Throughput: 0: 871.4. Samples: 983176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:14:09,898][05631] Avg episode reward: [(0, '23.322')] [2023-02-23 01:14:10,160][45651] Updated weights for policy 0, policy_version 3407 (0.0020) [2023-02-23 01:14:14,896][05631] Fps is (10 sec: 3685.9, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 13967360. Throughput: 0: 867.4. Samples: 986128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:14:14,899][05631] Avg episode reward: [(0, '23.329')] [2023-02-23 01:14:19,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3429.5). Total num frames: 13983744. Throughput: 0: 842.7. Samples: 990132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:14:19,897][05631] Avg episode reward: [(0, '23.106')] [2023-02-23 01:14:19,918][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003414_13983744.pth... [2023-02-23 01:14:20,127][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003213_13160448.pth [2023-02-23 01:14:23,926][45651] Updated weights for policy 0, policy_version 3417 (0.0016) [2023-02-23 01:14:24,895][05631] Fps is (10 sec: 3277.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14000128. Throughput: 0: 857.5. Samples: 994764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:14:24,898][05631] Avg episode reward: [(0, '23.673')] [2023-02-23 01:14:29,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14020608. Throughput: 0: 871.4. Samples: 997862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:14:29,903][05631] Avg episode reward: [(0, '25.290')] [2023-02-23 01:14:34,251][45651] Updated weights for policy 0, policy_version 3427 (0.0014) [2023-02-23 01:14:34,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 14036992. Throughput: 0: 858.6. Samples: 1003660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:14:34,902][05631] Avg episode reward: [(0, '25.524')] [2023-02-23 01:14:39,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.5, 300 sec: 3415.6). Total num frames: 14049280. Throughput: 0: 835.0. Samples: 1007648. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:14:39,898][05631] Avg episode reward: [(0, '26.169')] [2023-02-23 01:14:44,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 14065664. Throughput: 0: 834.2. Samples: 1009576. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:14:44,897][05631] Avg episode reward: [(0, '26.559')] [2023-02-23 01:14:47,454][45651] Updated weights for policy 0, policy_version 3437 (0.0029) [2023-02-23 01:14:49,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 14086144. Throughput: 0: 863.0. Samples: 1015576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:14:49,898][05631] Avg episode reward: [(0, '26.514')] [2023-02-23 01:14:54,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3482.2, 300 sec: 3429.5). Total num frames: 14106624. Throughput: 0: 855.2. Samples: 1021658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:14:54,901][05631] Avg episode reward: [(0, '26.384')] [2023-02-23 01:14:58,876][45651] Updated weights for policy 0, policy_version 3447 (0.0013) [2023-02-23 01:14:59,895][05631] Fps is (10 sec: 3276.7, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 14118912. Throughput: 0: 834.4. Samples: 1023676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-23 01:14:59,899][05631] Avg episode reward: [(0, '26.063')] [2023-02-23 01:15:04,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 14131200. Throughput: 0: 835.2. Samples: 1027716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:15:04,909][05631] Avg episode reward: [(0, '26.568')] [2023-02-23 01:15:09,895][05631] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14155776. Throughput: 0: 867.8. Samples: 1033816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:15:09,897][05631] Avg episode reward: [(0, '26.731')] [2023-02-23 01:15:10,714][45651] Updated weights for policy 0, policy_version 3457 (0.0018) [2023-02-23 01:15:14,898][05631] Fps is (10 sec: 4504.1, 60 sec: 3481.5, 300 sec: 3443.4). Total num frames: 14176256. Throughput: 0: 870.1. Samples: 1037018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:15:14,900][05631] Avg episode reward: [(0, '27.516')] [2023-02-23 01:15:19,896][05631] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 14188544. Throughput: 0: 847.9. Samples: 1041814. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-23 01:15:19,899][05631] Avg episode reward: [(0, '26.904')] [2023-02-23 01:15:23,590][45651] Updated weights for policy 0, policy_version 3467 (0.0030) [2023-02-23 01:15:24,895][05631] Fps is (10 sec: 2458.4, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 14200832. Throughput: 0: 848.1. Samples: 1045812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:15:24,902][05631] Avg episode reward: [(0, '26.383')] [2023-02-23 01:15:29,895][05631] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 14221312. Throughput: 0: 868.0. Samples: 1048636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:15:29,906][05631] Avg episode reward: [(0, '26.338')] [2023-02-23 01:15:34,105][45651] Updated weights for policy 0, policy_version 3477 (0.0028) [2023-02-23 01:15:34,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14241792. Throughput: 0: 875.5. Samples: 1054972. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:15:34,898][05631] Avg episode reward: [(0, '25.812')] [2023-02-23 01:15:39,898][05631] Fps is (10 sec: 3685.2, 60 sec: 3481.4, 300 sec: 3415.6). Total num frames: 14258176. Throughput: 0: 848.1. Samples: 1059826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-23 01:15:39,901][05631] Avg episode reward: [(0, '25.314')] [2023-02-23 01:15:44,896][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 14270464. Throughput: 0: 846.8. Samples: 1061784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:15:44,909][05631] Avg episode reward: [(0, '24.355')] [2023-02-23 01:15:47,579][45651] Updated weights for policy 0, policy_version 3487 (0.0012) [2023-02-23 01:15:49,895][05631] Fps is (10 sec: 3277.9, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14290944. Throughput: 0: 864.2. Samples: 1066606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:15:49,902][05631] Avg episode reward: [(0, '23.775')] [2023-02-23 01:15:54,895][05631] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14311424. Throughput: 0: 874.9. Samples: 1073186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:15:54,898][05631] Avg episode reward: [(0, '23.428')] [2023-02-23 01:15:57,628][45651] Updated weights for policy 0, policy_version 3497 (0.0013) [2023-02-23 01:15:59,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 14327808. Throughput: 0: 862.8. Samples: 1075840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:15:59,898][05631] Avg episode reward: [(0, '23.884')] [2023-02-23 01:16:04,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 14340096. Throughput: 0: 845.6. Samples: 1079866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:16:04,898][05631] Avg episode reward: [(0, '23.815')] [2023-02-23 01:16:09,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14360576. Throughput: 0: 870.1. Samples: 1084966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:16:09,898][05631] Avg episode reward: [(0, '23.374')] [2023-02-23 01:16:10,784][45651] Updated weights for policy 0, policy_version 3507 (0.0020) [2023-02-23 01:16:14,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3457.3). Total num frames: 14381056. Throughput: 0: 877.8. Samples: 1088138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:16:14,898][05631] Avg episode reward: [(0, '23.445')] [2023-02-23 01:16:19,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 14397440. Throughput: 0: 866.1. Samples: 1093946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:16:19,900][05631] Avg episode reward: [(0, '24.174')] [2023-02-23 01:16:19,910][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003515_14397440.pth... [2023-02-23 01:16:20,136][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003313_13570048.pth [2023-02-23 01:16:22,403][45651] Updated weights for policy 0, policy_version 3517 (0.0016) [2023-02-23 01:16:24,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 14409728. Throughput: 0: 842.6. Samples: 1097740. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:16:24,909][05631] Avg episode reward: [(0, '24.328')] [2023-02-23 01:16:29,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 14426112. Throughput: 0: 844.3. Samples: 1099776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:16:29,898][05631] Avg episode reward: [(0, '24.616')] [2023-02-23 01:16:34,160][45651] Updated weights for policy 0, policy_version 3527 (0.0020) [2023-02-23 01:16:34,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14446592. Throughput: 0: 875.1. Samples: 1105984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:16:34,897][05631] Avg episode reward: [(0, '24.564')] [2023-02-23 01:16:39,896][05631] Fps is (10 sec: 4095.9, 60 sec: 3481.8, 300 sec: 3429.5). Total num frames: 14467072. Throughput: 0: 854.8. Samples: 1111654. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:16:39,901][05631] Avg episode reward: [(0, '24.356')] [2023-02-23 01:16:44,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 14479360. Throughput: 0: 840.8. Samples: 1113676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-23 01:16:44,899][05631] Avg episode reward: [(0, '25.001')] [2023-02-23 01:16:47,088][45651] Updated weights for policy 0, policy_version 3537 (0.0015) [2023-02-23 01:16:49,895][05631] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 14495744. Throughput: 0: 842.0. Samples: 1117758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:16:49,898][05631] Avg episode reward: [(0, '25.580')] [2023-02-23 01:16:54,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14516224. Throughput: 0: 867.9. Samples: 1124020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:16:54,903][05631] Avg episode reward: [(0, '26.539')] [2023-02-23 01:16:57,383][45651] Updated weights for policy 0, policy_version 3547 (0.0019) [2023-02-23 01:16:59,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 14536704. Throughput: 0: 870.0. Samples: 1127286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:16:59,901][05631] Avg episode reward: [(0, '26.424')] [2023-02-23 01:17:04,898][05631] Fps is (10 sec: 3276.0, 60 sec: 3481.4, 300 sec: 3429.6). Total num frames: 14548992. Throughput: 0: 839.5. Samples: 1131724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:17:04,901][05631] Avg episode reward: [(0, '27.096')] [2023-02-23 01:17:09,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 14561280. Throughput: 0: 844.6. Samples: 1135746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:17:09,903][05631] Avg episode reward: [(0, '26.816')] [2023-02-23 01:17:11,108][45651] Updated weights for policy 0, policy_version 3557 (0.0025) [2023-02-23 01:17:14,895][05631] Fps is (10 sec: 3277.6, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 14581760. Throughput: 0: 869.7. Samples: 1138914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:17:14,898][05631] Avg episode reward: [(0, '29.013')] [2023-02-23 01:17:19,895][05631] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 14606336. Throughput: 0: 875.4. Samples: 1145376. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:17:19,902][05631] Avg episode reward: [(0, '28.396')] [2023-02-23 01:17:21,305][45651] Updated weights for policy 0, policy_version 3567 (0.0016) [2023-02-23 01:17:24,896][05631] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3429.5). Total num frames: 14618624. Throughput: 0: 846.1. Samples: 1149730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:17:24,905][05631] Avg episode reward: [(0, '28.227')] [2023-02-23 01:17:29,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 14630912. Throughput: 0: 844.6. Samples: 1151682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:17:29,899][05631] Avg episode reward: [(0, '28.876')] [2023-02-23 01:17:34,496][45651] Updated weights for policy 0, policy_version 3577 (0.0026) [2023-02-23 01:17:34,895][05631] Fps is (10 sec: 3277.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14651392. Throughput: 0: 868.5. Samples: 1156840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:17:34,898][05631] Avg episode reward: [(0, '29.874')] [2023-02-23 01:17:34,901][45637] Saving new best policy, reward=29.874! [2023-02-23 01:17:39,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 14671872. Throughput: 0: 871.2. Samples: 1163224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:17:39,899][05631] Avg episode reward: [(0, '29.521')] [2023-02-23 01:17:44,895][05631] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 14688256. Throughput: 0: 853.3. Samples: 1165686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:17:44,899][05631] Avg episode reward: [(0, '27.351')] [2023-02-23 01:17:46,015][45651] Updated weights for policy 0, policy_version 3587 (0.0018) [2023-02-23 01:17:49,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 14700544. Throughput: 0: 845.2. Samples: 1169756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:17:49,902][05631] Avg episode reward: [(0, '27.131')] [2023-02-23 01:17:54,897][05631] Fps is (10 sec: 2866.8, 60 sec: 3345.0, 300 sec: 3429.5). Total num frames: 14716928. Throughput: 0: 871.5. Samples: 1174964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-23 01:17:54,900][05631] Avg episode reward: [(0, '27.021')] [2023-02-23 01:17:57,834][45651] Updated weights for policy 0, policy_version 3597 (0.0034) [2023-02-23 01:17:59,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 14741504. Throughput: 0: 871.1. Samples: 1178112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:17:59,899][05631] Avg episode reward: [(0, '26.287')] [2023-02-23 01:18:04,895][05631] Fps is (10 sec: 3687.0, 60 sec: 3413.5, 300 sec: 3401.8). Total num frames: 14753792. Throughput: 0: 846.6. Samples: 1183472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:04,903][05631] Avg episode reward: [(0, '26.139')] [2023-02-23 01:18:09,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.7). Total num frames: 14770176. Throughput: 0: 839.5. Samples: 1187506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:09,901][05631] Avg episode reward: [(0, '26.193')] [2023-02-23 01:18:10,926][45651] Updated weights for policy 0, policy_version 3607 (0.0025) [2023-02-23 01:18:14,895][05631] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 14786560. Throughput: 0: 843.8. Samples: 1189652. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:14,898][05631] Avg episode reward: [(0, '26.528')] [2023-02-23 01:18:19,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 14807040. Throughput: 0: 870.5. Samples: 1196012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:18:19,904][05631] Avg episode reward: [(0, '27.337')] [2023-02-23 01:18:19,919][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003615_14807040.pth... [2023-02-23 01:18:20,024][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003414_13983744.pth [2023-02-23 01:18:21,323][45651] Updated weights for policy 0, policy_version 3617 (0.0022) [2023-02-23 01:18:24,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3415.6). Total num frames: 14823424. Throughput: 0: 844.5. Samples: 1201226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-23 01:18:24,902][05631] Avg episode reward: [(0, '27.728')] [2023-02-23 01:18:29,897][05631] Fps is (10 sec: 2866.5, 60 sec: 3413.2, 300 sec: 3401.7). Total num frames: 14835712. Throughput: 0: 833.1. Samples: 1203176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:29,903][05631] Avg episode reward: [(0, '28.032')] [2023-02-23 01:18:34,895][05631] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3415.7). Total num frames: 14852096. Throughput: 0: 838.1. Samples: 1207472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:34,898][05631] Avg episode reward: [(0, '27.593')] [2023-02-23 01:18:35,158][45651] Updated weights for policy 0, policy_version 3627 (0.0017) [2023-02-23 01:18:39,898][05631] Fps is (10 sec: 4095.7, 60 sec: 3413.2, 300 sec: 3429.5). Total num frames: 14876672. Throughput: 0: 866.5. Samples: 1213956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:39,902][05631] Avg episode reward: [(0, '27.846')] [2023-02-23 01:18:44,895][05631] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 14893056. Throughput: 0: 865.6. Samples: 1217064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:44,904][05631] Avg episode reward: [(0, '25.499')] [2023-02-23 01:18:45,421][45651] Updated weights for policy 0, policy_version 3637 (0.0014) [2023-02-23 01:18:49,895][05631] Fps is (10 sec: 2868.0, 60 sec: 3413.3, 300 sec: 3415.8). Total num frames: 14905344. Throughput: 0: 838.3. Samples: 1221196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-23 01:18:49,902][05631] Avg episode reward: [(0, '26.467')] [2023-02-23 01:18:54,895][05631] Fps is (10 sec: 2867.1, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 14921728. Throughput: 0: 843.8. Samples: 1225476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-23 01:18:54,898][05631] Avg episode reward: [(0, '26.901')] [2023-02-23 01:18:58,375][45651] Updated weights for policy 0, policy_version 3647 (0.0024) [2023-02-23 01:18:59,895][05631] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 14942208. Throughput: 0: 866.2. Samples: 1228630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-23 01:18:59,904][05631] Avg episode reward: [(0, '27.805')] [2023-02-23 01:19:04,895][05631] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 14962688. Throughput: 0: 864.1. Samples: 1234898. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 01:19:04,898][05631] Avg episode reward: [(0, '28.057')]
[2023-02-23 01:19:09,897][05631] Fps is (10 sec: 3276.3, 60 sec: 3413.2, 300 sec: 3415.6). Total num frames: 14974976. Throughput: 0: 836.0. Samples: 1238846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-23 01:19:09,905][05631] Avg episode reward: [(0, '28.742')]
[2023-02-23 01:19:10,968][45651] Updated weights for policy 0, policy_version 3657 (0.0013)
[2023-02-23 01:19:14,895][05631] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 14987264. Throughput: 0: 836.1. Samples: 1240800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-23 01:19:14,900][05631] Avg episode reward: [(0, '28.278')]
[2023-02-23 01:19:18,863][05631] Component Batcher_0 stopped!
[2023-02-23 01:19:18,862][45637] Stopping Batcher_0...
[2023-02-23 01:19:18,866][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003664_15007744.pth...
[2023-02-23 01:19:18,868][45637] Loop batcher_evt_loop terminating...
[2023-02-23 01:19:18,921][45651] Weights refcount: 2 0
[2023-02-23 01:19:18,925][05631] Component InferenceWorker_p0-w0 stopped!
[2023-02-23 01:19:18,924][45651] Stopping InferenceWorker_p0-w0...
[2023-02-23 01:19:18,936][45651] Loop inference_proc0-0_evt_loop terminating...
[2023-02-23 01:19:19,021][45637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003515_14397440.pth
[2023-02-23 01:19:19,032][45637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003664_15007744.pth...
[2023-02-23 01:19:19,137][45637] Stopping LearnerWorker_p0...
[2023-02-23 01:19:19,138][45637] Loop learner_proc0_evt_loop terminating...
[2023-02-23 01:19:19,145][05631] Component LearnerWorker_p0 stopped!
[2023-02-23 01:19:19,178][05631] Component RolloutWorker_w0 stopped!
[2023-02-23 01:19:19,191][05631] Component RolloutWorker_w7 stopped!
[2023-02-23 01:19:19,199][45664] Stopping RolloutWorker_w4...
[2023-02-23 01:19:19,199][05631] Component RolloutWorker_w4 stopped!
[2023-02-23 01:19:19,209][45653] Stopping RolloutWorker_w2...
[2023-02-23 01:19:19,189][45652] Stopping RolloutWorker_w0...
[2023-02-23 01:19:19,210][45652] Loop rollout_proc0_evt_loop terminating...
[2023-02-23 01:19:19,210][45664] Loop rollout_proc4_evt_loop terminating...
[2023-02-23 01:19:19,209][05631] Component RolloutWorker_w2 stopped!
[2023-02-23 01:19:19,194][45672] Stopping RolloutWorker_w7...
[2023-02-23 01:19:19,218][45670] Stopping RolloutWorker_w6...
[2023-02-23 01:19:19,213][45653] Loop rollout_proc2_evt_loop terminating...
[2023-02-23 01:19:19,219][45670] Loop rollout_proc6_evt_loop terminating...
[2023-02-23 01:19:19,218][05631] Component RolloutWorker_w6 stopped!
[2023-02-23 01:19:19,226][45662] Stopping RolloutWorker_w3...
[2023-02-23 01:19:19,227][45662] Loop rollout_proc3_evt_loop terminating...
[2023-02-23 01:19:19,226][05631] Component RolloutWorker_w3 stopped!
[2023-02-23 01:19:19,215][45672] Loop rollout_proc7_evt_loop terminating...
[2023-02-23 01:19:19,253][05631] Component RolloutWorker_w1 stopped!
[2023-02-23 01:19:19,255][45660] Stopping RolloutWorker_w1...
[2023-02-23 01:19:19,259][05631] Component RolloutWorker_w5 stopped!
[2023-02-23 01:19:19,261][05631] Waiting for process learner_proc0 to stop...
[2023-02-23 01:19:19,268][45674] Stopping RolloutWorker_w5...
[2023-02-23 01:19:19,271][45660] Loop rollout_proc1_evt_loop terminating...
[2023-02-23 01:19:19,276][45674] Loop rollout_proc5_evt_loop terminating...
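Training ends here with the learner writing the final checkpoint, checkpoint_000003664_15007744.pth (policy version 3664, 15,007,744 environment frames), and removing an older rotating checkpoint (checkpoint_000003515_14397440.pth), the same keep-the-latest scheme visible throughout the run. Below is a minimal sketch of inspecting that checkpoint offline with plain PyTorch; the path is copied from the log above, but the internal key layout is an assumption (a regular torch.save'd dict of learner state), not something this log confirms:

    import torch

    # Path copied from the shutdown log above.
    ckpt_path = (
        "/content/train_dir/default_experiment/"
        "checkpoint_p0/checkpoint_000003664_15007744.pth"
    )

    # Load onto CPU so no GPU is needed just to inspect the file.
    checkpoint = torch.load(ckpt_path, map_location="cpu")

    # Assumption: the checkpoint is a plain dict of learner state
    # (model weights, optimizer state, frame counters, ...).
    print(list(checkpoint.keys()))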
[2023-02-23 01:19:21,785][05631] Waiting for process inference_proc0-0 to join...
[2023-02-23 01:19:21,871][05631] Waiting for process rollout_proc0 to join...
[2023-02-23 01:19:21,878][05631] Waiting for process rollout_proc1 to join...
[2023-02-23 01:19:22,078][05631] Waiting for process rollout_proc2 to join...
[2023-02-23 01:19:22,079][05631] Waiting for process rollout_proc3 to join...
[2023-02-23 01:19:22,081][05631] Waiting for process rollout_proc4 to join...
[2023-02-23 01:19:22,082][05631] Waiting for process rollout_proc5 to join...
[2023-02-23 01:19:22,084][05631] Waiting for process rollout_proc6 to join...
[2023-02-23 01:19:22,086][05631] Waiting for process rollout_proc7 to join...
[2023-02-23 01:19:22,087][05631] Batcher 0 profile tree view:
batching: 33.6766, releasing_batches: 0.0367
[2023-02-23 01:19:22,089][05631] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0033
  wait_policy_total: 706.1641
update_model: 10.8087
  weight_update: 0.0030
one_step: 0.0029
  handle_policy_step: 706.2869
    deserialize: 19.9182, stack: 4.0569, obs_to_device_normalize: 153.0930, forward: 346.6872, send_messages: 34.5198
    prepare_outputs: 112.4429
      to_cpu: 69.9142
[2023-02-23 01:19:22,091][05631] Learner 0 profile tree view:
misc: 0.0073, prepare_batch: 20.8492
train: 101.1410
  epoch_init: 0.0073, minibatch_init: 0.0148, losses_postprocess: 0.7208, kl_divergence: 0.7410, after_optimizer: 3.5961
  calculate_losses: 34.2226
    losses_init: 0.0099, forward_head: 2.2717, bptt_initial: 22.4144, tail: 1.5801, advantages_returns: 0.4399, losses: 4.0439
    bptt: 3.0091
      bptt_forward_core: 2.8539
  update: 60.9162
    clip: 1.8641
[2023-02-23 01:19:22,092][05631] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5163, enqueue_policy_requests: 197.2211, env_step: 1104.3054, overhead: 30.0232, complete_rollouts: 9.9141
save_policy_outputs: 28.0545
  split_output_tensors: 13.4964
[2023-02-23 01:19:22,094][05631] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4448, enqueue_policy_requests: 202.0074, env_step: 1109.5958, overhead: 30.4165, complete_rollouts: 9.5866
save_policy_outputs: 28.6057
  split_output_tensors: 13.7530
[2023-02-23 01:19:22,096][05631] Loop Runner_EvtLoop terminating...
[2023-02-23 01:19:22,098][05631] Runner profile tree view:
main_loop: 1499.7011
[2023-02-23 01:19:22,100][05631] Collected {0: 15007744}, FPS: 3323.9
[2023-02-23 01:19:22,155][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-23 01:19:22,158][05631] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-23 01:19:22,159][05631] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-23 01:19:22,160][05631] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-23 01:19:22,162][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 01:19:22,164][05631] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-23 01:19:22,166][05631] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-23 01:19:22,168][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-23 01:19:22,177][05631] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-23 01:19:22,179][05631] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-23 01:19:22,180][05631] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-23 01:19:22,183][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-23 01:19:22,186][05631] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 01:19:22,187][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 01:19:22,194][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 01:19:22,220][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 01:19:22,223][05631] RunningMeanStd input shape: (1,) [2023-02-23 01:19:22,237][05631] ConvEncoder: input_channels=3 [2023-02-23 01:19:22,293][05631] Conv encoder output size: 512 [2023-02-23 01:19:22,296][05631] Policy head output size: 512 [2023-02-23 01:19:22,325][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003664_15007744.pth... [2023-02-23 01:19:22,833][05631] Num frames 100... [2023-02-23 01:19:22,949][05631] Num frames 200... [2023-02-23 01:19:23,063][05631] Num frames 300... [2023-02-23 01:19:23,194][05631] Num frames 400... [2023-02-23 01:19:23,306][05631] Num frames 500... [2023-02-23 01:19:23,425][05631] Num frames 600... [2023-02-23 01:19:23,538][05631] Num frames 700... [2023-02-23 01:19:23,656][05631] Num frames 800... [2023-02-23 01:19:23,793][05631] Num frames 900... [2023-02-23 01:19:23,907][05631] Num frames 1000... [2023-02-23 01:19:24,029][05631] Num frames 1100... [2023-02-23 01:19:24,140][05631] Num frames 1200... [2023-02-23 01:19:24,273][05631] Num frames 1300... [2023-02-23 01:19:24,445][05631] Num frames 1400... [2023-02-23 01:19:24,628][05631] Num frames 1500... [2023-02-23 01:19:24,802][05631] Num frames 1600... [2023-02-23 01:19:24,967][05631] Num frames 1700... [2023-02-23 01:19:25,080][05631] Avg episode rewards: #0: 51.329, true rewards: #0: 17.330 [2023-02-23 01:19:25,086][05631] Avg episode reward: 51.329, avg true_objective: 17.330 [2023-02-23 01:19:25,208][05631] Num frames 1800... [2023-02-23 01:19:25,379][05631] Num frames 1900... [2023-02-23 01:19:25,565][05631] Num frames 2000... [2023-02-23 01:19:25,730][05631] Num frames 2100... [2023-02-23 01:19:25,902][05631] Num frames 2200... [2023-02-23 01:19:26,065][05631] Num frames 2300... [2023-02-23 01:19:26,244][05631] Num frames 2400... [2023-02-23 01:19:26,410][05631] Num frames 2500... [2023-02-23 01:19:26,574][05631] Num frames 2600... [2023-02-23 01:19:26,755][05631] Num frames 2700... [2023-02-23 01:19:26,965][05631] Avg episode rewards: #0: 38.445, true rewards: #0: 13.945 [2023-02-23 01:19:26,967][05631] Avg episode reward: 38.445, avg true_objective: 13.945 [2023-02-23 01:19:26,988][05631] Num frames 2800... [2023-02-23 01:19:27,160][05631] Num frames 2900... [2023-02-23 01:19:27,321][05631] Num frames 3000... [2023-02-23 01:19:27,481][05631] Num frames 3100... [2023-02-23 01:19:27,649][05631] Num frames 3200... [2023-02-23 01:19:27,814][05631] Num frames 3300... [2023-02-23 01:19:27,980][05631] Num frames 3400... [2023-02-23 01:19:28,092][05631] Num frames 3500... [2023-02-23 01:19:28,207][05631] Num frames 3600... [2023-02-23 01:19:28,291][05631] Avg episode rewards: #0: 31.737, true rewards: #0: 12.070 [2023-02-23 01:19:28,294][05631] Avg episode reward: 31.737, avg true_objective: 12.070 [2023-02-23 01:19:28,389][05631] Num frames 3700... [2023-02-23 01:19:28,502][05631] Num frames 3800... 
[2023-02-23 01:19:28,621][05631] Num frames 3900... [2023-02-23 01:19:28,735][05631] Num frames 4000... [2023-02-23 01:19:28,858][05631] Num frames 4100... [2023-02-23 01:19:28,978][05631] Num frames 4200... [2023-02-23 01:19:29,091][05631] Num frames 4300... [2023-02-23 01:19:29,211][05631] Num frames 4400... [2023-02-23 01:19:29,370][05631] Avg episode rewards: #0: 28.712, true rewards: #0: 11.212 [2023-02-23 01:19:29,371][05631] Avg episode reward: 28.712, avg true_objective: 11.212 [2023-02-23 01:19:29,396][05631] Num frames 4500... [2023-02-23 01:19:29,520][05631] Num frames 4600... [2023-02-23 01:19:29,639][05631] Num frames 4700... [2023-02-23 01:19:29,763][05631] Num frames 4800... [2023-02-23 01:19:29,879][05631] Num frames 4900... [2023-02-23 01:19:30,012][05631] Num frames 5000... [2023-02-23 01:19:30,138][05631] Num frames 5100... [2023-02-23 01:19:30,255][05631] Num frames 5200... [2023-02-23 01:19:30,380][05631] Num frames 5300... [2023-02-23 01:19:30,496][05631] Num frames 5400... [2023-02-23 01:19:30,617][05631] Num frames 5500... [2023-02-23 01:19:30,737][05631] Num frames 5600... [2023-02-23 01:19:30,856][05631] Num frames 5700... [2023-02-23 01:19:30,983][05631] Num frames 5800... [2023-02-23 01:19:31,102][05631] Num frames 5900... [2023-02-23 01:19:31,220][05631] Num frames 6000... [2023-02-23 01:19:31,350][05631] Num frames 6100... [2023-02-23 01:19:31,468][05631] Num frames 6200... [2023-02-23 01:19:31,585][05631] Num frames 6300... [2023-02-23 01:19:31,705][05631] Num frames 6400... [2023-02-23 01:19:31,825][05631] Num frames 6500... [2023-02-23 01:19:31,985][05631] Avg episode rewards: #0: 33.770, true rewards: #0: 13.170 [2023-02-23 01:19:31,987][05631] Avg episode reward: 33.770, avg true_objective: 13.170 [2023-02-23 01:19:32,010][05631] Num frames 6600... [2023-02-23 01:19:32,137][05631] Num frames 6700... [2023-02-23 01:19:32,267][05631] Num frames 6800... [2023-02-23 01:19:32,386][05631] Num frames 6900... [2023-02-23 01:19:32,506][05631] Num frames 7000... [2023-02-23 01:19:32,633][05631] Num frames 7100... [2023-02-23 01:19:32,751][05631] Num frames 7200... [2023-02-23 01:19:32,871][05631] Num frames 7300... [2023-02-23 01:19:33,002][05631] Num frames 7400... [2023-02-23 01:19:33,151][05631] Avg episode rewards: #0: 31.968, true rewards: #0: 12.468 [2023-02-23 01:19:33,153][05631] Avg episode reward: 31.968, avg true_objective: 12.468 [2023-02-23 01:19:33,181][05631] Num frames 7500... [2023-02-23 01:19:33,307][05631] Num frames 7600... [2023-02-23 01:19:33,434][05631] Num frames 7700... [2023-02-23 01:19:33,550][05631] Num frames 7800... [2023-02-23 01:19:33,667][05631] Num frames 7900... [2023-02-23 01:19:33,783][05631] Num frames 8000... [2023-02-23 01:19:33,900][05631] Num frames 8100... [2023-02-23 01:19:34,018][05631] Num frames 8200... [2023-02-23 01:19:34,132][05631] Num frames 8300... [2023-02-23 01:19:34,257][05631] Num frames 8400... [2023-02-23 01:19:34,375][05631] Num frames 8500... [2023-02-23 01:19:34,491][05631] Num frames 8600... [2023-02-23 01:19:34,611][05631] Num frames 8700... [2023-02-23 01:19:34,729][05631] Num frames 8800... [2023-02-23 01:19:34,844][05631] Num frames 8900... [2023-02-23 01:19:34,966][05631] Num frames 9000... [2023-02-23 01:19:35,090][05631] Num frames 9100... [2023-02-23 01:19:35,209][05631] Num frames 9200... [2023-02-23 01:19:35,341][05631] Num frames 9300... [2023-02-23 01:19:35,458][05631] Num frames 9400... [2023-02-23 01:19:35,578][05631] Num frames 9500... 
[2023-02-23 01:19:35,732][05631] Avg episode rewards: #0: 36.115, true rewards: #0: 13.687 [2023-02-23 01:19:35,735][05631] Avg episode reward: 36.115, avg true_objective: 13.687 [2023-02-23 01:19:35,768][05631] Num frames 9600... [2023-02-23 01:19:35,895][05631] Num frames 9700... [2023-02-23 01:19:36,024][05631] Num frames 9800... [2023-02-23 01:19:36,141][05631] Num frames 9900... [2023-02-23 01:19:36,263][05631] Num frames 10000... [2023-02-23 01:19:36,382][05631] Num frames 10100... [2023-02-23 01:19:36,506][05631] Num frames 10200... [2023-02-23 01:19:36,624][05631] Num frames 10300... [2023-02-23 01:19:36,683][05631] Avg episode rewards: #0: 33.501, true rewards: #0: 12.876 [2023-02-23 01:19:36,684][05631] Avg episode reward: 33.501, avg true_objective: 12.876 [2023-02-23 01:19:36,813][05631] Num frames 10400... [2023-02-23 01:19:36,942][05631] Num frames 10500... [2023-02-23 01:19:37,067][05631] Num frames 10600... [2023-02-23 01:19:37,192][05631] Num frames 10700... [2023-02-23 01:19:37,321][05631] Num frames 10800... [2023-02-23 01:19:37,438][05631] Num frames 10900... [2023-02-23 01:19:37,554][05631] Num frames 11000... [2023-02-23 01:19:37,671][05631] Num frames 11100... [2023-02-23 01:19:37,784][05631] Num frames 11200... [2023-02-23 01:19:37,906][05631] Num frames 11300... [2023-02-23 01:19:38,070][05631] Num frames 11400... [2023-02-23 01:19:38,239][05631] Num frames 11500... [2023-02-23 01:19:38,434][05631] Avg episode rewards: #0: 33.201, true rewards: #0: 12.868 [2023-02-23 01:19:38,440][05631] Avg episode reward: 33.201, avg true_objective: 12.868 [2023-02-23 01:19:38,478][05631] Num frames 11600... [2023-02-23 01:19:38,647][05631] Num frames 11700... [2023-02-23 01:19:38,808][05631] Num frames 11800... [2023-02-23 01:19:38,967][05631] Num frames 11900... [2023-02-23 01:19:39,138][05631] Num frames 12000... [2023-02-23 01:19:39,302][05631] Num frames 12100... [2023-02-23 01:19:39,462][05631] Num frames 12200... [2023-02-23 01:19:39,621][05631] Num frames 12300... [2023-02-23 01:19:39,783][05631] Num frames 12400... [2023-02-23 01:19:39,946][05631] Num frames 12500... [2023-02-23 01:19:40,106][05631] Num frames 12600... [2023-02-23 01:19:40,196][05631] Avg episode rewards: #0: 32.418, true rewards: #0: 12.618 [2023-02-23 01:19:40,198][05631] Avg episode reward: 32.418, avg true_objective: 12.618 [2023-02-23 01:21:02,859][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-23 01:21:03,307][05631] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-23 01:21:03,315][05631] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-23 01:21:03,319][05631] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-23 01:21:03,323][05631] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-23 01:21:03,327][05631] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-23 01:21:03,329][05631] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-23 01:21:03,331][05631] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-23 01:21:03,333][05631] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-23 01:21:03,335][05631] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
[2023-02-23 01:21:03,337][05631] Adding new argument 'hf_repository'='pittawat/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-23 01:21:03,339][05631] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-23 01:21:03,341][05631] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-23 01:21:03,343][05631] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-23 01:21:03,345][05631] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-23 01:21:03,347][05631] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-23 01:21:03,376][05631] RunningMeanStd input shape: (3, 72, 128) [2023-02-23 01:21:03,378][05631] RunningMeanStd input shape: (1,) [2023-02-23 01:21:03,395][05631] ConvEncoder: input_channels=3 [2023-02-23 01:21:03,455][05631] Conv encoder output size: 512 [2023-02-23 01:21:03,458][05631] Policy head output size: 512 [2023-02-23 01:21:03,484][05631] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003664_15007744.pth... [2023-02-23 01:21:04,167][05631] Num frames 100... [2023-02-23 01:21:04,328][05631] Num frames 200... [2023-02-23 01:21:04,487][05631] Num frames 300... [2023-02-23 01:21:04,639][05631] Num frames 400... [2023-02-23 01:21:04,805][05631] Num frames 500... [2023-02-23 01:21:04,956][05631] Num frames 600... [2023-02-23 01:21:05,112][05631] Num frames 700... [2023-02-23 01:21:05,285][05631] Num frames 800... [2023-02-23 01:21:05,446][05631] Num frames 900... [2023-02-23 01:21:05,617][05631] Num frames 1000... [2023-02-23 01:21:05,813][05631] Num frames 1100... [2023-02-23 01:21:05,982][05631] Num frames 1200... [2023-02-23 01:21:06,148][05631] Num frames 1300... [2023-02-23 01:21:06,305][05631] Num frames 1400... [2023-02-23 01:21:06,407][05631] Avg episode rewards: #0: 34.180, true rewards: #0: 14.180 [2023-02-23 01:21:06,410][05631] Avg episode reward: 34.180, avg true_objective: 14.180 [2023-02-23 01:21:06,600][05631] Num frames 1500... [2023-02-23 01:21:06,782][05631] Num frames 1600... [2023-02-23 01:21:06,943][05631] Num frames 1700... [2023-02-23 01:21:07,148][05631] Num frames 1800... [2023-02-23 01:21:07,338][05631] Num frames 1900... [2023-02-23 01:21:07,525][05631] Num frames 2000... [2023-02-23 01:21:07,724][05631] Num frames 2100... [2023-02-23 01:21:07,920][05631] Num frames 2200... [2023-02-23 01:21:08,095][05631] Num frames 2300... [2023-02-23 01:21:08,277][05631] Num frames 2400... [2023-02-23 01:21:08,456][05631] Num frames 2500... [2023-02-23 01:21:08,665][05631] Num frames 2600... [2023-02-23 01:21:08,874][05631] Num frames 2700... [2023-02-23 01:21:09,066][05631] Num frames 2800... [2023-02-23 01:21:09,262][05631] Num frames 2900... [2023-02-23 01:21:09,460][05631] Num frames 3000... [2023-02-23 01:21:09,671][05631] Num frames 3100... [2023-02-23 01:21:09,892][05631] Num frames 3200... [2023-02-23 01:21:10,103][05631] Num frames 3300... [2023-02-23 01:21:10,309][05631] Num frames 3400... [2023-02-23 01:21:10,495][05631] Num frames 3500... [2023-02-23 01:21:10,606][05631] Avg episode rewards: #0: 45.589, true rewards: #0: 17.590 [2023-02-23 01:21:10,608][05631] Avg episode reward: 45.589, avg true_objective: 17.590 [2023-02-23 01:21:10,793][05631] Num frames 3600... [2023-02-23 01:21:11,001][05631] Num frames 3700... [2023-02-23 01:21:11,217][05631] Num frames 3800... 
[2023-02-23 01:21:11,413][05631] Num frames 3900... [2023-02-23 01:21:11,633][05631] Num frames 4000... [2023-02-23 01:21:11,820][05631] Num frames 4100... [2023-02-23 01:21:11,984][05631] Num frames 4200... [2023-02-23 01:21:12,147][05631] Num frames 4300... [2023-02-23 01:21:12,303][05631] Num frames 4400... [2023-02-23 01:21:12,469][05631] Num frames 4500... [2023-02-23 01:21:12,630][05631] Num frames 4600... [2023-02-23 01:21:12,799][05631] Num frames 4700... [2023-02-23 01:21:12,970][05631] Avg episode rewards: #0: 39.553, true rewards: #0: 15.887 [2023-02-23 01:21:12,973][05631] Avg episode reward: 39.553, avg true_objective: 15.887 [2023-02-23 01:21:13,029][05631] Num frames 4800... [2023-02-23 01:21:13,179][05631] Num frames 4900... [2023-02-23 01:21:13,295][05631] Num frames 5000... [2023-02-23 01:21:13,414][05631] Num frames 5100... [2023-02-23 01:21:13,532][05631] Num frames 5200... [2023-02-23 01:21:13,646][05631] Num frames 5300... [2023-02-23 01:21:13,757][05631] Num frames 5400... [2023-02-23 01:21:13,869][05631] Num frames 5500... [2023-02-23 01:21:13,982][05631] Num frames 5600... [2023-02-23 01:21:14,093][05631] Num frames 5700... [2023-02-23 01:21:14,214][05631] Num frames 5800... [2023-02-23 01:21:14,330][05631] Num frames 5900... [2023-02-23 01:21:14,448][05631] Num frames 6000... [2023-02-23 01:21:14,563][05631] Avg episode rewards: #0: 37.615, true rewards: #0: 15.115 [2023-02-23 01:21:14,565][05631] Avg episode reward: 37.615, avg true_objective: 15.115 [2023-02-23 01:21:14,631][05631] Num frames 6100... [2023-02-23 01:21:14,748][05631] Num frames 6200... [2023-02-23 01:21:14,868][05631] Num frames 6300... [2023-02-23 01:21:14,989][05631] Num frames 6400... [2023-02-23 01:21:15,113][05631] Num frames 6500... [2023-02-23 01:21:15,240][05631] Num frames 6600... [2023-02-23 01:21:15,362][05631] Num frames 6700... [2023-02-23 01:21:15,490][05631] Num frames 6800... [2023-02-23 01:21:15,609][05631] Num frames 6900... [2023-02-23 01:21:15,725][05631] Num frames 7000... [2023-02-23 01:21:15,839][05631] Num frames 7100... [2023-02-23 01:21:15,955][05631] Num frames 7200... [2023-02-23 01:21:16,077][05631] Avg episode rewards: #0: 35.892, true rewards: #0: 14.492 [2023-02-23 01:21:16,079][05631] Avg episode reward: 35.892, avg true_objective: 14.492 [2023-02-23 01:21:16,149][05631] Num frames 7300... [2023-02-23 01:21:16,280][05631] Num frames 7400... [2023-02-23 01:21:16,406][05631] Num frames 7500... [2023-02-23 01:21:16,529][05631] Num frames 7600... [2023-02-23 01:21:16,645][05631] Num frames 7700... [2023-02-23 01:21:16,758][05631] Num frames 7800... [2023-02-23 01:21:16,870][05631] Num frames 7900... [2023-02-23 01:21:16,989][05631] Num frames 8000... [2023-02-23 01:21:17,101][05631] Num frames 8100... [2023-02-23 01:21:17,220][05631] Num frames 8200... [2023-02-23 01:21:17,335][05631] Num frames 8300... [2023-02-23 01:21:17,450][05631] Num frames 8400... [2023-02-23 01:21:17,570][05631] Num frames 8500... [2023-02-23 01:21:17,684][05631] Num frames 8600... [2023-02-23 01:21:17,795][05631] Num frames 8700... [2023-02-23 01:21:17,909][05631] Num frames 8800... [2023-02-23 01:21:18,023][05631] Num frames 8900... [2023-02-23 01:21:18,136][05631] Num frames 9000... [2023-02-23 01:21:18,258][05631] Num frames 9100... [2023-02-23 01:21:18,373][05631] Num frames 9200... [2023-02-23 01:21:18,491][05631] Num frames 9300... 
[2023-02-23 01:21:18,603][05631] Avg episode rewards: #0: 39.743, true rewards: #0: 15.577 [2023-02-23 01:21:18,606][05631] Avg episode reward: 39.743, avg true_objective: 15.577 [2023-02-23 01:21:18,675][05631] Num frames 9400... [2023-02-23 01:21:18,795][05631] Num frames 9500... [2023-02-23 01:21:18,919][05631] Num frames 9600... [2023-02-23 01:21:19,053][05631] Num frames 9700... [2023-02-23 01:21:19,172][05631] Num frames 9800... [2023-02-23 01:21:19,295][05631] Num frames 9900... [2023-02-23 01:21:19,410][05631] Num frames 10000... [2023-02-23 01:21:19,523][05631] Num frames 10100... [2023-02-23 01:21:19,645][05631] Num frames 10200... [2023-02-23 01:21:19,759][05631] Num frames 10300... [2023-02-23 01:21:19,869][05631] Num frames 10400... [2023-02-23 01:21:19,982][05631] Num frames 10500... [2023-02-23 01:21:20,110][05631] Num frames 10600... [2023-02-23 01:21:20,226][05631] Num frames 10700... [2023-02-23 01:21:20,345][05631] Num frames 10800... [2023-02-23 01:21:20,463][05631] Num frames 10900... [2023-02-23 01:21:20,585][05631] Num frames 11000... [2023-02-23 01:21:20,696][05631] Num frames 11100... [2023-02-23 01:21:20,809][05631] Num frames 11200... [2023-02-23 01:21:20,923][05631] Num frames 11300... [2023-02-23 01:21:21,038][05631] Num frames 11400... [2023-02-23 01:21:21,149][05631] Avg episode rewards: #0: 42.065, true rewards: #0: 16.351 [2023-02-23 01:21:21,151][05631] Avg episode reward: 42.065, avg true_objective: 16.351 [2023-02-23 01:21:21,217][05631] Num frames 11500... [2023-02-23 01:21:21,352][05631] Num frames 11600... [2023-02-23 01:21:21,470][05631] Num frames 11700... [2023-02-23 01:21:21,583][05631] Num frames 11800... [2023-02-23 01:21:21,699][05631] Num frames 11900... [2023-02-23 01:21:21,754][05631] Avg episode rewards: #0: 37.875, true rewards: #0: 14.875 [2023-02-23 01:21:21,756][05631] Avg episode reward: 37.875, avg true_objective: 14.875 [2023-02-23 01:21:21,927][05631] Num frames 12000... [2023-02-23 01:21:22,092][05631] Num frames 12100... [2023-02-23 01:21:22,256][05631] Num frames 12200... [2023-02-23 01:21:22,429][05631] Num frames 12300... [2023-02-23 01:21:22,595][05631] Num frames 12400... [2023-02-23 01:21:22,753][05631] Num frames 12500... [2023-02-23 01:21:22,911][05631] Num frames 12600... [2023-02-23 01:21:23,071][05631] Num frames 12700... [2023-02-23 01:21:23,251][05631] Num frames 12800... [2023-02-23 01:21:23,419][05631] Num frames 12900... [2023-02-23 01:21:23,587][05631] Num frames 13000... [2023-02-23 01:21:23,810][05631] Avg episode rewards: #0: 36.765, true rewards: #0: 14.543 [2023-02-23 01:21:23,813][05631] Avg episode reward: 36.765, avg true_objective: 14.543 [2023-02-23 01:21:23,840][05631] Num frames 13100... [2023-02-23 01:21:24,017][05631] Num frames 13200... [2023-02-23 01:21:24,189][05631] Num frames 13300... [2023-02-23 01:21:24,361][05631] Num frames 13400... [2023-02-23 01:21:24,539][05631] Num frames 13500... [2023-02-23 01:21:24,706][05631] Num frames 13600... [2023-02-23 01:21:24,873][05631] Num frames 13700... [2023-02-23 01:21:25,045][05631] Num frames 13800... [2023-02-23 01:21:25,218][05631] Num frames 13900... [2023-02-23 01:21:25,375][05631] Num frames 14000... [2023-02-23 01:21:25,497][05631] Num frames 14100... [2023-02-23 01:21:25,614][05631] Num frames 14200... [2023-02-23 01:21:25,736][05631] Num frames 14300... [2023-02-23 01:21:25,853][05631] Num frames 14400... 
[2023-02-23 01:21:25,984][05631] Avg episode rewards: #0: 36.465, true rewards: #0: 14.465 [2023-02-23 01:21:25,985][05631] Avg episode reward: 36.465, avg true_objective: 14.465 [2023-02-23 01:23:00,177][05631] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
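Both evaluation runs above reuse the saved config.json, override num_workers to 1 from the command line, and supply enjoy-only arguments; the second run differs only in max_num_frames=100000, push_to_hub=True, and the target hf_repository. Below is a plausible reconstruction of that second invocation as a Python subprocess call. Most flags mirror the "Overriding arg" and "Adding new argument" lines above; --train_dir and --experiment are inferred from the config path, and the sf_examples.vizdoom.enjoy_vizdoom module path and doom_health_gathering_supreme env name are assumptions inferred from the repository name, since the log never prints the command itself:

    import subprocess

    # Hypothetical reconstruction of the push-to-hub evaluation run.
    # Flags mirror the config-override log lines above; the module path
    # and env name are assumptions, not shown in the log.
    subprocess.run(
        [
            "python", "-m", "sf_examples.vizdoom.enjoy_vizdoom",
            "--env=doom_health_gathering_supreme",
            "--train_dir=/content/train_dir",
            "--experiment=default_experiment",
            "--num_workers=1",
            "--no_render",
            "--save_video",
            "--max_num_frames=100000",
            "--max_num_episodes=10",
            "--push_to_hub",
            "--hf_repository=pittawat/rl_course_vizdoom_health_gathering_supreme",
        ],
        check=True,
    )

The first run would be the same call without the last two flags; its log shows push_to_hub=False, hf_repository=None, and max_num_frames left at the 1000000000.0 default.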