[2023-02-24 06:25:44,317][00368] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-24 06:25:44,320][00368] Rollout worker 0 uses device cpu
[2023-02-24 06:25:44,321][00368] Rollout worker 1 uses device cpu
[2023-02-24 06:25:44,323][00368] Rollout worker 2 uses device cpu
[2023-02-24 06:25:44,324][00368] Rollout worker 3 uses device cpu
[2023-02-24 06:25:44,326][00368] Rollout worker 4 uses device cpu
[2023-02-24 06:25:44,329][00368] Rollout worker 5 uses device cpu
[2023-02-24 06:25:44,332][00368] Rollout worker 6 uses device cpu
[2023-02-24 06:25:44,334][00368] Rollout worker 7 uses device cpu
[2023-02-24 06:25:44,528][00368] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:25:44,531][00368] InferenceWorker_p0-w0: min num requests: 2
[2023-02-24 06:25:44,561][00368] Starting all processes...
[2023-02-24 06:25:44,562][00368] Starting process learner_proc0
[2023-02-24 06:25:44,618][00368] Starting all processes...
[2023-02-24 06:25:44,628][00368] Starting process inference_proc0-0
[2023-02-24 06:25:44,628][00368] Starting process rollout_proc0
[2023-02-24 06:25:44,630][00368] Starting process rollout_proc1
[2023-02-24 06:25:44,630][00368] Starting process rollout_proc2
[2023-02-24 06:25:44,630][00368] Starting process rollout_proc3
[2023-02-24 06:25:44,631][00368] Starting process rollout_proc4
[2023-02-24 06:25:44,631][00368] Starting process rollout_proc5
[2023-02-24 06:25:44,631][00368] Starting process rollout_proc6
[2023-02-24 06:25:44,631][00368] Starting process rollout_proc7
[2023-02-24 06:25:57,847][11427] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:25:57,848][11427] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-24 06:25:58,806][11442] Worker 1 uses CPU cores [1]
[2023-02-24 06:25:58,985][11441] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:25:58,985][11441] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-24 06:25:59,103][11447] Worker 6 uses CPU cores [0]
[2023-02-24 06:25:59,186][11453] Worker 5 uses CPU cores [1]
[2023-02-24 06:25:59,220][11443] Worker 0 uses CPU cores [0]
[2023-02-24 06:25:59,308][11446] Worker 4 uses CPU cores [0]
[2023-02-24 06:25:59,344][11445] Worker 2 uses CPU cores [0]
[2023-02-24 06:25:59,348][11441] Num visible devices: 1
[2023-02-24 06:25:59,353][11427] Num visible devices: 1
[2023-02-24 06:25:59,373][11444] Worker 3 uses CPU cores [1]
[2023-02-24 06:25:59,379][11448] Worker 7 uses CPU cores [1]
[2023-02-24 06:25:59,384][11427] Starting seed is not provided
[2023-02-24 06:25:59,385][11427] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:25:59,386][11427] Initializing actor-critic model on device cuda:0
[2023-02-24 06:25:59,387][11427] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 06:25:59,388][11427] RunningMeanStd input shape: (1,)
[2023-02-24 06:25:59,417][11427] ConvEncoder: input_channels=3
[2023-02-24 06:25:59,853][11427] Conv encoder output size: 512
[2023-02-24 06:25:59,853][11427] Policy head output size: 512
[2023-02-24 06:25:59,914][11427] Created Actor Critic model with architecture:
[2023-02-24 06:25:59,914][11427] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-24 06:26:04,519][00368] Heartbeat connected on Batcher_0
[2023-02-24 06:26:04,529][00368] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-24 06:26:04,543][00368] Heartbeat connected on RolloutWorker_w0
[2023-02-24 06:26:04,545][00368] Heartbeat connected on RolloutWorker_w1
[2023-02-24 06:26:04,548][00368] Heartbeat connected on RolloutWorker_w3
[2023-02-24 06:26:04,555][00368] Heartbeat connected on RolloutWorker_w2
[2023-02-24 06:26:04,558][00368] Heartbeat connected on RolloutWorker_w4
[2023-02-24 06:26:04,559][00368] Heartbeat connected on RolloutWorker_w5
[2023-02-24 06:26:04,560][00368] Heartbeat connected on RolloutWorker_w6
[2023-02-24 06:26:04,563][00368] Heartbeat connected on RolloutWorker_w7
[2023-02-24 06:26:09,138][11427] Using optimizer
[2023-02-24 06:26:09,139][11427] No checkpoints found
[2023-02-24 06:26:09,139][11427] Did not load from checkpoint, starting from scratch!
[2023-02-24 06:26:09,140][11427] Initialized policy 0 weights for model version 0
[2023-02-24 06:26:09,144][11427] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:26:09,151][11427] LearnerWorker_p0 finished initialization!
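The "Conv encoder output size: 512" line above follows from the (3, 72, 128) observation shape and the three Conv2d layers in `conv_head`. A minimal sketch of that shape arithmetic, assuming Sample Factory's default VizDoom conv filters of (32, 8×8, stride 4), (64, 4×4, stride 2), (128, 3×3, stride 2) — the exact hyperparameters are not printed in this log, so they are an assumption here:

```python
# Sketch: reproduce the conv_head output shape from the (3, 72, 128) input.
# Conv filter spec (out_channels, kernel, stride) is an assumed default,
# not something this log states explicitly.

def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length of one spatial dim for a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

h, w = 72, 128  # resized VizDoom frame, matching "(3, 72, 128)" in the log
channels = 3
for out_channels, kernel, stride in [(32, 8, 4), (64, 4, 2), (128, 3, 2)]:
    h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    channels = out_channels

flat = channels * h * w  # flattened conv_head output fed into mlp_layers
print(h, w, channels, flat)  # the Linear in mlp_layers maps this to 512
```

Under these assumptions the flattened conv output is 128 × 3 × 6 = 2304 features, and the single `Linear` in `mlp_layers` maps that to the reported encoder size of 512.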
[2023-02-24 06:26:09,152][00368] Heartbeat connected on LearnerWorker_p0
[2023-02-24 06:26:09,256][11441] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 06:26:09,259][11441] RunningMeanStd input shape: (1,)
[2023-02-24 06:26:09,274][11441] ConvEncoder: input_channels=3
[2023-02-24 06:26:09,374][11441] Conv encoder output size: 512
[2023-02-24 06:26:09,374][11441] Policy head output size: 512
[2023-02-24 06:26:09,810][00368] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:26:11,699][00368] Inference worker 0-0 is ready!
[2023-02-24 06:26:11,701][00368] All inference workers are ready! Signal rollout workers to start!
[2023-02-24 06:26:11,840][11446] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:11,846][11448] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:11,848][11453] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:11,856][11447] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:11,861][11442] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:11,860][11444] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:11,868][11445] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:11,897][11443] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:26:12,764][11446] Decorrelating experience for 0 frames...
[2023-02-24 06:26:12,763][11447] Decorrelating experience for 0 frames...
[2023-02-24 06:26:13,016][11442] Decorrelating experience for 0 frames...
[2023-02-24 06:26:13,019][11453] Decorrelating experience for 0 frames...
[2023-02-24 06:26:13,022][11448] Decorrelating experience for 0 frames...
[2023-02-24 06:26:13,704][11446] Decorrelating experience for 32 frames...
[2023-02-24 06:26:13,707][11443] Decorrelating experience for 0 frames...
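The two "RunningMeanStd input shape" lines correspond to the observation normalizer (shape (3, 72, 128)) and the returns normalizer (shape (1,)). A minimal scalar sketch of the underlying running-statistics idea — the real `RunningMeanStdInPlace` module works per-element over those shapes and in-place on tensors, so this is an illustration of the update rule, not Sample Factory's implementation:

```python
# Sketch of running mean/variance normalization, scalar version for clarity.
# Batches of values update pooled moments via the parallel-variance merge
# (Chan et al.), then inputs are normalized by the running statistics.

class RunningMeanStd:
    def __init__(self, eps: float = 1e-4):
        # eps acts as a tiny prior count so the first update is well-defined
        self.mean, self.var, self.count = 0.0, 1.0, eps

    def update(self, batch):
        b_mean = sum(batch) / len(batch)
        b_var = sum((x - b_mean) ** 2 for x in batch) / len(batch)
        b_count = len(batch)
        delta = b_mean - self.mean
        tot = self.count + b_count
        # Merge the two sets of moments into pooled mean/variance
        m2 = self.var * self.count + b_var * b_count \
            + delta ** 2 * self.count * b_count / tot
        self.mean += delta * b_count / tot
        self.var = m2 / tot
        self.count = tot

    def normalize(self, x):
        return (x - self.mean) / (self.var ** 0.5 + 1e-8)

rms = RunningMeanStd()
rms.update([1.0, 2.0, 3.0])
```

After that update the running mean is close to 2.0 and the running variance close to 2/3, so `normalize(2.0)` is approximately zero.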
[2023-02-24 06:26:14,078][11442] Decorrelating experience for 32 frames...
[2023-02-24 06:26:14,084][11453] Decorrelating experience for 32 frames...
[2023-02-24 06:26:14,144][11444] Decorrelating experience for 0 frames...
[2023-02-24 06:26:14,184][11447] Decorrelating experience for 32 frames...
[2023-02-24 06:26:14,712][11445] Decorrelating experience for 0 frames...
[2023-02-24 06:26:14,719][11443] Decorrelating experience for 32 frames...
[2023-02-24 06:26:14,810][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:26:15,358][11448] Decorrelating experience for 32 frames...
[2023-02-24 06:26:15,407][11444] Decorrelating experience for 32 frames...
[2023-02-24 06:26:15,502][11446] Decorrelating experience for 64 frames...
[2023-02-24 06:26:15,630][11453] Decorrelating experience for 64 frames...
[2023-02-24 06:26:15,709][11443] Decorrelating experience for 64 frames...
[2023-02-24 06:26:16,093][11442] Decorrelating experience for 64 frames...
[2023-02-24 06:26:16,591][11448] Decorrelating experience for 64 frames...
[2023-02-24 06:26:16,722][11453] Decorrelating experience for 96 frames...
[2023-02-24 06:26:16,889][11445] Decorrelating experience for 32 frames...
[2023-02-24 06:26:17,006][11447] Decorrelating experience for 64 frames...
[2023-02-24 06:26:17,746][11442] Decorrelating experience for 96 frames...
[2023-02-24 06:26:17,856][11443] Decorrelating experience for 96 frames...
[2023-02-24 06:26:17,900][11448] Decorrelating experience for 96 frames...
[2023-02-24 06:26:18,446][11446] Decorrelating experience for 96 frames...
[2023-02-24 06:26:19,111][11444] Decorrelating experience for 64 frames...
[2023-02-24 06:26:19,772][11447] Decorrelating experience for 96 frames...
[2023-02-24 06:26:19,819][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:26:19,950][11444] Decorrelating experience for 96 frames...
[2023-02-24 06:26:20,088][11445] Decorrelating experience for 64 frames...
[2023-02-24 06:26:20,912][11445] Decorrelating experience for 96 frames...
[2023-02-24 06:26:24,810][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 75.9. Samples: 1138. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:26:24,817][00368] Avg episode reward: [(0, '1.351')]
[2023-02-24 06:26:26,262][11427] Signal inference workers to stop experience collection...
[2023-02-24 06:26:26,283][11441] InferenceWorker_p0-w0: stopping experience collection
[2023-02-24 06:26:28,597][11427] Signal inference workers to resume experience collection...
[2023-02-24 06:26:28,599][11441] InferenceWorker_p0-w0: resuming experience collection
[2023-02-24 06:26:29,810][00368] Fps is (10 sec: 410.0, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 153.4. Samples: 3068. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-24 06:26:29,818][00368] Avg episode reward: [(0, '2.498')]
[2023-02-24 06:26:34,810][00368] Fps is (10 sec: 2867.2, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 223.8. Samples: 5594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2023-02-24 06:26:34,818][00368] Avg episode reward: [(0, '3.688')]
[2023-02-24 06:26:39,168][11441] Updated weights for policy 0, policy_version 10 (0.0014)
[2023-02-24 06:26:39,816][00368] Fps is (10 sec: 3684.0, 60 sec: 1365.0, 300 sec: 1365.0). Total num frames: 40960. Throughput: 0: 352.5. Samples: 10578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:26:39,822][00368] Avg episode reward: [(0, '4.116')]
[2023-02-24 06:26:44,810][00368] Fps is (10 sec: 2457.6, 60 sec: 1521.4, 300 sec: 1521.4). Total num frames: 53248. Throughput: 0: 407.9. Samples: 14278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:26:44,816][00368] Avg episode reward: [(0, '4.578')]
[2023-02-24 06:26:49,810][00368] Fps is (10 sec: 2869.0, 60 sec: 1740.8, 300 sec: 1740.8). Total num frames: 69632. Throughput: 0: 409.2. Samples: 16368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:26:49,813][00368] Avg episode reward: [(0, '4.496')]
[2023-02-24 06:26:52,412][11441] Updated weights for policy 0, policy_version 20 (0.0020)
[2023-02-24 06:26:54,810][00368] Fps is (10 sec: 3686.4, 60 sec: 2002.5, 300 sec: 2002.5). Total num frames: 90112. Throughput: 0: 496.1. Samples: 22326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:26:54,816][00368] Avg episode reward: [(0, '4.111')]
[2023-02-24 06:26:59,810][00368] Fps is (10 sec: 3686.4, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 106496. Throughput: 0: 607.8. Samples: 27350. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:26:59,813][00368] Avg episode reward: [(0, '4.047')]
[2023-02-24 06:26:59,826][11427] Saving new best policy, reward=4.047!
[2023-02-24 06:27:04,811][00368] Fps is (10 sec: 2866.9, 60 sec: 2159.7, 300 sec: 2159.7). Total num frames: 118784. Throughput: 0: 646.6. Samples: 29090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 06:27:04,819][00368] Avg episode reward: [(0, '4.175')]
[2023-02-24 06:27:04,825][11427] Saving new best policy, reward=4.175!
[2023-02-24 06:27:06,269][11441] Updated weights for policy 0, policy_version 30 (0.0020)
[2023-02-24 06:27:09,810][00368] Fps is (10 sec: 2457.6, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 131072. Throughput: 0: 709.6. Samples: 33068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:27:09,813][00368] Avg episode reward: [(0, '4.337')]
[2023-02-24 06:27:09,821][11427] Saving new best policy, reward=4.337!
[2023-02-24 06:27:14,810][00368] Fps is (10 sec: 3277.2, 60 sec: 2525.9, 300 sec: 2331.6). Total num frames: 151552. Throughput: 0: 799.4. Samples: 39040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:27:14,817][00368] Avg episode reward: [(0, '4.415')]
[2023-02-24 06:27:14,826][11427] Saving new best policy, reward=4.415!
[2023-02-24 06:27:17,137][11441] Updated weights for policy 0, policy_version 40 (0.0028)
[2023-02-24 06:27:19,810][00368] Fps is (10 sec: 3686.4, 60 sec: 2799.4, 300 sec: 2399.1). Total num frames: 167936. Throughput: 0: 805.5. Samples: 41840. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:27:19,813][00368] Avg episode reward: [(0, '4.411')]
[2023-02-24 06:27:24,814][00368] Fps is (10 sec: 2865.9, 60 sec: 3003.5, 300 sec: 2402.9). Total num frames: 180224. Throughput: 0: 772.7. Samples: 45346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:27:24,818][00368] Avg episode reward: [(0, '4.375')]
[2023-02-24 06:27:29,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 2406.4). Total num frames: 192512. Throughput: 0: 783.6. Samples: 49542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:27:29,813][00368] Avg episode reward: [(0, '4.412')]
[2023-02-24 06:27:31,870][11441] Updated weights for policy 0, policy_version 50 (0.0021)
[2023-02-24 06:27:34,810][00368] Fps is (10 sec: 3278.2, 60 sec: 3072.0, 300 sec: 2505.8). Total num frames: 212992. Throughput: 0: 804.0. Samples: 52550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:27:34,817][00368] Avg episode reward: [(0, '4.441')]
[2023-02-24 06:27:34,822][11427] Saving new best policy, reward=4.441!
[2023-02-24 06:27:39,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3208.9, 300 sec: 2594.1). Total num frames: 233472. Throughput: 0: 802.0. Samples: 58416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:27:39,814][00368] Avg episode reward: [(0, '4.396')]
[2023-02-24 06:27:39,836][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000057_233472.pth...
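The checkpoint name just saved appears to encode the training state: a policy version zero-padded to nine digits, then the env-frame count (57 × 4096 = 233472, matching the "Total num frames" at that point; the later checkpoints in this log, 148 → 606208 and 245 → 1003520, follow the same pattern). A hypothetical helper reproducing that naming convention — `make_checkpoint_name` is not a Sample Factory API, just an illustration:

```python
# Hypothetical helper reproducing the checkpoint naming convention seen in
# this log: checkpoint_<policy_version padded to 9 digits>_<env frames>.pth

def make_checkpoint_name(policy_version: int, env_steps: int) -> str:
    return f"checkpoint_{policy_version:09d}_{env_steps}.pth"

name = make_checkpoint_name(57, 57 * 4096)
print(name)  # checkpoint_000000057_233472.pth
```

Zero-padding the version means plain lexicographic sorting of filenames also sorts them by training progress.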
[2023-02-24 06:27:44,384][11441] Updated weights for policy 0, policy_version 60 (0.0030)
[2023-02-24 06:27:44,812][00368] Fps is (10 sec: 3276.0, 60 sec: 3208.4, 300 sec: 2586.9). Total num frames: 245760. Throughput: 0: 775.6. Samples: 62252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:27:44,815][00368] Avg episode reward: [(0, '4.425')]
[2023-02-24 06:27:49,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 2580.5). Total num frames: 258048. Throughput: 0: 779.6. Samples: 64170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:27:49,813][00368] Avg episode reward: [(0, '4.503')]
[2023-02-24 06:27:49,832][11427] Saving new best policy, reward=4.503!
[2023-02-24 06:27:54,810][00368] Fps is (10 sec: 3277.6, 60 sec: 3140.3, 300 sec: 2652.7). Total num frames: 278528. Throughput: 0: 807.2. Samples: 69394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:27:54,815][00368] Avg episode reward: [(0, '4.555')]
[2023-02-24 06:27:54,822][11427] Saving new best policy, reward=4.555!
[2023-02-24 06:27:56,571][11441] Updated weights for policy 0, policy_version 70 (0.0026)
[2023-02-24 06:27:59,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2718.3). Total num frames: 299008. Throughput: 0: 809.0. Samples: 75444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:27:59,813][00368] Avg episode reward: [(0, '4.578')]
[2023-02-24 06:27:59,822][11427] Saving new best policy, reward=4.578!
[2023-02-24 06:28:04,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 2706.9). Total num frames: 311296. Throughput: 0: 790.9. Samples: 77432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:28:04,816][00368] Avg episode reward: [(0, '4.552')]
[2023-02-24 06:28:09,810][00368] Fps is (10 sec: 2457.5, 60 sec: 3208.5, 300 sec: 2696.5). Total num frames: 323584. Throughput: 0: 800.4. Samples: 81360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:28:09,814][00368] Avg episode reward: [(0, '4.461')]
[2023-02-24 06:28:10,343][11441] Updated weights for policy 0, policy_version 80 (0.0064)
[2023-02-24 06:28:14,810][00368] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 2752.5). Total num frames: 344064. Throughput: 0: 826.5. Samples: 86736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:28:14,812][00368] Avg episode reward: [(0, '4.340')]
[2023-02-24 06:28:19,812][00368] Fps is (10 sec: 3685.6, 60 sec: 3208.4, 300 sec: 2772.6). Total num frames: 360448. Throughput: 0: 827.4. Samples: 89786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:28:19,817][00368] Avg episode reward: [(0, '4.305')]
[2023-02-24 06:28:22,624][11441] Updated weights for policy 0, policy_version 90 (0.0016)
[2023-02-24 06:28:24,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.7, 300 sec: 2761.0). Total num frames: 372736. Throughput: 0: 774.4. Samples: 93266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:28:24,813][00368] Avg episode reward: [(0, '4.308')]
[2023-02-24 06:28:29,810][00368] Fps is (10 sec: 2048.5, 60 sec: 3140.3, 300 sec: 2720.9). Total num frames: 380928. Throughput: 0: 756.9. Samples: 96312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:28:29,814][00368] Avg episode reward: [(0, '4.509')]
[2023-02-24 06:28:34,810][00368] Fps is (10 sec: 2048.1, 60 sec: 3003.7, 300 sec: 2711.8). Total num frames: 393216. Throughput: 0: 745.9. Samples: 97736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:28:34,819][00368] Avg episode reward: [(0, '4.816')]
[2023-02-24 06:28:34,822][11427] Saving new best policy, reward=4.816!
[2023-02-24 06:28:39,162][11441] Updated weights for policy 0, policy_version 100 (0.0020)
[2023-02-24 06:28:39,810][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2730.7). Total num frames: 409600. Throughput: 0: 737.1. Samples: 102564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:28:39,813][00368] Avg episode reward: [(0, '5.088')]
[2023-02-24 06:28:39,826][11427] Saving new best policy, reward=5.088!
[2023-02-24 06:28:44,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3072.1, 300 sec: 2774.7). Total num frames: 430080. Throughput: 0: 733.4. Samples: 108446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:28:44,813][00368] Avg episode reward: [(0, '4.866')]
[2023-02-24 06:28:49,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2764.8). Total num frames: 442368. Throughput: 0: 739.6. Samples: 110716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:28:49,813][00368] Avg episode reward: [(0, '4.600')]
[2023-02-24 06:28:51,592][11441] Updated weights for policy 0, policy_version 110 (0.0025)
[2023-02-24 06:28:54,811][00368] Fps is (10 sec: 2867.0, 60 sec: 3003.7, 300 sec: 2780.3). Total num frames: 458752. Throughput: 0: 737.5. Samples: 114550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:28:54,818][00368] Avg episode reward: [(0, '4.546')]
[2023-02-24 06:28:59,810][00368] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2770.8). Total num frames: 471040. Throughput: 0: 724.8. Samples: 119350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:28:59,822][00368] Avg episode reward: [(0, '4.620')]
[2023-02-24 06:29:03,821][11441] Updated weights for policy 0, policy_version 120 (0.0018)
[2023-02-24 06:29:04,810][00368] Fps is (10 sec: 3686.7, 60 sec: 3072.0, 300 sec: 2832.1). Total num frames: 495616. Throughput: 0: 723.6. Samples: 122348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:29:04,813][00368] Avg episode reward: [(0, '4.583')]
[2023-02-24 06:29:09,810][00368] Fps is (10 sec: 4095.9, 60 sec: 3140.3, 300 sec: 2844.4). Total num frames: 512000. Throughput: 0: 768.4. Samples: 127842. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:29:09,820][00368] Avg episode reward: [(0, '4.614')]
[2023-02-24 06:29:14,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2834.0). Total num frames: 524288. Throughput: 0: 789.5. Samples: 131838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:29:14,815][00368] Avg episode reward: [(0, '4.647')]
[2023-02-24 06:29:17,342][11441] Updated weights for policy 0, policy_version 130 (0.0032)
[2023-02-24 06:29:19,810][00368] Fps is (10 sec: 2867.3, 60 sec: 3003.8, 300 sec: 2845.6). Total num frames: 540672. Throughput: 0: 801.2. Samples: 133792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:29:19,813][00368] Avg episode reward: [(0, '4.697')]
[2023-02-24 06:29:24,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2877.7). Total num frames: 561152. Throughput: 0: 832.4. Samples: 140024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:29:24,818][00368] Avg episode reward: [(0, '4.573')]
[2023-02-24 06:29:27,499][11441] Updated weights for policy 0, policy_version 140 (0.0013)
[2023-02-24 06:29:29,813][00368] Fps is (10 sec: 3685.2, 60 sec: 3276.6, 300 sec: 2887.6). Total num frames: 577536. Throughput: 0: 823.6. Samples: 145510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:29:29,816][00368] Avg episode reward: [(0, '4.461')]
[2023-02-24 06:29:34,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2877.2). Total num frames: 589824. Throughput: 0: 814.0. Samples: 147348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:29:34,817][00368] Avg episode reward: [(0, '4.541')]
[2023-02-24 06:29:39,810][00368] Fps is (10 sec: 2868.2, 60 sec: 3276.8, 300 sec: 2886.7). Total num frames: 606208. Throughput: 0: 815.9. Samples: 151266. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 06:29:39,813][00368] Avg episode reward: [(0, '4.586')]
[2023-02-24 06:29:39,827][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000148_606208.pth...
[2023-02-24 06:29:41,570][11441] Updated weights for policy 0, policy_version 150 (0.0020)
[2023-02-24 06:29:44,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 2914.8). Total num frames: 626688. Throughput: 0: 842.0. Samples: 157242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 06:29:44,813][00368] Avg episode reward: [(0, '4.662')]
[2023-02-24 06:29:49,812][00368] Fps is (10 sec: 3685.5, 60 sec: 3344.9, 300 sec: 2923.0). Total num frames: 643072. Throughput: 0: 841.6. Samples: 160224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 06:29:49,815][00368] Avg episode reward: [(0, '4.670')]
[2023-02-24 06:29:53,243][11441] Updated weights for policy 0, policy_version 160 (0.0014)
[2023-02-24 06:29:54,812][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2912.7). Total num frames: 655360. Throughput: 0: 814.8. Samples: 164508. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 06:29:54,815][00368] Avg episode reward: [(0, '4.643')]
[2023-02-24 06:29:59,810][00368] Fps is (10 sec: 2867.8, 60 sec: 3345.1, 300 sec: 2920.6). Total num frames: 671744. Throughput: 0: 814.7. Samples: 168498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:29:59,816][00368] Avg episode reward: [(0, '4.536')]
[2023-02-24 06:30:04,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2928.2). Total num frames: 688128. Throughput: 0: 836.7. Samples: 171444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:30:04,814][00368] Avg episode reward: [(0, '4.384')]
[2023-02-24 06:30:06,014][11441] Updated weights for policy 0, policy_version 170 (0.0013)
[2023-02-24 06:30:09,812][00368] Fps is (10 sec: 3685.7, 60 sec: 3276.7, 300 sec: 2952.5). Total num frames: 708608. Throughput: 0: 832.9. Samples: 177506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:30:09,818][00368] Avg episode reward: [(0, '4.426')]
[2023-02-24 06:30:14,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2959.2). Total num frames: 724992. Throughput: 0: 807.9. Samples: 181864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:30:14,818][00368] Avg episode reward: [(0, '4.457')]
[2023-02-24 06:30:19,225][11441] Updated weights for policy 0, policy_version 180 (0.0023)
[2023-02-24 06:30:19,810][00368] Fps is (10 sec: 2867.9, 60 sec: 3276.8, 300 sec: 2949.1). Total num frames: 737280. Throughput: 0: 811.6. Samples: 183870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:30:19,815][00368] Avg episode reward: [(0, '4.544')]
[2023-02-24 06:30:24,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2955.5). Total num frames: 753664. Throughput: 0: 837.3. Samples: 188944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:30:24,819][00368] Avg episode reward: [(0, '4.723')]
[2023-02-24 06:30:29,571][11441] Updated weights for policy 0, policy_version 190 (0.0019)
[2023-02-24 06:30:29,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3345.3, 300 sec: 2993.2). Total num frames: 778240. Throughput: 0: 844.8. Samples: 195256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:30:29,813][00368] Avg episode reward: [(0, '4.865')]
[2023-02-24 06:30:34,818][00368] Fps is (10 sec: 3683.6, 60 sec: 3344.7, 300 sec: 2983.0). Total num frames: 790528. Throughput: 0: 830.9. Samples: 197618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:30:34,822][00368] Avg episode reward: [(0, '4.796')]
[2023-02-24 06:30:39,811][00368] Fps is (10 sec: 2457.5, 60 sec: 3276.8, 300 sec: 2973.4). Total num frames: 802816. Throughput: 0: 818.3. Samples: 201334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:30:39,813][00368] Avg episode reward: [(0, '4.763')]
[2023-02-24 06:30:43,502][11441] Updated weights for policy 0, policy_version 200 (0.0030)
[2023-02-24 06:30:44,819][00368] Fps is (10 sec: 3276.3, 60 sec: 3276.3, 300 sec: 2993.7). Total num frames: 823296. Throughput: 0: 845.0. Samples: 206530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:30:44,822][00368] Avg episode reward: [(0, '4.782')]
[2023-02-24 06:30:49,810][00368] Fps is (10 sec: 4096.3, 60 sec: 3345.2, 300 sec: 3013.5). Total num frames: 843776. Throughput: 0: 850.4. Samples: 209712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:30:49,813][00368] Avg episode reward: [(0, '4.638')]
[2023-02-24 06:30:54,486][11441] Updated weights for policy 0, policy_version 210 (0.0022)
[2023-02-24 06:30:54,810][00368] Fps is (10 sec: 3689.7, 60 sec: 3413.3, 300 sec: 3018.1). Total num frames: 860160. Throughput: 0: 836.4. Samples: 215140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:30:54,816][00368] Avg episode reward: [(0, '4.620')]
[2023-02-24 06:30:59,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3008.4). Total num frames: 872448. Throughput: 0: 827.4. Samples: 219098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:30:59,816][00368] Avg episode reward: [(0, '4.742')]
[2023-02-24 06:31:04,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3013.0). Total num frames: 888832. Throughput: 0: 826.0. Samples: 221042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:31:04,813][00368] Avg episode reward: [(0, '4.836')]
[2023-02-24 06:31:07,699][11441] Updated weights for policy 0, policy_version 220 (0.0036)
[2023-02-24 06:31:09,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3082.4). Total num frames: 909312. Throughput: 0: 843.7. Samples: 226912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:31:09,817][00368] Avg episode reward: [(0, '4.794')]
[2023-02-24 06:31:14,814][00368] Fps is (10 sec: 3684.9, 60 sec: 3344.8, 300 sec: 3138.0). Total num frames: 925696. Throughput: 0: 823.5. Samples: 232316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:31:14,818][00368] Avg episode reward: [(0, '4.887')]
[2023-02-24 06:31:19,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3179.6). Total num frames: 937984. Throughput: 0: 814.0. Samples: 234242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:31:19,813][00368] Avg episode reward: [(0, '4.916')]
[2023-02-24 06:31:20,427][11441] Updated weights for policy 0, policy_version 230 (0.0016)
[2023-02-24 06:31:24,810][00368] Fps is (10 sec: 2868.4, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 954368. Throughput: 0: 819.6. Samples: 238214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:31:24,817][00368] Avg episode reward: [(0, '4.768')]
[2023-02-24 06:31:29,812][00368] Fps is (10 sec: 3276.0, 60 sec: 3208.4, 300 sec: 3193.5). Total num frames: 970752. Throughput: 0: 836.6. Samples: 244172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:31:29,820][00368] Avg episode reward: [(0, '4.742')]
[2023-02-24 06:31:31,793][11441] Updated weights for policy 0, policy_version 240 (0.0015)
[2023-02-24 06:31:34,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.5, 300 sec: 3221.3). Total num frames: 991232. Throughput: 0: 834.5. Samples: 247264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:31:34,813][00368] Avg episode reward: [(0, '4.714')]
[2023-02-24 06:31:39,810][00368] Fps is (10 sec: 3277.6, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1003520. Throughput: 0: 809.5. Samples: 251568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:31:39,814][00368] Avg episode reward: [(0, '4.793')]
[2023-02-24 06:31:39,837][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000245_1003520.pth...
[2023-02-24 06:31:40,040][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000057_233472.pth
[2023-02-24 06:31:44,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3209.0, 300 sec: 3207.4). Total num frames: 1015808. Throughput: 0: 801.5. Samples: 255166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:31:44,815][00368] Avg episode reward: [(0, '4.898')]
[2023-02-24 06:31:46,996][11441] Updated weights for policy 0, policy_version 250 (0.0031)
[2023-02-24 06:31:49,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 1028096. Throughput: 0: 798.1. Samples: 256956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:31:49,819][00368] Avg episode reward: [(0, '4.835')]
[2023-02-24 06:31:54,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 1044480. Throughput: 0: 758.0. Samples: 261020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:31:54,818][00368] Avg episode reward: [(0, '4.899')]
[2023-02-24 06:31:59,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 1056768. Throughput: 0: 733.1. Samples: 265302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:31:59,813][00368] Avg episode reward: [(0, '4.977')]
[2023-02-24 06:32:02,103][11441] Updated weights for policy 0, policy_version 260 (0.0030)
[2023-02-24 06:32:04,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3179.6). Total num frames: 1069056. Throughput: 0: 734.0. Samples: 267274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:32:04,815][00368] Avg episode reward: [(0, '5.134')]
[2023-02-24 06:32:04,817][11427] Saving new best policy, reward=5.134!
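Right after saving checkpoint_000000245_1003520.pth the learner removes checkpoint_000000057_233472.pth, i.e. it prunes older checkpoints and keeps only the most recent ones. A hypothetical keep-last-N rotation consistent with what the log shows (`rotate_checkpoints` and the keep count of 2 are illustrative assumptions, not the actual Sample Factory code):

```python
# Hypothetical keep-last-N checkpoint rotation matching the log's behavior:
# after saving a new checkpoint, the oldest ones beyond the limit are removed.

def rotate_checkpoints(existing: list[str], new: str, keep_last: int = 2):
    """Return (kept, removed) after adding `new` and pruning old checkpoints."""
    # Zero-padded version numbers make lexicographic order == training order.
    all_ckpts = sorted(existing + [new])
    return all_ckpts[-keep_last:], all_ckpts[:-keep_last]

kept, removed = rotate_checkpoints(
    ["checkpoint_000000057_233472.pth", "checkpoint_000000148_606208.pth"],
    "checkpoint_000000245_1003520.pth",
)
print(removed)  # ['checkpoint_000000057_233472.pth']
```

With a keep count of 2, this reproduces the log's sequence: checkpoints 148 and 245 survive while 57 is deleted. (The separately saved "new best policy" snapshots are tracked by reward, not by this rotation.)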
[2023-02-24 06:32:09,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3179.6). Total num frames: 1089536. Throughput: 0: 753.6. Samples: 272124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:32:09,813][00368] Avg episode reward: [(0, '5.107')] [2023-02-24 06:32:13,644][11441] Updated weights for policy 0, policy_version 270 (0.0032) [2023-02-24 06:32:14,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3072.2, 300 sec: 3193.5). Total num frames: 1110016. Throughput: 0: 758.1. Samples: 278284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:32:14,813][00368] Avg episode reward: [(0, '5.104')] [2023-02-24 06:32:19,815][00368] Fps is (10 sec: 3684.8, 60 sec: 3140.0, 300 sec: 3207.4). Total num frames: 1126400. Throughput: 0: 746.3. Samples: 280852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:32:19,817][00368] Avg episode reward: [(0, '4.941')] [2023-02-24 06:32:24,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 1138688. Throughput: 0: 738.0. Samples: 284780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:32:24,813][00368] Avg episode reward: [(0, '4.904')] [2023-02-24 06:32:27,298][11441] Updated weights for policy 0, policy_version 280 (0.0014) [2023-02-24 06:32:29,810][00368] Fps is (10 sec: 2868.5, 60 sec: 3072.1, 300 sec: 3193.5). Total num frames: 1155072. Throughput: 0: 766.4. Samples: 289654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:32:29,813][00368] Avg episode reward: [(0, '4.859')] [2023-02-24 06:32:34,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3193.5). Total num frames: 1175552. Throughput: 0: 794.2. Samples: 292696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:32:34,818][00368] Avg episode reward: [(0, '5.064')] [2023-02-24 06:32:37,737][11441] Updated weights for policy 0, policy_version 290 (0.0027) [2023-02-24 06:32:39,817][00368] Fps is (10 sec: 3684.0, 60 sec: 3139.9, 300 sec: 3207.3). 
Total num frames: 1191936. Throughput: 0: 825.9. Samples: 298190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:32:39,819][00368] Avg episode reward: [(0, '5.337')] [2023-02-24 06:32:39,837][11427] Saving new best policy, reward=5.337! [2023-02-24 06:32:44,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1204224. Throughput: 0: 813.8. Samples: 301924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 06:32:44,815][00368] Avg episode reward: [(0, '5.455')] [2023-02-24 06:32:44,821][11427] Saving new best policy, reward=5.455! [2023-02-24 06:32:49,810][00368] Fps is (10 sec: 2869.1, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 1220608. Throughput: 0: 811.5. Samples: 303790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:32:49,816][00368] Avg episode reward: [(0, '5.434')] [2023-02-24 06:32:51,964][11441] Updated weights for policy 0, policy_version 300 (0.0017) [2023-02-24 06:32:54,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1236992. Throughput: 0: 835.4. Samples: 309716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:32:54,813][00368] Avg episode reward: [(0, '5.583')] [2023-02-24 06:32:54,819][11427] Saving new best policy, reward=5.583! [2023-02-24 06:32:59,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1257472. Throughput: 0: 823.6. Samples: 315346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:32:59,815][00368] Avg episode reward: [(0, '5.568')] [2023-02-24 06:33:04,040][11441] Updated weights for policy 0, policy_version 310 (0.0022) [2023-02-24 06:33:04,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 1269760. Throughput: 0: 808.5. Samples: 317232. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:33:04,815][00368] Avg episode reward: [(0, '5.629')] [2023-02-24 06:33:04,819][11427] Saving new best policy, reward=5.629! [2023-02-24 06:33:09,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 1282048. Throughput: 0: 809.3. Samples: 321200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:33:09,813][00368] Avg episode reward: [(0, '5.566')] [2023-02-24 06:33:14,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 1302528. Throughput: 0: 831.6. Samples: 327074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:33:14,817][00368] Avg episode reward: [(0, '5.574')] [2023-02-24 06:33:15,893][11441] Updated weights for policy 0, policy_version 320 (0.0014) [2023-02-24 06:33:19,810][00368] Fps is (10 sec: 4095.9, 60 sec: 3277.0, 300 sec: 3221.3). Total num frames: 1323008. Throughput: 0: 832.2. Samples: 330146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:33:19,813][00368] Avg episode reward: [(0, '5.667')] [2023-02-24 06:33:19,831][11427] Saving new best policy, reward=5.667! [2023-02-24 06:33:24,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1335296. Throughput: 0: 808.9. Samples: 334586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:33:24,817][00368] Avg episode reward: [(0, '5.799')] [2023-02-24 06:33:24,821][11427] Saving new best policy, reward=5.799! [2023-02-24 06:33:29,671][11441] Updated weights for policy 0, policy_version 330 (0.0024) [2023-02-24 06:33:29,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1351680. Throughput: 0: 808.9. Samples: 338326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 06:33:29,817][00368] Avg episode reward: [(0, '5.972')] [2023-02-24 06:33:29,833][11427] Saving new best policy, reward=5.972! 
[2023-02-24 06:33:34,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1368064. Throughput: 0: 827.2. Samples: 341014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 06:33:34,812][00368] Avg episode reward: [(0, '5.830')] [2023-02-24 06:33:39,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3277.1, 300 sec: 3249.0). Total num frames: 1388544. Throughput: 0: 833.2. Samples: 347212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:33:39,817][00368] Avg episode reward: [(0, '5.938')] [2023-02-24 06:33:39,833][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000339_1388544.pth... [2023-02-24 06:33:39,991][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000148_606208.pth [2023-02-24 06:33:40,141][11441] Updated weights for policy 0, policy_version 340 (0.0013) [2023-02-24 06:33:44,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1404928. Throughput: 0: 808.8. Samples: 351740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 06:33:44,814][00368] Avg episode reward: [(0, '6.054')] [2023-02-24 06:33:44,820][11427] Saving new best policy, reward=6.054! [2023-02-24 06:33:49,811][00368] Fps is (10 sec: 2867.0, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1417216. Throughput: 0: 809.5. Samples: 353658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:33:49,820][00368] Avg episode reward: [(0, '5.953')] [2023-02-24 06:33:54,406][11441] Updated weights for policy 0, policy_version 350 (0.0023) [2023-02-24 06:33:54,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1433600. Throughput: 0: 826.1. Samples: 358376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:33:54,813][00368] Avg episode reward: [(0, '5.699')] [2023-02-24 06:33:59,810][00368] Fps is (10 sec: 3686.6, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1454080. 
Throughput: 0: 838.4. Samples: 364800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 06:33:59,818][00368] Avg episode reward: [(0, '5.788')] [2023-02-24 06:34:04,811][00368] Fps is (10 sec: 3686.0, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 1470464. Throughput: 0: 828.2. Samples: 367416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:34:04,814][00368] Avg episode reward: [(0, '6.170')] [2023-02-24 06:34:04,822][11427] Saving new best policy, reward=6.170! [2023-02-24 06:34:05,469][11441] Updated weights for policy 0, policy_version 360 (0.0017) [2023-02-24 06:34:09,812][00368] Fps is (10 sec: 2866.7, 60 sec: 3345.0, 300 sec: 3249.0). Total num frames: 1482752. Throughput: 0: 817.7. Samples: 371386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:34:09,815][00368] Avg episode reward: [(0, '5.904')] [2023-02-24 06:34:14,810][00368] Fps is (10 sec: 2867.6, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1499136. Throughput: 0: 840.4. Samples: 376146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:34:14,812][00368] Avg episode reward: [(0, '5.928')] [2023-02-24 06:34:17,991][11441] Updated weights for policy 0, policy_version 370 (0.0030) [2023-02-24 06:34:19,810][00368] Fps is (10 sec: 3687.1, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1519616. Throughput: 0: 850.1. Samples: 379268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:34:19,818][00368] Avg episode reward: [(0, '6.084')] [2023-02-24 06:34:24,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3263.0). Total num frames: 1540096. Throughput: 0: 842.7. Samples: 385132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:34:24,813][00368] Avg episode reward: [(0, '6.464')] [2023-02-24 06:34:24,816][11427] Saving new best policy, reward=6.464! [2023-02-24 06:34:29,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1552384. Throughput: 0: 830.3. Samples: 389102. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 06:34:29,815][00368] Avg episode reward: [(0, '6.539')] [2023-02-24 06:34:29,828][11427] Saving new best policy, reward=6.539! [2023-02-24 06:34:30,842][11441] Updated weights for policy 0, policy_version 380 (0.0021) [2023-02-24 06:34:34,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1564672. Throughput: 0: 828.9. Samples: 390958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 06:34:34,819][00368] Avg episode reward: [(0, '6.989')] [2023-02-24 06:34:34,905][11427] Saving new best policy, reward=6.989! [2023-02-24 06:34:39,815][00368] Fps is (10 sec: 3684.7, 60 sec: 3344.8, 300 sec: 3262.9). Total num frames: 1589248. Throughput: 0: 851.8. Samples: 396710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:34:39,818][00368] Avg episode reward: [(0, '7.006')] [2023-02-24 06:34:39,836][11427] Saving new best policy, reward=7.006! [2023-02-24 06:34:42,096][11441] Updated weights for policy 0, policy_version 390 (0.0017) [2023-02-24 06:34:44,810][00368] Fps is (10 sec: 4095.9, 60 sec: 3345.0, 300 sec: 3262.9). Total num frames: 1605632. Throughput: 0: 834.9. Samples: 402370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 06:34:44,818][00368] Avg episode reward: [(0, '7.134')] [2023-02-24 06:34:44,826][11427] Saving new best policy, reward=7.134! [2023-02-24 06:34:49,810][00368] Fps is (10 sec: 2868.5, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1617920. Throughput: 0: 818.5. Samples: 404248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 06:34:49,814][00368] Avg episode reward: [(0, '7.438')] [2023-02-24 06:34:49,827][11427] Saving new best policy, reward=7.438! [2023-02-24 06:34:54,810][00368] Fps is (10 sec: 2457.7, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1630208. Throughput: 0: 813.7. Samples: 408002. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 06:34:54,812][00368] Avg episode reward: [(0, '7.981')] [2023-02-24 06:34:54,818][11427] Saving new best policy, reward=7.981! [2023-02-24 06:34:56,453][11441] Updated weights for policy 0, policy_version 400 (0.0042) [2023-02-24 06:34:59,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 1650688. Throughput: 0: 832.2. Samples: 413594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 06:34:59,812][00368] Avg episode reward: [(0, '8.347')] [2023-02-24 06:34:59,829][11427] Saving new best policy, reward=8.347! [2023-02-24 06:35:04,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 1671168. Throughput: 0: 831.0. Samples: 416664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:35:04,818][00368] Avg episode reward: [(0, '8.061')] [2023-02-24 06:35:07,028][11441] Updated weights for policy 0, policy_version 410 (0.0017) [2023-02-24 06:35:09,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.2, 300 sec: 3249.0). Total num frames: 1683456. Throughput: 0: 802.4. Samples: 421238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:35:09,813][00368] Avg episode reward: [(0, '8.062')] [2023-02-24 06:35:14,810][00368] Fps is (10 sec: 2048.0, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1691648. Throughput: 0: 780.3. Samples: 424216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:35:14,812][00368] Avg episode reward: [(0, '8.500')] [2023-02-24 06:35:14,814][11427] Saving new best policy, reward=8.500! [2023-02-24 06:35:19,810][00368] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3221.3). Total num frames: 1703936. Throughput: 0: 772.1. Samples: 425702. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:35:19,820][00368] Avg episode reward: [(0, '8.289')] [2023-02-24 06:35:24,694][11441] Updated weights for policy 0, policy_version 420 (0.0065) [2023-02-24 06:35:24,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 1720320. Throughput: 0: 731.5. Samples: 429624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:35:24,813][00368] Avg episode reward: [(0, '8.245')] [2023-02-24 06:35:29,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1740800. Throughput: 0: 743.6. Samples: 435834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:35:29,814][00368] Avg episode reward: [(0, '7.662')] [2023-02-24 06:35:34,812][00368] Fps is (10 sec: 3685.5, 60 sec: 3208.4, 300 sec: 3235.1). Total num frames: 1757184. Throughput: 0: 766.3. Samples: 438732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:35:34,819][00368] Avg episode reward: [(0, '7.891')] [2023-02-24 06:35:36,053][11441] Updated weights for policy 0, policy_version 430 (0.0022) [2023-02-24 06:35:39,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3004.0, 300 sec: 3207.5). Total num frames: 1769472. Throughput: 0: 768.8. Samples: 442600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:35:39,813][00368] Avg episode reward: [(0, '8.324')] [2023-02-24 06:35:39,839][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth... [2023-02-24 06:35:40,018][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000245_1003520.pth [2023-02-24 06:35:44,810][00368] Fps is (10 sec: 2867.9, 60 sec: 3003.8, 300 sec: 3193.5). Total num frames: 1785856. Throughput: 0: 746.0. Samples: 447166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:35:44,812][00368] Avg episode reward: [(0, '8.736')] [2023-02-24 06:35:44,816][11427] Saving new best policy, reward=8.736! 
[2023-02-24 06:35:48,689][11441] Updated weights for policy 0, policy_version 440 (0.0023) [2023-02-24 06:35:49,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1806336. Throughput: 0: 747.2. Samples: 450290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:35:49,813][00368] Avg episode reward: [(0, '8.961')] [2023-02-24 06:35:49,823][11427] Saving new best policy, reward=8.961! [2023-02-24 06:35:54,810][00368] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1822720. Throughput: 0: 777.3. Samples: 456218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:35:54,814][00368] Avg episode reward: [(0, '9.150')] [2023-02-24 06:35:54,821][11427] Saving new best policy, reward=9.150! [2023-02-24 06:35:59,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 1835008. Throughput: 0: 794.6. Samples: 459972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:35:59,812][00368] Avg episode reward: [(0, '9.563')] [2023-02-24 06:35:59,830][11427] Saving new best policy, reward=9.563! [2023-02-24 06:36:01,940][11441] Updated weights for policy 0, policy_version 450 (0.0018) [2023-02-24 06:36:04,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 1851392. Throughput: 0: 804.3. Samples: 461896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:36:04,816][00368] Avg episode reward: [(0, '9.225')] [2023-02-24 06:36:09,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1871872. Throughput: 0: 841.4. Samples: 467486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:36:09,812][00368] Avg episode reward: [(0, '9.911')] [2023-02-24 06:36:09,828][11427] Saving new best policy, reward=9.911! 
[2023-02-24 06:36:12,946][11441] Updated weights for policy 0, policy_version 460 (0.0024) [2023-02-24 06:36:14,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1888256. Throughput: 0: 831.7. Samples: 473260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:36:14,814][00368] Avg episode reward: [(0, '10.039')] [2023-02-24 06:36:14,824][11427] Saving new best policy, reward=10.039! [2023-02-24 06:36:19,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1900544. Throughput: 0: 809.7. Samples: 475168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:36:19,814][00368] Avg episode reward: [(0, '9.864')] [2023-02-24 06:36:24,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 1916928. Throughput: 0: 811.3. Samples: 479110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 06:36:24,817][00368] Avg episode reward: [(0, '9.767')] [2023-02-24 06:36:26,940][11441] Updated weights for policy 0, policy_version 470 (0.0021) [2023-02-24 06:36:29,810][00368] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 1933312. Throughput: 0: 833.4. Samples: 484670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:36:29,814][00368] Avg episode reward: [(0, '9.209')] [2023-02-24 06:36:34,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3221.3). Total num frames: 1953792. Throughput: 0: 829.2. Samples: 487602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:36:34,813][00368] Avg episode reward: [(0, '9.528')] [2023-02-24 06:36:37,667][11441] Updated weights for policy 0, policy_version 480 (0.0012) [2023-02-24 06:36:39,812][00368] Fps is (10 sec: 3685.8, 60 sec: 3345.0, 300 sec: 3235.1). Total num frames: 1970176. Throughput: 0: 809.7. Samples: 492654. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:36:39,817][00368] Avg episode reward: [(0, '9.328')] [2023-02-24 06:36:44,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1982464. Throughput: 0: 812.5. Samples: 496536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:36:44,816][00368] Avg episode reward: [(0, '9.837')] [2023-02-24 06:36:49,810][00368] Fps is (10 sec: 2867.7, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1998848. Throughput: 0: 824.0. Samples: 498974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:36:49,817][00368] Avg episode reward: [(0, '9.257')] [2023-02-24 06:36:50,994][11441] Updated weights for policy 0, policy_version 490 (0.0018) [2023-02-24 06:36:54,810][00368] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2019328. Throughput: 0: 835.7. Samples: 505094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 06:36:54,812][00368] Avg episode reward: [(0, '9.464')] [2023-02-24 06:36:59,812][00368] Fps is (10 sec: 3685.8, 60 sec: 3345.0, 300 sec: 3276.8). Total num frames: 2035712. Throughput: 0: 819.6. Samples: 510142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:36:59,818][00368] Avg episode reward: [(0, '8.687')] [2023-02-24 06:37:03,616][11441] Updated weights for policy 0, policy_version 500 (0.0044) [2023-02-24 06:37:04,818][00368] Fps is (10 sec: 2865.1, 60 sec: 3276.4, 300 sec: 3248.9). Total num frames: 2048000. Throughput: 0: 818.0. Samples: 511986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:37:04,826][00368] Avg episode reward: [(0, '8.920')] [2023-02-24 06:37:09,810][00368] Fps is (10 sec: 2867.6, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 2064384. Throughput: 0: 831.2. Samples: 516514. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:37:09,813][00368] Avg episode reward: [(0, '9.019')] [2023-02-24 06:37:14,810][00368] Fps is (10 sec: 3689.1, 60 sec: 3276.8, 300 sec: 3249.1). Total num frames: 2084864. Throughput: 0: 839.7. Samples: 522456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:37:14,814][00368] Avg episode reward: [(0, '9.493')] [2023-02-24 06:37:15,207][11441] Updated weights for policy 0, policy_version 510 (0.0015) [2023-02-24 06:37:19,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2101248. Throughput: 0: 842.8. Samples: 525528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:37:19,819][00368] Avg episode reward: [(0, '10.014')] [2023-02-24 06:37:24,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 2117632. Throughput: 0: 816.5. Samples: 529394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:37:24,815][00368] Avg episode reward: [(0, '10.263')] [2023-02-24 06:37:24,817][11427] Saving new best policy, reward=10.263! [2023-02-24 06:37:28,993][11441] Updated weights for policy 0, policy_version 520 (0.0023) [2023-02-24 06:37:29,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2129920. Throughput: 0: 826.0. Samples: 533706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 06:37:29,818][00368] Avg episode reward: [(0, '10.572')] [2023-02-24 06:37:29,830][11427] Saving new best policy, reward=10.572! [2023-02-24 06:37:34,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.1). Total num frames: 2150400. Throughput: 0: 836.5. Samples: 536616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:37:34,813][00368] Avg episode reward: [(0, '10.463')] [2023-02-24 06:37:39,586][11441] Updated weights for policy 0, policy_version 530 (0.0013) [2023-02-24 06:37:39,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3276.8). 
Total num frames: 2170880. Throughput: 0: 837.5. Samples: 542782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:37:39,814][00368] Avg episode reward: [(0, '9.947')] [2023-02-24 06:37:39,828][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth... [2023-02-24 06:37:40,006][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000339_1388544.pth [2023-02-24 06:37:44,815][00368] Fps is (10 sec: 3275.0, 60 sec: 3344.8, 300 sec: 3262.9). Total num frames: 2183168. Throughput: 0: 811.2. Samples: 546648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 06:37:44,820][00368] Avg episode reward: [(0, '9.487')] [2023-02-24 06:37:49,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2195456. Throughput: 0: 813.4. Samples: 548582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 06:37:49,817][00368] Avg episode reward: [(0, '9.215')] [2023-02-24 06:37:53,483][11441] Updated weights for policy 0, policy_version 540 (0.0050) [2023-02-24 06:37:54,810][00368] Fps is (10 sec: 3278.5, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2215936. Throughput: 0: 830.6. Samples: 553892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 06:37:54,813][00368] Avg episode reward: [(0, '9.720')] [2023-02-24 06:37:59,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3345.2, 300 sec: 3276.8). Total num frames: 2236416. Throughput: 0: 835.0. Samples: 560030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:37:59,813][00368] Avg episode reward: [(0, '10.189')] [2023-02-24 06:38:04,810][00368] Fps is (10 sec: 3276.7, 60 sec: 3345.5, 300 sec: 3276.8). Total num frames: 2248704. Throughput: 0: 808.0. Samples: 561888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:38:04,823][00368] Avg episode reward: [(0, '10.871')] [2023-02-24 06:38:04,828][11427] Saving new best policy, reward=10.871! 
[2023-02-24 06:38:05,928][11441] Updated weights for policy 0, policy_version 550 (0.0029) [2023-02-24 06:38:09,812][00368] Fps is (10 sec: 2457.0, 60 sec: 3276.7, 300 sec: 3249.0). Total num frames: 2260992. Throughput: 0: 807.0. Samples: 565712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:38:09,817][00368] Avg episode reward: [(0, '10.253')] [2023-02-24 06:38:14,810][00368] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2281472. Throughput: 0: 829.6. Samples: 571036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 06:38:14,812][00368] Avg episode reward: [(0, '11.060')] [2023-02-24 06:38:14,829][11427] Saving new best policy, reward=11.060! [2023-02-24 06:38:17,821][11441] Updated weights for policy 0, policy_version 560 (0.0033) [2023-02-24 06:38:19,810][00368] Fps is (10 sec: 4097.0, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 2301952. Throughput: 0: 835.5. Samples: 574212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 06:38:19,817][00368] Avg episode reward: [(0, '10.539')] [2023-02-24 06:38:24,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 2314240. Throughput: 0: 807.5. Samples: 579118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2023-02-24 06:38:24,817][00368] Avg episode reward: [(0, '9.972')] [2023-02-24 06:38:29,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2326528. Throughput: 0: 805.9. Samples: 582908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:38:29,813][00368] Avg episode reward: [(0, '10.126')] [2023-02-24 06:38:32,599][11441] Updated weights for policy 0, policy_version 570 (0.0021) [2023-02-24 06:38:34,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 2338816. Throughput: 0: 798.7. Samples: 584524. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2023-02-24 06:38:34,816][00368] Avg episode reward: [(0, '9.549')] [2023-02-24 06:38:39,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3207.4). Total num frames: 2351104. Throughput: 0: 769.2. Samples: 588508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:38:39,817][00368] Avg episode reward: [(0, '9.887')] [2023-02-24 06:38:44,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.3, 300 sec: 3221.3). Total num frames: 2367488. Throughput: 0: 723.6. Samples: 592590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 06:38:44,812][00368] Avg episode reward: [(0, '9.918')] [2023-02-24 06:38:47,830][11441] Updated weights for policy 0, policy_version 580 (0.0023) [2023-02-24 06:38:49,811][00368] Fps is (10 sec: 2866.8, 60 sec: 3071.9, 300 sec: 3207.4). Total num frames: 2379776. Throughput: 0: 727.1. Samples: 594610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 06:38:49,814][00368] Avg episode reward: [(0, '10.981')] [2023-02-24 06:38:54,810][00368] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 3179.6). Total num frames: 2392064. Throughput: 0: 730.2. Samples: 598568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 06:38:54,819][00368] Avg episode reward: [(0, '12.223')] [2023-02-24 06:38:54,824][11427] Saving new best policy, reward=12.223! [2023-02-24 06:38:59,810][00368] Fps is (10 sec: 3277.2, 60 sec: 2935.5, 300 sec: 3193.5). Total num frames: 2412544. Throughput: 0: 750.2. Samples: 604794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:38:59,813][00368] Avg episode reward: [(0, '12.540')] [2023-02-24 06:38:59,890][11427] Saving new best policy, reward=12.540! [2023-02-24 06:38:59,898][11441] Updated weights for policy 0, policy_version 590 (0.0021) [2023-02-24 06:39:04,817][00368] Fps is (10 sec: 4093.0, 60 sec: 3071.6, 300 sec: 3221.2). Total num frames: 2433024. Throughput: 0: 744.1. Samples: 607704. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 06:39:04,820][00368] Avg episode reward: [(0, '12.967')]
[2023-02-24 06:39:04,824][11427] Saving new best policy, reward=12.967!
[2023-02-24 06:39:09,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 3207.4). Total num frames: 2445312. Throughput: 0: 731.1. Samples: 612018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:39:09,819][00368] Avg episode reward: [(0, '13.312')]
[2023-02-24 06:39:09,839][11427] Saving new best policy, reward=13.312!
[2023-02-24 06:39:13,249][11441] Updated weights for policy 0, policy_version 600 (0.0032)
[2023-02-24 06:39:14,810][00368] Fps is (10 sec: 2869.2, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 2461696. Throughput: 0: 735.3. Samples: 615996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:39:14,819][00368] Avg episode reward: [(0, '12.797')]
[2023-02-24 06:39:19,810][00368] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3179.6). Total num frames: 2478080. Throughput: 0: 763.3. Samples: 618872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:39:19,813][00368] Avg episode reward: [(0, '13.304')]
[2023-02-24 06:39:24,158][11441] Updated weights for policy 0, policy_version 610 (0.0026)
[2023-02-24 06:39:24,810][00368] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 2498560. Throughput: 0: 809.3. Samples: 624926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:39:24,816][00368] Avg episode reward: [(0, '14.371')]
[2023-02-24 06:39:24,820][11427] Saving new best policy, reward=14.371!
[2023-02-24 06:39:29,812][00368] Fps is (10 sec: 3685.8, 60 sec: 3140.2, 300 sec: 3221.2). Total num frames: 2514944. Throughput: 0: 816.9. Samples: 629352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:39:29,821][00368] Avg episode reward: [(0, '14.349')]
[2023-02-24 06:39:34,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 3179.7). Total num frames: 2527232. Throughput: 0: 816.8. Samples: 631366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-24 06:39:34,816][00368] Avg episode reward: [(0, '14.594')]
[2023-02-24 06:39:34,824][11427] Saving new best policy, reward=14.594!
[2023-02-24 06:39:38,177][11441] Updated weights for policy 0, policy_version 620 (0.0025)
[2023-02-24 06:39:39,810][00368] Fps is (10 sec: 2867.6, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2543616. Throughput: 0: 835.8. Samples: 636180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:39:39,818][00368] Avg episode reward: [(0, '14.853')]
[2023-02-24 06:39:39,833][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth...
[2023-02-24 06:39:39,976][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth
[2023-02-24 06:39:39,992][11427] Saving new best policy, reward=14.853!
[2023-02-24 06:39:44,810][00368] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2564096. Throughput: 0: 831.0. Samples: 642188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:39:44,813][00368] Avg episode reward: [(0, '15.480')]
[2023-02-24 06:39:44,817][11427] Saving new best policy, reward=15.480!
[2023-02-24 06:39:49,552][11441] Updated weights for policy 0, policy_version 630 (0.0023)
[2023-02-24 06:39:49,810][00368] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 2580480. Throughput: 0: 820.1. Samples: 644602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:39:49,813][00368] Avg episode reward: [(0, '16.867')]
[2023-02-24 06:39:49,823][11427] Saving new best policy, reward=16.867!
[2023-02-24 06:39:54,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 2592768. Throughput: 0: 810.8. Samples: 648504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:39:54,815][00368] Avg episode reward: [(0, '16.422')]
[2023-02-24 06:39:59,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 2609152. Throughput: 0: 828.9. Samples: 653298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-24 06:39:59,818][00368] Avg episode reward: [(0, '15.852')]
[2023-02-24 06:40:02,456][11441] Updated weights for policy 0, policy_version 640 (0.0028)
[2023-02-24 06:40:04,810][00368] Fps is (10 sec: 3686.5, 60 sec: 3277.2, 300 sec: 3207.4). Total num frames: 2629632. Throughput: 0: 830.1. Samples: 656228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:40:04,813][00368] Avg episode reward: [(0, '15.153')]
[2023-02-24 06:40:09,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 2646016. Throughput: 0: 819.6. Samples: 661808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:40:09,820][00368] Avg episode reward: [(0, '15.621')]
[2023-02-24 06:40:14,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2658304. Throughput: 0: 808.2. Samples: 665718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:40:14,813][00368] Avg episode reward: [(0, '14.653')]
[2023-02-24 06:40:15,531][11441] Updated weights for policy 0, policy_version 650 (0.0013)
[2023-02-24 06:40:19,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2674688. Throughput: 0: 804.1. Samples: 667550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:40:19,813][00368] Avg episode reward: [(0, '15.346')]
[2023-02-24 06:40:24,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2695168. Throughput: 0: 826.1. Samples: 673356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:40:24,816][00368] Avg episode reward: [(0, '16.647')]
[2023-02-24 06:40:26,802][11441] Updated weights for policy 0, policy_version 660 (0.0025)
[2023-02-24 06:40:29,815][00368] Fps is (10 sec: 3684.7, 60 sec: 3276.7, 300 sec: 3235.1). Total num frames: 2711552. Throughput: 0: 823.3. Samples: 679242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:40:29,819][00368] Avg episode reward: [(0, '17.542')]
[2023-02-24 06:40:29,838][11427] Saving new best policy, reward=17.542!
[2023-02-24 06:40:34,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2723840. Throughput: 0: 813.2. Samples: 681196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:40:34,813][00368] Avg episode reward: [(0, '17.865')]
[2023-02-24 06:40:34,817][11427] Saving new best policy, reward=17.865!
[2023-02-24 06:40:39,810][00368] Fps is (10 sec: 2868.5, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2740224. Throughput: 0: 812.8. Samples: 685078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:40:39,813][00368] Avg episode reward: [(0, '17.829')]
[2023-02-24 06:40:40,755][11441] Updated weights for policy 0, policy_version 670 (0.0018)
[2023-02-24 06:40:44,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2760704. Throughput: 0: 836.8. Samples: 690954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:40:44,813][00368] Avg episode reward: [(0, '18.670')]
[2023-02-24 06:40:44,816][11427] Saving new best policy, reward=18.670!
[2023-02-24 06:40:49,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 2781184. Throughput: 0: 839.6. Samples: 694008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:40:49,813][00368] Avg episode reward: [(0, '18.442')]
[2023-02-24 06:40:50,672][11441] Updated weights for policy 0, policy_version 680 (0.0028)
[2023-02-24 06:40:54,817][00368] Fps is (10 sec: 3274.4, 60 sec: 3344.7, 300 sec: 3248.9). Total num frames: 2793472. Throughput: 0: 823.6. Samples: 698874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:40:54,820][00368] Avg episode reward: [(0, '19.675')]
[2023-02-24 06:40:54,824][11427] Saving new best policy, reward=19.675!
[2023-02-24 06:40:59,810][00368] Fps is (10 sec: 2457.5, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2805760. Throughput: 0: 823.1. Samples: 702758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:40:59,819][00368] Avg episode reward: [(0, '19.321')]
[2023-02-24 06:41:04,255][11441] Updated weights for policy 0, policy_version 690 (0.0026)
[2023-02-24 06:41:04,810][00368] Fps is (10 sec: 3279.2, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 2826240. Throughput: 0: 840.7. Samples: 705382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:41:04,817][00368] Avg episode reward: [(0, '19.982')]
[2023-02-24 06:41:04,823][11427] Saving new best policy, reward=19.982!
[2023-02-24 06:41:09,810][00368] Fps is (10 sec: 4096.1, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 2846720. Throughput: 0: 852.3. Samples: 711710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:41:09,813][00368] Avg episode reward: [(0, '20.140')]
[2023-02-24 06:41:09,824][11427] Saving new best policy, reward=20.140!
[2023-02-24 06:41:14,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 2863104. Throughput: 0: 828.0. Samples: 716498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:41:14,815][00368] Avg episode reward: [(0, '19.855')]
[2023-02-24 06:41:16,149][11441] Updated weights for policy 0, policy_version 700 (0.0017)
[2023-02-24 06:41:19,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 2875392. Throughput: 0: 828.0. Samples: 718454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:41:19,814][00368] Avg episode reward: [(0, '19.921')]
[2023-02-24 06:41:24,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2891776. Throughput: 0: 842.0. Samples: 722968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:41:24,814][00368] Avg episode reward: [(0, '20.475')]
[2023-02-24 06:41:24,820][11427] Saving new best policy, reward=20.475!
[2023-02-24 06:41:28,323][11441] Updated weights for policy 0, policy_version 710 (0.0020)
[2023-02-24 06:41:29,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.3, 300 sec: 3249.0). Total num frames: 2912256. Throughput: 0: 849.7. Samples: 729192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:41:29,813][00368] Avg episode reward: [(0, '19.753')]
[2023-02-24 06:41:34,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3249.1). Total num frames: 2928640. Throughput: 0: 848.8. Samples: 732206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:41:34,813][00368] Avg episode reward: [(0, '19.026')]
[2023-02-24 06:41:39,810][00368] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 2945024. Throughput: 0: 829.9. Samples: 736214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:41:39,812][00368] Avg episode reward: [(0, '18.679')]
[2023-02-24 06:41:39,831][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000719_2945024.pth...
[2023-02-24 06:41:40,012][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth
[2023-02-24 06:41:41,274][11441] Updated weights for policy 0, policy_version 720 (0.0035)
[2023-02-24 06:41:44,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2957312. Throughput: 0: 840.5. Samples: 740582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:41:44,818][00368] Avg episode reward: [(0, '18.944')]
[2023-02-24 06:41:49,810][00368] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 2977792. Throughput: 0: 847.3. Samples: 743510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:41:49,818][00368] Avg episode reward: [(0, '17.932')]
[2023-02-24 06:41:52,028][11441] Updated weights for policy 0, policy_version 730 (0.0013)
[2023-02-24 06:41:54,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3413.8, 300 sec: 3262.9). Total num frames: 2998272. Throughput: 0: 841.2. Samples: 749562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:41:54,813][00368] Avg episode reward: [(0, '17.908')]
[2023-02-24 06:41:59,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3249.1). Total num frames: 3006464. Throughput: 0: 804.4. Samples: 752694. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 06:41:59,816][00368] Avg episode reward: [(0, '19.486')]
[2023-02-24 06:42:04,810][00368] Fps is (10 sec: 2048.0, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3018752. Throughput: 0: 795.2. Samples: 754236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:42:04,815][00368] Avg episode reward: [(0, '19.084')]
[2023-02-24 06:42:09,811][00368] Fps is (10 sec: 2047.9, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 3026944. Throughput: 0: 763.4. Samples: 757320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:42:09,816][00368] Avg episode reward: [(0, '18.557')]
[2023-02-24 06:42:10,493][11441] Updated weights for policy 0, policy_version 740 (0.0047)
[2023-02-24 06:42:14,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 3047424. Throughput: 0: 739.1. Samples: 762450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 06:42:14,818][00368] Avg episode reward: [(0, '18.084')]
[2023-02-24 06:42:19,810][00368] Fps is (10 sec: 3686.6, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 3063808. Throughput: 0: 738.7. Samples: 765448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:42:19,815][00368] Avg episode reward: [(0, '19.212')]
[2023-02-24 06:42:20,948][11441] Updated weights for policy 0, policy_version 750 (0.0015)
[2023-02-24 06:42:24,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3080192. Throughput: 0: 759.7. Samples: 770402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:42:24,813][00368] Avg episode reward: [(0, '19.828')]
[2023-02-24 06:42:29,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 3092480. Throughput: 0: 749.9. Samples: 774328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:42:29,815][00368] Avg episode reward: [(0, '19.278')]
[2023-02-24 06:42:34,738][11441] Updated weights for policy 0, policy_version 760 (0.0012)
[2023-02-24 06:42:34,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3193.5). Total num frames: 3112960. Throughput: 0: 737.5. Samples: 776696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:42:34,813][00368] Avg episode reward: [(0, '18.821')]
[2023-02-24 06:42:39,810][00368] Fps is (10 sec: 4096.1, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3133440. Throughput: 0: 742.0. Samples: 782954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:42:39,818][00368] Avg episode reward: [(0, '21.068')]
[2023-02-24 06:42:39,831][11427] Saving new best policy, reward=21.068!
[2023-02-24 06:42:44,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3145728. Throughput: 0: 783.2. Samples: 787936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:42:44,815][00368] Avg episode reward: [(0, '20.368')]
[2023-02-24 06:42:46,563][11441] Updated weights for policy 0, policy_version 770 (0.0024)
[2023-02-24 06:42:49,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 3162112. Throughput: 0: 790.5. Samples: 789808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:42:49,813][00368] Avg episode reward: [(0, '19.786')]
[2023-02-24 06:42:54,810][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3179.6). Total num frames: 3174400. Throughput: 0: 817.4. Samples: 794102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:42:54,813][00368] Avg episode reward: [(0, '20.188')]
[2023-02-24 06:42:58,763][11441] Updated weights for policy 0, policy_version 780 (0.0015)
[2023-02-24 06:42:59,810][00368] Fps is (10 sec: 3686.5, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3198976. Throughput: 0: 840.7. Samples: 800282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-24 06:42:59,814][00368] Avg episode reward: [(0, '20.277')]
[2023-02-24 06:43:04,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 3215360. Throughput: 0: 841.9. Samples: 803334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:43:04,817][00368] Avg episode reward: [(0, '18.634')]
[2023-02-24 06:43:09,810][00368] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 3227648. Throughput: 0: 820.3. Samples: 807314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:43:09,818][00368] Avg episode reward: [(0, '19.002')]
[2023-02-24 06:43:12,167][11441] Updated weights for policy 0, policy_version 790 (0.0013)
[2023-02-24 06:43:14,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 3239936. Throughput: 0: 824.7. Samples: 811438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:43:14,813][00368] Avg episode reward: [(0, '20.595')]
[2023-02-24 06:43:19,810][00368] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3260416. Throughput: 0: 842.2. Samples: 814596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:43:19,818][00368] Avg episode reward: [(0, '20.632')]
[2023-02-24 06:43:22,912][11441] Updated weights for policy 0, policy_version 800 (0.0029)
[2023-02-24 06:43:24,812][00368] Fps is (10 sec: 4095.0, 60 sec: 3344.9, 300 sec: 3235.1). Total num frames: 3280896. Throughput: 0: 836.0. Samples: 820578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:43:24,816][00368] Avg episode reward: [(0, '21.623')]
[2023-02-24 06:43:24,824][11427] Saving new best policy, reward=21.623!
[2023-02-24 06:43:29,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 3293184. Throughput: 0: 813.7. Samples: 824552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:43:29,819][00368] Avg episode reward: [(0, '22.279')]
[2023-02-24 06:43:29,840][11427] Saving new best policy, reward=22.279!
[2023-02-24 06:43:34,810][00368] Fps is (10 sec: 2458.2, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3305472. Throughput: 0: 813.6. Samples: 826418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:43:34,820][00368] Avg episode reward: [(0, '23.804')]
[2023-02-24 06:43:34,824][11427] Saving new best policy, reward=23.804!
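The checkpoint lines above follow a fixed pattern: each periodic save (e.g. `checkpoint_000000621_2543616.pth`) is paired with the removal of the oldest checkpoint, so only the most recent few survive on disk. The sketch below illustrates that rotation; it is a minimal stand-in, not Sample Factory's actual implementation, and `save_with_rotation` plus its byte-write placeholder for `torch.save` are hypothetical names introduced here.

```python
import os
from collections import deque

def save_with_rotation(policy_version, frames, ckpt_dir, history, keep=2):
    """Save a checkpoint named like the log entries
    (checkpoint_<version>_<frames>.pth) and evict the oldest file once
    more than `keep` checkpoints exist, mirroring the paired
    'Saving ...' / 'Removing ...' lines in the log."""
    name = f"checkpoint_{policy_version:09d}_{frames}.pth"
    path = os.path.join(ckpt_dir, name)
    with open(path, "wb") as f:   # placeholder for torch.save(state, path)
        f.write(b"")
    history.append(path)
    removed = None
    if len(history) > keep:
        removed = history.popleft()   # oldest checkpoint is dropped first
        os.remove(removed)
    return path, removed
```

With `keep=2`, the third save evicts the first file, which matches the cadence of the removals seen in the log.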
[2023-02-24 06:43:37,111][11441] Updated weights for policy 0, policy_version 810 (0.0023)
[2023-02-24 06:43:39,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3325952. Throughput: 0: 834.6. Samples: 831660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:43:39,818][00368] Avg episode reward: [(0, '22.429')]
[2023-02-24 06:43:39,833][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000812_3325952.pth...
[2023-02-24 06:43:39,993][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth
[2023-02-24 06:43:44,810][00368] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3346432. Throughput: 0: 833.4. Samples: 837784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:43:44,813][00368] Avg episode reward: [(0, '22.473')]
[2023-02-24 06:43:48,150][11441] Updated weights for policy 0, policy_version 820 (0.0019)
[2023-02-24 06:43:49,814][00368] Fps is (10 sec: 3684.8, 60 sec: 3344.8, 300 sec: 3290.6). Total num frames: 3362816. Throughput: 0: 814.4. Samples: 839986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:43:49,818][00368] Avg episode reward: [(0, '21.713')]
[2023-02-24 06:43:54,811][00368] Fps is (10 sec: 2867.0, 60 sec: 3345.0, 300 sec: 3262.9). Total num frames: 3375104. Throughput: 0: 810.3. Samples: 843778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:43:54,814][00368] Avg episode reward: [(0, '20.786')]
[2023-02-24 06:43:59,812][00368] Fps is (10 sec: 2867.8, 60 sec: 3208.4, 300 sec: 3249.1). Total num frames: 3391488. Throughput: 0: 833.0. Samples: 848926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:43:59,818][00368] Avg episode reward: [(0, '19.457')]
[2023-02-24 06:44:01,344][11441] Updated weights for policy 0, policy_version 830 (0.0026)
[2023-02-24 06:44:04,810][00368] Fps is (10 sec: 3686.6, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3411968. Throughput: 0: 830.8. Samples: 851980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:44:04,812][00368] Avg episode reward: [(0, '18.783')]
[2023-02-24 06:44:09,810][00368] Fps is (10 sec: 3687.3, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 3428352. Throughput: 0: 814.9. Samples: 857248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:44:09,813][00368] Avg episode reward: [(0, '18.902')]
[2023-02-24 06:44:14,212][11441] Updated weights for policy 0, policy_version 840 (0.0014)
[2023-02-24 06:44:14,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3440640. Throughput: 0: 812.2. Samples: 861102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:44:14,823][00368] Avg episode reward: [(0, '19.073')]
[2023-02-24 06:44:19,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3457024. Throughput: 0: 818.9. Samples: 863268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:44:19,812][00368] Avg episode reward: [(0, '19.649')]
[2023-02-24 06:44:24,811][00368] Fps is (10 sec: 3686.3, 60 sec: 3276.9, 300 sec: 3262.9). Total num frames: 3477504. Throughput: 0: 834.9. Samples: 869230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:44:24,814][00368] Avg episode reward: [(0, '18.214')]
[2023-02-24 06:44:25,428][11441] Updated weights for policy 0, policy_version 850 (0.0018)
[2023-02-24 06:44:29,812][00368] Fps is (10 sec: 3685.6, 60 sec: 3344.9, 300 sec: 3276.8). Total num frames: 3493888. Throughput: 0: 813.5. Samples: 874394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:44:29,815][00368] Avg episode reward: [(0, '18.606')]
[2023-02-24 06:44:34,810][00368] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 3506176. Throughput: 0: 806.7. Samples: 876282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:44:34,814][00368] Avg episode reward: [(0, '18.736')]
[2023-02-24 06:44:39,621][11441] Updated weights for policy 0, policy_version 860 (0.0025)
[2023-02-24 06:44:39,810][00368] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 3522560. Throughput: 0: 812.5. Samples: 880342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:44:39,813][00368] Avg episode reward: [(0, '20.222')]
[2023-02-24 06:44:44,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3543040. Throughput: 0: 835.9. Samples: 886540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:44:44,815][00368] Avg episode reward: [(0, '19.629')]
[2023-02-24 06:44:49,810][00368] Fps is (10 sec: 3686.3, 60 sec: 3277.0, 300 sec: 3276.8). Total num frames: 3559424. Throughput: 0: 837.0. Samples: 889646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:44:49,815][00368] Avg episode reward: [(0, '20.756')]
[2023-02-24 06:44:50,156][11441] Updated weights for policy 0, policy_version 870 (0.0022)
[2023-02-24 06:44:54,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3571712. Throughput: 0: 806.8. Samples: 893556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:44:54,813][00368] Avg episode reward: [(0, '21.875')]
[2023-02-24 06:44:59,810][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.9, 300 sec: 3249.0). Total num frames: 3588096. Throughput: 0: 809.4. Samples: 897526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 06:44:59,812][00368] Avg episode reward: [(0, '21.279')]
[2023-02-24 06:45:03,886][11441] Updated weights for policy 0, policy_version 880 (0.0013)
[2023-02-24 06:45:04,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 3604480. Throughput: 0: 831.7. Samples: 900694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:45:04,813][00368] Avg episode reward: [(0, '21.841')]
[2023-02-24 06:45:09,810][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3624960. Throughput: 0: 832.7. Samples: 906700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:45:09,813][00368] Avg episode reward: [(0, '20.325')]
[2023-02-24 06:45:14,810][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 3637248. Throughput: 0: 810.5. Samples: 910866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:45:14,813][00368] Avg episode reward: [(0, '20.200')]
[2023-02-24 06:45:16,518][11441] Updated weights for policy 0, policy_version 890 (0.0019)
[2023-02-24 06:45:19,811][00368] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3649536. Throughput: 0: 811.6. Samples: 912802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:45:19,814][00368] Avg episode reward: [(0, '20.802')]
[2023-02-24 06:45:24,810][00368] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3221.3). Total num frames: 3661824. Throughput: 0: 795.9. Samples: 916158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:45:24,828][00368] Avg episode reward: [(0, '20.658')]
[2023-02-24 06:45:29,811][00368] Fps is (10 sec: 2867.0, 60 sec: 3072.1, 300 sec: 3235.1). Total num frames: 3678208. Throughput: 0: 747.7. Samples: 920186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:45:29,816][00368] Avg episode reward: [(0, '21.275')]
[2023-02-24 06:45:32,184][11441] Updated weights for policy 0, policy_version 900 (0.0027)
[2023-02-24 06:45:34,817][00368] Fps is (10 sec: 2865.1, 60 sec: 3071.6, 300 sec: 3221.2). Total num frames: 3690496. Throughput: 0: 733.9. Samples: 922678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:45:34,825][00368] Avg episode reward: [(0, '22.688')]
[2023-02-24 06:45:39,810][00368] Fps is (10 sec: 2457.7, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 3702784. Throughput: 0: 733.2. Samples: 926550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 06:45:39,818][00368] Avg episode reward: [(0, '22.271')]
[2023-02-24 06:45:39,908][11427] Stopping Batcher_0...
[2023-02-24 06:45:39,909][11427] Loop batcher_evt_loop terminating...
[2023-02-24 06:45:39,908][00368] Component Batcher_0 stopped!
[2023-02-24 06:45:39,921][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth...
[2023-02-24 06:45:40,001][11441] Weights refcount: 2 0
[2023-02-24 06:45:40,016][00368] Component InferenceWorker_p0-w0 stopped!
[2023-02-24 06:45:40,019][11441] Stopping InferenceWorker_p0-w0...
[2023-02-24 06:45:40,020][11441] Loop inference_proc0-0_evt_loop terminating...
[2023-02-24 06:45:40,054][00368] Component RolloutWorker_w4 stopped!
[2023-02-24 06:45:40,057][11446] Stopping RolloutWorker_w4...
[2023-02-24 06:45:40,058][11446] Loop rollout_proc4_evt_loop terminating...
[2023-02-24 06:45:40,079][00368] Component RolloutWorker_w6 stopped!
[2023-02-24 06:45:40,081][11447] Stopping RolloutWorker_w6...
[2023-02-24 06:45:40,081][11447] Loop rollout_proc6_evt_loop terminating...
[2023-02-24 06:45:40,092][00368] Component RolloutWorker_w3 stopped!
[2023-02-24 06:45:40,097][11444] Stopping RolloutWorker_w3...
[2023-02-24 06:45:40,098][11444] Loop rollout_proc3_evt_loop terminating...
[2023-02-24 06:45:40,104][11443] Stopping RolloutWorker_w0...
[2023-02-24 06:45:40,104][11443] Loop rollout_proc0_evt_loop terminating...
[2023-02-24 06:45:40,104][00368] Component RolloutWorker_w1 stopped!
[2023-02-24 06:45:40,107][00368] Component RolloutWorker_w0 stopped!
[2023-02-24 06:45:40,108][11442] Stopping RolloutWorker_w1...
[2023-02-24 06:45:40,116][11453] Stopping RolloutWorker_w5...
[2023-02-24 06:45:40,116][11453] Loop rollout_proc5_evt_loop terminating...
[2023-02-24 06:45:40,116][00368] Component RolloutWorker_w5 stopped!
[2023-02-24 06:45:40,125][11445] Stopping RolloutWorker_w2...
[2023-02-24 06:45:40,125][11445] Loop rollout_proc2_evt_loop terminating...
[2023-02-24 06:45:40,125][00368] Component RolloutWorker_w2 stopped!
[2023-02-24 06:45:40,111][11442] Loop rollout_proc1_evt_loop terminating...
[2023-02-24 06:45:40,135][00368] Component RolloutWorker_w7 stopped!
[2023-02-24 06:45:40,139][11448] Stopping RolloutWorker_w7...
[2023-02-24 06:45:40,140][11448] Loop rollout_proc7_evt_loop terminating...
[2023-02-24 06:45:40,181][11427] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000719_2945024.pth
[2023-02-24 06:45:40,199][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth...
[2023-02-24 06:45:40,558][11427] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth...
[2023-02-24 06:45:40,798][00368] Component LearnerWorker_p0 stopped!
[2023-02-24 06:45:40,803][00368] Waiting for process learner_proc0 to stop...
[2023-02-24 06:45:40,807][11427] Stopping LearnerWorker_p0...
[2023-02-24 06:45:40,808][11427] Loop learner_proc0_evt_loop terminating...
[2023-02-24 06:45:43,182][00368] Waiting for process inference_proc0-0 to join...
[2023-02-24 06:45:43,478][00368] Waiting for process rollout_proc0 to join...
[2023-02-24 06:45:43,480][00368] Waiting for process rollout_proc1 to join...
[2023-02-24 06:45:43,886][00368] Waiting for process rollout_proc2 to join...
[2023-02-24 06:45:43,888][00368] Waiting for process rollout_proc3 to join...
[2023-02-24 06:45:43,899][00368] Waiting for process rollout_proc4 to join...
[2023-02-24 06:45:43,900][00368] Waiting for process rollout_proc5 to join...
[2023-02-24 06:45:43,901][00368] Waiting for process rollout_proc6 to join...
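The shutdown sequence recorded here is a signal-then-join pattern: the runner tells each component to stop its event loop, then waits for every worker process in turn. A minimal sketch of that pattern follows; threads stand in for the rollout/inference worker processes, and `rollout_worker` and `shutdown` are hypothetical names for illustration, not Sample Factory internals.

```python
import threading

def rollout_worker(stop_event):
    # Placeholder event loop: a real worker would step its environments
    # here until the runner signals shutdown via the shared stop event.
    while not stop_event.is_set():
        stop_event.wait(0.01)

def shutdown(workers, stop_event):
    """Signal every worker, then join each one in turn -- the same
    stop-then-wait pattern as the 'Component ... stopped!' and
    'Waiting for process ... to join...' lines in the log."""
    stop_event.set()           # broadcast the stop signal once
    for w in workers:
        w.join(timeout=5.0)    # wait for each worker to exit its loop
    return all(not w.is_alive() for w in workers)
```

Broadcasting the stop signal before the first join lets all workers wind down concurrently, so total shutdown time is roughly the slowest worker's exit, not the sum.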
[2023-02-24 06:45:43,902][00368] Waiting for process rollout_proc7 to join...
[2023-02-24 06:45:43,903][00368] Batcher 0 profile tree view:
batching: 25.5129, releasing_batches: 0.0240
[2023-02-24 06:45:43,905][00368] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0065
  wait_policy_total: 560.1696
update_model: 8.0809
  weight_update: 0.0031
one_step: 0.0028
  handle_policy_step: 554.9018
    deserialize: 15.3037, stack: 3.1870, obs_to_device_normalize: 117.1107, forward: 278.0135, send_messages: 27.0032
    prepare_outputs: 86.2090
      to_cpu: 53.1327
[2023-02-24 06:45:43,906][00368] Learner 0 profile tree view:
misc: 0.0063, prepare_batch: 16.4303
train: 72.7290
  epoch_init: 0.0086, minibatch_init: 0.0219, losses_postprocess: 0.5128, kl_divergence: 0.5776, after_optimizer: 30.8212
  calculate_losses: 25.8497
    losses_init: 0.0036, forward_head: 1.6968, bptt_initial: 16.7756, tail: 1.1422, advantages_returns: 0.2879, losses: 3.2923
    bptt: 2.3093
      bptt_forward_core: 2.1659
  update: 14.2765
    clip: 1.4080
[2023-02-24 06:45:43,908][00368] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3182, enqueue_policy_requests: 163.6050, env_step: 865.1272, overhead: 24.4902, complete_rollouts: 7.2754
save_policy_outputs: 23.2047
  split_output_tensors: 10.9553
[2023-02-24 06:45:43,909][00368] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3254, enqueue_policy_requests: 165.7920, env_step: 864.1768, overhead: 24.1357, complete_rollouts: 7.2464
save_policy_outputs: 21.6742
  split_output_tensors: 10.4273
[2023-02-24 06:45:43,911][00368] Loop Runner_EvtLoop terminating...
[2023-02-24 06:45:43,913][00368] Runner profile tree view:
main_loop: 1199.3516
[2023-02-24 06:45:43,914][00368] Collected {0: 3706880}, FPS: 3090.7
[2023-02-24 06:45:43,973][00368] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-24 06:45:43,974][00368] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-24 06:45:43,976][00368] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-24 06:45:43,979][00368] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-24 06:45:43,981][00368] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-24 06:45:43,985][00368] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-24 06:45:43,986][00368] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-24 06:45:43,989][00368] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-24 06:45:43,991][00368] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-24 06:45:43,993][00368] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-24 06:45:43,997][00368] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-24 06:45:43,999][00368] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-24 06:45:44,001][00368] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-24 06:45:44,003][00368] Adding new argument 'enjoy_script'=None that is not in the saved config file!
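The "Overriding arg" / "Adding new argument" lines record the evaluation script layering command-line arguments over the saved training config: existing keys are overridden, unknown keys are added with a warning. A hedged sketch of that merge, assuming a plain JSON config file; `load_eval_config` is a hypothetical helper, not the actual Sample Factory function.

```python
import json

def load_eval_config(path, overrides):
    """Load a saved experiment config and layer evaluation-time
    arguments on top, reporting each change the way the
    'Overriding arg ...' / 'Adding new argument ...' log lines do."""
    with open(path) as f:
        cfg = json.load(f)
    notes = []
    for key, value in overrides.items():
        if key in cfg:
            notes.append(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            notes.append(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg, notes
```

Keeping the saved config as the base and applying overrides last means the evaluation run reproduces the training setup except where explicitly changed (here, e.g., `num_workers` drops to 1 for rendering).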
[2023-02-24 06:45:44,005][00368] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-24 06:45:44,029][00368] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:45:44,032][00368] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 06:45:44,035][00368] RunningMeanStd input shape: (1,)
[2023-02-24 06:45:44,055][00368] ConvEncoder: input_channels=3
[2023-02-24 06:45:44,752][00368] Conv encoder output size: 512
[2023-02-24 06:45:44,757][00368] Policy head output size: 512
[2023-02-24 06:45:47,098][00368] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth...
[2023-02-24 06:45:48,392][00368] Num frames 100...
[2023-02-24 06:45:48,508][00368] Num frames 200...
[2023-02-24 06:45:48,647][00368] Num frames 300...
[2023-02-24 06:45:48,769][00368] Num frames 400...
[2023-02-24 06:45:48,844][00368] Avg episode rewards: #0: 6.160, true rewards: #0: 4.160
[2023-02-24 06:45:48,845][00368] Avg episode reward: 6.160, avg true_objective: 4.160
[2023-02-24 06:45:48,959][00368] Num frames 500...
[2023-02-24 06:45:49,081][00368] Num frames 600...
[2023-02-24 06:45:49,195][00368] Num frames 700...
[2023-02-24 06:45:49,312][00368] Num frames 800...
[2023-02-24 06:45:49,428][00368] Num frames 900...
[2023-02-24 06:45:49,549][00368] Num frames 1000...
[2023-02-24 06:45:49,672][00368] Num frames 1100...
[2023-02-24 06:45:49,786][00368] Num frames 1200...
[2023-02-24 06:45:49,900][00368] Num frames 1300...
[2023-02-24 06:45:50,019][00368] Num frames 1400...
[2023-02-24 06:45:50,144][00368] Num frames 1500...
[2023-02-24 06:45:50,260][00368] Num frames 1600...
[2023-02-24 06:45:50,377][00368] Num frames 1700...
[2023-02-24 06:45:50,527][00368] Avg episode rewards: #0: 18.885, true rewards: #0: 8.885
[2023-02-24 06:45:50,529][00368] Avg episode reward: 18.885, avg true_objective: 8.885
[2023-02-24 06:45:50,562][00368] Num frames 1800...
[2023-02-24 06:45:50,689][00368] Num frames 1900...
[2023-02-24 06:45:50,812][00368] Num frames 2000...
[2023-02-24 06:45:50,935][00368] Num frames 2100...
[2023-02-24 06:45:51,054][00368] Num frames 2200...
[2023-02-24 06:45:51,169][00368] Num frames 2300...
[2023-02-24 06:45:51,281][00368] Num frames 2400...
[2023-02-24 06:45:51,399][00368] Num frames 2500...
[2023-02-24 06:45:51,518][00368] Num frames 2600...
[2023-02-24 06:45:51,633][00368] Num frames 2700...
[2023-02-24 06:45:51,756][00368] Num frames 2800...
[2023-02-24 06:45:51,910][00368] Avg episode rewards: #0: 21.617, true rewards: #0: 9.617
[2023-02-24 06:45:51,912][00368] Avg episode reward: 21.617, avg true_objective: 9.617
[2023-02-24 06:45:51,936][00368] Num frames 2900...
[2023-02-24 06:45:52,065][00368] Num frames 3000...
[2023-02-24 06:45:52,194][00368] Num frames 3100...
[2023-02-24 06:45:52,319][00368] Num frames 3200...
[2023-02-24 06:45:52,447][00368] Num frames 3300...
[2023-02-24 06:45:52,575][00368] Num frames 3400...
[2023-02-24 06:45:52,693][00368] Num frames 3500...
[2023-02-24 06:45:52,816][00368] Num frames 3600...
[2023-02-24 06:45:52,988][00368] Num frames 3700...
[2023-02-24 06:45:53,169][00368] Num frames 3800...
[2023-02-24 06:45:53,333][00368] Num frames 3900...
[2023-02-24 06:45:53,505][00368] Num frames 4000...
[2023-02-24 06:45:53,663][00368] Num frames 4100...
[2023-02-24 06:45:53,828][00368] Num frames 4200...
[2023-02-24 06:45:54,003][00368] Num frames 4300...
[2023-02-24 06:45:54,178][00368] Num frames 4400...
[2023-02-24 06:45:54,348][00368] Num frames 4500...
[2023-02-24 06:45:54,513][00368] Num frames 4600...
[2023-02-24 06:45:54,692][00368] Avg episode rewards: #0: 28.192, true rewards: #0: 11.692
[2023-02-24 06:45:54,695][00368] Avg episode reward: 28.192, avg true_objective: 11.692
[2023-02-24 06:45:54,745][00368] Num frames 4700...
[2023-02-24 06:45:54,918][00368] Num frames 4800...
[2023-02-24 06:45:55,089][00368] Num frames 4900...
[2023-02-24 06:45:55,253][00368] Num frames 5000...
[2023-02-24 06:45:55,420][00368] Num frames 5100...
[2023-02-24 06:45:55,630][00368] Avg episode rewards: #0: 24.378, true rewards: #0: 10.378
[2023-02-24 06:45:55,632][00368] Avg episode reward: 24.378, avg true_objective: 10.378
[2023-02-24 06:45:55,653][00368] Num frames 5200...
[2023-02-24 06:45:55,822][00368] Num frames 5300...
[2023-02-24 06:45:55,989][00368] Num frames 5400...
[2023-02-24 06:45:56,162][00368] Num frames 5500...
[2023-02-24 06:45:56,322][00368] Num frames 5600...
[2023-02-24 06:45:56,438][00368] Avg episode rewards: #0: 21.228, true rewards: #0: 9.395
[2023-02-24 06:45:56,441][00368] Avg episode reward: 21.228, avg true_objective: 9.395
[2023-02-24 06:45:56,548][00368] Num frames 5700...
[2023-02-24 06:45:56,675][00368] Num frames 5800...
[2023-02-24 06:45:56,797][00368] Num frames 5900...
[2023-02-24 06:45:56,913][00368] Num frames 6000...
[2023-02-24 06:45:57,028][00368] Num frames 6100...
[2023-02-24 06:45:57,143][00368] Num frames 6200...
[2023-02-24 06:45:57,262][00368] Num frames 6300...
[2023-02-24 06:45:57,382][00368] Num frames 6400...
[2023-02-24 06:45:57,498][00368] Num frames 6500...
[2023-02-24 06:45:57,612][00368] Num frames 6600...
[2023-02-24 06:45:57,724][00368] Num frames 6700...
[2023-02-24 06:45:57,845][00368] Num frames 6800...
[2023-02-24 06:45:57,962][00368] Num frames 6900...
[2023-02-24 06:45:58,081][00368] Avg episode rewards: #0: 23.221, true rewards: #0: 9.936
[2023-02-24 06:45:58,083][00368] Avg episode reward: 23.221, avg true_objective: 9.936
[2023-02-24 06:45:58,141][00368] Num frames 7000...
[2023-02-24 06:45:58,262][00368] Num frames 7100...
[2023-02-24 06:45:58,384][00368] Num frames 7200...
[2023-02-24 06:45:58,506][00368] Num frames 7300...
[2023-02-24 06:45:58,624][00368] Num frames 7400...
[2023-02-24 06:45:58,741][00368] Num frames 7500...
[2023-02-24 06:45:58,864][00368] Num frames 7600...
[2023-02-24 06:45:58,979][00368] Avg episode rewards: #0: 21.687, true rewards: #0: 9.562
[2023-02-24 06:45:58,981][00368] Avg episode reward: 21.687, avg true_objective: 9.562
[2023-02-24 06:45:59,050][00368] Num frames 7700...
[2023-02-24 06:45:59,172][00368] Num frames 7800...
[2023-02-24 06:45:59,297][00368] Num frames 7900...
[2023-02-24 06:45:59,439][00368] Num frames 8000...
[2023-02-24 06:45:59,574][00368] Num frames 8100...
[2023-02-24 06:45:59,696][00368] Num frames 8200...
[2023-02-24 06:45:59,808][00368] Num frames 8300...
[2023-02-24 06:45:59,939][00368] Num frames 8400...
[2023-02-24 06:46:00,086][00368] Num frames 8500...
[2023-02-24 06:46:00,216][00368] Num frames 8600...
[2023-02-24 06:46:00,347][00368] Num frames 8700...
[2023-02-24 06:46:00,500][00368] Avg episode rewards: #0: 21.972, true rewards: #0: 9.750
[2023-02-24 06:46:00,502][00368] Avg episode reward: 21.972, avg true_objective: 9.750
[2023-02-24 06:46:00,537][00368] Num frames 8800...
[2023-02-24 06:46:00,652][00368] Num frames 8900...
[2023-02-24 06:46:00,767][00368] Num frames 9000...
[2023-02-24 06:46:00,889][00368] Num frames 9100...
[2023-02-24 06:46:01,006][00368] Num frames 9200...
[2023-02-24 06:46:01,121][00368] Num frames 9300...
[2023-02-24 06:46:01,238][00368] Num frames 9400...
[2023-02-24 06:46:01,358][00368] Num frames 9500...
[2023-02-24 06:46:01,473][00368] Num frames 9600...
[2023-02-24 06:46:01,587][00368] Num frames 9700...
[2023-02-24 06:46:01,702][00368] Num frames 9800...
[2023-02-24 06:46:01,817][00368] Num frames 9900...
[2023-02-24 06:46:01,946][00368] Num frames 10000...
[2023-02-24 06:46:02,061][00368] Num frames 10100...
[2023-02-24 06:46:02,178][00368] Num frames 10200...
[2023-02-24 06:46:02,295][00368] Num frames 10300...
[2023-02-24 06:46:02,408][00368] Num frames 10400...
[2023-02-24 06:46:02,525][00368] Num frames 10500...
[2023-02-24 06:46:02,637][00368] Num frames 10600...
[2023-02-24 06:46:02,756][00368] Num frames 10700...
[2023-02-24 06:46:02,841][00368] Avg episode rewards: #0: 24.327, true rewards: #0: 10.727 [2023-02-24 06:46:02,843][00368] Avg episode reward: 24.327, avg true_objective: 10.727 [2023-02-24 06:47:16,084][00368] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-24 06:47:16,122][00368] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 06:47:16,125][00368] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-24 06:47:16,127][00368] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-24 06:47:16,133][00368] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-24 06:47:16,134][00368] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 06:47:16,136][00368] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 06:47:16,138][00368] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-24 06:47:16,139][00368] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 06:47:16,140][00368] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-24 06:47:16,142][00368] Adding new argument 'hf_repository'='SatCat/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-24 06:47:16,143][00368] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 06:47:16,144][00368] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-24 06:47:16,146][00368] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-24 06:47:16,147][00368] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
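The "Avg episode rewards" entries in the evaluation above are running averages over the episodes completed so far, which is why they can drop (e.g. 28.192 after episode 4, 24.378 after episode 5). The per-episode rewards can be recovered by inverting the running mean; a minimal sketch (the helper name is ours, not part of Sample Factory):

```python
def episode_rewards_from_running_avgs(avgs):
    """Recover per-episode rewards r_k from running averages a_k = mean(r_1..r_k)."""
    rewards = []
    prev_sum = 0.0
    for k, a in enumerate(avgs, start=1):
        total = a * k          # sum of the first k episode rewards
        rewards.append(total - prev_sum)
        prev_sum = total
    return rewards

# Running averages reported after the first five episodes above:
avgs = [6.160, 18.885, 21.617, 28.192, 24.378]
print([round(r, 3) for r in episode_rewards_from_running_avgs(avgs)])
# → [6.16, 31.61, 27.081, 47.917, 9.122]
```

The drop after episode 5 thus corresponds to a single short, low-reward episode (about 9.1), not a regression of the policy.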
[2023-02-24 06:47:16,148][00368] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-24 06:47:16,182][00368] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 06:47:16,184][00368] RunningMeanStd input shape: (1,)
[2023-02-24 06:47:16,199][00368] ConvEncoder: input_channels=3
[2023-02-24 06:47:16,235][00368] Conv encoder output size: 512
[2023-02-24 06:47:16,237][00368] Policy head output size: 512
[2023-02-24 06:47:16,256][00368] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth...
[2023-02-24 06:47:16,703][00368] Num frames 100...
[2023-02-24 06:47:16,823][00368] Num frames 200...
[2023-02-24 06:47:16,936][00368] Num frames 300...
[2023-02-24 06:47:17,052][00368] Num frames 400...
[2023-02-24 06:47:17,181][00368] Num frames 500...
[2023-02-24 06:47:17,308][00368] Num frames 600...
[2023-02-24 06:47:17,424][00368] Num frames 700...
[2023-02-24 06:47:17,542][00368] Num frames 800...
[2023-02-24 06:47:17,663][00368] Num frames 900...
[2023-02-24 06:47:17,790][00368] Num frames 1000...
[2023-02-24 06:47:17,918][00368] Num frames 1100...
[2023-02-24 06:47:18,049][00368] Num frames 1200...
[2023-02-24 06:47:18,172][00368] Num frames 1300...
[2023-02-24 06:47:18,297][00368] Num frames 1400...
[2023-02-24 06:47:18,415][00368] Num frames 1500...
[2023-02-24 06:47:18,530][00368] Num frames 1600...
[2023-02-24 06:47:18,644][00368] Num frames 1700...
[2023-02-24 06:47:18,761][00368] Num frames 1800...
[2023-02-24 06:47:18,875][00368] Avg episode rewards: #0: 49.509, true rewards: #0: 18.510
[2023-02-24 06:47:18,877][00368] Avg episode reward: 49.509, avg true_objective: 18.510
[2023-02-24 06:47:18,952][00368] Num frames 1900...
[2023-02-24 06:47:19,075][00368] Num frames 2000...
[2023-02-24 06:47:19,197][00368] Num frames 2100...
[2023-02-24 06:47:19,322][00368] Num frames 2200...
[2023-02-24 06:47:19,445][00368] Num frames 2300...
[2023-02-24 06:47:19,572][00368] Num frames 2400...
[2023-02-24 06:47:19,716][00368] Num frames 2500...
[2023-02-24 06:47:19,881][00368] Num frames 2600...
[2023-02-24 06:47:20,047][00368] Num frames 2700...
[2023-02-24 06:47:20,213][00368] Num frames 2800...
[2023-02-24 06:47:20,390][00368] Num frames 2900...
[2023-02-24 06:47:20,555][00368] Num frames 3000...
[2023-02-24 06:47:20,735][00368] Num frames 3100...
[2023-02-24 06:47:20,923][00368] Num frames 3200...
[2023-02-24 06:47:21,101][00368] Num frames 3300...
[2023-02-24 06:47:21,266][00368] Num frames 3400...
[2023-02-24 06:47:21,428][00368] Num frames 3500...
[2023-02-24 06:47:21,589][00368] Num frames 3600...
[2023-02-24 06:47:21,758][00368] Num frames 3700...
[2023-02-24 06:47:21,919][00368] Num frames 3800...
[2023-02-24 06:47:22,094][00368] Num frames 3900...
[2023-02-24 06:47:22,241][00368] Avg episode rewards: #0: 52.754, true rewards: #0: 19.755
[2023-02-24 06:47:22,244][00368] Avg episode reward: 52.754, avg true_objective: 19.755
[2023-02-24 06:47:22,340][00368] Num frames 4000...
[2023-02-24 06:47:22,509][00368] Num frames 4100...
[2023-02-24 06:47:22,679][00368] Num frames 4200...
[2023-02-24 06:47:22,845][00368] Num frames 4300...
[2023-02-24 06:47:23,010][00368] Num frames 4400...
[2023-02-24 06:47:23,176][00368] Num frames 4500...
[2023-02-24 06:47:23,345][00368] Num frames 4600...
[2023-02-24 06:47:23,467][00368] Num frames 4700...
[2023-02-24 06:47:23,584][00368] Num frames 4800...
[2023-02-24 06:47:23,703][00368] Num frames 4900...
[2023-02-24 06:47:23,832][00368] Num frames 5000...
[2023-02-24 06:47:23,971][00368] Avg episode rewards: #0: 45.236, true rewards: #0: 16.903
[2023-02-24 06:47:23,973][00368] Avg episode reward: 45.236, avg true_objective: 16.903
[2023-02-24 06:47:24,013][00368] Num frames 5100...
[2023-02-24 06:47:24,137][00368] Num frames 5200...
[2023-02-24 06:47:24,263][00368] Num frames 5300...
[2023-02-24 06:47:24,395][00368] Num frames 5400...
[2023-02-24 06:47:24,520][00368] Num frames 5500...
[2023-02-24 06:47:24,645][00368] Num frames 5600...
[2023-02-24 06:47:24,774][00368] Num frames 5700...
[2023-02-24 06:47:24,924][00368] Avg episode rewards: #0: 37.687, true rewards: #0: 14.438
[2023-02-24 06:47:24,927][00368] Avg episode reward: 37.687, avg true_objective: 14.438
[2023-02-24 06:47:24,966][00368] Num frames 5800...
[2023-02-24 06:47:25,103][00368] Num frames 5900...
[2023-02-24 06:47:25,235][00368] Num frames 6000...
[2023-02-24 06:47:25,366][00368] Num frames 6100...
[2023-02-24 06:47:25,504][00368] Num frames 6200...
[2023-02-24 06:47:25,625][00368] Num frames 6300...
[2023-02-24 06:47:25,747][00368] Num frames 6400...
[2023-02-24 06:47:25,821][00368] Avg episode rewards: #0: 33.230, true rewards: #0: 12.830
[2023-02-24 06:47:25,823][00368] Avg episode reward: 33.230, avg true_objective: 12.830
[2023-02-24 06:47:25,936][00368] Num frames 6500...
[2023-02-24 06:47:26,061][00368] Num frames 6600...
[2023-02-24 06:47:26,174][00368] Num frames 6700...
[2023-02-24 06:47:26,291][00368] Num frames 6800...
[2023-02-24 06:47:26,415][00368] Num frames 6900...
[2023-02-24 06:47:26,533][00368] Num frames 7000...
[2023-02-24 06:47:26,651][00368] Num frames 7100...
[2023-02-24 06:47:26,768][00368] Num frames 7200...
[2023-02-24 06:47:26,883][00368] Num frames 7300...
[2023-02-24 06:47:26,998][00368] Avg episode rewards: #0: 30.915, true rewards: #0: 12.248
[2023-02-24 06:47:26,999][00368] Avg episode reward: 30.915, avg true_objective: 12.248
[2023-02-24 06:47:27,074][00368] Num frames 7400...
[2023-02-24 06:47:27,189][00368] Num frames 7500...
[2023-02-24 06:47:27,307][00368] Num frames 7600...
[2023-02-24 06:47:27,434][00368] Num frames 7700...
[2023-02-24 06:47:27,551][00368] Num frames 7800...
[2023-02-24 06:47:27,714][00368] Avg episode rewards: #0: 27.990, true rewards: #0: 11.276
[2023-02-24 06:47:27,716][00368] Avg episode reward: 27.990, avg true_objective: 11.276
[2023-02-24 06:47:27,729][00368] Num frames 7900...
[2023-02-24 06:47:27,848][00368] Num frames 8000...
[2023-02-24 06:47:27,969][00368] Num frames 8100...
[2023-02-24 06:47:28,097][00368] Num frames 8200...
[2023-02-24 06:47:28,237][00368] Num frames 8300...
[2023-02-24 06:47:28,363][00368] Num frames 8400...
[2023-02-24 06:47:28,493][00368] Num frames 8500...
[2023-02-24 06:47:28,661][00368] Avg episode rewards: #0: 26.370, true rewards: #0: 10.745
[2023-02-24 06:47:28,664][00368] Avg episode reward: 26.370, avg true_objective: 10.745
[2023-02-24 06:47:28,674][00368] Num frames 8600...
[2023-02-24 06:47:28,804][00368] Num frames 8700...
[2023-02-24 06:47:28,936][00368] Num frames 8800...
[2023-02-24 06:47:29,063][00368] Num frames 8900...
[2023-02-24 06:47:29,184][00368] Num frames 9000...
[2023-02-24 06:47:29,312][00368] Num frames 9100...
[2023-02-24 06:47:29,437][00368] Num frames 9200...
[2023-02-24 06:47:29,559][00368] Num frames 9300...
[2023-02-24 06:47:29,671][00368] Num frames 9400...
[2023-02-24 06:47:29,788][00368] Num frames 9500...
[2023-02-24 06:47:29,898][00368] Num frames 9600...
[2023-02-24 06:47:30,017][00368] Num frames 9700...
[2023-02-24 06:47:30,135][00368] Num frames 9800...
[2023-02-24 06:47:30,258][00368] Num frames 9900...
[2023-02-24 06:47:30,379][00368] Num frames 10000...
[2023-02-24 06:47:30,503][00368] Num frames 10100...
[2023-02-24 06:47:30,620][00368] Num frames 10200...
[2023-02-24 06:47:30,744][00368] Num frames 10300...
[2023-02-24 06:47:30,861][00368] Num frames 10400...
[2023-02-24 06:47:30,978][00368] Num frames 10500...
[2023-02-24 06:47:31,090][00368] Num frames 10600...
[2023-02-24 06:47:31,259][00368] Avg episode rewards: #0: 29.551, true rewards: #0: 11.884
[2023-02-24 06:47:31,261][00368] Avg episode reward: 29.551, avg true_objective: 11.884
[2023-02-24 06:47:31,273][00368] Num frames 10700...
[2023-02-24 06:47:31,390][00368] Num frames 10800...
[2023-02-24 06:47:31,509][00368] Num frames 10900...
[2023-02-24 06:47:31,629][00368] Num frames 11000...
[2023-02-24 06:47:31,752][00368] Num frames 11100...
[2023-02-24 06:47:31,893][00368] Avg episode rewards: #0: 27.376, true rewards: #0: 11.176
[2023-02-24 06:47:31,896][00368] Avg episode reward: 27.376, avg true_objective: 11.176
[2023-02-24 06:48:48,293][00368] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-24 06:48:55,512][00368] The model has been pushed to https://huggingface.co/SatCat/rl_course_vizdoom_health_gathering_supreme
[2023-02-24 06:53:15,327][00368] Environment doom_basic already registered, overwriting...
[2023-02-24 06:53:15,330][00368] Environment doom_two_colors_easy already registered, overwriting...
[2023-02-24 06:53:15,339][00368] Environment doom_two_colors_hard already registered, overwriting...
[2023-02-24 06:53:15,340][00368] Environment doom_dm already registered, overwriting...
[2023-02-24 06:53:15,344][00368] Environment doom_dwango5 already registered, overwriting...
[2023-02-24 06:53:15,351][00368] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-02-24 06:53:15,360][00368] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-02-24 06:53:15,375][00368] Environment doom_my_way_home already registered, overwriting...
[2023-02-24 06:53:15,381][00368] Environment doom_deadly_corridor already registered, overwriting...
[2023-02-24 06:53:15,382][00368] Environment doom_defend_the_center already registered, overwriting...
[2023-02-24 06:53:15,384][00368] Environment doom_defend_the_line already registered, overwriting...
[2023-02-24 06:53:15,385][00368] Environment doom_health_gathering already registered, overwriting...
[2023-02-24 06:53:15,395][00368] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-02-24 06:53:15,398][00368] Environment doom_battle already registered, overwriting...
[2023-02-24 06:53:15,400][00368] Environment doom_battle2 already registered, overwriting...
[2023-02-24 06:53:15,402][00368] Environment doom_duel_bots already registered, overwriting...
[2023-02-24 06:53:15,404][00368] Environment doom_deathmatch_bots already registered, overwriting...
[2023-02-24 06:53:15,406][00368] Environment doom_duel already registered, overwriting...
[2023-02-24 06:53:15,407][00368] Environment doom_deathmatch_full already registered, overwriting...
[2023-02-24 06:53:15,408][00368] Environment doom_benchmark already registered, overwriting...
[2023-02-24 06:53:15,410][00368] register_encoder_factory:
[2023-02-24 06:53:15,440][00368] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-24 06:53:15,441][00368] Overriding arg 'num_workers' with value 10 passed from command line
[2023-02-24 06:53:15,444][00368] Overriding arg 'train_for_env_steps' with value 6000000 passed from command line
[2023-02-24 06:53:15,453][00368] Experiment dir /content/train_dir/default_experiment already exists!
[2023-02-24 06:53:15,455][00368] Resuming existing experiment from /content/train_dir/default_experiment...
[2023-02-24 06:53:15,458][00368] Weights and Biases integration disabled
[2023-02-24 06:53:15,462][00368] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-02-24 06:53:17,640][00368] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=10
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=6000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=3700000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 3700000}
git_hash=unknown
git_repo_name=not a git repository
[2023-02-24 06:53:17,645][00368] Saving configuration to /content/train_dir/default_experiment/config.json...
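The configuration dump above is a flat list of `key=value` pairs. For inspection (for example, to compare the saved config against command-line overrides), it can be parsed back into a dict with a few lines of Python; this is a hypothetical helper of our own, not part of Sample Factory:

```python
def parse_config_dump(text):
    """Parse 'key=value' lines into a dict of strings; skips timestamped log lines."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line or line.startswith("["):
            continue  # not a key=value pair (e.g. a '[timestamp]' log line)
        key, _, value = line.partition("=")
        cfg[key] = value
    return cfg

# A few pairs taken from the dump above:
dump = """gamma=0.99
learning_rate=0.0001
num_workers=10
train_for_env_steps=6000000"""
cfg = parse_config_dump(dump)
print(cfg["num_workers"])  # → 10
```

Values are kept as strings; a real tool would convert them to the appropriate types.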
[2023-02-24 06:53:17,650][00368] Rollout worker 0 uses device cpu
[2023-02-24 06:53:17,652][00368] Rollout worker 1 uses device cpu
[2023-02-24 06:53:17,657][00368] Rollout worker 2 uses device cpu
[2023-02-24 06:53:17,660][00368] Rollout worker 3 uses device cpu
[2023-02-24 06:53:17,661][00368] Rollout worker 4 uses device cpu
[2023-02-24 06:53:17,665][00368] Rollout worker 5 uses device cpu
[2023-02-24 06:53:17,667][00368] Rollout worker 6 uses device cpu
[2023-02-24 06:53:17,668][00368] Rollout worker 7 uses device cpu
[2023-02-24 06:53:17,669][00368] Rollout worker 8 uses device cpu
[2023-02-24 06:53:17,670][00368] Rollout worker 9 uses device cpu
[2023-02-24 06:53:17,817][00368] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:53:17,819][00368] InferenceWorker_p0-w0: min num requests: 3
[2023-02-24 06:53:17,859][00368] Starting all processes...
[2023-02-24 06:53:17,861][00368] Starting process learner_proc0
[2023-02-24 06:53:17,996][00368] Starting all processes...
[2023-02-24 06:53:18,003][00368] Starting process inference_proc0-0
[2023-02-24 06:53:18,005][00368] Starting process rollout_proc0
[2023-02-24 06:53:18,005][00368] Starting process rollout_proc1
[2023-02-24 06:53:18,005][00368] Starting process rollout_proc2
[2023-02-24 06:53:18,005][00368] Starting process rollout_proc3
[2023-02-24 06:53:18,005][00368] Starting process rollout_proc4
[2023-02-24 06:53:18,005][00368] Starting process rollout_proc5
[2023-02-24 06:53:18,006][00368] Starting process rollout_proc6
[2023-02-24 06:53:18,006][00368] Starting process rollout_proc7
[2023-02-24 06:53:18,006][00368] Starting process rollout_proc8
[2023-02-24 06:53:18,006][00368] Starting process rollout_proc9
[2023-02-24 06:53:29,743][21362] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:53:29,748][21362] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-24 06:53:29,821][21362] Num visible devices: 1
[2023-02-24 06:53:29,859][21362] Starting seed is not provided
[2023-02-24 06:53:29,860][21362] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:53:29,861][21362] Initializing actor-critic model on device cuda:0
[2023-02-24 06:53:29,867][21362] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 06:53:29,868][21362] RunningMeanStd input shape: (1,)
[2023-02-24 06:53:30,028][21362] ConvEncoder: input_channels=3
[2023-02-24 06:53:31,343][21362] Conv encoder output size: 512
[2023-02-24 06:53:31,354][21362] Policy head output size: 512
[2023-02-24 06:53:31,525][21362] Created Actor Critic model with architecture:
[2023-02-24 06:53:31,541][21362] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-24 06:53:31,906][21378] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:53:31,914][21378] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-24 06:53:31,989][21378] Num visible devices: 1
[2023-02-24 06:53:32,027][21379] Worker 0 uses CPU cores [0]
[2023-02-24 06:53:32,144][21380] Worker 1 uses CPU cores [1]
[2023-02-24 06:53:32,353][21381] Worker 3 uses CPU cores [1]
[2023-02-24 06:53:32,753][21385] Worker 2 uses CPU cores [0]
[2023-02-24 06:53:32,833][21386] Worker 4 uses CPU cores [0]
[2023-02-24 06:53:33,009][21395] Worker 5 uses CPU cores [1]
[2023-02-24 06:53:33,206][21393] Worker 6 uses CPU cores [0]
[2023-02-24 06:53:33,285][21397] Worker 7 uses CPU cores [1]
[2023-02-24 06:53:33,425][21403] Worker 9 uses CPU cores [1]
[2023-02-24 06:53:33,534][21405] Worker 8 uses CPU cores [0]
[2023-02-24 06:53:35,387][21362] Using optimizer
[2023-02-24 06:53:35,388][21362] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth...
[2023-02-24 06:53:35,425][21362] Loading model from checkpoint
[2023-02-24 06:53:35,431][21362] Loaded experiment state at self.train_step=905, self.env_steps=3706880
[2023-02-24 06:53:35,432][21362] Initialized policy 0 weights for model version 905
[2023-02-24 06:53:35,434][21362] LearnerWorker_p0 finished initialization!
[2023-02-24 06:53:35,435][21362] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 06:53:35,466][00368] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 3706880. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:53:35,592][21378] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 06:53:35,594][21378] RunningMeanStd input shape: (1,)
[2023-02-24 06:53:35,607][21378] ConvEncoder: input_channels=3
[2023-02-24 06:53:35,707][21378] Conv encoder output size: 512
[2023-02-24 06:53:35,708][21378] Policy head output size: 512
[2023-02-24 06:53:37,810][00368] Heartbeat connected on Batcher_0
[2023-02-24 06:53:37,813][00368] Heartbeat connected on LearnerWorker_p0
[2023-02-24 06:53:37,826][00368] Heartbeat connected on RolloutWorker_w0
[2023-02-24 06:53:37,832][00368] Heartbeat connected on RolloutWorker_w1
[2023-02-24 06:53:37,835][00368] Heartbeat connected on RolloutWorker_w2
[2023-02-24 06:53:37,839][00368] Heartbeat connected on RolloutWorker_w3
[2023-02-24 06:53:37,843][00368] Heartbeat connected on RolloutWorker_w4
[2023-02-24 06:53:37,847][00368] Heartbeat connected on RolloutWorker_w5
[2023-02-24 06:53:37,852][00368] Heartbeat connected on RolloutWorker_w6
[2023-02-24 06:53:37,856][00368] Heartbeat connected on RolloutWorker_w7
[2023-02-24 06:53:37,860][00368] Heartbeat connected on RolloutWorker_w8
[2023-02-24 06:53:37,862][00368] Heartbeat connected on RolloutWorker_w9
[2023-02-24 06:53:38,138][00368] Inference worker 0-0 is ready!
[2023-02-24 06:53:38,139][00368] All inference workers are ready! Signal rollout workers to start!
[2023-02-24 06:53:38,145][00368] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-24 06:53:38,277][21393] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,280][21405] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,281][21385] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,284][21386] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,285][21379] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,327][21403] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,329][21397] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,331][21395] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,337][21380] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:38,344][21381] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 06:53:39,193][21403] Decorrelating experience for 0 frames...
[2023-02-24 06:53:39,196][21380] Decorrelating experience for 0 frames...
[2023-02-24 06:53:39,541][21403] Decorrelating experience for 32 frames...
[2023-02-24 06:53:39,971][21403] Decorrelating experience for 64 frames...
[2023-02-24 06:53:40,157][21386] Decorrelating experience for 0 frames...
[2023-02-24 06:53:40,160][21385] Decorrelating experience for 0 frames...
[2023-02-24 06:53:40,163][21405] Decorrelating experience for 0 frames...
[2023-02-24 06:53:40,168][21379] Decorrelating experience for 0 frames...
[2023-02-24 06:53:40,175][21393] Decorrelating experience for 0 frames...
[2023-02-24 06:53:40,463][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 3706880. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:53:41,546][21397] Decorrelating experience for 0 frames...
[2023-02-24 06:53:41,657][21403] Decorrelating experience for 96 frames...
[2023-02-24 06:53:41,851][21385] Decorrelating experience for 32 frames...
[2023-02-24 06:53:41,859][21379] Decorrelating experience for 32 frames...
[2023-02-24 06:53:41,865][21395] Decorrelating experience for 0 frames...
[2023-02-24 06:53:42,006][21386] Decorrelating experience for 32 frames...
[2023-02-24 06:53:44,702][21397] Decorrelating experience for 32 frames...
[2023-02-24 06:53:44,846][21381] Decorrelating experience for 0 frames...
[2023-02-24 06:53:44,952][21380] Decorrelating experience for 32 frames...
[2023-02-24 06:53:44,992][21393] Decorrelating experience for 32 frames...
[2023-02-24 06:53:45,472][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 3706880. Throughput: 0: 1.2. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:53:45,627][21405] Decorrelating experience for 32 frames...
[2023-02-24 06:53:45,876][21385] Decorrelating experience for 64 frames...
[2023-02-24 06:53:45,966][21379] Decorrelating experience for 64 frames...
[2023-02-24 06:53:47,676][21381] Decorrelating experience for 32 frames...
[2023-02-24 06:53:47,933][21386] Decorrelating experience for 64 frames...
[2023-02-24 06:53:47,995][21397] Decorrelating experience for 64 frames...
[2023-02-24 06:53:48,196][21395] Decorrelating experience for 32 frames...
[2023-02-24 06:53:49,368][21393] Decorrelating experience for 64 frames...
[2023-02-24 06:53:49,699][21385] Decorrelating experience for 96 frames...
[2023-02-24 06:53:49,758][21379] Decorrelating experience for 96 frames...
[2023-02-24 06:53:49,819][21405] Decorrelating experience for 64 frames...
[2023-02-24 06:53:50,347][21380] Decorrelating experience for 64 frames...
[2023-02-24 06:53:50,462][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 3706880. Throughput: 0: 41.7. Samples: 626. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:53:50,464][00368] Avg episode reward: [(0, '8.823')]
[2023-02-24 06:53:50,494][21381] Decorrelating experience for 64 frames...
[2023-02-24 06:53:50,738][21397] Decorrelating experience for 96 frames...
[2023-02-24 06:53:51,343][21395] Decorrelating experience for 64 frames...
[2023-02-24 06:53:54,147][21405] Decorrelating experience for 96 frames...
[2023-02-24 06:53:54,246][21386] Decorrelating experience for 96 frames...
[2023-02-24 06:53:54,287][21362] Signal inference workers to stop experience collection...
[2023-02-24 06:53:54,331][21378] InferenceWorker_p0-w0: stopping experience collection
[2023-02-24 06:53:54,473][21393] Decorrelating experience for 96 frames...
[2023-02-24 06:53:54,528][21380] Decorrelating experience for 96 frames...
[2023-02-24 06:53:54,599][21381] Decorrelating experience for 96 frames...
[2023-02-24 06:53:54,879][21395] Decorrelating experience for 96 frames...
[2023-02-24 06:53:55,462][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 3706880. Throughput: 0: 88.3. Samples: 1766. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 06:53:55,464][00368] Avg episode reward: [(0, '6.987')]
[2023-02-24 06:53:57,096][21362] Signal inference workers to resume experience collection...
[2023-02-24 06:53:57,099][21378] InferenceWorker_p0-w0: resuming experience collection
[2023-02-24 06:54:00,462][00368] Fps is (10 sec: 1228.8, 60 sec: 491.6, 300 sec: 491.6). Total num frames: 3719168. Throughput: 0: 140.0. Samples: 3500. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-24 06:54:00,469][00368] Avg episode reward: [(0, '6.828')]
[2023-02-24 06:54:05,462][00368] Fps is (10 sec: 2867.2, 60 sec: 955.8, 300 sec: 955.8). Total num frames: 3735552. Throughput: 0: 248.0. Samples: 7440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 06:54:05,469][00368] Avg episode reward: [(0, '8.096')]
[2023-02-24 06:54:10,462][00368] Fps is (10 sec: 2457.6, 60 sec: 1053.4, 300 sec: 1053.4). Total num frames: 3743744. Throughput: 0: 270.9. Samples: 9480. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:54:10,465][00368] Avg episode reward: [(0, '9.860')]
[2023-02-24 06:54:10,500][21378] Updated weights for policy 0, policy_version 915 (0.0023)
[2023-02-24 06:54:15,462][00368] Fps is (10 sec: 2457.6, 60 sec: 1331.3, 300 sec: 1331.3). Total num frames: 3760128. Throughput: 0: 346.1. Samples: 13844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 06:54:15,465][00368] Avg episode reward: [(0, '13.533')]
[2023-02-24 06:54:20,462][00368] Fps is (10 sec: 4096.0, 60 sec: 1729.6, 300 sec: 1729.6). Total num frames: 3784704. Throughput: 0: 447.3. Samples: 20126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:54:20,465][00368] Avg episode reward: [(0, '15.689')]
[2023-02-24 06:54:21,144][21378] Updated weights for policy 0, policy_version 925 (0.0021)
[2023-02-24 06:54:25,463][00368] Fps is (10 sec: 4095.8, 60 sec: 1884.3, 300 sec: 1884.3). Total num frames: 3801088. Throughput: 0: 514.0. Samples: 23130. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 06:54:25,470][00368] Avg episode reward: [(0, '16.697')]
[2023-02-24 06:54:30,463][00368] Fps is (10 sec: 2867.1, 60 sec: 1936.4, 300 sec: 1936.4). Total num frames: 3813376. Throughput: 0: 605.2. Samples: 27240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 06:54:30,476][00368] Avg episode reward: [(0, '18.516')]
[2023-02-24 06:54:35,416][21378] Updated weights for policy 0, policy_version 935 (0.0035)
[2023-02-24 06:54:35,463][00368] Fps is (10 sec: 2867.2, 60 sec: 2048.1, 300 sec: 2048.1). Total num frames: 3829760. Throughput: 0: 683.8. Samples: 31396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:54:35,471][00368] Avg episode reward: [(0, '20.729')]
[2023-02-24 06:54:40,463][00368] Fps is (10 sec: 3686.5, 60 sec: 2389.3, 300 sec: 2205.7). Total num frames: 3850240. Throughput: 0: 721.1. Samples: 34216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 06:54:40,465][00368] Avg episode reward: [(0, '21.766')]
[2023-02-24 06:54:45,279][21378] Updated weights for policy 0, policy_version 945 (0.0013)
[2023-02-24 06:54:45,462][00368] Fps is (10 sec: 4096.3, 60 sec: 2731.1, 300 sec: 2340.7). Total num frames: 3870720. Throughput: 0: 822.0. Samples: 40488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 06:54:45,465][00368] Avg episode reward: [(0, '22.185')]
[2023-02-24 06:54:50,466][00368] Fps is (10 sec: 3275.8, 60 sec: 2935.3, 300 sec: 2348.4). Total num frames: 3883008. Throughput: 0: 843.9. Samples: 45418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:54:50,468][00368] Avg episode reward: [(0, '22.458')]
[2023-02-24 06:54:55,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2406.5). Total num frames: 3899392. Throughput: 0: 844.6. Samples: 47486. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 06:54:55,466][00368] Avg episode reward: [(0, '22.853')]
[2023-02-24 06:54:59,474][21378] Updated weights for policy 0, policy_version 955 (0.0017)
[2023-02-24 06:55:00,463][00368] Fps is (10 sec: 3277.5, 60 sec: 3276.8, 300 sec: 2457.7). Total num frames: 3915776. Throughput: 0: 842.0. Samples: 51734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 06:55:00,466][00368] Avg episode reward: [(0, '23.076')]
[2023-02-24 06:55:05,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2548.7). Total num frames: 3936256. Throughput: 0: 841.1. Samples: 57976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 06:55:05,465][00368] Avg episode reward: [(0, '24.516')]
[2023-02-24 06:55:05,474][21362] Saving new best policy, reward=24.516!
[2023-02-24 06:55:09,420][21378] Updated weights for policy 0, policy_version 965 (0.0016) [2023-02-24 06:55:10,462][00368] Fps is (10 sec: 3686.7, 60 sec: 3481.6, 300 sec: 2587.0). Total num frames: 3952640. Throughput: 0: 843.4. Samples: 61084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:55:10,469][00368] Avg episode reward: [(0, '24.310')] [2023-02-24 06:55:15,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2621.5). Total num frames: 3969024. Throughput: 0: 846.6. Samples: 65336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:55:15,468][00368] Avg episode reward: [(0, '24.457')] [2023-02-24 06:55:15,479][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000969_3969024.pth... [2023-02-24 06:55:15,828][21362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000812_3325952.pth [2023-02-24 06:55:20,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2613.7). Total num frames: 3981312. Throughput: 0: 847.2. Samples: 69520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:55:20,469][00368] Avg episode reward: [(0, '24.702')] [2023-02-24 06:55:20,474][21362] Saving new best policy, reward=24.702! [2023-02-24 06:55:23,310][21378] Updated weights for policy 0, policy_version 975 (0.0032) [2023-02-24 06:55:25,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2681.1). Total num frames: 4001792. Throughput: 0: 842.7. Samples: 72138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:55:25,464][00368] Avg episode reward: [(0, '24.353')] [2023-02-24 06:55:30,462][00368] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2742.6). Total num frames: 4022272. Throughput: 0: 841.9. Samples: 78374. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:55:30,465][00368] Avg episode reward: [(0, '23.592')] [2023-02-24 06:55:34,257][21378] Updated weights for policy 0, policy_version 985 (0.0015) [2023-02-24 06:55:35,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 2730.7). Total num frames: 4034560. Throughput: 0: 836.3. Samples: 83050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:55:35,474][00368] Avg episode reward: [(0, '23.133')] [2023-02-24 06:55:40,465][00368] Fps is (10 sec: 2456.9, 60 sec: 3276.7, 300 sec: 2719.8). Total num frames: 4046848. Throughput: 0: 826.8. Samples: 84694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:55:40,470][00368] Avg episode reward: [(0, '23.594')] [2023-02-24 06:55:45,462][00368] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 2678.2). Total num frames: 4055040. Throughput: 0: 801.7. Samples: 87808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:55:45,467][00368] Avg episode reward: [(0, '23.850')] [2023-02-24 06:55:50,462][00368] Fps is (10 sec: 2048.6, 60 sec: 3072.2, 300 sec: 2670.1). Total num frames: 4067328. Throughput: 0: 740.1. Samples: 91282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:55:50,465][00368] Avg episode reward: [(0, '23.429')] [2023-02-24 06:55:52,032][21378] Updated weights for policy 0, policy_version 995 (0.0031) [2023-02-24 06:55:55,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2721.0). Total num frames: 4087808. Throughput: 0: 731.4. Samples: 93996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:55:55,465][00368] Avg episode reward: [(0, '24.679')] [2023-02-24 06:56:00,463][00368] Fps is (10 sec: 4095.9, 60 sec: 3208.6, 300 sec: 2768.4). Total num frames: 4108288. Throughput: 0: 775.2. Samples: 100222. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:56:00,468][00368] Avg episode reward: [(0, '23.328')] [2023-02-24 06:56:03,210][21378] Updated weights for policy 0, policy_version 1005 (0.0014) [2023-02-24 06:56:05,463][00368] Fps is (10 sec: 3276.6, 60 sec: 3072.0, 300 sec: 2758.0). Total num frames: 4120576. Throughput: 0: 770.8. Samples: 104208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:56:05,466][00368] Avg episode reward: [(0, '24.140')] [2023-02-24 06:56:10,466][00368] Fps is (10 sec: 2456.7, 60 sec: 3003.5, 300 sec: 2748.3). Total num frames: 4132864. Throughput: 0: 758.7. Samples: 106284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:56:10,469][00368] Avg episode reward: [(0, '24.991')] [2023-02-24 06:56:10,583][21362] Saving new best policy, reward=24.991! [2023-02-24 06:56:15,462][00368] Fps is (10 sec: 3277.0, 60 sec: 3072.0, 300 sec: 2790.5). Total num frames: 4153344. Throughput: 0: 732.0. Samples: 111312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:56:15,472][00368] Avg episode reward: [(0, '26.247')] [2023-02-24 06:56:15,488][21362] Saving new best policy, reward=26.247! [2023-02-24 06:56:16,067][21378] Updated weights for policy 0, policy_version 1015 (0.0021) [2023-02-24 06:56:20,462][00368] Fps is (10 sec: 4097.6, 60 sec: 3208.5, 300 sec: 2830.0). Total num frames: 4173824. Throughput: 0: 766.5. Samples: 117544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:56:20,464][00368] Avg episode reward: [(0, '25.548')] [2023-02-24 06:56:25,463][00368] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2819.1). Total num frames: 4186112. Throughput: 0: 784.7. Samples: 120002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:56:25,465][00368] Avg episode reward: [(0, '23.646')] [2023-02-24 06:56:27,780][21378] Updated weights for policy 0, policy_version 1025 (0.0015) [2023-02-24 06:56:30,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2832.1). 
Total num frames: 4202496. Throughput: 0: 808.8. Samples: 124204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 06:56:30,467][00368] Avg episode reward: [(0, '24.173')] [2023-02-24 06:56:35,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2844.5). Total num frames: 4218880. Throughput: 0: 827.7. Samples: 128530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:56:35,464][00368] Avg episode reward: [(0, '23.096')] [2023-02-24 06:56:39,880][21378] Updated weights for policy 0, policy_version 1035 (0.0018) [2023-02-24 06:56:40,463][00368] Fps is (10 sec: 3686.1, 60 sec: 3208.6, 300 sec: 2878.3). Total num frames: 4239360. Throughput: 0: 837.2. Samples: 131670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:56:40,471][00368] Avg episode reward: [(0, '21.875')] [2023-02-24 06:56:45,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2888.8). Total num frames: 4255744. Throughput: 0: 837.8. Samples: 137922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:56:45,470][00368] Avg episode reward: [(0, '22.434')] [2023-02-24 06:56:50,467][00368] Fps is (10 sec: 3275.6, 60 sec: 3413.1, 300 sec: 2898.7). Total num frames: 4272128. Throughput: 0: 841.3. Samples: 142070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:56:50,478][00368] Avg episode reward: [(0, '23.431')] [2023-02-24 06:56:53,204][21378] Updated weights for policy 0, policy_version 1045 (0.0015) [2023-02-24 06:56:55,463][00368] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 2887.7). Total num frames: 4284416. Throughput: 0: 840.8. Samples: 144118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:56:55,472][00368] Avg episode reward: [(0, '23.760')] [2023-02-24 06:57:00,462][00368] Fps is (10 sec: 3278.3, 60 sec: 3276.8, 300 sec: 2917.2). Total num frames: 4304896. Throughput: 0: 840.8. Samples: 149146. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:57:00,465][00368] Avg episode reward: [(0, '25.021')] [2023-02-24 06:57:04,041][21378] Updated weights for policy 0, policy_version 1055 (0.0016) [2023-02-24 06:57:05,462][00368] Fps is (10 sec: 4096.4, 60 sec: 3413.4, 300 sec: 2945.3). Total num frames: 4325376. Throughput: 0: 842.2. Samples: 155442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:57:05,472][00368] Avg episode reward: [(0, '24.714')] [2023-02-24 06:57:10,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 2953.0). Total num frames: 4341760. Throughput: 0: 841.2. Samples: 157858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:57:10,465][00368] Avg episode reward: [(0, '23.299')] [2023-02-24 06:57:15,464][00368] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 2941.7). Total num frames: 4354048. Throughput: 0: 840.8. Samples: 162040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:57:15,469][00368] Avg episode reward: [(0, '23.986')] [2023-02-24 06:57:15,489][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001063_4354048.pth... [2023-02-24 06:57:15,720][21362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth [2023-02-24 06:57:17,776][21378] Updated weights for policy 0, policy_version 1065 (0.0021) [2023-02-24 06:57:20,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2949.2). Total num frames: 4370432. Throughput: 0: 848.6. Samples: 166718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:57:20,465][00368] Avg episode reward: [(0, '23.000')] [2023-02-24 06:57:25,462][00368] Fps is (10 sec: 3686.8, 60 sec: 3413.3, 300 sec: 2974.1). Total num frames: 4390912. Throughput: 0: 848.2. Samples: 169838. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:57:25,472][00368] Avg episode reward: [(0, '21.967')] [2023-02-24 06:57:27,912][21378] Updated weights for policy 0, policy_version 1075 (0.0025) [2023-02-24 06:57:30,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2980.5). Total num frames: 4407296. Throughput: 0: 841.7. Samples: 175800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:57:30,466][00368] Avg episode reward: [(0, '21.823')] [2023-02-24 06:57:35,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2986.7). Total num frames: 4423680. Throughput: 0: 840.4. Samples: 179884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:57:35,466][00368] Avg episode reward: [(0, '24.303')] [2023-02-24 06:57:40,463][00368] Fps is (10 sec: 2867.0, 60 sec: 3276.8, 300 sec: 2975.9). Total num frames: 4435968. Throughput: 0: 840.6. Samples: 181946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:57:40,467][00368] Avg episode reward: [(0, '24.334')] [2023-02-24 06:57:42,190][21378] Updated weights for policy 0, policy_version 1085 (0.0020) [2023-02-24 06:57:45,463][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2998.3). Total num frames: 4456448. Throughput: 0: 844.8. Samples: 187162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:57:45,470][00368] Avg episode reward: [(0, '23.596')] [2023-02-24 06:57:50,462][00368] Fps is (10 sec: 4096.2, 60 sec: 3413.6, 300 sec: 3019.8). Total num frames: 4476928. Throughput: 0: 842.0. Samples: 193334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:57:50,464][00368] Avg episode reward: [(0, '24.089')] [2023-02-24 06:57:52,381][21378] Updated weights for policy 0, policy_version 1095 (0.0023) [2023-02-24 06:57:55,462][00368] Fps is (10 sec: 3686.5, 60 sec: 3481.7, 300 sec: 3024.8). Total num frames: 4493312. Throughput: 0: 841.5. Samples: 195726. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:57:55,476][00368] Avg episode reward: [(0, '24.706')] [2023-02-24 06:58:00,465][00368] Fps is (10 sec: 2866.3, 60 sec: 3344.9, 300 sec: 3014.0). Total num frames: 4505600. Throughput: 0: 841.5. Samples: 199910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:58:00,469][00368] Avg episode reward: [(0, '26.212')] [2023-02-24 06:58:05,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3018.9). Total num frames: 4521984. Throughput: 0: 836.2. Samples: 204348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:58:05,466][00368] Avg episode reward: [(0, '25.528')] [2023-02-24 06:58:06,336][21378] Updated weights for policy 0, policy_version 1105 (0.0037) [2023-02-24 06:58:10,462][00368] Fps is (10 sec: 3687.5, 60 sec: 3345.1, 300 sec: 3038.5). Total num frames: 4542464. Throughput: 0: 834.8. Samples: 207404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:58:10,465][00368] Avg episode reward: [(0, '24.300')] [2023-02-24 06:58:15,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3042.8). Total num frames: 4558848. Throughput: 0: 829.7. Samples: 213136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:58:15,468][00368] Avg episode reward: [(0, '25.465')] [2023-02-24 06:58:17,973][21378] Updated weights for policy 0, policy_version 1115 (0.0013) [2023-02-24 06:58:20,463][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3032.5). Total num frames: 4571136. Throughput: 0: 832.1. Samples: 217328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:58:20,472][00368] Avg episode reward: [(0, '26.367')] [2023-02-24 06:58:20,482][21362] Saving new best policy, reward=26.367! [2023-02-24 06:58:25,465][00368] Fps is (10 sec: 2456.9, 60 sec: 3208.4, 300 sec: 3022.6). Total num frames: 4583424. Throughput: 0: 829.2. Samples: 219260. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:58:25,475][00368] Avg episode reward: [(0, '26.856')] [2023-02-24 06:58:25,520][21362] Saving new best policy, reward=26.856! [2023-02-24 06:58:30,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 4603904. Throughput: 0: 824.8. Samples: 224276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:58:30,465][00368] Avg episode reward: [(0, '26.939')] [2023-02-24 06:58:30,469][21362] Saving new best policy, reward=26.939! [2023-02-24 06:58:30,783][21378] Updated weights for policy 0, policy_version 1125 (0.0017) [2023-02-24 06:58:35,463][00368] Fps is (10 sec: 4097.2, 60 sec: 3345.1, 300 sec: 3110.2). Total num frames: 4624384. Throughput: 0: 825.5. Samples: 230480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:58:35,465][00368] Avg episode reward: [(0, '26.361')] [2023-02-24 06:58:40,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3165.8). Total num frames: 4640768. Throughput: 0: 828.8. Samples: 233020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:58:40,464][00368] Avg episode reward: [(0, '26.467')] [2023-02-24 06:58:42,617][21378] Updated weights for policy 0, policy_version 1135 (0.0030) [2023-02-24 06:58:45,471][00368] Fps is (10 sec: 2864.8, 60 sec: 3276.3, 300 sec: 3207.3). Total num frames: 4653056. Throughput: 0: 828.3. Samples: 237188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:58:45,482][00368] Avg episode reward: [(0, '27.565')] [2023-02-24 06:58:45,497][21362] Saving new best policy, reward=27.565! [2023-02-24 06:58:50,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 4669440. Throughput: 0: 824.9. Samples: 241468. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:58:50,465][00368] Avg episode reward: [(0, '27.102')] [2023-02-24 06:58:55,114][21378] Updated weights for policy 0, policy_version 1145 (0.0016) [2023-02-24 06:58:55,462][00368] Fps is (10 sec: 3689.5, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 4689920. Throughput: 0: 824.6. Samples: 244512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:58:55,468][00368] Avg episode reward: [(0, '25.519')] [2023-02-24 06:59:00,463][00368] Fps is (10 sec: 3686.0, 60 sec: 3345.2, 300 sec: 3290.7). Total num frames: 4706304. Throughput: 0: 829.6. Samples: 250468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:59:00,466][00368] Avg episode reward: [(0, '24.368')] [2023-02-24 06:59:05,463][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3304.6). Total num frames: 4718592. Throughput: 0: 809.7. Samples: 253766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:59:05,469][00368] Avg episode reward: [(0, '24.432')] [2023-02-24 06:59:10,463][00368] Fps is (10 sec: 2048.1, 60 sec: 3072.0, 300 sec: 3276.8). Total num frames: 4726784. Throughput: 0: 802.2. Samples: 255358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:59:10,466][00368] Avg episode reward: [(0, '24.002')] [2023-02-24 06:59:10,795][21378] Updated weights for policy 0, policy_version 1155 (0.0019) [2023-02-24 06:59:15,463][00368] Fps is (10 sec: 2048.0, 60 sec: 3003.7, 300 sec: 3235.1). Total num frames: 4739072. Throughput: 0: 758.1. Samples: 258392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:59:15,469][00368] Avg episode reward: [(0, '23.457')] [2023-02-24 06:59:15,485][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001157_4739072.pth... 
[2023-02-24 06:59:15,728][21362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000969_3969024.pth [2023-02-24 06:59:20,462][00368] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 3235.2). Total num frames: 4755456. Throughput: 0: 716.4. Samples: 262718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 06:59:20,472][00368] Avg episode reward: [(0, '22.234')] [2023-02-24 06:59:24,124][21378] Updated weights for policy 0, policy_version 1165 (0.0019) [2023-02-24 06:59:25,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3208.7, 300 sec: 3262.9). Total num frames: 4775936. Throughput: 0: 730.5. Samples: 265892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:59:25,465][00368] Avg episode reward: [(0, '21.647')] [2023-02-24 06:59:30,463][00368] Fps is (10 sec: 3686.0, 60 sec: 3140.2, 300 sec: 3262.9). Total num frames: 4792320. Throughput: 0: 770.9. Samples: 271872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:59:30,468][00368] Avg episode reward: [(0, '22.571')] [2023-02-24 06:59:35,463][00368] Fps is (10 sec: 2866.9, 60 sec: 3003.7, 300 sec: 3235.1). Total num frames: 4804608. Throughput: 0: 762.2. Samples: 275766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:59:35,468][00368] Avg episode reward: [(0, '23.035')] [2023-02-24 06:59:37,305][21378] Updated weights for policy 0, policy_version 1175 (0.0023) [2023-02-24 06:59:40,463][00368] Fps is (10 sec: 2867.4, 60 sec: 3003.7, 300 sec: 3221.3). Total num frames: 4820992. Throughput: 0: 740.0. Samples: 277814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:59:40,476][00368] Avg episode reward: [(0, '22.937')] [2023-02-24 06:59:45,463][00368] Fps is (10 sec: 3686.5, 60 sec: 3140.7, 300 sec: 3249.1). Total num frames: 4841472. Throughput: 0: 728.5. Samples: 283252. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 06:59:45,466][00368] Avg episode reward: [(0, '23.908')] [2023-02-24 06:59:48,086][21378] Updated weights for policy 0, policy_version 1185 (0.0013) [2023-02-24 06:59:50,462][00368] Fps is (10 sec: 4096.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 4861952. Throughput: 0: 793.1. Samples: 289456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 06:59:50,468][00368] Avg episode reward: [(0, '23.366')] [2023-02-24 06:59:55,464][00368] Fps is (10 sec: 3276.6, 60 sec: 3071.9, 300 sec: 3249.0). Total num frames: 4874240. Throughput: 0: 808.0. Samples: 291718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 06:59:55,472][00368] Avg episode reward: [(0, '24.419')] [2023-02-24 07:00:00,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.1, 300 sec: 3235.1). Total num frames: 4890624. Throughput: 0: 832.8. Samples: 295868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:00:00,469][00368] Avg episode reward: [(0, '24.895')] [2023-02-24 07:00:01,708][21378] Updated weights for policy 0, policy_version 1195 (0.0014) [2023-02-24 07:00:05,462][00368] Fps is (10 sec: 3277.3, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 4907008. Throughput: 0: 842.0. Samples: 300606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:00:05,469][00368] Avg episode reward: [(0, '24.382')] [2023-02-24 07:00:10,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 4927488. Throughput: 0: 841.4. Samples: 303754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:00:10,469][00368] Avg episode reward: [(0, '25.833')] [2023-02-24 07:00:12,085][21378] Updated weights for policy 0, policy_version 1205 (0.0045) [2023-02-24 07:00:15,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 4943872. Throughput: 0: 843.5. Samples: 309828. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:00:15,467][00368] Avg episode reward: [(0, '25.557')] [2023-02-24 07:00:20,463][00368] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3249.0). Total num frames: 4960256. Throughput: 0: 848.5. Samples: 313946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:00:20,466][00368] Avg episode reward: [(0, '25.238')] [2023-02-24 07:00:25,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 4972544. Throughput: 0: 845.1. Samples: 315842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:00:25,466][00368] Avg episode reward: [(0, '26.139')] [2023-02-24 07:00:26,521][21378] Updated weights for policy 0, policy_version 1215 (0.0016) [2023-02-24 07:00:30,462][00368] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 4993024. Throughput: 0: 837.7. Samples: 320946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:00:30,469][00368] Avg episode reward: [(0, '26.041')] [2023-02-24 07:00:35,462][00368] Fps is (10 sec: 4096.0, 60 sec: 3481.7, 300 sec: 3276.8). Total num frames: 5013504. Throughput: 0: 838.9. Samples: 327206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:00:35,465][00368] Avg episode reward: [(0, '27.038')] [2023-02-24 07:00:36,787][21378] Updated weights for policy 0, policy_version 1225 (0.0022) [2023-02-24 07:00:40,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3290.7). Total num frames: 5025792. Throughput: 0: 840.0. Samples: 329518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:00:40,475][00368] Avg episode reward: [(0, '27.777')] [2023-02-24 07:00:40,479][21362] Saving new best policy, reward=27.777! [2023-02-24 07:00:45,462][00368] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 5038080. Throughput: 0: 836.6. Samples: 333514. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:00:45,473][00368] Avg episode reward: [(0, '26.869')] [2023-02-24 07:00:50,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 5054464. Throughput: 0: 826.5. Samples: 337800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:00:50,466][00368] Avg episode reward: [(0, '26.722')] [2023-02-24 07:00:50,899][21378] Updated weights for policy 0, policy_version 1235 (0.0025) [2023-02-24 07:00:55,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 5074944. Throughput: 0: 822.8. Samples: 340782. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-24 07:00:55,466][00368] Avg episode reward: [(0, '26.575')] [2023-02-24 07:01:00,468][00368] Fps is (10 sec: 3684.1, 60 sec: 3344.7, 300 sec: 3290.6). Total num frames: 5091328. Throughput: 0: 822.9. Samples: 346864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:01:00,473][00368] Avg episode reward: [(0, '25.390')] [2023-02-24 07:01:02,007][21378] Updated weights for policy 0, policy_version 1245 (0.0027) [2023-02-24 07:01:05,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3304.6). Total num frames: 5107712. Throughput: 0: 820.0. Samples: 350844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:01:05,471][00368] Avg episode reward: [(0, '24.685')] [2023-02-24 07:01:10,463][00368] Fps is (10 sec: 2868.7, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 5120000. Throughput: 0: 820.2. Samples: 352752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:01:10,469][00368] Avg episode reward: [(0, '24.209')] [2023-02-24 07:01:15,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3262.9). Total num frames: 5136384. Throughput: 0: 809.7. Samples: 357384. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:01:15,464][00368] Avg episode reward: [(0, '23.866')] [2023-02-24 07:01:15,477][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001254_5136384.pth... [2023-02-24 07:01:15,681][21362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001063_4354048.pth [2023-02-24 07:01:15,908][21378] Updated weights for policy 0, policy_version 1255 (0.0018) [2023-02-24 07:01:20,462][00368] Fps is (10 sec: 3686.8, 60 sec: 3276.8, 300 sec: 3290.7). Total num frames: 5156864. Throughput: 0: 803.0. Samples: 363342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:01:20,465][00368] Avg episode reward: [(0, '23.297')] [2023-02-24 07:01:25,469][00368] Fps is (10 sec: 3684.1, 60 sec: 3344.7, 300 sec: 3290.6). Total num frames: 5173248. Throughput: 0: 808.9. Samples: 365924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:01:25,475][00368] Avg episode reward: [(0, '23.443')] [2023-02-24 07:01:28,366][21378] Updated weights for policy 0, policy_version 1265 (0.0014) [2023-02-24 07:01:30,463][00368] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3276.8). Total num frames: 5185536. Throughput: 0: 810.7. Samples: 369996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:01:30,469][00368] Avg episode reward: [(0, '23.008')] [2023-02-24 07:01:35,462][00368] Fps is (10 sec: 2459.2, 60 sec: 3072.0, 300 sec: 3249.0). Total num frames: 5197824. Throughput: 0: 813.9. Samples: 374424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:01:35,472][00368] Avg episode reward: [(0, '24.486')] [2023-02-24 07:01:40,403][21378] Updated weights for policy 0, policy_version 1275 (0.0023) [2023-02-24 07:01:40,462][00368] Fps is (10 sec: 3686.6, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 5222400. Throughput: 0: 816.8. Samples: 377540. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:01:40,464][00368] Avg episode reward: [(0, '23.154')]
[2023-02-24 07:01:45,470][00368] Fps is (10 sec: 4502.4, 60 sec: 3412.9, 300 sec: 3290.7). Total num frames: 5242880. Throughput: 0: 815.9. Samples: 383582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:01:45,473][00368] Avg episode reward: [(0, '24.776')]
[2023-02-24 07:01:50,463][00368] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 3290.7). Total num frames: 5255168. Throughput: 0: 825.5. Samples: 387992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:01:50,470][00368] Avg episode reward: [(0, '25.119')]
[2023-02-24 07:01:52,890][21378] Updated weights for policy 0, policy_version 1285 (0.0015)
[2023-02-24 07:01:55,465][00368] Fps is (10 sec: 2458.7, 60 sec: 3208.4, 300 sec: 3262.9). Total num frames: 5267456. Throughput: 0: 825.4. Samples: 389896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:01:55,468][00368] Avg episode reward: [(0, '24.856')]
[2023-02-24 07:02:00,462][00368] Fps is (10 sec: 3277.0, 60 sec: 3277.1, 300 sec: 3262.9). Total num frames: 5287936. Throughput: 0: 828.7. Samples: 394674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:02:00,465][00368] Avg episode reward: [(0, '24.434')]
[2023-02-24 07:02:04,602][21378] Updated weights for policy 0, policy_version 1295 (0.0034)
[2023-02-24 07:02:05,462][00368] Fps is (10 sec: 3687.3, 60 sec: 3276.8, 300 sec: 3262.9). Total num frames: 5304320. Throughput: 0: 827.6. Samples: 400586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:02:05,468][00368] Avg episode reward: [(0, '24.831')]
[2023-02-24 07:02:10,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 5320704. Throughput: 0: 834.7. Samples: 403482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:02:10,472][00368] Avg episode reward: [(0, '26.219')]
[2023-02-24 07:02:15,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 5337088. Throughput: 0: 833.5. Samples: 407504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 07:02:15,469][00368] Avg episode reward: [(0, '26.680')]
[2023-02-24 07:02:18,431][21378] Updated weights for policy 0, policy_version 1305 (0.0038)
[2023-02-24 07:02:20,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 5349376. Throughput: 0: 823.7. Samples: 411492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:02:20,466][00368] Avg episode reward: [(0, '25.961')]
[2023-02-24 07:02:25,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3277.1, 300 sec: 3262.9). Total num frames: 5369856. Throughput: 0: 819.5. Samples: 414416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:02:25,469][00368] Avg episode reward: [(0, '26.150')]
[2023-02-24 07:02:30,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 5382144. Throughput: 0: 802.2. Samples: 419676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:02:30,469][00368] Avg episode reward: [(0, '26.144')]
[2023-02-24 07:02:30,662][21378] Updated weights for policy 0, policy_version 1315 (0.0027)
[2023-02-24 07:02:35,462][00368] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 5394432. Throughput: 0: 777.7. Samples: 422988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:02:35,471][00368] Avg episode reward: [(0, '26.946')]
[2023-02-24 07:02:40,462][00368] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3221.3). Total num frames: 5406720. Throughput: 0: 769.4. Samples: 424516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:02:40,471][00368] Avg episode reward: [(0, '26.817')]
[2023-02-24 07:02:45,463][00368] Fps is (10 sec: 2048.0, 60 sec: 2867.5, 300 sec: 3179.6). Total num frames: 5414912. Throughput: 0: 732.9. Samples: 427656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:02:45,472][00368] Avg episode reward: [(0, '26.696')]
[2023-02-24 07:02:48,175][21378] Updated weights for policy 0, policy_version 1325 (0.0023)
[2023-02-24 07:02:50,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 3193.5). Total num frames: 5435392. Throughput: 0: 715.6. Samples: 432790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:02:50,470][00368] Avg episode reward: [(0, '25.642')]
[2023-02-24 07:02:55,462][00368] Fps is (10 sec: 4096.1, 60 sec: 3140.4, 300 sec: 3221.3). Total num frames: 5455872. Throughput: 0: 719.2. Samples: 435846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:02:55,470][00368] Avg episode reward: [(0, '26.335')]
[2023-02-24 07:02:58,518][21378] Updated weights for policy 0, policy_version 1335 (0.0013)
[2023-02-24 07:03:00,464][00368] Fps is (10 sec: 3685.7, 60 sec: 3071.9, 300 sec: 3221.2). Total num frames: 5472256. Throughput: 0: 749.7. Samples: 441242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:03:00,467][00368] Avg episode reward: [(0, '25.921')]
[2023-02-24 07:03:05,466][00368] Fps is (10 sec: 2866.0, 60 sec: 3003.5, 300 sec: 3193.4). Total num frames: 5484544. Throughput: 0: 748.8. Samples: 445190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:03:05,469][00368] Avg episode reward: [(0, '25.644')]
[2023-02-24 07:03:10,462][00368] Fps is (10 sec: 2458.1, 60 sec: 2935.5, 300 sec: 3179.6). Total num frames: 5496832. Throughput: 0: 728.0. Samples: 447174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:03:10,472][00368] Avg episode reward: [(0, '26.834')]
[2023-02-24 07:03:12,869][21378] Updated weights for policy 0, policy_version 1345 (0.0022)
[2023-02-24 07:03:15,462][00368] Fps is (10 sec: 3278.1, 60 sec: 3003.7, 300 sec: 3207.4). Total num frames: 5517312. Throughput: 0: 731.5. Samples: 452592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:03:15,466][00368] Avg episode reward: [(0, '25.655')]
[2023-02-24 07:03:15,474][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001347_5517312.pth...
[2023-02-24 07:03:15,642][21362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001157_4739072.pth
[2023-02-24 07:03:20,463][00368] Fps is (10 sec: 4095.9, 60 sec: 3140.3, 300 sec: 3235.2). Total num frames: 5537792. Throughput: 0: 793.1. Samples: 458676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:03:20,466][00368] Avg episode reward: [(0, '24.644')]
[2023-02-24 07:03:24,174][21378] Updated weights for policy 0, policy_version 1355 (0.0016)
[2023-02-24 07:03:25,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3207.4). Total num frames: 5550080. Throughput: 0: 804.6. Samples: 460724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:03:25,466][00368] Avg episode reward: [(0, '25.596')]
[2023-02-24 07:03:30,462][00368] Fps is (10 sec: 2867.3, 60 sec: 3072.0, 300 sec: 3193.5). Total num frames: 5566464. Throughput: 0: 826.3. Samples: 464840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:03:30,474][00368] Avg episode reward: [(0, '25.842')]
[2023-02-24 07:03:35,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 5582848. Throughput: 0: 823.5. Samples: 469846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:03:35,467][00368] Avg episode reward: [(0, '25.866')]
[2023-02-24 07:03:36,851][21378] Updated weights for policy 0, policy_version 1365 (0.0024)
[2023-02-24 07:03:40,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.4). Total num frames: 5603328. Throughput: 0: 824.7. Samples: 472958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:03:40,466][00368] Avg episode reward: [(0, '26.473')]
[2023-02-24 07:03:45,465][00368] Fps is (10 sec: 3685.6, 60 sec: 3413.2, 300 sec: 3221.2). Total num frames: 5619712. Throughput: 0: 831.0. Samples: 478636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:03:45,469][00368] Avg episode reward: [(0, '24.707')]
[2023-02-24 07:03:48,925][21378] Updated weights for policy 0, policy_version 1375 (0.0023)
[2023-02-24 07:03:50,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 5632000. Throughput: 0: 832.0. Samples: 482626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:03:50,466][00368] Avg episode reward: [(0, '24.194')]
[2023-02-24 07:03:55,462][00368] Fps is (10 sec: 2867.8, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 5648384. Throughput: 0: 832.4. Samples: 484634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:03:55,468][00368] Avg episode reward: [(0, '23.083')]
[2023-02-24 07:04:00,464][00368] Fps is (10 sec: 3685.9, 60 sec: 3276.8, 300 sec: 3221.2). Total num frames: 5668864. Throughput: 0: 831.5. Samples: 490012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:04:00,467][00368] Avg episode reward: [(0, '21.353')]
[2023-02-24 07:04:01,240][21378] Updated weights for policy 0, policy_version 1385 (0.0024)
[2023-02-24 07:04:05,462][00368] Fps is (10 sec: 4096.0, 60 sec: 3413.6, 300 sec: 3262.9). Total num frames: 5689344. Throughput: 0: 830.3. Samples: 496038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:04:05,470][00368] Avg episode reward: [(0, '20.899')]
[2023-02-24 07:04:10,463][00368] Fps is (10 sec: 3277.0, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 5701632. Throughput: 0: 834.6. Samples: 498280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:04:10,475][00368] Avg episode reward: [(0, '20.699')]
[2023-02-24 07:04:14,344][21378] Updated weights for policy 0, policy_version 1395 (0.0016)
[2023-02-24 07:04:15,470][00368] Fps is (10 sec: 2455.6, 60 sec: 3276.4, 300 sec: 3248.9). Total num frames: 5713920. Throughput: 0: 833.4. Samples: 502350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:04:15,473][00368] Avg episode reward: [(0, '21.165')]
[2023-02-24 07:04:20,462][00368] Fps is (10 sec: 2867.5, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 5730304. Throughput: 0: 828.8. Samples: 507144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:04:20,466][00368] Avg episode reward: [(0, '22.606')]
[2023-02-24 07:04:25,462][00368] Fps is (10 sec: 3689.3, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 5750784. Throughput: 0: 825.9. Samples: 510124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:04:25,468][00368] Avg episode reward: [(0, '24.095')]
[2023-02-24 07:04:25,766][21378] Updated weights for policy 0, policy_version 1405 (0.0023)
[2023-02-24 07:04:30,471][00368] Fps is (10 sec: 4092.6, 60 sec: 3412.9, 300 sec: 3276.7). Total num frames: 5771264. Throughput: 0: 827.9. Samples: 515896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:04:30,476][00368] Avg episode reward: [(0, '25.480')]
[2023-02-24 07:04:35,467][00368] Fps is (10 sec: 3275.4, 60 sec: 3344.8, 300 sec: 3262.9). Total num frames: 5783552. Throughput: 0: 831.3. Samples: 520040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:04:35,469][00368] Avg episode reward: [(0, '25.272')]
[2023-02-24 07:04:39,442][21378] Updated weights for policy 0, policy_version 1415 (0.0013)
[2023-02-24 07:04:40,469][00368] Fps is (10 sec: 2458.0, 60 sec: 3208.2, 300 sec: 3235.1). Total num frames: 5795840. Throughput: 0: 833.8. Samples: 522160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:04:40,472][00368] Avg episode reward: [(0, '26.075')]
[2023-02-24 07:04:45,462][00368] Fps is (10 sec: 3278.2, 60 sec: 3276.9, 300 sec: 3235.1). Total num frames: 5816320. Throughput: 0: 832.9. Samples: 527492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:04:45,466][00368] Avg episode reward: [(0, '25.785')]
[2023-02-24 07:04:50,000][21378] Updated weights for policy 0, policy_version 1425 (0.0017)
[2023-02-24 07:04:50,462][00368] Fps is (10 sec: 4098.7, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 5836800. Throughput: 0: 828.2. Samples: 533308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:04:50,464][00368] Avg episode reward: [(0, '25.845')]
[2023-02-24 07:04:55,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 5853184. Throughput: 0: 831.2. Samples: 535684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:04:55,469][00368] Avg episode reward: [(0, '26.947')]
[2023-02-24 07:05:00,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.9, 300 sec: 3249.0). Total num frames: 5865472. Throughput: 0: 833.1. Samples: 539832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:05:00,474][00368] Avg episode reward: [(0, '26.711')]
[2023-02-24 07:05:04,158][21378] Updated weights for policy 0, policy_version 1435 (0.0032)
[2023-02-24 07:05:05,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 5881856. Throughput: 0: 829.9. Samples: 544488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:05:05,471][00368] Avg episode reward: [(0, '25.147')]
[2023-02-24 07:05:10,462][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 5902336. Throughput: 0: 832.3. Samples: 547576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:05:10,465][00368] Avg episode reward: [(0, '26.568')]
[2023-02-24 07:05:14,296][21378] Updated weights for policy 0, policy_version 1445 (0.0024)
[2023-02-24 07:05:15,463][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.8, 300 sec: 3249.0). Total num frames: 5918720. Throughput: 0: 837.6. Samples: 553580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:05:15,469][00368] Avg episode reward: [(0, '26.604')]
[2023-02-24 07:05:15,487][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001445_5918720.pth...
[2023-02-24 07:05:15,752][21362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001254_5136384.pth
[2023-02-24 07:05:20,462][00368] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3262.9). Total num frames: 5935104. Throughput: 0: 834.5. Samples: 557590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:05:20,464][00368] Avg episode reward: [(0, '25.841')]
[2023-02-24 07:05:25,462][00368] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 5947392. Throughput: 0: 832.8. Samples: 559632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:05:25,466][00368] Avg episode reward: [(0, '24.598')]
[2023-02-24 07:05:28,439][21378] Updated weights for policy 0, policy_version 1455 (0.0014)
[2023-02-24 07:05:30,465][00368] Fps is (10 sec: 3275.8, 60 sec: 3277.1, 300 sec: 3235.1). Total num frames: 5967872. Throughput: 0: 827.1. Samples: 564716. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:05:30,469][00368] Avg episode reward: [(0, '22.835')]
[2023-02-24 07:05:35,462][00368] Fps is (10 sec: 4096.0, 60 sec: 3413.6, 300 sec: 3262.9). Total num frames: 5988352. Throughput: 0: 834.3. Samples: 570852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:05:35,470][00368] Avg episode reward: [(0, '24.186')]
[2023-02-24 07:05:39,196][21378] Updated weights for policy 0, policy_version 1465 (0.0020)
[2023-02-24 07:05:40,467][00368] Fps is (10 sec: 3276.1, 60 sec: 3413.4, 300 sec: 3262.9). Total num frames: 6000640. Throughput: 0: 836.0. Samples: 573306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:05:40,470][00368] Avg episode reward: [(0, '23.127')]
[2023-02-24 07:05:40,729][21362] Stopping Batcher_0...
[2023-02-24 07:05:40,730][21362] Loop batcher_evt_loop terminating...
[2023-02-24 07:05:40,739][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth...
[2023-02-24 07:05:40,730][00368] Component Batcher_0 stopped!
[2023-02-24 07:05:40,813][21378] Weights refcount: 2 0
[2023-02-24 07:05:40,833][21378] Stopping InferenceWorker_p0-w0...
[2023-02-24 07:05:40,835][21378] Loop inference_proc0-0_evt_loop terminating...
[2023-02-24 07:05:40,838][00368] Component InferenceWorker_p0-w0 stopped!
[2023-02-24 07:05:41,018][00368] Component RolloutWorker_w6 stopped!
[2023-02-24 07:05:41,017][21393] Stopping RolloutWorker_w6...
[2023-02-24 07:05:41,030][21393] Loop rollout_proc6_evt_loop terminating...
[2023-02-24 07:05:41,026][00368] Component RolloutWorker_w2 stopped!
[2023-02-24 07:05:41,026][21385] Stopping RolloutWorker_w2...
[2023-02-24 07:05:41,034][21385] Loop rollout_proc2_evt_loop terminating...
[2023-02-24 07:05:41,038][21379] Stopping RolloutWorker_w0...
[2023-02-24 07:05:41,041][21379] Loop rollout_proc0_evt_loop terminating...
[2023-02-24 07:05:41,040][00368] Component RolloutWorker_w0 stopped!
[2023-02-24 07:05:41,046][21405] Stopping RolloutWorker_w8...
[2023-02-24 07:05:41,047][21405] Loop rollout_proc8_evt_loop terminating...
[2023-02-24 07:05:41,048][21386] Stopping RolloutWorker_w4...
[2023-02-24 07:05:41,048][00368] Component RolloutWorker_w8 stopped!
[2023-02-24 07:05:41,049][21386] Loop rollout_proc4_evt_loop terminating...
[2023-02-24 07:05:41,050][00368] Component RolloutWorker_w4 stopped!
[2023-02-24 07:05:41,080][21362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001347_5517312.pth
[2023-02-24 07:05:41,097][00368] Component RolloutWorker_w5 stopped!
[2023-02-24 07:05:41,101][21395] Stopping RolloutWorker_w5...
[2023-02-24 07:05:41,102][21395] Loop rollout_proc5_evt_loop terminating...
[2023-02-24 07:05:41,116][00368] Component RolloutWorker_w3 stopped!
[2023-02-24 07:05:41,122][21381] Stopping RolloutWorker_w3...
[2023-02-24 07:05:41,122][21381] Loop rollout_proc3_evt_loop terminating...
[2023-02-24 07:05:41,132][00368] Component RolloutWorker_w7 stopped!
[2023-02-24 07:05:41,148][21380] Stopping RolloutWorker_w1...
[2023-02-24 07:05:41,147][00368] Component RolloutWorker_w1 stopped!
[2023-02-24 07:05:41,136][21397] Stopping RolloutWorker_w7...
[2023-02-24 07:05:41,167][21397] Loop rollout_proc7_evt_loop terminating...
[2023-02-24 07:05:41,168][21380] Loop rollout_proc1_evt_loop terminating...
[2023-02-24 07:05:41,175][00368] Component RolloutWorker_w9 stopped!
[2023-02-24 07:05:41,179][21403] Stopping RolloutWorker_w9...
[2023-02-24 07:05:41,165][21362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth...
[2023-02-24 07:05:41,188][21403] Loop rollout_proc9_evt_loop terminating...
[2023-02-24 07:05:41,509][00368] Component LearnerWorker_p0 stopped!
[2023-02-24 07:05:41,512][00368] Waiting for process learner_proc0 to stop...
[2023-02-24 07:05:41,518][21362] Stopping LearnerWorker_p0...
[2023-02-24 07:05:41,518][21362] Loop learner_proc0_evt_loop terminating...
[2023-02-24 07:05:45,489][00368] Waiting for process inference_proc0-0 to join...
[2023-02-24 07:05:45,499][00368] Waiting for process rollout_proc0 to join...
[2023-02-24 07:05:45,673][00368] Waiting for process rollout_proc1 to join...
[2023-02-24 07:05:45,674][00368] Waiting for process rollout_proc2 to join...
[2023-02-24 07:05:45,678][00368] Waiting for process rollout_proc3 to join...
[2023-02-24 07:05:45,680][00368] Waiting for process rollout_proc4 to join...
[2023-02-24 07:05:45,685][00368] Waiting for process rollout_proc5 to join...
[2023-02-24 07:05:45,686][00368] Waiting for process rollout_proc6 to join...
[2023-02-24 07:05:45,690][00368] Waiting for process rollout_proc7 to join...
[2023-02-24 07:05:45,692][00368] Waiting for process rollout_proc8 to join...
[2023-02-24 07:05:45,694][00368] Waiting for process rollout_proc9 to join...
[2023-02-24 07:05:45,696][00368] Batcher 0 profile tree view:
batching: 18.4374, releasing_batches: 0.0199
[2023-02-24 07:05:45,699][00368] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0027
  wait_policy_total: 415.6736
update_model: 4.1588
  weight_update: 0.0020
one_step: 0.0040
  handle_policy_step: 276.6575
    deserialize: 8.6679, stack: 1.5923, obs_to_device_normalize: 58.4883, forward: 135.2476, send_messages: 17.0332
    prepare_outputs: 42.3764
      to_cpu: 26.0368
[2023-02-24 07:05:45,703][00368] Learner 0 profile tree view:
misc: 0.0037, prepare_batch: 13.3509
train: 49.7766
  epoch_init: 0.0034, minibatch_init: 0.0094, losses_postprocess: 0.3332, kl_divergence: 0.4402, after_optimizer: 2.6157
  calculate_losses: 16.5324
    losses_init: 0.0023, forward_head: 1.2665, bptt_initial: 10.3662, tail: 0.7936, advantages_returns: 0.2664, losses: 2.0719
    bptt: 1.5532
      bptt_forward_core: 1.4833
  update: 29.3588
    clip: 0.9263
[2023-02-24 07:05:45,705][00368] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2325, enqueue_policy_requests: 116.2209, env_step: 512.7642, overhead: 14.2039, complete_rollouts: 4.2761
save_policy_outputs: 12.0052
  split_output_tensors: 5.7731
[2023-02-24 07:05:45,707][00368] RolloutWorker_w9 profile tree view:
wait_for_trajectories: 0.2008, enqueue_policy_requests: 120.5764, env_step: 511.1318, overhead: 14.3627, complete_rollouts: 4.0199
save_policy_outputs: 12.0691
  split_output_tensors: 5.5596
[2023-02-24 07:05:45,709][00368] Loop Runner_EvtLoop terminating...
[2023-02-24 07:05:45,711][00368] Runner profile tree view:
main_loop: 747.8518
[2023-02-24 07:05:45,712][00368] Collected {0: 6004736}, FPS: 3072.6
[2023-02-24 07:10:35,507][00368] Environment doom_basic already registered, overwriting...
[2023-02-24 07:10:35,510][00368] Environment doom_two_colors_easy already registered, overwriting...
[2023-02-24 07:10:35,513][00368] Environment doom_two_colors_hard already registered, overwriting...
[2023-02-24 07:10:35,517][00368] Environment doom_dm already registered, overwriting...
[2023-02-24 07:10:35,519][00368] Environment doom_dwango5 already registered, overwriting...
[2023-02-24 07:10:35,522][00368] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-02-24 07:10:35,523][00368] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-02-24 07:10:35,524][00368] Environment doom_my_way_home already registered, overwriting...
[2023-02-24 07:10:35,525][00368] Environment doom_deadly_corridor already registered, overwriting...
[2023-02-24 07:10:35,526][00368] Environment doom_defend_the_center already registered, overwriting...
[2023-02-24 07:10:35,528][00368] Environment doom_defend_the_line already registered, overwriting...
[2023-02-24 07:10:35,530][00368] Environment doom_health_gathering already registered, overwriting...
[2023-02-24 07:10:35,532][00368] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-02-24 07:10:35,535][00368] Environment doom_battle already registered, overwriting...
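The periodic status lines in this log follow a fixed format ("Fps is (10 sec: …, 60 sec: …, 300 sec: …). Total num frames: …" followed by "Avg episode reward: [(0, '…')]"), so the training curve can be recovered from a raw console dump without any framework tooling. A minimal throwaway sketch, with regexes written against the exact line shapes visible above (not against any Sample Factory API):

```python
import re

# Matches the trainer's periodic throughput line exactly as printed in this log.
FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.]+), 60 sec: ([\d.]+), 300 sec: ([\d.]+)\)\. "
    r"Total num frames: (\d+)"
)
# Matches the companion reward line, e.g. "Avg episode reward: [(0, '25.119')]".
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([\d.]+)'\)\]")

def parse_status(lines):
    """Yield (total_frames, fps_10s, avg_reward) for each Fps/reward pair."""
    frames = fps10 = None
    for line in lines:
        m = FPS_RE.search(line)
        if m:
            fps10 = float(m.group(1))
            frames = int(m.group(4))
            continue
        m = REWARD_RE.search(line)
        if m and frames is not None:
            yield frames, fps10, float(m.group(1))
```

Feeding it the dump above yields tuples suitable for plotting reward against total environment frames.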
[2023-02-24 07:10:35,537][00368] Environment doom_battle2 already registered, overwriting...
[2023-02-24 07:10:35,539][00368] Environment doom_duel_bots already registered, overwriting...
[2023-02-24 07:10:35,542][00368] Environment doom_deathmatch_bots already registered, overwriting...
[2023-02-24 07:10:35,544][00368] Environment doom_duel already registered, overwriting...
[2023-02-24 07:10:35,546][00368] Environment doom_deathmatch_full already registered, overwriting...
[2023-02-24 07:10:35,548][00368] Environment doom_benchmark already registered, overwriting...
[2023-02-24 07:10:35,550][00368] register_encoder_factory:
[2023-02-24 07:10:35,591][00368] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-24 07:10:35,594][00368] Overriding arg 'train_for_env_steps' with value 9000000 passed from command line
[2023-02-24 07:10:35,604][00368] Experiment dir /content/train_dir/default_experiment already exists!
[2023-02-24 07:10:35,605][00368] Resuming existing experiment from /content/train_dir/default_experiment...
[2023-02-24 07:10:35,606][00368] Weights and Biases integration disabled
[2023-02-24 07:10:35,613][00368] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-02-24 07:10:39,333][00368] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=10
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=9000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=3700000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 3700000}
git_hash=unknown
git_repo_name=not a git repository
[2023-02-24 07:10:39,339][00368] Saving configuration to /content/train_dir/default_experiment/config.json...
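The resume path above first loads the saved config.json, then layers command-line overrides on top ("Overriding arg 'train_for_env_steps' with value 9000000 passed from command line"): the persisted config still records the original `--train_for_env_steps=3700000` in `command_line`/`cli_args`, while the effective value becomes 9000000. A minimal sketch of that merge step, assuming only that config.json is a flat JSON object of the key=value pairs shown in the dump (the helper name is hypothetical, not a Sample Factory function):

```python
import json

def resume_config(path, overrides):
    """Load a saved experiment config and apply command-line overrides on top,
    mirroring the "Overriding arg ... passed from command line" step in the log.
    Assumes `path` points to a flat JSON object like the config dump above."""
    with open(path) as f:
        cfg = json.load(f)
    cfg.update(overrides)  # command-line values win over the saved ones
    return cfg
```

With `{"train_for_env_steps": 9000000}` as the override, a config saved with 3700000 resumes targeting 9000000 env steps, exactly the transition visible in this log.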
[2023-02-24 07:10:39,342][00368] Rollout worker 0 uses device cpu
[2023-02-24 07:10:39,345][00368] Rollout worker 1 uses device cpu
[2023-02-24 07:10:39,348][00368] Rollout worker 2 uses device cpu
[2023-02-24 07:10:39,349][00368] Rollout worker 3 uses device cpu
[2023-02-24 07:10:39,351][00368] Rollout worker 4 uses device cpu
[2023-02-24 07:10:39,352][00368] Rollout worker 5 uses device cpu
[2023-02-24 07:10:39,358][00368] Rollout worker 6 uses device cpu
[2023-02-24 07:10:39,359][00368] Rollout worker 7 uses device cpu
[2023-02-24 07:10:39,366][00368] Rollout worker 8 uses device cpu
[2023-02-24 07:10:39,370][00368] Rollout worker 9 uses device cpu
[2023-02-24 07:10:39,527][00368] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 07:10:39,529][00368] InferenceWorker_p0-w0: min num requests: 3
[2023-02-24 07:10:39,573][00368] Starting all processes...
[2023-02-24 07:10:39,577][00368] Starting process learner_proc0
[2023-02-24 07:10:39,724][00368] Starting all processes...
[2023-02-24 07:10:39,737][00368] Starting process inference_proc0-0
[2023-02-24 07:10:39,737][00368] Starting process rollout_proc0
[2023-02-24 07:10:39,744][00368] Starting process rollout_proc1
[2023-02-24 07:10:39,744][00368] Starting process rollout_proc2
[2023-02-24 07:10:39,744][00368] Starting process rollout_proc3
[2023-02-24 07:10:39,744][00368] Starting process rollout_proc4
[2023-02-24 07:10:39,822][00368] Starting process rollout_proc5
[2023-02-24 07:10:39,825][00368] Starting process rollout_proc6
[2023-02-24 07:10:39,827][00368] Starting process rollout_proc7
[2023-02-24 07:10:39,827][00368] Starting process rollout_proc8
[2023-02-24 07:10:39,830][00368] Starting process rollout_proc9
[2023-02-24 07:10:51,851][33049] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 07:10:51,859][33049] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-24 07:10:51,940][33049] Num visible devices: 1
[2023-02-24 07:10:52,003][33049] Starting seed is not provided
[2023-02-24 07:10:52,004][33049] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 07:10:52,004][33049] Initializing actor-critic model on device cuda:0
[2023-02-24 07:10:52,005][33049] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:10:52,029][33049] RunningMeanStd input shape: (1,)
[2023-02-24 07:10:52,166][33049] ConvEncoder: input_channels=3
[2023-02-24 07:10:52,916][33049] Conv encoder output size: 512
[2023-02-24 07:10:52,916][33049] Policy head output size: 512
[2023-02-24 07:10:53,074][33049] Created Actor Critic model with architecture:
[2023-02-24 07:10:53,075][33049] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-24 07:10:53,905][33065] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 07:10:53,909][33065] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-24 07:10:54,016][33065] Num visible devices: 1
[2023-02-24 07:10:54,230][33066] Worker 0 uses CPU cores [0]
[2023-02-24 07:10:54,391][33068] Worker 1 uses CPU cores [1]
[2023-02-24 07:10:54,414][33074] Worker 3 uses CPU cores [1]
[2023-02-24 07:10:54,666][33086] Worker 6 uses CPU cores [0]
[2023-02-24 07:10:55,054][33080] Worker 5 uses CPU cores [1]
[2023-02-24 07:10:55,180][33076] Worker 2 uses CPU cores [0]
[2023-02-24 07:10:55,348][33088] Worker 7 uses CPU cores [1]
[2023-02-24 07:10:55,373][33078] Worker 4 uses CPU cores [0]
[2023-02-24 07:10:55,403][33090] Worker 9 uses CPU cores [1]
[2023-02-24 07:10:55,604][33096] Worker 8 uses CPU cores [0]
[2023-02-24 07:10:59,519][00368] Heartbeat connected on Batcher_0
[2023-02-24 07:10:59,527][00368] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-24 07:10:59,539][00368] Heartbeat connected on RolloutWorker_w0
[2023-02-24 07:10:59,543][00368] Heartbeat connected on RolloutWorker_w1
[2023-02-24 07:10:59,546][00368] Heartbeat connected on RolloutWorker_w2
[2023-02-24 07:10:59,551][00368] Heartbeat connected on RolloutWorker_w3
[2023-02-24 07:10:59,554][00368] Heartbeat connected on RolloutWorker_w4
[2023-02-24 07:10:59,559][00368] Heartbeat connected on RolloutWorker_w5
[2023-02-24 07:10:59,562][00368] Heartbeat connected on RolloutWorker_w6
[2023-02-24 07:10:59,567][00368] Heartbeat connected on RolloutWorker_w7
[2023-02-24 07:10:59,577][00368] Heartbeat connected on RolloutWorker_w8
[2023-02-24 07:10:59,578][00368] Heartbeat connected on RolloutWorker_w9
[2023-02-24 07:11:00,080][33049] Using optimizer
[2023-02-24 07:11:00,080][33049] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth...
[2023-02-24 07:11:00,127][33049] Loading model from checkpoint
[2023-02-24 07:11:00,137][33049] Loaded experiment state at self.train_step=1466, self.env_steps=6004736
[2023-02-24 07:11:00,138][33049] Initialized policy 0 weights for model version 1466
[2023-02-24 07:11:00,142][33049] LearnerWorker_p0 finished initialization!
[2023-02-24 07:11:00,146][33049] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 07:11:00,144][00368] Heartbeat connected on LearnerWorker_p0
[2023-02-24 07:11:00,353][33065] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:11:00,354][33065] RunningMeanStd input shape: (1,)
[2023-02-24 07:11:00,375][33065] ConvEncoder: input_channels=3
[2023-02-24 07:11:00,543][33065] Conv encoder output size: 512
[2023-02-24 07:11:00,544][33065] Policy head output size: 512
[2023-02-24 07:11:00,614][00368] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 6004736. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 07:11:04,107][00368] Inference worker 0-0 is ready!
[2023-02-24 07:11:04,108][00368] All inference workers are ready! Signal rollout workers to start!
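The checkpoint filenames in this run encode both resume counters: `checkpoint_000001466_6004736.pth` matches "Loaded experiment state at self.train_step=1466, self.env_steps=6004736". A hypothetical helper (not part of Sample Factory) to recover the counters from a filename of that shape:

```python
import re

# Filename shape as seen in this log: checkpoint_{train_step:09d}_{env_steps}.pth
CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

def parse_checkpoint_name(path):
    """Recover (train_step, env_steps) from a checkpoint path like
    checkpoint_000001466_6004736.pth."""
    m = CKPT_RE.search(path)
    if m is None:
        raise ValueError(f"not a checkpoint path: {path}")
    return int(m.group(1)), int(m.group(2))
```

For every checkpoint in this log, env_steps is exactly train_step × 4096, which is consistent with the 1024-sample training batch at env_frameskip=4; that relation is inferred from the numbers here, not taken from documentation.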
[2023-02-24 07:11:04,233][33076] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,237][33066] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,236][33096] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,238][33078] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,239][33086] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,241][33088] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,235][33090] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,243][33074] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,240][33068] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:04,245][33080] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 07:11:05,534][33088] Decorrelating experience for 0 frames...
[2023-02-24 07:11:05,535][33068] Decorrelating experience for 0 frames...
[2023-02-24 07:11:05,614][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6004736. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 07:11:05,954][33074] Decorrelating experience for 0 frames...
[2023-02-24 07:11:05,948][33086] Decorrelating experience for 0 frames...
[2023-02-24 07:11:05,954][33066] Decorrelating experience for 0 frames...
[2023-02-24 07:11:05,957][33076] Decorrelating experience for 0 frames...
[2023-02-24 07:11:05,962][33078] Decorrelating experience for 0 frames...
[2023-02-24 07:11:06,695][33074] Decorrelating experience for 32 frames...
[2023-02-24 07:11:06,802][33068] Decorrelating experience for 32 frames...
[2023-02-24 07:11:07,058][33078] Decorrelating experience for 32 frames...
[2023-02-24 07:11:07,060][33086] Decorrelating experience for 32 frames...
[2023-02-24 07:11:07,067][33076] Decorrelating experience for 32 frames...
[2023-02-24 07:11:07,624][33088] Decorrelating experience for 32 frames...
[2023-02-24 07:11:07,653][33074] Decorrelating experience for 64 frames...
[2023-02-24 07:11:07,959][33066] Decorrelating experience for 32 frames...
[2023-02-24 07:11:08,087][33076] Decorrelating experience for 64 frames...
[2023-02-24 07:11:08,840][33066] Decorrelating experience for 64 frames...
[2023-02-24 07:11:08,913][33076] Decorrelating experience for 96 frames...
[2023-02-24 07:11:09,113][33068] Decorrelating experience for 64 frames...
[2023-02-24 07:11:09,118][33090] Decorrelating experience for 0 frames...
[2023-02-24 07:11:09,477][33088] Decorrelating experience for 64 frames...
[2023-02-24 07:11:09,612][33080] Decorrelating experience for 0 frames...
[2023-02-24 07:11:10,077][33066] Decorrelating experience for 96 frames...
[2023-02-24 07:11:10,552][33078] Decorrelating experience for 64 frames...
[2023-02-24 07:11:10,614][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6004736. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 07:11:10,907][33090] Decorrelating experience for 32 frames...
[2023-02-24 07:11:11,255][33080] Decorrelating experience for 32 frames...
[2023-02-24 07:11:11,265][33088] Decorrelating experience for 96 frames...
[2023-02-24 07:11:11,597][33068] Decorrelating experience for 96 frames...
[2023-02-24 07:11:12,319][33086] Decorrelating experience for 64 frames...
[2023-02-24 07:11:13,201][33078] Decorrelating experience for 96 frames...
[2023-02-24 07:11:14,511][33096] Decorrelating experience for 0 frames...
[2023-02-24 07:11:15,614][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6004736. Throughput: 0: 61.2. Samples: 918. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 07:11:15,628][00368] Avg episode reward: [(0, '2.123')]
[2023-02-24 07:11:16,264][33086] Decorrelating experience for 96 frames...
[2023-02-24 07:11:18,018][33074] Decorrelating experience for 96 frames...
[2023-02-24 07:11:18,744][33049] Signal inference workers to stop experience collection...
[2023-02-24 07:11:18,780][33065] InferenceWorker_p0-w0: stopping experience collection
[2023-02-24 07:11:19,037][33090] Decorrelating experience for 64 frames...
[2023-02-24 07:11:19,287][33080] Decorrelating experience for 64 frames...
[2023-02-24 07:11:20,015][33090] Decorrelating experience for 96 frames...
[2023-02-24 07:11:20,191][33080] Decorrelating experience for 96 frames...
[2023-02-24 07:11:20,568][33096] Decorrelating experience for 32 frames...
[2023-02-24 07:11:20,614][00368] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 6004736. Throughput: 0: 120.5. Samples: 2410. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 07:11:20,621][00368] Avg episode reward: [(0, '3.985')]
[2023-02-24 07:11:20,872][33049] Signal inference workers to resume experience collection...
[2023-02-24 07:11:20,874][33065] InferenceWorker_p0-w0: resuming experience collection
[2023-02-24 07:11:22,872][33096] Decorrelating experience for 64 frames...
[2023-02-24 07:11:25,615][00368] Fps is (10 sec: 1228.7, 60 sec: 491.5, 300 sec: 491.5). Total num frames: 6017024. Throughput: 0: 156.8. Samples: 3920. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2023-02-24 07:11:25,625][00368] Avg episode reward: [(0, '3.905')]
[2023-02-24 07:11:28,499][33096] Decorrelating experience for 96 frames...
[2023-02-24 07:11:30,614][00368] Fps is (10 sec: 2457.6, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 6029312. Throughput: 0: 188.9. Samples: 5668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0)
[2023-02-24 07:11:30,623][00368] Avg episode reward: [(0, '8.438')]
[2023-02-24 07:11:34,397][33065] Updated weights for policy 0, policy_version 1476 (0.0019)
[2023-02-24 07:11:35,614][00368] Fps is (10 sec: 2867.5, 60 sec: 1170.3, 300 sec: 1170.3). Total num frames: 6045696. Throughput: 0: 304.2.
Samples: 10648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:11:35,617][00368] Avg episode reward: [(0, '10.663')] [2023-02-24 07:11:40,614][00368] Fps is (10 sec: 2867.2, 60 sec: 1331.2, 300 sec: 1331.2). Total num frames: 6057984. Throughput: 0: 371.9. Samples: 14878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:11:40,617][00368] Avg episode reward: [(0, '13.879')] [2023-02-24 07:11:45,614][00368] Fps is (10 sec: 2457.6, 60 sec: 1456.4, 300 sec: 1456.4). Total num frames: 6070272. Throughput: 0: 366.9. Samples: 16510. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:11:45,622][00368] Avg episode reward: [(0, '16.655')] [2023-02-24 07:11:50,610][33065] Updated weights for policy 0, policy_version 1486 (0.0032) [2023-02-24 07:11:50,614][00368] Fps is (10 sec: 2867.2, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 6086656. Throughput: 0: 446.3. Samples: 20082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:11:50,619][00368] Avg episode reward: [(0, '18.734')] [2023-02-24 07:11:55,614][00368] Fps is (10 sec: 2867.2, 60 sec: 1712.9, 300 sec: 1712.9). Total num frames: 6098944. Throughput: 0: 550.1. Samples: 24754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:11:55,619][00368] Avg episode reward: [(0, '20.994')] [2023-02-24 07:12:00,614][00368] Fps is (10 sec: 3276.8, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 6119424. Throughput: 0: 589.9. Samples: 27462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:12:00,623][00368] Avg episode reward: [(0, '21.869')] [2023-02-24 07:12:02,700][33065] Updated weights for policy 0, policy_version 1496 (0.0017) [2023-02-24 07:12:05,614][00368] Fps is (10 sec: 3276.8, 60 sec: 2116.3, 300 sec: 1953.5). Total num frames: 6131712. Throughput: 0: 660.8. Samples: 32148. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:12:05,620][00368] Avg episode reward: [(0, '24.026')] [2023-02-24 07:12:10,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2321.1, 300 sec: 1989.5). Total num frames: 6144000. Throughput: 0: 709.3. Samples: 35836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:12:10,618][00368] Avg episode reward: [(0, '24.926')] [2023-02-24 07:12:15,614][00368] Fps is (10 sec: 2457.5, 60 sec: 2525.9, 300 sec: 2020.7). Total num frames: 6156288. Throughput: 0: 711.3. Samples: 37676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:12:15,623][00368] Avg episode reward: [(0, '24.872')] [2023-02-24 07:12:17,906][33065] Updated weights for policy 0, policy_version 1506 (0.0028) [2023-02-24 07:12:20,614][00368] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2150.4). Total num frames: 6176768. Throughput: 0: 711.4. Samples: 42660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:12:20,617][00368] Avg episode reward: [(0, '24.118')] [2023-02-24 07:12:25,619][00368] Fps is (10 sec: 3684.7, 60 sec: 2935.3, 300 sec: 2216.5). Total num frames: 6193152. Throughput: 0: 742.4. Samples: 48290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:12:25,621][00368] Avg episode reward: [(0, '23.278')] [2023-02-24 07:12:30,614][00368] Fps is (10 sec: 2867.1, 60 sec: 2935.4, 300 sec: 2230.0). Total num frames: 6205440. Throughput: 0: 746.3. Samples: 50096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:12:30,622][00368] Avg episode reward: [(0, '22.359')] [2023-02-24 07:12:30,984][33065] Updated weights for policy 0, policy_version 1516 (0.0015) [2023-02-24 07:12:35,614][00368] Fps is (10 sec: 2458.6, 60 sec: 2867.2, 300 sec: 2242.0). Total num frames: 6217728. Throughput: 0: 749.7. Samples: 53818. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:12:35,620][00368] Avg episode reward: [(0, '22.768')] [2023-02-24 07:12:35,647][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001518_6217728.pth... [2023-02-24 07:12:35,991][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001445_5918720.pth [2023-02-24 07:12:40,614][00368] Fps is (10 sec: 2867.3, 60 sec: 2935.5, 300 sec: 2293.8). Total num frames: 6234112. Throughput: 0: 736.6. Samples: 57900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:12:40,616][00368] Avg episode reward: [(0, '23.929')] [2023-02-24 07:12:44,829][33065] Updated weights for policy 0, policy_version 1526 (0.0023) [2023-02-24 07:12:45,614][00368] Fps is (10 sec: 3277.1, 60 sec: 3003.7, 300 sec: 2340.6). Total num frames: 6250496. Throughput: 0: 738.4. Samples: 60692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:12:45,622][00368] Avg episode reward: [(0, '24.989')] [2023-02-24 07:12:50,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2345.9). Total num frames: 6262784. Throughput: 0: 727.7. Samples: 64896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:12:50,620][00368] Avg episode reward: [(0, '25.405')] [2023-02-24 07:12:55,618][00368] Fps is (10 sec: 2047.2, 60 sec: 2867.0, 300 sec: 2315.0). Total num frames: 6270976. Throughput: 0: 704.2. Samples: 67528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:12:55,624][00368] Avg episode reward: [(0, '25.475')] [2023-02-24 07:13:00,614][00368] Fps is (10 sec: 1638.4, 60 sec: 2662.4, 300 sec: 2286.9). Total num frames: 6279168. Throughput: 0: 693.6. Samples: 68886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:13:00,622][00368] Avg episode reward: [(0, '25.152')] [2023-02-24 07:13:05,057][33065] Updated weights for policy 0, policy_version 1536 (0.0027) [2023-02-24 07:13:05,614][00368] Fps is (10 sec: 2048.8, 60 sec: 2662.4, 300 sec: 2293.8). 
Total num frames: 6291456. Throughput: 0: 648.1. Samples: 71826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:13:05,621][00368] Avg episode reward: [(0, '25.208')] [2023-02-24 07:13:10,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2331.6). Total num frames: 6307840. Throughput: 0: 625.6. Samples: 76438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:13:10,616][00368] Avg episode reward: [(0, '24.952')] [2023-02-24 07:13:15,614][00368] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2366.6). Total num frames: 6324224. Throughput: 0: 644.1. Samples: 79082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:13:15,617][00368] Avg episode reward: [(0, '24.563')] [2023-02-24 07:13:16,973][33065] Updated weights for policy 0, policy_version 1546 (0.0025) [2023-02-24 07:13:20,617][00368] Fps is (10 sec: 3275.7, 60 sec: 2730.5, 300 sec: 2399.0). Total num frames: 6340608. Throughput: 0: 670.2. Samples: 83978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:13:20,619][00368] Avg episode reward: [(0, '24.516')] [2023-02-24 07:13:25,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2662.6, 300 sec: 2401.1). Total num frames: 6352896. Throughput: 0: 660.3. Samples: 87614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:13:25,616][00368] Avg episode reward: [(0, '22.972')] [2023-02-24 07:13:30,617][00368] Fps is (10 sec: 2457.5, 60 sec: 2662.2, 300 sec: 2402.9). Total num frames: 6365184. Throughput: 0: 637.0. Samples: 89358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:13:30,621][00368] Avg episode reward: [(0, '24.294')] [2023-02-24 07:13:32,902][33065] Updated weights for policy 0, policy_version 1556 (0.0038) [2023-02-24 07:13:35,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2431.2). Total num frames: 6381568. Throughput: 0: 648.8. Samples: 94090. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:13:35,621][00368] Avg episode reward: [(0, '24.965')] [2023-02-24 07:13:40,614][00368] Fps is (10 sec: 3687.8, 60 sec: 2798.9, 300 sec: 2483.2). Total num frames: 6402048. Throughput: 0: 713.9. Samples: 99652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:13:40,620][00368] Avg episode reward: [(0, '25.158')] [2023-02-24 07:13:45,433][33065] Updated weights for policy 0, policy_version 1566 (0.0015) [2023-02-24 07:13:45,614][00368] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2482.4). Total num frames: 6414336. Throughput: 0: 728.4. Samples: 101666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:13:45,621][00368] Avg episode reward: [(0, '25.123')] [2023-02-24 07:13:50,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2730.7, 300 sec: 2481.7). Total num frames: 6426624. Throughput: 0: 744.4. Samples: 105322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:13:50,621][00368] Avg episode reward: [(0, '25.755')] [2023-02-24 07:13:55,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2481.0). Total num frames: 6438912. Throughput: 0: 724.3. Samples: 109030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:13:55,617][00368] Avg episode reward: [(0, '26.084')] [2023-02-24 07:13:59,678][33065] Updated weights for policy 0, policy_version 1576 (0.0019) [2023-02-24 07:14:00,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2503.1). Total num frames: 6455296. Throughput: 0: 727.8. Samples: 111834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:14:00,616][00368] Avg episode reward: [(0, '26.186')] [2023-02-24 07:14:05,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2546.2). Total num frames: 6475776. Throughput: 0: 742.1. Samples: 117370. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:14:05,620][00368] Avg episode reward: [(0, '26.949')] [2023-02-24 07:14:10,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2543.8). Total num frames: 6488064. Throughput: 0: 749.0. Samples: 121318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:14:10,617][00368] Avg episode reward: [(0, '26.457')] [2023-02-24 07:14:13,555][33065] Updated weights for policy 0, policy_version 1586 (0.0030) [2023-02-24 07:14:15,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2541.6). Total num frames: 6500352. Throughput: 0: 751.4. Samples: 123166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:14:15,625][00368] Avg episode reward: [(0, '26.507')] [2023-02-24 07:14:20,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.6, 300 sec: 2560.0). Total num frames: 6516736. Throughput: 0: 743.0. Samples: 127526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:14:20,617][00368] Avg episode reward: [(0, '25.285')] [2023-02-24 07:14:25,293][33065] Updated weights for policy 0, policy_version 1596 (0.0020) [2023-02-24 07:14:25,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2597.5). Total num frames: 6537216. Throughput: 0: 755.5. Samples: 133650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:14:25,621][00368] Avg episode reward: [(0, '24.206')] [2023-02-24 07:14:30,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3140.5, 300 sec: 2613.6). Total num frames: 6553600. Throughput: 0: 772.4. Samples: 136422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:14:30,618][00368] Avg episode reward: [(0, '23.126')] [2023-02-24 07:14:35,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2610.0). Total num frames: 6565888. Throughput: 0: 777.2. Samples: 140296. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:14:35,626][00368] Avg episode reward: [(0, '20.966')] [2023-02-24 07:14:35,640][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001603_6565888.pth... [2023-02-24 07:14:35,873][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth [2023-02-24 07:14:40,568][33065] Updated weights for policy 0, policy_version 1606 (0.0040) [2023-02-24 07:14:40,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2606.5). Total num frames: 6578176. Throughput: 0: 777.9. Samples: 144034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:14:40,623][00368] Avg episode reward: [(0, '21.246')] [2023-02-24 07:14:45,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2639.6). Total num frames: 6598656. Throughput: 0: 776.4. Samples: 146772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:14:45,620][00368] Avg episode reward: [(0, '20.414')] [2023-02-24 07:14:50,335][33065] Updated weights for policy 0, policy_version 1616 (0.0023) [2023-02-24 07:14:50,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2671.3). Total num frames: 6619136. Throughput: 0: 793.5. Samples: 153078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:14:50,617][00368] Avg episode reward: [(0, '21.511')] [2023-02-24 07:14:55,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2666.8). Total num frames: 6631424. Throughput: 0: 810.4. Samples: 157784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:14:55,620][00368] Avg episode reward: [(0, '21.550')] [2023-02-24 07:15:00,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2679.5). Total num frames: 6647808. Throughput: 0: 813.9. Samples: 159790. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:15:00,624][00368] Avg episode reward: [(0, '21.290')] [2023-02-24 07:15:04,901][33065] Updated weights for policy 0, policy_version 1626 (0.0034) [2023-02-24 07:15:05,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2674.9). Total num frames: 6660096. Throughput: 0: 809.3. Samples: 163944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:15:05,619][00368] Avg episode reward: [(0, '23.128')] [2023-02-24 07:15:10,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2703.4). Total num frames: 6680576. Throughput: 0: 806.3. Samples: 169934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:15:10,617][00368] Avg episode reward: [(0, '22.237')] [2023-02-24 07:15:15,463][33065] Updated weights for policy 0, policy_version 1636 (0.0022) [2023-02-24 07:15:15,614][00368] Fps is (10 sec: 4095.8, 60 sec: 3345.0, 300 sec: 2730.7). Total num frames: 6701056. Throughput: 0: 812.4. Samples: 172982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:15:15,618][00368] Avg episode reward: [(0, '22.266')] [2023-02-24 07:15:20,614][00368] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 2725.4). Total num frames: 6713344. Throughput: 0: 821.8. Samples: 177278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:15:20,617][00368] Avg episode reward: [(0, '23.505')] [2023-02-24 07:15:25,614][00368] Fps is (10 sec: 2457.5, 60 sec: 3140.2, 300 sec: 2720.4). Total num frames: 6725632. Throughput: 0: 829.3. Samples: 181354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:15:25,622][00368] Avg episode reward: [(0, '24.866')] [2023-02-24 07:15:29,268][33065] Updated weights for policy 0, policy_version 1646 (0.0022) [2023-02-24 07:15:30,614][00368] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 2745.8). Total num frames: 6746112. Throughput: 0: 823.2. Samples: 183816. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:15:30,616][00368] Avg episode reward: [(0, '25.132')] [2023-02-24 07:15:35,614][00368] Fps is (10 sec: 4096.3, 60 sec: 3345.1, 300 sec: 2770.4). Total num frames: 6766592. Throughput: 0: 822.8. Samples: 190102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:15:35,622][00368] Avg episode reward: [(0, '25.007')] [2023-02-24 07:15:40,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2764.8). Total num frames: 6778880. Throughput: 0: 815.7. Samples: 194492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:15:40,617][00368] Avg episode reward: [(0, '23.566')] [2023-02-24 07:15:41,204][33065] Updated weights for policy 0, policy_version 1656 (0.0018) [2023-02-24 07:15:45,614][00368] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2759.4). Total num frames: 6791168. Throughput: 0: 809.2. Samples: 196202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:15:45,619][00368] Avg episode reward: [(0, '23.495')] [2023-02-24 07:15:50,614][00368] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2754.2). Total num frames: 6803456. Throughput: 0: 795.3. Samples: 199734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:15:50,616][00368] Avg episode reward: [(0, '22.214')] [2023-02-24 07:15:55,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2763.1). Total num frames: 6819840. Throughput: 0: 773.2. Samples: 204728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:15:55,617][00368] Avg episode reward: [(0, '20.316')] [2023-02-24 07:15:55,970][33065] Updated weights for policy 0, policy_version 1666 (0.0043) [2023-02-24 07:16:00,615][00368] Fps is (10 sec: 3276.5, 60 sec: 3140.2, 300 sec: 2818.6). Total num frames: 6836224. Throughput: 0: 762.8. Samples: 207310. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:16:00,617][00368] Avg episode reward: [(0, '20.789')] [2023-02-24 07:16:05,618][00368] Fps is (10 sec: 2866.0, 60 sec: 3140.1, 300 sec: 2860.2). Total num frames: 6848512. Throughput: 0: 754.1. Samples: 211216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:16:05,621][00368] Avg episode reward: [(0, '20.971')] [2023-02-24 07:16:10,614][00368] Fps is (10 sec: 2457.8, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 6860800. Throughput: 0: 739.2. Samples: 214616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:16:10,617][00368] Avg episode reward: [(0, '20.727')] [2023-02-24 07:16:12,120][33065] Updated weights for policy 0, policy_version 1676 (0.0015) [2023-02-24 07:16:15,614][00368] Fps is (10 sec: 2048.8, 60 sec: 2798.9, 300 sec: 2929.7). Total num frames: 6868992. Throughput: 0: 718.5. Samples: 216148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 07:16:15,618][00368] Avg episode reward: [(0, '21.196')] [2023-02-24 07:16:20,618][00368] Fps is (10 sec: 2047.0, 60 sec: 2798.7, 300 sec: 2929.6). Total num frames: 6881280. Throughput: 0: 651.4. Samples: 219416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:16:20,622][00368] Avg episode reward: [(0, '21.945')] [2023-02-24 07:16:25,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2929.7). Total num frames: 6893568. Throughput: 0: 632.4. Samples: 222948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:16:25,621][00368] Avg episode reward: [(0, '22.684')] [2023-02-24 07:16:30,618][00368] Fps is (10 sec: 2048.0, 60 sec: 2593.9, 300 sec: 2901.9). Total num frames: 6901760. Throughput: 0: 625.8. Samples: 224368. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:16:30,621][00368] Avg episode reward: [(0, '23.994')] [2023-02-24 07:16:30,888][33065] Updated weights for policy 0, policy_version 1686 (0.0018) [2023-02-24 07:16:35,616][00368] Fps is (10 sec: 2047.6, 60 sec: 2457.5, 300 sec: 2901.9). Total num frames: 6914048. Throughput: 0: 626.6. Samples: 227932. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:16:35,618][00368] Avg episode reward: [(0, '23.791')] [2023-02-24 07:16:35,634][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001688_6914048.pth... [2023-02-24 07:16:35,981][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001518_6217728.pth [2023-02-24 07:16:40,614][00368] Fps is (10 sec: 2458.8, 60 sec: 2457.6, 300 sec: 2901.9). Total num frames: 6926336. Throughput: 0: 601.4. Samples: 231792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 07:16:40,618][00368] Avg episode reward: [(0, '25.185')] [2023-02-24 07:16:45,133][33065] Updated weights for policy 0, policy_version 1696 (0.0020) [2023-02-24 07:16:45,614][00368] Fps is (10 sec: 3277.4, 60 sec: 2594.1, 300 sec: 2915.8). Total num frames: 6946816. Throughput: 0: 605.4. Samples: 234552. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:16:45,617][00368] Avg episode reward: [(0, '25.251')] [2023-02-24 07:16:50,614][00368] Fps is (10 sec: 3686.4, 60 sec: 2662.4, 300 sec: 2929.7). Total num frames: 6963200. Throughput: 0: 647.5. Samples: 240352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 07:16:50,622][00368] Avg episode reward: [(0, '25.585')] [2023-02-24 07:16:55,614][00368] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2915.8). Total num frames: 6979584. Throughput: 0: 659.0. Samples: 244272. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:16:55,617][00368] Avg episode reward: [(0, '26.446')] [2023-02-24 07:16:59,099][33065] Updated weights for policy 0, policy_version 1706 (0.0034) [2023-02-24 07:17:00,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2594.2, 300 sec: 2915.8). Total num frames: 6991872. Throughput: 0: 666.5. Samples: 246142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:17:00,619][00368] Avg episode reward: [(0, '27.646')] [2023-02-24 07:17:05,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2662.6, 300 sec: 2929.7). Total num frames: 7008256. Throughput: 0: 693.1. Samples: 250600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:17:05,617][00368] Avg episode reward: [(0, '27.657')] [2023-02-24 07:17:10,289][33065] Updated weights for policy 0, policy_version 1716 (0.0021) [2023-02-24 07:17:10,614][00368] Fps is (10 sec: 3686.4, 60 sec: 2798.9, 300 sec: 2957.5). Total num frames: 7028736. Throughput: 0: 746.1. Samples: 256522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:17:10,622][00368] Avg episode reward: [(0, '27.028')] [2023-02-24 07:17:15,614][00368] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 7045120. Throughput: 0: 770.9. Samples: 259054. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-24 07:17:15,617][00368] Avg episode reward: [(0, '25.727')] [2023-02-24 07:17:20,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.7, 300 sec: 2929.7). Total num frames: 7057408. Throughput: 0: 781.5. Samples: 263100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:17:20,619][00368] Avg episode reward: [(0, '25.873')] [2023-02-24 07:17:24,982][33065] Updated weights for policy 0, policy_version 1726 (0.0020) [2023-02-24 07:17:25,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 7069696. Throughput: 0: 786.4. Samples: 267178. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:17:25,623][00368] Avg episode reward: [(0, '25.624')] [2023-02-24 07:17:30,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3140.5, 300 sec: 2957.5). Total num frames: 7090176. Throughput: 0: 790.5. Samples: 270124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:17:30,617][00368] Avg episode reward: [(0, '26.232')] [2023-02-24 07:17:35,380][33065] Updated weights for policy 0, policy_version 1736 (0.0015) [2023-02-24 07:17:35,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3276.9, 300 sec: 2971.3). Total num frames: 7110656. Throughput: 0: 797.4. Samples: 276234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:17:35,618][00368] Avg episode reward: [(0, '25.011')] [2023-02-24 07:17:40,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2957.5). Total num frames: 7122944. Throughput: 0: 805.9. Samples: 280538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:17:40,619][00368] Avg episode reward: [(0, '25.605')] [2023-02-24 07:17:45,614][00368] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 2957.4). Total num frames: 7135232. Throughput: 0: 805.4. Samples: 282386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:17:45,623][00368] Avg episode reward: [(0, '25.532')] [2023-02-24 07:17:49,586][33065] Updated weights for policy 0, policy_version 1746 (0.0015) [2023-02-24 07:17:50,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2985.3). Total num frames: 7151616. Throughput: 0: 803.1. Samples: 286740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:17:50,616][00368] Avg episode reward: [(0, '27.464')] [2023-02-24 07:17:55,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 7172096. Throughput: 0: 804.4. Samples: 292718. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:17:55,619][00368] Avg episode reward: [(0, '27.739')] [2023-02-24 07:18:00,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 7188480. Throughput: 0: 812.9. Samples: 295634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:18:00,621][00368] Avg episode reward: [(0, '27.198')] [2023-02-24 07:18:01,207][33065] Updated weights for policy 0, policy_version 1756 (0.0017) [2023-02-24 07:18:05,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 7200768. Throughput: 0: 808.4. Samples: 299478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:18:05,617][00368] Avg episode reward: [(0, '26.194')] [2023-02-24 07:18:10,614][00368] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 7213056. Throughput: 0: 807.5. Samples: 303516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:18:10,620][00368] Avg episode reward: [(0, '25.611')] [2023-02-24 07:18:14,708][33065] Updated weights for policy 0, policy_version 1766 (0.0029) [2023-02-24 07:18:15,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3026.9). Total num frames: 7233536. Throughput: 0: 803.0. Samples: 306258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:18:15,621][00368] Avg episode reward: [(0, '25.268')] [2023-02-24 07:18:20,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3054.6). Total num frames: 7254016. Throughput: 0: 800.8. Samples: 312270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 07:18:20,623][00368] Avg episode reward: [(0, '23.758')] [2023-02-24 07:18:25,617][00368] Fps is (10 sec: 3685.0, 60 sec: 3344.8, 300 sec: 3068.5). Total num frames: 7270400. Throughput: 0: 804.8. Samples: 316756. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:18:25,621][00368] Avg episode reward: [(0, '23.695')] [2023-02-24 07:18:27,210][33065] Updated weights for policy 0, policy_version 1776 (0.0019) [2023-02-24 07:18:30,617][00368] Fps is (10 sec: 2866.2, 60 sec: 3208.3, 300 sec: 3054.6). Total num frames: 7282688. Throughput: 0: 807.4. Samples: 318720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:18:30,622][00368] Avg episode reward: [(0, '24.503')] [2023-02-24 07:18:35,614][00368] Fps is (10 sec: 2458.6, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 7294976. Throughput: 0: 797.9. Samples: 322646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:18:35,621][00368] Avg episode reward: [(0, '25.191')] [2023-02-24 07:18:35,633][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001781_7294976.pth... [2023-02-24 07:18:35,831][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001603_6565888.pth [2023-02-24 07:18:39,763][33065] Updated weights for policy 0, policy_version 1786 (0.0014) [2023-02-24 07:18:40,614][00368] Fps is (10 sec: 3278.0, 60 sec: 3208.5, 300 sec: 3054.6). Total num frames: 7315456. Throughput: 0: 799.2. Samples: 328680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 07:18:40,621][00368] Avg episode reward: [(0, '26.004')] [2023-02-24 07:18:45,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3082.4). Total num frames: 7335936. Throughput: 0: 801.3. Samples: 331692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:18:45,624][00368] Avg episode reward: [(0, '26.206')] [2023-02-24 07:18:50,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3082.4). Total num frames: 7348224. Throughput: 0: 809.5. Samples: 335906. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:18:50,617][00368] Avg episode reward: [(0, '26.897')] [2023-02-24 07:18:53,249][33065] Updated weights for policy 0, policy_version 1796 (0.0022) [2023-02-24 07:18:55,614][00368] Fps is (10 sec: 2457.5, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 7360512. Throughput: 0: 806.2. Samples: 339796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:18:55,620][00368] Avg episode reward: [(0, '27.999')] [2023-02-24 07:18:55,638][33049] Saving new best policy, reward=27.999! [2023-02-24 07:19:00,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3054.6). Total num frames: 7376896. Throughput: 0: 801.0. Samples: 342302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 07:19:00,617][00368] Avg episode reward: [(0, '27.032')] [2023-02-24 07:19:04,440][33065] Updated weights for policy 0, policy_version 1806 (0.0015) [2023-02-24 07:19:05,614][00368] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3082.4). Total num frames: 7397376. Throughput: 0: 802.0. Samples: 348362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:19:05,619][00368] Avg episode reward: [(0, '25.759')] [2023-02-24 07:19:10,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3096.3). Total num frames: 7413760. Throughput: 0: 810.4. Samples: 353220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 07:19:10,619][00368] Avg episode reward: [(0, '25.891')] [2023-02-24 07:19:15,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3082.4). Total num frames: 7426048. Throughput: 0: 807.2. Samples: 355040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 07:19:15,619][00368] Avg episode reward: [(0, '25.320')] [2023-02-24 07:19:20,249][33065] Updated weights for policy 0, policy_version 1816 (0.0015) [2023-02-24 07:19:20,614][00368] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3054.6). Total num frames: 7438336. Throughput: 0: 797.5. Samples: 358532. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:19:20,617][00368] Avg episode reward: [(0, '26.067')]
[2023-02-24 07:19:25,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.2, 300 sec: 3054.6). Total num frames: 7454720. Throughput: 0: 776.2. Samples: 363608. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:19:25,617][00368] Avg episode reward: [(0, '25.201')]
[2023-02-24 07:19:30,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3208.7, 300 sec: 3082.4). Total num frames: 7475200. Throughput: 0: 771.4. Samples: 366404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:19:30,620][00368] Avg episode reward: [(0, '24.597')]
[2023-02-24 07:19:32,236][33065] Updated weights for policy 0, policy_version 1826 (0.0018)
[2023-02-24 07:19:35,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3082.4). Total num frames: 7487488. Throughput: 0: 769.0. Samples: 370510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:19:35,617][00368] Avg episode reward: [(0, '24.148')]
[2023-02-24 07:19:40,614][00368] Fps is (10 sec: 2048.0, 60 sec: 3003.7, 300 sec: 3040.8). Total num frames: 7495680. Throughput: 0: 761.2. Samples: 374050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:19:40,621][00368] Avg episode reward: [(0, '24.726')]
[2023-02-24 07:19:45,617][00368] Fps is (10 sec: 2047.4, 60 sec: 2867.1, 300 sec: 3013.0). Total num frames: 7507968. Throughput: 0: 739.1. Samples: 375564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:19:45,621][00368] Avg episode reward: [(0, '25.240')]
[2023-02-24 07:19:49,548][33065] Updated weights for policy 0, policy_version 1836 (0.0030)
[2023-02-24 07:19:50,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 3013.0). Total num frames: 7520256. Throughput: 0: 685.6. Samples: 379212.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:19:50,620][00368] Avg episode reward: [(0, '25.759')]
[2023-02-24 07:19:55,614][00368] Fps is (10 sec: 3277.8, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 7540736. Throughput: 0: 689.5. Samples: 384248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:19:55,619][00368] Avg episode reward: [(0, '25.660')]
[2023-02-24 07:20:00,614][00368] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3026.9). Total num frames: 7553024. Throughput: 0: 698.3. Samples: 386462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:20:00,617][00368] Avg episode reward: [(0, '25.380')]
[2023-02-24 07:20:03,109][33065] Updated weights for policy 0, policy_version 1846 (0.0038)
[2023-02-24 07:20:05,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2999.1). Total num frames: 7565312. Throughput: 0: 708.3. Samples: 390406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:20:05,621][00368] Avg episode reward: [(0, '25.698')]
[2023-02-24 07:20:10,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2798.9, 300 sec: 2985.2). Total num frames: 7581696. Throughput: 0: 695.6. Samples: 394912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:20:10,617][00368] Avg episode reward: [(0, '25.431')]
[2023-02-24 07:20:15,532][33065] Updated weights for policy 0, policy_version 1856 (0.0026)
[2023-02-24 07:20:15,614][00368] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 3013.0). Total num frames: 7602176. Throughput: 0: 702.1. Samples: 397998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:20:15,617][00368] Avg episode reward: [(0, '26.530')]
[2023-02-24 07:20:20,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 7618560. Throughput: 0: 740.0. Samples: 403810.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:20:20,617][00368] Avg episode reward: [(0, '27.332')]
[2023-02-24 07:20:25,616][00368] Fps is (10 sec: 2866.7, 60 sec: 2935.4, 300 sec: 2999.1). Total num frames: 7630848. Throughput: 0: 750.9. Samples: 407844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:20:25,619][00368] Avg episode reward: [(0, '26.925')]
[2023-02-24 07:20:28,654][33065] Updated weights for policy 0, policy_version 1866 (0.0053)
[2023-02-24 07:20:30,615][00368] Fps is (10 sec: 2866.9, 60 sec: 2867.1, 300 sec: 2985.2). Total num frames: 7647232. Throughput: 0: 764.1. Samples: 409948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:20:30,622][00368] Avg episode reward: [(0, '26.423')]
[2023-02-24 07:20:35,614][00368] Fps is (10 sec: 3687.1, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 7667712. Throughput: 0: 797.1. Samples: 415082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:20:35,621][00368] Avg episode reward: [(0, '26.922')]
[2023-02-24 07:20:35,637][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001872_7667712.pth...
[2023-02-24 07:20:35,849][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001688_6914048.pth
[2023-02-24 07:20:39,664][33065] Updated weights for policy 0, policy_version 1876 (0.0018)
[2023-02-24 07:20:40,614][00368] Fps is (10 sec: 3686.8, 60 sec: 3140.3, 300 sec: 3026.9). Total num frames: 7684096. Throughput: 0: 816.3. Samples: 420980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:20:40,622][00368] Avg episode reward: [(0, '28.066')]
[2023-02-24 07:20:40,629][33049] Saving new best policy, reward=28.066!
[2023-02-24 07:20:45,619][00368] Fps is (10 sec: 3275.2, 60 sec: 3208.4, 300 sec: 3040.7). Total num frames: 7700480. Throughput: 0: 817.2. Samples: 423238.
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:20:45,621][00368] Avg episode reward: [(0, '28.265')]
[2023-02-24 07:20:45,637][33049] Saving new best policy, reward=28.265!
[2023-02-24 07:20:50,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 7712768. Throughput: 0: 820.5. Samples: 427330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:20:50,617][00368] Avg episode reward: [(0, '27.583')]
[2023-02-24 07:20:54,213][33065] Updated weights for policy 0, policy_version 1886 (0.0026)
[2023-02-24 07:20:55,614][00368] Fps is (10 sec: 2868.6, 60 sec: 3140.3, 300 sec: 3026.9). Total num frames: 7729152. Throughput: 0: 819.0. Samples: 431766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:20:55,622][00368] Avg episode reward: [(0, '26.598')]
[2023-02-24 07:21:00,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3054.7). Total num frames: 7749632. Throughput: 0: 819.0. Samples: 434852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:21:00,621][00368] Avg episode reward: [(0, '26.389')]
[2023-02-24 07:21:04,372][33065] Updated weights for policy 0, policy_version 1896 (0.0013)
[2023-02-24 07:21:05,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3068.5). Total num frames: 7766016. Throughput: 0: 821.9. Samples: 440794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:21:05,622][00368] Avg episode reward: [(0, '26.242')]
[2023-02-24 07:21:10,620][00368] Fps is (10 sec: 3274.8, 60 sec: 3344.7, 300 sec: 3096.2). Total num frames: 7782400. Throughput: 0: 825.7. Samples: 445002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 07:21:10,627][00368] Avg episode reward: [(0, '26.366')]
[2023-02-24 07:21:15,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3096.3). Total num frames: 7794688. Throughput: 0: 824.2. Samples: 447034.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:21:15,628][00368] Avg episode reward: [(0, '25.883')]
[2023-02-24 07:21:18,561][33065] Updated weights for policy 0, policy_version 1906 (0.0037)
[2023-02-24 07:21:20,614][00368] Fps is (10 sec: 2868.9, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 7811072. Throughput: 0: 820.9. Samples: 452024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:21:20,616][00368] Avg episode reward: [(0, '26.273')]
[2023-02-24 07:21:25,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3151.9). Total num frames: 7831552. Throughput: 0: 829.1. Samples: 458290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:21:25,620][00368] Avg episode reward: [(0, '26.736')]
[2023-02-24 07:21:29,272][33065] Updated weights for policy 0, policy_version 1916 (0.0014)
[2023-02-24 07:21:30,618][00368] Fps is (10 sec: 3684.8, 60 sec: 3344.9, 300 sec: 3165.7). Total num frames: 7847936. Throughput: 0: 833.7. Samples: 460752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:21:30,621][00368] Avg episode reward: [(0, '27.852')]
[2023-02-24 07:21:35,614][00368] Fps is (10 sec: 3276.5, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 7864320. Throughput: 0: 833.6. Samples: 464842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:21:35,618][00368] Avg episode reward: [(0, '27.614')]
[2023-02-24 07:21:40,614][00368] Fps is (10 sec: 3278.2, 60 sec: 3276.8, 300 sec: 3165.7). Total num frames: 7880704. Throughput: 0: 836.6. Samples: 469414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:21:40,619][00368] Avg episode reward: [(0, '27.718')]
[2023-02-24 07:21:42,451][33065] Updated weights for policy 0, policy_version 1926 (0.0025)
[2023-02-24 07:21:45,614][00368] Fps is (10 sec: 3686.7, 60 sec: 3345.3, 300 sec: 3179.6). Total num frames: 7901184. Throughput: 0: 835.6. Samples: 472452.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:21:45,616][00368] Avg episode reward: [(0, '26.564')]
[2023-02-24 07:21:50,614][00368] Fps is (10 sec: 4095.7, 60 sec: 3481.6, 300 sec: 3193.5). Total num frames: 7921664. Throughput: 0: 842.6. Samples: 478712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:21:50,617][00368] Avg episode reward: [(0, '25.787')]
[2023-02-24 07:21:53,937][33065] Updated weights for policy 0, policy_version 1936 (0.0024)
[2023-02-24 07:21:55,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3193.5). Total num frames: 7933952. Throughput: 0: 840.4. Samples: 482814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 07:21:55,617][00368] Avg episode reward: [(0, '24.686')]
[2023-02-24 07:22:00,614][00368] Fps is (10 sec: 2457.8, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 7946240. Throughput: 0: 840.7. Samples: 484864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:22:00,616][00368] Avg episode reward: [(0, '23.746')]
[2023-02-24 07:22:05,615][00368] Fps is (10 sec: 2866.7, 60 sec: 3276.7, 300 sec: 3165.7). Total num frames: 7962624. Throughput: 0: 834.5. Samples: 489578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:22:05,623][00368] Avg episode reward: [(0, '23.526')]
[2023-02-24 07:22:06,739][33065] Updated weights for policy 0, policy_version 1946 (0.0040)
[2023-02-24 07:22:10,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3413.7, 300 sec: 3193.5). Total num frames: 7987200. Throughput: 0: 833.1. Samples: 495778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:22:10,617][00368] Avg episode reward: [(0, '24.142')]
[2023-02-24 07:22:15,615][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.2, 300 sec: 3193.5). Total num frames: 7999488. Throughput: 0: 835.2. Samples: 498336.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:22:15,622][00368] Avg episode reward: [(0, '25.056')]
[2023-02-24 07:22:19,082][33065] Updated weights for policy 0, policy_version 1956 (0.0045)
[2023-02-24 07:22:20,614][00368] Fps is (10 sec: 2866.9, 60 sec: 3413.3, 300 sec: 3207.4). Total num frames: 8015872. Throughput: 0: 835.2. Samples: 502424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:22:20,621][00368] Avg episode reward: [(0, '24.875')]
[2023-02-24 07:22:25,614][00368] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 8028160. Throughput: 0: 830.8. Samples: 506798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:22:25,617][00368] Avg episode reward: [(0, '24.619')]
[2023-02-24 07:22:30,614][00368] Fps is (10 sec: 3277.1, 60 sec: 3345.3, 300 sec: 3179.6). Total num frames: 8048640. Throughput: 0: 833.3. Samples: 509952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:22:30,617][00368] Avg episode reward: [(0, '26.219')]
[2023-02-24 07:22:30,655][33065] Updated weights for policy 0, policy_version 1966 (0.0014)
[2023-02-24 07:22:35,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3207.4). Total num frames: 8069120. Throughput: 0: 829.4. Samples: 516034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:22:35,616][00368] Avg episode reward: [(0, '25.861')]
[2023-02-24 07:22:35,634][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001970_8069120.pth...
[2023-02-24 07:22:35,830][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001781_7294976.pth
[2023-02-24 07:22:40,620][00368] Fps is (10 sec: 3274.9, 60 sec: 3344.7, 300 sec: 3207.3). Total num frames: 8081408. Throughput: 0: 827.2. Samples: 520042.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:22:40,625][00368] Avg episode reward: [(0, '26.879')]
[2023-02-24 07:22:44,328][33065] Updated weights for policy 0, policy_version 1976 (0.0018)
[2023-02-24 07:22:45,616][00368] Fps is (10 sec: 2456.9, 60 sec: 3208.4, 300 sec: 3193.5). Total num frames: 8093696. Throughput: 0: 825.7. Samples: 522024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:22:45,619][00368] Avg episode reward: [(0, '26.310')]
[2023-02-24 07:22:50,614][00368] Fps is (10 sec: 3278.7, 60 sec: 3208.6, 300 sec: 3193.5). Total num frames: 8114176. Throughput: 0: 826.3. Samples: 526760. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:22:50,622][00368] Avg episode reward: [(0, '25.863')]
[2023-02-24 07:22:55,144][33065] Updated weights for policy 0, policy_version 1986 (0.0026)
[2023-02-24 07:22:55,614][00368] Fps is (10 sec: 4097.1, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 8134656. Throughput: 0: 826.5. Samples: 532970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:22:55,618][00368] Avg episode reward: [(0, '26.705')]
[2023-02-24 07:23:00,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3221.3). Total num frames: 8151040. Throughput: 0: 830.6. Samples: 535712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:23:00,621][00368] Avg episode reward: [(0, '25.948')]
[2023-02-24 07:23:05,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.2, 300 sec: 3221.3). Total num frames: 8163328. Throughput: 0: 828.9. Samples: 539722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:23:05,617][00368] Avg episode reward: [(0, '26.380')]
[2023-02-24 07:23:10,614][00368] Fps is (10 sec: 2048.0, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 8171520. Throughput: 0: 805.1. Samples: 543028.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:23:10,621][00368] Avg episode reward: [(0, '27.647')]
[2023-02-24 07:23:10,958][33065] Updated weights for policy 0, policy_version 1996 (0.0013)
[2023-02-24 07:23:15,614][00368] Fps is (10 sec: 2457.6, 60 sec: 3140.4, 300 sec: 3165.7). Total num frames: 8187904. Throughput: 0: 776.4. Samples: 544892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:23:15,616][00368] Avg episode reward: [(0, '28.491')]
[2023-02-24 07:23:15,632][33049] Saving new best policy, reward=28.491!
[2023-02-24 07:23:20,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3151.9). Total num frames: 8200192. Throughput: 0: 729.9. Samples: 548880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:23:20,616][00368] Avg episode reward: [(0, '26.371')]
[2023-02-24 07:23:25,241][33065] Updated weights for policy 0, policy_version 2006 (0.0033)
[2023-02-24 07:23:25,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3165.8). Total num frames: 8216576. Throughput: 0: 748.3. Samples: 553712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:23:25,625][00368] Avg episode reward: [(0, '26.134')]
[2023-02-24 07:23:30,614][00368] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 8228864. Throughput: 0: 750.0. Samples: 555770. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:23:30,617][00368] Avg episode reward: [(0, '25.152')]
[2023-02-24 07:23:35,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3151.8). Total num frames: 8245248. Throughput: 0: 738.7. Samples: 560000. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:23:35,617][00368] Avg episode reward: [(0, '24.721')]
[2023-02-24 07:23:37,887][33065] Updated weights for policy 0, policy_version 2016 (0.0017)
[2023-02-24 07:23:40,614][00368] Fps is (10 sec: 3686.5, 60 sec: 3072.3, 300 sec: 3151.8). Total num frames: 8265728. Throughput: 0: 736.4. Samples: 566106.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:23:40,617][00368] Avg episode reward: [(0, '24.783')]
[2023-02-24 07:23:45,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3140.4, 300 sec: 3165.7). Total num frames: 8282112. Throughput: 0: 742.5. Samples: 569124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:23:45,616][00368] Avg episode reward: [(0, '23.899')]
[2023-02-24 07:23:49,753][33065] Updated weights for policy 0, policy_version 2026 (0.0032)
[2023-02-24 07:23:50,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 8298496. Throughput: 0: 751.7. Samples: 573548. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:23:50,621][00368] Avg episode reward: [(0, '23.764')]
[2023-02-24 07:23:55,614][00368] Fps is (10 sec: 2867.1, 60 sec: 2935.5, 300 sec: 3165.7). Total num frames: 8310784. Throughput: 0: 770.0. Samples: 577680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:23:55,622][00368] Avg episode reward: [(0, '22.879')]
[2023-02-24 07:24:00,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3165.7). Total num frames: 8331264. Throughput: 0: 784.3. Samples: 580184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:24:00,620][00368] Avg episode reward: [(0, '23.111')]
[2023-02-24 07:24:02,187][33065] Updated weights for policy 0, policy_version 2036 (0.0017)
[2023-02-24 07:24:05,614][00368] Fps is (10 sec: 4096.1, 60 sec: 3140.3, 300 sec: 3179.6). Total num frames: 8351744. Throughput: 0: 830.2. Samples: 586240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:24:05,621][00368] Avg episode reward: [(0, '23.611')]
[2023-02-24 07:24:10,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 8364032. Throughput: 0: 830.8. Samples: 591098.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:24:10,626][00368] Avg episode reward: [(0, '23.845')]
[2023-02-24 07:24:15,171][33065] Updated weights for policy 0, policy_version 2046 (0.0027)
[2023-02-24 07:24:15,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 8380416. Throughput: 0: 827.6. Samples: 593014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:24:15,622][00368] Avg episode reward: [(0, '22.752')]
[2023-02-24 07:24:20,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 8392704. Throughput: 0: 823.3. Samples: 597050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:24:20,617][00368] Avg episode reward: [(0, '23.709')]
[2023-02-24 07:24:25,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3179.6). Total num frames: 8413184. Throughput: 0: 817.9. Samples: 602912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:24:25,620][00368] Avg episode reward: [(0, '25.414')]
[2023-02-24 07:24:26,990][33065] Updated weights for policy 0, policy_version 2056 (0.0032)
[2023-02-24 07:24:30,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 8429568. Throughput: 0: 816.0. Samples: 605842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:24:30,620][00368] Avg episode reward: [(0, '26.899')]
[2023-02-24 07:24:35,618][00368] Fps is (10 sec: 3275.5, 60 sec: 3344.8, 300 sec: 3221.2). Total num frames: 8445952. Throughput: 0: 812.2. Samples: 610098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:24:35,621][00368] Avg episode reward: [(0, '27.059')]
[2023-02-24 07:24:35,635][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002062_8445952.pth...
[2023-02-24 07:24:35,924][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001872_7667712.pth
[2023-02-24 07:24:40,616][00368] Fps is (10 sec: 2866.6, 60 sec: 3208.4, 300 sec: 3221.3). Total num frames: 8458240. Throughput: 0: 807.9. Samples: 614036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:24:40,620][00368] Avg episode reward: [(0, '27.204')]
[2023-02-24 07:24:41,659][33065] Updated weights for policy 0, policy_version 2066 (0.0032)
[2023-02-24 07:24:45,614][00368] Fps is (10 sec: 2868.3, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 8474624. Throughput: 0: 810.1. Samples: 616638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:24:45,616][00368] Avg episode reward: [(0, '27.696')]
[2023-02-24 07:24:50,614][00368] Fps is (10 sec: 3687.1, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 8495104. Throughput: 0: 809.4. Samples: 622664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:24:50,617][00368] Avg episode reward: [(0, '29.218')]
[2023-02-24 07:24:50,676][33049] Saving new best policy, reward=29.218!
[2023-02-24 07:24:51,934][33065] Updated weights for policy 0, policy_version 2076 (0.0014)
[2023-02-24 07:24:55,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 8511488. Throughput: 0: 808.3. Samples: 627472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:24:55,622][00368] Avg episode reward: [(0, '28.615')]
[2023-02-24 07:25:00,619][00368] Fps is (10 sec: 2865.8, 60 sec: 3208.3, 300 sec: 3249.0). Total num frames: 8523776. Throughput: 0: 811.9. Samples: 629552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:25:00,623][00368] Avg episode reward: [(0, '27.750')]
[2023-02-24 07:25:05,616][00368] Fps is (10 sec: 2866.4, 60 sec: 3140.1, 300 sec: 3249.0). Total num frames: 8540160. Throughput: 0: 815.5. Samples: 633752.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:25:05,619][00368] Avg episode reward: [(0, '27.648')]
[2023-02-24 07:25:06,032][33065] Updated weights for policy 0, policy_version 2086 (0.0015)
[2023-02-24 07:25:10,614][00368] Fps is (10 sec: 3688.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 8560640. Throughput: 0: 821.3. Samples: 639870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:25:10,617][00368] Avg episode reward: [(0, '27.554')]
[2023-02-24 07:25:15,614][00368] Fps is (10 sec: 3687.4, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 8577024. Throughput: 0: 819.7. Samples: 642728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:25:15,620][00368] Avg episode reward: [(0, '27.894')]
[2023-02-24 07:25:17,242][33065] Updated weights for policy 0, policy_version 2096 (0.0020)
[2023-02-24 07:25:20,617][00368] Fps is (10 sec: 3275.5, 60 sec: 3344.9, 300 sec: 3262.9). Total num frames: 8593408. Throughput: 0: 823.2. Samples: 647142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:25:20,620][00368] Avg episode reward: [(0, '27.664')]
[2023-02-24 07:25:25,620][00368] Fps is (10 sec: 2865.5, 60 sec: 3208.2, 300 sec: 3249.0). Total num frames: 8605696. Throughput: 0: 832.7. Samples: 651510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:25:25,625][00368] Avg episode reward: [(0, '27.278')]
[2023-02-24 07:25:29,789][33065] Updated weights for policy 0, policy_version 2106 (0.0039)
[2023-02-24 07:25:30,614][00368] Fps is (10 sec: 3278.1, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 8626176. Throughput: 0: 836.8. Samples: 654296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:25:30,621][00368] Avg episode reward: [(0, '27.389')]
[2023-02-24 07:25:35,614][00368] Fps is (10 sec: 4098.5, 60 sec: 3345.3, 300 sec: 3262.9). Total num frames: 8646656. Throughput: 0: 841.9. Samples: 660550.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:25:35,616][00368] Avg episode reward: [(0, '28.735')]
[2023-02-24 07:25:40,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3263.0). Total num frames: 8663040. Throughput: 0: 843.4. Samples: 665424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:25:40,619][00368] Avg episode reward: [(0, '30.010')]
[2023-02-24 07:25:40,624][33049] Saving new best policy, reward=30.010!
[2023-02-24 07:25:41,373][33065] Updated weights for policy 0, policy_version 2116 (0.0013)
[2023-02-24 07:25:45,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3262.9). Total num frames: 8675328. Throughput: 0: 842.8. Samples: 667474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:25:45,618][00368] Avg episode reward: [(0, '29.730')]
[2023-02-24 07:25:50,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 8695808. Throughput: 0: 848.6. Samples: 671936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:25:50,616][00368] Avg episode reward: [(0, '28.031')]
[2023-02-24 07:25:53,706][33065] Updated weights for policy 0, policy_version 2126 (0.0024)
[2023-02-24 07:25:55,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 8716288. Throughput: 0: 849.0. Samples: 678074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:25:55,617][00368] Avg episode reward: [(0, '28.286')]
[2023-02-24 07:26:00,614][00368] Fps is (10 sec: 3686.1, 60 sec: 3481.9, 300 sec: 3276.8). Total num frames: 8732672. Throughput: 0: 854.5. Samples: 681180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:26:00,617][00368] Avg episode reward: [(0, '29.183')]
[2023-02-24 07:26:05,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3413.5, 300 sec: 3263.0). Total num frames: 8744960. Throughput: 0: 852.7. Samples: 685512.
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:26:05,619][00368] Avg episode reward: [(0, '28.367')]
[2023-02-24 07:26:05,847][33065] Updated weights for policy 0, policy_version 2136 (0.0015)
[2023-02-24 07:26:10,615][00368] Fps is (10 sec: 2866.8, 60 sec: 3345.0, 300 sec: 3276.8). Total num frames: 8761344. Throughput: 0: 847.6. Samples: 689650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:26:10,621][00368] Avg episode reward: [(0, '28.290')]
[2023-02-24 07:26:15,614][00368] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 8777728. Throughput: 0: 847.0. Samples: 692412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:26:15,622][00368] Avg episode reward: [(0, '28.480')]
[2023-02-24 07:26:17,650][33065] Updated weights for policy 0, policy_version 2146 (0.0019)
[2023-02-24 07:26:20,614][00368] Fps is (10 sec: 4096.8, 60 sec: 3481.8, 300 sec: 3290.7). Total num frames: 8802304. Throughput: 0: 848.4. Samples: 698726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:26:20,617][00368] Avg episode reward: [(0, '29.012')]
[2023-02-24 07:26:25,618][00368] Fps is (10 sec: 3684.6, 60 sec: 3481.7, 300 sec: 3276.8). Total num frames: 8814592. Throughput: 0: 846.0. Samples: 703500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:26:25,622][00368] Avg episode reward: [(0, '30.116')]
[2023-02-24 07:26:25,635][33049] Saving new best policy, reward=30.116!
[2023-02-24 07:26:30,616][00368] Fps is (10 sec: 2456.9, 60 sec: 3344.9, 300 sec: 3262.9). Total num frames: 8826880. Throughput: 0: 843.0. Samples: 705410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:26:30,619][00368] Avg episode reward: [(0, '30.239')]
[2023-02-24 07:26:30,625][33049] Saving new best policy, reward=30.239!
[2023-02-24 07:26:31,144][33065] Updated weights for policy 0, policy_version 2156 (0.0019)
[2023-02-24 07:26:35,614][00368] Fps is (10 sec: 2458.8, 60 sec: 3208.5, 300 sec: 3249.0).
Total num frames: 8839168. Throughput: 0: 825.2. Samples: 709070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:26:35,617][00368] Avg episode reward: [(0, '29.367')]
[2023-02-24 07:26:35,636][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002158_8839168.pth...
[2023-02-24 07:26:35,906][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001970_8069120.pth
[2023-02-24 07:26:40,615][00368] Fps is (10 sec: 2457.8, 60 sec: 3140.2, 300 sec: 3221.2). Total num frames: 8851456. Throughput: 0: 777.2. Samples: 713048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 07:26:40,618][00368] Avg episode reward: [(0, '28.972')]
[2023-02-24 07:26:45,614][00368] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 8867840. Throughput: 0: 753.3. Samples: 715080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:26:45,616][00368] Avg episode reward: [(0, '29.192')]
[2023-02-24 07:26:46,981][33065] Updated weights for policy 0, policy_version 2166 (0.0031)
[2023-02-24 07:26:50,617][00368] Fps is (10 sec: 2866.6, 60 sec: 3071.8, 300 sec: 3207.3). Total num frames: 8880128. Throughput: 0: 747.0. Samples: 719132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:26:50,624][00368] Avg episode reward: [(0, '27.113')]
[2023-02-24 07:26:55,614][00368] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 3207.4). Total num frames: 8892416. Throughput: 0: 741.1. Samples: 722998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:26:55,622][00368] Avg episode reward: [(0, '27.096')]
[2023-02-24 07:27:00,276][33065] Updated weights for policy 0, policy_version 2176 (0.0018)
[2023-02-24 07:27:00,614][00368] Fps is (10 sec: 3278.1, 60 sec: 3003.8, 300 sec: 3221.3). Total num frames: 8912896. Throughput: 0: 734.8. Samples: 725480.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:27:00,616][00368] Avg episode reward: [(0, '26.663')]
[2023-02-24 07:27:05,614][00368] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 8933376. Throughput: 0: 736.0. Samples: 731848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:27:05,621][00368] Avg episode reward: [(0, '24.941')]
[2023-02-24 07:27:10,614][00368] Fps is (10 sec: 3686.4, 60 sec: 3140.4, 300 sec: 3221.3). Total num frames: 8949760. Throughput: 0: 742.3. Samples: 736902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:27:10,619][00368] Avg episode reward: [(0, '24.546')]
[2023-02-24 07:27:11,869][33065] Updated weights for policy 0, policy_version 2186 (0.0018)
[2023-02-24 07:27:15,616][00368] Fps is (10 sec: 2866.6, 60 sec: 3071.9, 300 sec: 3207.4). Total num frames: 8962048. Throughput: 0: 745.1. Samples: 738938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-24 07:27:15,624][00368] Avg episode reward: [(0, '24.662')]
[2023-02-24 07:27:20,614][00368] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3221.3). Total num frames: 8978432. Throughput: 0: 753.7. Samples: 742988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 07:27:20,616][00368] Avg episode reward: [(0, '24.851')]
[2023-02-24 07:27:24,098][33065] Updated weights for policy 0, policy_version 2196 (0.0027)
[2023-02-24 07:27:25,614][00368] Fps is (10 sec: 3687.2, 60 sec: 3072.2, 300 sec: 3221.3). Total num frames: 8998912. Throughput: 0: 805.1. Samples: 749278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 07:27:25,617][00368] Avg episode reward: [(0, '23.469')]
[2023-02-24 07:27:27,029][33049] Stopping Batcher_0...
[2023-02-24 07:27:27,029][33049] Loop batcher_evt_loop terminating...
[2023-02-24 07:27:27,030][00368] Component Batcher_0 stopped!
[2023-02-24 07:27:27,038][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth...
[2023-02-24 07:27:27,096][33065] Weights refcount: 2 0
[2023-02-24 07:27:27,133][33065] Stopping InferenceWorker_p0-w0...
[2023-02-24 07:27:27,135][33065] Loop inference_proc0-0_evt_loop terminating...
[2023-02-24 07:27:27,137][00368] Component InferenceWorker_p0-w0 stopped!
[2023-02-24 07:27:27,177][33049] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002062_8445952.pth
[2023-02-24 07:27:27,189][33049] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth...
[2023-02-24 07:27:27,297][00368] Component LearnerWorker_p0 stopped!
[2023-02-24 07:27:27,302][33049] Stopping LearnerWorker_p0...
[2023-02-24 07:27:27,303][33049] Loop learner_proc0_evt_loop terminating...
[2023-02-24 07:27:27,333][00368] Component RolloutWorker_w1 stopped!
[2023-02-24 07:27:27,346][33086] Stopping RolloutWorker_w6...
[2023-02-24 07:27:27,346][00368] Component RolloutWorker_w6 stopped!
[2023-02-24 07:27:27,361][33066] Stopping RolloutWorker_w0...
[2023-02-24 07:27:27,360][00368] Component RolloutWorker_w7 stopped!
[2023-02-24 07:27:27,362][33096] Stopping RolloutWorker_w8...
[2023-02-24 07:27:27,363][33086] Loop rollout_proc6_evt_loop terminating...
[2023-02-24 07:27:27,363][33096] Loop rollout_proc8_evt_loop terminating...
[2023-02-24 07:27:27,364][33076] Stopping RolloutWorker_w2...
[2023-02-24 07:27:27,368][33066] Loop rollout_proc0_evt_loop terminating...
[2023-02-24 07:27:27,364][00368] Component RolloutWorker_w0 stopped!
[2023-02-24 07:27:27,368][33076] Loop rollout_proc2_evt_loop terminating...
[2023-02-24 07:27:27,369][00368] Component RolloutWorker_w8 stopped!
[2023-02-24 07:27:27,371][00368] Component RolloutWorker_w2 stopped!
[2023-02-24 07:27:27,375][33078] Stopping RolloutWorker_w4...
[2023-02-24 07:27:27,376][33078] Loop rollout_proc4_evt_loop terminating...
[2023-02-24 07:27:27,375][00368] Component RolloutWorker_w4 stopped!
[2023-02-24 07:27:27,339][33068] Stopping RolloutWorker_w1...
[2023-02-24 07:27:27,381][33068] Loop rollout_proc1_evt_loop terminating...
[2023-02-24 07:27:27,382][33088] Stopping RolloutWorker_w7...
[2023-02-24 07:27:27,427][00368] Component RolloutWorker_w5 stopped!
[2023-02-24 07:27:27,427][33080] Stopping RolloutWorker_w5...
[2023-02-24 07:27:27,435][33080] Loop rollout_proc5_evt_loop terminating...
[2023-02-24 07:27:27,424][33088] Loop rollout_proc7_evt_loop terminating...
[2023-02-24 07:27:27,454][00368] Component RolloutWorker_w9 stopped!
[2023-02-24 07:27:27,462][33090] Stopping RolloutWorker_w9...
[2023-02-24 07:27:27,463][33090] Loop rollout_proc9_evt_loop terminating...
[2023-02-24 07:27:27,465][33074] Stopping RolloutWorker_w3...
[2023-02-24 07:27:27,466][33074] Loop rollout_proc3_evt_loop terminating...
[2023-02-24 07:27:27,465][00368] Component RolloutWorker_w3 stopped!
[2023-02-24 07:27:27,474][00368] Waiting for process learner_proc0 to stop...
[2023-02-24 07:27:31,547][00368] Waiting for process inference_proc0-0 to join...
[2023-02-24 07:27:31,550][00368] Waiting for process rollout_proc0 to join...
[2023-02-24 07:27:31,552][00368] Waiting for process rollout_proc1 to join...
[2023-02-24 07:27:31,792][00368] Waiting for process rollout_proc2 to join...
[2023-02-24 07:27:31,794][00368] Waiting for process rollout_proc3 to join...
[2023-02-24 07:27:31,796][00368] Waiting for process rollout_proc4 to join...
[2023-02-24 07:27:31,797][00368] Waiting for process rollout_proc5 to join...
[2023-02-24 07:27:31,800][00368] Waiting for process rollout_proc6 to join...
[2023-02-24 07:27:31,801][00368] Waiting for process rollout_proc7 to join...
[2023-02-24 07:27:31,803][00368] Waiting for process rollout_proc8 to join...
[2023-02-24 07:27:31,804][00368] Waiting for process rollout_proc9 to join...
[2023-02-24 07:27:31,806][00368] Batcher 0 profile tree view:
batching: 24.6208, releasing_batches: 0.0252
[2023-02-24 07:27:31,807][00368] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 558.4279
update_model: 6.0949
  weight_update: 0.0027
one_step: 0.0171
  handle_policy_step: 384.4165
    deserialize: 12.4442, stack: 2.1662, obs_to_device_normalize: 83.2904, forward: 187.7970, send_messages: 23.4736
    prepare_outputs: 56.4492
      to_cpu: 33.7584
[2023-02-24 07:27:31,808][00368] Learner 0 profile tree view:
misc: 0.0046, prepare_batch: 18.2454
train: 66.3601
  epoch_init: 0.0095, minibatch_init: 0.0057, losses_postprocess: 0.5073, kl_divergence: 0.4866, after_optimizer: 3.2433
  calculate_losses: 22.2203
    losses_init: 0.0029, forward_head: 1.8005, bptt_initial: 13.9169, tail: 1.0846, advantages_returns: 0.2207, losses: 2.8907
    bptt: 1.9802
      bptt_forward_core: 1.9091
  update: 39.1320
    clip: 1.2104
[2023-02-24 07:27:31,810][00368] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4121, enqueue_policy_requests: 164.0646, env_step: 693.8239, overhead: 21.4233, complete_rollouts: 5.9541
save_policy_outputs: 18.7681
  split_output_tensors: 9.0038
[2023-02-24 07:27:31,812][00368] RolloutWorker_w9 profile tree view:
wait_for_trajectories: 0.2629, enqueue_policy_requests: 162.6794, env_step: 689.8885, overhead: 20.9270, complete_rollouts: 5.9544
save_policy_outputs: 18.2668
  split_output_tensors: 9.1414
[2023-02-24 07:27:31,814][00368] Loop Runner_EvtLoop terminating...
[2023-02-24 07:27:31,817][00368] Runner profile tree view:
main_loop: 1012.2451
[2023-02-24 07:27:31,818][00368] Collected {0: 9007104}, FPS: 2966.0
[2023-02-24 07:27:31,906][00368] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-24 07:27:31,908][00368] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-24 07:27:31,909][00368] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-24 07:27:31,911][00368] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-24 07:27:31,912][00368] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-24 07:27:31,915][00368] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-24 07:27:31,916][00368] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-24 07:27:31,919][00368] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-24 07:27:31,921][00368] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-24 07:27:31,922][00368] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-24 07:27:31,924][00368] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-24 07:27:31,926][00368] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-24 07:27:31,929][00368] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-24 07:27:31,930][00368] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-24 07:27:31,932][00368] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-24 07:27:31,971][00368] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:27:31,979][00368] RunningMeanStd input shape: (1,)
[2023-02-24 07:27:32,011][00368] ConvEncoder: input_channels=3
[2023-02-24 07:27:32,199][00368] Conv encoder output size: 512
[2023-02-24 07:27:32,202][00368] Policy head output size: 512
[2023-02-24 07:27:32,328][00368] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth...
[2023-02-24 07:27:33,040][00368] Num frames 100...
[2023-02-24 07:27:33,203][00368] Num frames 200...
[2023-02-24 07:27:33,382][00368] Num frames 300...
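The "Overriding arg ..." and "Adding new argument ... not in the saved config file!" messages above come from merging command-line values into the configuration loaded from config.json: known keys are overridden, unknown keys are added with a warning. A minimal sketch of that merge (hypothetical function, with messages modeled on the log output; not the library's actual code):

```python
def merge_cli_overrides(saved_cfg, cli_args):
    """Merge CLI arguments into a config dict loaded from config.json.

    Keys already present in the saved config are overridden; keys the
    saved config does not know are added with a warning message.
    Returns the merged config and the list of log messages.
    """
    cfg = dict(saved_cfg)
    messages = []
    for key, value in cli_args.items():
        if key in cfg:
            messages.append(
                f"Overriding arg '{key}' with value {value} passed from command line")
        else:
            messages.append(
                f"Adding new argument '{key}'={value} that is not in the saved config file!")
        cfg[key] = value
    return cfg, messages
```

This is why evaluation-only flags such as `no_render` or `push_to_hub` show up as "new" arguments: they were never part of the training-time config that was saved to disk.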
[2023-02-24 07:27:33,556][00368] Num frames 400...
[2023-02-24 07:27:33,723][00368] Num frames 500...
[2023-02-24 07:27:33,892][00368] Num frames 600...
[2023-02-24 07:27:34,063][00368] Avg episode rewards: #0: 16.690, true rewards: #0: 6.690
[2023-02-24 07:27:34,065][00368] Avg episode reward: 16.690, avg true_objective: 6.690
[2023-02-24 07:27:34,107][00368] Num frames 700...
[2023-02-24 07:27:34,226][00368] Num frames 800...
[2023-02-24 07:27:34,363][00368] Num frames 900...
[2023-02-24 07:27:34,491][00368] Num frames 1000...
[2023-02-24 07:27:34,610][00368] Num frames 1100...
[2023-02-24 07:27:34,724][00368] Num frames 1200...
[2023-02-24 07:27:34,843][00368] Num frames 1300...
[2023-02-24 07:27:34,964][00368] Num frames 1400...
[2023-02-24 07:27:35,087][00368] Num frames 1500...
[2023-02-24 07:27:35,203][00368] Num frames 1600...
[2023-02-24 07:27:35,318][00368] Num frames 1700...
[2023-02-24 07:27:35,442][00368] Num frames 1800...
[2023-02-24 07:27:35,567][00368] Num frames 1900...
[2023-02-24 07:27:35,683][00368] Num frames 2000...
[2023-02-24 07:27:35,814][00368] Avg episode rewards: #0: 26.830, true rewards: #0: 10.330
[2023-02-24 07:27:35,816][00368] Avg episode reward: 26.830, avg true_objective: 10.330
[2023-02-24 07:27:35,867][00368] Num frames 2100...
[2023-02-24 07:27:35,995][00368] Num frames 2200...
[2023-02-24 07:27:36,128][00368] Num frames 2300...
[2023-02-24 07:27:36,258][00368] Num frames 2400...
[2023-02-24 07:27:36,436][00368] Avg episode rewards: #0: 20.323, true rewards: #0: 8.323
[2023-02-24 07:27:36,440][00368] Avg episode reward: 20.323, avg true_objective: 8.323
[2023-02-24 07:27:36,446][00368] Num frames 2500...
[2023-02-24 07:27:36,582][00368] Num frames 2600...
[2023-02-24 07:27:36,705][00368] Num frames 2700...
[2023-02-24 07:27:36,826][00368] Num frames 2800...
[2023-02-24 07:27:36,943][00368] Num frames 2900...
[2023-02-24 07:27:37,058][00368] Num frames 3000...
[2023-02-24 07:27:37,174][00368] Num frames 3100...
[2023-02-24 07:27:37,299][00368] Num frames 3200...
[2023-02-24 07:27:37,420][00368] Num frames 3300...
[2023-02-24 07:27:37,539][00368] Num frames 3400...
[2023-02-24 07:27:37,658][00368] Num frames 3500...
[2023-02-24 07:27:37,775][00368] Num frames 3600...
[2023-02-24 07:27:37,888][00368] Num frames 3700...
[2023-02-24 07:27:38,012][00368] Num frames 3800...
[2023-02-24 07:27:38,142][00368] Avg episode rewards: #0: 24.150, true rewards: #0: 9.650
[2023-02-24 07:27:38,143][00368] Avg episode reward: 24.150, avg true_objective: 9.650
[2023-02-24 07:27:38,200][00368] Num frames 3900...
[2023-02-24 07:27:38,323][00368] Num frames 4000...
[2023-02-24 07:27:38,449][00368] Num frames 4100...
[2023-02-24 07:27:38,569][00368] Num frames 4200...
[2023-02-24 07:27:38,710][00368] Avg episode rewards: #0: 20.552, true rewards: #0: 8.552
[2023-02-24 07:27:38,711][00368] Avg episode reward: 20.552, avg true_objective: 8.552
[2023-02-24 07:27:38,746][00368] Num frames 4300...
[2023-02-24 07:27:38,862][00368] Num frames 4400...
[2023-02-24 07:27:38,984][00368] Num frames 4500...
[2023-02-24 07:27:39,101][00368] Num frames 4600...
[2023-02-24 07:27:39,217][00368] Num frames 4700...
[2023-02-24 07:27:39,338][00368] Num frames 4800...
[2023-02-24 07:27:39,462][00368] Num frames 4900...
[2023-02-24 07:27:39,576][00368] Num frames 5000...
[2023-02-24 07:27:39,697][00368] Num frames 5100...
[2023-02-24 07:27:39,818][00368] Num frames 5200...
[2023-02-24 07:27:39,945][00368] Num frames 5300...
[2023-02-24 07:27:40,067][00368] Num frames 5400...
[2023-02-24 07:27:40,200][00368] Avg episode rewards: #0: 21.102, true rewards: #0: 9.102
[2023-02-24 07:27:40,202][00368] Avg episode reward: 21.102, avg true_objective: 9.102
[2023-02-24 07:27:40,260][00368] Num frames 5500...
[2023-02-24 07:27:40,411][00368] Num frames 5600...
[2023-02-24 07:27:40,536][00368] Num frames 5700...
[2023-02-24 07:27:40,659][00368] Num frames 5800...
[2023-02-24 07:27:40,786][00368] Num frames 5900...
[2023-02-24 07:27:40,905][00368] Num frames 6000...
[2023-02-24 07:27:41,025][00368] Num frames 6100...
[2023-02-24 07:27:41,144][00368] Num frames 6200...
[2023-02-24 07:27:41,262][00368] Num frames 6300...
[2023-02-24 07:27:41,385][00368] Num frames 6400...
[2023-02-24 07:27:41,518][00368] Num frames 6500...
[2023-02-24 07:27:41,641][00368] Num frames 6600...
[2023-02-24 07:27:41,763][00368] Num frames 6700...
[2023-02-24 07:27:41,878][00368] Num frames 6800...
[2023-02-24 07:27:41,995][00368] Num frames 6900...
[2023-02-24 07:27:42,110][00368] Num frames 7000...
[2023-02-24 07:27:42,230][00368] Num frames 7100...
[2023-02-24 07:27:42,348][00368] Num frames 7200...
[2023-02-24 07:27:42,477][00368] Num frames 7300...
[2023-02-24 07:27:42,594][00368] Num frames 7400...
[2023-02-24 07:27:42,708][00368] Num frames 7500...
[2023-02-24 07:27:42,838][00368] Avg episode rewards: #0: 25.944, true rewards: #0: 10.801
[2023-02-24 07:27:42,839][00368] Avg episode reward: 25.944, avg true_objective: 10.801
[2023-02-24 07:27:42,894][00368] Num frames 7600...
[2023-02-24 07:27:43,010][00368] Num frames 7700...
[2023-02-24 07:27:43,127][00368] Num frames 7800...
[2023-02-24 07:27:43,244][00368] Num frames 7900...
[2023-02-24 07:27:43,365][00368] Num frames 8000...
[2023-02-24 07:27:43,484][00368] Num frames 8100...
[2023-02-24 07:27:43,599][00368] Num frames 8200...
[2023-02-24 07:27:43,744][00368] Avg episode rewards: #0: 24.851, true rewards: #0: 10.351
[2023-02-24 07:27:43,746][00368] Avg episode reward: 24.851, avg true_objective: 10.351
[2023-02-24 07:27:43,774][00368] Num frames 8300...
[2023-02-24 07:27:43,895][00368] Num frames 8400...
[2023-02-24 07:27:44,033][00368] Num frames 8500...
[2023-02-24 07:27:44,199][00368] Num frames 8600...
[2023-02-24 07:27:44,363][00368] Num frames 8700...
[2023-02-24 07:27:44,533][00368] Num frames 8800...
[2023-02-24 07:27:44,691][00368] Num frames 8900...
[2023-02-24 07:27:44,849][00368] Num frames 9000...
[2023-02-24 07:27:45,010][00368] Num frames 9100...
[2023-02-24 07:27:45,176][00368] Num frames 9200...
[2023-02-24 07:27:45,337][00368] Num frames 9300...
[2023-02-24 07:27:45,500][00368] Num frames 9400...
[2023-02-24 07:27:45,677][00368] Num frames 9500...
[2023-02-24 07:27:45,839][00368] Num frames 9600...
[2023-02-24 07:27:46,006][00368] Num frames 9700...
[2023-02-24 07:27:46,170][00368] Num frames 9800...
[2023-02-24 07:27:46,334][00368] Num frames 9900...
[2023-02-24 07:27:46,506][00368] Num frames 10000...
[2023-02-24 07:27:46,629][00368] Avg episode rewards: #0: 26.925, true rewards: #0: 11.148
[2023-02-24 07:27:46,631][00368] Avg episode reward: 26.925, avg true_objective: 11.148
[2023-02-24 07:27:46,742][00368] Num frames 10100...
[2023-02-24 07:27:46,910][00368] Num frames 10200...
[2023-02-24 07:27:47,078][00368] Num frames 10300...
[2023-02-24 07:27:47,248][00368] Num frames 10400...
[2023-02-24 07:27:47,418][00368] Num frames 10500...
[2023-02-24 07:27:47,582][00368] Num frames 10600...
[2023-02-24 07:27:47,689][00368] Avg episode rewards: #0: 25.641, true rewards: #0: 10.641
[2023-02-24 07:27:47,691][00368] Avg episode reward: 25.641, avg true_objective: 10.641
[2023-02-24 07:28:59,348][00368] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-24 07:29:27,698][00368] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-24 07:29:27,701][00368] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-24 07:29:27,703][00368] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-24 07:29:27,707][00368] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-24 07:29:27,709][00368] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-24 07:29:27,711][00368] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-24 07:29:27,712][00368] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-24 07:29:27,714][00368] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-24 07:29:27,715][00368] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-24 07:29:27,717][00368] Adding new argument 'hf_repository'='SatCat/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-24 07:29:27,718][00368] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-24 07:29:27,719][00368] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-24 07:29:27,721][00368] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-24 07:29:27,722][00368] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-24 07:29:27,723][00368] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-24 07:29:27,751][00368] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 07:29:27,754][00368] RunningMeanStd input shape: (1,)
[2023-02-24 07:29:27,775][00368] ConvEncoder: input_channels=3
[2023-02-24 07:29:27,811][00368] Conv encoder output size: 512
[2023-02-24 07:29:27,813][00368] Policy head output size: 512
[2023-02-24 07:29:27,833][00368] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002199_9007104.pth...
[2023-02-24 07:29:28,290][00368] Num frames 100...
[2023-02-24 07:29:28,408][00368] Num frames 200...
[2023-02-24 07:29:28,522][00368] Num frames 300...
[2023-02-24 07:29:28,635][00368] Num frames 400...
[2023-02-24 07:29:28,752][00368] Num frames 500...
[2023-02-24 07:29:28,864][00368] Num frames 600...
[2023-02-24 07:29:28,984][00368] Num frames 700...
[2023-02-24 07:29:29,096][00368] Num frames 800...
[2023-02-24 07:29:29,208][00368] Num frames 900...
[2023-02-24 07:29:29,328][00368] Num frames 1000...
[2023-02-24 07:29:29,442][00368] Num frames 1100...
[2023-02-24 07:29:29,560][00368] Num frames 1200...
[2023-02-24 07:29:29,674][00368] Num frames 1300...
[2023-02-24 07:29:29,788][00368] Num frames 1400...
[2023-02-24 07:29:29,901][00368] Num frames 1500...
[2023-02-24 07:29:30,031][00368] Num frames 1600...
[2023-02-24 07:29:30,154][00368] Num frames 1700...
[2023-02-24 07:29:30,277][00368] Num frames 1800...
[2023-02-24 07:29:30,392][00368] Num frames 1900...
[2023-02-24 07:29:30,511][00368] Num frames 2000...
[2023-02-24 07:29:30,627][00368] Num frames 2100...
[2023-02-24 07:29:30,679][00368] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000
[2023-02-24 07:29:30,682][00368] Avg episode reward: 58.999, avg true_objective: 21.000
[2023-02-24 07:29:30,797][00368] Num frames 2200...
[2023-02-24 07:29:30,911][00368] Num frames 2300...
[2023-02-24 07:29:31,025][00368] Num frames 2400...
[2023-02-24 07:29:31,139][00368] Num frames 2500...
[2023-02-24 07:29:31,267][00368] Num frames 2600...
[2023-02-24 07:29:31,392][00368] Num frames 2700...
[2023-02-24 07:29:31,514][00368] Num frames 2800...
[2023-02-24 07:29:31,629][00368] Num frames 2900...
[2023-02-24 07:29:31,721][00368] Avg episode rewards: #0: 39.159, true rewards: #0: 14.660
[2023-02-24 07:29:31,723][00368] Avg episode reward: 39.159, avg true_objective: 14.660
[2023-02-24 07:29:31,810][00368] Num frames 3000...
[2023-02-24 07:29:31,935][00368] Num frames 3100...
[2023-02-24 07:29:32,062][00368] Num frames 3200...
[2023-02-24 07:29:32,176][00368] Num frames 3300...
[2023-02-24 07:29:32,304][00368] Num frames 3400...
[2023-02-24 07:29:32,421][00368] Num frames 3500...
[2023-02-24 07:29:32,549][00368] Avg episode rewards: #0: 30.866, true rewards: #0: 11.867
[2023-02-24 07:29:32,550][00368] Avg episode reward: 30.866, avg true_objective: 11.867
[2023-02-24 07:29:32,605][00368] Num frames 3600...
[2023-02-24 07:29:32,723][00368] Num frames 3700...
[2023-02-24 07:29:32,840][00368] Num frames 3800...
[2023-02-24 07:29:32,963][00368] Num frames 3900...
[2023-02-24 07:29:33,082][00368] Num frames 4000...
[2023-02-24 07:29:33,195][00368] Num frames 4100...
[2023-02-24 07:29:33,317][00368] Num frames 4200...
[2023-02-24 07:29:33,438][00368] Num frames 4300...
[2023-02-24 07:29:33,558][00368] Num frames 4400...
[2023-02-24 07:29:33,672][00368] Num frames 4500...
[2023-02-24 07:29:33,787][00368] Num frames 4600...
[2023-02-24 07:29:33,899][00368] Num frames 4700...
[2023-02-24 07:29:34,018][00368] Num frames 4800...
[2023-02-24 07:29:34,134][00368] Num frames 4900...
[2023-02-24 07:29:34,250][00368] Num frames 5000...
[2023-02-24 07:29:34,382][00368] Num frames 5100...
[2023-02-24 07:29:34,501][00368] Num frames 5200...
[2023-02-24 07:29:34,621][00368] Num frames 5300...
[2023-02-24 07:29:34,700][00368] Avg episode rewards: #0: 35.300, true rewards: #0: 13.300
[2023-02-24 07:29:34,701][00368] Avg episode reward: 35.300, avg true_objective: 13.300
[2023-02-24 07:29:34,810][00368] Num frames 5400...
[2023-02-24 07:29:34,938][00368] Num frames 5500...
[2023-02-24 07:29:35,059][00368] Num frames 5600...
[2023-02-24 07:29:35,175][00368] Num frames 5700...
[2023-02-24 07:29:35,289][00368] Num frames 5800...
[2023-02-24 07:29:35,409][00368] Num frames 5900...
[2023-02-24 07:29:35,530][00368] Num frames 6000...
[2023-02-24 07:29:35,648][00368] Num frames 6100...
[2023-02-24 07:29:35,775][00368] Num frames 6200...
[2023-02-24 07:29:35,941][00368] Num frames 6300...
[2023-02-24 07:29:36,107][00368] Num frames 6400...
[2023-02-24 07:29:36,268][00368] Num frames 6500...
[2023-02-24 07:29:36,428][00368] Num frames 6600...
[2023-02-24 07:29:36,601][00368] Num frames 6700...
[2023-02-24 07:29:36,759][00368] Num frames 6800...
[2023-02-24 07:29:36,917][00368] Num frames 6900...
[2023-02-24 07:29:37,084][00368] Num frames 7000...
[2023-02-24 07:29:37,245][00368] Num frames 7100...
[2023-02-24 07:29:37,433][00368] Avg episode rewards: #0: 37.555, true rewards: #0: 14.356
[2023-02-24 07:29:37,436][00368] Avg episode reward: 37.555, avg true_objective: 14.356
[2023-02-24 07:29:37,491][00368] Num frames 7200...
[2023-02-24 07:29:37,666][00368] Num frames 7300...
[2023-02-24 07:29:37,824][00368] Num frames 7400...
[2023-02-24 07:29:37,983][00368] Num frames 7500...
[2023-02-24 07:29:38,147][00368] Num frames 7600...
[2023-02-24 07:29:38,317][00368] Num frames 7700...
[2023-02-24 07:29:38,494][00368] Num frames 7800...
[2023-02-24 07:29:38,663][00368] Num frames 7900...
[2023-02-24 07:29:38,832][00368] Num frames 8000...
[2023-02-24 07:29:39,000][00368] Num frames 8100...
[2023-02-24 07:29:39,163][00368] Num frames 8200...
[2023-02-24 07:29:39,332][00368] Num frames 8300...
[2023-02-24 07:29:39,481][00368] Num frames 8400...
[2023-02-24 07:29:39,597][00368] Num frames 8500...
[2023-02-24 07:29:39,716][00368] Num frames 8600...
[2023-02-24 07:29:39,838][00368] Num frames 8700...
[2023-02-24 07:29:39,961][00368] Num frames 8800...
[2023-02-24 07:29:40,110][00368] Avg episode rewards: #0: 38.963, true rewards: #0: 14.797
[2023-02-24 07:29:40,112][00368] Avg episode reward: 38.963, avg true_objective: 14.797
[2023-02-24 07:29:40,143][00368] Num frames 8900...
[2023-02-24 07:29:40,259][00368] Num frames 9000...
[2023-02-24 07:29:40,383][00368] Num frames 9100...
[2023-02-24 07:29:40,506][00368] Num frames 9200...
[2023-02-24 07:29:40,651][00368] Avg episode rewards: #0: 34.108, true rewards: #0: 13.251
[2023-02-24 07:29:40,653][00368] Avg episode reward: 34.108, avg true_objective: 13.251
[2023-02-24 07:29:40,687][00368] Num frames 9300...
[2023-02-24 07:29:40,802][00368] Num frames 9400...
[2023-02-24 07:29:40,916][00368] Num frames 9500...
[2023-02-24 07:29:41,030][00368] Num frames 9600...
[2023-02-24 07:29:41,144][00368] Num frames 9700...
[2023-02-24 07:29:41,259][00368] Num frames 9800...
[2023-02-24 07:29:41,378][00368] Num frames 9900...
[2023-02-24 07:29:41,507][00368] Avg episode rewards: #0: 31.560, true rewards: #0: 12.435
[2023-02-24 07:29:41,510][00368] Avg episode reward: 31.560, avg true_objective: 12.435
[2023-02-24 07:29:41,585][00368] Num frames 10000...
[2023-02-24 07:29:41,714][00368] Num frames 10100...
[2023-02-24 07:29:41,832][00368] Num frames 10200...
[2023-02-24 07:29:41,952][00368] Num frames 10300...
[2023-02-24 07:29:42,072][00368] Num frames 10400...
[2023-02-24 07:29:42,133][00368] Avg episode rewards: #0: 28.892, true rewards: #0: 11.559
[2023-02-24 07:29:42,135][00368] Avg episode reward: 28.892, avg true_objective: 11.559
[2023-02-24 07:29:42,259][00368] Num frames 10500...
[2023-02-24 07:29:42,383][00368] Num frames 10600...
[2023-02-24 07:29:42,517][00368] Num frames 10700...
[2023-02-24 07:29:42,638][00368] Num frames 10800...
[2023-02-24 07:29:42,766][00368] Num frames 10900...
[2023-02-24 07:29:42,881][00368] Num frames 11000...
[2023-02-24 07:29:42,996][00368] Num frames 11100...
[2023-02-24 07:29:43,113][00368] Num frames 11200...
[2023-02-24 07:29:43,233][00368] Num frames 11300...
[2023-02-24 07:29:43,354][00368] Num frames 11400...
[2023-02-24 07:29:43,431][00368] Avg episode rewards: #0: 28.716, true rewards: #0: 11.416
[2023-02-24 07:29:43,433][00368] Avg episode reward: 28.716, avg true_objective: 11.416
[2023-02-24 07:31:01,765][00368] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
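The "Avg episode rewards" lines in the evaluation runs above are running averages, updated after each finished episode; per-episode values can be recovered from consecutive means. A small sketch of that arithmetic (generic running-mean code, not the evaluator's actual implementation; note how a first-episode true reward of 6.690 followed by a running mean of 10.330 implies a second episode of 13.970):

```python
def running_means(episode_rewards):
    """Running average after each episode, as printed by the eval loop."""
    means, total = [], 0.0
    for i, r in enumerate(episode_rewards, start=1):
        total += r
        means.append(total / i)
    return means

def recover_rewards(means):
    """Invert a sequence of running means back to per-episode rewards."""
    rewards = []
    for i, m in enumerate(means, start=1):
        prev_sum = (i - 1) * means[i - 2] if i > 1 else 0.0
        rewards.append(i * m - prev_sum)
    return rewards
```

The same recovery works for the "true rewards" column, since it is averaged the same way over the episodes completed so far.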