[2023-02-24 22:38:03,469][00267] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-24 22:38:03,472][00267] Rollout worker 0 uses device cpu
[2023-02-24 22:38:03,474][00267] Rollout worker 1 uses device cpu
[2023-02-24 22:38:03,477][00267] Rollout worker 2 uses device cpu
[2023-02-24 22:38:03,479][00267] Rollout worker 3 uses device cpu
[2023-02-24 22:38:03,481][00267] Rollout worker 4 uses device cpu
[2023-02-24 22:38:03,483][00267] Rollout worker 5 uses device cpu
[2023-02-24 22:38:03,484][00267] Rollout worker 6 uses device cpu
[2023-02-24 22:38:03,485][00267] Rollout worker 7 uses device cpu
[2023-02-24 22:38:03,666][00267] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 22:38:03,668][00267] InferenceWorker_p0-w0: min num requests: 2
[2023-02-24 22:38:03,702][00267] Starting all processes...
[2023-02-24 22:38:03,703][00267] Starting process learner_proc0
[2023-02-24 22:38:03,758][00267] Starting all processes...
[2023-02-24 22:38:03,767][00267] Starting process inference_proc0-0
[2023-02-24 22:38:03,767][00267] Starting process rollout_proc0
[2023-02-24 22:38:03,769][00267] Starting process rollout_proc1
[2023-02-24 22:38:03,770][00267] Starting process rollout_proc2
[2023-02-24 22:38:03,771][00267] Starting process rollout_proc3
[2023-02-24 22:38:03,771][00267] Starting process rollout_proc4
[2023-02-24 22:38:03,771][00267] Starting process rollout_proc5
[2023-02-24 22:38:03,771][00267] Starting process rollout_proc6
[2023-02-24 22:38:03,771][00267] Starting process rollout_proc7
[2023-02-24 22:38:15,797][10374] Worker 4 uses CPU cores [0]
[2023-02-24 22:38:15,796][10351] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 22:38:15,799][10351] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-24 22:38:15,956][10375] Worker 5 uses CPU cores [1]
[2023-02-24 22:38:16,043][10372] Worker 2 uses CPU cores [0]
[2023-02-24 22:38:16,122][10366] Worker 0 uses CPU cores [0]
[2023-02-24 22:38:16,436][10351] Num visible devices: 1
[2023-02-24 22:38:16,482][10351] Starting seed is not provided
[2023-02-24 22:38:16,483][10351] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 22:38:16,483][10351] Initializing actor-critic model on device cuda:0
[2023-02-24 22:38:16,484][10351] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 22:38:16,485][10351] RunningMeanStd input shape: (1,)
[2023-02-24 22:38:16,494][10376] Worker 6 uses CPU cores [0]
[2023-02-24 22:38:16,508][10371] Worker 1 uses CPU cores [1]
[2023-02-24 22:38:16,520][10373] Worker 3 uses CPU cores [1]
[2023-02-24 22:38:16,536][10365] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 22:38:16,536][10365] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-24 22:38:16,545][10351] ConvEncoder: input_channels=3
[2023-02-24 22:38:16,570][10365] Num visible devices: 1
[2023-02-24 22:38:16,605][10377] Worker 7 uses CPU cores [1]
[2023-02-24 22:38:16,959][10351] Conv encoder output size: 512
[2023-02-24 22:38:16,959][10351] Policy head output size: 512
[2023-02-24 22:38:17,040][10351] Created Actor Critic model with architecture:
[2023-02-24 22:38:17,040][10351] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-24 22:38:23,659][00267] Heartbeat connected on Batcher_0
[2023-02-24 22:38:23,666][00267] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-24 22:38:23,676][00267] Heartbeat connected on RolloutWorker_w0
[2023-02-24 22:38:23,681][00267] Heartbeat connected on RolloutWorker_w1
[2023-02-24 22:38:23,685][00267] Heartbeat connected on RolloutWorker_w2
[2023-02-24 22:38:23,690][00267] Heartbeat connected on RolloutWorker_w3
[2023-02-24 22:38:23,692][00267] Heartbeat connected on RolloutWorker_w4
[2023-02-24 22:38:23,696][00267] Heartbeat connected on RolloutWorker_w5
[2023-02-24 22:38:23,700][00267] Heartbeat connected on RolloutWorker_w6
[2023-02-24 22:38:23,702][00267] Heartbeat connected on RolloutWorker_w7
[2023-02-24 22:38:24,786][10351] Using optimizer
[2023-02-24 22:38:24,787][10351] No checkpoints found
[2023-02-24 22:38:24,787][10351] Did not load from checkpoint, starting from scratch!
[2023-02-24 22:38:24,787][10351] Initialized policy 0 weights for model version 0
[2023-02-24 22:38:24,797][10351] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 22:38:24,808][10351] LearnerWorker_p0 finished initialization!
[2023-02-24 22:38:24,809][00267] Heartbeat connected on LearnerWorker_p0
[2023-02-24 22:38:25,102][10365] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 22:38:25,105][10365] RunningMeanStd input shape: (1,)
[2023-02-24 22:38:25,127][10365] ConvEncoder: input_channels=3
[2023-02-24 22:38:25,279][10365] Conv encoder output size: 512
[2023-02-24 22:38:25,280][10365] Policy head output size: 512
[2023-02-24 22:38:28,435][00267] Inference worker 0-0 is ready!
[2023-02-24 22:38:28,439][00267] All inference workers are ready! Signal rollout workers to start!
[2023-02-24 22:38:28,522][10371] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:28,579][10375] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:28,617][10376] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:28,628][10366] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:28,634][10374] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:28,674][10377] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:28,684][10372] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:28,699][10373] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-24 22:38:29,225][10366] Decorrelating experience for 0 frames...
[2023-02-24 22:38:29,517][00267] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 22:38:29,707][10371] Decorrelating experience for 0 frames...
[2023-02-24 22:38:29,761][10375] Decorrelating experience for 0 frames...
[2023-02-24 22:38:29,809][10377] Decorrelating experience for 0 frames...
[2023-02-24 22:38:29,844][10373] Decorrelating experience for 0 frames...
[2023-02-24 22:38:30,432][10376] Decorrelating experience for 0 frames...
[2023-02-24 22:38:30,518][10366] Decorrelating experience for 32 frames...
[2023-02-24 22:38:30,758][10371] Decorrelating experience for 32 frames...
[2023-02-24 22:38:30,822][10375] Decorrelating experience for 32 frames...
[2023-02-24 22:38:30,890][10377] Decorrelating experience for 32 frames...
[2023-02-24 22:38:31,381][10374] Decorrelating experience for 0 frames...
[2023-02-24 22:38:31,390][10373] Decorrelating experience for 32 frames...
[2023-02-24 22:38:31,649][10366] Decorrelating experience for 64 frames...
[2023-02-24 22:38:31,828][10376] Decorrelating experience for 32 frames...
[2023-02-24 22:38:31,969][10377] Decorrelating experience for 64 frames...
[2023-02-24 22:38:32,194][10372] Decorrelating experience for 0 frames...
[2023-02-24 22:38:32,456][10376] Decorrelating experience for 64 frames...
[2023-02-24 22:38:32,593][10375] Decorrelating experience for 64 frames...
[2023-02-24 22:38:32,852][10373] Decorrelating experience for 64 frames...
[2023-02-24 22:38:33,107][10377] Decorrelating experience for 96 frames...
[2023-02-24 22:38:33,222][10372] Decorrelating experience for 32 frames...
[2023-02-24 22:38:33,801][10371] Decorrelating experience for 64 frames...
[2023-02-24 22:38:33,832][10376] Decorrelating experience for 96 frames...
[2023-02-24 22:38:33,883][10375] Decorrelating experience for 96 frames...
[2023-02-24 22:38:34,360][10374] Decorrelating experience for 32 frames...
[2023-02-24 22:38:34,465][10373] Decorrelating experience for 96 frames...
[2023-02-24 22:38:34,517][00267] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 22:38:34,967][10371] Decorrelating experience for 96 frames...
[2023-02-24 22:38:35,142][10372] Decorrelating experience for 64 frames...
[2023-02-24 22:38:35,247][10374] Decorrelating experience for 64 frames...
[2023-02-24 22:38:35,835][10366] Decorrelating experience for 96 frames...
[2023-02-24 22:38:35,851][10372] Decorrelating experience for 96 frames...
[2023-02-24 22:38:36,264][10374] Decorrelating experience for 96 frames...
[2023-02-24 22:38:39,519][00267] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 53.6. Samples: 536. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-24 22:38:39,524][00267] Avg episode reward: [(0, '1.622')]
[2023-02-24 22:38:40,006][10351] Signal inference workers to stop experience collection...
[2023-02-24 22:38:40,036][10365] InferenceWorker_p0-w0: stopping experience collection
[2023-02-24 22:38:42,647][10351] Signal inference workers to resume experience collection...
[2023-02-24 22:38:42,648][10365] InferenceWorker_p0-w0: resuming experience collection
[2023-02-24 22:38:44,517][00267] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 159.2. Samples: 2388. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-24 22:38:44,526][00267] Avg episode reward: [(0, '2.036')]
[2023-02-24 22:38:49,517][00267] Fps is (10 sec: 2867.8, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 378.0. Samples: 7560. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2023-02-24 22:38:49,524][00267] Avg episode reward: [(0, '3.807')]
[2023-02-24 22:38:52,123][10365] Updated weights for policy 0, policy_version 10 (0.0563)
[2023-02-24 22:38:54,517][00267] Fps is (10 sec: 4505.6, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 447.4. Samples: 11184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:38:54,526][00267] Avg episode reward: [(0, '4.435')]
[2023-02-24 22:38:59,517][00267] Fps is (10 sec: 4095.7, 60 sec: 2321.0, 300 sec: 2321.0). Total num frames: 69632. Throughput: 0: 585.6. Samples: 17568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:38:59,520][00267] Avg episode reward: [(0, '4.521')]
[2023-02-24 22:39:02,896][10365] Updated weights for policy 0, policy_version 20 (0.0036)
[2023-02-24 22:39:04,517][00267] Fps is (10 sec: 3686.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 630.4. Samples: 22064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:39:04,521][00267] Avg episode reward: [(0, '4.689')]
[2023-02-24 22:39:09,517][00267] Fps is (10 sec: 3686.7, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 618.8. Samples: 24752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:39:09,523][00267] Avg episode reward: [(0, '4.520')]
[2023-02-24 22:39:09,530][10351] Saving new best policy, reward=4.520!
[2023-02-24 22:39:13,266][10365] Updated weights for policy 0, policy_version 30 (0.0016)
[2023-02-24 22:39:14,517][00267] Fps is (10 sec: 3686.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 696.1. Samples: 31326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:39:14,521][00267] Avg episode reward: [(0, '4.524')]
[2023-02-24 22:39:14,548][10351] Saving new best policy, reward=4.524!
[2023-02-24 22:39:19,518][00267] Fps is (10 sec: 2866.7, 60 sec: 2703.3, 300 sec: 2703.3). Total num frames: 135168. Throughput: 0: 779.0. Samples: 35058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:39:19,521][00267] Avg episode reward: [(0, '4.482')]
[2023-02-24 22:39:24,518][00267] Fps is (10 sec: 2457.1, 60 sec: 2680.9, 300 sec: 2680.9). Total num frames: 147456. Throughput: 0: 804.7. Samples: 36746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:39:24,521][00267] Avg episode reward: [(0, '4.405')]
[2023-02-24 22:39:29,179][10365] Updated weights for policy 0, policy_version 40 (0.0021)
[2023-02-24 22:39:29,517][00267] Fps is (10 sec: 2867.7, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 163840. Throughput: 0: 856.0. Samples: 40906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:39:29,519][00267] Avg episode reward: [(0, '4.276')]
[2023-02-24 22:39:34,517][00267] Fps is (10 sec: 4096.7, 60 sec: 3140.3, 300 sec: 2898.7). Total num frames: 188416. Throughput: 0: 894.8. Samples: 47826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:39:34,519][00267] Avg episode reward: [(0, '4.319')]
[2023-02-24 22:39:37,649][10365] Updated weights for policy 0, policy_version 50 (0.0017)
[2023-02-24 22:39:39,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3481.7, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 894.7. Samples: 51446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:39:39,519][00267] Avg episode reward: [(0, '4.515')]
[2023-02-24 22:39:44,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3003.7). Total num frames: 225280. Throughput: 0: 867.7. Samples: 56614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 22:39:44,526][00267] Avg episode reward: [(0, '4.494')]
[2023-02-24 22:39:49,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 876.4. Samples: 61500. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:39:49,519][00267] Avg episode reward: [(0, '4.362')]
[2023-02-24 22:39:49,887][10365] Updated weights for policy 0, policy_version 60 (0.0014)
[2023-02-24 22:39:54,517][00267] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 895.1. Samples: 65030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:39:54,524][00267] Avg episode reward: [(0, '4.343')]
[2023-02-24 22:39:58,408][10365] Updated weights for policy 0, policy_version 70 (0.0016)
[2023-02-24 22:39:59,524][00267] Fps is (10 sec: 4911.3, 60 sec: 3686.0, 300 sec: 3231.0). Total num frames: 290816. Throughput: 0: 909.0. Samples: 72236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:39:59,527][00267] Avg episode reward: [(0, '4.385')]
[2023-02-24 22:39:59,538][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth...
[2023-02-24 22:40:04,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 933.9. Samples: 77084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:40:04,522][00267] Avg episode reward: [(0, '4.384')]
[2023-02-24 22:40:09,517][00267] Fps is (10 sec: 3279.4, 60 sec: 3618.1, 300 sec: 3235.8). Total num frames: 323584. Throughput: 0: 947.1. Samples: 79362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:40:09,522][00267] Avg episode reward: [(0, '4.270')]
[2023-02-24 22:40:10,320][10365] Updated weights for policy 0, policy_version 80 (0.0019)
[2023-02-24 22:40:14,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 344064. Throughput: 0: 1004.9. Samples: 86128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:40:14,519][00267] Avg episode reward: [(0, '4.278')]
[2023-02-24 22:40:18,806][10365] Updated weights for policy 0, policy_version 90 (0.0017)
[2023-02-24 22:40:19,517][00267] Fps is (10 sec: 4505.5, 60 sec: 3891.3, 300 sec: 3351.3). Total num frames: 368640. Throughput: 0: 1004.0. Samples: 93004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:40:19,520][00267] Avg episode reward: [(0, '4.485')]
[2023-02-24 22:40:24,516][00267] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 972.6. Samples: 95214. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 22:40:24,522][00267] Avg episode reward: [(0, '4.638')]
[2023-02-24 22:40:24,538][10351] Saving new best policy, reward=4.638!
[2023-02-24 22:40:29,517][00267] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 961.2. Samples: 99866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:40:29,519][00267] Avg episode reward: [(0, '4.596')]
[2023-02-24 22:40:31,100][10365] Updated weights for policy 0, policy_version 100 (0.0022)
[2023-02-24 22:40:34,516][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3407.9). Total num frames: 425984. Throughput: 0: 1011.6. Samples: 107022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:40:34,519][00267] Avg episode reward: [(0, '4.450')]
[2023-02-24 22:40:39,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3434.3). Total num frames: 446464. Throughput: 0: 1014.4. Samples: 110676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 22:40:39,524][00267] Avg episode reward: [(0, '4.528')]
[2023-02-24 22:40:40,109][10365] Updated weights for policy 0, policy_version 110 (0.0031)
[2023-02-24 22:40:44,522][00267] Fps is (10 sec: 3684.5, 60 sec: 3959.1, 300 sec: 3428.4). Total num frames: 462848. Throughput: 0: 967.4. Samples: 115766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:40:44,529][00267] Avg episode reward: [(0, '4.514')]
[2023-02-24 22:40:49,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3423.1). Total num frames: 479232. Throughput: 0: 970.4. Samples: 120750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:40:49,519][00267] Avg episode reward: [(0, '4.552')]
[2023-02-24 22:40:51,693][10365] Updated weights for policy 0, policy_version 120 (0.0014)
[2023-02-24 22:40:54,517][00267] Fps is (10 sec: 4098.1, 60 sec: 3959.5, 300 sec: 3474.5). Total num frames: 503808. Throughput: 0: 999.9. Samples: 124356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:40:54,522][00267] Avg episode reward: [(0, '4.462')]
[2023-02-24 22:40:59,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.7, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 1009.6. Samples: 131560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:40:59,518][00267] Avg episode reward: [(0, '4.634')]
[2023-02-24 22:41:01,294][10365] Updated weights for policy 0, policy_version 130 (0.0011)
[2023-02-24 22:41:04,521][00267] Fps is (10 sec: 3684.6, 60 sec: 3959.1, 300 sec: 3488.1). Total num frames: 540672. Throughput: 0: 956.7. Samples: 136058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:41:04,524][00267] Avg episode reward: [(0, '4.672')]
[2023-02-24 22:41:04,526][10351] Saving new best policy, reward=4.672!
[2023-02-24 22:41:09,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3481.6). Total num frames: 557056. Throughput: 0: 958.1. Samples: 138328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:41:09,519][00267] Avg episode reward: [(0, '4.456')]
[2023-02-24 22:41:12,427][10365] Updated weights for policy 0, policy_version 140 (0.0024)
[2023-02-24 22:41:14,516][00267] Fps is (10 sec: 4098.0, 60 sec: 3959.5, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 1006.9. Samples: 145178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:41:14,522][00267] Avg episode reward: [(0, '4.130')]
[2023-02-24 22:41:19,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3541.8). Total num frames: 602112. Throughput: 0: 998.7. Samples: 151964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:41:19,523][00267] Avg episode reward: [(0, '4.358')]
[2023-02-24 22:41:22,620][10365] Updated weights for policy 0, policy_version 150 (0.0023)
[2023-02-24 22:41:24,516][00267] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3534.3). Total num frames: 618496. Throughput: 0: 967.1. Samples: 154196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 22:41:24,523][00267] Avg episode reward: [(0, '4.422')]
[2023-02-24 22:41:29,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3527.1). Total num frames: 634880. Throughput: 0: 958.5. Samples: 158894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:41:29,524][00267] Avg episode reward: [(0, '4.203')]
[2023-02-24 22:41:32,956][10365] Updated weights for policy 0, policy_version 160 (0.0024)
[2023-02-24 22:41:34,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 1008.9. Samples: 166150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:41:34,524][00267] Avg episode reward: [(0, '4.469')]
[2023-02-24 22:41:39,517][00267] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3600.2). Total num frames: 684032. Throughput: 0: 1008.8. Samples: 169752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:41:39,525][00267] Avg episode reward: [(0, '4.422')]
[2023-02-24 22:41:43,502][10365] Updated weights for policy 0, policy_version 170 (0.0011)
[2023-02-24 22:41:44,519][00267] Fps is (10 sec: 3685.5, 60 sec: 3891.4, 300 sec: 3570.8). Total num frames: 696320. Throughput: 0: 960.7. Samples: 174792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:41:44,526][00267] Avg episode reward: [(0, '4.240')]
[2023-02-24 22:41:49,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3584.0). Total num frames: 716800. Throughput: 0: 974.1. Samples: 179888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:41:49,519][00267] Avg episode reward: [(0, '4.263')]
[2023-02-24 22:41:53,765][10365] Updated weights for policy 0, policy_version 180 (0.0021)
[2023-02-24 22:41:54,518][00267] Fps is (10 sec: 4096.2, 60 sec: 3891.1, 300 sec: 3596.5). Total num frames: 737280. Throughput: 0: 1004.8. Samples: 183544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 22:41:54,521][00267] Avg episode reward: [(0, '4.479')]
[2023-02-24 22:41:59,517][00267] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3627.9). Total num frames: 761856. Throughput: 0: 1010.0. Samples: 190628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:41:59,519][00267] Avg episode reward: [(0, '4.515')]
[2023-02-24 22:41:59,534][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth...
[2023-02-24 22:42:04,302][10365] Updated weights for policy 0, policy_version 190 (0.0021)
[2023-02-24 22:42:04,516][00267] Fps is (10 sec: 4096.8, 60 sec: 3959.8, 300 sec: 3619.7). Total num frames: 778240. Throughput: 0: 958.8. Samples: 195110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:42:04,520][00267] Avg episode reward: [(0, '4.457')]
[2023-02-24 22:42:09,518][00267] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3611.9). Total num frames: 794624. Throughput: 0: 962.5. Samples: 197508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:42:09,524][00267] Avg episode reward: [(0, '4.523')]
[2023-02-24 22:42:13,990][10365] Updated weights for policy 0, policy_version 200 (0.0019)
[2023-02-24 22:42:14,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3640.9). Total num frames: 819200. Throughput: 0: 1015.7. Samples: 204602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:42:14,519][00267] Avg episode reward: [(0, '4.618')]
[2023-02-24 22:42:19,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3650.8). Total num frames: 839680. Throughput: 0: 996.0. Samples: 210970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:42:19,522][00267] Avg episode reward: [(0, '4.745')]
[2023-02-24 22:42:19,533][10351] Saving new best policy, reward=4.745!
[2023-02-24 22:42:24,516][00267] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3642.8). Total num frames: 856064. Throughput: 0: 963.1. Samples: 213090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 22:42:24,523][00267] Avg episode reward: [(0, '4.605')]
[2023-02-24 22:42:25,667][10365] Updated weights for policy 0, policy_version 210 (0.0026)
[2023-02-24 22:42:29,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3635.2). Total num frames: 872448. Throughput: 0: 963.2. Samples: 218134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:42:29,519][00267] Avg episode reward: [(0, '4.626')]
[2023-02-24 22:42:34,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3661.3). Total num frames: 897024. Throughput: 0: 1011.9. Samples: 225422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:42:34,519][00267] Avg episode reward: [(0, '4.648')]
[2023-02-24 22:42:34,844][10365] Updated weights for policy 0, policy_version 220 (0.0014)
[2023-02-24 22:42:39,519][00267] Fps is (10 sec: 4504.3, 60 sec: 3891.0, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 1011.0. Samples: 229042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:42:39,522][00267] Avg episode reward: [(0, '4.431')]
[2023-02-24 22:42:44,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3646.2). Total num frames: 929792. Throughput: 0: 942.2. Samples: 233028. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-24 22:42:44,527][00267] Avg episode reward: [(0, '4.416')]
[2023-02-24 22:42:48,262][10365] Updated weights for policy 0, policy_version 230 (0.0015)
[2023-02-24 22:42:49,518][00267] Fps is (10 sec: 2457.8, 60 sec: 3754.6, 300 sec: 3623.4). Total num frames: 942080. Throughput: 0: 921.8. Samples: 236594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:42:49,521][00267] Avg episode reward: [(0, '4.606')]
[2023-02-24 22:42:54,517][00267] Fps is (10 sec: 2867.1, 60 sec: 3686.5, 300 sec: 3616.8). Total num frames: 958464. Throughput: 0: 913.0. Samples: 238594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:42:54,519][00267] Avg episode reward: [(0, '4.691')]
[2023-02-24 22:42:58,968][10365] Updated weights for policy 0, policy_version 240 (0.0018)
[2023-02-24 22:42:59,517][00267] Fps is (10 sec: 4096.8, 60 sec: 3686.4, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 914.2. Samples: 245742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:42:59,524][00267] Avg episode reward: [(0, '4.659')]
[2023-02-24 22:43:04,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3649.2). Total num frames: 1003520. Throughput: 0: 907.2. Samples: 251792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:43:04,519][00267] Avg episode reward: [(0, '4.519')]
[2023-02-24 22:43:09,517][00267] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3642.5). Total num frames: 1019904. Throughput: 0: 911.1. Samples: 254090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:43:09,522][00267] Avg episode reward: [(0, '4.450')]
[2023-02-24 22:43:10,385][10365] Updated weights for policy 0, policy_version 250 (0.0019)
[2023-02-24 22:43:14,516][00267] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3650.5). Total num frames: 1040384. Throughput: 0: 920.6. Samples: 259562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 22:43:14,522][00267] Avg episode reward: [(0, '4.527')]
[2023-02-24 22:43:19,389][10365] Updated weights for policy 0, policy_version 260 (0.0021)
[2023-02-24 22:43:19,517][00267] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3672.3). Total num frames: 1064960. Throughput: 0: 920.8. Samples: 266856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:43:19,519][00267] Avg episode reward: [(0, '4.647')]
[2023-02-24 22:43:24,517][00267] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 911.6. Samples: 270060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-24 22:43:24,521][00267] Avg episode reward: [(0, '4.712')]
[2023-02-24 22:43:29,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 924.8. Samples: 274642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:43:29,524][00267] Avg episode reward: [(0, '4.753')]
[2023-02-24 22:43:29,535][10351] Saving new best policy, reward=4.753!
[2023-02-24 22:43:31,683][10365] Updated weights for policy 0, policy_version 270 (0.0025)
[2023-02-24 22:43:34,516][00267] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1118208. Throughput: 0: 975.7. Samples: 280500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:43:34,522][00267] Avg episode reward: [(0, '4.987')]
[2023-02-24 22:43:34,525][10351] Saving new best policy, reward=4.987!
[2023-02-24 22:43:39,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3754.8, 300 sec: 3860.0). Total num frames: 1142784. Throughput: 0: 1009.2. Samples: 284008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:43:39,523][00267] Avg episode reward: [(0, '5.156')]
[2023-02-24 22:43:39,535][10351] Saving new best policy, reward=5.156!
[2023-02-24 22:43:40,432][10365] Updated weights for policy 0, policy_version 280 (0.0013)
[2023-02-24 22:43:44,522][00267] Fps is (10 sec: 4093.6, 60 sec: 3822.6, 300 sec: 3832.1). Total num frames: 1159168. Throughput: 0: 986.2. Samples: 290126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 22:43:44,528][00267] Avg episode reward: [(0, '5.130')]
[2023-02-24 22:43:49,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 1175552. Throughput: 0: 950.1. Samples: 294546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 22:43:49,522][00267] Avg episode reward: [(0, '5.101')]
[2023-02-24 22:43:52,665][10365] Updated weights for policy 0, policy_version 290 (0.0035)
[2023-02-24 22:43:54,517][00267] Fps is (10 sec: 3688.5, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1196032. Throughput: 0: 960.7. Samples: 297322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:43:54,519][00267] Avg episode reward: [(0, '4.991')]
[2023-02-24 22:43:59,516][00267] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1220608. Throughput: 0: 1000.2. Samples: 304572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:43:59,523][00267] Avg episode reward: [(0, '4.964')]
[2023-02-24 22:43:59,534][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000298_1220608.pth...
[2023-02-24 22:43:59,681][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth
[2023-02-24 22:44:01,183][10365] Updated weights for policy 0, policy_version 300 (0.0014)
[2023-02-24 22:44:04,517][00267] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1236992. Throughput: 0: 965.6. Samples: 310308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:44:04,519][00267] Avg episode reward: [(0, '5.142')]
[2023-02-24 22:44:09,517][00267] Fps is (10 sec: 3276.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1253376. Throughput: 0: 944.5. Samples: 312562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:44:09,524][00267] Avg episode reward: [(0, '5.296')]
[2023-02-24 22:44:09,537][10351] Saving new best policy, reward=5.296!
[2023-02-24 22:44:13,380][10365] Updated weights for policy 0, policy_version 310 (0.0059)
[2023-02-24 22:44:14,517][00267] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1273856. Throughput: 0: 966.2. Samples: 318120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:44:14,524][00267] Avg episode reward: [(0, '5.238')]
[2023-02-24 22:44:19,517][00267] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1298432. Throughput: 0: 998.4. Samples: 325426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:44:19,518][00267] Avg episode reward: [(0, '5.397')]
[2023-02-24 22:44:19,528][10351] Saving new best policy, reward=5.397!
[2023-02-24 22:44:22,518][10365] Updated weights for policy 0, policy_version 320 (0.0024)
[2023-02-24 22:44:24,517][00267] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1314816. Throughput: 0: 984.2. Samples: 328298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:44:24,518][00267] Avg episode reward: [(0, '5.247')]
[2023-02-24 22:44:29,517][00267] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1331200. Throughput: 0: 949.5. Samples: 332850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-24 22:44:29,521][00267] Avg episode reward: [(0, '5.198')]
[2023-02-24 22:44:34,021][10365] Updated weights for policy 0, policy_version 330 (0.0022)
[2023-02-24 22:44:34,517][00267] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1351680. Throughput: 0: 991.1. Samples: 339146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:44:34,524][00267] Avg episode reward: [(0, '5.223')]
[2023-02-24 22:44:39,517][00267] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1376256. Throughput: 0: 1010.4. Samples: 342792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 22:44:39,524][00267] Avg episode reward: [(0, '5.607')]
[2023-02-24 22:44:39,535][10351] Saving new best policy, reward=5.607!
[2023-02-24 22:44:43,383][10365] Updated weights for policy 0, policy_version 340 (0.0014)
[2023-02-24 22:44:44,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3891.6, 300 sec: 3901.6). Total num frames: 1392640. Throughput: 0: 982.5. Samples: 348786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-24 22:44:44,519][00267] Avg episode reward: [(0, '5.528')]
[2023-02-24 22:44:49,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1409024. Throughput: 0: 956.6. Samples: 353356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-24 22:44:49,523][00267] Avg episode reward: [(0, '5.397')]
[2023-02-24 22:44:54,469][10365] Updated weights for policy 0, policy_version 350 (0.0022)
[2023-02-24 22:44:54,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 1433600. Throughput: 0: 978.5. Samples: 356596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-24 22:44:54,519][00267] Avg episode reward: [(0, '5.151')]
[2023-02-24 22:44:59,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1454080. Throughput: 0: 1014.2. Samples: 363760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:44:59,519][00267] Avg episode reward: [(0, '5.162')]
[2023-02-24 22:45:04,486][10365] Updated weights for policy 0, policy_version 360 (0.0023)
[2023-02-24 22:45:04,518][00267] Fps is (10 sec: 4095.3, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 1474560. Throughput: 0: 971.5. Samples: 369146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-24 22:45:04,522][00267] Avg episode reward: [(0, '5.107')]
[2023-02-24 22:45:09,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1486848. Throughput: 0: 957.2. Samples: 371370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-24 22:45:09,522][00267] Avg episode reward: [(0, '5.126')]
[2023-02-24 22:45:14,517][00267] Fps is (10 sec: 3686.9, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 1511424. Throughput: 0: 994.1. Samples: 377582.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:45:14,520][00267] Avg episode reward: [(0, '5.462')] [2023-02-24 22:45:14,992][10365] Updated weights for policy 0, policy_version 370 (0.0033) [2023-02-24 22:45:19,517][00267] Fps is (10 sec: 4915.1, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 1536000. Throughput: 0: 1017.9. Samples: 384954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:45:19,519][00267] Avg episode reward: [(0, '5.649')] [2023-02-24 22:45:19,533][10351] Saving new best policy, reward=5.649! [2023-02-24 22:45:24,517][00267] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1552384. Throughput: 0: 991.1. Samples: 387390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:45:24,520][00267] Avg episode reward: [(0, '5.919')] [2023-02-24 22:45:24,524][10351] Saving new best policy, reward=5.919! [2023-02-24 22:45:25,733][10365] Updated weights for policy 0, policy_version 380 (0.0024) [2023-02-24 22:45:29,517][00267] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1564672. Throughput: 0: 957.3. Samples: 391864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:45:29,523][00267] Avg episode reward: [(0, '5.797')] [2023-02-24 22:45:34,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1589248. Throughput: 0: 1004.7. Samples: 398566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:45:34,519][00267] Avg episode reward: [(0, '5.715')] [2023-02-24 22:45:35,629][10365] Updated weights for policy 0, policy_version 390 (0.0018) [2023-02-24 22:45:39,517][00267] Fps is (10 sec: 4915.4, 60 sec: 3959.5, 300 sec: 3901.7). Total num frames: 1613824. Throughput: 0: 1012.7. Samples: 402168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:45:39,523][00267] Avg episode reward: [(0, '6.129')] [2023-02-24 22:45:39,531][10351] Saving new best policy, reward=6.129! 
[2023-02-24 22:45:44,517][00267] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 1630208. Throughput: 0: 976.5. Samples: 407702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:45:44,522][00267] Avg episode reward: [(0, '5.941')] [2023-02-24 22:45:46,714][10365] Updated weights for policy 0, policy_version 400 (0.0020) [2023-02-24 22:45:49,516][00267] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1646592. Throughput: 0: 960.9. Samples: 412386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:45:49,519][00267] Avg episode reward: [(0, '6.136')] [2023-02-24 22:45:49,528][10351] Saving new best policy, reward=6.136! [2023-02-24 22:45:54,518][00267] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 1667072. Throughput: 0: 989.2. Samples: 415886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:45:54,525][00267] Avg episode reward: [(0, '5.908')] [2023-02-24 22:45:56,238][10365] Updated weights for policy 0, policy_version 410 (0.0017) [2023-02-24 22:45:59,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.7). Total num frames: 1691648. Throughput: 0: 1010.2. Samples: 423042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:45:59,519][00267] Avg episode reward: [(0, '6.085')] [2023-02-24 22:45:59,530][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000413_1691648.pth... [2023-02-24 22:45:59,717][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth [2023-02-24 22:46:04,517][00267] Fps is (10 sec: 3687.1, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 1703936. Throughput: 0: 945.2. Samples: 427488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:04,524][00267] Avg episode reward: [(0, '5.878')] [2023-02-24 22:46:09,518][00267] Fps is (10 sec: 2457.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1716224. 
Throughput: 0: 930.8. Samples: 429278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:09,523][00267] Avg episode reward: [(0, '6.102')] [2023-02-24 22:46:09,851][10365] Updated weights for policy 0, policy_version 420 (0.0013) [2023-02-24 22:46:14,517][00267] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1732608. Throughput: 0: 913.0. Samples: 432948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:14,520][00267] Avg episode reward: [(0, '6.384')] [2023-02-24 22:46:14,525][10351] Saving new best policy, reward=6.384! [2023-02-24 22:46:19,517][00267] Fps is (10 sec: 3686.8, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 1753088. Throughput: 0: 909.8. Samples: 439508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:19,519][00267] Avg episode reward: [(0, '6.744')] [2023-02-24 22:46:19,533][10351] Saving new best policy, reward=6.744! [2023-02-24 22:46:20,716][10365] Updated weights for policy 0, policy_version 430 (0.0015) [2023-02-24 22:46:24,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1773568. Throughput: 0: 906.6. Samples: 442966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:24,519][00267] Avg episode reward: [(0, '7.039')] [2023-02-24 22:46:24,521][10351] Saving new best policy, reward=7.039! [2023-02-24 22:46:29,523][00267] Fps is (10 sec: 3683.9, 60 sec: 3754.3, 300 sec: 3832.1). Total num frames: 1789952. Throughput: 0: 883.1. Samples: 447446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:46:29,526][00267] Avg episode reward: [(0, '6.874')] [2023-02-24 22:46:32,982][10365] Updated weights for policy 0, policy_version 440 (0.0029) [2023-02-24 22:46:34,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1806336. Throughput: 0: 900.7. Samples: 452918. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:34,518][00267] Avg episode reward: [(0, '6.770')] [2023-02-24 22:46:39,517][00267] Fps is (10 sec: 4098.8, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 1830912. Throughput: 0: 902.5. Samples: 456498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:39,522][00267] Avg episode reward: [(0, '6.491')] [2023-02-24 22:46:41,508][10365] Updated weights for policy 0, policy_version 450 (0.0017) [2023-02-24 22:46:44,516][00267] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1851392. Throughput: 0: 896.6. Samples: 463388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:44,525][00267] Avg episode reward: [(0, '6.503')] [2023-02-24 22:46:49,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1867776. Throughput: 0: 898.8. Samples: 467934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:49,520][00267] Avg episode reward: [(0, '6.070')] [2023-02-24 22:46:53,449][10365] Updated weights for policy 0, policy_version 460 (0.0019) [2023-02-24 22:46:54,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3818.3). Total num frames: 1888256. Throughput: 0: 913.1. Samples: 470366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:46:54,519][00267] Avg episode reward: [(0, '6.180')] [2023-02-24 22:46:59,517][00267] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1912832. Throughput: 0: 991.7. Samples: 477576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:46:59,520][00267] Avg episode reward: [(0, '6.519')] [2023-02-24 22:47:02,050][10365] Updated weights for policy 0, policy_version 470 (0.0025) [2023-02-24 22:47:04,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1929216. Throughput: 0: 983.7. Samples: 483776. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:04,518][00267] Avg episode reward: [(0, '6.341')] [2023-02-24 22:47:09,520][00267] Fps is (10 sec: 3275.8, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 1945600. Throughput: 0: 955.6. Samples: 485972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:09,523][00267] Avg episode reward: [(0, '6.434')] [2023-02-24 22:47:13,996][10365] Updated weights for policy 0, policy_version 480 (0.0014) [2023-02-24 22:47:14,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1966080. Throughput: 0: 977.7. Samples: 491434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:14,522][00267] Avg episode reward: [(0, '6.704')] [2023-02-24 22:47:19,517][00267] Fps is (10 sec: 4507.1, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1990656. Throughput: 0: 1018.3. Samples: 498740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:47:19,520][00267] Avg episode reward: [(0, '7.159')] [2023-02-24 22:47:19,530][10351] Saving new best policy, reward=7.159! [2023-02-24 22:47:22,724][10365] Updated weights for policy 0, policy_version 490 (0.0026) [2023-02-24 22:47:24,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2011136. Throughput: 0: 1011.2. Samples: 502004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:47:24,521][00267] Avg episode reward: [(0, '7.073')] [2023-02-24 22:47:29,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.6, 300 sec: 3818.3). Total num frames: 2023424. Throughput: 0: 957.5. Samples: 506474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:47:29,519][00267] Avg episode reward: [(0, '7.307')] [2023-02-24 22:47:29,536][10351] Saving new best policy, reward=7.307! [2023-02-24 22:47:34,516][00267] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 2048000. Throughput: 0: 987.4. Samples: 512368. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:34,518][00267] Avg episode reward: [(0, '7.440')] [2023-02-24 22:47:34,522][10351] Saving new best policy, reward=7.440! [2023-02-24 22:47:34,529][10365] Updated weights for policy 0, policy_version 500 (0.0011) [2023-02-24 22:47:39,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2068480. Throughput: 0: 1012.5. Samples: 515930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:39,521][00267] Avg episode reward: [(0, '7.444')] [2023-02-24 22:47:39,533][10351] Saving new best policy, reward=7.444! [2023-02-24 22:47:44,120][10365] Updated weights for policy 0, policy_version 510 (0.0013) [2023-02-24 22:47:44,516][00267] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 2088960. Throughput: 0: 992.1. Samples: 522220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:44,521][00267] Avg episode reward: [(0, '7.904')] [2023-02-24 22:47:44,528][10351] Saving new best policy, reward=7.904! [2023-02-24 22:47:49,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2101248. Throughput: 0: 953.3. Samples: 526674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:47:49,519][00267] Avg episode reward: [(0, '8.312')] [2023-02-24 22:47:49,537][10351] Saving new best policy, reward=8.312! [2023-02-24 22:47:54,517][00267] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2121728. Throughput: 0: 964.2. Samples: 529356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:54,521][00267] Avg episode reward: [(0, '8.877')] [2023-02-24 22:47:54,523][10351] Saving new best policy, reward=8.877! [2023-02-24 22:47:55,560][10365] Updated weights for policy 0, policy_version 520 (0.0020) [2023-02-24 22:47:59,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2146304. Throughput: 0: 1001.8. Samples: 536514. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:47:59,526][00267] Avg episode reward: [(0, '8.262')] [2023-02-24 22:47:59,536][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth... [2023-02-24 22:47:59,667][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000298_1220608.pth [2023-02-24 22:48:04,517][00267] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2166784. Throughput: 0: 969.6. Samples: 542374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:48:04,519][00267] Avg episode reward: [(0, '8.531')] [2023-02-24 22:48:05,567][10365] Updated weights for policy 0, policy_version 530 (0.0023) [2023-02-24 22:48:09,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 2179072. Throughput: 0: 949.3. Samples: 544724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:48:09,526][00267] Avg episode reward: [(0, '8.880')] [2023-02-24 22:48:09,537][10351] Saving new best policy, reward=8.880! [2023-02-24 22:48:14,516][00267] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2203648. Throughput: 0: 974.0. Samples: 550302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:48:14,524][00267] Avg episode reward: [(0, '8.553')] [2023-02-24 22:48:16,008][10365] Updated weights for policy 0, policy_version 540 (0.0013) [2023-02-24 22:48:19,517][00267] Fps is (10 sec: 4915.3, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2228224. Throughput: 0: 1004.0. Samples: 557548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:48:19,523][00267] Avg episode reward: [(0, '8.372')] [2023-02-24 22:48:24,520][00267] Fps is (10 sec: 4094.4, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 2244608. Throughput: 0: 993.5. Samples: 560642. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:48:24,525][00267] Avg episode reward: [(0, '8.446')] [2023-02-24 22:48:26,759][10365] Updated weights for policy 0, policy_version 550 (0.0041) [2023-02-24 22:48:29,516][00267] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2260992. Throughput: 0: 950.8. Samples: 565004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:48:29,520][00267] Avg episode reward: [(0, '8.903')] [2023-02-24 22:48:29,534][10351] Saving new best policy, reward=8.903! [2023-02-24 22:48:34,517][00267] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2281472. Throughput: 0: 986.0. Samples: 571046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:48:34,526][00267] Avg episode reward: [(0, '9.488')] [2023-02-24 22:48:34,529][10351] Saving new best policy, reward=9.488! [2023-02-24 22:48:36,970][10365] Updated weights for policy 0, policy_version 560 (0.0015) [2023-02-24 22:48:39,516][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 2306048. Throughput: 0: 1005.7. Samples: 574614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:48:39,525][00267] Avg episode reward: [(0, '9.778')] [2023-02-24 22:48:39,533][10351] Saving new best policy, reward=9.778! [2023-02-24 22:48:44,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2322432. Throughput: 0: 983.3. Samples: 580762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:48:44,519][00267] Avg episode reward: [(0, '9.690')] [2023-02-24 22:48:48,197][10365] Updated weights for policy 0, policy_version 570 (0.0028) [2023-02-24 22:48:49,517][00267] Fps is (10 sec: 3276.6, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 2338816. Throughput: 0: 955.1. Samples: 585356. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:48:49,522][00267] Avg episode reward: [(0, '10.186')] [2023-02-24 22:48:49,540][10351] Saving new best policy, reward=10.186! [2023-02-24 22:48:54,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2359296. Throughput: 0: 968.1. Samples: 588290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:48:54,524][00267] Avg episode reward: [(0, '10.302')] [2023-02-24 22:48:54,527][10351] Saving new best policy, reward=10.302! [2023-02-24 22:48:57,787][10365] Updated weights for policy 0, policy_version 580 (0.0016) [2023-02-24 22:48:59,517][00267] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2383872. Throughput: 0: 1001.2. Samples: 595354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:48:59,519][00267] Avg episode reward: [(0, '10.448')] [2023-02-24 22:48:59,529][10351] Saving new best policy, reward=10.448! [2023-02-24 22:49:04,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2400256. Throughput: 0: 959.4. Samples: 600720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:49:04,521][00267] Avg episode reward: [(0, '10.945')] [2023-02-24 22:49:04,527][10351] Saving new best policy, reward=10.945! [2023-02-24 22:49:09,517][00267] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2412544. Throughput: 0: 938.9. Samples: 602890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:49:09,524][00267] Avg episode reward: [(0, '11.281')] [2023-02-24 22:49:09,540][10351] Saving new best policy, reward=11.281! [2023-02-24 22:49:09,836][10365] Updated weights for policy 0, policy_version 590 (0.0020) [2023-02-24 22:49:14,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2437120. Throughput: 0: 971.7. Samples: 608730. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:49:14,522][00267] Avg episode reward: [(0, '12.104')] [2023-02-24 22:49:14,528][10351] Saving new best policy, reward=12.104! [2023-02-24 22:49:18,615][10365] Updated weights for policy 0, policy_version 600 (0.0025) [2023-02-24 22:49:19,517][00267] Fps is (10 sec: 4915.3, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2461696. Throughput: 0: 998.0. Samples: 615954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:49:19,522][00267] Avg episode reward: [(0, '13.553')] [2023-02-24 22:49:19,533][10351] Saving new best policy, reward=13.553! [2023-02-24 22:49:24,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 2478080. Throughput: 0: 980.3. Samples: 618728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:49:24,524][00267] Avg episode reward: [(0, '13.272')] [2023-02-24 22:49:29,521][00267] Fps is (10 sec: 2865.8, 60 sec: 3822.6, 300 sec: 3859.9). Total num frames: 2490368. Throughput: 0: 933.1. Samples: 622754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:49:29,524][00267] Avg episode reward: [(0, '13.244')] [2023-02-24 22:49:32,541][10365] Updated weights for policy 0, policy_version 610 (0.0047) [2023-02-24 22:49:34,518][00267] Fps is (10 sec: 2457.2, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 2502656. Throughput: 0: 918.8. Samples: 626702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:49:34,522][00267] Avg episode reward: [(0, '13.417')] [2023-02-24 22:49:39,517][00267] Fps is (10 sec: 3278.4, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 2523136. Throughput: 0: 906.4. Samples: 629080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:49:39,523][00267] Avg episode reward: [(0, '12.699')] [2023-02-24 22:49:42,758][10365] Updated weights for policy 0, policy_version 620 (0.0012) [2023-02-24 22:49:44,522][00267] Fps is (10 sec: 4094.4, 60 sec: 3686.0, 300 sec: 3846.0). 
Total num frames: 2543616. Throughput: 0: 903.9. Samples: 636036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:49:44,529][00267] Avg episode reward: [(0, '12.497')] [2023-02-24 22:49:49,518][00267] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 2560000. Throughput: 0: 884.7. Samples: 640534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:49:49,521][00267] Avg episode reward: [(0, '13.315')] [2023-02-24 22:49:54,517][00267] Fps is (10 sec: 3278.7, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2576384. Throughput: 0: 890.3. Samples: 642952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:49:54,518][00267] Avg episode reward: [(0, '13.059')] [2023-02-24 22:49:54,752][10365] Updated weights for policy 0, policy_version 630 (0.0035) [2023-02-24 22:49:59,517][00267] Fps is (10 sec: 4096.7, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 2600960. Throughput: 0: 920.3. Samples: 650144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:49:59,519][00267] Avg episode reward: [(0, '13.125')] [2023-02-24 22:49:59,527][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth... [2023-02-24 22:49:59,644][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000413_1691648.pth [2023-02-24 22:50:03,823][10365] Updated weights for policy 0, policy_version 640 (0.0020) [2023-02-24 22:50:04,517][00267] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 2621440. Throughput: 0: 898.4. Samples: 656384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:50:04,526][00267] Avg episode reward: [(0, '13.818')] [2023-02-24 22:50:04,530][10351] Saving new best policy, reward=13.818! [2023-02-24 22:50:09,518][00267] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 2637824. Throughput: 0: 886.4. Samples: 658618. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:50:09,521][00267] Avg episode reward: [(0, '12.690')] [2023-02-24 22:50:14,517][00267] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 2654208. Throughput: 0: 908.5. Samples: 663632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:50:14,519][00267] Avg episode reward: [(0, '14.007')] [2023-02-24 22:50:14,522][10351] Saving new best policy, reward=14.007! [2023-02-24 22:50:15,645][10365] Updated weights for policy 0, policy_version 650 (0.0026) [2023-02-24 22:50:19,517][00267] Fps is (10 sec: 4096.6, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 2678784. Throughput: 0: 979.7. Samples: 670788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:50:19,519][00267] Avg episode reward: [(0, '13.570')] [2023-02-24 22:50:24,518][00267] Fps is (10 sec: 4505.0, 60 sec: 3686.3, 300 sec: 3846.1). Total num frames: 2699264. Throughput: 0: 1004.2. Samples: 674270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:50:24,526][00267] Avg episode reward: [(0, '13.862')] [2023-02-24 22:50:25,393][10365] Updated weights for policy 0, policy_version 660 (0.0016) [2023-02-24 22:50:29,519][00267] Fps is (10 sec: 3685.7, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 2715648. Throughput: 0: 947.1. Samples: 678654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:50:29,522][00267] Avg episode reward: [(0, '14.349')] [2023-02-24 22:50:29,536][10351] Saving new best policy, reward=14.349! [2023-02-24 22:50:34,517][00267] Fps is (10 sec: 3277.1, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 2732032. Throughput: 0: 972.6. Samples: 684298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:50:34,524][00267] Avg episode reward: [(0, '13.357')] [2023-02-24 22:50:36,458][10365] Updated weights for policy 0, policy_version 670 (0.0020) [2023-02-24 22:50:39,517][00267] Fps is (10 sec: 4096.8, 60 sec: 3891.2, 300 sec: 3818.3). 
Total num frames: 2756608. Throughput: 0: 997.8. Samples: 687854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:50:39,519][00267] Avg episode reward: [(0, '15.011')] [2023-02-24 22:50:39,530][10351] Saving new best policy, reward=15.011! [2023-02-24 22:50:44,517][00267] Fps is (10 sec: 4505.7, 60 sec: 3891.6, 300 sec: 3832.2). Total num frames: 2777088. Throughput: 0: 982.3. Samples: 694346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:50:44,520][00267] Avg episode reward: [(0, '15.978')] [2023-02-24 22:50:44,527][10351] Saving new best policy, reward=15.978! [2023-02-24 22:50:46,724][10365] Updated weights for policy 0, policy_version 680 (0.0015) [2023-02-24 22:50:49,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 2789376. Throughput: 0: 941.7. Samples: 698762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:50:49,519][00267] Avg episode reward: [(0, '16.687')] [2023-02-24 22:50:49,608][10351] Saving new best policy, reward=16.687! [2023-02-24 22:50:54,518][00267] Fps is (10 sec: 3276.4, 60 sec: 3891.1, 300 sec: 3790.5). Total num frames: 2809856. Throughput: 0: 947.3. Samples: 701248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:50:54,525][00267] Avg episode reward: [(0, '17.371')] [2023-02-24 22:50:54,530][10351] Saving new best policy, reward=17.371! [2023-02-24 22:50:57,503][10365] Updated weights for policy 0, policy_version 690 (0.0013) [2023-02-24 22:50:59,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2834432. Throughput: 0: 991.6. Samples: 708254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:50:59,519][00267] Avg episode reward: [(0, '18.812')] [2023-02-24 22:50:59,537][10351] Saving new best policy, reward=18.812! [2023-02-24 22:51:04,517][00267] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2850816. Throughput: 0: 964.6. Samples: 714196. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:51:04,519][00267] Avg episode reward: [(0, '18.422')] [2023-02-24 22:51:08,632][10365] Updated weights for policy 0, policy_version 700 (0.0027) [2023-02-24 22:51:09,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2867200. Throughput: 0: 937.3. Samples: 716448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:51:09,519][00267] Avg episode reward: [(0, '17.772')] [2023-02-24 22:51:14,517][00267] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2887680. Throughput: 0: 959.6. Samples: 721832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:51:14,519][00267] Avg episode reward: [(0, '17.333')] [2023-02-24 22:51:18,237][10365] Updated weights for policy 0, policy_version 710 (0.0022) [2023-02-24 22:51:19,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2912256. Throughput: 0: 995.5. Samples: 729096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:51:19,519][00267] Avg episode reward: [(0, '16.609')] [2023-02-24 22:51:24,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3873.9). Total num frames: 2932736. Throughput: 0: 987.4. Samples: 732288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:51:24,519][00267] Avg episode reward: [(0, '15.513')] [2023-02-24 22:51:29,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 2945024. Throughput: 0: 942.6. Samples: 736762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:51:29,520][00267] Avg episode reward: [(0, '16.086')] [2023-02-24 22:51:29,800][10365] Updated weights for policy 0, policy_version 720 (0.0013) [2023-02-24 22:51:34,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2965504. Throughput: 0: 978.8. Samples: 742810. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:51:34,519][00267] Avg episode reward: [(0, '16.631')] [2023-02-24 22:51:38,879][10365] Updated weights for policy 0, policy_version 730 (0.0020) [2023-02-24 22:51:39,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2990080. Throughput: 0: 1002.3. Samples: 746350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:51:39,519][00267] Avg episode reward: [(0, '17.300')] [2023-02-24 22:51:44,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3010560. Throughput: 0: 986.2. Samples: 752632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:51:44,521][00267] Avg episode reward: [(0, '18.011')] [2023-02-24 22:51:49,517][00267] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3022848. Throughput: 0: 954.9. Samples: 757166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:51:49,525][00267] Avg episode reward: [(0, '17.840')] [2023-02-24 22:51:50,897][10365] Updated weights for policy 0, policy_version 740 (0.0012) [2023-02-24 22:51:54,525][00267] Fps is (10 sec: 3683.3, 60 sec: 3959.0, 300 sec: 3846.0). Total num frames: 3047424. Throughput: 0: 967.0. Samples: 759972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:51:54,530][00267] Avg episode reward: [(0, '19.110')] [2023-02-24 22:51:54,535][10351] Saving new best policy, reward=19.110! [2023-02-24 22:51:59,517][00267] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3067904. Throughput: 0: 1007.8. Samples: 767184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:51:59,524][00267] Avg episode reward: [(0, '18.111')] [2023-02-24 22:51:59,550][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000750_3072000.pth... 
[2023-02-24 22:51:59,555][10365] Updated weights for policy 0, policy_version 750 (0.0016) [2023-02-24 22:51:59,709][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000524_2146304.pth [2023-02-24 22:52:04,517][00267] Fps is (10 sec: 4099.5, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3088384. Throughput: 0: 971.1. Samples: 772796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:52:04,519][00267] Avg episode reward: [(0, '18.159')] [2023-02-24 22:52:09,517][00267] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3100672. Throughput: 0: 949.5. Samples: 775018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:52:09,521][00267] Avg episode reward: [(0, '19.526')] [2023-02-24 22:52:09,544][10351] Saving new best policy, reward=19.526! [2023-02-24 22:52:11,724][10365] Updated weights for policy 0, policy_version 760 (0.0047) [2023-02-24 22:52:14,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3125248. Throughput: 0: 978.9. Samples: 780812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:52:14,522][00267] Avg episode reward: [(0, '19.963')] [2023-02-24 22:52:14,525][10351] Saving new best policy, reward=19.963! [2023-02-24 22:52:19,517][00267] Fps is (10 sec: 4915.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3149824. Throughput: 0: 1003.6. Samples: 787970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:52:19,519][00267] Avg episode reward: [(0, '19.806')] [2023-02-24 22:52:20,264][10365] Updated weights for policy 0, policy_version 770 (0.0014) [2023-02-24 22:52:24,517][00267] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3166208. Throughput: 0: 986.1. Samples: 790724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:52:24,523][00267] Avg episode reward: [(0, '20.766')] [2023-02-24 22:52:24,527][10351] Saving new best policy, reward=20.766! 
[2023-02-24 22:52:29,517][00267] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3178496. Throughput: 0: 945.4. Samples: 795176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:52:29,526][00267] Avg episode reward: [(0, '19.807')] [2023-02-24 22:52:32,676][10365] Updated weights for policy 0, policy_version 780 (0.0023) [2023-02-24 22:52:34,516][00267] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3203072. Throughput: 0: 985.0. Samples: 801490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:52:34,519][00267] Avg episode reward: [(0, '17.889')] [2023-02-24 22:52:39,517][00267] Fps is (10 sec: 4915.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3227648. Throughput: 0: 1002.4. Samples: 805072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:52:39,520][00267] Avg episode reward: [(0, '18.067')] [2023-02-24 22:52:41,440][10365] Updated weights for policy 0, policy_version 790 (0.0017) [2023-02-24 22:52:44,517][00267] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3244032. Throughput: 0: 972.3. Samples: 810938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:52:44,526][00267] Avg episode reward: [(0, '18.670')] [2023-02-24 22:52:49,517][00267] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3256320. Throughput: 0: 932.7. Samples: 814766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:52:49,522][00267] Avg episode reward: [(0, '17.613')] [2023-02-24 22:52:54,517][00267] Fps is (10 sec: 2457.6, 60 sec: 3686.9, 300 sec: 3804.4). Total num frames: 3268608. Throughput: 0: 921.6. Samples: 816490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:52:54,521][00267] Avg episode reward: [(0, '18.835')] [2023-02-24 22:52:56,742][10365] Updated weights for policy 0, policy_version 800 (0.0043) [2023-02-24 22:52:59,517][00267] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3790.5). 
Total num frames: 3284992. Throughput: 0: 907.2. Samples: 821636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:52:59,521][00267] Avg episode reward: [(0, '18.568')] [2023-02-24 22:53:04,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 3305472. Throughput: 0: 889.1. Samples: 827978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:53:04,519][00267] Avg episode reward: [(0, '17.568')] [2023-02-24 22:53:07,418][10365] Updated weights for policy 0, policy_version 810 (0.0035) [2023-02-24 22:53:09,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3321856. Throughput: 0: 876.4. Samples: 830160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:53:09,524][00267] Avg episode reward: [(0, '16.887')] [2023-02-24 22:53:14,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 3342336. Throughput: 0: 888.0. Samples: 835136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:53:14,524][00267] Avg episode reward: [(0, '17.571')] [2023-02-24 22:53:17,610][10365] Updated weights for policy 0, policy_version 820 (0.0019) [2023-02-24 22:53:19,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3804.5). Total num frames: 3366912. Throughput: 0: 908.4. Samples: 842368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:53:19,523][00267] Avg episode reward: [(0, '18.197')] [2023-02-24 22:53:24,519][00267] Fps is (10 sec: 4504.7, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 3387392. Throughput: 0: 908.9. Samples: 845974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:53:24,526][00267] Avg episode reward: [(0, '19.044')] [2023-02-24 22:53:28,129][10365] Updated weights for policy 0, policy_version 830 (0.0019) [2023-02-24 22:53:29,518][00267] Fps is (10 sec: 3276.2, 60 sec: 3686.3, 300 sec: 3790.5). Total num frames: 3399680. Throughput: 0: 881.0. Samples: 850584. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:53:29,521][00267] Avg episode reward: [(0, '18.745')] [2023-02-24 22:53:34,516][00267] Fps is (10 sec: 3277.5, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 3420160. Throughput: 0: 916.0. Samples: 855988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 22:53:34,524][00267] Avg episode reward: [(0, '21.532')] [2023-02-24 22:53:34,529][10351] Saving new best policy, reward=21.532! [2023-02-24 22:53:38,589][10365] Updated weights for policy 0, policy_version 840 (0.0044) [2023-02-24 22:53:39,517][00267] Fps is (10 sec: 4506.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 3444736. Throughput: 0: 956.5. Samples: 859532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:53:39,523][00267] Avg episode reward: [(0, '21.961')] [2023-02-24 22:53:39,535][10351] Saving new best policy, reward=21.961! [2023-02-24 22:53:44,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3465216. Throughput: 0: 991.7. Samples: 866264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:53:44,521][00267] Avg episode reward: [(0, '20.175')] [2023-02-24 22:53:49,525][00267] Fps is (10 sec: 3273.9, 60 sec: 3685.9, 300 sec: 3790.4). Total num frames: 3477504. Throughput: 0: 951.1. Samples: 870786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:53:49,539][00267] Avg episode reward: [(0, '20.170')] [2023-02-24 22:53:49,635][10365] Updated weights for policy 0, policy_version 850 (0.0015) [2023-02-24 22:53:54,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3497984. Throughput: 0: 954.7. Samples: 873120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:53:54,519][00267] Avg episode reward: [(0, '19.366')] [2023-02-24 22:53:59,223][10365] Updated weights for policy 0, policy_version 860 (0.0021) [2023-02-24 22:53:59,517][00267] Fps is (10 sec: 4509.6, 60 sec: 3959.5, 300 sec: 3804.4). 
Total num frames: 3522560. Throughput: 0: 1004.8. Samples: 880352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:53:59,518][00267] Avg episode reward: [(0, '18.990')] [2023-02-24 22:53:59,535][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000860_3522560.pth... [2023-02-24 22:53:59,658][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth [2023-02-24 22:54:04,516][00267] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 3543040. Throughput: 0: 978.3. Samples: 886392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:54:04,519][00267] Avg episode reward: [(0, '18.955')] [2023-02-24 22:54:09,522][00267] Fps is (10 sec: 3275.0, 60 sec: 3890.8, 300 sec: 3790.5). Total num frames: 3555328. Throughput: 0: 947.4. Samples: 888612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:54:09,525][00267] Avg episode reward: [(0, '19.560')] [2023-02-24 22:54:11,110][10365] Updated weights for policy 0, policy_version 870 (0.0011) [2023-02-24 22:54:14,517][00267] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 3575808. Throughput: 0: 959.8. Samples: 893774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:54:14,525][00267] Avg episode reward: [(0, '19.187')] [2023-02-24 22:54:19,516][00267] Fps is (10 sec: 4508.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3600384. Throughput: 0: 1001.1. Samples: 901036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 22:54:19,519][00267] Avg episode reward: [(0, '19.365')] [2023-02-24 22:54:20,035][10365] Updated weights for policy 0, policy_version 880 (0.0012) [2023-02-24 22:54:24,517][00267] Fps is (10 sec: 4505.7, 60 sec: 3891.3, 300 sec: 3832.3). Total num frames: 3620864. Throughput: 0: 1000.0. Samples: 904532. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:54:24,529][00267] Avg episode reward: [(0, '17.116')] [2023-02-24 22:54:29,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3832.2). Total num frames: 3633152. Throughput: 0: 950.0. Samples: 909016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:54:29,521][00267] Avg episode reward: [(0, '17.566')] [2023-02-24 22:54:32,179][10365] Updated weights for policy 0, policy_version 890 (0.0014) [2023-02-24 22:54:34,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3653632. Throughput: 0: 973.6. Samples: 914590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:54:34,521][00267] Avg episode reward: [(0, '17.905')] [2023-02-24 22:54:39,516][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.2). Total num frames: 3678208. Throughput: 0: 1001.6. Samples: 918190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:54:39,518][00267] Avg episode reward: [(0, '17.711')] [2023-02-24 22:54:40,852][10365] Updated weights for policy 0, policy_version 900 (0.0012) [2023-02-24 22:54:44,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3698688. Throughput: 0: 988.2. Samples: 924820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:54:44,523][00267] Avg episode reward: [(0, '19.635')] [2023-02-24 22:54:49,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3960.1, 300 sec: 3860.0). Total num frames: 3715072. Throughput: 0: 955.6. Samples: 929394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:54:49,519][00267] Avg episode reward: [(0, '19.916')] [2023-02-24 22:54:52,828][10365] Updated weights for policy 0, policy_version 910 (0.0030) [2023-02-24 22:54:54,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3735552. Throughput: 0: 963.7. Samples: 931974. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:54:54,521][00267] Avg episode reward: [(0, '21.438')] [2023-02-24 22:54:59,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3756032. Throughput: 0: 1006.3. Samples: 939058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:54:59,521][00267] Avg episode reward: [(0, '22.327')] [2023-02-24 22:54:59,530][10351] Saving new best policy, reward=22.327! [2023-02-24 22:55:01,697][10365] Updated weights for policy 0, policy_version 920 (0.0028) [2023-02-24 22:55:04,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3776512. Throughput: 0: 971.2. Samples: 944740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:55:04,522][00267] Avg episode reward: [(0, '22.655')] [2023-02-24 22:55:04,524][10351] Saving new best policy, reward=22.655! [2023-02-24 22:55:09,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.6, 300 sec: 3846.1). Total num frames: 3788800. Throughput: 0: 940.1. Samples: 946838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:55:09,518][00267] Avg episode reward: [(0, '22.035')] [2023-02-24 22:55:14,284][10365] Updated weights for policy 0, policy_version 930 (0.0032) [2023-02-24 22:55:14,517][00267] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3809280. Throughput: 0: 953.6. Samples: 951926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 22:55:14,524][00267] Avg episode reward: [(0, '20.759')] [2023-02-24 22:55:19,517][00267] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3833856. Throughput: 0: 987.7. Samples: 959036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 22:55:19,522][00267] Avg episode reward: [(0, '19.224')] [2023-02-24 22:55:23,632][10365] Updated weights for policy 0, policy_version 940 (0.0034) [2023-02-24 22:55:24,517][00267] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). 
Total num frames: 3850240. Throughput: 0: 978.3. Samples: 962214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:55:24,520][00267] Avg episode reward: [(0, '18.467')] [2023-02-24 22:55:29,517][00267] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3866624. Throughput: 0: 930.8. Samples: 966708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:55:29,522][00267] Avg episode reward: [(0, '18.824')] [2023-02-24 22:55:34,517][00267] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3887104. Throughput: 0: 953.7. Samples: 972310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 22:55:34,519][00267] Avg episode reward: [(0, '19.157')] [2023-02-24 22:55:35,326][10365] Updated weights for policy 0, policy_version 950 (0.0012) [2023-02-24 22:55:39,517][00267] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3907584. Throughput: 0: 976.0. Samples: 975892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:55:39,519][00267] Avg episode reward: [(0, '21.764')] [2023-02-24 22:55:44,517][00267] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3928064. Throughput: 0: 962.9. Samples: 982388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:55:44,526][00267] Avg episode reward: [(0, '20.719')] [2023-02-24 22:55:45,284][10365] Updated weights for policy 0, policy_version 960 (0.0024) [2023-02-24 22:55:49,519][00267] Fps is (10 sec: 3276.1, 60 sec: 3754.5, 300 sec: 3832.2). Total num frames: 3940352. Throughput: 0: 932.2. Samples: 986692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:55:49,529][00267] Avg episode reward: [(0, '20.608')] [2023-02-24 22:55:54,517][00267] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3960832. Throughput: 0: 940.1. Samples: 989142. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 22:55:54,519][00267] Avg episode reward: [(0, '22.027')] [2023-02-24 22:55:56,306][10365] Updated weights for policy 0, policy_version 970 (0.0041) [2023-02-24 22:55:59,517][00267] Fps is (10 sec: 4506.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3985408. Throughput: 0: 986.3. Samples: 996308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 22:55:59,519][00267] Avg episode reward: [(0, '22.121')] [2023-02-24 22:55:59,537][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000973_3985408.pth... [2023-02-24 22:55:59,659][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000750_3072000.pth [2023-02-24 22:56:04,192][10351] Stopping Batcher_0... [2023-02-24 22:56:04,192][10351] Loop batcher_evt_loop terminating... [2023-02-24 22:56:04,195][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-24 22:56:04,193][00267] Component Batcher_0 stopped! [2023-02-24 22:56:04,276][10373] Stopping RolloutWorker_w3... [2023-02-24 22:56:04,280][00267] Component RolloutWorker_w3 stopped! [2023-02-24 22:56:04,290][10365] Weights refcount: 2 0 [2023-02-24 22:56:04,286][10373] Loop rollout_proc3_evt_loop terminating... [2023-02-24 22:56:04,293][10365] Stopping InferenceWorker_p0-w0... [2023-02-24 22:56:04,298][10365] Loop inference_proc0-0_evt_loop terminating... [2023-02-24 22:56:04,300][00267] Component InferenceWorker_p0-w0 stopped! [2023-02-24 22:56:04,309][10377] Stopping RolloutWorker_w7... [2023-02-24 22:56:04,309][10377] Loop rollout_proc7_evt_loop terminating... [2023-02-24 22:56:04,311][10371] Stopping RolloutWorker_w1... [2023-02-24 22:56:04,311][10371] Loop rollout_proc1_evt_loop terminating... [2023-02-24 22:56:04,309][00267] Component RolloutWorker_w7 stopped! [2023-02-24 22:56:04,312][00267] Component RolloutWorker_w1 stopped! 
[2023-02-24 22:56:04,406][00267] Component RolloutWorker_w6 stopped! [2023-02-24 22:56:04,409][10376] Stopping RolloutWorker_w6... [2023-02-24 22:56:04,414][10375] Stopping RolloutWorker_w5... [2023-02-24 22:56:04,416][10375] Loop rollout_proc5_evt_loop terminating... [2023-02-24 22:56:04,414][00267] Component RolloutWorker_w5 stopped! [2023-02-24 22:56:04,436][10376] Loop rollout_proc6_evt_loop terminating... [2023-02-24 22:56:04,454][00267] Component RolloutWorker_w4 stopped! [2023-02-24 22:56:04,459][10374] Stopping RolloutWorker_w4... [2023-02-24 22:56:04,460][10374] Loop rollout_proc4_evt_loop terminating... [2023-02-24 22:56:04,482][00267] Component RolloutWorker_w0 stopped! [2023-02-24 22:56:04,487][10366] Stopping RolloutWorker_w0... [2023-02-24 22:56:04,494][00267] Component RolloutWorker_w2 stopped! [2023-02-24 22:56:04,500][10372] Stopping RolloutWorker_w2... [2023-02-24 22:56:04,501][10372] Loop rollout_proc2_evt_loop terminating... [2023-02-24 22:56:04,488][10366] Loop rollout_proc0_evt_loop terminating... [2023-02-24 22:56:04,584][10351] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000860_3522560.pth [2023-02-24 22:56:04,599][10351] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-24 22:56:04,911][00267] Component LearnerWorker_p0 stopped! [2023-02-24 22:56:04,917][00267] Waiting for process learner_proc0 to stop... [2023-02-24 22:56:04,918][10351] Stopping LearnerWorker_p0... [2023-02-24 22:56:04,919][10351] Loop learner_proc0_evt_loop terminating... [2023-02-24 22:56:06,901][00267] Waiting for process inference_proc0-0 to join... [2023-02-24 22:56:07,766][00267] Waiting for process rollout_proc0 to join... [2023-02-24 22:56:08,378][00267] Waiting for process rollout_proc1 to join... [2023-02-24 22:56:08,384][00267] Waiting for process rollout_proc2 to join... [2023-02-24 22:56:08,385][00267] Waiting for process rollout_proc3 to join... 
[2023-02-24 22:56:08,387][00267] Waiting for process rollout_proc4 to join... [2023-02-24 22:56:08,388][00267] Waiting for process rollout_proc5 to join... [2023-02-24 22:56:08,390][00267] Waiting for process rollout_proc6 to join... [2023-02-24 22:56:08,392][00267] Waiting for process rollout_proc7 to join... [2023-02-24 22:56:08,394][00267] Batcher 0 profile tree view: batching: 25.5941, releasing_batches: 0.0250 [2023-02-24 22:56:08,396][00267] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 511.5995 update_model: 8.1402 weight_update: 0.0017 one_step: 0.0153 handle_policy_step: 495.4104 deserialize: 14.4745, stack: 2.7742, obs_to_device_normalize: 112.0463, forward: 234.1906, send_messages: 25.5786 prepare_outputs: 81.1428 to_cpu: 51.1377 [2023-02-24 22:56:08,399][00267] Learner 0 profile tree view: misc: 0.0051, prepare_batch: 17.0963 train: 75.7297 epoch_init: 0.0131, minibatch_init: 0.0074, losses_postprocess: 0.6837, kl_divergence: 0.5489, after_optimizer: 33.0583 calculate_losses: 26.9527 losses_init: 0.0037, forward_head: 1.8089, bptt_initial: 17.8636, tail: 1.0903, advantages_returns: 0.2811, losses: 3.3759 bptt: 2.2405 bptt_forward_core: 2.1402 update: 13.8000 clip: 1.3973 [2023-02-24 22:56:08,400][00267] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3609, enqueue_policy_requests: 134.6628, env_step: 799.9653, overhead: 19.1185, complete_rollouts: 6.4086 save_policy_outputs: 19.3479 split_output_tensors: 9.5043 [2023-02-24 22:56:08,402][00267] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2646, enqueue_policy_requests: 136.3562, env_step: 798.1813, overhead: 18.9527, complete_rollouts: 6.5099 save_policy_outputs: 18.9763 split_output_tensors: 9.1929 [2023-02-24 22:56:08,412][00267] Loop Runner_EvtLoop terminating... 
[2023-02-24 22:56:08,414][00267] Runner profile tree view: main_loop: 1084.7122 [2023-02-24 22:56:08,418][00267] Collected {0: 4005888}, FPS: 3693.0 [2023-02-24 22:56:08,570][00267] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 22:56:08,575][00267] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-24 22:56:08,577][00267] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-24 22:56:08,581][00267] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-24 22:56:08,583][00267] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 22:56:08,587][00267] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 22:56:08,589][00267] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 22:56:08,590][00267] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 22:56:08,591][00267] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-24 22:56:08,592][00267] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-24 22:56:08,594][00267] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 22:56:08,596][00267] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-24 22:56:08,598][00267] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-24 22:56:08,599][00267] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2023-02-24 22:56:08,603][00267] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-24 22:56:08,641][00267] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 22:56:08,644][00267] RunningMeanStd input shape: (3, 72, 128) [2023-02-24 22:56:08,647][00267] RunningMeanStd input shape: (1,) [2023-02-24 22:56:08,671][00267] ConvEncoder: input_channels=3 [2023-02-24 22:56:09,447][00267] Conv encoder output size: 512 [2023-02-24 22:56:09,450][00267] Policy head output size: 512 [2023-02-24 22:56:12,367][00267] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-24 22:56:13,619][00267] Num frames 100... [2023-02-24 22:56:13,734][00267] Num frames 200... [2023-02-24 22:56:13,846][00267] Num frames 300... [2023-02-24 22:56:13,962][00267] Num frames 400... [2023-02-24 22:56:14,076][00267] Num frames 500... [2023-02-24 22:56:14,189][00267] Num frames 600... [2023-02-24 22:56:14,312][00267] Num frames 700... [2023-02-24 22:56:14,422][00267] Num frames 800... [2023-02-24 22:56:14,534][00267] Num frames 900... [2023-02-24 22:56:14,644][00267] Num frames 1000... [2023-02-24 22:56:14,763][00267] Num frames 1100... [2023-02-24 22:56:14,876][00267] Num frames 1200... [2023-02-24 22:56:14,992][00267] Num frames 1300... [2023-02-24 22:56:15,103][00267] Num frames 1400... [2023-02-24 22:56:15,221][00267] Num frames 1500... [2023-02-24 22:56:15,335][00267] Num frames 1600... [2023-02-24 22:56:15,453][00267] Num frames 1700... [2023-02-24 22:56:15,567][00267] Num frames 1800... [2023-02-24 22:56:15,740][00267] Avg episode rewards: #0: 48.949, true rewards: #0: 18.950 [2023-02-24 22:56:15,743][00267] Avg episode reward: 48.949, avg true_objective: 18.950 [2023-02-24 22:56:15,752][00267] Num frames 1900... [2023-02-24 22:56:15,864][00267] Num frames 2000... [2023-02-24 22:56:15,982][00267] Num frames 2100... [2023-02-24 22:56:16,100][00267] Num frames 2200... 
[2023-02-24 22:56:16,213][00267] Num frames 2300... [2023-02-24 22:56:16,328][00267] Num frames 2400... [2023-02-24 22:56:16,440][00267] Num frames 2500... [2023-02-24 22:56:16,563][00267] Avg episode rewards: #0: 30.769, true rewards: #0: 12.770 [2023-02-24 22:56:16,565][00267] Avg episode reward: 30.769, avg true_objective: 12.770 [2023-02-24 22:56:16,618][00267] Num frames 2600... [2023-02-24 22:56:16,733][00267] Num frames 2700... [2023-02-24 22:56:16,845][00267] Num frames 2800... [2023-02-24 22:56:16,960][00267] Num frames 2900... [2023-02-24 22:56:17,076][00267] Num frames 3000... [2023-02-24 22:56:17,172][00267] Avg episode rewards: #0: 23.113, true rewards: #0: 10.113 [2023-02-24 22:56:17,174][00267] Avg episode reward: 23.113, avg true_objective: 10.113 [2023-02-24 22:56:17,252][00267] Num frames 3100... [2023-02-24 22:56:17,366][00267] Num frames 3200... [2023-02-24 22:56:17,487][00267] Num frames 3300... [2023-02-24 22:56:17,601][00267] Num frames 3400... [2023-02-24 22:56:17,750][00267] Avg episode rewards: #0: 18.705, true rewards: #0: 8.705 [2023-02-24 22:56:17,754][00267] Avg episode reward: 18.705, avg true_objective: 8.705 [2023-02-24 22:56:17,776][00267] Num frames 3500... [2023-02-24 22:56:17,887][00267] Num frames 3600... [2023-02-24 22:56:18,006][00267] Num frames 3700... [2023-02-24 22:56:18,123][00267] Num frames 3800... [2023-02-24 22:56:18,231][00267] Num frames 3900... [2023-02-24 22:56:18,348][00267] Num frames 4000... [2023-02-24 22:56:18,499][00267] Avg episode rewards: #0: 16.916, true rewards: #0: 8.116 [2023-02-24 22:56:18,501][00267] Avg episode reward: 16.916, avg true_objective: 8.116 [2023-02-24 22:56:18,579][00267] Num frames 4100... [2023-02-24 22:56:18,736][00267] Num frames 4200... [2023-02-24 22:56:18,891][00267] Num frames 4300... [2023-02-24 22:56:19,051][00267] Num frames 4400... [2023-02-24 22:56:19,213][00267] Num frames 4500... [2023-02-24 22:56:19,376][00267] Num frames 4600... 
[2023-02-24 22:56:19,536][00267] Num frames 4700... [2023-02-24 22:56:19,699][00267] Num frames 4800... [2023-02-24 22:56:19,856][00267] Num frames 4900... [2023-02-24 22:56:20,015][00267] Num frames 5000... [2023-02-24 22:56:20,179][00267] Num frames 5100... [2023-02-24 22:56:20,342][00267] Num frames 5200... [2023-02-24 22:56:20,499][00267] Num frames 5300... [2023-02-24 22:56:20,679][00267] Num frames 5400... [2023-02-24 22:56:20,846][00267] Num frames 5500... [2023-02-24 22:56:21,013][00267] Num frames 5600... [2023-02-24 22:56:21,179][00267] Num frames 5700... [2023-02-24 22:56:21,340][00267] Num frames 5800... [2023-02-24 22:56:21,508][00267] Num frames 5900... [2023-02-24 22:56:21,674][00267] Num frames 6000... [2023-02-24 22:56:21,817][00267] Num frames 6100... [2023-02-24 22:56:21,938][00267] Avg episode rewards: #0: 22.763, true rewards: #0: 10.263 [2023-02-24 22:56:21,940][00267] Avg episode reward: 22.763, avg true_objective: 10.263 [2023-02-24 22:56:21,991][00267] Num frames 6200... [2023-02-24 22:56:22,101][00267] Num frames 6300... [2023-02-24 22:56:22,226][00267] Num frames 6400... [2023-02-24 22:56:22,340][00267] Num frames 6500... [2023-02-24 22:56:22,478][00267] Avg episode rewards: #0: 20.248, true rewards: #0: 9.391 [2023-02-24 22:56:22,480][00267] Avg episode reward: 20.248, avg true_objective: 9.391 [2023-02-24 22:56:22,513][00267] Num frames 6600... [2023-02-24 22:56:22,631][00267] Num frames 6700... [2023-02-24 22:56:22,752][00267] Num frames 6800... [2023-02-24 22:56:22,867][00267] Num frames 6900... [2023-02-24 22:56:22,979][00267] Num frames 7000... [2023-02-24 22:56:23,097][00267] Num frames 7100... [2023-02-24 22:56:23,170][00267] Avg episode rewards: #0: 19.141, true rewards: #0: 8.891 [2023-02-24 22:56:23,172][00267] Avg episode reward: 19.141, avg true_objective: 8.891 [2023-02-24 22:56:23,275][00267] Num frames 7200... [2023-02-24 22:56:23,389][00267] Num frames 7300... [2023-02-24 22:56:23,505][00267] Num frames 7400... 
[2023-02-24 22:56:23,624][00267] Num frames 7500... [2023-02-24 22:56:23,739][00267] Num frames 7600... [2023-02-24 22:56:23,853][00267] Num frames 7700... [2023-02-24 22:56:23,973][00267] Num frames 7800... [2023-02-24 22:56:24,091][00267] Num frames 7900... [2023-02-24 22:56:24,217][00267] Num frames 8000... [2023-02-24 22:56:24,330][00267] Num frames 8100... [2023-02-24 22:56:24,448][00267] Num frames 8200... [2023-02-24 22:56:24,583][00267] Avg episode rewards: #0: 20.183, true rewards: #0: 9.183 [2023-02-24 22:56:24,586][00267] Avg episode reward: 20.183, avg true_objective: 9.183 [2023-02-24 22:56:24,626][00267] Num frames 8300... [2023-02-24 22:56:24,741][00267] Num frames 8400... [2023-02-24 22:56:24,853][00267] Num frames 8500... [2023-02-24 22:56:24,963][00267] Num frames 8600... [2023-02-24 22:56:25,074][00267] Num frames 8700... [2023-02-24 22:56:25,194][00267] Num frames 8800... [2023-02-24 22:56:25,308][00267] Num frames 8900... [2023-02-24 22:56:25,424][00267] Num frames 9000... [2023-02-24 22:56:25,548][00267] Num frames 9100... [2023-02-24 22:56:25,664][00267] Num frames 9200... [2023-02-24 22:56:25,780][00267] Num frames 9300... [2023-02-24 22:56:25,893][00267] Num frames 9400... [2023-02-24 22:56:26,024][00267] Avg episode rewards: #0: 21.059, true rewards: #0: 9.459 [2023-02-24 22:56:26,025][00267] Avg episode reward: 21.059, avg true_objective: 9.459 [2023-02-24 22:57:22,305][00267] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-24 23:04:51,869][00267] Environment doom_basic already registered, overwriting... [2023-02-24 23:04:51,874][00267] Environment doom_two_colors_easy already registered, overwriting... [2023-02-24 23:04:51,878][00267] Environment doom_two_colors_hard already registered, overwriting... [2023-02-24 23:04:51,882][00267] Environment doom_dm already registered, overwriting... [2023-02-24 23:04:51,883][00267] Environment doom_dwango5 already registered, overwriting... 
[2023-02-24 23:04:51,884][00267] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-02-24 23:04:51,886][00267] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-02-24 23:04:51,887][00267] Environment doom_my_way_home already registered, overwriting... [2023-02-24 23:04:51,888][00267] Environment doom_deadly_corridor already registered, overwriting... [2023-02-24 23:04:51,889][00267] Environment doom_defend_the_center already registered, overwriting... [2023-02-24 23:04:51,890][00267] Environment doom_defend_the_line already registered, overwriting... [2023-02-24 23:04:51,891][00267] Environment doom_health_gathering already registered, overwriting... [2023-02-24 23:04:51,892][00267] Environment doom_health_gathering_supreme already registered, overwriting... [2023-02-24 23:04:51,893][00267] Environment doom_battle already registered, overwriting... [2023-02-24 23:04:51,895][00267] Environment doom_battle2 already registered, overwriting... [2023-02-24 23:04:51,896][00267] Environment doom_duel_bots already registered, overwriting... [2023-02-24 23:04:51,897][00267] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-24 23:04:51,899][00267] Environment doom_duel already registered, overwriting... [2023-02-24 23:04:51,900][00267] Environment doom_deathmatch_full already registered, overwriting... [2023-02-24 23:04:51,901][00267] Environment doom_benchmark already registered, overwriting... [2023-02-24 23:04:51,902][00267] register_encoder_factory: [2023-02-24 23:04:51,934][00267] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 23:04:51,947][00267] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-24 23:04:51,948][00267] Resuming existing experiment from /content/train_dir/default_experiment... 
[2023-02-24 23:04:51,949][00267] Weights and Biases integration disabled [2023-02-24 23:04:51,955][00267] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-24 23:05:17,977][00267] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 23:05:17,980][00267] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-24 23:05:17,981][00267] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-24 23:05:17,985][00267] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-24 23:05:17,989][00267] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 23:05:17,991][00267] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 23:05:17,992][00267] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-24 23:05:17,994][00267] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 23:05:17,995][00267] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-24 23:05:17,996][00267] Adding new argument 'hf_repository'='Artachtron/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-24 23:05:18,000][00267] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 23:05:18,001][00267] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-24 23:05:18,002][00267] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-24 23:05:18,003][00267] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2023-02-24 23:05:18,008][00267] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-24 23:05:18,040][00267] RunningMeanStd input shape: (3, 72, 128) [2023-02-24 23:05:18,041][00267] RunningMeanStd input shape: (1,) [2023-02-24 23:05:18,058][00267] ConvEncoder: input_channels=3 [2023-02-24 23:05:18,095][00267] Conv encoder output size: 512 [2023-02-24 23:05:18,096][00267] Policy head output size: 512 [2023-02-24 23:05:18,116][00267] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-24 23:05:18,566][00267] Num frames 100... [2023-02-24 23:05:18,682][00267] Num frames 200... [2023-02-24 23:05:18,803][00267] Num frames 300... [2023-02-24 23:05:18,915][00267] Num frames 400... [2023-02-24 23:05:19,030][00267] Num frames 500... [2023-02-24 23:05:19,179][00267] Num frames 600... [2023-02-24 23:05:19,340][00267] Num frames 700... [2023-02-24 23:05:19,513][00267] Num frames 800... [2023-02-24 23:05:19,672][00267] Num frames 900... [2023-02-24 23:05:19,833][00267] Num frames 1000... [2023-02-24 23:05:19,995][00267] Num frames 1100... [2023-02-24 23:05:20,158][00267] Num frames 1200... [2023-02-24 23:05:20,319][00267] Num frames 1300... [2023-02-24 23:05:20,479][00267] Num frames 1400... [2023-02-24 23:05:20,654][00267] Avg episode rewards: #0: 33.720, true rewards: #0: 14.720 [2023-02-24 23:05:20,659][00267] Avg episode reward: 33.720, avg true_objective: 14.720 [2023-02-24 23:05:20,716][00267] Num frames 1500... [2023-02-24 23:05:20,875][00267] Num frames 1600... [2023-02-24 23:05:21,036][00267] Num frames 1700... [2023-02-24 23:05:21,196][00267] Num frames 1800... [2023-02-24 23:05:21,366][00267] Num frames 1900... [2023-02-24 23:05:21,568][00267] Avg episode rewards: #0: 21.435, true rewards: #0: 9.935 [2023-02-24 23:05:21,570][00267] Avg episode reward: 21.435, avg true_objective: 9.935 [2023-02-24 23:05:21,596][00267] Num frames 2000... 
[2023-02-24 23:05:21,764][00267] Num frames 2100... [2023-02-24 23:05:21,932][00267] Num frames 2200... [2023-02-24 23:05:22,098][00267] Num frames 2300... [2023-02-24 23:05:22,266][00267] Num frames 2400... [2023-02-24 23:05:22,428][00267] Num frames 2500... [2023-02-24 23:05:22,585][00267] Num frames 2600... [2023-02-24 23:05:22,708][00267] Num frames 2700... [2023-02-24 23:05:22,823][00267] Num frames 2800... [2023-02-24 23:05:22,941][00267] Num frames 2900... [2023-02-24 23:05:23,056][00267] Num frames 3000... [2023-02-24 23:05:23,168][00267] Num frames 3100... [2023-02-24 23:05:23,282][00267] Num frames 3200... [2023-02-24 23:05:23,398][00267] Num frames 3300... [2023-02-24 23:05:23,493][00267] Avg episode rewards: #0: 25.103, true rewards: #0: 11.103 [2023-02-24 23:05:23,494][00267] Avg episode reward: 25.103, avg true_objective: 11.103 [2023-02-24 23:05:23,573][00267] Num frames 3400... [2023-02-24 23:05:23,692][00267] Num frames 3500... [2023-02-24 23:05:23,806][00267] Num frames 3600... [2023-02-24 23:05:23,918][00267] Num frames 3700... [2023-02-24 23:05:24,035][00267] Num frames 3800... [2023-02-24 23:05:24,149][00267] Num frames 3900... [2023-02-24 23:05:24,261][00267] Num frames 4000... [2023-02-24 23:05:24,380][00267] Num frames 4100... [2023-02-24 23:05:24,498][00267] Num frames 4200... [2023-02-24 23:05:24,612][00267] Num frames 4300... [2023-02-24 23:05:24,695][00267] Avg episode rewards: #0: 24.308, true rewards: #0: 10.807 [2023-02-24 23:05:24,697][00267] Avg episode reward: 24.308, avg true_objective: 10.807 [2023-02-24 23:05:24,784][00267] Num frames 4400... [2023-02-24 23:05:24,904][00267] Num frames 4500... [2023-02-24 23:05:25,016][00267] Num frames 4600... [2023-02-24 23:05:25,130][00267] Num frames 4700... [2023-02-24 23:05:25,249][00267] Num frames 4800... 
[2023-02-24 23:05:25,310][00267] Avg episode rewards: #0: 21.406, true rewards: #0: 9.606 [2023-02-24 23:05:25,311][00267] Avg episode reward: 21.406, avg true_objective: 9.606 [2023-02-24 23:05:25,425][00267] Num frames 4900... [2023-02-24 23:05:25,544][00267] Num frames 5000... [2023-02-24 23:05:25,656][00267] Num frames 5100... [2023-02-24 23:05:25,782][00267] Num frames 5200... [2023-02-24 23:05:25,894][00267] Num frames 5300... [2023-02-24 23:05:26,006][00267] Num frames 5400... [2023-02-24 23:05:26,109][00267] Avg episode rewards: #0: 19.738, true rewards: #0: 9.072 [2023-02-24 23:05:26,112][00267] Avg episode reward: 19.738, avg true_objective: 9.072 [2023-02-24 23:05:26,182][00267] Num frames 5500... [2023-02-24 23:05:26,293][00267] Num frames 5600... [2023-02-24 23:05:26,405][00267] Num frames 5700... [2023-02-24 23:05:26,519][00267] Num frames 5800... [2023-02-24 23:05:26,635][00267] Num frames 5900... [2023-02-24 23:05:26,756][00267] Num frames 6000... [2023-02-24 23:05:26,867][00267] Avg episode rewards: #0: 18.497, true rewards: #0: 8.640 [2023-02-24 23:05:26,869][00267] Avg episode reward: 18.497, avg true_objective: 8.640 [2023-02-24 23:05:26,933][00267] Num frames 6100... [2023-02-24 23:05:27,050][00267] Num frames 6200... [2023-02-24 23:05:27,162][00267] Num frames 6300... [2023-02-24 23:05:27,275][00267] Num frames 6400... [2023-02-24 23:05:27,387][00267] Num frames 6500... [2023-02-24 23:05:27,507][00267] Num frames 6600... [2023-02-24 23:05:27,619][00267] Num frames 6700... [2023-02-24 23:05:27,737][00267] Num frames 6800... [2023-02-24 23:05:27,849][00267] Num frames 6900... [2023-02-24 23:05:27,960][00267] Num frames 7000... [2023-02-24 23:05:28,075][00267] Num frames 7100... [2023-02-24 23:05:28,186][00267] Num frames 7200... 
[2023-02-24 23:05:28,278][00267] Avg episode rewards: #0: 19.290, true rewards: #0: 9.040 [2023-02-24 23:05:28,280][00267] Avg episode reward: 19.290, avg true_objective: 9.040 [2023-02-24 23:05:28,358][00267] Num frames 7300... [2023-02-24 23:05:28,476][00267] Num frames 7400... [2023-02-24 23:05:28,588][00267] Num frames 7500... [2023-02-24 23:05:28,700][00267] Num frames 7600... [2023-02-24 23:05:28,817][00267] Num frames 7700... [2023-02-24 23:05:28,931][00267] Num frames 7800... [2023-02-24 23:05:29,042][00267] Num frames 7900... [2023-02-24 23:05:29,153][00267] Num frames 8000... [2023-02-24 23:05:29,265][00267] Num frames 8100... [2023-02-24 23:05:29,353][00267] Avg episode rewards: #0: 18.920, true rewards: #0: 9.031 [2023-02-24 23:05:29,354][00267] Avg episode reward: 18.920, avg true_objective: 9.031 [2023-02-24 23:05:29,454][00267] Num frames 8200... [2023-02-24 23:05:29,565][00267] Num frames 8300... [2023-02-24 23:05:29,675][00267] Num frames 8400... [2023-02-24 23:05:29,796][00267] Num frames 8500... [2023-02-24 23:05:29,909][00267] Avg episode rewards: #0: 17.644, true rewards: #0: 8.544 [2023-02-24 23:05:29,910][00267] Avg episode reward: 17.644, avg true_objective: 8.544 [2023-02-24 23:06:22,773][00267] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-24 23:06:34,800][00267] The model has been pushed to https://huggingface.co/Artachtron/rl_course_vizdoom_health_gathering_supreme [2023-02-24 23:07:08,926][00267] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 23:07:08,928][00267] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-24 23:07:08,930][00267] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-24 23:07:08,932][00267] Adding new argument 'save_video'=True that is not in the saved config file! 
[2023-02-24 23:07:08,934][00267] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 23:07:08,936][00267] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 23:07:08,938][00267] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-24 23:07:08,944][00267] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 23:07:08,947][00267] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-24 23:07:08,948][00267] Adding new argument 'hf_repository'='Artachtron/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-24 23:07:08,950][00267] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 23:07:08,952][00267] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-24 23:07:08,954][00267] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-24 23:07:08,955][00267] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-24 23:07:08,958][00267] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-24 23:07:08,985][00267] RunningMeanStd input shape: (3, 72, 128) [2023-02-24 23:07:08,990][00267] RunningMeanStd input shape: (1,) [2023-02-24 23:07:09,010][00267] ConvEncoder: input_channels=3 [2023-02-24 23:07:09,067][00267] Conv encoder output size: 512 [2023-02-24 23:07:09,072][00267] Policy head output size: 512 [2023-02-24 23:07:09,101][00267] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-24 23:07:09,630][00267] Num frames 100... [2023-02-24 23:07:09,740][00267] Num frames 200... [2023-02-24 23:07:09,852][00267] Num frames 300... [2023-02-24 23:07:09,975][00267] Num frames 400... 
[2023-02-24 23:07:10,084][00267] Num frames 500... [2023-02-24 23:07:10,194][00267] Num frames 600... [2023-02-24 23:07:10,304][00267] Num frames 700... [2023-02-24 23:07:10,416][00267] Num frames 800... [2023-02-24 23:07:10,529][00267] Num frames 900... [2023-02-24 23:07:10,642][00267] Num frames 1000... [2023-02-24 23:07:10,755][00267] Num frames 1100... [2023-02-24 23:07:10,869][00267] Avg episode rewards: #0: 24.520, true rewards: #0: 11.520 [2023-02-24 23:07:10,871][00267] Avg episode reward: 24.520, avg true_objective: 11.520 [2023-02-24 23:07:10,927][00267] Num frames 1200... [2023-02-24 23:07:11,045][00267] Num frames 1300... [2023-02-24 23:07:11,158][00267] Num frames 1400... [2023-02-24 23:07:11,269][00267] Num frames 1500... [2023-02-24 23:07:11,382][00267] Num frames 1600... [2023-02-24 23:07:11,494][00267] Num frames 1700... [2023-02-24 23:07:11,625][00267] Num frames 1800... [2023-02-24 23:07:11,748][00267] Num frames 1900... [2023-02-24 23:07:11,872][00267] Num frames 2000... [2023-02-24 23:07:11,991][00267] Num frames 2100... [2023-02-24 23:07:12,106][00267] Num frames 2200... [2023-02-24 23:07:12,219][00267] Num frames 2300... [2023-02-24 23:07:12,334][00267] Num frames 2400... [2023-02-24 23:07:12,454][00267] Num frames 2500... [2023-02-24 23:07:12,593][00267] Avg episode rewards: #0: 27.380, true rewards: #0: 12.880 [2023-02-24 23:07:12,595][00267] Avg episode reward: 27.380, avg true_objective: 12.880 [2023-02-24 23:07:12,626][00267] Num frames 2600... [2023-02-24 23:07:12,737][00267] Num frames 2700... [2023-02-24 23:07:12,848][00267] Num frames 2800... [2023-02-24 23:07:12,968][00267] Num frames 2900... [2023-02-24 23:07:13,084][00267] Num frames 3000... [2023-02-24 23:07:13,201][00267] Num frames 3100... [2023-02-24 23:07:13,313][00267] Num frames 3200... 
[2023-02-24 23:07:13,424][00267] Avg episode rewards: #0: 23.160, true rewards: #0: 10.827 [2023-02-24 23:07:13,425][00267] Avg episode reward: 23.160, avg true_objective: 10.827 [2023-02-24 23:07:13,487][00267] Num frames 3300... [2023-02-24 23:07:13,598][00267] Num frames 3400... [2023-02-24 23:07:13,709][00267] Num frames 3500... [2023-02-24 23:07:13,831][00267] Num frames 3600... [2023-02-24 23:07:13,945][00267] Num frames 3700... [2023-02-24 23:07:14,062][00267] Num frames 3800... [2023-02-24 23:07:14,178][00267] Num frames 3900... [2023-02-24 23:07:14,296][00267] Num frames 4000... [2023-02-24 23:07:14,421][00267] Num frames 4100... [2023-02-24 23:07:14,545][00267] Num frames 4200... [2023-02-24 23:07:14,669][00267] Num frames 4300... [2023-02-24 23:07:14,791][00267] Num frames 4400... [2023-02-24 23:07:14,910][00267] Num frames 4500... [2023-02-24 23:07:15,031][00267] Num frames 4600... [2023-02-24 23:07:15,194][00267] Avg episode rewards: #0: 24.720, true rewards: #0: 11.720 [2023-02-24 23:07:15,196][00267] Avg episode reward: 24.720, avg true_objective: 11.720 [2023-02-24 23:07:15,214][00267] Num frames 4700... [2023-02-24 23:07:15,339][00267] Num frames 4800... [2023-02-24 23:07:15,454][00267] Num frames 4900... [2023-02-24 23:07:15,566][00267] Num frames 5000... [2023-02-24 23:07:15,682][00267] Num frames 5100... [2023-02-24 23:07:15,803][00267] Num frames 5200... [2023-02-24 23:07:15,965][00267] Avg episode rewards: #0: 21.592, true rewards: #0: 10.592 [2023-02-24 23:07:15,968][00267] Avg episode reward: 21.592, avg true_objective: 10.592 [2023-02-24 23:07:15,976][00267] Num frames 5300... [2023-02-24 23:07:16,094][00267] Num frames 5400... [2023-02-24 23:07:16,204][00267] Num frames 5500... [2023-02-24 23:07:16,320][00267] Num frames 5600... [2023-02-24 23:07:16,430][00267] Num frames 5700... [2023-02-24 23:07:16,544][00267] Num frames 5800... [2023-02-24 23:07:16,655][00267] Num frames 5900... [2023-02-24 23:07:16,777][00267] Num frames 6000... 
[2023-02-24 23:07:16,829][00267] Avg episode rewards: #0: 20.167, true rewards: #0: 10.000 [2023-02-24 23:07:16,831][00267] Avg episode reward: 20.167, avg true_objective: 10.000 [2023-02-24 23:07:16,945][00267] Num frames 6100... [2023-02-24 23:07:17,059][00267] Num frames 6200... [2023-02-24 23:07:17,184][00267] Num frames 6300... [2023-02-24 23:07:17,295][00267] Num frames 6400... [2023-02-24 23:07:17,411][00267] Num frames 6500... [2023-02-24 23:07:17,526][00267] Num frames 6600... [2023-02-24 23:07:17,647][00267] Num frames 6700... [2023-02-24 23:07:17,745][00267] Avg episode rewards: #0: 19.337, true rewards: #0: 9.623 [2023-02-24 23:07:17,746][00267] Avg episode reward: 19.337, avg true_objective: 9.623 [2023-02-24 23:07:17,822][00267] Num frames 6800... [2023-02-24 23:07:17,935][00267] Num frames 6900... [2023-02-24 23:07:18,055][00267] Num frames 7000... [2023-02-24 23:07:18,171][00267] Num frames 7100... [2023-02-24 23:07:18,284][00267] Num frames 7200... [2023-02-24 23:07:18,396][00267] Num frames 7300... [2023-02-24 23:07:18,511][00267] Num frames 7400... [2023-02-24 23:07:18,625][00267] Num frames 7500... [2023-02-24 23:07:18,739][00267] Num frames 7600... [2023-02-24 23:07:18,848][00267] Num frames 7700... [2023-02-24 23:07:18,970][00267] Num frames 7800... [2023-02-24 23:07:19,085][00267] Num frames 7900... [2023-02-24 23:07:19,203][00267] Num frames 8000... [2023-02-24 23:07:19,313][00267] Num frames 8100... [2023-02-24 23:07:19,447][00267] Num frames 8200... [2023-02-24 23:07:19,622][00267] Avg episode rewards: #0: 21.090, true rewards: #0: 10.340 [2023-02-24 23:07:19,625][00267] Avg episode reward: 21.090, avg true_objective: 10.340 [2023-02-24 23:07:19,672][00267] Num frames 8300... [2023-02-24 23:07:19,836][00267] Num frames 8400... [2023-02-24 23:07:19,990][00267] Num frames 8500... [2023-02-24 23:07:20,153][00267] Num frames 8600... [2023-02-24 23:07:20,310][00267] Num frames 8700... [2023-02-24 23:07:20,468][00267] Num frames 8800... 
[2023-02-24 23:07:20,623][00267] Num frames 8900... [2023-02-24 23:07:20,783][00267] Num frames 9000... [2023-02-24 23:07:20,942][00267] Num frames 9100... [2023-02-24 23:07:21,100][00267] Num frames 9200... [2023-02-24 23:07:21,268][00267] Num frames 9300... [2023-02-24 23:07:21,433][00267] Num frames 9400... [2023-02-24 23:07:21,593][00267] Num frames 9500... [2023-02-24 23:07:21,719][00267] Avg episode rewards: #0: 22.158, true rewards: #0: 10.602 [2023-02-24 23:07:21,721][00267] Avg episode reward: 22.158, avg true_objective: 10.602 [2023-02-24 23:07:21,815][00267] Num frames 9600... [2023-02-24 23:07:21,985][00267] Num frames 9700... [2023-02-24 23:07:22,151][00267] Num frames 9800... [2023-02-24 23:07:22,327][00267] Num frames 9900... [2023-02-24 23:07:22,490][00267] Num frames 10000... [2023-02-24 23:07:22,650][00267] Num frames 10100... [2023-02-24 23:07:22,815][00267] Num frames 10200... [2023-02-24 23:07:22,958][00267] Num frames 10300... [2023-02-24 23:07:23,069][00267] Num frames 10400... [2023-02-24 23:07:23,183][00267] Num frames 10500... [2023-02-24 23:07:23,305][00267] Num frames 10600... [2023-02-24 23:07:23,423][00267] Num frames 10700... [2023-02-24 23:07:23,541][00267] Num frames 10800... [2023-02-24 23:07:23,657][00267] Num frames 10900... [2023-02-24 23:07:23,774][00267] Num frames 11000... [2023-02-24 23:07:23,895][00267] Num frames 11100... [2023-02-24 23:07:24,011][00267] Num frames 11200... [2023-02-24 23:07:24,122][00267] Num frames 11300... [2023-02-24 23:07:24,238][00267] Num frames 11400... [2023-02-24 23:07:24,357][00267] Num frames 11500... [2023-02-24 23:07:24,477][00267] Avg episode rewards: #0: 25.250, true rewards: #0: 11.550 [2023-02-24 23:07:24,478][00267] Avg episode reward: 25.250, avg true_objective: 11.550 [2023-02-24 23:08:31,463][00267] Replay video saved to /content/train_dir/default_experiment/replay.mp4!