[2023-02-26 09:18:53,350][00216] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-26 09:18:53,353][00216] Rollout worker 0 uses device cpu
[2023-02-26 09:18:53,356][00216] Rollout worker 1 uses device cpu
[2023-02-26 09:18:53,364][00216] Rollout worker 2 uses device cpu
[2023-02-26 09:18:53,372][00216] Rollout worker 3 uses device cpu
[2023-02-26 09:18:53,377][00216] Rollout worker 4 uses device cpu
[2023-02-26 09:18:53,379][00216] Rollout worker 5 uses device cpu
[2023-02-26 09:18:53,387][00216] Rollout worker 6 uses device cpu
[2023-02-26 09:18:53,388][00216] Rollout worker 7 uses device cpu
[2023-02-26 09:18:53,875][00216] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 09:18:53,881][00216] InferenceWorker_p0-w0: min num requests: 2
[2023-02-26 09:18:54,011][00216] Starting all processes...
[2023-02-26 09:18:54,013][00216] Starting process learner_proc0
[2023-02-26 09:18:54,144][00216] Starting all processes...
[2023-02-26 09:18:54,195][00216] Starting process inference_proc0-0
[2023-02-26 09:18:54,195][00216] Starting process rollout_proc0
[2023-02-26 09:18:54,197][00216] Starting process rollout_proc1
[2023-02-26 09:18:54,197][00216] Starting process rollout_proc2
[2023-02-26 09:18:54,197][00216] Starting process rollout_proc3
[2023-02-26 09:18:54,197][00216] Starting process rollout_proc4
[2023-02-26 09:18:54,197][00216] Starting process rollout_proc5
[2023-02-26 09:18:54,197][00216] Starting process rollout_proc6
[2023-02-26 09:18:54,197][00216] Starting process rollout_proc7
[2023-02-26 09:19:05,810][13460] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 09:19:05,815][13460] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-26 09:19:05,981][13476] Worker 1 uses CPU cores [1]
[2023-02-26 09:19:06,161][13479] Worker 3 uses CPU cores [1]
[2023-02-26 09:19:06,176][13474] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 09:19:06,182][13474] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-26 09:19:06,286][13478] Worker 4 uses CPU cores [0]
[2023-02-26 09:19:06,661][13481] Worker 6 uses CPU cores [0]
[2023-02-26 09:19:06,662][13475] Worker 0 uses CPU cores [0]
[2023-02-26 09:19:06,711][13482] Worker 7 uses CPU cores [1]
[2023-02-26 09:19:06,724][13477] Worker 2 uses CPU cores [0]
[2023-02-26 09:19:06,787][13460] Num visible devices: 1
[2023-02-26 09:19:06,790][13474] Num visible devices: 1
[2023-02-26 09:19:06,818][13460] Starting seed is not provided
[2023-02-26 09:19:06,818][13460] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 09:19:06,819][13460] Initializing actor-critic model on device cuda:0
[2023-02-26 09:19:06,820][13460] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 09:19:06,823][13460] RunningMeanStd input shape: (1,)
[2023-02-26 09:19:06,841][13480] Worker 5 uses CPU cores [1]
[2023-02-26 09:19:06,847][13460] ConvEncoder: input_channels=3
[2023-02-26 09:19:07,436][13460] Conv encoder output size: 512
[2023-02-26 09:19:07,437][13460] Policy head output size: 512
[2023-02-26 09:19:07,514][13460] Created Actor Critic model with architecture:
[2023-02-26 09:19:07,515][13460] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-26 09:19:13,832][00216] Heartbeat connected on Batcher_0
[2023-02-26 09:19:13,875][00216] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-26 09:19:13,902][00216] Heartbeat connected on RolloutWorker_w0
[2023-02-26 09:19:13,919][00216] Heartbeat connected on RolloutWorker_w1
[2023-02-26 09:19:13,924][00216] Heartbeat connected on RolloutWorker_w2
[2023-02-26 09:19:13,942][00216] Heartbeat connected on RolloutWorker_w3
[2023-02-26 09:19:13,968][00216] Heartbeat connected on RolloutWorker_w4
[2023-02-26 09:19:13,974][00216] Heartbeat connected on RolloutWorker_w5
[2023-02-26 09:19:13,995][00216] Heartbeat connected on RolloutWorker_w6
[2023-02-26 09:19:14,010][00216] Heartbeat connected on RolloutWorker_w7
[2023-02-26 09:19:16,307][13460] Using optimizer
[2023-02-26 09:19:16,308][13460] No checkpoints found
[2023-02-26 09:19:16,309][13460] Did not load from checkpoint, starting from scratch!
[2023-02-26 09:19:16,309][13460] Initialized policy 0 weights for model version 0
[2023-02-26 09:19:16,314][13460] LearnerWorker_p0 finished initialization!
[2023-02-26 09:19:16,315][13460] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-26 09:19:16,315][00216] Heartbeat connected on LearnerWorker_p0
[2023-02-26 09:19:16,515][13474] RunningMeanStd input shape: (3, 72, 128)
[2023-02-26 09:19:16,516][13474] RunningMeanStd input shape: (1,)
[2023-02-26 09:19:16,528][13474] ConvEncoder: input_channels=3
[2023-02-26 09:19:16,624][13474] Conv encoder output size: 512
[2023-02-26 09:19:16,624][13474] Policy head output size: 512
[2023-02-26 09:19:18,226][00216] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 09:19:18,865][00216] Inference worker 0-0 is ready!
[2023-02-26 09:19:18,867][00216] All inference workers are ready! Signal rollout workers to start!
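The printed module gives the full topology but not the convolution kernel sizes or strides. Below is a minimal PyTorch sketch of the same ActorCriticSharedWeights layout; the Conv2d hyperparameters are an assumption based on Sample Factory's default convnet_simple filters ([32, 8, 4], [64, 4, 2], [128, 3, 2]), the class name is hypothetical, and the observation/returns normalizers are omitted.

```python
import torch
from torch import nn

class SketchActorCritic(nn.Module):
    """Illustrative re-creation of the architecture printed in the log above."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: three Conv2d+ELU blocks (kernel/stride values are assumed
        # Sample Factory "convnet_simple" defaults, not shown in the log)
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # mlp_layers: Linear+ELU projecting to the logged encoder output size, 512
        conv_out = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(conv_out, 512), nn.ELU())
        # core: ModelCoreRNN wraps a single-layer GRU(512, 512)
        self.core = nn.GRU(512, 512)
        # heads: value estimate and 5-way action logits
        self.critic_linear = nn.Linear(512, 1)
        self.distribution_linear = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state):
        # obs: (B, 3, 72, 128); rnn_state: (1, B, 512)
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```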
[2023-02-26 09:19:18,997][13477] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:18,998][13475] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:19,012][13479] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:19,022][13476] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:19,033][13480] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:19,039][13478] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:19,039][13482] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:19,044][13481] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-26 09:19:19,846][13475] Decorrelating experience for 0 frames...
[2023-02-26 09:19:19,848][13477] Decorrelating experience for 0 frames...
[2023-02-26 09:19:19,847][13480] Decorrelating experience for 0 frames...
[2023-02-26 09:19:19,848][13482] Decorrelating experience for 0 frames...
[2023-02-26 09:19:20,845][13476] Decorrelating experience for 0 frames...
[2023-02-26 09:19:20,882][13475] Decorrelating experience for 32 frames...
[2023-02-26 09:19:20,887][13482] Decorrelating experience for 32 frames...
[2023-02-26 09:19:20,890][13477] Decorrelating experience for 32 frames...
[2023-02-26 09:19:20,894][13480] Decorrelating experience for 32 frames...
[2023-02-26 09:19:20,909][13481] Decorrelating experience for 0 frames...
[2023-02-26 09:19:22,007][13479] Decorrelating experience for 0 frames...
[2023-02-26 09:19:22,119][13476] Decorrelating experience for 32 frames...
[2023-02-26 09:19:22,390][13482] Decorrelating experience for 64 frames...
[2023-02-26 09:19:22,408][13481] Decorrelating experience for 32 frames...
[2023-02-26 09:19:22,434][13478] Decorrelating experience for 0 frames...
[2023-02-26 09:19:22,643][13477] Decorrelating experience for 64 frames...
[2023-02-26 09:19:22,667][13475] Decorrelating experience for 64 frames...
[2023-02-26 09:19:23,226][00216] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 09:19:23,865][13479] Decorrelating experience for 32 frames...
[2023-02-26 09:19:24,090][13478] Decorrelating experience for 32 frames...
[2023-02-26 09:19:24,104][13481] Decorrelating experience for 64 frames...
[2023-02-26 09:19:24,225][13476] Decorrelating experience for 64 frames...
[2023-02-26 09:19:24,258][13480] Decorrelating experience for 64 frames...
[2023-02-26 09:19:24,501][13482] Decorrelating experience for 96 frames...
[2023-02-26 09:19:25,574][13477] Decorrelating experience for 96 frames...
[2023-02-26 09:19:25,651][13481] Decorrelating experience for 96 frames...
[2023-02-26 09:19:25,731][13475] Decorrelating experience for 96 frames...
[2023-02-26 09:19:26,443][13479] Decorrelating experience for 64 frames...
[2023-02-26 09:19:26,621][13480] Decorrelating experience for 96 frames...
[2023-02-26 09:19:26,627][13476] Decorrelating experience for 96 frames...
[2023-02-26 09:19:27,068][13478] Decorrelating experience for 64 frames...
[2023-02-26 09:19:27,576][13478] Decorrelating experience for 96 frames...
[2023-02-26 09:19:27,899][13479] Decorrelating experience for 96 frames...
[2023-02-26 09:19:28,226][00216] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 09:19:32,410][13460] Signal inference workers to stop experience collection...
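Each worker above renders Doom at its native 160x120 and resizes to (128, 72) (width, height), which is where the channels-first (3, 72, 128) RunningMeanStd shape comes from. A small sketch of that transformation, assuming an OpenCV-style resize (Sample Factory uses its own env wrappers, so this is illustrative only):

```python
import cv2
import numpy as np

frame = np.zeros((120, 160, 3), dtype=np.uint8)  # native Doom frame: H=120, W=160, RGB
resized = cv2.resize(frame, (128, 72))           # cv2 takes (W, H) -> shape (72, 128, 3)
obs = resized.transpose(2, 0, 1)                 # channels-first for the conv encoder
assert obs.shape == (3, 72, 128)                 # the RunningMeanStd input shape above
```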
[2023-02-26 09:19:32,442][13474] InferenceWorker_p0-w0: stopping experience collection
[2023-02-26 09:19:33,225][00216] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 74.3. Samples: 1114. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-26 09:19:33,229][00216] Avg episode reward: [(0, '2.045')]
[2023-02-26 09:19:34,958][13460] Signal inference workers to resume experience collection...
[2023-02-26 09:19:34,959][13474] InferenceWorker_p0-w0: resuming experience collection
[2023-02-26 09:19:38,226][00216] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 179.3. Samples: 3586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:19:38,233][00216] Avg episode reward: [(0, '3.448')]
[2023-02-26 09:19:43,226][00216] Fps is (10 sec: 3276.7, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 362.7. Samples: 9068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:19:43,230][00216] Avg episode reward: [(0, '3.881')]
[2023-02-26 09:19:45,343][13474] Updated weights for policy 0, policy_version 10 (0.0012)
[2023-02-26 09:19:48,226][00216] Fps is (10 sec: 2867.2, 60 sec: 1501.9, 300 sec: 1501.9). Total num frames: 45056. Throughput: 0: 372.3. Samples: 11170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:19:48,231][00216] Avg episode reward: [(0, '4.367')]
[2023-02-26 09:19:53,226][00216] Fps is (10 sec: 3686.5, 60 sec: 1989.5, 300 sec: 1989.5). Total num frames: 69632. Throughput: 0: 470.9. Samples: 16480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:19:53,231][00216] Avg episode reward: [(0, '4.432')]
[2023-02-26 09:19:55,848][13474] Updated weights for policy 0, policy_version 20 (0.0027)
[2023-02-26 09:19:58,226][00216] Fps is (10 sec: 4505.6, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 90112. Throughput: 0: 586.5. Samples: 23462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:19:58,231][00216] Avg episode reward: [(0, '4.338')]
[2023-02-26 09:20:03,226][00216] Fps is (10 sec: 3686.3, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 586.4. Samples: 26386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:20:03,230][00216] Avg episode reward: [(0, '4.326')]
[2023-02-26 09:20:03,238][13460] Saving new best policy, reward=4.326!
[2023-02-26 09:20:07,830][13474] Updated weights for policy 0, policy_version 30 (0.0013)
[2023-02-26 09:20:08,226][00216] Fps is (10 sec: 3276.7, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 122880. Throughput: 0: 681.8. Samples: 30680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:20:08,232][00216] Avg episode reward: [(0, '4.373')]
[2023-02-26 09:20:08,245][13460] Saving new best policy, reward=4.373!
[2023-02-26 09:20:13,226][00216] Fps is (10 sec: 3686.5, 60 sec: 2606.5, 300 sec: 2606.5). Total num frames: 143360. Throughput: 0: 811.8. Samples: 36532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:20:13,228][00216] Avg episode reward: [(0, '4.361')]
[2023-02-26 09:20:17,113][13474] Updated weights for policy 0, policy_version 40 (0.0021)
[2023-02-26 09:20:18,226][00216] Fps is (10 sec: 4505.7, 60 sec: 2798.9, 300 sec: 2798.9). Total num frames: 167936. Throughput: 0: 864.7. Samples: 40026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:20:18,228][00216] Avg episode reward: [(0, '4.414')]
[2023-02-26 09:20:18,243][13460] Saving new best policy, reward=4.414!
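The status lines above are easy to sanity-check: the 10-second FPS is just the frame delta over the window. At 09:19:38 the run reports 16384 total frames against 0 frames ten seconds earlier, hence 16384 / 10 = 1638.4. A small parser sketch (the regex is mine, not part of Sample Factory):

```python
import re

line = ("[2023-02-26 09:19:38,226][00216] Fps is (10 sec: 1638.4, 60 sec: 819.2, "
        "300 sec: 819.2). Total num frames: 16384. Throughput: 0: 179.3. Samples: 3586.")

# pull the 10-second FPS and the cumulative frame counter out of a status line
m = re.search(r"10 sec: ([\d.]+).*?Total num frames: (\d+)", line)
fps_10s, frames = float(m.group(1)), int(m.group(2))

# twenty seconds into training the 10 s window covers frames 0 -> 16384
assert fps_10s == frames / 10
```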
[2023-02-26 09:20:23,230][00216] Fps is (10 sec: 3684.8, 60 sec: 3003.5, 300 sec: 2772.5). Total num frames: 180224. Throughput: 0: 936.2. Samples: 45720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:20:23,232][00216] Avg episode reward: [(0, '4.308')]
[2023-02-26 09:20:28,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2808.7). Total num frames: 196608. Throughput: 0: 905.4. Samples: 49810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:20:28,234][00216] Avg episode reward: [(0, '4.305')]
[2023-02-26 09:20:29,987][13474] Updated weights for policy 0, policy_version 50 (0.0011)
[2023-02-26 09:20:33,227][00216] Fps is (10 sec: 3687.6, 60 sec: 3618.1, 300 sec: 2894.5). Total num frames: 217088. Throughput: 0: 926.2. Samples: 52848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:20:33,229][00216] Avg episode reward: [(0, '4.354')]
[2023-02-26 09:20:38,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 967.6. Samples: 60024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:20:38,232][00216] Avg episode reward: [(0, '4.371')]
[2023-02-26 09:20:38,597][13474] Updated weights for policy 0, policy_version 60 (0.0017)
[2023-02-26 09:20:43,231][00216] Fps is (10 sec: 4094.0, 60 sec: 3754.3, 300 sec: 3035.7). Total num frames: 258048. Throughput: 0: 931.1. Samples: 65366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:20:43,234][00216] Avg episode reward: [(0, '4.388')]
[2023-02-26 09:20:48,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3049.2). Total num frames: 274432. Throughput: 0: 914.7. Samples: 67546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:20:48,229][00216] Avg episode reward: [(0, '4.385')]
[2023-02-26 09:20:48,239][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth...
[2023-02-26 09:20:51,206][13474] Updated weights for policy 0, policy_version 70 (0.0040)
[2023-02-26 09:20:53,226][00216] Fps is (10 sec: 3688.6, 60 sec: 3754.7, 300 sec: 3104.3). Total num frames: 294912. Throughput: 0: 945.6. Samples: 73234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:20:53,229][00216] Avg episode reward: [(0, '4.372')]
[2023-02-26 09:20:58,231][00216] Fps is (10 sec: 4503.1, 60 sec: 3822.6, 300 sec: 3194.7). Total num frames: 319488. Throughput: 0: 970.5. Samples: 80208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:20:58,238][00216] Avg episode reward: [(0, '4.474')]
[2023-02-26 09:20:58,253][13460] Saving new best policy, reward=4.474!
[2023-02-26 09:21:00,598][13474] Updated weights for policy 0, policy_version 80 (0.0033)
[2023-02-26 09:21:03,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3159.8). Total num frames: 331776. Throughput: 0: 946.7. Samples: 82628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:21:03,236][00216] Avg episode reward: [(0, '4.475')]
[2023-02-26 09:21:03,247][13460] Saving new best policy, reward=4.475!
[2023-02-26 09:21:08,226][00216] Fps is (10 sec: 2868.7, 60 sec: 3754.7, 300 sec: 3165.1). Total num frames: 348160. Throughput: 0: 914.6. Samples: 86874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:21:08,233][00216] Avg episode reward: [(0, '4.389')]
[2023-02-26 09:21:12,624][13474] Updated weights for policy 0, policy_version 90 (0.0016)
[2023-02-26 09:21:13,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3205.6). Total num frames: 368640. Throughput: 0: 959.2. Samples: 92974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:21:13,228][00216] Avg episode reward: [(0, '4.375')]
[2023-02-26 09:21:18,226][00216] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3242.7). Total num frames: 389120. Throughput: 0: 963.2. Samples: 96192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:21:18,234][00216] Avg episode reward: [(0, '4.565')]
[2023-02-26 09:21:18,246][13460] Saving new best policy, reward=4.565!
[2023-02-26 09:21:23,227][00216] Fps is (10 sec: 2866.9, 60 sec: 3618.3, 300 sec: 3178.5). Total num frames: 397312. Throughput: 0: 889.0. Samples: 100028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:21:23,232][00216] Avg episode reward: [(0, '4.496')]
[2023-02-26 09:21:26,954][13474] Updated weights for policy 0, policy_version 100 (0.0030)
[2023-02-26 09:21:28,226][00216] Fps is (10 sec: 2048.0, 60 sec: 3549.9, 300 sec: 3150.8). Total num frames: 409600. Throughput: 0: 848.2. Samples: 103530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:21:28,229][00216] Avg episode reward: [(0, '4.451')]
[2023-02-26 09:21:33,226][00216] Fps is (10 sec: 3277.2, 60 sec: 3549.9, 300 sec: 3185.8). Total num frames: 430080. Throughput: 0: 850.4. Samples: 105812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:21:33,233][00216] Avg episode reward: [(0, '4.377')]
[2023-02-26 09:21:37,524][13474] Updated weights for policy 0, policy_version 110 (0.0013)
[2023-02-26 09:21:38,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3218.3). Total num frames: 450560. Throughput: 0: 873.3. Samples: 112534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:21:38,232][00216] Avg episode reward: [(0, '4.530')]
[2023-02-26 09:21:43,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3550.2, 300 sec: 3248.6). Total num frames: 471040. Throughput: 0: 863.6. Samples: 119066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:21:43,233][00216] Avg episode reward: [(0, '4.510')]
[2023-02-26 09:21:48,226][00216] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3249.5). Total num frames: 487424. Throughput: 0: 858.4. Samples: 121256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:21:48,235][00216] Avg episode reward: [(0, '4.405')]
[2023-02-26 09:21:48,841][13474] Updated weights for policy 0, policy_version 120 (0.0011)
[2023-02-26 09:21:53,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3250.4). Total num frames: 503808. Throughput: 0: 869.4. Samples: 125998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:21:53,238][00216] Avg episode reward: [(0, '4.500')]
[2023-02-26 09:21:58,226][00216] Fps is (10 sec: 4096.1, 60 sec: 3481.9, 300 sec: 3302.4). Total num frames: 528384. Throughput: 0: 888.8. Samples: 132972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:21:58,230][00216] Avg episode reward: [(0, '4.559')]
[2023-02-26 09:21:58,580][13474] Updated weights for policy 0, policy_version 130 (0.0026)
[2023-02-26 09:22:03,231][00216] Fps is (10 sec: 4503.3, 60 sec: 3617.8, 300 sec: 3326.3). Total num frames: 548864. Throughput: 0: 895.0. Samples: 136470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:22:03,238][00216] Avg episode reward: [(0, '4.591')]
[2023-02-26 09:22:03,240][13460] Saving new best policy, reward=4.591!
[2023-02-26 09:22:08,227][00216] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3300.9). Total num frames: 561152. Throughput: 0: 908.5. Samples: 140910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:22:08,232][00216] Avg episode reward: [(0, '4.427')]
[2023-02-26 09:22:11,123][13474] Updated weights for policy 0, policy_version 140 (0.0020)
[2023-02-26 09:22:13,226][00216] Fps is (10 sec: 3278.5, 60 sec: 3549.9, 300 sec: 3323.6). Total num frames: 581632. Throughput: 0: 943.0. Samples: 145966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:22:13,233][00216] Avg episode reward: [(0, '4.403')]
[2023-02-26 09:22:18,226][00216] Fps is (10 sec: 4506.0, 60 sec: 3618.1, 300 sec: 3367.8). Total num frames: 606208. Throughput: 0: 969.2. Samples: 149426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:22:18,229][00216] Avg episode reward: [(0, '4.305')]
[2023-02-26 09:22:19,842][13474] Updated weights for policy 0, policy_version 150 (0.0012)
[2023-02-26 09:22:23,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3365.4). Total num frames: 622592. Throughput: 0: 967.4. Samples: 156066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:22:23,228][00216] Avg episode reward: [(0, '4.562')]
[2023-02-26 09:22:28,228][00216] Fps is (10 sec: 3276.0, 60 sec: 3822.8, 300 sec: 3363.0). Total num frames: 638976. Throughput: 0: 918.5. Samples: 160402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:22:28,233][00216] Avg episode reward: [(0, '4.746')]
[2023-02-26 09:22:28,248][13460] Saving new best policy, reward=4.746!
[2023-02-26 09:22:32,440][13474] Updated weights for policy 0, policy_version 160 (0.0021)
[2023-02-26 09:22:33,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3360.8). Total num frames: 655360. Throughput: 0: 917.4. Samples: 162538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:22:33,228][00216] Avg episode reward: [(0, '4.591')]
[2023-02-26 09:22:38,226][00216] Fps is (10 sec: 4097.0, 60 sec: 3822.9, 300 sec: 3399.7). Total num frames: 679936. Throughput: 0: 967.7. Samples: 169544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:22:38,229][00216] Avg episode reward: [(0, '4.551')]
[2023-02-26 09:22:41,289][13474] Updated weights for policy 0, policy_version 170 (0.0015)
[2023-02-26 09:22:43,227][00216] Fps is (10 sec: 4505.1, 60 sec: 3822.9, 300 sec: 3416.6). Total num frames: 700416. Throughput: 0: 952.5. Samples: 175836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:22:43,232][00216] Avg episode reward: [(0, '4.382')]
[2023-02-26 09:22:48,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3393.8). Total num frames: 712704. Throughput: 0: 922.9. Samples: 177996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:22:48,234][00216] Avg episode reward: [(0, '4.597')]
[2023-02-26 09:22:48,253][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000175_716800.pth...
[2023-02-26 09:22:53,226][00216] Fps is (10 sec: 3277.2, 60 sec: 3822.9, 300 sec: 3410.2). Total num frames: 733184. Throughput: 0: 933.8. Samples: 182930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:22:53,233][00216] Avg episode reward: [(0, '4.687')]
[2023-02-26 09:22:53,655][13474] Updated weights for policy 0, policy_version 180 (0.0029)
[2023-02-26 09:22:58,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3444.4). Total num frames: 757760. Throughput: 0: 977.6. Samples: 189958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:22:58,228][00216] Avg episode reward: [(0, '4.697')]
[2023-02-26 09:23:03,054][13474] Updated weights for policy 0, policy_version 190 (0.0022)
[2023-02-26 09:23:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3823.3, 300 sec: 3458.8). Total num frames: 778240. Throughput: 0: 977.2. Samples: 193398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 09:23:03,236][00216] Avg episode reward: [(0, '4.774')]
[2023-02-26 09:23:03,240][13460] Saving new best policy, reward=4.774!
[2023-02-26 09:23:08,226][00216] Fps is (10 sec: 3276.6, 60 sec: 3823.0, 300 sec: 3437.1). Total num frames: 790528. Throughput: 0: 926.7. Samples: 197770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 09:23:08,229][00216] Avg episode reward: [(0, '4.823')]
[2023-02-26 09:23:08,252][13460] Saving new best policy, reward=4.823!
[2023-02-26 09:23:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3451.1). Total num frames: 811008. Throughput: 0: 951.2. Samples: 203202. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:23:13,233][00216] Avg episode reward: [(0, '4.659')]
[2023-02-26 09:23:14,735][13474] Updated weights for policy 0, policy_version 200 (0.0020)
[2023-02-26 09:23:18,226][00216] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3464.5). Total num frames: 831488. Throughput: 0: 980.2. Samples: 206648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:23:18,228][00216] Avg episode reward: [(0, '4.712')]
[2023-02-26 09:23:23,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3477.4). Total num frames: 851968. Throughput: 0: 969.7. Samples: 213180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:23:23,229][00216] Avg episode reward: [(0, '4.637')]
[2023-02-26 09:23:25,176][13474] Updated weights for policy 0, policy_version 210 (0.0013)
[2023-02-26 09:23:28,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3473.4). Total num frames: 868352. Throughput: 0: 926.8. Samples: 217542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:23:28,232][00216] Avg episode reward: [(0, '4.679')]
[2023-02-26 09:23:33,236][00216] Fps is (10 sec: 3273.3, 60 sec: 3822.3, 300 sec: 3469.4). Total num frames: 884736. Throughput: 0: 932.9. Samples: 219986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:23:33,244][00216] Avg episode reward: [(0, '4.665')]
[2023-02-26 09:23:36,098][13474] Updated weights for policy 0, policy_version 220 (0.0019)
[2023-02-26 09:23:38,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3497.4). Total num frames: 909312. Throughput: 0: 975.7. Samples: 226838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:23:38,234][00216] Avg episode reward: [(0, '4.586')]
[2023-02-26 09:23:43,226][00216] Fps is (10 sec: 4100.3, 60 sec: 3754.7, 300 sec: 3493.2). Total num frames: 925696. Throughput: 0: 948.0. Samples: 232620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:23:43,231][00216] Avg episode reward: [(0, '4.537')]
[2023-02-26 09:23:47,817][13474] Updated weights for policy 0, policy_version 230 (0.0018)
[2023-02-26 09:23:48,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3489.2). Total num frames: 942080. Throughput: 0: 918.0. Samples: 234710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:23:48,231][00216] Avg episode reward: [(0, '4.453')]
[2023-02-26 09:23:53,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3485.3). Total num frames: 958464. Throughput: 0: 932.6. Samples: 239736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:23:53,231][00216] Avg episode reward: [(0, '4.556')]
[2023-02-26 09:23:57,537][13474] Updated weights for policy 0, policy_version 240 (0.0024)
[2023-02-26 09:23:58,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3510.9). Total num frames: 983040. Throughput: 0: 965.8. Samples: 246664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:23:58,233][00216] Avg episode reward: [(0, '4.762')]
[2023-02-26 09:24:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3521.1). Total num frames: 1003520. Throughput: 0: 956.9. Samples: 249710. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-26 09:24:03,231][00216] Avg episode reward: [(0, '4.707')]
[2023-02-26 09:24:08,228][00216] Fps is (10 sec: 3276.1, 60 sec: 3754.6, 300 sec: 3502.8). Total num frames: 1015808. Throughput: 0: 908.0. Samples: 254044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:24:08,230][00216] Avg episode reward: [(0, '4.537')]
[2023-02-26 09:24:10,026][13474] Updated weights for policy 0, policy_version 250 (0.0022)
[2023-02-26 09:24:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 936.7. Samples: 259692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:24:13,230][00216] Avg episode reward: [(0, '4.457')]
[2023-02-26 09:24:18,226][00216] Fps is (10 sec: 4096.8, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 1056768. Throughput: 0: 957.7. Samples: 263074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:24:18,228][00216] Avg episode reward: [(0, '4.693')]
[2023-02-26 09:24:19,349][13474] Updated weights for policy 0, policy_version 260 (0.0020)
[2023-02-26 09:24:23,230][00216] Fps is (10 sec: 3684.9, 60 sec: 3686.1, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 931.8. Samples: 268772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:24:23,238][00216] Avg episode reward: [(0, '4.536')]
[2023-02-26 09:24:28,227][00216] Fps is (10 sec: 2866.7, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 1085440. Throughput: 0: 873.7. Samples: 271936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:24:28,231][00216] Avg episode reward: [(0, '4.543')]
[2023-02-26 09:24:33,226][00216] Fps is (10 sec: 2458.6, 60 sec: 3550.5, 300 sec: 3665.6). Total num frames: 1097728. Throughput: 0: 866.9. Samples: 273722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:24:33,230][00216] Avg episode reward: [(0, '4.571')]
[2023-02-26 09:24:35,653][13474] Updated weights for policy 0, policy_version 270 (0.0012)
[2023-02-26 09:24:38,226][00216] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 3665.6). Total num frames: 1114112. Throughput: 0: 862.5. Samples: 278548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:24:38,229][00216] Avg episode reward: [(0, '4.833')]
[2023-02-26 09:24:38,241][13460] Saving new best policy, reward=4.833!
[2023-02-26 09:24:43,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 1138688. Throughput: 0: 856.6. Samples: 285212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:24:43,228][00216] Avg episode reward: [(0, '4.914')]
[2023-02-26 09:24:43,236][13460] Saving new best policy, reward=4.914!
[2023-02-26 09:24:45,541][13474] Updated weights for policy 0, policy_version 280 (0.0012)
[2023-02-26 09:24:48,226][00216] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3665.6). Total num frames: 1150976. Throughput: 0: 838.5. Samples: 287444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:24:48,229][00216] Avg episode reward: [(0, '5.062')]
[2023-02-26 09:24:48,247][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000281_1150976.pth...
[2023-02-26 09:24:48,396][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth
[2023-02-26 09:24:48,411][13460] Saving new best policy, reward=5.062!
[2023-02-26 09:24:53,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 1167360. Throughput: 0: 834.6. Samples: 291600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:24:53,232][00216] Avg episode reward: [(0, '5.176')]
[2023-02-26 09:24:53,237][13460] Saving new best policy, reward=5.176!
[2023-02-26 09:24:57,549][13474] Updated weights for policy 0, policy_version 290 (0.0032)
[2023-02-26 09:24:58,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3665.6). Total num frames: 1187840. Throughput: 0: 852.0. Samples: 298030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:24:58,234][00216] Avg episode reward: [(0, '5.084')]
[2023-02-26 09:25:03,225][00216] Fps is (10 sec: 4505.7, 60 sec: 3481.6, 300 sec: 3693.3). Total num frames: 1212416. Throughput: 0: 855.7. Samples: 301580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:25:03,228][00216] Avg episode reward: [(0, '5.305')]
[2023-02-26 09:25:03,235][13460] Saving new best policy, reward=5.305!
[2023-02-26 09:25:07,757][13474] Updated weights for policy 0, policy_version 300 (0.0012)
[2023-02-26 09:25:08,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3679.5). Total num frames: 1228800. Throughput: 0: 851.7. Samples: 307094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:25:08,232][00216] Avg episode reward: [(0, '5.334')]
[2023-02-26 09:25:08,247][13460] Saving new best policy, reward=5.334!
[2023-02-26 09:25:13,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3637.8). Total num frames: 1241088. Throughput: 0: 876.5. Samples: 311376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:25:13,230][00216] Avg episode reward: [(0, '5.201')]
[2023-02-26 09:25:18,226][00216] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 1265664. Throughput: 0: 910.3. Samples: 314684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:25:18,228][00216] Avg episode reward: [(0, '5.289')]
[2023-02-26 09:25:18,869][13474] Updated weights for policy 0, policy_version 310 (0.0015)
[2023-02-26 09:25:23,230][00216] Fps is (10 sec: 4503.7, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 1286144. Throughput: 0: 954.9. Samples: 321524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:25:23,232][00216] Avg episode reward: [(0, '5.435')]
[2023-02-26 09:25:23,241][13460] Saving new best policy, reward=5.435!
[2023-02-26 09:25:28,228][00216] Fps is (10 sec: 3685.6, 60 sec: 3618.1, 300 sec: 3679.4). Total num frames: 1302528. Throughput: 0: 917.2. Samples: 326488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:25:28,231][00216] Avg episode reward: [(0, '5.516')]
[2023-02-26 09:25:28,252][13460] Saving new best policy, reward=5.516!
[2023-02-26 09:25:30,271][13474] Updated weights for policy 0, policy_version 320 (0.0019)
[2023-02-26 09:25:33,226][00216] Fps is (10 sec: 3278.2, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1318912. Throughput: 0: 914.7. Samples: 328606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
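The learner keeps a rolling window of checkpoints: above it saves checkpoint_000000281_1150976.pth and then removes the older checkpoint_000000067_274432.pth. The filename encodes the policy version and the cumulative env frame count, and in this run the two are related by exactly 4096 frames per version (67 x 4096 = 274432, 281 x 4096 = 1150976). A quick check of that arithmetic:

```python
import re

name = "checkpoint_000000281_1150976.pth"
# filenames follow checkpoint_<policy_version>_<env_frames>.pth
version, frames = map(int, re.match(r"checkpoint_(\d+)_(\d+)\.pth", name).groups())
assert (version, frames) == (281, 1150976)
assert frames == version * 4096  # 4096 env frames per policy version in this run
```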
[2023-02-26 09:25:33,233][00216] Avg episode reward: [(0, '5.423')]
[2023-02-26 09:25:38,227][00216] Fps is (10 sec: 3686.8, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 1339392. Throughput: 0: 957.3. Samples: 334678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:25:38,232][00216] Avg episode reward: [(0, '5.275')]
[2023-02-26 09:25:40,324][13474] Updated weights for policy 0, policy_version 330 (0.0016)
[2023-02-26 09:25:43,227][00216] Fps is (10 sec: 4505.1, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 1363968. Throughput: 0: 963.9. Samples: 341408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:25:43,229][00216] Avg episode reward: [(0, '5.337')]
[2023-02-26 09:25:48,226][00216] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1376256. Throughput: 0: 933.0. Samples: 343566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:25:48,228][00216] Avg episode reward: [(0, '5.293')]
[2023-02-26 09:25:52,990][13474] Updated weights for policy 0, policy_version 340 (0.0026)
[2023-02-26 09:25:53,226][00216] Fps is (10 sec: 2867.5, 60 sec: 3754.7, 300 sec: 3637.9). Total num frames: 1392640. Throughput: 0: 904.1. Samples: 347780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:25:53,227][00216] Avg episode reward: [(0, '5.256')]
[2023-02-26 09:25:58,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1413120. Throughput: 0: 952.4. Samples: 354236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 09:25:58,229][00216] Avg episode reward: [(0, '5.181')]
[2023-02-26 09:26:02,133][13474] Updated weights for policy 0, policy_version 350 (0.0020)
[2023-02-26 09:26:03,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1433600. Throughput: 0: 954.8. Samples: 357648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:26:03,231][00216] Avg episode reward: [(0, '5.417')]
[2023-02-26 09:26:08,229][00216] Fps is (10 sec: 3685.2, 60 sec: 3686.2, 300 sec: 3665.5). Total num frames: 1449984. Throughput: 0: 916.7. Samples: 362774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:26:08,232][00216] Avg episode reward: [(0, '5.374')]
[2023-02-26 09:26:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1466368. Throughput: 0: 907.9. Samples: 367342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:26:13,230][00216] Avg episode reward: [(0, '5.660')]
[2023-02-26 09:26:13,233][13460] Saving new best policy, reward=5.660!
[2023-02-26 09:26:14,460][13474] Updated weights for policy 0, policy_version 360 (0.0013)
[2023-02-26 09:26:18,226][00216] Fps is (10 sec: 4097.3, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1490944. Throughput: 0: 936.6. Samples: 370754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:26:18,228][00216] Avg episode reward: [(0, '5.510')]
[2023-02-26 09:26:23,232][00216] Fps is (10 sec: 4502.8, 60 sec: 3754.5, 300 sec: 3734.9). Total num frames: 1511424. Throughput: 0: 957.9. Samples: 377790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:26:23,235][00216] Avg episode reward: [(0, '5.276')]
[2023-02-26 09:26:23,625][13474] Updated weights for policy 0, policy_version 370 (0.0031)
[2023-02-26 09:26:28,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 1527808. Throughput: 0: 915.2. Samples: 382590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 09:26:28,232][00216] Avg episode reward: [(0, '5.612')]
[2023-02-26 09:26:33,226][00216] Fps is (10 sec: 3278.9, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1544192. Throughput: 0: 916.3. Samples: 384798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:26:33,232][00216] Avg episode reward: [(0, '5.876')]
[2023-02-26 09:26:33,235][13460] Saving new best policy, reward=5.876!
[2023-02-26 09:26:35,611][13474] Updated weights for policy 0, policy_version 380 (0.0032)
[2023-02-26 09:26:38,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 1568768. Throughput: 0: 967.4. Samples: 391312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:26:38,228][00216] Avg episode reward: [(0, '5.991')]
[2023-02-26 09:26:38,238][13460] Saving new best policy, reward=5.991!
[2023-02-26 09:26:43,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1589248. Throughput: 0: 973.2. Samples: 398028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:26:43,234][00216] Avg episode reward: [(0, '5.915')]
[2023-02-26 09:26:45,492][13474] Updated weights for policy 0, policy_version 390 (0.0019)
[2023-02-26 09:26:48,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1605632. Throughput: 0: 948.8. Samples: 400346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:26:48,230][00216] Avg episode reward: [(0, '5.760')]
[2023-02-26 09:26:48,237][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000392_1605632.pth...
[2023-02-26 09:26:48,366][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000175_716800.pth
[2023-02-26 09:26:53,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1622016. Throughput: 0: 934.2. Samples: 404812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:26:53,228][00216] Avg episode reward: [(0, '6.108')]
[2023-02-26 09:26:53,233][13460] Saving new best policy, reward=6.108!
[2023-02-26 09:26:56,499][13474] Updated weights for policy 0, policy_version 400 (0.0018)
[2023-02-26 09:26:58,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 1642496. Throughput: 0: 987.5. Samples: 411780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:26:58,228][00216] Avg episode reward: [(0, '6.231')]
[2023-02-26 09:26:58,244][13460] Saving new best policy, reward=6.231!
[2023-02-26 09:27:03,229][00216] Fps is (10 sec: 4504.0, 60 sec: 3891.0, 300 sec: 3748.8). Total num frames: 1667072. Throughput: 0: 989.2. Samples: 415270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:27:03,235][00216] Avg episode reward: [(0, '6.173')]
[2023-02-26 09:27:07,235][13474] Updated weights for policy 0, policy_version 410 (0.0018)
[2023-02-26 09:27:08,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3721.1). Total num frames: 1679360. Throughput: 0: 942.3. Samples: 420186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:27:08,235][00216] Avg episode reward: [(0, '6.699')]
[2023-02-26 09:27:08,251][13460] Saving new best policy, reward=6.699!
[2023-02-26 09:27:13,226][00216] Fps is (10 sec: 2868.2, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1695744. Throughput: 0: 939.4. Samples: 424864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:27:13,232][00216] Avg episode reward: [(0, '7.311')]
[2023-02-26 09:27:13,235][13460] Saving new best policy, reward=7.311!
[2023-02-26 09:27:18,038][13474] Updated weights for policy 0, policy_version 420 (0.0023)
[2023-02-26 09:27:18,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1720320. Throughput: 0: 965.6. Samples: 428248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:27:18,232][00216] Avg episode reward: [(0, '7.888')]
[2023-02-26 09:27:18,242][13460] Saving new best policy, reward=7.888!
[2023-02-26 09:27:23,226][00216] Fps is (10 sec: 4505.7, 60 sec: 3823.3, 300 sec: 3735.0). Total num frames: 1740800. Throughput: 0: 976.6. Samples: 435258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:27:23,228][00216] Avg episode reward: [(0, '7.251')]
[2023-02-26 09:27:28,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1753088. Throughput: 0: 917.8. Samples: 439330. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-26 09:27:28,228][00216] Avg episode reward: [(0, '7.411')]
[2023-02-26 09:27:30,521][13474] Updated weights for policy 0, policy_version 430 (0.0016)
[2023-02-26 09:27:33,226][00216] Fps is (10 sec: 2457.6, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1765376. Throughput: 0: 903.9. Samples: 441022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:27:33,232][00216] Avg episode reward: [(0, '7.307')]
[2023-02-26 09:27:38,229][00216] Fps is (10 sec: 2866.3, 60 sec: 3549.7, 300 sec: 3665.5). Total num frames: 1781760. Throughput: 0: 892.0. Samples: 444954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:27:38,234][00216] Avg episode reward: [(0, '7.298')]
[2023-02-26 09:27:42,718][13474] Updated weights for policy 0, policy_version 440 (0.0022)
[2023-02-26 09:27:43,226][00216] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 1802240. Throughput: 0: 882.8. Samples: 451508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:27:43,231][00216] Avg episode reward: [(0, '7.609')]
[2023-02-26 09:27:48,227][00216] Fps is (10 sec: 3687.2, 60 sec: 3549.8, 300 sec: 3679.4). Total num frames: 1818624. Throughput: 0: 870.1. Samples: 454422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:27:48,230][00216] Avg episode reward: [(0, '7.923')]
[2023-02-26 09:27:48,247][13460] Saving new best policy, reward=7.923!
[2023-02-26 09:27:53,233][00216] Fps is (10 sec: 3274.6, 60 sec: 3549.5, 300 sec: 3651.6). Total num frames: 1835008. Throughput: 0: 858.9. Samples: 458844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:27:53,238][00216] Avg episode reward: [(0, '8.439')]
[2023-02-26 09:27:53,243][13460] Saving new best policy, reward=8.439!
[2023-02-26 09:27:55,046][13474] Updated weights for policy 0, policy_version 450 (0.0036)
[2023-02-26 09:27:58,226][00216] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 1855488. Throughput: 0: 889.3. Samples: 464882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:27:58,235][00216] Avg episode reward: [(0, '9.050')]
[2023-02-26 09:27:58,249][13460] Saving new best policy, reward=9.050!
[2023-02-26 09:28:03,226][00216] Fps is (10 sec: 4508.7, 60 sec: 3550.1, 300 sec: 3693.4). Total num frames: 1880064. Throughput: 0: 889.7. Samples: 468286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:28:03,228][00216] Avg episode reward: [(0, '9.371')]
[2023-02-26 09:28:03,232][13460] Saving new best policy, reward=9.371!
[2023-02-26 09:28:03,815][13474] Updated weights for policy 0, policy_version 460 (0.0028)
[2023-02-26 09:28:08,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1896448. Throughput: 0: 864.6. Samples: 474164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:28:08,231][00216] Avg episode reward: [(0, '9.103')]
[2023-02-26 09:28:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1912832. Throughput: 0: 872.4. Samples: 478588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:28:13,228][00216] Avg episode reward: [(0, '9.286')]
[2023-02-26 09:28:16,175][13474] Updated weights for policy 0, policy_version 470 (0.0012)
[2023-02-26 09:28:18,227][00216] Fps is (10 sec: 3686.0, 60 sec: 3549.8, 300 sec: 3665.6). Total num frames: 1933312. Throughput: 0: 900.6. Samples: 481552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:28:18,232][00216] Avg episode reward: [(0, '9.031')]
[2023-02-26 09:28:23,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1957888. Throughput: 0: 967.8. Samples: 488502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:28:23,228][00216] Avg episode reward: [(0, '9.141')]
[2023-02-26 09:28:25,365][13474] Updated weights for policy 0, policy_version 480 (0.0014)
[2023-02-26 09:28:28,226][00216] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3693.5). Total num frames: 1974272. Throughput: 0: 944.1. Samples: 493994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:28:28,235][00216] Avg episode reward: [(0, '9.003')]
[2023-02-26 09:28:33,227][00216] Fps is (10 sec: 2866.8, 60 sec: 3686.3, 300 sec: 3651.7). Total num frames: 1986560. Throughput: 0: 927.8. Samples: 496174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:28:33,233][00216] Avg episode reward: [(0, '8.939')]
[2023-02-26 09:28:37,442][13474] Updated weights for policy 0, policy_version 490 (0.0013)
[2023-02-26 09:28:38,226][00216] Fps is (10 sec: 3276.9, 60 sec: 3754.9, 300 sec: 3665.6). Total num frames: 2007040. Throughput: 0: 956.5. Samples: 501880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:28:38,228][00216] Avg episode reward: [(0, '8.846')]
[2023-02-26 09:28:43,226][00216] Fps is (10 sec: 4506.2, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 2031616. Throughput: 0: 978.6. Samples: 508918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:28:43,229][00216] Avg episode reward: [(0, '9.300')]
[2023-02-26 09:28:47,263][13474] Updated weights for policy 0, policy_version 500 (0.0017)
[2023-02-26 09:28:48,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3693.3). Total num frames: 2048000. Throughput: 0: 965.0. Samples: 511710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:28:48,233][00216] Avg episode reward: [(0, '9.482')]
[2023-02-26 09:28:48,247][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000500_2048000.pth...
[2023-02-26 09:28:48,397][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000281_1150976.pth
[2023-02-26 09:28:48,409][13460] Saving new best policy, reward=9.482!
[2023-02-26 09:28:53,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3823.4, 300 sec: 3665.6). Total num frames: 2064384. Throughput: 0: 928.6. Samples: 515950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:28:53,228][00216] Avg episode reward: [(0, '9.634')]
[2023-02-26 09:28:53,231][13460] Saving new best policy, reward=9.634!
[2023-02-26 09:28:58,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2084864. Throughput: 0: 967.5. Samples: 522126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:28:58,233][00216] Avg episode reward: [(0, '9.609')]
[2023-02-26 09:28:58,566][13474] Updated weights for policy 0, policy_version 510 (0.0021)
[2023-02-26 09:29:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 2109440. Throughput: 0: 980.4. Samples: 525670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:29:03,228][00216] Avg episode reward: [(0, '9.504')]
[2023-02-26 09:29:08,226][00216] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 2125824. Throughput: 0: 954.2. Samples: 531442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:29:08,231][00216] Avg episode reward: [(0, '9.927')]
[2023-02-26 09:29:08,248][13460] Saving new best policy, reward=9.927!
[2023-02-26 09:29:09,217][13474] Updated weights for policy 0, policy_version 520 (0.0018)
[2023-02-26 09:29:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2142208. Throughput: 0: 929.9. Samples: 535840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-26 09:29:13,229][00216] Avg episode reward: [(0, '10.053')]
[2023-02-26 09:29:13,231][13460] Saving new best policy, reward=10.053!
[2023-02-26 09:29:18,226][00216] Fps is (10 sec: 3686.7, 60 sec: 3823.0, 300 sec: 3693.4). Total num frames: 2162688. Throughput: 0: 950.3. Samples: 538936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:29:18,232][00216] Avg episode reward: [(0, '10.755')]
[2023-02-26 09:29:18,242][13460] Saving new best policy, reward=10.755!
[2023-02-26 09:29:19,657][13474] Updated weights for policy 0, policy_version 530 (0.0014)
[2023-02-26 09:29:23,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2187264. Throughput: 0: 978.2. Samples: 545898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:29:23,227][00216] Avg episode reward: [(0, '11.304')]
[2023-02-26 09:29:23,234][13460] Saving new best policy, reward=11.304!
[2023-02-26 09:29:28,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 2199552. Throughput: 0: 937.5. Samples: 551104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:29:28,228][00216] Avg episode reward: [(0, '11.625')]
[2023-02-26 09:29:28,245][13460] Saving new best policy, reward=11.625!
[2023-02-26 09:29:31,298][13474] Updated weights for policy 0, policy_version 540 (0.0011)
[2023-02-26 09:29:33,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 2215936. Throughput: 0: 922.7. Samples: 553230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:29:33,229][00216] Avg episode reward: [(0, '11.423')]
[2023-02-26 09:29:38,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 2240512. Throughput: 0: 962.5. Samples: 559262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:29:38,228][00216] Avg episode reward: [(0, '11.195')]
[2023-02-26 09:29:40,784][13474] Updated weights for policy 0, policy_version 550 (0.0011)
[2023-02-26 09:29:43,226][00216] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2260992. Throughput: 0: 983.0. Samples: 566360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:29:43,228][00216] Avg episode reward: [(0, '11.257')]
[2023-02-26 09:29:48,229][00216] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 2277376. Throughput: 0: 961.4. Samples: 568934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:29:48,231][00216] Avg episode reward: [(0, '11.194')]
[2023-02-26 09:29:52,590][13474] Updated weights for policy 0, policy_version 560 (0.0019)
[2023-02-26 09:29:53,227][00216] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3748.9). Total num frames: 2293760. Throughput: 0: 934.2. Samples: 573480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:29:53,237][00216] Avg episode reward: [(0, '11.864')]
[2023-02-26 09:29:53,243][13460] Saving new best policy, reward=11.864!
[2023-02-26 09:29:58,226][00216] Fps is (10 sec: 4097.3, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2318336. Throughput: 0: 980.6. Samples: 579966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:29:58,229][00216] Avg episode reward: [(0, '11.984')]
[2023-02-26 09:29:58,243][13460] Saving new best policy, reward=11.984!
[2023-02-26 09:30:01,761][13474] Updated weights for policy 0, policy_version 570 (0.0014)
[2023-02-26 09:30:03,226][00216] Fps is (10 sec: 4506.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2338816. Throughput: 0: 988.2. Samples: 583406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:30:03,234][00216] Avg episode reward: [(0, '12.548')]
[2023-02-26 09:30:03,237][13460] Saving new best policy, reward=12.548!
[2023-02-26 09:30:08,229][00216] Fps is (10 sec: 3685.2, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 2355200. Throughput: 0: 954.3. Samples: 588844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-26 09:30:08,232][00216] Avg episode reward: [(0, '11.949')]
[2023-02-26 09:30:13,227][00216] Fps is (10 sec: 3276.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2371584. Throughput: 0: 938.3. Samples: 593330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:30:13,231][00216] Avg episode reward: [(0, '11.476')]
[2023-02-26 09:30:14,102][13474] Updated weights for policy 0, policy_version 580 (0.0017)
[2023-02-26 09:30:18,226][00216] Fps is (10 sec: 3687.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2392064. Throughput: 0: 965.1. Samples: 596658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:30:18,234][00216] Avg episode reward: [(0, '11.146')]
[2023-02-26 09:30:22,760][13474] Updated weights for policy 0, policy_version 590 (0.0013)
[2023-02-26 09:30:23,226][00216] Fps is (10 sec: 4505.9, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2416640. Throughput: 0: 988.5. Samples: 603746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:30:23,228][00216] Avg episode reward: [(0, '11.839')]
[2023-02-26 09:30:28,226][00216] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 2433024. Throughput: 0: 943.6. Samples: 608824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:30:28,229][00216] Avg episode reward: [(0, '12.059')]
[2023-02-26 09:30:33,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2445312. Throughput: 0: 931.9. Samples: 610868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:30:33,229][00216] Avg episode reward: [(0, '12.442')]
[2023-02-26 09:30:36,871][13474] Updated weights for policy 0, policy_version 600 (0.0015)
[2023-02-26 09:30:38,226][00216] Fps is (10 sec: 2457.7, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2457600. Throughput: 0: 918.0. Samples: 614788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:30:38,232][00216] Avg episode reward: [(0, '13.385')]
[2023-02-26 09:30:38,250][13460] Saving new best policy, reward=13.385!
[2023-02-26 09:30:43,228][00216] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3735.0). Total num frames: 2478080. Throughput: 0: 894.8. Samples: 620232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-26 09:30:43,234][00216] Avg episode reward: [(0, '13.486')]
[2023-02-26 09:30:43,237][13460] Saving new best policy, reward=13.486!
[2023-02-26 09:30:48,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3735.0). Total num frames: 2494464. Throughput: 0: 883.0. Samples: 623140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:30:48,231][00216] Avg episode reward: [(0, '14.082')]
[2023-02-26 09:30:48,245][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth...
[2023-02-26 09:30:48,351][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000392_1605632.pth
[2023-02-26 09:30:48,363][13460] Saving new best policy, reward=14.082!
[2023-02-26 09:30:48,845][13474] Updated weights for policy 0, policy_version 610 (0.0014)
[2023-02-26 09:30:53,230][00216] Fps is (10 sec: 3276.1, 60 sec: 3618.0, 300 sec: 3721.1). Total num frames: 2510848. Throughput: 0: 860.3. Samples: 627558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:30:53,232][00216] Avg episode reward: [(0, '14.645')]
[2023-02-26 09:30:53,241][13460] Saving new best policy, reward=14.645!
[2023-02-26 09:30:58,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2531328. Throughput: 0: 898.5. Samples: 633762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-26 09:30:58,233][00216] Avg episode reward: [(0, '15.440')]
[2023-02-26 09:30:58,246][13460] Saving new best policy, reward=15.440!
[2023-02-26 09:30:59,413][13474] Updated weights for policy 0, policy_version 620 (0.0015)
[2023-02-26 09:31:03,226][00216] Fps is (10 sec: 4507.4, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2555904. Throughput: 0: 902.2. Samples: 637256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:31:03,234][00216] Avg episode reward: [(0, '14.876')]
[2023-02-26 09:31:08,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3748.9). Total num frames: 2572288. Throughput: 0: 880.4. Samples: 643362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-26 09:31:08,229][00216] Avg episode reward: [(0, '14.582')]
[2023-02-26 09:31:09,953][13474] Updated weights for policy 0, policy_version 630 (0.0038)
[2023-02-26 09:31:13,226][00216] Fps is (10 sec: 3276.9, 60 sec: 3618.2, 300 sec: 3721.1). Total num frames: 2588672. Throughput: 0: 866.8. Samples: 647830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-26 09:31:13,231][00216] Avg episode reward: [(0, '14.313')]
[2023-02-26 09:31:18,227][00216] Fps is (10 sec: 3686.0, 60 sec: 3618.1, 300 sec: 3721.2). Total num frames: 2609152. Throughput: 0: 888.7. Samples: 650860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:31:18,230][00216] Avg episode reward: [(0, '14.172')] [2023-02-26 09:31:20,256][13474] Updated weights for policy 0, policy_version 640 (0.0028) [2023-02-26 09:31:23,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3748.9). Total num frames: 2633728. Throughput: 0: 959.3. Samples: 657958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:31:23,232][00216] Avg episode reward: [(0, '15.906')] [2023-02-26 09:31:23,236][13460] Saving new best policy, reward=15.906! [2023-02-26 09:31:28,226][00216] Fps is (10 sec: 4096.5, 60 sec: 3618.2, 300 sec: 3748.9). Total num frames: 2650112. Throughput: 0: 956.9. Samples: 663290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:31:28,231][00216] Avg episode reward: [(0, '16.873')] [2023-02-26 09:31:28,245][13460] Saving new best policy, reward=16.873! [2023-02-26 09:31:31,847][13474] Updated weights for policy 0, policy_version 650 (0.0018) [2023-02-26 09:31:33,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2666496. Throughput: 0: 940.8. Samples: 665476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 09:31:33,231][00216] Avg episode reward: [(0, '16.678')] [2023-02-26 09:31:38,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2686976. Throughput: 0: 970.5. Samples: 671226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:31:38,228][00216] Avg episode reward: [(0, '16.084')] [2023-02-26 09:31:41,168][13474] Updated weights for policy 0, policy_version 660 (0.0019) [2023-02-26 09:31:43,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3891.3, 300 sec: 3748.9). Total num frames: 2711552. Throughput: 0: 993.1. Samples: 678452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:31:43,233][00216] Avg episode reward: [(0, '15.074')] [2023-02-26 09:31:48,228][00216] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3748.9). Total num frames: 2727936. Throughput: 0: 976.3. Samples: 681192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:31:48,230][00216] Avg episode reward: [(0, '14.117')] [2023-02-26 09:31:53,192][13474] Updated weights for policy 0, policy_version 670 (0.0014) [2023-02-26 09:31:53,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3891.5, 300 sec: 3735.0). Total num frames: 2744320. Throughput: 0: 938.8. Samples: 685610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:31:53,233][00216] Avg episode reward: [(0, '15.521')] [2023-02-26 09:31:58,226][00216] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3721.2). Total num frames: 2764800. Throughput: 0: 979.5. Samples: 691908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:31:58,232][00216] Avg episode reward: [(0, '17.436')] [2023-02-26 09:31:58,242][13460] Saving new best policy, reward=17.436! [2023-02-26 09:32:02,188][13474] Updated weights for policy 0, policy_version 680 (0.0013) [2023-02-26 09:32:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2789376. Throughput: 0: 989.3. Samples: 695376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:32:03,233][00216] Avg episode reward: [(0, '16.932')] [2023-02-26 09:32:08,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2805760. Throughput: 0: 959.2. Samples: 701120. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:32:08,229][00216] Avg episode reward: [(0, '17.583')] [2023-02-26 09:32:08,248][13460] Saving new best policy, reward=17.583! [2023-02-26 09:32:13,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2818048. Throughput: 0: 939.3. Samples: 705558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:32:13,234][00216] Avg episode reward: [(0, '16.656')] [2023-02-26 09:32:14,468][13474] Updated weights for policy 0, policy_version 690 (0.0027) [2023-02-26 09:32:18,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3735.0). Total num frames: 2842624. Throughput: 0: 961.2. Samples: 708730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:32:18,228][00216] Avg episode reward: [(0, '16.884')] [2023-02-26 09:32:23,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2863104. Throughput: 0: 990.0. Samples: 715774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 09:32:23,228][00216] Avg episode reward: [(0, '17.133')] [2023-02-26 09:32:23,261][13474] Updated weights for policy 0, policy_version 700 (0.0017) [2023-02-26 09:32:28,226][00216] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2879488. Throughput: 0: 945.7. Samples: 721010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 09:32:28,228][00216] Avg episode reward: [(0, '16.495')] [2023-02-26 09:32:33,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2895872. Throughput: 0: 933.4. Samples: 723194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:32:33,231][00216] Avg episode reward: [(0, '16.505')] [2023-02-26 09:32:35,532][13474] Updated weights for policy 0, policy_version 710 (0.0023) [2023-02-26 09:32:38,226][00216] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2920448. Throughput: 0: 970.7. Samples: 729290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:32:38,228][00216] Avg episode reward: [(0, '16.442')] [2023-02-26 09:32:43,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2940928. Throughput: 0: 986.8. Samples: 736314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 09:32:43,230][00216] Avg episode reward: [(0, '17.443')] [2023-02-26 09:32:44,797][13474] Updated weights for policy 0, policy_version 720 (0.0015) [2023-02-26 09:32:48,226][00216] Fps is (10 sec: 3686.3, 60 sec: 3823.1, 300 sec: 3804.5). Total num frames: 2957312. Throughput: 0: 961.2. Samples: 738630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:32:48,230][00216] Avg episode reward: [(0, '18.086')] [2023-02-26 09:32:48,246][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000722_2957312.pth... [2023-02-26 09:32:48,374][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000500_2048000.pth [2023-02-26 09:32:48,382][13460] Saving new best policy, reward=18.086! [2023-02-26 09:32:53,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2973696. Throughput: 0: 927.6. Samples: 742862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:32:53,235][00216] Avg episode reward: [(0, '18.974')] [2023-02-26 09:32:53,237][13460] Saving new best policy, reward=18.974! 
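Note: the recurring "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" records report throughput averaged over three sliding windows, which is why the 10-second figure swings while the 300-second figure drifts slowly; "Policy #0 lag" tracks roughly how many policy versions behind the learner the acting policy was when samples were collected. A minimal sketch of how such windowed FPS can be derived from periodic (timestamp, total_frames) snapshots follows — an illustration under assumed names (FpsTracker is hypothetical), not Sample Factory's actual implementation:

from collections import deque
import time

class FpsTracker:
    """Windowed FPS from periodic (timestamp, total_frames) snapshots.
    A minimal sketch with assumed names; not Sample Factory's actual code."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (monotonic timestamp, cumulative frame count)

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, total_frames))
        # Drop samples too old for even the largest window.
        horizon = now - max(self.windows)
        while len(self.samples) > 1 and self.samples[0][0] < horizon:
            self.samples.popleft()

    def fps(self):
        now, frames = self.samples[-1]
        result = {}
        for w in self.windows:
            # Oldest retained sample that still falls inside this window.
            past = next((s for s in self.samples if s[0] >= now - w),
                        self.samples[0])
            dt = now - past[0]
            # With a single sample there is no interval yet, hence nan,
            # matching the very first FPS report of a run.
            result[w] = (frames - past[1]) / dt if dt > 0 else float('nan')
        return result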
[2023-02-26 09:32:56,765][13474] Updated weights for policy 0, policy_version 730 (0.0039) [2023-02-26 09:32:58,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2994176. Throughput: 0: 975.5. Samples: 749456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:32:58,228][00216] Avg episode reward: [(0, '20.149')] [2023-02-26 09:32:58,247][13460] Saving new best policy, reward=20.149! [2023-02-26 09:33:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3018752. Throughput: 0: 982.8. Samples: 752956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:33:03,230][00216] Avg episode reward: [(0, '19.032')] [2023-02-26 09:33:07,053][13474] Updated weights for policy 0, policy_version 740 (0.0014) [2023-02-26 09:33:08,226][00216] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3031040. Throughput: 0: 942.9. Samples: 758204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:33:08,231][00216] Avg episode reward: [(0, '18.793')] [2023-02-26 09:33:13,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3047424. Throughput: 0: 930.4. Samples: 762878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 09:33:13,229][00216] Avg episode reward: [(0, '18.609')] [2023-02-26 09:33:17,738][13474] Updated weights for policy 0, policy_version 750 (0.0014) [2023-02-26 09:33:18,226][00216] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3072000. Throughput: 0: 961.6. Samples: 766464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:33:18,233][00216] Avg episode reward: [(0, '18.087')] [2023-02-26 09:33:23,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3092480. Throughput: 0: 981.4. Samples: 773452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:33:23,230][00216] Avg episode reward: [(0, '16.883')] [2023-02-26 09:33:28,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3108864. Throughput: 0: 929.7. Samples: 778152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 09:33:28,233][00216] Avg episode reward: [(0, '17.660')] [2023-02-26 09:33:28,781][13474] Updated weights for policy 0, policy_version 760 (0.0020) [2023-02-26 09:33:33,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3125248. Throughput: 0: 927.1. Samples: 780350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:33:33,227][00216] Avg episode reward: [(0, '16.969')] [2023-02-26 09:33:38,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3145728. Throughput: 0: 978.4. Samples: 786890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:33:38,232][00216] Avg episode reward: [(0, '16.538')] [2023-02-26 09:33:39,947][13474] Updated weights for policy 0, policy_version 770 (0.0024) [2023-02-26 09:33:43,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3162112. Throughput: 0: 928.3. Samples: 791230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:33:43,228][00216] Avg episode reward: [(0, '17.020')] [2023-02-26 09:33:48,228][00216] Fps is (10 sec: 2866.6, 60 sec: 3618.0, 300 sec: 3762.7). Total num frames: 3174400. Throughput: 0: 890.6. Samples: 793036. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:33:48,231][00216] Avg episode reward: [(0, '16.985')] [2023-02-26 09:33:53,226][00216] Fps is (10 sec: 2457.5, 60 sec: 3549.8, 300 sec: 3735.0). Total num frames: 3186688. Throughput: 0: 862.8. Samples: 797032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:33:53,228][00216] Avg episode reward: [(0, '19.298')] [2023-02-26 09:33:54,426][13474] Updated weights for policy 0, policy_version 780 (0.0020) [2023-02-26 09:33:58,226][00216] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3211264. Throughput: 0: 901.5. Samples: 803444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:33:58,229][00216] Avg episode reward: [(0, '18.446')] [2023-02-26 09:34:03,166][13474] Updated weights for policy 0, policy_version 790 (0.0019) [2023-02-26 09:34:03,226][00216] Fps is (10 sec: 4915.3, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3235840. Throughput: 0: 897.9. Samples: 806868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:03,234][00216] Avg episode reward: [(0, '21.317')] [2023-02-26 09:34:03,237][13460] Saving new best policy, reward=21.317! [2023-02-26 09:34:08,252][00216] Fps is (10 sec: 3676.9, 60 sec: 3616.6, 300 sec: 3748.5). Total num frames: 3248128. Throughput: 0: 869.4. Samples: 812596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:08,256][00216] Avg episode reward: [(0, '21.325')] [2023-02-26 09:34:08,344][13460] Saving new best policy, reward=21.325! [2023-02-26 09:34:13,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3264512. Throughput: 0: 863.1. Samples: 816990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:13,233][00216] Avg episode reward: [(0, '21.111')] [2023-02-26 09:34:15,516][13474] Updated weights for policy 0, policy_version 800 (0.0024) [2023-02-26 09:34:18,226][00216] Fps is (10 sec: 4106.7, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3289088. Throughput: 0: 886.8. Samples: 820258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:18,231][00216] Avg episode reward: [(0, '19.757')] [2023-02-26 09:34:23,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3309568. Throughput: 0: 895.7. Samples: 827196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:23,232][00216] Avg episode reward: [(0, '17.580')] [2023-02-26 09:34:24,879][13474] Updated weights for policy 0, policy_version 810 (0.0037) [2023-02-26 09:34:28,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3325952. Throughput: 0: 915.8. Samples: 832442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:34:28,228][00216] Avg episode reward: [(0, '16.973')] [2023-02-26 09:34:33,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3342336. Throughput: 0: 926.3. Samples: 834718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:33,231][00216] Avg episode reward: [(0, '16.238')] [2023-02-26 09:34:36,376][13474] Updated weights for policy 0, policy_version 820 (0.0014) [2023-02-26 09:34:38,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3366912. Throughput: 0: 974.8. Samples: 840896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:38,228][00216] Avg episode reward: [(0, '15.791')] [2023-02-26 09:34:43,226][00216] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3391488. 
Throughput: 0: 992.4. Samples: 848102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:43,231][00216] Avg episode reward: [(0, '16.458')] [2023-02-26 09:34:45,794][13474] Updated weights for policy 0, policy_version 830 (0.0011) [2023-02-26 09:34:48,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3762.8). Total num frames: 3403776. Throughput: 0: 969.4. Samples: 850490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:34:48,236][00216] Avg episode reward: [(0, '17.738')] [2023-02-26 09:34:48,256][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000831_3403776.pth... [2023-02-26 09:34:48,407][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth [2023-02-26 09:34:53,226][00216] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 3420160. Throughput: 0: 939.8. Samples: 854864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:34:53,231][00216] Avg episode reward: [(0, '18.051')] [2023-02-26 09:34:57,047][13474] Updated weights for policy 0, policy_version 840 (0.0021) [2023-02-26 09:34:58,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3444736. Throughput: 0: 994.3. Samples: 861734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:34:58,228][00216] Avg episode reward: [(0, '19.342')] [2023-02-26 09:35:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3465216. Throughput: 0: 1000.4. Samples: 865278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:35:03,228][00216] Avg episode reward: [(0, '19.225')] [2023-02-26 09:35:07,555][13474] Updated weights for policy 0, policy_version 850 (0.0012) [2023-02-26 09:35:08,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3892.9, 300 sec: 3762.8). Total num frames: 3481600. Throughput: 0: 964.3. Samples: 870588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:35:08,229][00216] Avg episode reward: [(0, '19.059')] [2023-02-26 09:35:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3497984. Throughput: 0: 947.6. Samples: 875084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:35:13,231][00216] Avg episode reward: [(0, '19.283')] [2023-02-26 09:35:18,143][13474] Updated weights for policy 0, policy_version 860 (0.0013) [2023-02-26 09:35:18,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 3522560. Throughput: 0: 975.5. Samples: 878616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:35:18,228][00216] Avg episode reward: [(0, '18.462')] [2023-02-26 09:35:23,227][00216] Fps is (10 sec: 4505.1, 60 sec: 3891.1, 300 sec: 3762.8). Total num frames: 3543040. Throughput: 0: 989.7. Samples: 885432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 09:35:23,231][00216] Avg episode reward: [(0, '18.448')] [2023-02-26 09:35:28,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3559424. Throughput: 0: 938.4. Samples: 890328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 09:35:28,231][00216] Avg episode reward: [(0, '19.081')] [2023-02-26 09:35:29,453][13474] Updated weights for policy 0, policy_version 870 (0.0011) [2023-02-26 09:35:33,226][00216] Fps is (10 sec: 3277.2, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3575808. Throughput: 0: 934.2. Samples: 892528. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:35:33,229][00216] Avg episode reward: [(0, '19.582')] [2023-02-26 09:35:38,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 3596288. Throughput: 0: 979.3. Samples: 898932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:35:38,227][00216] Avg episode reward: [(0, '19.699')] [2023-02-26 09:35:39,250][13474] Updated weights for policy 0, policy_version 880 (0.0019) [2023-02-26 09:35:43,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3620864. Throughput: 0: 980.8. Samples: 905872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:35:43,233][00216] Avg episode reward: [(0, '19.967')] [2023-02-26 09:35:48,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.4). Total num frames: 3637248. Throughput: 0: 952.3. Samples: 908130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:35:48,228][00216] Avg episode reward: [(0, '19.150')] [2023-02-26 09:35:50,978][13474] Updated weights for policy 0, policy_version 890 (0.0014) [2023-02-26 09:35:53,225][00216] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3653632. Throughput: 0: 933.4. Samples: 912590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 09:35:53,228][00216] Avg episode reward: [(0, '18.615')] [2023-02-26 09:35:58,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3674112. Throughput: 0: 991.2. Samples: 919690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:35:58,231][00216] Avg episode reward: [(0, '18.432')] [2023-02-26 09:36:00,028][13474] Updated weights for policy 0, policy_version 900 (0.0012) [2023-02-26 09:36:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3698688. Throughput: 0: 991.2. Samples: 923218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 09:36:03,232][00216] Avg episode reward: [(0, '18.462')] [2023-02-26 09:36:08,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3710976. Throughput: 0: 952.6. Samples: 928298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 09:36:08,230][00216] Avg episode reward: [(0, '18.432')] [2023-02-26 09:36:12,258][13474] Updated weights for policy 0, policy_version 910 (0.0014) [2023-02-26 09:36:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3731456. Throughput: 0: 955.1. Samples: 933306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 09:36:13,234][00216] Avg episode reward: [(0, '19.106')] [2023-02-26 09:36:18,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3751936. Throughput: 0: 983.4. Samples: 936782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:36:18,228][00216] Avg episode reward: [(0, '20.129')] [2023-02-26 09:36:20,864][13474] Updated weights for policy 0, policy_version 920 (0.0020) [2023-02-26 09:36:23,226][00216] Fps is (10 sec: 4505.2, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3776512. Throughput: 0: 997.9. Samples: 943838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-26 09:36:23,231][00216] Avg episode reward: [(0, '20.616')] [2023-02-26 09:36:28,233][00216] Fps is (10 sec: 3683.8, 60 sec: 3822.5, 300 sec: 3804.3). Total num frames: 3788800. Throughput: 0: 944.4. Samples: 948376. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:36:28,235][00216] Avg episode reward: [(0, '21.902')] [2023-02-26 09:36:28,254][13460] Saving new best policy, reward=21.902! [2023-02-26 09:36:32,883][13474] Updated weights for policy 0, policy_version 930 (0.0048) [2023-02-26 09:36:33,226][00216] Fps is (10 sec: 3277.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3809280. Throughput: 0: 943.1. Samples: 950568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:36:33,229][00216] Avg episode reward: [(0, '21.855')] [2023-02-26 09:36:38,226][00216] Fps is (10 sec: 4508.8, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3833856. Throughput: 0: 997.5. Samples: 957478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:36:38,235][00216] Avg episode reward: [(0, '21.639')] [2023-02-26 09:36:43,175][13474] Updated weights for policy 0, policy_version 940 (0.0013) [2023-02-26 09:36:43,226][00216] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.5). Total num frames: 3850240. Throughput: 0: 963.2. Samples: 963036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:36:43,236][00216] Avg episode reward: [(0, '21.340')] [2023-02-26 09:36:48,230][00216] Fps is (10 sec: 2456.4, 60 sec: 3686.1, 300 sec: 3776.6). Total num frames: 3858432. Throughput: 0: 924.3. Samples: 964814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:36:48,233][00216] Avg episode reward: [(0, '20.290')] [2023-02-26 09:36:48,350][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000943_3862528.pth... [2023-02-26 09:36:48,593][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000722_2957312.pth [2023-02-26 09:36:53,226][00216] Fps is (10 sec: 2048.0, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3870720. Throughput: 0: 887.4. Samples: 968232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-26 09:36:53,233][00216] Avg episode reward: [(0, '20.981')] [2023-02-26 09:36:57,394][13474] Updated weights for policy 0, policy_version 950 (0.0019) [2023-02-26 09:36:58,226][00216] Fps is (10 sec: 3278.4, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3891200. Throughput: 0: 903.7. Samples: 973972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-26 09:36:58,227][00216] Avg episode reward: [(0, '19.445')] [2023-02-26 09:37:03,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 3915776. Throughput: 0: 904.7. Samples: 977494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:37:03,227][00216] Avg episode reward: [(0, '20.813')] [2023-02-26 09:37:06,358][13474] Updated weights for policy 0, policy_version 960 (0.0013) [2023-02-26 09:37:08,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3936256. Throughput: 0: 889.0. Samples: 983844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:37:08,229][00216] Avg episode reward: [(0, '22.096')] [2023-02-26 09:37:08,243][13460] Saving new best policy, reward=22.096! [2023-02-26 09:37:13,226][00216] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3948544. Throughput: 0: 889.9. Samples: 988414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-26 09:37:13,234][00216] Avg episode reward: [(0, '22.623')] [2023-02-26 09:37:13,239][13460] Saving new best policy, reward=22.623! 
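The learner's bookkeeping visible above combines two behaviors: rotating checkpoints named checkpoint_<policy_version>_<env_frames>.pth are written periodically and older ones pruned, while a separate "new best policy" save fires whenever the reported average episode reward exceeds the best seen so far. A rough sketch of that logic, with assumed names (CheckpointManager, best_policy.pth, and keep_latest are illustrative, not Sample Factory's actual classes or file layout):

import glob
import os
import torch

class CheckpointManager:
    """Illustrative sketch of rotating checkpoints plus best-policy
    tracking; names and layout are assumptions, not Sample Factory's API."""

    def __init__(self, checkpoint_dir, keep_latest=2):
        self.dir = checkpoint_dir
        self.keep_latest = keep_latest
        self.best_reward = float('-inf')

    def save(self, state_dict, policy_version, env_frames):
        # e.g. checkpoint_000000943_3862528.pth, as in the log above
        path = os.path.join(
            self.dir, f'checkpoint_{policy_version:09d}_{env_frames}.pth')
        torch.save(state_dict, path)
        # Prune all but the newest `keep_latest` rotating checkpoints;
        # zero-padded versions make lexicographic order chronological.
        ckpts = sorted(glob.glob(os.path.join(self.dir, 'checkpoint_*.pth')))
        for old in ckpts[:-self.keep_latest]:
            os.remove(old)

    def maybe_save_best(self, state_dict, avg_reward):
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            # 'best_policy.pth' is an assumed filename for this sketch.
            torch.save(state_dict, os.path.join(self.dir, 'best_policy.pth'))
            print(f'Saving new best policy, reward={avg_reward:.3f}!')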
[2023-02-26 09:37:18,149][13474] Updated weights for policy 0, policy_version 970 (0.0034) [2023-02-26 09:37:18,226][00216] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3973120. Throughput: 0: 900.4. Samples: 991086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:37:18,232][00216] Avg episode reward: [(0, '22.831')] [2023-02-26 09:37:18,247][13460] Saving new best policy, reward=22.831! [2023-02-26 09:37:23,226][00216] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3776.7). Total num frames: 3993600. Throughput: 0: 903.2. Samples: 998120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-26 09:37:23,228][00216] Avg episode reward: [(0, '23.140')] [2023-02-26 09:37:23,230][13460] Saving new best policy, reward=23.140! [2023-02-26 09:37:25,483][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 09:37:25,493][13460] Stopping Batcher_0... [2023-02-26 09:37:25,493][13460] Loop batcher_evt_loop terminating... [2023-02-26 09:37:25,494][00216] Component Batcher_0 stopped! [2023-02-26 09:37:25,516][13475] Stopping RolloutWorker_w0... [2023-02-26 09:37:25,516][13475] Loop rollout_proc0_evt_loop terminating... [2023-02-26 09:37:25,517][00216] Component RolloutWorker_w0 stopped! [2023-02-26 09:37:25,557][13481] Stopping RolloutWorker_w6... [2023-02-26 09:37:25,557][13481] Loop rollout_proc6_evt_loop terminating... [2023-02-26 09:37:25,555][13478] Stopping RolloutWorker_w4... [2023-02-26 09:37:25,559][00216] Component RolloutWorker_w4 stopped! [2023-02-26 09:37:25,559][13478] Loop rollout_proc4_evt_loop terminating... [2023-02-26 09:37:25,561][00216] Component RolloutWorker_w6 stopped! [2023-02-26 09:37:25,575][13477] Stopping RolloutWorker_w2... [2023-02-26 09:37:25,579][13477] Loop rollout_proc2_evt_loop terminating... [2023-02-26 09:37:25,575][00216] Component RolloutWorker_w2 stopped! [2023-02-26 09:37:25,666][13474] Weights refcount: 2 0 [2023-02-26 09:37:25,682][00216] Component InferenceWorker_p0-w0 stopped! [2023-02-26 09:37:25,684][13474] Stopping InferenceWorker_p0-w0... [2023-02-26 09:37:25,685][13474] Loop inference_proc0-0_evt_loop terminating... [2023-02-26 09:37:25,757][00216] Component RolloutWorker_w5 stopped! [2023-02-26 09:37:25,769][13480] Stopping RolloutWorker_w5... [2023-02-26 09:37:25,770][13480] Loop rollout_proc5_evt_loop terminating... [2023-02-26 09:37:25,772][00216] Component RolloutWorker_w3 stopped! [2023-02-26 09:37:25,774][13479] Stopping RolloutWorker_w3... [2023-02-26 09:37:25,753][13460] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000831_3403776.pth [2023-02-26 09:37:25,775][13479] Loop rollout_proc3_evt_loop terminating... [2023-02-26 09:37:25,793][13460] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 09:37:25,808][13476] Stopping RolloutWorker_w1... [2023-02-26 09:37:25,808][00216] Component RolloutWorker_w1 stopped! [2023-02-26 09:37:25,818][13482] Stopping RolloutWorker_w7... [2023-02-26 09:37:25,818][00216] Component RolloutWorker_w7 stopped! [2023-02-26 09:37:25,826][13476] Loop rollout_proc1_evt_loop terminating... [2023-02-26 09:37:25,846][13482] Loop rollout_proc7_evt_loop terminating... [2023-02-26 09:37:26,121][13460] Stopping LearnerWorker_p0... [2023-02-26 09:37:26,122][00216] Component LearnerWorker_p0 stopped! [2023-02-26 09:37:26,123][13460] Loop learner_proc0_evt_loop terminating... 
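The shutdown sequence recorded here follows a common two-phase pattern: every component is first signaled to leave its event loop ("Component … stopped!"), then the runner joins each child process in turn ("Waiting for process … to join"). A hedged, self-contained sketch of that pattern — the worker and shutdown functions are stand-ins, not the actual Sample Factory runner objects, which use signal-based event loops:

import logging
import multiprocessing as mp

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('runner')

def worker(stop_event):
    # Stand-in for a rollout/inference/learner event loop.
    stop_event.wait()

def shutdown(processes, stop_event):
    stop_event.set()  # ask every component to exit its loop
    for p in processes:
        log.info('Waiting for process %s to stop...', p.name)
        p.join()       # block until the child process exits

if __name__ == '__main__':
    stop = mp.Event()
    procs = [mp.Process(target=worker, args=(stop,), name=f'rollout_proc{i}')
             for i in range(2)]
    for p in procs:
        p.start()
    shutdown(procs, stop)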
[2023-02-26 09:37:26,123][00216] Waiting for process learner_proc0 to stop... [2023-02-26 09:37:28,253][00216] Waiting for process inference_proc0-0 to join... [2023-02-26 09:37:29,031][00216] Waiting for process rollout_proc0 to join... [2023-02-26 09:37:29,081][00216] Waiting for process rollout_proc1 to join... [2023-02-26 09:37:29,845][00216] Waiting for process rollout_proc2 to join... [2023-02-26 09:37:29,847][00216] Waiting for process rollout_proc3 to join... [2023-02-26 09:37:29,856][00216] Waiting for process rollout_proc4 to join... [2023-02-26 09:37:29,861][00216] Waiting for process rollout_proc5 to join... [2023-02-26 09:37:29,866][00216] Waiting for process rollout_proc6 to join... [2023-02-26 09:37:29,868][00216] Waiting for process rollout_proc7 to join...
[2023-02-26 09:37:29,875][00216] Batcher 0 profile tree view:
batching: 25.9250, releasing_batches: 0.0241
[2023-02-26 09:37:29,877][00216] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 534.9245
update_model: 7.6956
  weight_update: 0.0036
one_step: 0.0167
  handle_policy_step: 502.2466
    deserialize: 14.6894, stack: 2.8199, obs_to_device_normalize: 113.4075, forward: 239.1676, send_messages: 25.3623
    prepare_outputs: 81.2103
      to_cpu: 50.4486
[2023-02-26 09:37:29,879][00216] Learner 0 profile tree view:
misc: 0.0062, prepare_batch: 15.6359
train: 75.1877
  epoch_init: 0.0058, minibatch_init: 0.0171, losses_postprocess: 0.5741, kl_divergence: 0.5399, after_optimizer: 32.9012
  calculate_losses: 26.7405
    losses_init: 0.0037, forward_head: 1.6479, bptt_initial: 17.7252, tail: 1.0197, advantages_returns: 0.3107, losses: 3.6813
    bptt: 2.0454
      bptt_forward_core: 1.9500
  update: 13.7707
    clip: 1.3561
[2023-02-26 09:37:29,881][00216] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3931, enqueue_policy_requests: 147.6515, env_step: 811.2446, overhead: 20.7207, complete_rollouts: 6.2950
save_policy_outputs: 20.2096
  split_output_tensors: 9.7217
[2023-02-26 09:37:29,884][00216] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2951, enqueue_policy_requests: 144.5329, env_step: 815.4259, overhead: 20.3001, complete_rollouts: 7.1259
save_policy_outputs: 19.9807
  split_output_tensors: 9.4458
[2023-02-26 09:37:29,887][00216] Loop Runner_EvtLoop terminating...
[2023-02-26 09:37:29,889][00216] Runner profile tree view:
main_loop: 1115.8789
[2023-02-26 09:37:29,891][00216] Collected {0: 4005888}, FPS: 3589.9 [2023-02-26 09:49:35,609][00216] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 09:49:35,611][00216] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 09:49:35,614][00216] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 09:49:35,617][00216] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 09:49:35,619][00216] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 09:49:35,620][00216] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 09:49:35,623][00216] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 09:49:35,625][00216] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 09:49:35,627][00216] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-26 09:49:35,630][00216] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-26 09:49:35,631][00216] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 09:49:35,633][00216] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 09:49:35,634][00216] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 09:49:35,635][00216] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 09:49:35,636][00216] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 09:49:35,665][00216] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-26 09:49:35,669][00216] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 09:49:35,673][00216] RunningMeanStd input shape: (1,) [2023-02-26 09:49:35,691][00216] ConvEncoder: input_channels=3 [2023-02-26 09:49:36,352][00216] Conv encoder output size: 512 [2023-02-26 09:49:36,354][00216] Policy head output size: 512 [2023-02-26 09:49:38,877][00216] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 09:49:40,155][00216] Num frames 100... [2023-02-26 09:49:40,270][00216] Num frames 200... [2023-02-26 09:49:40,384][00216] Num frames 300... [2023-02-26 09:49:40,501][00216] Avg episode rewards: #0: 4.520, true rewards: #0: 3.520 [2023-02-26 09:49:40,503][00216] Avg episode reward: 4.520, avg true_objective: 3.520 [2023-02-26 09:49:40,567][00216] Num frames 400... [2023-02-26 09:49:40,684][00216] Num frames 500... [2023-02-26 09:49:40,811][00216] Num frames 600... [2023-02-26 09:49:40,930][00216] Num frames 700... [2023-02-26 09:49:41,081][00216] Num frames 800... [2023-02-26 09:49:41,241][00216] Num frames 900... [2023-02-26 09:49:41,401][00216] Num frames 1000... [2023-02-26 09:49:41,547][00216] Avg episode rewards: #0: 9.780, true rewards: #0: 5.280 [2023-02-26 09:49:41,549][00216] Avg episode reward: 9.780, avg true_objective: 5.280 [2023-02-26 09:49:41,622][00216] Num frames 1100... [2023-02-26 09:49:41,781][00216] Num frames 1200... [2023-02-26 09:49:41,967][00216] Num frames 1300... [2023-02-26 09:49:42,136][00216] Num frames 1400... [2023-02-26 09:49:42,304][00216] Num frames 1500... [2023-02-26 09:49:42,466][00216] Num frames 1600... [2023-02-26 09:49:42,632][00216] Avg episode rewards: #0: 9.547, true rewards: #0: 5.547 [2023-02-26 09:49:42,635][00216] Avg episode reward: 9.547, avg true_objective: 5.547 [2023-02-26 09:49:42,709][00216] Num frames 1700... [2023-02-26 09:49:42,879][00216] Num frames 1800... [2023-02-26 09:49:43,056][00216] Num frames 1900... [2023-02-26 09:49:43,224][00216] Num frames 2000... [2023-02-26 09:49:43,394][00216] Num frames 2100... [2023-02-26 09:49:43,566][00216] Num frames 2200... [2023-02-26 09:49:43,742][00216] Num frames 2300... [2023-02-26 09:49:43,923][00216] Num frames 2400... [2023-02-26 09:49:44,092][00216] Num frames 2500... [2023-02-26 09:49:44,259][00216] Num frames 2600... [2023-02-26 09:49:44,431][00216] Num frames 2700... [2023-02-26 09:49:44,524][00216] Avg episode rewards: #0: 13.800, true rewards: #0: 6.800 [2023-02-26 09:49:44,526][00216] Avg episode reward: 13.800, avg true_objective: 6.800 [2023-02-26 09:49:44,669][00216] Num frames 2800... [2023-02-26 09:49:44,807][00216] Num frames 2900... [2023-02-26 09:49:44,945][00216] Num frames 3000... [2023-02-26 09:49:45,077][00216] Num frames 3100... 
[2023-02-26 09:49:45,194][00216] Num frames 3200... [2023-02-26 09:49:45,312][00216] Num frames 3300... [2023-02-26 09:49:45,429][00216] Num frames 3400... [2023-02-26 09:49:45,549][00216] Num frames 3500... [2023-02-26 09:49:45,676][00216] Num frames 3600... [2023-02-26 09:49:45,800][00216] Num frames 3700... [2023-02-26 09:49:45,924][00216] Num frames 3800... [2023-02-26 09:49:46,055][00216] Num frames 3900... [2023-02-26 09:49:46,177][00216] Num frames 4000... [2023-02-26 09:49:46,301][00216] Num frames 4100... [2023-02-26 09:49:46,419][00216] Num frames 4200... [2023-02-26 09:49:46,574][00216] Avg episode rewards: #0: 18.568, true rewards: #0: 8.568 [2023-02-26 09:49:46,577][00216] Avg episode reward: 18.568, avg true_objective: 8.568 [2023-02-26 09:49:46,599][00216] Num frames 4300... [2023-02-26 09:49:46,727][00216] Num frames 4400... [2023-02-26 09:49:46,854][00216] Num frames 4500... [2023-02-26 09:49:46,978][00216] Num frames 4600... [2023-02-26 09:49:47,106][00216] Num frames 4700... [2023-02-26 09:49:47,224][00216] Num frames 4800... [2023-02-26 09:49:47,352][00216] Num frames 4900... [2023-02-26 09:49:47,471][00216] Num frames 5000... [2023-02-26 09:49:47,598][00216] Num frames 5100... [2023-02-26 09:49:47,735][00216] Num frames 5200... [2023-02-26 09:49:47,811][00216] Avg episode rewards: #0: 18.525, true rewards: #0: 8.692 [2023-02-26 09:49:47,812][00216] Avg episode reward: 18.525, avg true_objective: 8.692 [2023-02-26 09:49:47,915][00216] Num frames 5300... [2023-02-26 09:49:48,032][00216] Num frames 5400... [2023-02-26 09:49:48,148][00216] Num frames 5500... [2023-02-26 09:49:48,271][00216] Num frames 5600... [2023-02-26 09:49:48,388][00216] Num frames 5700... [2023-02-26 09:49:48,507][00216] Num frames 5800... [2023-02-26 09:49:48,623][00216] Num frames 5900... [2023-02-26 09:49:48,790][00216] Avg episode rewards: #0: 17.690, true rewards: #0: 8.547 [2023-02-26 09:49:48,792][00216] Avg episode reward: 17.690, avg true_objective: 8.547 [2023-02-26 09:49:48,818][00216] Num frames 6000... [2023-02-26 09:49:48,940][00216] Num frames 6100... [2023-02-26 09:49:49,068][00216] Num frames 6200... [2023-02-26 09:49:49,187][00216] Num frames 6300... [2023-02-26 09:49:49,316][00216] Num frames 6400... [2023-02-26 09:49:49,437][00216] Num frames 6500... [2023-02-26 09:49:49,572][00216] Num frames 6600... [2023-02-26 09:49:49,701][00216] Num frames 6700... [2023-02-26 09:49:49,789][00216] Avg episode rewards: #0: 17.399, true rewards: #0: 8.399 [2023-02-26 09:49:49,791][00216] Avg episode reward: 17.399, avg true_objective: 8.399 [2023-02-26 09:49:49,889][00216] Num frames 6800... [2023-02-26 09:49:50,008][00216] Num frames 6900... [2023-02-26 09:49:50,123][00216] Num frames 7000... [2023-02-26 09:49:50,239][00216] Num frames 7100... [2023-02-26 09:49:50,364][00216] Num frames 7200... [2023-02-26 09:49:50,493][00216] Num frames 7300... [2023-02-26 09:49:50,619][00216] Num frames 7400... [2023-02-26 09:49:50,743][00216] Num frames 7500... [2023-02-26 09:49:50,869][00216] Num frames 7600... [2023-02-26 09:49:51,003][00216] Num frames 7700... [2023-02-26 09:49:51,127][00216] Num frames 7800... [2023-02-26 09:49:51,252][00216] Num frames 7900... [2023-02-26 09:49:51,370][00216] Num frames 8000... [2023-02-26 09:49:51,489][00216] Num frames 8100... [2023-02-26 09:49:51,617][00216] Num frames 8200... [2023-02-26 09:49:51,737][00216] Num frames 8300... [2023-02-26 09:49:51,872][00216] Num frames 8400... [2023-02-26 09:49:51,990][00216] Num frames 8500... 
[2023-02-26 09:49:52,114][00216] Num frames 8600... [2023-02-26 09:49:52,233][00216] Num frames 8700... [2023-02-26 09:49:52,361][00216] Num frames 8800... [2023-02-26 09:49:52,441][00216] Avg episode rewards: #0: 21.132, true rewards: #0: 9.799 [2023-02-26 09:49:52,442][00216] Avg episode reward: 21.132, avg true_objective: 9.799 [2023-02-26 09:49:52,544][00216] Num frames 8900... [2023-02-26 09:49:52,660][00216] Num frames 9000... [2023-02-26 09:49:52,779][00216] Num frames 9100... [2023-02-26 09:49:52,893][00216] Num frames 9200... [2023-02-26 09:49:53,017][00216] Num frames 9300... [2023-02-26 09:49:53,137][00216] Num frames 9400... [2023-02-26 09:49:53,264][00216] Num frames 9500... [2023-02-26 09:49:53,380][00216] Num frames 9600... [2023-02-26 09:49:53,503][00216] Num frames 9700... [2023-02-26 09:49:53,595][00216] Avg episode rewards: #0: 20.632, true rewards: #0: 9.732 [2023-02-26 09:49:53,597][00216] Avg episode reward: 20.632, avg true_objective: 9.732 [2023-02-26 09:50:58,109][00216] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-26 09:52:41,498][00216] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-26 09:52:41,501][00216] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-26 09:52:41,504][00216] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-26 09:52:41,507][00216] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-26 09:52:41,508][00216] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-26 09:52:41,510][00216] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-26 09:52:41,512][00216] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-26 09:52:41,514][00216] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-26 09:52:41,515][00216] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-26 09:52:41,517][00216] Adding new argument 'hf_repository'='numan966/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-26 09:52:41,519][00216] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-26 09:52:41,520][00216] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-26 09:52:41,522][00216] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-26 09:52:41,523][00216] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-26 09:52:41,525][00216] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-26 09:52:41,557][00216] RunningMeanStd input shape: (3, 72, 128) [2023-02-26 09:52:41,560][00216] RunningMeanStd input shape: (1,) [2023-02-26 09:52:41,577][00216] ConvEncoder: input_channels=3 [2023-02-26 09:52:41,619][00216] Conv encoder output size: 512 [2023-02-26 09:52:41,621][00216] Policy head output size: 512 [2023-02-26 09:52:41,644][00216] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-26 09:52:42,163][00216] Num frames 100... [2023-02-26 09:52:42,297][00216] Num frames 200... [2023-02-26 09:52:42,435][00216] Num frames 300... [2023-02-26 09:52:42,570][00216] Num frames 400... [2023-02-26 09:52:42,708][00216] Num frames 500... 
[2023-02-26 09:52:42,845][00216] Num frames 600... [2023-02-26 09:52:42,978][00216] Num frames 700... [2023-02-26 09:52:43,110][00216] Num frames 800... [2023-02-26 09:52:43,247][00216] Num frames 900... [2023-02-26 09:52:43,385][00216] Num frames 1000... [2023-02-26 09:52:43,519][00216] Num frames 1100... [2023-02-26 09:52:43,646][00216] Num frames 1200... [2023-02-26 09:52:43,779][00216] Num frames 1300... [2023-02-26 09:52:43,913][00216] Num frames 1400... [2023-02-26 09:52:44,047][00216] Num frames 1500... [2023-02-26 09:52:44,181][00216] Num frames 1600... [2023-02-26 09:52:44,315][00216] Num frames 1700... [2023-02-26 09:52:44,460][00216] Num frames 1800... [2023-02-26 09:52:44,588][00216] Num frames 1900... [2023-02-26 09:52:44,727][00216] Num frames 2000... [2023-02-26 09:52:44,862][00216] Num frames 2100... [2023-02-26 09:52:44,922][00216] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000 [2023-02-26 09:52:44,926][00216] Avg episode reward: 58.999, avg true_objective: 21.000 [2023-02-26 09:52:45,054][00216] Num frames 2200... [2023-02-26 09:52:45,189][00216] Num frames 2300... [2023-02-26 09:52:45,328][00216] Num frames 2400... [2023-02-26 09:52:45,464][00216] Num frames 2500... [2023-02-26 09:52:45,600][00216] Num frames 2600... [2023-02-26 09:52:45,728][00216] Num frames 2700... [2023-02-26 09:52:45,864][00216] Num frames 2800... [2023-02-26 09:52:45,998][00216] Num frames 2900... [2023-02-26 09:52:46,129][00216] Num frames 3000... [2023-02-26 09:52:46,271][00216] Num frames 3100... [2023-02-26 09:52:46,402][00216] Num frames 3200... [2023-02-26 09:52:46,545][00216] Num frames 3300... [2023-02-26 09:52:46,663][00216] Avg episode rewards: #0: 42.240, true rewards: #0: 16.740 [2023-02-26 09:52:46,668][00216] Avg episode reward: 42.240, avg true_objective: 16.740 [2023-02-26 09:52:46,739][00216] Num frames 3400... [2023-02-26 09:52:46,868][00216] Num frames 3500... [2023-02-26 09:52:47,004][00216] Num frames 3600... [2023-02-26 09:52:47,143][00216] Num frames 3700... [2023-02-26 09:52:47,278][00216] Num frames 3800... [2023-02-26 09:52:47,416][00216] Num frames 3900... [2023-02-26 09:52:47,559][00216] Num frames 4000... [2023-02-26 09:52:47,686][00216] Num frames 4100... [2023-02-26 09:52:47,821][00216] Num frames 4200... [2023-02-26 09:52:47,953][00216] Num frames 4300... [2023-02-26 09:52:48,086][00216] Num frames 4400... [2023-02-26 09:52:48,254][00216] Num frames 4500... [2023-02-26 09:52:48,487][00216] Avg episode rewards: #0: 36.986, true rewards: #0: 15.320 [2023-02-26 09:52:48,490][00216] Avg episode reward: 36.986, avg true_objective: 15.320 [2023-02-26 09:52:48,503][00216] Num frames 4600... [2023-02-26 09:52:48,693][00216] Num frames 4700... [2023-02-26 09:52:48,882][00216] Num frames 4800... [2023-02-26 09:52:49,069][00216] Num frames 4900... [2023-02-26 09:52:49,269][00216] Num frames 5000... [2023-02-26 09:52:49,456][00216] Num frames 5100... [2023-02-26 09:52:49,657][00216] Num frames 5200... [2023-02-26 09:52:49,846][00216] Num frames 5300... [2023-02-26 09:52:50,038][00216] Num frames 5400... [2023-02-26 09:52:50,221][00216] Num frames 5500... [2023-02-26 09:52:50,424][00216] Num frames 5600... [2023-02-26 09:52:50,603][00216] Num frames 5700... [2023-02-26 09:52:50,784][00216] Num frames 5800... [2023-02-26 09:52:50,988][00216] Avg episode rewards: #0: 34.190, true rewards: #0: 14.690 [2023-02-26 09:52:50,990][00216] Avg episode reward: 34.190, avg true_objective: 14.690 [2023-02-26 09:52:51,041][00216] Num frames 5900... 
[2023-02-26 09:52:51,225][00216] Num frames 6000... [2023-02-26 09:52:51,420][00216] Num frames 6100... [2023-02-26 09:52:51,611][00216] Num frames 6200... [2023-02-26 09:52:51,816][00216] Num frames 6300... [2023-02-26 09:52:51,922][00216] Avg episode rewards: #0: 28.448, true rewards: #0: 12.648 [2023-02-26 09:52:51,925][00216] Avg episode reward: 28.448, avg true_objective: 12.648 [2023-02-26 09:52:52,072][00216] Num frames 6400... [2023-02-26 09:52:52,269][00216] Num frames 6500... [2023-02-26 09:52:52,459][00216] Num frames 6600... [2023-02-26 09:52:52,591][00216] Num frames 6700... [2023-02-26 09:52:52,740][00216] Num frames 6800... [2023-02-26 09:52:52,870][00216] Num frames 6900... [2023-02-26 09:52:53,014][00216] Num frames 7000... [2023-02-26 09:52:53,145][00216] Num frames 7100... [2023-02-26 09:52:53,277][00216] Num frames 7200... [2023-02-26 09:52:53,411][00216] Num frames 7300... [2023-02-26 09:52:53,539][00216] Num frames 7400... [2023-02-26 09:52:53,678][00216] Num frames 7500... [2023-02-26 09:52:53,813][00216] Num frames 7600... [2023-02-26 09:52:53,955][00216] Num frames 7700... [2023-02-26 09:52:54,084][00216] Num frames 7800... [2023-02-26 09:52:54,260][00216] Avg episode rewards: #0: 30.153, true rewards: #0: 13.153 [2023-02-26 09:52:54,264][00216] Avg episode reward: 30.153, avg true_objective: 13.153 [2023-02-26 09:52:54,283][00216] Num frames 7900... [2023-02-26 09:52:54,414][00216] Num frames 8000... [2023-02-26 09:52:54,550][00216] Num frames 8100... [2023-02-26 09:52:54,681][00216] Num frames 8200... [2023-02-26 09:52:54,826][00216] Num frames 8300... [2023-02-26 09:52:54,966][00216] Num frames 8400... [2023-02-26 09:52:55,102][00216] Num frames 8500... [2023-02-26 09:52:55,243][00216] Num frames 8600... [2023-02-26 09:52:55,379][00216] Num frames 8700... [2023-02-26 09:52:55,517][00216] Num frames 8800... [2023-02-26 09:52:55,646][00216] Num frames 8900... [2023-02-26 09:52:55,789][00216] Num frames 9000... [2023-02-26 09:52:55,928][00216] Num frames 9100... [2023-02-26 09:52:56,065][00216] Num frames 9200... [2023-02-26 09:52:56,196][00216] Num frames 9300... [2023-02-26 09:52:56,334][00216] Num frames 9400... [2023-02-26 09:52:56,465][00216] Num frames 9500... [2023-02-26 09:52:56,645][00216] Avg episode rewards: #0: 30.839, true rewards: #0: 13.696 [2023-02-26 09:52:56,647][00216] Avg episode reward: 30.839, avg true_objective: 13.696 [2023-02-26 09:52:56,670][00216] Num frames 9600... [2023-02-26 09:52:56,810][00216] Num frames 9700... [2023-02-26 09:52:56,940][00216] Num frames 9800... [2023-02-26 09:52:57,073][00216] Num frames 9900... [2023-02-26 09:52:57,221][00216] Avg episode rewards: #0: 27.714, true rewards: #0: 12.464 [2023-02-26 09:52:57,223][00216] Avg episode reward: 27.714, avg true_objective: 12.464 [2023-02-26 09:52:57,266][00216] Num frames 10000... [2023-02-26 09:52:57,394][00216] Num frames 10100... [2023-02-26 09:52:57,523][00216] Num frames 10200... [2023-02-26 09:52:57,648][00216] Num frames 10300... [2023-02-26 09:52:57,776][00216] Avg episode rewards: #0: 25.283, true rewards: #0: 11.506 [2023-02-26 09:52:57,778][00216] Avg episode reward: 25.283, avg true_objective: 11.506 [2023-02-26 09:52:57,842][00216] Num frames 10400... [2023-02-26 09:52:57,979][00216] Num frames 10500... [2023-02-26 09:52:58,109][00216] Num frames 10600... [2023-02-26 09:52:58,245][00216] Num frames 10700... [2023-02-26 09:52:58,390][00216] Num frames 10800... [2023-02-26 09:52:58,524][00216] Num frames 10900... 
[2023-02-26 09:52:58,667][00216] Num frames 11000... [2023-02-26 09:52:58,805][00216] Num frames 11100... [2023-02-26 09:52:58,943][00216] Num frames 11200... [2023-02-26 09:52:59,078][00216] Num frames 11300... [2023-02-26 09:52:59,210][00216] Num frames 11400... [2023-02-26 09:52:59,348][00216] Num frames 11500... [2023-02-26 09:52:59,482][00216] Num frames 11600... [2023-02-26 09:52:59,615][00216] Avg episode rewards: #0: 25.756, true rewards: #0: 11.656 [2023-02-26 09:52:59,618][00216] Avg episode reward: 25.756, avg true_objective: 11.656 [2023-02-26 09:54:22,274][00216] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
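The evaluation runs above print a running mean after every finished episode: "Avg episode rewards" is the mean reward over all episodes completed so far, and "true rewards" is the corresponding mean of the environment's underlying objective (avg true_objective), which can differ from the shaped reward the agent optimizes. A short sketch of that accounting; on_episode_end and its arguments are illustrative placeholders, not Sample Factory's actual callback:

import numpy as np

episode_rewards, true_objectives = [], []

def on_episode_end(reward, true_objective, policy_id=0):
    # Append this episode's results, then report means over all
    # episodes so far, mirroring the per-episode lines in the log.
    episode_rewards.append(reward)
    true_objectives.append(true_objective)
    print(f'Avg episode rewards: #{policy_id}: {np.mean(episode_rewards):.3f}, '
          f'true rewards: #{policy_id}: {np.mean(true_objectives):.3f}')

For example, a first-episode reward of 4.52 followed by a second of 15.04 yields the running means 4.520 and then 9.780, which is the kind of progression the per-episode lines above show.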