[2023-02-22 13:59:07,005][01098] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-22 13:59:07,010][01098] Rollout worker 0 uses device cpu
[2023-02-22 13:59:07,015][01098] Rollout worker 1 uses device cpu
[2023-02-22 13:59:07,016][01098] Rollout worker 2 uses device cpu
[2023-02-22 13:59:07,017][01098] Rollout worker 3 uses device cpu
[2023-02-22 13:59:07,019][01098] Rollout worker 4 uses device cpu
[2023-02-22 13:59:07,020][01098] Rollout worker 5 uses device cpu
[2023-02-22 13:59:07,021][01098] Rollout worker 6 uses device cpu
[2023-02-22 13:59:07,022][01098] Rollout worker 7 uses device cpu
[2023-02-22 13:59:07,506][01098] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 13:59:07,509][01098] InferenceWorker_p0-w0: min num requests: 2
[2023-02-22 13:59:07,590][01098] Starting all processes...
[2023-02-22 13:59:07,592][01098] Starting process learner_proc0
[2023-02-22 13:59:07,688][01098] Starting all processes...
[2023-02-22 13:59:07,750][01098] Starting process inference_proc0-0
[2023-02-22 13:59:07,751][01098] Starting process rollout_proc0
[2023-02-22 13:59:07,753][01098] Starting process rollout_proc1
[2023-02-22 13:59:07,753][01098] Starting process rollout_proc2
[2023-02-22 13:59:07,753][01098] Starting process rollout_proc3
[2023-02-22 13:59:07,753][01098] Starting process rollout_proc4
[2023-02-22 13:59:07,753][01098] Starting process rollout_proc5
[2023-02-22 13:59:07,753][01098] Starting process rollout_proc6
[2023-02-22 13:59:07,753][01098] Starting process rollout_proc7
[2023-02-22 13:59:20,396][11398] Worker 0 uses CPU cores [0]
[2023-02-22 13:59:20,502][11402] Worker 4 uses CPU cores [0]
[2023-02-22 13:59:20,558][11401] Worker 3 uses CPU cores [1]
[2023-02-22 13:59:20,576][11383] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 13:59:20,576][11383] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-22 13:59:20,658][11403] Worker 5 uses CPU cores [1]
[2023-02-22 13:59:20,731][11399] Worker 1 uses CPU cores [1]
[2023-02-22 13:59:20,792][11404] Worker 6 uses CPU cores [0]
[2023-02-22 13:59:20,835][11397] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 13:59:20,835][11397] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-22 13:59:20,900][11400] Worker 2 uses CPU cores [0]
[2023-02-22 13:59:21,031][11405] Worker 7 uses CPU cores [1]
[2023-02-22 13:59:21,423][11397] Num visible devices: 1
[2023-02-22 13:59:21,428][11383] Num visible devices: 1
[2023-02-22 13:59:21,453][11383] Starting seed is not provided
[2023-02-22 13:59:21,454][11383] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 13:59:21,455][11383] Initializing actor-critic model on device cuda:0
[2023-02-22 13:59:21,456][11383] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 13:59:21,458][11383] RunningMeanStd input shape: (1,)
[2023-02-22 13:59:21,475][11383] ConvEncoder: input_channels=3
[2023-02-22 13:59:21,856][11383] Conv encoder output size: 512
[2023-02-22 13:59:21,857][11383] Policy head output size: 512
[2023-02-22 13:59:21,914][11383] Created Actor Critic model with architecture:
[2023-02-22 13:59:21,914][11383] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-22 13:59:27,388][01098] Heartbeat connected on Batcher_0
[2023-02-22 13:59:27,507][01098] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-22 13:59:27,525][01098] Heartbeat connected on RolloutWorker_w0
[2023-02-22 13:59:27,532][01098] Heartbeat connected on RolloutWorker_w1
[2023-02-22 13:59:27,555][01098] Heartbeat connected on RolloutWorker_w2
[2023-02-22 13:59:27,557][01098] Heartbeat connected on RolloutWorker_w3
[2023-02-22 13:59:27,567][01098] Heartbeat connected on RolloutWorker_w4
[2023-02-22 13:59:27,578][01098] Heartbeat connected on RolloutWorker_w5
[2023-02-22 13:59:27,584][01098] Heartbeat connected on RolloutWorker_w6
[2023-02-22 13:59:27,587][01098] Heartbeat connected on RolloutWorker_w7
[2023-02-22 13:59:29,940][11383] Using optimizer
[2023-02-22 13:59:29,941][11383] No checkpoints found
[2023-02-22 13:59:29,941][11383] Did not load from checkpoint, starting from scratch!
[2023-02-22 13:59:29,941][11383] Initialized policy 0 weights for model version 0
[2023-02-22 13:59:29,945][11383] LearnerWorker_p0 finished initialization!
[2023-02-22 13:59:29,950][11383] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 13:59:29,946][01098] Heartbeat connected on LearnerWorker_p0
[2023-02-22 13:59:30,156][11397] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 13:59:30,157][11397] RunningMeanStd input shape: (1,)
[2023-02-22 13:59:30,169][11397] ConvEncoder: input_channels=3
[2023-02-22 13:59:30,273][11397] Conv encoder output size: 512
[2023-02-22 13:59:30,274][11397] Policy head output size: 512
[2023-02-22 13:59:32,535][01098] Inference worker 0-0 is ready!
[2023-02-22 13:59:32,537][01098] All inference workers are ready! Signal rollout workers to start!
[2023-02-22 13:59:32,644][11405] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:32,649][11403] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:32,656][11399] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:32,665][11404] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:32,665][11401] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:32,687][11398] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:32,697][11402] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:32,700][11400] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 13:59:33,001][01098] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 13:59:33,859][11400] Decorrelating experience for 0 frames...
[2023-02-22 13:59:33,861][11402] Decorrelating experience for 0 frames...
[2023-02-22 13:59:33,860][11404] Decorrelating experience for 0 frames...
[2023-02-22 13:59:33,859][11399] Decorrelating experience for 0 frames...
[2023-02-22 13:59:33,861][11403] Decorrelating experience for 0 frames...
[2023-02-22 13:59:33,862][11405] Decorrelating experience for 0 frames...
[2023-02-22 13:59:34,945][11402] Decorrelating experience for 32 frames...
[2023-02-22 13:59:34,955][11398] Decorrelating experience for 0 frames...
[2023-02-22 13:59:34,958][11404] Decorrelating experience for 32 frames...
[2023-02-22 13:59:35,241][11405] Decorrelating experience for 32 frames...
[2023-02-22 13:59:35,247][11399] Decorrelating experience for 32 frames...
[2023-02-22 13:59:35,249][11403] Decorrelating experience for 32 frames...
[2023-02-22 13:59:35,295][11401] Decorrelating experience for 0 frames...
[2023-02-22 13:59:36,023][11400] Decorrelating experience for 32 frames...
[2023-02-22 13:59:36,256][11399] Decorrelating experience for 64 frames...
[2023-02-22 13:59:36,264][11403] Decorrelating experience for 64 frames...
[2023-02-22 13:59:36,688][11401] Decorrelating experience for 32 frames...
[2023-02-22 13:59:36,690][11404] Decorrelating experience for 64 frames...
[2023-02-22 13:59:36,782][11402] Decorrelating experience for 64 frames...
[2023-02-22 13:59:36,820][11398] Decorrelating experience for 32 frames...
[2023-02-22 13:59:37,336][11400] Decorrelating experience for 64 frames...
[2023-02-22 13:59:37,768][11398] Decorrelating experience for 64 frames...
[2023-02-22 13:59:38,001][01098] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 13:59:38,005][11403] Decorrelating experience for 96 frames...
[2023-02-22 13:59:38,222][11399] Decorrelating experience for 96 frames...
[2023-02-22 13:59:38,250][11401] Decorrelating experience for 64 frames...
[2023-02-22 13:59:38,303][11400] Decorrelating experience for 96 frames...
[2023-02-22 13:59:38,358][11405] Decorrelating experience for 64 frames...
[2023-02-22 13:59:38,955][11398] Decorrelating experience for 96 frames...
[2023-02-22 13:59:39,175][11404] Decorrelating experience for 96 frames...
[2023-02-22 13:59:39,775][11405] Decorrelating experience for 96 frames...
[2023-02-22 13:59:39,794][11402] Decorrelating experience for 96 frames...
[2023-02-22 13:59:39,967][11401] Decorrelating experience for 96 frames...
[2023-02-22 13:59:43,001][01098] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 4.4. Samples: 44. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 13:59:43,004][01098] Avg episode reward: [(0, '0.480')]
[2023-02-22 13:59:46,790][11383] Signal inference workers to stop experience collection...
[2023-02-22 13:59:46,835][11397] InferenceWorker_p0-w0: stopping experience collection
[2023-02-22 13:59:48,001][01098] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 159.2. Samples: 2388. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 13:59:48,002][01098] Avg episode reward: [(0, '1.903')]
[2023-02-22 13:59:49,199][11383] Signal inference workers to resume experience collection...
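The `ActorCriticSharedWeights` printout near the top of this log maps onto a small amount of plain PyTorch. A minimal sketch, assuming Sample Factory's default conv filters for this encoder (the log names the layer types but not their kernel sizes, strides, or channel counts, so those settings below are assumptions):

```python
import torch
import torch.nn as nn

class SketchActorCritic(nn.Module):
    """Rough stand-in for the ActorCriticSharedWeights dump in the log:
    conv_head (3x Conv2d+ELU) -> mlp_layers (Linear+ELU to 512)
    -> GRU(512, 512) core -> critic_linear (512->1) and
    distribution_linear (512->5) heads. Normalizers are omitted."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # Three Conv2d+ELU stages, as in the ConvEncoderImpl printout;
        # filter sizes are Sample Factory defaults, assumed here.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # Size the Linear layer from a dummy (3, 72, 128) observation,
        # the resize resolution reported by the rollout workers.
        with torch.no_grad():
            n = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(n, 512), nn.ELU())
        self.core = nn.GRU(512, 512)
        self.critic_linear = nn.Linear(512, 1)
        self.distribution_linear = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

The "Conv encoder output size: 512" and "Policy head output size: 512" lines correspond to the 512-dim latent that both heads share, which is what the "SharedWeights" in the class name refers to.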
[2023-02-22 13:59:49,202][11397] InferenceWorker_p0-w0: resuming experience collection
[2023-02-22 13:59:53,001][01098] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 175.4. Samples: 3508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-02-22 13:59:53,003][01098] Avg episode reward: [(0, '3.191')]
[2023-02-22 13:59:58,006][01098] Fps is (10 sec: 2865.7, 60 sec: 1146.6, 300 sec: 1146.6). Total num frames: 28672. Throughput: 0: 295.5. Samples: 7390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 13:59:58,009][01098] Avg episode reward: [(0, '3.664')]
[2023-02-22 14:00:01,578][11397] Updated weights for policy 0, policy_version 10 (0.0025)
[2023-02-22 14:00:03,002][01098] Fps is (10 sec: 2866.9, 60 sec: 1501.8, 300 sec: 1501.8). Total num frames: 45056. Throughput: 0: 380.1. Samples: 11404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:00:03,004][01098] Avg episode reward: [(0, '4.103')]
[2023-02-22 14:00:08,002][01098] Fps is (10 sec: 2868.3, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 57344. Throughput: 0: 387.6. Samples: 13566. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 14:00:08,007][01098] Avg episode reward: [(0, '4.597')]
[2023-02-22 14:00:12,553][11397] Updated weights for policy 0, policy_version 20 (0.0029)
[2023-02-22 14:00:13,002][01098] Fps is (10 sec: 3686.4, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 81920. Throughput: 0: 491.6. Samples: 19666. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 14:00:13,008][01098] Avg episode reward: [(0, '4.543')]
[2023-02-22 14:00:18,001][01098] Fps is (10 sec: 4915.9, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 594.5. Samples: 26754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:00:18,012][01098] Avg episode reward: [(0, '4.195')]
[2023-02-22 14:00:18,017][11383] Saving new best policy, reward=4.195!
[2023-02-22 14:00:23,001][01098] Fps is (10 sec: 3686.7, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 118784. Throughput: 0: 645.6. Samples: 29054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:00:23,007][01098] Avg episode reward: [(0, '4.198')]
[2023-02-22 14:00:23,018][11383] Saving new best policy, reward=4.198!
[2023-02-22 14:00:23,745][11397] Updated weights for policy 0, policy_version 30 (0.0019)
[2023-02-22 14:00:28,001][01098] Fps is (10 sec: 2867.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 135168. Throughput: 0: 742.1. Samples: 33440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:00:28,003][01098] Avg episode reward: [(0, '4.276')]
[2023-02-22 14:00:28,009][11383] Saving new best policy, reward=4.276!
[2023-02-22 14:00:33,001][01098] Fps is (10 sec: 3686.4, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 155648. Throughput: 0: 835.2. Samples: 39972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:00:33,006][01098] Avg episode reward: [(0, '4.371')]
[2023-02-22 14:00:33,110][11383] Saving new best policy, reward=4.371!
[2023-02-22 14:00:34,071][11397] Updated weights for policy 0, policy_version 40 (0.0025)
[2023-02-22 14:00:38,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3003.7, 300 sec: 2772.7). Total num frames: 180224. Throughput: 0: 886.8. Samples: 43416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:00:38,007][01098] Avg episode reward: [(0, '4.332')]
[2023-02-22 14:00:43,002][01098] Fps is (10 sec: 4095.6, 60 sec: 3276.8, 300 sec: 2808.6). Total num frames: 196608. Throughput: 0: 919.0. Samples: 48740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:00:43,009][01098] Avg episode reward: [(0, '4.246')]
[2023-02-22 14:00:45,330][11397] Updated weights for policy 0, policy_version 50 (0.0019)
[2023-02-22 14:00:48,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 2785.3). Total num frames: 208896. Throughput: 0: 930.4. Samples: 53272. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 14:00:48,002][01098] Avg episode reward: [(0, '4.235')]
[2023-02-22 14:00:53,001][01098] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 2918.4). Total num frames: 233472. Throughput: 0: 958.1. Samples: 56680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:00:53,008][01098] Avg episode reward: [(0, '4.445')]
[2023-02-22 14:00:53,016][11383] Saving new best policy, reward=4.445!
[2023-02-22 14:00:55,013][11397] Updated weights for policy 0, policy_version 60 (0.0026)
[2023-02-22 14:00:58,002][01098] Fps is (10 sec: 4914.4, 60 sec: 3823.2, 300 sec: 3035.8). Total num frames: 258048. Throughput: 0: 978.6. Samples: 63702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:00:58,006][01098] Avg episode reward: [(0, '4.678')]
[2023-02-22 14:00:58,011][11383] Saving new best policy, reward=4.678!
[2023-02-22 14:01:03,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3003.7). Total num frames: 270336. Throughput: 0: 922.0. Samples: 68242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-22 14:01:03,011][01098] Avg episode reward: [(0, '4.622')]
[2023-02-22 14:01:03,022][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth...
[2023-02-22 14:01:08,001][01098] Fps is (10 sec: 2458.0, 60 sec: 3754.8, 300 sec: 2975.0). Total num frames: 282624. Throughput: 0: 904.5. Samples: 69758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:01:08,003][01098] Avg episode reward: [(0, '4.440')]
[2023-02-22 14:01:08,405][11397] Updated weights for policy 0, policy_version 70 (0.0036)
[2023-02-22 14:01:13,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3072.0). Total num frames: 307200. Throughput: 0: 942.5. Samples: 75854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:01:13,004][01098] Avg episode reward: [(0, '4.410')]
[2023-02-22 14:01:17,043][11397] Updated weights for policy 0, policy_version 80 (0.0013)
[2023-02-22 14:01:18,001][01098] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3159.8). Total num frames: 331776. Throughput: 0: 957.2. Samples: 83044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-22 14:01:18,007][01098] Avg episode reward: [(0, '4.451')]
[2023-02-22 14:01:23,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3127.9). Total num frames: 344064. Throughput: 0: 928.3. Samples: 85190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-22 14:01:23,020][01098] Avg episode reward: [(0, '4.284')]
[2023-02-22 14:01:28,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3134.3). Total num frames: 360448. Throughput: 0: 906.0. Samples: 89508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:01:28,005][01098] Avg episode reward: [(0, '4.236')]
[2023-02-22 14:01:29,774][11397] Updated weights for policy 0, policy_version 90 (0.0018)
[2023-02-22 14:01:33,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3208.5). Total num frames: 385024. Throughput: 0: 952.3. Samples: 96126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2023-02-22 14:01:33,008][01098] Avg episode reward: [(0, '4.443')]
[2023-02-22 14:01:38,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3244.0). Total num frames: 405504. Throughput: 0: 954.5. Samples: 99634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:01:38,007][01098] Avg episode reward: [(0, '4.585')]
[2023-02-22 14:01:38,722][11397] Updated weights for policy 0, policy_version 100 (0.0027)
[2023-02-22 14:01:43,001][01098] Fps is (10 sec: 3686.2, 60 sec: 3754.7, 300 sec: 3245.3). Total num frames: 421888. Throughput: 0: 914.7. Samples: 104864. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:01:43,008][01098] Avg episode reward: [(0, '4.631')]
[2023-02-22 14:01:48,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3216.1). Total num frames: 434176. Throughput: 0: 914.5. Samples: 109394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:01:48,004][01098] Avg episode reward: [(0, '4.678')]
[2023-02-22 14:01:50,822][11397] Updated weights for policy 0, policy_version 110 (0.0021)
[2023-02-22 14:01:53,001][01098] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 458752. Throughput: 0: 957.7. Samples: 112854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:01:53,003][01098] Avg episode reward: [(0, '4.645')]
[2023-02-22 14:01:58,001][01098] Fps is (10 sec: 4915.2, 60 sec: 3754.8, 300 sec: 3333.3). Total num frames: 483328. Throughput: 0: 977.6. Samples: 119846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:01:58,003][01098] Avg episode reward: [(0, '4.592')]
[2023-02-22 14:02:00,663][11397] Updated weights for policy 0, policy_version 120 (0.0014)
[2023-02-22 14:02:03,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3304.1). Total num frames: 495616. Throughput: 0: 926.3. Samples: 124728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:02:03,007][01098] Avg episode reward: [(0, '4.627')]
[2023-02-22 14:02:08,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3303.2). Total num frames: 512000. Throughput: 0: 927.1. Samples: 126910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:02:08,004][01098] Avg episode reward: [(0, '4.652')]
[2023-02-22 14:02:12,295][11397] Updated weights for policy 0, policy_version 130 (0.0029)
[2023-02-22 14:02:13,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3328.0). Total num frames: 532480. Throughput: 0: 961.2. Samples: 132762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:02:13,003][01098] Avg episode reward: [(0, '4.693')]
[2023-02-22 14:02:13,020][11383] Saving new best policy, reward=4.693!
[2023-02-22 14:02:18,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3376.1). Total num frames: 557056. Throughput: 0: 965.9. Samples: 139592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:02:18,006][01098] Avg episode reward: [(0, '4.742')]
[2023-02-22 14:02:18,012][11383] Saving new best policy, reward=4.742!
[2023-02-22 14:02:23,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3349.1). Total num frames: 569344. Throughput: 0: 938.0. Samples: 141846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:02:23,004][01098] Avg episode reward: [(0, '4.753')]
[2023-02-22 14:02:23,012][11383] Saving new best policy, reward=4.753!
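Two regularities in the report lines above are worth checking by hand: the checkpoint filename encodes the policy version and the total environment frame count, which in this run are related by frames = version × 4096, and the "10 sec" FPS figure is essentially the frame delta across the 10-second reporting window. A quick check with values copied from the log (the 4096 frames-per-version factor is inferred from the filename, not printed anywhere):

```python
# checkpoint_000000066_270336.pth -> policy version 66, 270336 env frames
version, frames = 66, 270336
assert frames == version * 4096  # frames advance 4096 per policy update here

# "(10 sec: 3686.7, ...)" at 14:00:53 is roughly the 10 s frame delta:
print((233472 - 196608) / 10)  # 3686.4, matching the log up to timer jitter
```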
[2023-02-22 14:02:23,516][11397] Updated weights for policy 0, policy_version 140 (0.0026)
[2023-02-22 14:02:28,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3347.0). Total num frames: 585728. Throughput: 0: 917.9. Samples: 146168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:02:28,004][01098] Avg episode reward: [(0, '4.805')]
[2023-02-22 14:02:28,006][11383] Saving new best policy, reward=4.805!
[2023-02-22 14:02:33,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3367.8). Total num frames: 606208. Throughput: 0: 957.3. Samples: 152472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:02:33,002][01098] Avg episode reward: [(0, '4.486')]
[2023-02-22 14:02:33,939][11397] Updated weights for policy 0, policy_version 150 (0.0027)
[2023-02-22 14:02:38,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3409.6). Total num frames: 630784. Throughput: 0: 959.4. Samples: 156026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:02:38,009][01098] Avg episode reward: [(0, '4.781')]
[2023-02-22 14:02:43,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3406.1). Total num frames: 647168. Throughput: 0: 924.4. Samples: 161446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:02:43,006][01098] Avg episode reward: [(0, '4.785')]
[2023-02-22 14:02:45,460][11397] Updated weights for policy 0, policy_version 160 (0.0011)
[2023-02-22 14:02:48,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3381.8). Total num frames: 659456. Throughput: 0: 915.3. Samples: 165918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:02:48,011][01098] Avg episode reward: [(0, '4.836')]
[2023-02-22 14:02:48,058][11383] Saving new best policy, reward=4.836!
[2023-02-22 14:02:53,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3420.2). Total num frames: 684032. Throughput: 0: 939.0. Samples: 169164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:02:53,009][01098] Avg episode reward: [(0, '4.852')]
[2023-02-22 14:02:53,025][11383] Saving new best policy, reward=4.852!
[2023-02-22 14:02:55,457][11397] Updated weights for policy 0, policy_version 170 (0.0049)
[2023-02-22 14:02:58,001][01098] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3456.6). Total num frames: 708608. Throughput: 0: 962.6. Samples: 176080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:02:58,003][01098] Avg episode reward: [(0, '5.012')]
[2023-02-22 14:02:58,005][11383] Saving new best policy, reward=5.012!
[2023-02-22 14:03:03,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3432.8). Total num frames: 720896. Throughput: 0: 921.8. Samples: 181074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:03:03,003][01098] Avg episode reward: [(0, '5.073')]
[2023-02-22 14:03:03,014][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000176_720896.pth...
[2023-02-22 14:03:03,241][11383] Saving new best policy, reward=5.073!
[2023-02-22 14:03:07,985][11397] Updated weights for policy 0, policy_version 180 (0.0028)
[2023-02-22 14:03:08,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3429.2). Total num frames: 737280. Throughput: 0: 917.5. Samples: 183134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:03:08,009][01098] Avg episode reward: [(0, '5.145')]
[2023-02-22 14:03:08,013][11383] Saving new best policy, reward=5.145!
[2023-02-22 14:03:13,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3444.4). Total num frames: 757760. Throughput: 0: 949.1. Samples: 188876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:03:13,006][01098] Avg episode reward: [(0, '5.139')]
[2023-02-22 14:03:16,949][11397] Updated weights for policy 0, policy_version 190 (0.0019)
[2023-02-22 14:03:18,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3477.0). Total num frames: 782336. Throughput: 0: 962.6. Samples: 195790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:03:18,003][01098] Avg episode reward: [(0, '5.337')]
[2023-02-22 14:03:18,006][11383] Saving new best policy, reward=5.337!
[2023-02-22 14:03:23,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3454.9). Total num frames: 794624. Throughput: 0: 939.3. Samples: 198294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:03:23,006][01098] Avg episode reward: [(0, '5.342')]
[2023-02-22 14:03:23,027][11383] Saving new best policy, reward=5.342!
[2023-02-22 14:03:28,001][01098] Fps is (10 sec: 2867.1, 60 sec: 3754.7, 300 sec: 3451.1). Total num frames: 811008. Throughput: 0: 914.4. Samples: 202592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:03:28,008][01098] Avg episode reward: [(0, '5.254')]
[2023-02-22 14:03:29,451][11397] Updated weights for policy 0, policy_version 200 (0.0012)
[2023-02-22 14:03:33,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3464.5). Total num frames: 831488. Throughput: 0: 956.6. Samples: 208964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:03:33,003][01098] Avg episode reward: [(0, '5.499')]
[2023-02-22 14:03:33,104][11383] Saving new best policy, reward=5.499!
[2023-02-22 14:03:38,001][01098] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3494.1). Total num frames: 856064. Throughput: 0: 960.5. Samples: 212388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:03:38,008][01098] Avg episode reward: [(0, '5.524')]
[2023-02-22 14:03:38,013][11383] Saving new best policy, reward=5.524!
[2023-02-22 14:03:38,335][11397] Updated weights for policy 0, policy_version 210 (0.0016)
[2023-02-22 14:03:43,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3473.4). Total num frames: 868352. Throughput: 0: 921.3. Samples: 217538. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:03:43,003][01098] Avg episode reward: [(0, '5.081')]
[2023-02-22 14:03:48,003][01098] Fps is (10 sec: 2457.1, 60 sec: 3686.3, 300 sec: 3453.5). Total num frames: 880640. Throughput: 0: 886.7. Samples: 220976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:03:48,006][01098] Avg episode reward: [(0, '5.139')]
[2023-02-22 14:03:53,001][01098] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3434.3). Total num frames: 892928. Throughput: 0: 879.6. Samples: 222716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 14:03:53,006][01098] Avg episode reward: [(0, '5.314')]
[2023-02-22 14:03:54,645][11397] Updated weights for policy 0, policy_version 220 (0.0041)
[2023-02-22 14:03:58,001][01098] Fps is (10 sec: 3277.5, 60 sec: 3413.3, 300 sec: 3446.8). Total num frames: 913408. Throughput: 0: 870.2. Samples: 228036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:03:58,008][01098] Avg episode reward: [(0, '5.687')]
[2023-02-22 14:03:58,013][11383] Saving new best policy, reward=5.687!
[2023-02-22 14:04:03,001][01098] Fps is (10 sec: 4505.7, 60 sec: 3618.1, 300 sec: 3474.0). Total num frames: 937984. Throughput: 0: 870.6. Samples: 234966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:04:03,003][01098] Avg episode reward: [(0, '5.914')]
[2023-02-22 14:04:03,018][11383] Saving new best policy, reward=5.914!
[2023-02-22 14:04:03,523][11397] Updated weights for policy 0, policy_version 230 (0.0016)
[2023-02-22 14:04:08,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3470.4). Total num frames: 954368. Throughput: 0: 870.9. Samples: 237486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:04:08,007][01098] Avg episode reward: [(0, '6.001')]
[2023-02-22 14:04:08,011][11383] Saving new best policy, reward=6.001!
[2023-02-22 14:04:13,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3452.3). Total num frames: 966656. Throughput: 0: 869.9. Samples: 241738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:04:13,006][01098] Avg episode reward: [(0, '5.949')]
[2023-02-22 14:04:16,216][11397] Updated weights for policy 0, policy_version 240 (0.0029)
[2023-02-22 14:04:18,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3478.0). Total num frames: 991232. Throughput: 0: 866.6. Samples: 247962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:04:18,003][01098] Avg episode reward: [(0, '6.172')]
[2023-02-22 14:04:18,006][11383] Saving new best policy, reward=6.172!
[2023-02-22 14:04:23,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3488.7). Total num frames: 1011712. Throughput: 0: 867.1. Samples: 251406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:04:23,003][01098] Avg episode reward: [(0, '6.286')]
[2023-02-22 14:04:23,014][11383] Saving new best policy, reward=6.286!
[2023-02-22 14:04:25,851][11397] Updated weights for policy 0, policy_version 250 (0.0025)
[2023-02-22 14:04:28,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 878.0. Samples: 257048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:04:28,005][01098] Avg episode reward: [(0, '6.379')]
[2023-02-22 14:04:28,012][11383] Saving new best policy, reward=6.379!
[2023-02-22 14:04:33,002][01098] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 1044480. Throughput: 0: 898.6. Samples: 261414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:04:33,006][01098] Avg episode reward: [(0, '7.167')]
[2023-02-22 14:04:33,015][11383] Saving new best policy, reward=7.167!
[2023-02-22 14:04:37,617][11397] Updated weights for policy 0, policy_version 260 (0.0029)
[2023-02-22 14:04:38,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 1064960. Throughput: 0: 927.6. Samples: 264460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:04:38,003][01098] Avg episode reward: [(0, '7.230')]
[2023-02-22 14:04:38,006][11383] Saving new best policy, reward=7.230!
[2023-02-22 14:04:43,001][01098] Fps is (10 sec: 4096.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 963.7. Samples: 271404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:04:43,003][01098] Avg episode reward: [(0, '7.090')]
[2023-02-22 14:04:48,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 1101824. Throughput: 0: 926.5. Samples: 276658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:04:48,007][01098] Avg episode reward: [(0, '6.963')]
[2023-02-22 14:04:48,330][11397] Updated weights for policy 0, policy_version 270 (0.0034)
[2023-02-22 14:04:53,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3693.4). Total num frames: 1118208. Throughput: 0: 918.7. Samples: 278828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:04:53,003][01098] Avg episode reward: [(0, '7.032')]
[2023-02-22 14:04:58,001][01098] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1138688. Throughput: 0: 949.6. Samples: 284472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:04:58,007][01098] Avg episode reward: [(0, '7.647')]
[2023-02-22 14:04:58,010][11383] Saving new best policy, reward=7.647!
[2023-02-22 14:04:59,082][11397] Updated weights for policy 0, policy_version 280 (0.0012)
[2023-02-22 14:05:03,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1163264. Throughput: 0: 966.6. Samples: 291460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:05:03,006][01098] Avg episode reward: [(0, '7.968')]
[2023-02-22 14:05:03,019][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth...
[2023-02-22 14:05:03,151][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth
[2023-02-22 14:05:03,167][11383] Saving new best policy, reward=7.968!
[2023-02-22 14:05:08,001][01098] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1179648. Throughput: 0: 943.2. Samples: 293850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:05:08,012][01098] Avg episode reward: [(0, '7.526')]
[2023-02-22 14:05:10,639][11397] Updated weights for policy 0, policy_version 290 (0.0037)
[2023-02-22 14:05:13,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1191936. Throughput: 0: 914.3. Samples: 298190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:05:13,009][01098] Avg episode reward: [(0, '7.509')]
[2023-02-22 14:05:18,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1216512. Throughput: 0: 953.1. Samples: 304302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:05:18,008][01098] Avg episode reward: [(0, '7.838')]
[2023-02-22 14:05:20,468][11397] Updated weights for policy 0, policy_version 300 (0.0025)
[2023-02-22 14:05:23,001][01098] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1241088. Throughput: 0: 962.2. Samples: 307758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:05:23,006][01098] Avg episode reward: [(0, '8.645')]
[2023-02-22 14:05:23,015][11383] Saving new best policy, reward=8.645!
[2023-02-22 14:05:28,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1253376. Throughput: 0: 935.0. Samples: 313478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:05:28,003][01098] Avg episode reward: [(0, '8.950')]
[2023-02-22 14:05:28,006][11383] Saving new best policy, reward=8.950!
[2023-02-22 14:05:32,905][11397] Updated weights for policy 0, policy_version 310 (0.0012)
[2023-02-22 14:05:33,002][01098] Fps is (10 sec: 2866.9, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1269760. Throughput: 0: 915.2. Samples: 317842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:05:33,006][01098] Avg episode reward: [(0, '8.676')]
[2023-02-22 14:05:38,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1290240. Throughput: 0: 936.8. Samples: 320986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 14:05:38,010][01098] Avg episode reward: [(0, '8.618')]
[2023-02-22 14:05:41,781][11397] Updated weights for policy 0, policy_version 320 (0.0015)
[2023-02-22 14:05:43,001][01098] Fps is (10 sec: 4506.0, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1314816. Throughput: 0: 967.1. Samples: 327990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:05:43,007][01098] Avg episode reward: [(0, '8.848')]
[2023-02-22 14:05:48,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1331200. Throughput: 0: 928.2. Samples: 333228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:05:48,003][01098] Avg episode reward: [(0, '9.579')]
[2023-02-22 14:05:48,007][11383] Saving new best policy, reward=9.579!
[2023-02-22 14:05:53,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1343488. Throughput: 0: 922.0. Samples: 335342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:05:53,005][01098] Avg episode reward: [(0, '10.302')]
[2023-02-22 14:05:53,021][11383] Saving new best policy, reward=10.302!
[2023-02-22 14:05:54,338][11397] Updated weights for policy 0, policy_version 330 (0.0015)
[2023-02-22 14:05:58,003][01098] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 1368064. Throughput: 0: 952.4. Samples: 341048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:05:58,018][01098] Avg episode reward: [(0, '10.771')]
[2023-02-22 14:05:58,022][11383] Saving new best policy, reward=10.771!
[2023-02-22 14:06:03,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1388544. Throughput: 0: 972.1. Samples: 348048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:06:03,003][01098] Avg episode reward: [(0, '10.389')]
[2023-02-22 14:06:03,186][11397] Updated weights for policy 0, policy_version 340 (0.0027)
[2023-02-22 14:06:08,001][01098] Fps is (10 sec: 3686.9, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 1404928. Throughput: 0: 952.0. Samples: 350598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:06:08,009][01098] Avg episode reward: [(0, '10.444')]
[2023-02-22 14:06:13,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1421312. Throughput: 0: 920.0. Samples: 354876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:06:13,008][01098] Avg episode reward: [(0, '10.372')]
[2023-02-22 14:06:15,664][11397] Updated weights for policy 0, policy_version 350 (0.0012)
[2023-02-22 14:06:18,001][01098] Fps is (10 sec: 3686.7, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1441792. Throughput: 0: 961.3. Samples: 361100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:18,004][01098] Avg episode reward: [(0, '10.946')]
[2023-02-22 14:06:18,008][11383] Saving new best policy, reward=10.946!
[2023-02-22 14:06:23,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1466368. Throughput: 0: 965.8. Samples: 364446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:23,003][01098] Avg episode reward: [(0, '10.775')]
[2023-02-22 14:06:25,222][11397] Updated weights for policy 0, policy_version 360 (0.0033)
[2023-02-22 14:06:28,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1482752. Throughput: 0: 936.0. Samples: 370110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:28,010][01098] Avg episode reward: [(0, '11.109')]
[2023-02-22 14:06:28,015][11383] Saving new best policy, reward=11.109!
[2023-02-22 14:06:33,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1495040. Throughput: 0: 918.4. Samples: 374554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:33,007][01098] Avg episode reward: [(0, '10.519')]
[2023-02-22 14:06:37,087][11397] Updated weights for policy 0, policy_version 370 (0.0016)
[2023-02-22 14:06:38,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1519616. Throughput: 0: 938.0. Samples: 377552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:38,003][01098] Avg episode reward: [(0, '10.807')]
[2023-02-22 14:06:43,001][01098] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1540096. Throughput: 0: 969.1. Samples: 384654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:43,010][01098] Avg episode reward: [(0, '10.961')]
[2023-02-22 14:06:47,360][11397] Updated weights for policy 0, policy_version 380 (0.0016)
[2023-02-22 14:06:48,001][01098] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1556480. Throughput: 0: 931.4. Samples: 389962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:48,003][01098] Avg episode reward: [(0, '10.467')]
[2023-02-22 14:06:53,001][01098] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1572864. Throughput: 0: 924.6. Samples: 392204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:06:53,012][01098] Avg episode reward: [(0, '11.393')]
[2023-02-22 14:06:53,026][11383] Saving new best policy, reward=11.393!
[2023-02-22 14:06:58,001][01098] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3721.1). Total num frames: 1593344. Throughput: 0: 953.1. Samples: 397766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:06:58,004][01098] Avg episode reward: [(0, '12.349')]
[2023-02-22 14:06:58,011][11383] Saving new best policy, reward=12.349!
[2023-02-22 14:06:58,442][11397] Updated weights for policy 0, policy_version 390 (0.0016)
[2023-02-22 14:07:03,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1617920. Throughput: 0: 970.8. Samples: 404786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:07:03,002][01098] Avg episode reward: [(0, '12.305')]
[2023-02-22 14:07:03,014][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000395_1617920.pth...
[2023-02-22 14:07:03,122][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000176_720896.pth
[2023-02-22 14:07:08,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1630208. Throughput: 0: 953.0. Samples: 407332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:07:08,005][01098] Avg episode reward: [(0, '12.755')]
[2023-02-22 14:07:08,026][11383] Saving new best policy, reward=12.755!
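The learner keeps a rolling window of checkpoints: each save above is paired with removal of the oldest file, so disk usage stays bounded while the newest (and separately the best) policies survive. To look inside one offline, a minimal sketch, assuming only that the .pth file deserializes to a dict (the exact keys depend on the Sample Factory version):

```python
import torch

# Path taken from the log above; adjust to whichever checkpoint still exists.
ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000395_1617920.pth",
    map_location="cpu",
)
for key, value in ckpt.items():  # assumes a dict-like checkpoint
    print(key, type(value).__name__)
```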
[2023-02-22 14:07:09,555][11397] Updated weights for policy 0, policy_version 400 (0.0016)
[2023-02-22 14:07:13,001][01098] Fps is (10 sec: 2867.0, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 1646592. Throughput: 0: 924.7. Samples: 411722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:07:13,007][01098] Avg episode reward: [(0, '12.111')]
[2023-02-22 14:07:18,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1667072. Throughput: 0: 959.8. Samples: 417744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:07:18,003][01098] Avg episode reward: [(0, '12.250')]
[2023-02-22 14:07:19,836][11397] Updated weights for policy 0, policy_version 410 (0.0018)
[2023-02-22 14:07:23,002][01098] Fps is (10 sec: 4505.3, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 1691648. Throughput: 0: 971.7. Samples: 421280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:07:23,005][01098] Avg episode reward: [(0, '13.816')]
[2023-02-22 14:07:23,016][11383] Saving new best policy, reward=13.816!
[2023-02-22 14:07:28,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1708032. Throughput: 0: 944.6. Samples: 427162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 14:07:28,003][01098] Avg episode reward: [(0, '13.620')]
[2023-02-22 14:07:31,246][11397] Updated weights for policy 0, policy_version 420 (0.0029)
[2023-02-22 14:07:33,001][01098] Fps is (10 sec: 3277.2, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1724416. Throughput: 0: 926.4. Samples: 431650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:07:33,006][01098] Avg episode reward: [(0, '13.915')]
[2023-02-22 14:07:33,022][11383] Saving new best policy, reward=13.915!
[2023-02-22 14:07:38,001][01098] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 1744896. Throughput: 0: 945.1. Samples: 434734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:07:38,003][01098] Avg episode reward: [(0, '13.929')]
[2023-02-22 14:07:38,101][11383] Saving new best policy, reward=13.929!
[2023-02-22 14:07:40,803][11397] Updated weights for policy 0, policy_version 430 (0.0020)
[2023-02-22 14:07:43,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1769472. Throughput: 0: 978.8. Samples: 441814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 14:07:43,008][01098] Avg episode reward: [(0, '13.183')]
[2023-02-22 14:07:48,001][01098] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1785856. Throughput: 0: 942.3. Samples: 447188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 14:07:48,003][01098] Avg episode reward: [(0, '13.390')]
[2023-02-22 14:07:52,716][11397] Updated weights for policy 0, policy_version 440 (0.0022)
[2023-02-22 14:07:53,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1802240. Throughput: 0: 934.5. Samples: 449384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 14:07:53,006][01098] Avg episode reward: [(0, '14.410')]
[2023-02-22 14:07:53,019][11383] Saving new best policy, reward=14.410!
[2023-02-22 14:07:58,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 1822720. Throughput: 0: 966.5. Samples: 455212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 14:07:58,003][01098] Avg episode reward: [(0, '15.000')]
[2023-02-22 14:07:58,006][11383] Saving new best policy, reward=15.000!
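Sampling a few (Total num frames, Avg episode reward) pairs from the report lines gives the learning curve so far: about 0.5 at the start, around 5 near 700K frames, and 15 by 1.8M frames. A throwaway plot of exactly those sampled points:

```python
import matplotlib.pyplot as plt

# (frames, reward) pairs copied from the log entries above
frames = [0, 270336, 720896, 1163264, 1617920, 1822720]
rewards = [0.480, 4.622, 5.073, 7.968, 12.305, 15.000]

plt.plot(frames, rewards, marker="o")
plt.xlabel("environment frames")
plt.ylabel("avg episode reward")
plt.title("default_experiment: sampled learning curve")
plt.show()
```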
[2023-02-22 14:08:01,635][11397] Updated weights for policy 0, policy_version 450 (0.0018) [2023-02-22 14:08:03,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1847296. Throughput: 0: 992.9. Samples: 462424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:08:03,003][01098] Avg episode reward: [(0, '15.033')] [2023-02-22 14:08:03,013][11383] Saving new best policy, reward=15.033! [2023-02-22 14:08:08,002][01098] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3748.9). Total num frames: 1863680. Throughput: 0: 973.6. Samples: 465092. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:08:08,005][01098] Avg episode reward: [(0, '16.656')] [2023-02-22 14:08:08,012][11383] Saving new best policy, reward=16.656! [2023-02-22 14:08:13,002][01098] Fps is (10 sec: 3276.4, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1880064. Throughput: 0: 939.9. Samples: 469458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:08:13,005][01098] Avg episode reward: [(0, '16.941')] [2023-02-22 14:08:13,015][11383] Saving new best policy, reward=16.941! [2023-02-22 14:08:13,899][11397] Updated weights for policy 0, policy_version 460 (0.0039) [2023-02-22 14:08:18,001][01098] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1900544. Throughput: 0: 983.4. Samples: 475904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:08:18,009][01098] Avg episode reward: [(0, '16.629')] [2023-02-22 14:08:23,001][01098] Fps is (10 sec: 4096.5, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 1921024. Throughput: 0: 993.7. Samples: 479452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:08:23,005][01098] Avg episode reward: [(0, '17.041')] [2023-02-22 14:08:23,037][11397] Updated weights for policy 0, policy_version 470 (0.0018) [2023-02-22 14:08:23,040][11383] Saving new best policy, reward=17.041! [2023-02-22 14:08:28,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1933312. Throughput: 0: 935.5. Samples: 483912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:08:28,006][01098] Avg episode reward: [(0, '18.528')] [2023-02-22 14:08:28,011][11383] Saving new best policy, reward=18.528! [2023-02-22 14:08:33,007][01098] Fps is (10 sec: 2456.1, 60 sec: 3686.0, 300 sec: 3693.3). Total num frames: 1945600. Throughput: 0: 892.5. Samples: 487354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 14:08:33,009][01098] Avg episode reward: [(0, '19.203')] [2023-02-22 14:08:33,023][11383] Saving new best policy, reward=19.203! [2023-02-22 14:08:38,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1961984. Throughput: 0: 882.6. Samples: 489102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:08:38,003][01098] Avg episode reward: [(0, '18.379')] [2023-02-22 14:08:38,881][11397] Updated weights for policy 0, policy_version 480 (0.0046) [2023-02-22 14:08:43,001][01098] Fps is (10 sec: 3688.6, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 1982464. Throughput: 0: 889.8. Samples: 495252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:08:43,003][01098] Avg episode reward: [(0, '19.954')] [2023-02-22 14:08:43,017][11383] Saving new best policy, reward=19.954! [2023-02-22 14:08:47,665][11397] Updated weights for policy 0, policy_version 490 (0.0012) [2023-02-22 14:08:48,001][01098] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 2007040. Throughput: 0: 885.6. 
Samples: 502276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:08:48,004][01098] Avg episode reward: [(0, '19.237')] [2023-02-22 14:08:53,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2023424. Throughput: 0: 878.9. Samples: 504642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:08:53,004][01098] Avg episode reward: [(0, '18.337')] [2023-02-22 14:08:58,001][01098] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2035712. Throughput: 0: 882.6. Samples: 509172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:08:58,003][01098] Avg episode reward: [(0, '17.970')] [2023-02-22 14:08:59,707][11397] Updated weights for policy 0, policy_version 500 (0.0025) [2023-02-22 14:09:03,001][01098] Fps is (10 sec: 3686.2, 60 sec: 3549.8, 300 sec: 3748.9). Total num frames: 2060288. Throughput: 0: 890.5. Samples: 515976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:09:03,010][01098] Avg episode reward: [(0, '17.709')] [2023-02-22 14:09:03,082][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000504_2064384.pth... [2023-02-22 14:09:03,216][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth [2023-02-22 14:09:08,001][01098] Fps is (10 sec: 4915.2, 60 sec: 3686.5, 300 sec: 3790.5). Total num frames: 2084864. Throughput: 0: 888.8. Samples: 519450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 14:09:08,007][01098] Avg episode reward: [(0, '18.827')] [2023-02-22 14:09:08,747][11397] Updated weights for policy 0, policy_version 510 (0.0015) [2023-02-22 14:09:13,003][01098] Fps is (10 sec: 3685.9, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2097152. Throughput: 0: 909.1. Samples: 524822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:09:13,009][01098] Avg episode reward: [(0, '18.690')] [2023-02-22 14:09:18,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 2113536. Throughput: 0: 933.6. Samples: 529360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:09:18,011][01098] Avg episode reward: [(0, '18.769')] [2023-02-22 14:09:20,825][11397] Updated weights for policy 0, policy_version 520 (0.0025) [2023-02-22 14:09:23,001][01098] Fps is (10 sec: 4096.8, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 2138112. Throughput: 0: 972.2. Samples: 532852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 14:09:23,004][01098] Avg episode reward: [(0, '18.994')] [2023-02-22 14:09:28,001][01098] Fps is (10 sec: 4915.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2162688. Throughput: 0: 993.9. Samples: 539976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:09:28,003][01098] Avg episode reward: [(0, '18.213')] [2023-02-22 14:09:30,385][11397] Updated weights for policy 0, policy_version 530 (0.0013) [2023-02-22 14:09:33,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3762.8). Total num frames: 2174976. Throughput: 0: 950.2. Samples: 545036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:09:33,003][01098] Avg episode reward: [(0, '18.228')] [2023-02-22 14:09:38,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2191360. Throughput: 0: 948.0. Samples: 547300. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:09:38,004][01098] Avg episode reward: [(0, '18.399')] [2023-02-22 14:09:41,504][11397] Updated weights for policy 0, policy_version 540 (0.0029) [2023-02-22 14:09:43,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2215936. Throughput: 0: 987.8. Samples: 553622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:09:43,004][01098] Avg episode reward: [(0, '18.645')] [2023-02-22 14:09:48,005][01098] Fps is (10 sec: 4913.0, 60 sec: 3890.9, 300 sec: 3804.4). Total num frames: 2240512. Throughput: 0: 995.6. Samples: 560782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:09:48,008][01098] Avg episode reward: [(0, '19.322')] [2023-02-22 14:09:51,354][11397] Updated weights for policy 0, policy_version 550 (0.0021) [2023-02-22 14:09:53,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2256896. Throughput: 0: 971.3. Samples: 563160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:09:53,003][01098] Avg episode reward: [(0, '18.693')] [2023-02-22 14:09:58,001][01098] Fps is (10 sec: 3278.2, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 2273280. Throughput: 0: 953.1. Samples: 567710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:09:58,004][01098] Avg episode reward: [(0, '19.644')] [2023-02-22 14:10:02,235][11397] Updated weights for policy 0, policy_version 560 (0.0020) [2023-02-22 14:10:03,004][01098] Fps is (10 sec: 3685.3, 60 sec: 3891.0, 300 sec: 3776.6). Total num frames: 2293760. Throughput: 0: 1002.2. Samples: 574464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:10:03,006][01098] Avg episode reward: [(0, '20.642')] [2023-02-22 14:10:03,103][11383] Saving new best policy, reward=20.642! [2023-02-22 14:10:08,001][01098] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2318336. Throughput: 0: 1002.2. Samples: 577950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:10:08,006][01098] Avg episode reward: [(0, '22.511')] [2023-02-22 14:10:08,009][11383] Saving new best policy, reward=22.511! [2023-02-22 14:10:12,610][11397] Updated weights for policy 0, policy_version 570 (0.0013) [2023-02-22 14:10:13,001][01098] Fps is (10 sec: 4097.2, 60 sec: 3959.6, 300 sec: 3790.5). Total num frames: 2334720. Throughput: 0: 964.1. Samples: 583362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:10:13,003][01098] Avg episode reward: [(0, '24.078')] [2023-02-22 14:10:13,021][11383] Saving new best policy, reward=24.078! [2023-02-22 14:10:18,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 2351104. Throughput: 0: 951.1. Samples: 587836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 14:10:18,007][01098] Avg episode reward: [(0, '23.966')] [2023-02-22 14:10:23,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2371584. Throughput: 0: 978.8. Samples: 591344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:10:23,004][01098] Avg episode reward: [(0, '23.479')] [2023-02-22 14:10:23,122][11397] Updated weights for policy 0, policy_version 580 (0.0026) [2023-02-22 14:10:28,003][01098] Fps is (10 sec: 4504.4, 60 sec: 3891.0, 300 sec: 3818.3). Total num frames: 2396160. Throughput: 0: 997.7. Samples: 598520. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:10:28,007][01098] Avg episode reward: [(0, '22.136')] [2023-02-22 14:10:33,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2412544. Throughput: 0: 949.8. Samples: 603520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:10:33,003][01098] Avg episode reward: [(0, '20.611')] [2023-02-22 14:10:34,146][11397] Updated weights for policy 0, policy_version 590 (0.0012) [2023-02-22 14:10:38,001][01098] Fps is (10 sec: 3277.7, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2428928. Throughput: 0: 946.4. Samples: 605746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:10:38,008][01098] Avg episode reward: [(0, '19.789')] [2023-02-22 14:10:43,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2449408. Throughput: 0: 986.4. Samples: 612100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:10:43,003][01098] Avg episode reward: [(0, '19.976')] [2023-02-22 14:10:43,955][11397] Updated weights for policy 0, policy_version 600 (0.0017) [2023-02-22 14:10:48,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3891.5, 300 sec: 3832.2). Total num frames: 2473984. Throughput: 0: 998.7. Samples: 619402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 14:10:48,005][01098] Avg episode reward: [(0, '20.467')] [2023-02-22 14:10:53,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2490368. Throughput: 0: 972.8. Samples: 621726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:10:53,005][01098] Avg episode reward: [(0, '20.564')] [2023-02-22 14:10:55,090][11397] Updated weights for policy 0, policy_version 610 (0.0016) [2023-02-22 14:10:58,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2506752. Throughput: 0: 954.0. Samples: 626294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:10:58,007][01098] Avg episode reward: [(0, '21.189')] [2023-02-22 14:11:03,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3818.3). Total num frames: 2531328. Throughput: 0: 1004.1. Samples: 633020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:11:03,008][01098] Avg episode reward: [(0, '20.777')] [2023-02-22 14:11:03,021][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000618_2531328.pth... [2023-02-22 14:11:03,130][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000395_1617920.pth [2023-02-22 14:11:04,650][11397] Updated weights for policy 0, policy_version 620 (0.0037) [2023-02-22 14:11:08,001][01098] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2551808. Throughput: 0: 1003.5. Samples: 636500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 14:11:08,006][01098] Avg episode reward: [(0, '20.448')] [2023-02-22 14:11:13,002][01098] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 2568192. Throughput: 0: 961.5. Samples: 641788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:11:13,005][01098] Avg episode reward: [(0, '20.670')] [2023-02-22 14:11:16,618][11397] Updated weights for policy 0, policy_version 630 (0.0029) [2023-02-22 14:11:18,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2584576. Throughput: 0: 952.1. Samples: 646364. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:11:18,004][01098] Avg episode reward: [(0, '21.395')] [2023-02-22 14:11:23,001][01098] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2605056. Throughput: 0: 978.4. Samples: 649774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:11:23,004][01098] Avg episode reward: [(0, '21.342')] [2023-02-22 14:11:25,566][11397] Updated weights for policy 0, policy_version 640 (0.0018) [2023-02-22 14:11:28,001][01098] Fps is (10 sec: 4505.5, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 2629632. Throughput: 0: 995.9. Samples: 656914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-22 14:11:28,008][01098] Avg episode reward: [(0, '22.327')] [2023-02-22 14:11:33,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2646016. Throughput: 0: 945.3. Samples: 661940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-22 14:11:33,003][01098] Avg episode reward: [(0, '21.332')] [2023-02-22 14:11:37,855][11397] Updated weights for policy 0, policy_version 650 (0.0022) [2023-02-22 14:11:38,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2662400. Throughput: 0: 943.3. Samples: 664176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:11:38,007][01098] Avg episode reward: [(0, '21.987')] [2023-02-22 14:11:43,002][01098] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 2682880. Throughput: 0: 980.3. Samples: 670408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:11:43,004][01098] Avg episode reward: [(0, '21.391')] [2023-02-22 14:11:46,644][11397] Updated weights for policy 0, policy_version 660 (0.0019) [2023-02-22 14:11:48,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2707456. Throughput: 0: 990.7. Samples: 677602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:11:48,006][01098] Avg episode reward: [(0, '21.357')] [2023-02-22 14:11:53,001][01098] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2723840. Throughput: 0: 966.4. Samples: 679988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:11:53,007][01098] Avg episode reward: [(0, '20.276')] [2023-02-22 14:11:58,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2740224. Throughput: 0: 949.9. Samples: 684534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:11:58,003][01098] Avg episode reward: [(0, '21.031')] [2023-02-22 14:11:58,755][11397] Updated weights for policy 0, policy_version 670 (0.0028) [2023-02-22 14:12:03,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2760704. Throughput: 0: 995.2. Samples: 691150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:12:03,002][01098] Avg episode reward: [(0, '19.353')] [2023-02-22 14:12:07,524][11397] Updated weights for policy 0, policy_version 680 (0.0020) [2023-02-22 14:12:08,002][01098] Fps is (10 sec: 4504.9, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 2785280. Throughput: 0: 998.2. Samples: 694694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 14:12:08,011][01098] Avg episode reward: [(0, '20.981')] [2023-02-22 14:12:13,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 2801664. Throughput: 0: 959.7. Samples: 700100. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 14:12:13,007][01098] Avg episode reward: [(0, '21.259')] [2023-02-22 14:12:18,001][01098] Fps is (10 sec: 2867.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2813952. Throughput: 0: 946.0. Samples: 704510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:12:18,003][01098] Avg episode reward: [(0, '21.958')] [2023-02-22 14:12:19,858][11397] Updated weights for policy 0, policy_version 690 (0.0015) [2023-02-22 14:12:23,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2838528. Throughput: 0: 971.6. Samples: 707896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:12:23,004][01098] Avg episode reward: [(0, '21.989')] [2023-02-22 14:12:28,005][01098] Fps is (10 sec: 4913.2, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 2863104. Throughput: 0: 988.4. Samples: 714888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:12:28,007][01098] Avg episode reward: [(0, '23.277')] [2023-02-22 14:12:28,975][11397] Updated weights for policy 0, policy_version 700 (0.0025) [2023-02-22 14:12:33,001][01098] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2875392. Throughput: 0: 938.3. Samples: 719824. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 14:12:33,009][01098] Avg episode reward: [(0, '24.329')] [2023-02-22 14:12:33,025][11383] Saving new best policy, reward=24.329! [2023-02-22 14:12:38,001][01098] Fps is (10 sec: 2868.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2891776. Throughput: 0: 932.3. Samples: 721942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:12:38,009][01098] Avg episode reward: [(0, '23.418')] [2023-02-22 14:12:41,279][11397] Updated weights for policy 0, policy_version 710 (0.0023) [2023-02-22 14:12:43,001][01098] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2912256. Throughput: 0: 961.6. Samples: 727806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:12:43,003][01098] Avg episode reward: [(0, '22.008')] [2023-02-22 14:12:48,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2936832. Throughput: 0: 970.1. Samples: 734806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:12:48,003][01098] Avg episode reward: [(0, '21.241')] [2023-02-22 14:12:51,264][11397] Updated weights for policy 0, policy_version 720 (0.0016) [2023-02-22 14:12:53,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2953216. Throughput: 0: 946.2. Samples: 737270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:12:53,005][01098] Avg episode reward: [(0, '21.166')] [2023-02-22 14:12:58,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2965504. Throughput: 0: 924.6. Samples: 741706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:12:58,005][01098] Avg episode reward: [(0, '21.124')] [2023-02-22 14:13:02,768][11397] Updated weights for policy 0, policy_version 730 (0.0019) [2023-02-22 14:13:03,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2990080. Throughput: 0: 965.3. Samples: 747950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:13:03,003][01098] Avg episode reward: [(0, '22.132')] [2023-02-22 14:13:03,017][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth... 
[2023-02-22 14:13:03,135][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000504_2064384.pth [2023-02-22 14:13:08,002][01098] Fps is (10 sec: 4095.5, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3006464. Throughput: 0: 943.6. Samples: 750360. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 14:13:08,007][01098] Avg episode reward: [(0, '22.016')] [2023-02-22 14:13:13,004][01098] Fps is (10 sec: 2866.3, 60 sec: 3617.9, 300 sec: 3790.5). Total num frames: 3018752. Throughput: 0: 878.2. Samples: 754404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:13:13,006][01098] Avg episode reward: [(0, '22.214')] [2023-02-22 14:13:17,744][11397] Updated weights for policy 0, policy_version 740 (0.0017) [2023-02-22 14:13:18,008][01098] Fps is (10 sec: 2456.2, 60 sec: 3617.7, 300 sec: 3762.7). Total num frames: 3031040. Throughput: 0: 854.3. Samples: 758274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:13:18,010][01098] Avg episode reward: [(0, '22.016')] [2023-02-22 14:13:23,001][01098] Fps is (10 sec: 2868.1, 60 sec: 3481.6, 300 sec: 3776.7). Total num frames: 3047424. Throughput: 0: 857.1. Samples: 760512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:13:23,013][01098] Avg episode reward: [(0, '21.076')] [2023-02-22 14:13:27,695][11397] Updated weights for policy 0, policy_version 750 (0.0020) [2023-02-22 14:13:28,001][01098] Fps is (10 sec: 4098.9, 60 sec: 3481.8, 300 sec: 3818.4). Total num frames: 3072000. Throughput: 0: 878.4. Samples: 767332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 14:13:28,003][01098] Avg episode reward: [(0, '20.544')] [2023-02-22 14:13:33,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3618.2, 300 sec: 3832.2). Total num frames: 3092480. Throughput: 0: 868.0. Samples: 773868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 14:13:33,005][01098] Avg episode reward: [(0, '22.428')] [2023-02-22 14:13:38,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 3108864. Throughput: 0: 863.2. Samples: 776114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 14:13:38,007][01098] Avg episode reward: [(0, '21.994')] [2023-02-22 14:13:39,368][11397] Updated weights for policy 0, policy_version 760 (0.0018) [2023-02-22 14:13:43,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 3125248. Throughput: 0: 869.4. Samples: 780830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 14:13:43,004][01098] Avg episode reward: [(0, '22.892')] [2023-02-22 14:13:48,001][01098] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3818.3). Total num frames: 3149824. Throughput: 0: 886.8. Samples: 787856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:13:48,003][01098] Avg episode reward: [(0, '23.291')] [2023-02-22 14:13:48,599][11397] Updated weights for policy 0, policy_version 770 (0.0032) [2023-02-22 14:13:53,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 3170304. Throughput: 0: 910.6. Samples: 791338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:13:53,004][01098] Avg episode reward: [(0, '21.969')] [2023-02-22 14:13:58,001][01098] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 3182592. Throughput: 0: 927.7. Samples: 796146. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 14:13:58,008][01098] Avg episode reward: [(0, '21.394')] [2023-02-22 14:14:00,695][11397] Updated weights for policy 0, policy_version 780 (0.0031) [2023-02-22 14:14:03,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 3203072. Throughput: 0: 953.5. Samples: 801174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:14:03,006][01098] Avg episode reward: [(0, '21.340')] [2023-02-22 14:14:08,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3686.5, 300 sec: 3832.2). Total num frames: 3227648. Throughput: 0: 981.0. Samples: 804658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:14:08,003][01098] Avg episode reward: [(0, '21.314')] [2023-02-22 14:14:09,562][11397] Updated weights for policy 0, policy_version 790 (0.0019) [2023-02-22 14:14:13,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3823.1, 300 sec: 3846.1). Total num frames: 3248128. Throughput: 0: 989.6. Samples: 811864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:14:13,003][01098] Avg episode reward: [(0, '21.970')] [2023-02-22 14:14:18,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3891.7, 300 sec: 3818.3). Total num frames: 3264512. Throughput: 0: 944.3. Samples: 816360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:14:18,002][01098] Avg episode reward: [(0, '22.880')] [2023-02-22 14:14:21,916][11397] Updated weights for policy 0, policy_version 800 (0.0012) [2023-02-22 14:14:23,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3280896. Throughput: 0: 944.7. Samples: 818626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:14:23,007][01098] Avg episode reward: [(0, '23.161')] [2023-02-22 14:14:28,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3305472. Throughput: 0: 987.6. Samples: 825270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:14:28,003][01098] Avg episode reward: [(0, '24.426')] [2023-02-22 14:14:28,005][11383] Saving new best policy, reward=24.426! [2023-02-22 14:14:30,528][11397] Updated weights for policy 0, policy_version 810 (0.0013) [2023-02-22 14:14:33,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3325952. Throughput: 0: 979.9. Samples: 831952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:14:33,003][01098] Avg episode reward: [(0, '24.389')] [2023-02-22 14:14:38,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3338240. Throughput: 0: 952.2. Samples: 834188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:14:38,005][01098] Avg episode reward: [(0, '24.761')] [2023-02-22 14:14:38,010][11383] Saving new best policy, reward=24.761! [2023-02-22 14:14:42,856][11397] Updated weights for policy 0, policy_version 820 (0.0035) [2023-02-22 14:14:43,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 3358720. Throughput: 0: 946.7. Samples: 838748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:14:43,009][01098] Avg episode reward: [(0, '24.866')] [2023-02-22 14:14:43,028][11383] Saving new best policy, reward=24.866! [2023-02-22 14:14:48,001][01098] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3383296. Throughput: 0: 991.5. Samples: 845790. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:14:48,004][01098] Avg episode reward: [(0, '23.259')] [2023-02-22 14:14:51,342][11397] Updated weights for policy 0, policy_version 830 (0.0012) [2023-02-22 14:14:53,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3403776. Throughput: 0: 994.7. Samples: 849418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:14:53,011][01098] Avg episode reward: [(0, '21.827')] [2023-02-22 14:14:58,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3420160. Throughput: 0: 946.9. Samples: 854474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:14:58,008][01098] Avg episode reward: [(0, '22.326')] [2023-02-22 14:15:03,001][01098] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3436544. Throughput: 0: 957.3. Samples: 859438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:15:03,011][01098] Avg episode reward: [(0, '21.586')] [2023-02-22 14:15:03,022][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000839_3436544.pth... [2023-02-22 14:15:03,146][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000618_2531328.pth [2023-02-22 14:15:03,666][11397] Updated weights for policy 0, policy_version 840 (0.0033) [2023-02-22 14:15:08,001][01098] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3461120. Throughput: 0: 984.2. Samples: 862914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:15:08,007][01098] Avg episode reward: [(0, '20.967')] [2023-02-22 14:15:12,805][11397] Updated weights for policy 0, policy_version 850 (0.0019) [2023-02-22 14:15:13,001][01098] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3481600. Throughput: 0: 994.0. Samples: 870000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:15:13,005][01098] Avg episode reward: [(0, '21.666')] [2023-02-22 14:15:18,006][01098] Fps is (10 sec: 3275.2, 60 sec: 3822.6, 300 sec: 3804.4). Total num frames: 3493888. Throughput: 0: 947.7. Samples: 874604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:15:18,008][01098] Avg episode reward: [(0, '21.591')] [2023-02-22 14:15:23,002][01098] Fps is (10 sec: 3276.4, 60 sec: 3891.1, 300 sec: 3790.6). Total num frames: 3514368. Throughput: 0: 948.0. Samples: 876848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:15:23,006][01098] Avg episode reward: [(0, '20.122')] [2023-02-22 14:15:24,714][11397] Updated weights for policy 0, policy_version 860 (0.0015) [2023-02-22 14:15:28,001][01098] Fps is (10 sec: 4098.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3534848. Throughput: 0: 994.9. Samples: 883518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:15:28,003][01098] Avg episode reward: [(0, '20.305')] [2023-02-22 14:15:33,001][01098] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3559424. Throughput: 0: 990.7. Samples: 890372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:15:33,005][01098] Avg episode reward: [(0, '21.408')] [2023-02-22 14:15:33,867][11397] Updated weights for policy 0, policy_version 870 (0.0012) [2023-02-22 14:15:38,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3575808. Throughput: 0: 960.7. Samples: 892650. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:15:38,011][01098] Avg episode reward: [(0, '20.903')] [2023-02-22 14:15:43,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3592192. Throughput: 0: 951.2. Samples: 897276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:15:43,009][01098] Avg episode reward: [(0, '20.741')] [2023-02-22 14:15:45,352][11397] Updated weights for policy 0, policy_version 880 (0.0012) [2023-02-22 14:15:48,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3616768. Throughput: 0: 996.9. Samples: 904298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:15:48,004][01098] Avg episode reward: [(0, '21.859')] [2023-02-22 14:15:53,006][01098] Fps is (10 sec: 4503.3, 60 sec: 3890.9, 300 sec: 3832.1). Total num frames: 3637248. Throughput: 0: 998.8. Samples: 907864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:15:53,009][01098] Avg episode reward: [(0, '24.187')] [2023-02-22 14:15:54,951][11397] Updated weights for policy 0, policy_version 890 (0.0017) [2023-02-22 14:15:58,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3653632. Throughput: 0: 955.0. Samples: 912974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-22 14:15:58,003][01098] Avg episode reward: [(0, '25.083')] [2023-02-22 14:15:58,009][11383] Saving new best policy, reward=25.083! [2023-02-22 14:16:03,001][01098] Fps is (10 sec: 3278.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3670016. Throughput: 0: 960.3. Samples: 917814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:16:03,004][01098] Avg episode reward: [(0, '24.453')] [2023-02-22 14:16:06,248][11397] Updated weights for policy 0, policy_version 900 (0.0030) [2023-02-22 14:16:08,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3694592. Throughput: 0: 987.7. Samples: 921292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:16:08,004][01098] Avg episode reward: [(0, '25.340')] [2023-02-22 14:16:08,012][11383] Saving new best policy, reward=25.340! [2023-02-22 14:16:13,001][01098] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3715072. Throughput: 0: 997.1. Samples: 928388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:16:13,005][01098] Avg episode reward: [(0, '25.338')] [2023-02-22 14:16:16,745][11397] Updated weights for policy 0, policy_version 910 (0.0017) [2023-02-22 14:16:18,005][01098] Fps is (10 sec: 3275.5, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 3727360. Throughput: 0: 947.8. Samples: 933026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:16:18,008][01098] Avg episode reward: [(0, '24.691')] [2023-02-22 14:16:23,001][01098] Fps is (10 sec: 3276.7, 60 sec: 3891.3, 300 sec: 3790.5). Total num frames: 3747840. Throughput: 0: 946.7. Samples: 935252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:16:23,003][01098] Avg episode reward: [(0, '23.504')] [2023-02-22 14:16:27,303][11397] Updated weights for policy 0, policy_version 920 (0.0018) [2023-02-22 14:16:28,001][01098] Fps is (10 sec: 4097.7, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3768320. Throughput: 0: 988.4. Samples: 941752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:16:28,003][01098] Avg episode reward: [(0, '22.681')] [2023-02-22 14:16:33,001][01098] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3788800. 
Throughput: 0: 982.7. Samples: 948518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:16:33,006][01098] Avg episode reward: [(0, '23.198')] [2023-02-22 14:16:38,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3805184. Throughput: 0: 954.1. Samples: 950792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:16:38,003][01098] Avg episode reward: [(0, '24.277')] [2023-02-22 14:16:38,314][11397] Updated weights for policy 0, policy_version 930 (0.0012) [2023-02-22 14:16:43,001][01098] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3821568. Throughput: 0: 940.8. Samples: 955312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:16:43,006][01098] Avg episode reward: [(0, '23.972')] [2023-02-22 14:16:48,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3846144. Throughput: 0: 990.0. Samples: 962364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-22 14:16:48,006][01098] Avg episode reward: [(0, '24.601')] [2023-02-22 14:16:48,146][11397] Updated weights for policy 0, policy_version 940 (0.0017) [2023-02-22 14:16:53,001][01098] Fps is (10 sec: 4915.2, 60 sec: 3891.5, 300 sec: 3832.2). Total num frames: 3870720. Throughput: 0: 994.3. Samples: 966034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:16:53,004][01098] Avg episode reward: [(0, '24.666')] [2023-02-22 14:16:58,001][01098] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3883008. Throughput: 0: 944.5. Samples: 970890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:16:58,007][01098] Avg episode reward: [(0, '25.127')] [2023-02-22 14:16:59,987][11397] Updated weights for policy 0, policy_version 950 (0.0019) [2023-02-22 14:17:03,001][01098] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3899392. Throughput: 0: 948.8. Samples: 975718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 14:17:03,008][01098] Avg episode reward: [(0, '25.991')] [2023-02-22 14:17:03,023][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000952_3899392.pth... [2023-02-22 14:17:03,185][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000730_2990080.pth [2023-02-22 14:17:03,207][11383] Saving new best policy, reward=25.991! [2023-02-22 14:17:08,001][01098] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3923968. Throughput: 0: 973.1. Samples: 979042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:17:08,003][01098] Avg episode reward: [(0, '25.405')] [2023-02-22 14:17:09,415][11397] Updated weights for policy 0, policy_version 960 (0.0012) [2023-02-22 14:17:13,002][01098] Fps is (10 sec: 4505.1, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3944448. Throughput: 0: 987.9. Samples: 986210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:17:13,009][01098] Avg episode reward: [(0, '25.576')] [2023-02-22 14:17:18,006][01098] Fps is (10 sec: 3684.6, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 3960832. Throughput: 0: 939.6. Samples: 990804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 14:17:18,008][01098] Avg episode reward: [(0, '24.602')] [2023-02-22 14:17:21,474][11397] Updated weights for policy 0, policy_version 970 (0.0014) [2023-02-22 14:17:23,001][01098] Fps is (10 sec: 3277.1, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3977216. Throughput: 0: 939.2. Samples: 993058. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-22 14:17:23,003][01098] Avg episode reward: [(0, '26.101')] [2023-02-22 14:17:23,017][11383] Saving new best policy, reward=26.101! [2023-02-22 14:17:28,001][01098] Fps is (10 sec: 4098.1, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 4001792. Throughput: 0: 985.3. Samples: 999652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 14:17:28,005][01098] Avg episode reward: [(0, '27.789')] [2023-02-22 14:17:28,015][11383] Saving new best policy, reward=27.789! [2023-02-22 14:17:28,745][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 14:17:28,747][11383] Stopping Batcher_0... [2023-02-22 14:17:28,759][11383] Loop batcher_evt_loop terminating... [2023-02-22 14:17:28,778][01098] Component Batcher_0 stopped! [2023-02-22 14:17:28,805][01098] Component RolloutWorker_w1 stopped! [2023-02-22 14:17:28,811][11399] Stopping RolloutWorker_w1... [2023-02-22 14:17:28,811][11399] Loop rollout_proc1_evt_loop terminating... [2023-02-22 14:17:28,825][01098] Component RolloutWorker_w6 stopped! [2023-02-22 14:17:28,827][01098] Component RolloutWorker_w5 stopped! [2023-02-22 14:17:28,829][11404] Stopping RolloutWorker_w6... [2023-02-22 14:17:28,830][01098] Component RolloutWorker_w2 stopped! [2023-02-22 14:17:28,832][11400] Stopping RolloutWorker_w2... [2023-02-22 14:17:28,825][11403] Stopping RolloutWorker_w5... [2023-02-22 14:17:28,833][11403] Loop rollout_proc5_evt_loop terminating... [2023-02-22 14:17:28,836][11400] Loop rollout_proc2_evt_loop terminating... [2023-02-22 14:17:28,842][11402] Stopping RolloutWorker_w4... [2023-02-22 14:17:28,842][11402] Loop rollout_proc4_evt_loop terminating... [2023-02-22 14:17:28,841][01098] Component RolloutWorker_w4 stopped! [2023-02-22 14:17:28,823][11397] Weights refcount: 2 0 [2023-02-22 14:17:28,849][11404] Loop rollout_proc6_evt_loop terminating... [2023-02-22 14:17:28,852][11397] Stopping InferenceWorker_p0-w0... [2023-02-22 14:17:28,852][11397] Loop inference_proc0-0_evt_loop terminating... [2023-02-22 14:17:28,855][11401] Stopping RolloutWorker_w3... [2023-02-22 14:17:28,856][11401] Loop rollout_proc3_evt_loop terminating... [2023-02-22 14:17:28,852][01098] Component InferenceWorker_p0-w0 stopped! [2023-02-22 14:17:28,857][01098] Component RolloutWorker_w3 stopped! [2023-02-22 14:17:28,864][01098] Component RolloutWorker_w0 stopped! [2023-02-22 14:17:28,869][11398] Stopping RolloutWorker_w0... [2023-02-22 14:17:28,870][11398] Loop rollout_proc0_evt_loop terminating... [2023-02-22 14:17:28,868][01098] Component RolloutWorker_w7 stopped! [2023-02-22 14:17:28,869][11405] Stopping RolloutWorker_w7... [2023-02-22 14:17:28,882][11405] Loop rollout_proc7_evt_loop terminating... [2023-02-22 14:17:28,979][11383] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000839_3436544.pth [2023-02-22 14:17:28,999][11383] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 14:17:29,177][01098] Component LearnerWorker_p0 stopped! [2023-02-22 14:17:29,184][01098] Waiting for process learner_proc0 to stop... [2023-02-22 14:17:29,191][11383] Stopping LearnerWorker_p0... [2023-02-22 14:17:29,192][11383] Loop learner_proc0_evt_loop terminating... [2023-02-22 14:17:31,084][01098] Waiting for process inference_proc0-0 to join... [2023-02-22 14:17:31,393][01098] Waiting for process rollout_proc0 to join... 
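The shutdown above is orderly: the batcher stops first, each rollout and inference worker terminates its event loop, and the runner joins the remaining processes before printing its profile summaries. For reference, a run like the one logged here can be launched with sample-factory's VizDoom example trainer; a minimal sketch, not the exact command used for this experiment (the module path and flags are assumptions inferred from the log: 8 rollout workers, a ~4M-frame budget, and the default_experiment directory):

```python
# Launch a comparable training run via sample-factory's VizDoom example
# entry point. Requires sample-factory with the VizDoom extras installed.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "sf_examples.vizdoom.train_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--train_for_env_steps=4000000",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ],
    check=True,
)
```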
[2023-02-22 14:17:31,744][01098] Waiting for process rollout_proc1 to join... [2023-02-22 14:17:31,748][01098] Waiting for process rollout_proc2 to join... [2023-02-22 14:17:31,751][01098] Waiting for process rollout_proc3 to join... [2023-02-22 14:17:31,756][01098] Waiting for process rollout_proc4 to join... [2023-02-22 14:17:31,757][01098] Waiting for process rollout_proc5 to join... [2023-02-22 14:17:31,758][01098] Waiting for process rollout_proc6 to join... [2023-02-22 14:17:31,764][01098] Waiting for process rollout_proc7 to join... [2023-02-22 14:17:31,769][01098] Batcher 0 profile tree view: batching: 26.0837, releasing_batches: 0.0247 [2023-02-22 14:17:31,776][01098] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 513.7760 update_model: 7.6910 weight_update: 0.0018 one_step: 0.0231 handle_policy_step: 512.5229 deserialize: 15.0519, stack: 2.9921, obs_to_device_normalize: 113.6286, forward: 246.2152, send_messages: 26.5018 prepare_outputs: 82.1053 to_cpu: 50.8105 [2023-02-22 14:17:31,781][01098] Learner 0 profile tree view: misc: 0.0059, prepare_batch: 17.3735 train: 75.5126 epoch_init: 0.0107, minibatch_init: 0.0101, losses_postprocess: 0.6237, kl_divergence: 0.6037, after_optimizer: 32.7369 calculate_losses: 26.4190 losses_init: 0.0035, forward_head: 1.7061, bptt_initial: 17.3999, tail: 1.1866, advantages_returns: 0.2720, losses: 3.4053 bptt: 2.1120 bptt_forward_core: 2.0494 update: 14.5101 clip: 1.4633 [2023-02-22 14:17:31,783][01098] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2678, enqueue_policy_requests: 129.2975, env_step: 819.5471, overhead: 19.4131, complete_rollouts: 6.9613 save_policy_outputs: 20.1775 split_output_tensors: 9.9183 [2023-02-22 14:17:31,786][01098] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3145, enqueue_policy_requests: 134.5494, env_step: 812.6344, overhead: 19.3210, complete_rollouts: 7.0076 save_policy_outputs: 20.5431 split_output_tensors: 9.7633 [2023-02-22 14:17:31,787][01098] Loop Runner_EvtLoop terminating... [2023-02-22 14:17:31,789][01098] Runner profile tree view: main_loop: 1104.1999 [2023-02-22 14:17:31,791][01098] Collected {0: 4005888}, FPS: 3627.9 [2023-02-22 14:17:31,914][01098] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-22 14:17:31,916][01098] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-22 14:17:31,918][01098] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-22 14:17:31,920][01098] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-22 14:17:31,922][01098] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 14:17:31,925][01098] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-22 14:17:31,927][01098] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 14:17:31,928][01098] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-22 14:17:31,930][01098] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-22 14:17:31,931][01098] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-22 14:17:31,932][01098] Adding new argument 'policy_index'=0 that is not in the saved config file! 
[2023-02-22 14:17:31,933][01098] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-22 14:17:31,935][01098] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-22 14:17:31,936][01098] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-22 14:17:31,938][01098] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-22 14:17:31,967][01098] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 14:17:31,972][01098] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 14:17:31,975][01098] RunningMeanStd input shape: (1,) [2023-02-22 14:17:31,993][01098] ConvEncoder: input_channels=3 [2023-02-22 14:17:32,776][01098] Conv encoder output size: 512 [2023-02-22 14:17:32,783][01098] Policy head output size: 512 [2023-02-22 14:17:35,908][01098] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 14:17:37,209][01098] Num frames 100... [2023-02-22 14:17:37,321][01098] Num frames 200... [2023-02-22 14:17:37,430][01098] Num frames 300... [2023-02-22 14:17:37,544][01098] Num frames 400... [2023-02-22 14:17:37,656][01098] Num frames 500... [2023-02-22 14:17:37,773][01098] Num frames 600... [2023-02-22 14:17:37,888][01098] Num frames 700... [2023-02-22 14:17:37,983][01098] Avg episode rewards: #0: 15.360, true rewards: #0: 7.360 [2023-02-22 14:17:37,985][01098] Avg episode reward: 15.360, avg true_objective: 7.360 [2023-02-22 14:17:38,061][01098] Num frames 800... [2023-02-22 14:17:38,181][01098] Num frames 900... [2023-02-22 14:17:38,301][01098] Num frames 1000... [2023-02-22 14:17:38,413][01098] Num frames 1100... [2023-02-22 14:17:38,527][01098] Num frames 1200... [2023-02-22 14:17:38,641][01098] Num frames 1300... [2023-02-22 14:17:38,773][01098] Num frames 1400... [2023-02-22 14:17:38,839][01098] Avg episode rewards: #0: 14.535, true rewards: #0: 7.035 [2023-02-22 14:17:38,840][01098] Avg episode reward: 14.535, avg true_objective: 7.035 [2023-02-22 14:17:38,949][01098] Num frames 1500... [2023-02-22 14:17:39,063][01098] Num frames 1600... [2023-02-22 14:17:39,182][01098] Num frames 1700... [2023-02-22 14:17:39,301][01098] Num frames 1800... [2023-02-22 14:17:39,418][01098] Num frames 1900... [2023-02-22 14:17:39,532][01098] Num frames 2000... [2023-02-22 14:17:39,644][01098] Num frames 2100... [2023-02-22 14:17:39,762][01098] Num frames 2200... [2023-02-22 14:17:39,828][01098] Avg episode rewards: #0: 15.357, true rewards: #0: 7.357 [2023-02-22 14:17:39,829][01098] Avg episode reward: 15.357, avg true_objective: 7.357 [2023-02-22 14:17:39,945][01098] Num frames 2300... [2023-02-22 14:17:40,061][01098] Num frames 2400... [2023-02-22 14:17:40,181][01098] Num frames 2500... [2023-02-22 14:17:40,299][01098] Num frames 2600... [2023-02-22 14:17:40,412][01098] Num frames 2700... [2023-02-22 14:17:40,536][01098] Num frames 2800... [2023-02-22 14:17:40,649][01098] Num frames 2900... [2023-02-22 14:17:40,765][01098] Num frames 3000... [2023-02-22 14:17:40,877][01098] Num frames 3100... [2023-02-22 14:17:40,996][01098] Num frames 3200... [2023-02-22 14:17:41,105][01098] Num frames 3300... [2023-02-22 14:17:41,269][01098] Avg episode rewards: #0: 17.728, true rewards: #0: 8.477 [2023-02-22 14:17:41,271][01098] Avg episode reward: 17.728, avg true_objective: 8.477 [2023-02-22 14:17:41,284][01098] Num frames 3400... [2023-02-22 14:17:41,396][01098] Num frames 3500... 
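In the evaluation block above, each "Num frames" line counts frames within an episode, and the running "Avg episode rewards" line reports both the shaped reward and the true (unshaped) objective. This pass corresponds to sample-factory's enjoy entry point with the overridden arguments listed in the log; a minimal sketch, assuming the sf_examples VizDoom enjoy module:

```python
# Re-run the 10-episode evaluation with video saving, mirroring the
# "Adding new argument" overrides in the log. The module path is an assumption.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "sf_examples.vizdoom.enjoy_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ],
    check=True,
)
```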
[2023-02-22 14:17:41,505][01098] Num frames 3600... [2023-02-22 14:17:41,627][01098] Num frames 3700... [2023-02-22 14:17:41,743][01098] Num frames 3800... [2023-02-22 14:17:41,860][01098] Num frames 3900... [2023-02-22 14:17:41,974][01098] Num frames 4000... [2023-02-22 14:17:42,084][01098] Num frames 4100... [2023-02-22 14:17:42,204][01098] Num frames 4200... [2023-02-22 14:17:42,318][01098] Num frames 4300... [2023-02-22 14:17:42,429][01098] Num frames 4400... [2023-02-22 14:17:42,548][01098] Num frames 4500... [2023-02-22 14:17:42,659][01098] Num frames 4600... [2023-02-22 14:17:42,775][01098] Num frames 4700... [2023-02-22 14:17:42,894][01098] Num frames 4800... [2023-02-22 14:17:43,016][01098] Num frames 4900... [2023-02-22 14:17:43,129][01098] Num frames 5000... [2023-02-22 14:17:43,254][01098] Num frames 5100... [2023-02-22 14:17:43,374][01098] Num frames 5200... [2023-02-22 14:17:43,488][01098] Num frames 5300... [2023-02-22 14:17:43,648][01098] Avg episode rewards: #0: 25.772, true rewards: #0: 10.772 [2023-02-22 14:17:43,649][01098] Avg episode reward: 25.772, avg true_objective: 10.772 [2023-02-22 14:17:43,670][01098] Num frames 5400... [2023-02-22 14:17:43,782][01098] Num frames 5500... [2023-02-22 14:17:43,898][01098] Num frames 5600... [2023-02-22 14:17:44,013][01098] Num frames 5700... [2023-02-22 14:17:44,121][01098] Num frames 5800... [2023-02-22 14:17:44,238][01098] Num frames 5900... [2023-02-22 14:17:44,395][01098] Avg episode rewards: #0: 23.323, true rewards: #0: 9.990 [2023-02-22 14:17:44,399][01098] Avg episode reward: 23.323, avg true_objective: 9.990 [2023-02-22 14:17:44,408][01098] Num frames 6000... [2023-02-22 14:17:44,520][01098] Num frames 6100... [2023-02-22 14:17:44,630][01098] Num frames 6200... [2023-02-22 14:17:44,740][01098] Num frames 6300... [2023-02-22 14:17:44,850][01098] Num frames 6400... [2023-02-22 14:17:44,922][01098] Avg episode rewards: #0: 20.871, true rewards: #0: 9.157 [2023-02-22 14:17:44,924][01098] Avg episode reward: 20.871, avg true_objective: 9.157 [2023-02-22 14:17:45,024][01098] Num frames 6500... [2023-02-22 14:17:45,133][01098] Num frames 6600... [2023-02-22 14:17:45,251][01098] Num frames 6700... [2023-02-22 14:17:45,360][01098] Num frames 6800... [2023-02-22 14:17:45,471][01098] Num frames 6900... [2023-02-22 14:17:45,608][01098] Num frames 7000... [2023-02-22 14:17:45,768][01098] Num frames 7100... [2023-02-22 14:17:45,929][01098] Num frames 7200... [2023-02-22 14:17:46,087][01098] Num frames 7300... [2023-02-22 14:17:46,254][01098] Num frames 7400... [2023-02-22 14:17:46,405][01098] Num frames 7500... [2023-02-22 14:17:46,511][01098] Avg episode rewards: #0: 21.287, true rewards: #0: 9.412 [2023-02-22 14:17:46,514][01098] Avg episode reward: 21.287, avg true_objective: 9.412 [2023-02-22 14:17:46,635][01098] Num frames 7600... [2023-02-22 14:17:46,791][01098] Num frames 7700... [2023-02-22 14:17:46,947][01098] Num frames 7800... [2023-02-22 14:17:47,094][01098] Num frames 7900... [2023-02-22 14:17:47,251][01098] Num frames 8000... [2023-02-22 14:17:47,408][01098] Num frames 8100... [2023-02-22 14:17:47,561][01098] Num frames 8200... [2023-02-22 14:17:47,714][01098] Num frames 8300... [2023-02-22 14:17:47,871][01098] Avg episode rewards: #0: 20.958, true rewards: #0: 9.291 [2023-02-22 14:17:47,873][01098] Avg episode reward: 20.958, avg true_objective: 9.291 [2023-02-22 14:17:47,935][01098] Num frames 8400... [2023-02-22 14:17:48,090][01098] Num frames 8500... [2023-02-22 14:17:48,247][01098] Num frames 8600... 
[2023-02-22 14:17:48,407][01098] Num frames 8700... [2023-02-22 14:17:48,567][01098] Num frames 8800... [2023-02-22 14:17:48,731][01098] Num frames 8900... [2023-02-22 14:17:48,903][01098] Num frames 9000... [2023-02-22 14:17:49,066][01098] Num frames 9100... [2023-02-22 14:17:49,185][01098] Num frames 9200... [2023-02-22 14:17:49,302][01098] Num frames 9300... [2023-02-22 14:17:49,384][01098] Avg episode rewards: #0: 21.122, true rewards: #0: 9.322 [2023-02-22 14:17:49,387][01098] Avg episode reward: 21.122, avg true_objective: 9.322 [2023-02-22 14:18:48,330][01098] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-22 14:19:13,306][01098] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-22 14:19:13,309][01098] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-22 14:19:13,311][01098] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-22 14:19:13,315][01098] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-22 14:19:13,317][01098] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 14:19:13,318][01098] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-22 14:19:13,320][01098] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-22 14:19:13,322][01098] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-22 14:19:13,325][01098] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-22 14:19:13,326][01098] Adding new argument 'hf_repository'='Bill010602/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-02-22 14:19:13,327][01098] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-22 14:19:13,328][01098] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-22 14:19:13,332][01098] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-22 14:19:13,333][01098] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-22 14:19:13,335][01098] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-22 14:19:13,366][01098] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 14:19:13,369][01098] RunningMeanStd input shape: (1,) [2023-02-22 14:19:13,389][01098] ConvEncoder: input_channels=3 [2023-02-22 14:19:13,449][01098] Conv encoder output size: 512 [2023-02-22 14:19:13,452][01098] Policy head output size: 512 [2023-02-22 14:19:13,483][01098] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 14:19:14,128][01098] Num frames 100... [2023-02-22 14:19:14,290][01098] Num frames 200... [2023-02-22 14:19:14,454][01098] Num frames 300... [2023-02-22 14:19:14,613][01098] Num frames 400... [2023-02-22 14:19:14,773][01098] Num frames 500... [2023-02-22 14:19:14,932][01098] Num frames 600... [2023-02-22 14:19:15,087][01098] Num frames 700... [2023-02-22 14:19:15,240][01098] Num frames 800... [2023-02-22 14:19:15,362][01098] Num frames 900... [2023-02-22 14:19:15,482][01098] Num frames 1000... [2023-02-22 14:19:15,597][01098] Num frames 1100... [2023-02-22 14:19:15,711][01098] Num frames 1200... 
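The second evaluation pass (configuration reloaded at 14:19:13 above) repeats the same loop with push_to_hub=True and an hf_repository target, so once its episodes and replay video finish, the checkpoint, config, and video are uploaded to the Hugging Face Hub. A sketch of the corresponding invocation; the push-related flags are taken directly from the log, while the module path is the same assumption as above:

```python
# Evaluate and push the trained policy to the Hub; the repository name is
# copied from the log.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "sf_examples.vizdoom.enjoy_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=Bill010602/rl_course_vizdoom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ],
    check=True,
)
```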
[2023-02-22 14:19:15,800][01098] Avg episode rewards: #0: 26.220, true rewards: #0: 12.220 [2023-02-22 14:19:15,801][01098] Avg episode reward: 26.220, avg true_objective: 12.220 [2023-02-22 14:19:15,892][01098] Num frames 1300... [2023-02-22 14:19:16,005][01098] Num frames 1400... [2023-02-22 14:19:16,121][01098] Num frames 1500... [2023-02-22 14:19:16,244][01098] Num frames 1600... [2023-02-22 14:19:16,365][01098] Num frames 1700... [2023-02-22 14:19:16,482][01098] Num frames 1800... [2023-02-22 14:19:16,596][01098] Num frames 1900... [2023-02-22 14:19:16,706][01098] Num frames 2000... [2023-02-22 14:19:16,819][01098] Num frames 2100... [2023-02-22 14:19:16,931][01098] Num frames 2200... [2023-02-22 14:19:17,046][01098] Num frames 2300... [2023-02-22 14:19:17,116][01098] Avg episode rewards: #0: 23.550, true rewards: #0: 11.550 [2023-02-22 14:19:17,118][01098] Avg episode reward: 23.550, avg true_objective: 11.550 [2023-02-22 14:19:17,222][01098] Num frames 2400... [2023-02-22 14:19:17,335][01098] Num frames 2500... [2023-02-22 14:19:17,452][01098] Num frames 2600... [2023-02-22 14:19:17,564][01098] Num frames 2700... [2023-02-22 14:19:17,682][01098] Num frames 2800... [2023-02-22 14:19:17,797][01098] Num frames 2900... [2023-02-22 14:19:17,909][01098] Num frames 3000... [2023-02-22 14:19:18,033][01098] Num frames 3100... [2023-02-22 14:19:18,145][01098] Num frames 3200... [2023-02-22 14:19:18,259][01098] Num frames 3300... [2023-02-22 14:19:18,371][01098] Num frames 3400... [2023-02-22 14:19:18,488][01098] Num frames 3500... [2023-02-22 14:19:18,627][01098] Avg episode rewards: #0: 26.557, true rewards: #0: 11.890 [2023-02-22 14:19:18,628][01098] Avg episode reward: 26.557, avg true_objective: 11.890 [2023-02-22 14:19:18,667][01098] Num frames 3600... [2023-02-22 14:19:18,776][01098] Num frames 3700... [2023-02-22 14:19:18,896][01098] Num frames 3800... [2023-02-22 14:19:19,013][01098] Num frames 3900... [2023-02-22 14:19:19,134][01098] Num frames 4000... [2023-02-22 14:19:19,248][01098] Num frames 4100... [2023-02-22 14:19:19,364][01098] Num frames 4200... [2023-02-22 14:19:19,484][01098] Num frames 4300... [2023-02-22 14:19:19,615][01098] Num frames 4400... [2023-02-22 14:19:19,729][01098] Num frames 4500... [2023-02-22 14:19:19,851][01098] Num frames 4600... [2023-02-22 14:19:19,965][01098] Num frames 4700... [2023-02-22 14:19:20,088][01098] Num frames 4800... [2023-02-22 14:19:20,198][01098] Num frames 4900... [2023-02-22 14:19:20,318][01098] Num frames 5000... [2023-02-22 14:19:20,433][01098] Num frames 5100... [2023-02-22 14:19:20,559][01098] Num frames 5200... [2023-02-22 14:19:20,674][01098] Num frames 5300... [2023-02-22 14:19:20,810][01098] Num frames 5400... [2023-02-22 14:19:20,929][01098] Num frames 5500... [2023-02-22 14:19:21,044][01098] Num frames 5600... [2023-02-22 14:19:21,173][01098] Avg episode rewards: #0: 35.167, true rewards: #0: 14.167 [2023-02-22 14:19:21,175][01098] Avg episode reward: 35.167, avg true_objective: 14.167 [2023-02-22 14:19:21,222][01098] Num frames 5700... [2023-02-22 14:19:21,335][01098] Num frames 5800... [2023-02-22 14:19:21,446][01098] Num frames 5900... [2023-02-22 14:19:21,567][01098] Num frames 6000... [2023-02-22 14:19:21,679][01098] Num frames 6100... [2023-02-22 14:19:21,793][01098] Num frames 6200... [2023-02-22 14:19:21,915][01098] Num frames 6300... [2023-02-22 14:19:22,028][01098] Num frames 6400... 
[2023-02-22 14:19:22,157][01098] Avg episode rewards: #0: 30.934, true rewards: #0: 12.934 [2023-02-22 14:19:22,159][01098] Avg episode reward: 30.934, avg true_objective: 12.934 [2023-02-22 14:19:22,199][01098] Num frames 6500... [2023-02-22 14:19:22,320][01098] Num frames 6600... [2023-02-22 14:19:22,437][01098] Num frames 6700... [2023-02-22 14:19:22,556][01098] Num frames 6800... [2023-02-22 14:19:22,667][01098] Num frames 6900... [2023-02-22 14:19:22,784][01098] Num frames 7000... [2023-02-22 14:19:22,898][01098] Num frames 7100... [2023-02-22 14:19:23,013][01098] Num frames 7200... [2023-02-22 14:19:23,125][01098] Num frames 7300... [2023-02-22 14:19:23,244][01098] Num frames 7400... [2023-02-22 14:19:23,354][01098] Num frames 7500... [2023-02-22 14:19:23,464][01098] Num frames 7600... [2023-02-22 14:19:23,585][01098] Num frames 7700... [2023-02-22 14:19:23,696][01098] Num frames 7800... [2023-02-22 14:19:23,810][01098] Num frames 7900... [2023-02-22 14:19:23,927][01098] Num frames 8000... [2023-02-22 14:19:24,041][01098] Num frames 8100... [2023-02-22 14:19:24,155][01098] Num frames 8200... [2023-02-22 14:19:24,274][01098] Num frames 8300... [2023-02-22 14:19:24,448][01098] Avg episode rewards: #0: 34.163, true rewards: #0: 13.997 [2023-02-22 14:19:24,449][01098] Avg episode reward: 34.163, avg true_objective: 13.997 [2023-02-22 14:19:24,457][01098] Num frames 8400... [2023-02-22 14:19:24,574][01098] Num frames 8500... [2023-02-22 14:19:24,686][01098] Num frames 8600... [2023-02-22 14:19:24,804][01098] Num frames 8700... [2023-02-22 14:19:24,926][01098] Num frames 8800... [2023-02-22 14:19:25,037][01098] Num frames 8900... [2023-02-22 14:19:25,148][01098] Num frames 9000... [2023-02-22 14:19:25,305][01098] Num frames 9100... [2023-02-22 14:19:25,471][01098] Num frames 9200... [2023-02-22 14:19:25,633][01098] Num frames 9300... [2023-02-22 14:19:25,793][01098] Num frames 9400... [2023-02-22 14:19:25,949][01098] Num frames 9500... [2023-02-22 14:19:26,120][01098] Num frames 9600... [2023-02-22 14:19:26,210][01098] Avg episode rewards: #0: 33.028, true rewards: #0: 13.743 [2023-02-22 14:19:26,213][01098] Avg episode reward: 33.028, avg true_objective: 13.743 [2023-02-22 14:19:26,347][01098] Num frames 9700... [2023-02-22 14:19:26,515][01098] Num frames 9800... [2023-02-22 14:19:26,677][01098] Num frames 9900... [2023-02-22 14:19:26,835][01098] Num frames 10000... [2023-02-22 14:19:27,002][01098] Num frames 10100... [2023-02-22 14:19:27,169][01098] Num frames 10200... [2023-02-22 14:19:27,337][01098] Num frames 10300... [2023-02-22 14:19:27,502][01098] Num frames 10400... [2023-02-22 14:19:27,663][01098] Num frames 10500... [2023-02-22 14:19:27,842][01098] Num frames 10600... [2023-02-22 14:19:28,011][01098] Num frames 10700... [2023-02-22 14:19:28,180][01098] Num frames 10800... [2023-02-22 14:19:28,352][01098] Num frames 10900... [2023-02-22 14:19:28,513][01098] Num frames 11000... [2023-02-22 14:19:28,673][01098] Num frames 11100... [2023-02-22 14:19:28,831][01098] Num frames 11200... [2023-02-22 14:19:28,960][01098] Num frames 11300... [2023-02-22 14:19:29,076][01098] Num frames 11400... [2023-02-22 14:19:29,159][01098] Avg episode rewards: #0: 35.277, true rewards: #0: 14.277 [2023-02-22 14:19:29,160][01098] Avg episode reward: 35.277, avg true_objective: 14.277 [2023-02-22 14:19:29,264][01098] Num frames 11500... [2023-02-22 14:19:29,379][01098] Num frames 11600... [2023-02-22 14:19:29,493][01098] Num frames 11700... [2023-02-22 14:19:29,608][01098] Num frames 11800... 
[2023-02-22 14:19:29,707][01098] Avg episode rewards: #0: 32.264, true rewards: #0: 13.153 [2023-02-22 14:19:29,709][01098] Avg episode reward: 32.264, avg true_objective: 13.153 [2023-02-22 14:19:29,790][01098] Num frames 11900... [2023-02-22 14:19:29,904][01098] Num frames 12000... [2023-02-22 14:19:30,033][01098] Num frames 12100... [2023-02-22 14:19:30,146][01098] Num frames 12200... [2023-02-22 14:19:30,270][01098] Num frames 12300... [2023-02-22 14:19:30,385][01098] Num frames 12400... [2023-02-22 14:19:30,498][01098] Num frames 12500... [2023-02-22 14:19:30,617][01098] Num frames 12600... [2023-02-22 14:19:30,737][01098] Num frames 12700... [2023-02-22 14:19:30,868][01098] Num frames 12800... [2023-02-22 14:19:30,987][01098] Num frames 12900... [2023-02-22 14:19:31,102][01098] Num frames 13000... [2023-02-22 14:19:31,217][01098] Num frames 13100... [2023-02-22 14:19:31,333][01098] Num frames 13200... [2023-02-22 14:19:31,453][01098] Num frames 13300... [2023-02-22 14:19:31,572][01098] Num frames 13400... [2023-02-22 14:19:31,692][01098] Num frames 13500... [2023-02-22 14:19:31,818][01098] Num frames 13600... [2023-02-22 14:19:31,932][01098] Num frames 13700... [2023-02-22 14:19:32,050][01098] Num frames 13800... [2023-02-22 14:19:32,169][01098] Num frames 13900... [2023-02-22 14:19:32,271][01098] Avg episode rewards: #0: 35.138, true rewards: #0: 13.938 [2023-02-22 14:19:32,273][01098] Avg episode reward: 35.138, avg true_objective: 13.938 [2023-02-22 14:20:57,434][01098] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
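With the final "Replay video saved" line the run is complete, and the artifacts live both under /content/train_dir and in the Hub repository. To reproduce or continue the experiment elsewhere, the pushed files can be pulled back down; a minimal sketch, assuming sample-factory's Hugging Face helper module:

```python
# Download the pushed experiment (checkpoints, config, replay video) from the
# Hub into a local train_dir. The helper module name and flags are assumptions
# based on sample-factory's Hugging Face integration.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "sample_factory.huggingface.load_from_hub",
        "-r", "Bill010602/rl_course_vizdoom_health_gathering_supreme",
        "-d", "./train_dir",
    ],
    check=True,
)
```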