[2023-02-25 13:42:27,039][09251] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-25 13:42:27,043][09251] Rollout worker 0 uses device cpu
[2023-02-25 13:42:27,044][09251] Rollout worker 1 uses device cpu
[2023-02-25 13:42:27,046][09251] Rollout worker 2 uses device cpu
[2023-02-25 13:42:27,049][09251] Rollout worker 3 uses device cpu
[2023-02-25 13:42:27,051][09251] Rollout worker 4 uses device cpu
[2023-02-25 13:42:27,053][09251] Rollout worker 5 uses device cpu
[2023-02-25 13:42:27,055][09251] Rollout worker 6 uses device cpu
[2023-02-25 13:42:27,057][09251] Rollout worker 7 uses device cpu
[2023-02-25 13:42:27,255][09251] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-25 13:42:27,257][09251] InferenceWorker_p0-w0: min num requests: 2
[2023-02-25 13:42:27,290][09251] Starting all processes...
[2023-02-25 13:42:27,291][09251] Starting process learner_proc0
[2023-02-25 13:42:27,347][09251] Starting all processes...
[2023-02-25 13:42:27,358][09251] Starting process inference_proc0-0
[2023-02-25 13:42:27,361][09251] Starting process rollout_proc0
[2023-02-25 13:42:27,362][09251] Starting process rollout_proc1
[2023-02-25 13:42:27,362][09251] Starting process rollout_proc2
[2023-02-25 13:42:27,362][09251] Starting process rollout_proc3
[2023-02-25 13:42:27,362][09251] Starting process rollout_proc4
[2023-02-25 13:42:27,362][09251] Starting process rollout_proc5
[2023-02-25 13:42:27,362][09251] Starting process rollout_proc6
[2023-02-25 13:42:27,362][09251] Starting process rollout_proc7
[2023-02-25 13:42:38,827][15037] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-25 13:42:38,832][15037] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-25 13:42:39,277][15055] Worker 3 uses CPU cores [1]
[2023-02-25 13:42:39,387][15051] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-25 13:42:39,389][15059] Worker 6 uses CPU cores [0]
[2023-02-25 13:42:39,390][15051] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-25 13:42:39,570][15058] Worker 7 uses CPU cores [1]
[2023-02-25 13:42:39,579][15052] Worker 0 uses CPU cores [0]
[2023-02-25 13:42:39,606][15053] Worker 1 uses CPU cores [1]
[2023-02-25 13:42:39,657][15054] Worker 2 uses CPU cores [0]
[2023-02-25 13:42:39,669][15037] Num visible devices: 1
[2023-02-25 13:42:39,669][15051] Num visible devices: 1
[2023-02-25 13:42:39,697][15037] Starting seed is not provided
[2023-02-25 13:42:39,698][15037] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-25 13:42:39,699][15037] Initializing actor-critic model on device cuda:0
[2023-02-25 13:42:39,700][15037] RunningMeanStd input shape: (3, 72, 128)
[2023-02-25 13:42:39,702][15037] RunningMeanStd input shape: (1,)
[2023-02-25 13:42:39,728][15057] Worker 5 uses CPU cores [1]
[2023-02-25 13:42:39,739][15037] ConvEncoder: input_channels=3
[2023-02-25 13:42:39,743][15056] Worker 4 uses CPU cores [0]
[2023-02-25 13:42:40,028][15037] Conv encoder output size: 512
[2023-02-25 13:42:40,028][15037] Policy head output size: 512
[2023-02-25 13:42:40,083][15037] Created Actor Critic model with architecture:
[2023-02-25 13:42:40,083][15037] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-25 13:42:47,248][09251] Heartbeat connected on Batcher_0
[2023-02-25 13:42:47,256][09251] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-25 13:42:47,266][09251] Heartbeat connected on RolloutWorker_w0
[2023-02-25 13:42:47,269][09251] Heartbeat connected on RolloutWorker_w1
[2023-02-25 13:42:47,272][09251] Heartbeat connected on RolloutWorker_w2
[2023-02-25 13:42:47,276][09251] Heartbeat connected on RolloutWorker_w3
[2023-02-25 13:42:47,279][09251] Heartbeat connected on RolloutWorker_w4
[2023-02-25 13:42:47,283][09251] Heartbeat connected on RolloutWorker_w5
[2023-02-25 13:42:47,286][09251] Heartbeat connected on RolloutWorker_w6
[2023-02-25 13:42:47,288][09251] Heartbeat connected on RolloutWorker_w7
[2023-02-25 13:42:48,253][15037] Using optimizer
[2023-02-25 13:42:48,254][15037] No checkpoints found
[2023-02-25 13:42:48,254][15037] Did not load from checkpoint, starting from scratch!
[2023-02-25 13:42:48,254][15037] Initialized policy 0 weights for model version 0
[2023-02-25 13:42:48,261][15037] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-25 13:42:48,275][15037] LearnerWorker_p0 finished initialization!
[2023-02-25 13:42:48,276][09251] Heartbeat connected on LearnerWorker_p0
[2023-02-25 13:42:48,471][15051] RunningMeanStd input shape: (3, 72, 128)
[2023-02-25 13:42:48,472][15051] RunningMeanStd input shape: (1,)
[2023-02-25 13:42:48,485][15051] ConvEncoder: input_channels=3
[2023-02-25 13:42:48,587][15051] Conv encoder output size: 512
[2023-02-25 13:42:48,587][15051] Policy head output size: 512
[2023-02-25 13:42:50,892][09251] Inference worker 0-0 is ready!
[2023-02-25 13:42:50,894][09251] All inference workers are ready! Signal rollout workers to start!
[2023-02-25 13:42:51,015][15054] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,024][15052] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,038][15056] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,043][15057] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,042][15055] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,067][15059] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,077][15053] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,076][15058] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 13:42:51,938][15057] Decorrelating experience for 0 frames...
[2023-02-25 13:42:51,936][15055] Decorrelating experience for 0 frames...
[2023-02-25 13:42:52,554][15054] Decorrelating experience for 0 frames...
[2023-02-25 13:42:52,557][15052] Decorrelating experience for 0 frames...
[2023-02-25 13:42:52,566][15056] Decorrelating experience for 0 frames...
[2023-02-25 13:42:52,574][15059] Decorrelating experience for 0 frames...
[2023-02-25 13:42:52,822][09251] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-25 13:42:52,930][15054] Decorrelating experience for 32 frames...
[2023-02-25 13:42:53,156][15058] Decorrelating experience for 0 frames...
[2023-02-25 13:42:53,169][15055] Decorrelating experience for 32 frames...
[2023-02-25 13:42:53,187][15057] Decorrelating experience for 32 frames...
[2023-02-25 13:42:53,535][15054] Decorrelating experience for 64 frames...
[2023-02-25 13:42:53,920][15053] Decorrelating experience for 0 frames...
[2023-02-25 13:42:53,996][15052] Decorrelating experience for 32 frames...
[2023-02-25 13:42:54,208][15058] Decorrelating experience for 32 frames...
[2023-02-25 13:42:54,345][15055] Decorrelating experience for 64 frames...
[2023-02-25 13:42:54,487][15054] Decorrelating experience for 96 frames...
[2023-02-25 13:42:55,177][15056] Decorrelating experience for 32 frames...
[2023-02-25 13:42:55,268][15058] Decorrelating experience for 64 frames...
[2023-02-25 13:42:55,465][15059] Decorrelating experience for 32 frames...
[2023-02-25 13:42:55,469][15055] Decorrelating experience for 96 frames...
[2023-02-25 13:42:55,772][15052] Decorrelating experience for 64 frames...
[2023-02-25 13:42:56,602][15053] Decorrelating experience for 32 frames...
[2023-02-25 13:42:56,754][15058] Decorrelating experience for 96 frames...
[2023-02-25 13:42:56,820][15057] Decorrelating experience for 64 frames...
[2023-02-25 13:42:57,452][15053] Decorrelating experience for 64 frames...
[2023-02-25 13:42:57,822][09251] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-25 13:42:58,071][15053] Decorrelating experience for 96 frames...
[2023-02-25 13:42:58,427][15059] Decorrelating experience for 64 frames...
[2023-02-25 13:42:58,814][15056] Decorrelating experience for 64 frames...
[2023-02-25 13:42:58,984][15052] Decorrelating experience for 96 frames...
[2023-02-25 13:42:59,130][15057] Decorrelating experience for 96 frames...
[2023-02-25 13:43:00,433][15059] Decorrelating experience for 96 frames...
[2023-02-25 13:43:00,810][15056] Decorrelating experience for 96 frames...
[2023-02-25 13:43:02,822][09251] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.2. Samples: 32. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-25 13:43:02,828][09251] Avg episode reward: [(0, '1.368')]
[2023-02-25 13:43:05,261][15037] Signal inference workers to stop experience collection...
[2023-02-25 13:43:05,284][15051] InferenceWorker_p0-w0: stopping experience collection
[2023-02-25 13:43:07,821][09251] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 174.5. Samples: 2618. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-25 13:43:07,823][09251] Avg episode reward: [(0, '1.941')]
[2023-02-25 13:43:07,846][15037] Signal inference workers to resume experience collection...
[2023-02-25 13:43:07,848][15051] InferenceWorker_p0-w0: resuming experience collection
[2023-02-25 13:43:12,821][09251] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 315.7. Samples: 6314. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-02-25 13:43:12,824][09251] Avg episode reward: [(0, '3.493')]
[2023-02-25 13:43:16,094][15051] Updated weights for policy 0, policy_version 10 (0.0374)
[2023-02-25 13:43:17,822][09251] Fps is (10 sec: 4505.5, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 386.3. Samples: 9658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:43:17,826][09251] Avg episode reward: [(0, '4.196')]
[2023-02-25 13:43:22,821][09251] Fps is (10 sec: 3686.4, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 492.5. Samples: 14776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:43:22,826][09251] Avg episode reward: [(0, '4.255')]
[2023-02-25 13:43:27,821][09251] Fps is (10 sec: 3276.9, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 543.3. Samples: 19014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:43:27,830][09251] Avg episode reward: [(0, '4.382')]
[2023-02-25 13:43:28,596][15051] Updated weights for policy 0, policy_version 20 (0.0022)
[2023-02-25 13:43:32,821][09251] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 560.9. Samples: 22436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:43:32,824][09251] Avg episode reward: [(0, '4.363')]
[2023-02-25 13:43:37,821][09251] Fps is (10 sec: 4096.0, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 650.0. Samples: 29252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:43:37,824][09251] Avg episode reward: [(0, '4.395')]
[2023-02-25 13:43:37,826][15037] Saving new best policy, reward=4.395!
[2023-02-25 13:43:38,096][15051] Updated weights for policy 0, policy_version 30 (0.0012)
[2023-02-25 13:43:42,823][09251] Fps is (10 sec: 3685.8, 60 sec: 2703.3, 300 sec: 2703.3). Total num frames: 135168. Throughput: 0: 754.9. Samples: 33972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:43:42,826][09251] Avg episode reward: [(0, '4.484')]
[2023-02-25 13:43:42,837][15037] Saving new best policy, reward=4.484!
[2023-02-25 13:43:47,821][09251] Fps is (10 sec: 3276.8, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 799.4. Samples: 36004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-25 13:43:47,823][09251] Avg episode reward: [(0, '4.376')]
[2023-02-25 13:43:50,480][15051] Updated weights for policy 0, policy_version 40 (0.0025)
[2023-02-25 13:43:52,821][09251] Fps is (10 sec: 3687.0, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 878.0. Samples: 42130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:43:52,829][09251] Avg episode reward: [(0, '4.451')]
[2023-02-25 13:43:57,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 947.4. Samples: 48946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:43:57,823][09251] Avg episode reward: [(0, '4.467')]
[2023-02-25 13:44:00,921][15051] Updated weights for policy 0, policy_version 50 (0.0022)
[2023-02-25 13:44:02,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 920.1. Samples: 51062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:44:02,828][09251] Avg episode reward: [(0, '4.398')]
[2023-02-25 13:44:07,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3003.7). Total num frames: 225280. Throughput: 0: 902.6. Samples: 55394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:44:07,824][09251] Avg episode reward: [(0, '4.345')]
[2023-02-25 13:44:12,020][15051] Updated weights for policy 0, policy_version 60 (0.0023)
[2023-02-25 13:44:12,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 951.0. Samples: 61808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:44:12,829][09251] Avg episode reward: [(0, '4.418')]
[2023-02-25 13:44:17,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 949.8. Samples: 65176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:44:17,826][09251] Avg episode reward: [(0, '4.478')]
[2023-02-25 13:44:22,825][09251] Fps is (10 sec: 3684.9, 60 sec: 3686.2, 300 sec: 3140.1). Total num frames: 282624. Throughput: 0: 915.1. Samples: 70434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:44:22,832][09251] Avg episode reward: [(0, '4.507')]
[2023-02-25 13:44:22,843][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
[2023-02-25 13:44:23,048][15037] Saving new best policy, reward=4.507!
[2023-02-25 13:44:23,409][15051] Updated weights for policy 0, policy_version 70 (0.0021)
[2023-02-25 13:44:27,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 903.0. Samples: 74606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:44:27,824][09251] Avg episode reward: [(0, '4.564')]
[2023-02-25 13:44:27,829][15037] Saving new best policy, reward=4.564!
[2023-02-25 13:44:32,821][09251] Fps is (10 sec: 3687.9, 60 sec: 3686.4, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 926.3. Samples: 77688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:44:32,828][09251] Avg episode reward: [(0, '4.562')]
[2023-02-25 13:44:34,029][15051] Updated weights for policy 0, policy_version 80 (0.0022)
[2023-02-25 13:44:37,822][09251] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 344064. Throughput: 0: 942.9. Samples: 84560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:44:37,826][09251] Avg episode reward: [(0, '4.288')]
[2023-02-25 13:44:42,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3239.6). Total num frames: 356352. Throughput: 0: 901.5. Samples: 89514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:44:42,828][09251] Avg episode reward: [(0, '4.469')]
[2023-02-25 13:44:46,181][15051] Updated weights for policy 0, policy_version 90 (0.0014)
[2023-02-25 13:44:47,821][09251] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3241.2). Total num frames: 372736. Throughput: 0: 902.7. Samples: 91682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:44:47,828][09251] Avg episode reward: [(0, '4.462')]
[2023-02-25 13:44:52,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 936.7. Samples: 97546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:44:52,827][09251] Avg episode reward: [(0, '4.563')]
[2023-02-25 13:44:55,830][15051] Updated weights for policy 0, policy_version 100 (0.0022)
[2023-02-25 13:44:57,827][09251] Fps is (10 sec: 4502.8, 60 sec: 3754.3, 300 sec: 3342.2). Total num frames: 417792. Throughput: 0: 945.8. Samples: 104374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:44:57,840][09251] Avg episode reward: [(0, '4.741')]
[2023-02-25 13:44:57,844][15037] Saving new best policy, reward=4.741!
[2023-02-25 13:45:02,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3308.3). Total num frames: 430080. Throughput: 0: 923.1. Samples: 106714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:45:02,824][09251] Avg episode reward: [(0, '4.677')]
[2023-02-25 13:45:07,821][09251] Fps is (10 sec: 2869.0, 60 sec: 3686.4, 300 sec: 3307.1). Total num frames: 446464. Throughput: 0: 903.4. Samples: 111082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:45:07,829][09251] Avg episode reward: [(0, '4.581')]
[2023-02-25 13:45:08,299][15051] Updated weights for policy 0, policy_version 110 (0.0031)
[2023-02-25 13:45:12,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3364.6). Total num frames: 471040. Throughput: 0: 947.9. Samples: 117262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:45:12,828][09251] Avg episode reward: [(0, '4.552')]
[2023-02-25 13:45:17,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3361.5). Total num frames: 487424. Throughput: 0: 946.3. Samples: 120270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:45:17,825][09251] Avg episode reward: [(0, '4.679')]
[2023-02-25 13:45:19,553][15051] Updated weights for policy 0, policy_version 120 (0.0014)
[2023-02-25 13:45:22,821][09251] Fps is (10 sec: 2457.6, 60 sec: 3550.1, 300 sec: 3304.1). Total num frames: 495616. Throughput: 0: 878.8. Samples: 124106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:45:22,826][09251] Avg episode reward: [(0, '4.652')]
[2023-02-25 13:45:27,821][09251] Fps is (10 sec: 2048.0, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 507904. Throughput: 0: 844.3. Samples: 127506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:45:27,824][09251] Avg episode reward: [(0, '4.604')]
[2023-02-25 13:45:32,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3302.4). Total num frames: 528384. Throughput: 0: 844.9. Samples: 129704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:45:32,829][09251] Avg episode reward: [(0, '4.485')]
[2023-02-25 13:45:33,685][15051] Updated weights for policy 0, policy_version 130 (0.0033)
[2023-02-25 13:45:37,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3326.5). Total num frames: 548864. Throughput: 0: 856.1. Samples: 136072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:45:37,823][09251] Avg episode reward: [(0, '4.518')]
[2023-02-25 13:45:42,821][09251] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3349.1). Total num frames: 569344. Throughput: 0: 845.9. Samples: 142436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:45:42,824][09251] Avg episode reward: [(0, '4.425')]
[2023-02-25 13:45:43,971][15051] Updated weights for policy 0, policy_version 140 (0.0015)
[2023-02-25 13:45:47,823][09251] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3323.6). Total num frames: 581632. Throughput: 0: 842.4. Samples: 144622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-25 13:45:47,829][09251] Avg episode reward: [(0, '4.379')]
[2023-02-25 13:45:52,821][09251] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3322.3). Total num frames: 598016. Throughput: 0: 840.8. Samples: 148918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-25 13:45:52,823][09251] Avg episode reward: [(0, '4.375')]
[2023-02-25 13:45:55,614][15051] Updated weights for policy 0, policy_version 150 (0.0015)
[2023-02-25 13:45:57,821][09251] Fps is (10 sec: 4096.7, 60 sec: 3413.7, 300 sec: 3365.4). Total num frames: 622592. Throughput: 0: 852.8. Samples: 155638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:45:57,828][09251] Avg episode reward: [(0, '4.225')]
[2023-02-25 13:46:02,823][09251] Fps is (10 sec: 4504.8, 60 sec: 3549.8, 300 sec: 3384.6). Total num frames: 643072. Throughput: 0: 863.3. Samples: 159122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:46:02,829][09251] Avg episode reward: [(0, '4.353')]
[2023-02-25 13:46:06,575][15051] Updated weights for policy 0, policy_version 160 (0.0019)
[2023-02-25 13:46:07,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3360.8). Total num frames: 655360. Throughput: 0: 886.0. Samples: 163974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:46:07,824][09251] Avg episode reward: [(0, '4.672')]
[2023-02-25 13:46:12,821][09251] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3379.2). Total num frames: 675840. Throughput: 0: 913.6. Samples: 168620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:46:12,829][09251] Avg episode reward: [(0, '4.707')]
[2023-02-25 13:46:17,082][15051] Updated weights for policy 0, policy_version 170 (0.0024)
[2023-02-25 13:46:17,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3396.7). Total num frames: 696320. Throughput: 0: 941.2. Samples: 172056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:46:17,824][09251] Avg episode reward: [(0, '4.302')]
[2023-02-25 13:46:22,822][09251] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3413.3). Total num frames: 716800. Throughput: 0: 950.9. Samples: 178864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:46:22,824][09251] Avg episode reward: [(0, '4.350')]
[2023-02-25 13:46:22,836][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000175_716800.pth...
[2023-02-25 13:46:27,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3391.1). Total num frames: 729088. Throughput: 0: 906.5. Samples: 183228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:46:27,825][09251] Avg episode reward: [(0, '4.453')]
[2023-02-25 13:46:29,375][15051] Updated weights for policy 0, policy_version 180 (0.0011)
[2023-02-25 13:46:32,821][09251] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3407.1). Total num frames: 749568. Throughput: 0: 906.0. Samples: 185390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:46:32,824][09251] Avg episode reward: [(0, '4.631')]
[2023-02-25 13:46:37,821][09251] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3422.4). Total num frames: 770048. Throughput: 0: 949.3. Samples: 191638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:46:37,829][09251] Avg episode reward: [(0, '4.617')]
[2023-02-25 13:46:39,042][15051] Updated weights for policy 0, policy_version 190 (0.0023)
[2023-02-25 13:46:42,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3437.1). Total num frames: 790528. Throughput: 0: 947.5. Samples: 198276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:46:42,827][09251] Avg episode reward: [(0, '4.456')]
[2023-02-25 13:46:47,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3433.7). Total num frames: 806912. Throughput: 0: 919.2. Samples: 200486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:46:47,826][09251] Avg episode reward: [(0, '4.494')]
[2023-02-25 13:46:51,469][15051] Updated weights for policy 0, policy_version 200 (0.0013)
[2023-02-25 13:46:52,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3430.4). Total num frames: 823296. Throughput: 0: 910.0. Samples: 204922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:46:52,829][09251] Avg episode reward: [(0, '4.499')]
[2023-02-25 13:46:57,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3460.7). Total num frames: 847872. Throughput: 0: 952.5. Samples: 211482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:46:57,827][09251] Avg episode reward: [(0, '4.475')]
[2023-02-25 13:47:00,604][15051] Updated weights for policy 0, policy_version 210 (0.0013)
[2023-02-25 13:47:02,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3754.8, 300 sec: 3473.4). Total num frames: 868352. Throughput: 0: 952.6. Samples: 214922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:47:02,827][09251] Avg episode reward: [(0, '4.552')]
[2023-02-25 13:47:07,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3453.5). Total num frames: 880640. Throughput: 0: 912.0. Samples: 219902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:47:07,826][09251] Avg episode reward: [(0, '4.664')]
[2023-02-25 13:47:12,821][09251] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3450.1). Total num frames: 897024. Throughput: 0: 914.5. Samples: 224382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:47:12,829][09251] Avg episode reward: [(0, '4.576')]
[2023-02-25 13:47:13,185][15051] Updated weights for policy 0, policy_version 220 (0.0038)
[2023-02-25 13:47:17,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3477.7). Total num frames: 921600. Throughput: 0: 943.4. Samples: 227844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:47:17,824][09251] Avg episode reward: [(0, '4.428')]
[2023-02-25 13:47:22,043][15051] Updated weights for policy 0, policy_version 230 (0.0020)
[2023-02-25 13:47:22,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3489.2). Total num frames: 942080. Throughput: 0: 954.1. Samples: 234574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-25 13:47:22,824][09251] Avg episode reward: [(0, '4.485')]
[2023-02-25 13:47:27,828][09251] Fps is (10 sec: 3274.6, 60 sec: 3754.3, 300 sec: 3470.3). Total num frames: 954368. Throughput: 0: 911.0. Samples: 239276. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-25 13:47:27,834][09251] Avg episode reward: [(0, '4.532')]
[2023-02-25 13:47:32,821][09251] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3467.0). Total num frames: 970752. Throughput: 0: 909.6. Samples: 241418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-25 13:47:32,825][09251] Avg episode reward: [(0, '4.522')]
[2023-02-25 13:47:35,090][15051] Updated weights for policy 0, policy_version 240 (0.0012)
[2023-02-25 13:47:37,821][09251] Fps is (10 sec: 4098.8, 60 sec: 3754.7, 300 sec: 3492.4). Total num frames: 995328. Throughput: 0: 942.2. Samples: 247320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:47:37,828][09251] Avg episode reward: [(0, '4.414')]
[2023-02-25 13:47:42,823][09251] Fps is (10 sec: 4504.9, 60 sec: 3754.6, 300 sec: 3502.8). Total num frames: 1015808. Throughput: 0: 941.4. Samples: 253846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:47:42,831][09251] Avg episode reward: [(0, '4.588')]
[2023-02-25 13:47:45,546][15051] Updated weights for policy 0, policy_version 250 (0.0028)
[2023-02-25 13:47:47,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 913.2. Samples: 256018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-25 13:47:47,826][09251] Avg episode reward: [(0, '4.539')]
[2023-02-25 13:47:52,821][09251] Fps is (10 sec: 2867.7, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1044480. Throughput: 0: 900.8. Samples: 260436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:47:52,824][09251] Avg episode reward: [(0, '4.430')]
[2023-02-25 13:47:57,024][15051] Updated weights for policy 0, policy_version 260 (0.0025)
[2023-02-25 13:47:57,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1064960. Throughput: 0: 943.5. Samples: 266840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:47:57,828][09251] Avg episode reward: [(0, '4.446')]
[2023-02-25 13:48:02,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 940.0. Samples: 270144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:48:02,827][09251] Avg episode reward: [(0, '4.657')]
[2023-02-25 13:48:07,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1101824. Throughput: 0: 907.6. Samples: 275414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:48:07,829][09251] Avg episode reward: [(0, '4.748')]
[2023-02-25 13:48:07,831][15037] Saving new best policy, reward=4.748!
[2023-02-25 13:48:08,385][15051] Updated weights for policy 0, policy_version 270 (0.0018)
[2023-02-25 13:48:12,821][09251] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1118208. Throughput: 0: 899.0. Samples: 279726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:48:12,830][09251] Avg episode reward: [(0, '4.912')]
[2023-02-25 13:48:12,842][15037] Saving new best policy, reward=4.912!
[2023-02-25 13:48:17,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1142784. Throughput: 0: 922.8. Samples: 282942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:48:17,828][09251] Avg episode reward: [(0, '4.704')]
[2023-02-25 13:48:18,803][15051] Updated weights for policy 0, policy_version 280 (0.0016)
[2023-02-25 13:48:22,829][09251] Fps is (10 sec: 4502.1, 60 sec: 3685.9, 300 sec: 3679.4). Total num frames: 1163264. Throughput: 0: 943.4. Samples: 289782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:48:22,831][09251] Avg episode reward: [(0, '4.642')]
[2023-02-25 13:48:22,853][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth...
[2023-02-25 13:48:23,020][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth
[2023-02-25 13:48:27,823][09251] Fps is (10 sec: 3276.1, 60 sec: 3686.7, 300 sec: 3651.7). Total num frames: 1175552. Throughput: 0: 905.9. Samples: 294612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:48:27,826][09251] Avg episode reward: [(0, '4.687')]
[2023-02-25 13:48:32,013][15051] Updated weights for policy 0, policy_version 290 (0.0021)
[2023-02-25 13:48:32,823][09251] Fps is (10 sec: 2459.0, 60 sec: 3618.0, 300 sec: 3623.9). Total num frames: 1187840. Throughput: 0: 896.9. Samples: 296380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:48:32,826][09251] Avg episode reward: [(0, '4.932')]
[2023-02-25 13:48:32,838][15037] Saving new best policy, reward=4.932!
[2023-02-25 13:48:37,821][09251] Fps is (10 sec: 2458.1, 60 sec: 3413.3, 300 sec: 3610.1). Total num frames: 1200128. Throughput: 0: 876.2. Samples: 299864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:48:37,824][09251] Avg episode reward: [(0, '4.945')]
[2023-02-25 13:48:37,833][15037] Saving new best policy, reward=4.945!
[2023-02-25 13:48:42,821][09251] Fps is (10 sec: 3277.5, 60 sec: 3413.4, 300 sec: 3623.9). Total num frames: 1220608. Throughput: 0: 851.8. Samples: 305170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:48:42,830][09251] Avg episode reward: [(0, '4.892')]
[2023-02-25 13:48:44,295][15051] Updated weights for policy 0, policy_version 300 (0.0023)
[2023-02-25 13:48:47,823][09251] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 1241088. Throughput: 0: 855.0. Samples: 308620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-25 13:48:47,826][09251] Avg episode reward: [(0, '4.883')]
[2023-02-25 13:48:52,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 1253376. Throughput: 0: 835.5. Samples: 313012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:48:52,830][09251] Avg episode reward: [(0, '4.979')]
[2023-02-25 13:48:52,839][15037] Saving new best policy, reward=4.979!
[2023-02-25 13:48:56,793][15051] Updated weights for policy 0, policy_version 310 (0.0039)
[2023-02-25 13:48:57,823][09251] Fps is (10 sec: 3276.9, 60 sec: 3481.5, 300 sec: 3610.0). Total num frames: 1273856. Throughput: 0: 852.2. Samples: 318078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:48:57,825][09251] Avg episode reward: [(0, '4.941')]
[2023-02-25 13:49:02,821][09251] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3623.9). Total num frames: 1294336. Throughput: 0: 856.5. Samples: 321486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:49:02,830][09251] Avg episode reward: [(0, '4.765')]
[2023-02-25 13:49:06,251][15051] Updated weights for policy 0, policy_version 320 (0.0014)
[2023-02-25 13:49:07,821][09251] Fps is (10 sec: 4096.6, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 1314816. Throughput: 0: 852.9. Samples: 328156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:49:07,826][09251] Avg episode reward: [(0, '5.144')]
[2023-02-25 13:49:07,831][15037] Saving new best policy, reward=5.144!
[2023-02-25 13:49:12,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 1327104. Throughput: 0: 841.7. Samples: 332486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:49:12,826][09251] Avg episode reward: [(0, '5.150')]
[2023-02-25 13:49:12,839][15037] Saving new best policy, reward=5.150!
[2023-02-25 13:49:17,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3610.1). Total num frames: 1347584. Throughput: 0: 849.7. Samples: 334614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:49:17,824][09251] Avg episode reward: [(0, '5.215')]
[2023-02-25 13:49:17,831][15037] Saving new best policy, reward=5.215!
[2023-02-25 13:49:18,576][15051] Updated weights for policy 0, policy_version 330 (0.0021)
[2023-02-25 13:49:22,822][09251] Fps is (10 sec: 4096.0, 60 sec: 3413.8, 300 sec: 3623.9). Total num frames: 1368064. Throughput: 0: 918.6. Samples: 341202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:49:22,827][09251] Avg episode reward: [(0, '5.224')]
[2023-02-25 13:49:22,839][15037] Saving new best policy, reward=5.224!
[2023-02-25 13:49:27,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3623.9). Total num frames: 1388544. Throughput: 0: 941.7. Samples: 347548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:49:27,824][09251] Avg episode reward: [(0, '5.324')]
[2023-02-25 13:49:27,830][15037] Saving new best policy, reward=5.324!
[2023-02-25 13:49:28,665][15051] Updated weights for policy 0, policy_version 340 (0.0022)
[2023-02-25 13:49:32,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 1400832. Throughput: 0: 910.5. Samples: 349590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:49:32,826][09251] Avg episode reward: [(0, '5.326')]
[2023-02-25 13:49:32,843][15037] Saving new best policy, reward=5.326!
[2023-02-25 13:49:37,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1421312. Throughput: 0: 910.5. Samples: 353986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:49:37,828][09251] Avg episode reward: [(0, '5.690')]
[2023-02-25 13:49:37,832][15037] Saving new best policy, reward=5.690!
[2023-02-25 13:49:40,096][15051] Updated weights for policy 0, policy_version 350 (0.0017)
[2023-02-25 13:49:42,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1441792.
Throughput: 0: 951.1. Samples: 360876. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-25 13:49:42,824][09251] Avg episode reward: [(0, '5.716')] [2023-02-25 13:49:42,882][15037] Saving new best policy, reward=5.716! [2023-02-25 13:49:47,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3623.9). Total num frames: 1462272. Throughput: 0: 950.3. Samples: 364248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-25 13:49:47,824][09251] Avg episode reward: [(0, '5.390')] [2023-02-25 13:49:50,999][15051] Updated weights for policy 0, policy_version 360 (0.0021) [2023-02-25 13:49:52,825][09251] Fps is (10 sec: 3685.0, 60 sec: 3754.4, 300 sec: 3596.2). Total num frames: 1478656. Throughput: 0: 907.0. Samples: 368974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-25 13:49:52,831][09251] Avg episode reward: [(0, '5.542')] [2023-02-25 13:49:57,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3610.0). Total num frames: 1495040. Throughput: 0: 919.0. Samples: 373840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:49:57,828][09251] Avg episode reward: [(0, '5.795')] [2023-02-25 13:49:57,833][15037] Saving new best policy, reward=5.795! [2023-02-25 13:50:01,796][15051] Updated weights for policy 0, policy_version 370 (0.0017) [2023-02-25 13:50:02,821][09251] Fps is (10 sec: 4097.6, 60 sec: 3754.7, 300 sec: 3637.8). Total num frames: 1519616. Throughput: 0: 947.5. Samples: 377250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:50:02,824][09251] Avg episode reward: [(0, '5.895')] [2023-02-25 13:50:02,840][15037] Saving new best policy, reward=5.895! [2023-02-25 13:50:07,822][09251] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1536000. Throughput: 0: 952.5. Samples: 384066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:50:07,828][09251] Avg episode reward: [(0, '6.122')] [2023-02-25 13:50:07,833][15037] Saving new best policy, reward=6.122! 
[2023-02-25 13:50:12,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 1552384. Throughput: 0: 909.2. Samples: 388464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:50:12,827][09251] Avg episode reward: [(0, '6.345')] [2023-02-25 13:50:12,843][15037] Saving new best policy, reward=6.345! [2023-02-25 13:50:13,770][15051] Updated weights for policy 0, policy_version 380 (0.0014) [2023-02-25 13:50:17,821][09251] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1568768. Throughput: 0: 911.5. Samples: 390606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:50:17,829][09251] Avg episode reward: [(0, '6.601')] [2023-02-25 13:50:17,833][15037] Saving new best policy, reward=6.601! [2023-02-25 13:50:22,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1593344. Throughput: 0: 956.8. Samples: 397044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:50:22,824][09251] Avg episode reward: [(0, '6.478')] [2023-02-25 13:50:22,835][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000389_1593344.pth... [2023-02-25 13:50:22,992][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000175_716800.pth [2023-02-25 13:50:23,630][15051] Updated weights for policy 0, policy_version 390 (0.0015) [2023-02-25 13:50:27,823][09251] Fps is (10 sec: 4504.8, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 1613824. Throughput: 0: 945.4. Samples: 403420. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2023-02-25 13:50:27,829][09251] Avg episode reward: [(0, '6.728')] [2023-02-25 13:50:27,838][15037] Saving new best policy, reward=6.728! [2023-02-25 13:50:32,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1626112. Throughput: 0: 916.9. Samples: 405510. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-25 13:50:32,824][09251] Avg episode reward: [(0, '6.590')] [2023-02-25 13:50:36,369][15051] Updated weights for policy 0, policy_version 400 (0.0013) [2023-02-25 13:50:37,821][09251] Fps is (10 sec: 2867.7, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1642496. Throughput: 0: 909.8. Samples: 409912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:50:37,827][09251] Avg episode reward: [(0, '6.477')] [2023-02-25 13:50:42,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1667072. Throughput: 0: 955.3. Samples: 416828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:50:42,824][09251] Avg episode reward: [(0, '6.916')] [2023-02-25 13:50:42,837][15037] Saving new best policy, reward=6.916! [2023-02-25 13:50:45,229][15051] Updated weights for policy 0, policy_version 410 (0.0014) [2023-02-25 13:50:47,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1687552. Throughput: 0: 955.5. Samples: 420248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:50:47,825][09251] Avg episode reward: [(0, '7.660')] [2023-02-25 13:50:47,828][15037] Saving new best policy, reward=7.660! [2023-02-25 13:50:52,824][09251] Fps is (10 sec: 3685.3, 60 sec: 3754.7, 300 sec: 3665.5). Total num frames: 1703936. Throughput: 0: 914.0. Samples: 425200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-25 13:50:52,831][09251] Avg episode reward: [(0, '7.711')] [2023-02-25 13:50:52,847][15037] Saving new best policy, reward=7.711! [2023-02-25 13:50:57,665][15051] Updated weights for policy 0, policy_version 420 (0.0017) [2023-02-25 13:50:57,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1720320. Throughput: 0: 916.0. Samples: 429686. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-25 13:50:57,827][09251] Avg episode reward: [(0, '7.927')] [2023-02-25 13:50:57,830][15037] Saving new best policy, reward=7.927! [2023-02-25 13:51:02,821][09251] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1740800. Throughput: 0: 946.9. Samples: 433216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:51:02,829][09251] Avg episode reward: [(0, '8.054')] [2023-02-25 13:51:02,843][15037] Saving new best policy, reward=8.054! [2023-02-25 13:51:06,772][15051] Updated weights for policy 0, policy_version 430 (0.0024) [2023-02-25 13:51:07,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1761280. Throughput: 0: 955.2. Samples: 440030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:51:07,825][09251] Avg episode reward: [(0, '8.258')] [2023-02-25 13:51:07,832][15037] Saving new best policy, reward=8.258! [2023-02-25 13:51:12,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1777664. Throughput: 0: 915.4. Samples: 444610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:51:12,831][09251] Avg episode reward: [(0, '8.165')] [2023-02-25 13:51:17,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1794048. Throughput: 0: 919.0. Samples: 446864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-25 13:51:17,829][09251] Avg episode reward: [(0, '8.894')] [2023-02-25 13:51:17,832][15037] Saving new best policy, reward=8.894! [2023-02-25 13:51:19,206][15051] Updated weights for policy 0, policy_version 440 (0.0017) [2023-02-25 13:51:22,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1818624. Throughput: 0: 963.7. Samples: 453278. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:51:22,827][09251] Avg episode reward: [(0, '8.967')] [2023-02-25 13:51:22,837][15037] Saving new best policy, reward=8.967! [2023-02-25 13:51:27,825][09251] Fps is (10 sec: 4503.8, 60 sec: 3754.5, 300 sec: 3693.3). Total num frames: 1839104. Throughput: 0: 961.6. Samples: 460106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:51:27,828][09251] Avg episode reward: [(0, '9.430')] [2023-02-25 13:51:27,835][15037] Saving new best policy, reward=9.430! [2023-02-25 13:51:28,814][15051] Updated weights for policy 0, policy_version 450 (0.0013) [2023-02-25 13:51:32,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1851392. Throughput: 0: 933.1. Samples: 462238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:51:32,831][09251] Avg episode reward: [(0, '9.492')] [2023-02-25 13:51:32,843][15037] Saving new best policy, reward=9.492! [2023-02-25 13:51:37,821][09251] Fps is (10 sec: 3278.1, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 1871872. Throughput: 0: 920.2. Samples: 466608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:51:37,826][09251] Avg episode reward: [(0, '10.239')] [2023-02-25 13:51:37,831][15037] Saving new best policy, reward=10.239! [2023-02-25 13:51:40,536][15051] Updated weights for policy 0, policy_version 460 (0.0026) [2023-02-25 13:51:42,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1892352. Throughput: 0: 972.8. Samples: 473462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:51:42,824][09251] Avg episode reward: [(0, '10.262')] [2023-02-25 13:51:42,834][15037] Saving new best policy, reward=10.262! [2023-02-25 13:51:47,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1908736. Throughput: 0: 971.1. Samples: 476914. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-25 13:51:47,829][09251] Avg episode reward: [(0, '10.776')] [2023-02-25 13:51:47,834][15037] Saving new best policy, reward=10.776! [2023-02-25 13:51:52,821][09251] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3637.8). Total num frames: 1921024. Throughput: 0: 900.3. Samples: 480542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-25 13:51:52,829][09251] Avg episode reward: [(0, '10.449')] [2023-02-25 13:51:53,476][15051] Updated weights for policy 0, policy_version 470 (0.0012) [2023-02-25 13:51:57,822][09251] Fps is (10 sec: 2457.4, 60 sec: 3549.8, 300 sec: 3610.0). Total num frames: 1933312. Throughput: 0: 876.5. Samples: 484054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:51:57,825][09251] Avg episode reward: [(0, '10.403')] [2023-02-25 13:52:02,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 1953792. Throughput: 0: 872.2. Samples: 486114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:52:02,827][09251] Avg episode reward: [(0, '10.896')] [2023-02-25 13:52:02,837][15037] Saving new best policy, reward=10.896! [2023-02-25 13:52:05,317][15051] Updated weights for policy 0, policy_version 480 (0.0029) [2023-02-25 13:52:07,821][09251] Fps is (10 sec: 4096.4, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 1974272. Throughput: 0: 880.5. Samples: 492900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:52:07,823][09251] Avg episode reward: [(0, '11.116')] [2023-02-25 13:52:07,826][15037] Saving new best policy, reward=11.116! [2023-02-25 13:52:12,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 1994752. Throughput: 0: 871.8. Samples: 499334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:52:12,824][09251] Avg episode reward: [(0, '12.358')] [2023-02-25 13:52:12,844][15037] Saving new best policy, reward=12.358! 
[2023-02-25 13:52:15,972][15051] Updated weights for policy 0, policy_version 490 (0.0016) [2023-02-25 13:52:17,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2011136. Throughput: 0: 871.4. Samples: 501450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:52:17,824][09251] Avg episode reward: [(0, '12.808')] [2023-02-25 13:52:17,827][15037] Saving new best policy, reward=12.808! [2023-02-25 13:52:22,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3637.9). Total num frames: 2027520. Throughput: 0: 877.9. Samples: 506114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:52:22,824][09251] Avg episode reward: [(0, '12.752')] [2023-02-25 13:52:22,853][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000496_2031616.pth... [2023-02-25 13:52:22,963][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth [2023-02-25 13:52:26,492][15051] Updated weights for policy 0, policy_version 500 (0.0011) [2023-02-25 13:52:27,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3550.1, 300 sec: 3665.6). Total num frames: 2052096. Throughput: 0: 880.8. Samples: 513098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:52:27,824][09251] Avg episode reward: [(0, '13.331')] [2023-02-25 13:52:27,833][15037] Saving new best policy, reward=13.331! [2023-02-25 13:52:32,822][09251] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2072576. Throughput: 0: 881.2. Samples: 516570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:52:32,827][09251] Avg episode reward: [(0, '12.913')] [2023-02-25 13:52:37,822][09251] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 2084864. Throughput: 0: 901.1. Samples: 521094. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:52:37,828][09251] Avg episode reward: [(0, '14.020')] [2023-02-25 13:52:37,834][15037] Saving new best policy, reward=14.020! [2023-02-25 13:52:38,598][15051] Updated weights for policy 0, policy_version 510 (0.0013) [2023-02-25 13:52:42,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2105344. Throughput: 0: 930.6. Samples: 525928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:52:42,829][09251] Avg episode reward: [(0, '14.661')] [2023-02-25 13:52:42,842][15037] Saving new best policy, reward=14.661! [2023-02-25 13:52:47,821][09251] Fps is (10 sec: 4096.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2125824. Throughput: 0: 960.6. Samples: 529342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:52:47,829][09251] Avg episode reward: [(0, '15.672')] [2023-02-25 13:52:47,833][15037] Saving new best policy, reward=15.672! [2023-02-25 13:52:48,044][15051] Updated weights for policy 0, policy_version 520 (0.0022) [2023-02-25 13:52:52,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2146304. Throughput: 0: 961.0. Samples: 536146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:52:52,824][09251] Avg episode reward: [(0, '15.587')] [2023-02-25 13:52:57,822][09251] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 2162688. Throughput: 0: 919.0. Samples: 540692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:52:57,828][09251] Avg episode reward: [(0, '14.027')] [2023-02-25 13:53:00,437][15051] Updated weights for policy 0, policy_version 530 (0.0030) [2023-02-25 13:53:02,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3651.7). Total num frames: 2179072. Throughput: 0: 923.2. Samples: 542996. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:53:02,824][09251] Avg episode reward: [(0, '14.420')] [2023-02-25 13:53:07,821][09251] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2203648. Throughput: 0: 968.9. Samples: 549716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-25 13:53:07,824][09251] Avg episode reward: [(0, '15.455')] [2023-02-25 13:53:09,151][15051] Updated weights for policy 0, policy_version 540 (0.0022) [2023-02-25 13:53:12,821][09251] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2224128. Throughput: 0: 956.7. Samples: 556150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:53:12,829][09251] Avg episode reward: [(0, '16.080')] [2023-02-25 13:53:12,836][15037] Saving new best policy, reward=16.080! [2023-02-25 13:53:17,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3637.9). Total num frames: 2236416. Throughput: 0: 928.0. Samples: 558328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-25 13:53:17,824][09251] Avg episode reward: [(0, '16.151')] [2023-02-25 13:53:17,826][15037] Saving new best policy, reward=16.151! [2023-02-25 13:53:21,640][15051] Updated weights for policy 0, policy_version 550 (0.0018) [2023-02-25 13:53:22,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2256896. Throughput: 0: 930.8. Samples: 562980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:53:22,829][09251] Avg episode reward: [(0, '16.672')] [2023-02-25 13:53:22,838][15037] Saving new best policy, reward=16.672! [2023-02-25 13:53:27,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3707.3). Total num frames: 2281472. Throughput: 0: 979.5. Samples: 570004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-25 13:53:27,830][09251] Avg episode reward: [(0, '17.151')] [2023-02-25 13:53:27,832][15037] Saving new best policy, reward=17.151! 
[2023-02-25 13:53:30,327][15051] Updated weights for policy 0, policy_version 560 (0.0026) [2023-02-25 13:53:32,822][09251] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 2297856. Throughput: 0: 982.2. Samples: 573542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:53:32,825][09251] Avg episode reward: [(0, '16.661')] [2023-02-25 13:53:37,824][09251] Fps is (10 sec: 3276.6, 60 sec: 3823.0, 300 sec: 3707.2). Total num frames: 2314240. Throughput: 0: 932.0. Samples: 578088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:53:37,834][09251] Avg episode reward: [(0, '17.125')] [2023-02-25 13:53:42,821][09251] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3693.4). Total num frames: 2330624. Throughput: 0: 947.8. Samples: 583340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:53:42,828][09251] Avg episode reward: [(0, '15.840')] [2023-02-25 13:53:42,841][15051] Updated weights for policy 0, policy_version 570 (0.0047) [2023-02-25 13:53:47,821][09251] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2355200. Throughput: 0: 975.3. Samples: 586886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-25 13:53:47,824][09251] Avg episode reward: [(0, '14.812')] [2023-02-25 13:53:52,236][15051] Updated weights for policy 0, policy_version 580 (0.0012) [2023-02-25 13:53:52,823][09251] Fps is (10 sec: 4504.7, 60 sec: 3822.8, 300 sec: 3735.0). Total num frames: 2375680. Throughput: 0: 973.7. Samples: 593534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:53:52,827][09251] Avg episode reward: [(0, '15.720')] [2023-02-25 13:53:57,822][09251] Fps is (10 sec: 3276.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2387968. Throughput: 0: 927.4. Samples: 597884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:53:57,831][09251] Avg episode reward: [(0, '15.714')] [2023-02-25 13:54:02,821][09251] Fps is (10 sec: 3277.4, 60 sec: 3822.9, 300 sec: 3707.2). 
Total num frames: 2408448. Throughput: 0: 929.6. Samples: 600162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-25 13:54:02,823][09251] Avg episode reward: [(0, '16.039')] [2023-02-25 13:54:03,781][15051] Updated weights for policy 0, policy_version 590 (0.0020) [2023-02-25 13:54:07,821][09251] Fps is (10 sec: 4506.1, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2433024. Throughput: 0: 984.0. Samples: 607260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:54:07,830][09251] Avg episode reward: [(0, '15.132')] [2023-02-25 13:54:12,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 2453504. Throughput: 0: 967.0. Samples: 613520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:54:12,826][09251] Avg episode reward: [(0, '14.663')] [2023-02-25 13:54:13,861][15051] Updated weights for policy 0, policy_version 600 (0.0015) [2023-02-25 13:54:17,824][09251] Fps is (10 sec: 3276.1, 60 sec: 3822.8, 300 sec: 3721.1). Total num frames: 2465792. Throughput: 0: 937.2. Samples: 615716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:54:17,833][09251] Avg episode reward: [(0, '15.108')] [2023-02-25 13:54:22,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 2486272. Throughput: 0: 948.1. Samples: 620750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:54:22,824][09251] Avg episode reward: [(0, '14.133')] [2023-02-25 13:54:22,835][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000607_2486272.pth... [2023-02-25 13:54:22,983][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000389_1593344.pth [2023-02-25 13:54:24,759][15051] Updated weights for policy 0, policy_version 610 (0.0015) [2023-02-25 13:54:27,821][09251] Fps is (10 sec: 4506.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2510848. Throughput: 0: 988.4. Samples: 627816. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:54:27,824][09251] Avg episode reward: [(0, '16.719')] [2023-02-25 13:54:32,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2531328. Throughput: 0: 987.2. Samples: 631310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-25 13:54:32,825][09251] Avg episode reward: [(0, '15.650')] [2023-02-25 13:54:35,549][15051] Updated weights for policy 0, policy_version 620 (0.0014) [2023-02-25 13:54:37,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 2543616. Throughput: 0: 936.8. Samples: 635690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-25 13:54:37,827][09251] Avg episode reward: [(0, '17.027')] [2023-02-25 13:54:42,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 2564096. Throughput: 0: 964.0. Samples: 641262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:54:42,824][09251] Avg episode reward: [(0, '18.127')] [2023-02-25 13:54:42,831][15037] Saving new best policy, reward=18.127! [2023-02-25 13:54:45,733][15051] Updated weights for policy 0, policy_version 630 (0.0029) [2023-02-25 13:54:47,821][09251] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 2588672. Throughput: 0: 989.9. Samples: 644708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:54:47,824][09251] Avg episode reward: [(0, '18.251')] [2023-02-25 13:54:47,827][15037] Saving new best policy, reward=18.251! [2023-02-25 13:54:52,822][09251] Fps is (10 sec: 4095.9, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 2605056. Throughput: 0: 975.6. Samples: 651162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-25 13:54:52,824][09251] Avg episode reward: [(0, '18.207')] [2023-02-25 13:54:57,619][15051] Updated weights for policy 0, policy_version 640 (0.0043) [2023-02-25 13:54:57,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3735.0). 
Total num frames: 2621440. Throughput: 0: 936.8. Samples: 655674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:54:57,825][09251] Avg episode reward: [(0, '18.453')] [2023-02-25 13:54:57,829][15037] Saving new best policy, reward=18.453! [2023-02-25 13:55:02,821][09251] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 2641920. Throughput: 0: 941.5. Samples: 658080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-25 13:55:02,824][09251] Avg episode reward: [(0, '17.066')] [2023-02-25 13:55:07,824][09251] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3748.8). Total num frames: 2658304. Throughput: 0: 960.2. Samples: 663960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-25 13:55:07,827][09251] Avg episode reward: [(0, '16.462')] [2023-02-25 13:55:09,141][15051] Updated weights for policy 0, policy_version 650 (0.0011) [2023-02-25 13:55:12,822][09251] Fps is (10 sec: 2867.1, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2670592. Throughput: 0: 890.8. Samples: 667902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-25 13:55:12,827][09251] Avg episode reward: [(0, '18.152')] [2023-02-25 13:55:17,822][09251] Fps is (10 sec: 2458.2, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 2682880. Throughput: 0: 852.5. Samples: 669672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:55:17,823][09251] Avg episode reward: [(0, '19.601')] [2023-02-25 13:55:17,829][15037] Saving new best policy, reward=19.601! [2023-02-25 13:55:22,821][09251] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 2699264. Throughput: 0: 853.3. Samples: 674088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-25 13:55:22,824][09251] Avg episode reward: [(0, '19.745')] [2023-02-25 13:55:22,837][15037] Saving new best policy, reward=19.745! 
[2023-02-25 13:55:23,090][15051] Updated weights for policy 0, policy_version 660 (0.0012)
[2023-02-25 13:55:27,821][09251] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2723840. Throughput: 0: 883.5. Samples: 681018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:55:27,827][09251] Avg episode reward: [(0, '20.533')]
[2023-02-25 13:55:27,830][15037] Saving new best policy, reward=20.533!
[2023-02-25 13:55:32,190][15051] Updated weights for policy 0, policy_version 670 (0.0028)
[2023-02-25 13:55:32,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 2744320. Throughput: 0: 884.5. Samples: 684510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:55:32,823][09251] Avg episode reward: [(0, '20.961')]
[2023-02-25 13:55:32,840][15037] Saving new best policy, reward=20.961!
[2023-02-25 13:55:37,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 2756608. Throughput: 0: 846.5. Samples: 689254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:55:37,835][09251] Avg episode reward: [(0, '22.508')]
[2023-02-25 13:55:37,839][15037] Saving new best policy, reward=22.508!
[2023-02-25 13:55:42,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 2777088. Throughput: 0: 855.1. Samples: 694152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:55:42,829][09251] Avg episode reward: [(0, '20.459')]
[2023-02-25 13:55:44,207][15051] Updated weights for policy 0, policy_version 680 (0.0025)
[2023-02-25 13:55:47,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3721.2). Total num frames: 2801664. Throughput: 0: 882.8. Samples: 697808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:55:47,829][09251] Avg episode reward: [(0, '19.317')]
[2023-02-25 13:55:52,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 2822144. Throughput: 0: 908.6. Samples: 704846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-02-25 13:55:52,826][09251] Avg episode reward: [(0, '18.740')]
[2023-02-25 13:55:54,310][15051] Updated weights for policy 0, policy_version 690 (0.0012)
[2023-02-25 13:55:57,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 2834432. Throughput: 0: 918.9. Samples: 709254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:55:57,826][09251] Avg episode reward: [(0, '18.333')]
[2023-02-25 13:56:02,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 2854912. Throughput: 0: 928.5. Samples: 711456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:56:02,824][09251] Avg episode reward: [(0, '17.836')]
[2023-02-25 13:56:05,361][15051] Updated weights for policy 0, policy_version 700 (0.0016)
[2023-02-25 13:56:07,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3721.1). Total num frames: 2875392. Throughput: 0: 979.4. Samples: 718162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:56:07,829][09251] Avg episode reward: [(0, '19.669')]
[2023-02-25 13:56:12,825][09251] Fps is (10 sec: 4094.7, 60 sec: 3754.5, 300 sec: 3735.0). Total num frames: 2895872. Throughput: 0: 966.4. Samples: 724508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:56:12,827][09251] Avg episode reward: [(0, '19.880')]
[2023-02-25 13:56:16,309][15051] Updated weights for policy 0, policy_version 710 (0.0021)
[2023-02-25 13:56:17,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3707.2). Total num frames: 2912256. Throughput: 0: 938.0. Samples: 726718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:56:17,824][09251] Avg episode reward: [(0, '19.258')]
[2023-02-25 13:56:22,821][09251] Fps is (10 sec: 3277.8, 60 sec: 3822.9, 300 sec: 3693.4). Total num frames: 2928640. Throughput: 0: 936.4. Samples: 731394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:56:22,829][09251] Avg episode reward: [(0, '19.381')]
[2023-02-25 13:56:22,837][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000715_2928640.pth...
[2023-02-25 13:56:22,967][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000496_2031616.pth
[2023-02-25 13:56:26,454][15051] Updated weights for policy 0, policy_version 720 (0.0021)
[2023-02-25 13:56:27,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2953216. Throughput: 0: 984.9. Samples: 738472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:56:27,827][09251] Avg episode reward: [(0, '19.032')]
[2023-02-25 13:56:32,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2973696. Throughput: 0: 981.7. Samples: 741984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:56:32,826][09251] Avg episode reward: [(0, '18.649')]
[2023-02-25 13:56:37,823][09251] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 3707.2). Total num frames: 2985984. Throughput: 0: 928.3. Samples: 746620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:56:37,828][09251] Avg episode reward: [(0, '20.724')]
[2023-02-25 13:56:37,911][15051] Updated weights for policy 0, policy_version 730 (0.0014)
[2023-02-25 13:56:42,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3006464. Throughput: 0: 944.1. Samples: 751740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:56:42,824][09251] Avg episode reward: [(0, '21.188')]
[2023-02-25 13:56:47,371][15051] Updated weights for policy 0, policy_version 740 (0.0015)
[2023-02-25 13:56:47,821][09251] Fps is (10 sec: 4506.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3031040. Throughput: 0: 973.7. Samples: 755274. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-25 13:56:47,830][09251] Avg episode reward: [(0, '21.611')]
[2023-02-25 13:56:52,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3051520. Throughput: 0: 977.4. Samples: 762144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:56:52,828][09251] Avg episode reward: [(0, '22.911')]
[2023-02-25 13:56:52,840][15037] Saving new best policy, reward=22.911!
[2023-02-25 13:56:57,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3063808. Throughput: 0: 934.6. Samples: 766562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:56:57,830][09251] Avg episode reward: [(0, '22.242')]
[2023-02-25 13:56:59,669][15051] Updated weights for policy 0, policy_version 750 (0.0016)
[2023-02-25 13:57:02,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3084288. Throughput: 0: 935.8. Samples: 768828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:57:02,824][09251] Avg episode reward: [(0, '20.085')]
[2023-02-25 13:57:07,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3108864. Throughput: 0: 988.4. Samples: 775870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:07,829][09251] Avg episode reward: [(0, '20.251')]
[2023-02-25 13:57:08,405][15051] Updated weights for policy 0, policy_version 760 (0.0014)
[2023-02-25 13:57:12,821][09251] Fps is (10 sec: 4096.1, 60 sec: 3823.1, 300 sec: 3776.7). Total num frames: 3125248. Throughput: 0: 969.3. Samples: 782092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:57:12,826][09251] Avg episode reward: [(0, '20.388')]
[2023-02-25 13:57:17,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3141632. Throughput: 0: 940.5. Samples: 784306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:17,827][09251] Avg episode reward: [(0, '20.707')]
[2023-02-25 13:57:20,743][15051] Updated weights for policy 0, policy_version 770 (0.0031)
[2023-02-25 13:57:22,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3162112. Throughput: 0: 946.7. Samples: 789218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:22,824][09251] Avg episode reward: [(0, '20.954')]
[2023-02-25 13:57:27,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3186688. Throughput: 0: 991.4. Samples: 796354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:27,827][09251] Avg episode reward: [(0, '21.617')]
[2023-02-25 13:57:29,424][15051] Updated weights for policy 0, policy_version 780 (0.0013)
[2023-02-25 13:57:32,823][09251] Fps is (10 sec: 4095.2, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 3203072. Throughput: 0: 990.7. Samples: 799856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:32,828][09251] Avg episode reward: [(0, '21.486')]
[2023-02-25 13:57:37,822][09251] Fps is (10 sec: 3276.4, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 3219456. Throughput: 0: 933.4. Samples: 804150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:57:37,831][09251] Avg episode reward: [(0, '21.513')]
[2023-02-25 13:57:42,155][15051] Updated weights for policy 0, policy_version 790 (0.0022)
[2023-02-25 13:57:42,822][09251] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3235840. Throughput: 0: 949.8. Samples: 809302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:57:42,829][09251] Avg episode reward: [(0, '21.038')]
[2023-02-25 13:57:47,821][09251] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3260416. Throughput: 0: 976.9. Samples: 812788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:47,824][09251] Avg episode reward: [(0, '21.092')]
[2023-02-25 13:57:51,579][15051] Updated weights for policy 0, policy_version 800 (0.0011)
[2023-02-25 13:57:52,821][09251] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3276800. Throughput: 0: 964.7. Samples: 819280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:52,828][09251] Avg episode reward: [(0, '21.826')]
[2023-02-25 13:57:57,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3293184. Throughput: 0: 923.5. Samples: 823652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:57:57,830][09251] Avg episode reward: [(0, '22.505')]
[2023-02-25 13:58:02,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3313664. Throughput: 0: 927.0. Samples: 826022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:58:02,823][09251] Avg episode reward: [(0, '22.700')]
[2023-02-25 13:58:03,303][15051] Updated weights for policy 0, policy_version 810 (0.0015)
[2023-02-25 13:58:07,821][09251] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3338240. Throughput: 0: 975.1. Samples: 833098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:58:07,824][09251] Avg episode reward: [(0, '24.397')]
[2023-02-25 13:58:07,830][15037] Saving new best policy, reward=24.397!
[2023-02-25 13:58:12,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3354624. Throughput: 0: 949.6. Samples: 839086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:58:12,829][09251] Avg episode reward: [(0, '23.047')]
[2023-02-25 13:58:13,440][15051] Updated weights for policy 0, policy_version 820 (0.0032)
[2023-02-25 13:58:17,822][09251] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3371008. Throughput: 0: 921.9. Samples: 841340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:58:17,828][09251] Avg episode reward: [(0, '23.326')]
[2023-02-25 13:58:22,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3391488. Throughput: 0: 937.4. Samples: 846332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:58:22,823][09251] Avg episode reward: [(0, '21.221')]
[2023-02-25 13:58:22,838][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000828_3391488.pth...
[2023-02-25 13:58:22,956][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000607_2486272.pth
[2023-02-25 13:58:24,430][15051] Updated weights for policy 0, policy_version 830 (0.0023)
[2023-02-25 13:58:27,824][09251] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 3411968. Throughput: 0: 981.2. Samples: 853458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:58:27,828][09251] Avg episode reward: [(0, '20.632')]
[2023-02-25 13:58:32,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 3424256. Throughput: 0: 955.9. Samples: 855804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:58:32,824][09251] Avg episode reward: [(0, '19.714')]
[2023-02-25 13:58:37,821][09251] Fps is (10 sec: 2458.3, 60 sec: 3618.2, 300 sec: 3748.9). Total num frames: 3436544. Throughput: 0: 888.8. Samples: 859274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:58:37,829][09251] Avg episode reward: [(0, '19.333')]
[2023-02-25 13:58:38,511][15051] Updated weights for policy 0, policy_version 840 (0.0012)
[2023-02-25 13:58:42,823][09251] Fps is (10 sec: 2457.4, 60 sec: 3549.8, 300 sec: 3707.2). Total num frames: 3448832. Throughput: 0: 872.2. Samples: 862900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:58:42,832][09251] Avg episode reward: [(0, '20.058')]
[2023-02-25 13:58:47,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3473408. Throughput: 0: 888.2. Samples: 865990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:58:47,823][09251] Avg episode reward: [(0, '20.603')]
[2023-02-25 13:58:49,153][15051] Updated weights for policy 0, policy_version 850 (0.0014)
[2023-02-25 13:58:52,822][09251] Fps is (10 sec: 4915.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3497984. Throughput: 0: 888.5. Samples: 873080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:58:52,824][09251] Avg episode reward: [(0, '21.643')]
[2023-02-25 13:58:57,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3735.0). Total num frames: 3510272. Throughput: 0: 876.0. Samples: 878504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-25 13:58:57,824][09251] Avg episode reward: [(0, '22.706')]
[2023-02-25 13:59:00,630][15051] Updated weights for policy 0, policy_version 860 (0.0014)
[2023-02-25 13:59:02,821][09251] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3526656. Throughput: 0: 876.5. Samples: 880782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:59:02,828][09251] Avg episode reward: [(0, '23.766')]
[2023-02-25 13:59:07,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3551232. Throughput: 0: 891.1. Samples: 886430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:59:07,824][09251] Avg episode reward: [(0, '23.039')]
[2023-02-25 13:59:10,299][15051] Updated weights for policy 0, policy_version 870 (0.0012)
[2023-02-25 13:59:12,821][09251] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3571712. Throughput: 0: 889.5. Samples: 893482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:59:12,826][09251] Avg episode reward: [(0, '22.176')]
[2023-02-25 13:59:17,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 3588096. Throughput: 0: 896.8. Samples: 896160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-25 13:59:17,826][09251] Avg episode reward: [(0, '20.820')]
[2023-02-25 13:59:22,454][15051] Updated weights for policy 0, policy_version 880 (0.0025)
[2023-02-25 13:59:22,822][09251] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3707.2). Total num frames: 3604480. Throughput: 0: 918.4. Samples: 900602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:59:22,826][09251] Avg episode reward: [(0, '19.553')]
[2023-02-25 13:59:27,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3721.1). Total num frames: 3629056. Throughput: 0: 977.4. Samples: 906884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 13:59:27,824][09251] Avg episode reward: [(0, '19.208')]
[2023-02-25 13:59:31,221][15051] Updated weights for policy 0, policy_version 890 (0.0026)
[2023-02-25 13:59:32,821][09251] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3649536. Throughput: 0: 986.4. Samples: 910376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:59:32,829][09251] Avg episode reward: [(0, '19.788')]
[2023-02-25 13:59:37,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 3665920. Throughput: 0: 957.2. Samples: 916152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:59:37,825][09251] Avg episode reward: [(0, '19.655')]
[2023-02-25 13:59:42,822][09251] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 3682304. Throughput: 0: 938.2. Samples: 920722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 13:59:42,824][09251] Avg episode reward: [(0, '20.214')]
[2023-02-25 13:59:43,531][15051] Updated weights for policy 0, policy_version 900 (0.0019)
[2023-02-25 13:59:47,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3702784. Throughput: 0: 956.2. Samples: 923812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 13:59:47,825][09251] Avg episode reward: [(0, '21.611')]
[2023-02-25 13:59:52,195][15051] Updated weights for policy 0, policy_version 910 (0.0017)
[2023-02-25 13:59:52,821][09251] Fps is (10 sec: 4505.7, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 3727360. Throughput: 0: 988.4. Samples: 930910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-25 13:59:52,829][09251] Avg episode reward: [(0, '22.283')]
[2023-02-25 13:59:57,821][09251] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 3743744. Throughput: 0: 951.1. Samples: 936280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-25 13:59:57,829][09251] Avg episode reward: [(0, '22.198')]
[2023-02-25 14:00:02,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3735.0). Total num frames: 3760128. Throughput: 0: 940.7. Samples: 938490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 14:00:02,831][09251] Avg episode reward: [(0, '22.016')]
[2023-02-25 14:00:04,490][15051] Updated weights for policy 0, policy_version 920 (0.0017)
[2023-02-25 14:00:07,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3780608. Throughput: 0: 974.3. Samples: 944446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 14:00:07,824][09251] Avg episode reward: [(0, '22.612')]
[2023-02-25 14:00:12,822][09251] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3805184. Throughput: 0: 992.6. Samples: 951552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 14:00:12,823][09251] Avg episode reward: [(0, '21.155')]
[2023-02-25 14:00:13,523][15051] Updated weights for policy 0, policy_version 930 (0.0012)
[2023-02-25 14:00:17,824][09251] Fps is (10 sec: 4094.9, 60 sec: 3891.0, 300 sec: 3804.4). Total num frames: 3821568. Throughput: 0: 969.3. Samples: 953998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 14:00:17,831][09251] Avg episode reward: [(0, '21.582')]
[2023-02-25 14:00:22,821][09251] Fps is (10 sec: 2867.2, 60 sec: 3823.0, 300 sec: 3762.8). Total num frames: 3833856. Throughput: 0: 941.5. Samples: 958520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-25 14:00:22,827][09251] Avg episode reward: [(0, '21.115')]
[2023-02-25 14:00:22,837][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000936_3833856.pth...
[2023-02-25 14:00:22,972][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000715_2928640.pth
[2023-02-25 14:00:25,628][15051] Updated weights for policy 0, policy_version 940 (0.0016)
[2023-02-25 14:00:27,821][09251] Fps is (10 sec: 3687.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3858432. Throughput: 0: 978.1. Samples: 964738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 14:00:27,824][09251] Avg episode reward: [(0, '21.319')]
[2023-02-25 14:00:32,821][09251] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3883008. Throughput: 0: 985.7. Samples: 968168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 14:00:32,824][09251] Avg episode reward: [(0, '20.469')]
[2023-02-25 14:00:35,370][15051] Updated weights for policy 0, policy_version 950 (0.0029)
[2023-02-25 14:00:37,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3895296. Throughput: 0: 952.0. Samples: 973748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 14:00:37,825][09251] Avg episode reward: [(0, '20.579')]
[2023-02-25 14:00:42,821][09251] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3911680. Throughput: 0: 930.0. Samples: 978132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 14:00:42,824][09251] Avg episode reward: [(0, '21.135')]
[2023-02-25 14:00:46,927][15051] Updated weights for policy 0, policy_version 960 (0.0022)
[2023-02-25 14:00:47,821][09251] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3932160. Throughput: 0: 950.7. Samples: 981270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-25 14:00:47,824][09251] Avg episode reward: [(0, '20.834')]
[2023-02-25 14:00:52,822][09251] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3956736. Throughput: 0: 976.5. Samples: 988388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 14:00:52,827][09251] Avg episode reward: [(0, '19.000')]
[2023-02-25 14:00:57,375][15051] Updated weights for policy 0, policy_version 970 (0.0014)
[2023-02-25 14:00:57,825][09251] Fps is (10 sec: 4094.3, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 3973120. Throughput: 0: 933.4. Samples: 993560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-25 14:00:57,829][09251] Avg episode reward: [(0, '19.313')]
[2023-02-25 14:01:02,821][09251] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3989504. Throughput: 0: 928.8. Samples: 995790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-25 14:01:02,827][09251] Avg episode reward: [(0, '19.547')]
[2023-02-25 14:01:06,195][15037] Stopping Batcher_0...
[2023-02-25 14:01:06,195][15037] Loop batcher_evt_loop terminating...
[2023-02-25 14:01:06,197][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-25 14:01:06,196][09251] Component Batcher_0 stopped!
[2023-02-25 14:01:06,241][15051] Weights refcount: 2 0
[2023-02-25 14:01:06,248][15051] Stopping InferenceWorker_p0-w0...
[2023-02-25 14:01:06,249][15051] Loop inference_proc0-0_evt_loop terminating...
[2023-02-25 14:01:06,248][09251] Component InferenceWorker_p0-w0 stopped!
[2023-02-25 14:01:06,261][09251] Component RolloutWorker_w3 stopped!
[2023-02-25 14:01:06,261][15055] Stopping RolloutWorker_w3...
[2023-02-25 14:01:06,265][15055] Loop rollout_proc3_evt_loop terminating...
[2023-02-25 14:01:06,283][09251] Component RolloutWorker_w7 stopped!
[2023-02-25 14:01:06,283][15058] Stopping RolloutWorker_w7...
[2023-02-25 14:01:06,291][15058] Loop rollout_proc7_evt_loop terminating...
[2023-02-25 14:01:06,305][09251] Component RolloutWorker_w1 stopped!
[2023-02-25 14:01:06,307][15053] Stopping RolloutWorker_w1...
[2023-02-25 14:01:06,311][15053] Loop rollout_proc1_evt_loop terminating...
[2023-02-25 14:01:06,313][09251] Component RolloutWorker_w2 stopped!
[2023-02-25 14:01:06,318][15054] Stopping RolloutWorker_w2...
[2023-02-25 14:01:06,319][15054] Loop rollout_proc2_evt_loop terminating...
[2023-02-25 14:01:06,321][09251] Component RolloutWorker_w0 stopped!
[2023-02-25 14:01:06,322][15057] Stopping RolloutWorker_w5...
[2023-02-25 14:01:06,324][09251] Component RolloutWorker_w5 stopped!
[2023-02-25 14:01:06,328][15052] Stopping RolloutWorker_w0...
[2023-02-25 14:01:06,325][15057] Loop rollout_proc5_evt_loop terminating...
[2023-02-25 14:01:06,330][09251] Component RolloutWorker_w4 stopped!
[2023-02-25 14:01:06,333][15056] Stopping RolloutWorker_w4...
[2023-02-25 14:01:06,329][15052] Loop rollout_proc0_evt_loop terminating...
[2023-02-25 14:01:06,334][15056] Loop rollout_proc4_evt_loop terminating...
[2023-02-25 14:01:06,370][09251] Component RolloutWorker_w6 stopped!
[2023-02-25 14:01:06,374][15059] Stopping RolloutWorker_w6...
[2023-02-25 14:01:06,375][15059] Loop rollout_proc6_evt_loop terminating...
[2023-02-25 14:01:06,379][15037] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000828_3391488.pth
[2023-02-25 14:01:06,395][15037] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-25 14:01:06,576][09251] Component LearnerWorker_p0 stopped!
[2023-02-25 14:01:06,579][09251] Waiting for process learner_proc0 to stop...
[2023-02-25 14:01:06,576][15037] Stopping LearnerWorker_p0...
[2023-02-25 14:01:06,589][15037] Loop learner_proc0_evt_loop terminating...
[2023-02-25 14:01:08,455][09251] Waiting for process inference_proc0-0 to join...
[2023-02-25 14:01:08,809][09251] Waiting for process rollout_proc0 to join...
[2023-02-25 14:01:08,814][09251] Waiting for process rollout_proc1 to join...
[2023-02-25 14:01:09,171][09251] Waiting for process rollout_proc2 to join...
[2023-02-25 14:01:09,174][09251] Waiting for process rollout_proc3 to join...
[2023-02-25 14:01:09,177][09251] Waiting for process rollout_proc4 to join...
[2023-02-25 14:01:09,181][09251] Waiting for process rollout_proc5 to join...
[2023-02-25 14:01:09,184][09251] Waiting for process rollout_proc6 to join...
[2023-02-25 14:01:09,186][09251] Waiting for process rollout_proc7 to join...
[2023-02-25 14:01:09,189][09251] Batcher 0 profile tree view:
batching: 27.3169, releasing_batches: 0.0258
[2023-02-25 14:01:09,191][09251] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 539.4863
update_model: 7.9098
  weight_update: 0.0020
one_step: 0.0023
  handle_policy_step: 505.9325
    deserialize: 15.1174, stack: 3.0819, obs_to_device_normalize: 112.2852, forward: 240.7939, send_messages: 26.8329
    prepare_outputs: 82.1689
      to_cpu: 51.1689
[2023-02-25 14:01:09,193][09251] Learner 0 profile tree view:
misc: 0.0055, prepare_batch: 16.9510
train: 75.4955
  epoch_init: 0.0060, minibatch_init: 0.0061, losses_postprocess: 0.5422, kl_divergence: 0.6237, after_optimizer: 33.2131
  calculate_losses: 26.6416
    losses_init: 0.0035, forward_head: 1.6337, bptt_initial: 17.6563, tail: 1.0204, advantages_returns: 0.2707, losses: 3.4630
    bptt: 2.2690
      bptt_forward_core: 2.2113
  update: 13.8212
    clip: 1.3897
[2023-02-25 14:01:09,195][09251] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3841, enqueue_policy_requests: 148.0746, env_step: 820.2675, overhead: 21.0997, complete_rollouts: 7.5083
save_policy_outputs: 20.3495
  split_output_tensors: 9.7836
[2023-02-25 14:01:09,197][09251] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3430, enqueue_policy_requests: 147.7431, env_step: 823.5338, overhead: 20.1045, complete_rollouts: 6.9078
save_policy_outputs: 19.5773
  split_output_tensors: 9.3412
[2023-02-25 14:01:09,200][09251] Loop Runner_EvtLoop terminating...
[2023-02-25 14:01:09,202][09251] Runner profile tree view:
main_loop: 1121.9126
[2023-02-25 14:01:09,204][09251] Collected {0: 4005888}, FPS: 3570.6
[2023-02-25 14:11:30,064][09251] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-25 14:11:30,065][09251] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-25 14:11:30,069][09251] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-25 14:11:30,071][09251] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-25 14:11:30,074][09251] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-25 14:11:30,077][09251] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-25 14:11:30,080][09251] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-25 14:11:30,085][09251] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-25 14:11:30,086][09251] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-25 14:11:30,087][09251] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-25 14:11:30,088][09251] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-25 14:11:30,090][09251] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-25 14:11:30,091][09251] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-25 14:11:30,093][09251] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-25 14:11:30,094][09251] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-25 14:11:30,124][09251] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-25 14:11:30,127][09251] RunningMeanStd input shape: (3, 72, 128)
[2023-02-25 14:11:30,132][09251] RunningMeanStd input shape: (1,)
[2023-02-25 14:11:30,153][09251] ConvEncoder: input_channels=3
[2023-02-25 14:11:30,827][09251] Conv encoder output size: 512
[2023-02-25 14:11:30,829][09251] Policy head output size: 512
[2023-02-25 14:11:33,293][09251] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-25 14:11:34,729][09251] Num frames 100...
[2023-02-25 14:11:34,888][09251] Num frames 200...
[2023-02-25 14:11:35,049][09251] Num frames 300...
[2023-02-25 14:11:35,213][09251] Num frames 400...
[2023-02-25 14:11:35,373][09251] Num frames 500...
[2023-02-25 14:11:35,531][09251] Num frames 600...
[2023-02-25 14:11:35,694][09251] Num frames 700...
[2023-02-25 14:11:35,846][09251] Num frames 800...
[2023-02-25 14:11:36,009][09251] Num frames 900...
[2023-02-25 14:11:36,181][09251] Num frames 1000...
[2023-02-25 14:11:36,343][09251] Num frames 1100...
[2023-02-25 14:11:36,509][09251] Num frames 1200...
[2023-02-25 14:11:36,671][09251] Num frames 1300...
[2023-02-25 14:11:36,834][09251] Num frames 1400...
[2023-02-25 14:11:36,991][09251] Num frames 1500...
[2023-02-25 14:11:37,158][09251] Num frames 1600...
[2023-02-25 14:11:37,214][09251] Avg episode rewards: #0: 38.000, true rewards: #0: 16.000
[2023-02-25 14:11:37,216][09251] Avg episode reward: 38.000, avg true_objective: 16.000
[2023-02-25 14:11:37,397][09251] Num frames 1700...
[2023-02-25 14:11:37,556][09251] Num frames 1800...
[2023-02-25 14:11:37,700][09251] Num frames 1900...
[2023-02-25 14:11:37,812][09251] Num frames 2000...
[2023-02-25 14:11:37,932][09251] Num frames 2100...
[2023-02-25 14:11:38,053][09251] Num frames 2200...
[2023-02-25 14:11:38,180][09251] Num frames 2300...
[2023-02-25 14:11:38,304][09251] Num frames 2400...
[2023-02-25 14:11:38,424][09251] Num frames 2500...
[2023-02-25 14:11:38,540][09251] Num frames 2600...
[2023-02-25 14:11:38,663][09251] Num frames 2700...
[2023-02-25 14:11:38,780][09251] Num frames 2800...
[2023-02-25 14:11:38,854][09251] Avg episode rewards: #0: 33.080, true rewards: #0: 14.080
[2023-02-25 14:11:38,856][09251] Avg episode reward: 33.080, avg true_objective: 14.080
[2023-02-25 14:11:38,958][09251] Num frames 2900...
[2023-02-25 14:11:39,074][09251] Num frames 3000...
[2023-02-25 14:11:39,189][09251] Num frames 3100...
[2023-02-25 14:11:39,303][09251] Num frames 3200...
[2023-02-25 14:11:39,426][09251] Num frames 3300...
[2023-02-25 14:11:39,541][09251] Num frames 3400...
[2023-02-25 14:11:39,655][09251] Num frames 3500...
[2023-02-25 14:11:39,768][09251] Num frames 3600...
[2023-02-25 14:11:39,881][09251] Num frames 3700...
[2023-02-25 14:11:39,992][09251] Num frames 3800...
[2023-02-25 14:11:40,112][09251] Num frames 3900...
[2023-02-25 14:11:40,224][09251] Num frames 4000...
[2023-02-25 14:11:40,374][09251] Avg episode rewards: #0: 31.603, true rewards: #0: 13.603
[2023-02-25 14:11:40,376][09251] Avg episode reward: 31.603, avg true_objective: 13.603
[2023-02-25 14:11:40,399][09251] Num frames 4100...
[2023-02-25 14:11:40,522][09251] Num frames 4200...
[2023-02-25 14:11:40,637][09251] Num frames 4300...
[2023-02-25 14:11:40,748][09251] Num frames 4400...
[2023-02-25 14:11:40,867][09251] Num frames 4500...
[2023-02-25 14:11:40,979][09251] Num frames 4600...
[2023-02-25 14:11:41,094][09251] Num frames 4700...
[2023-02-25 14:11:41,246][09251] Avg episode rewards: #0: 26.713, true rewards: #0: 11.962
[2023-02-25 14:11:41,248][09251] Avg episode reward: 26.713, avg true_objective: 11.962
[2023-02-25 14:11:41,268][09251] Num frames 4800...
[2023-02-25 14:11:41,379][09251] Num frames 4900...
[2023-02-25 14:11:41,497][09251] Num frames 5000...
[2023-02-25 14:11:41,610][09251] Num frames 5100...
[2023-02-25 14:11:41,724][09251] Num frames 5200...
[2023-02-25 14:11:41,842][09251] Num frames 5300...
[2023-02-25 14:11:41,921][09251] Avg episode rewards: #0: 23.040, true rewards: #0: 10.640
[2023-02-25 14:11:41,922][09251] Avg episode reward: 23.040, avg true_objective: 10.640
[2023-02-25 14:11:42,015][09251] Num frames 5400...
[2023-02-25 14:11:42,132][09251] Num frames 5500...
[2023-02-25 14:11:42,245][09251] Num frames 5600...
[2023-02-25 14:11:42,356][09251] Num frames 5700...
[2023-02-25 14:11:42,480][09251] Num frames 5800...
[2023-02-25 14:11:42,640][09251] Avg episode rewards: #0: 20.827, true rewards: #0: 9.827
[2023-02-25 14:11:42,642][09251] Avg episode reward: 20.827, avg true_objective: 9.827
[2023-02-25 14:11:42,652][09251] Num frames 5900...
[2023-02-25 14:11:42,772][09251] Num frames 6000...
[2023-02-25 14:11:42,884][09251] Num frames 6100...
[2023-02-25 14:11:43,010][09251] Num frames 6200...
[2023-02-25 14:11:43,127][09251] Num frames 6300...
[2023-02-25 14:11:43,197][09251] Avg episode rewards: #0: 18.731, true rewards: #0: 9.017
[2023-02-25 14:11:43,200][09251] Avg episode reward: 18.731, avg true_objective: 9.017
[2023-02-25 14:11:43,303][09251] Num frames 6400...
[2023-02-25 14:11:43,415][09251] Num frames 6500...
[2023-02-25 14:11:43,530][09251] Num frames 6600...
[2023-02-25 14:11:43,639][09251] Num frames 6700...
[2023-02-25 14:11:43,748][09251] Num frames 6800...
[2023-02-25 14:11:43,865][09251] Num frames 6900...
[2023-02-25 14:11:43,971][09251] Num frames 7000...
[2023-02-25 14:11:44,084][09251] Num frames 7100...
[2023-02-25 14:11:44,210][09251] Num frames 7200...
[2023-02-25 14:11:44,324][09251] Num frames 7300...
[2023-02-25 14:11:44,455][09251] Avg episode rewards: #0: 18.960, true rewards: #0: 9.210
[2023-02-25 14:11:44,456][09251] Avg episode reward: 18.960, avg true_objective: 9.210
[2023-02-25 14:11:44,507][09251] Num frames 7400...
[2023-02-25 14:11:44,617][09251] Num frames 7500...
[2023-02-25 14:11:44,729][09251] Num frames 7600...
[2023-02-25 14:11:44,842][09251] Num frames 7700...
[2023-02-25 14:11:44,961][09251] Num frames 7800...
[2023-02-25 14:11:45,077][09251] Num frames 7900...
[2023-02-25 14:11:45,195][09251] Num frames 8000...
[2023-02-25 14:11:45,311][09251] Num frames 8100...
[2023-02-25 14:11:45,425][09251] Num frames 8200...
[2023-02-25 14:11:45,550][09251] Num frames 8300...
[2023-02-25 14:11:45,663][09251] Num frames 8400...
[2023-02-25 14:11:45,776][09251] Num frames 8500...
[2023-02-25 14:11:45,897][09251] Num frames 8600...
[2023-02-25 14:11:46,004][09251] Avg episode rewards: #0: 20.273, true rewards: #0: 9.607
[2023-02-25 14:11:46,006][09251] Avg episode reward: 20.273, avg true_objective: 9.607
[2023-02-25 14:11:46,073][09251] Num frames 8700...
[2023-02-25 14:11:46,187][09251] Num frames 8800...
[2023-02-25 14:11:46,303][09251] Num frames 8900...
[2023-02-25 14:11:46,417][09251] Num frames 9000...
[2023-02-25 14:11:46,537][09251] Num frames 9100...
[2023-02-25 14:11:46,659][09251] Num frames 9200...
[2023-02-25 14:11:46,775][09251] Num frames 9300...
[2023-02-25 14:11:46,886][09251] Num frames 9400...
[2023-02-25 14:11:46,999][09251] Num frames 9500...
[2023-02-25 14:11:47,067][09251] Avg episode rewards: #0: 20.210, true rewards: #0: 9.510
[2023-02-25 14:11:47,069][09251] Avg episode reward: 20.210, avg true_objective: 9.510
[2023-02-25 14:12:47,559][09251] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-25 14:14:10,421][09251] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-25 14:14:10,423][09251] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-25 14:14:10,425][09251] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-25 14:14:10,429][09251] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-25 14:14:10,431][09251] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-25 14:14:10,434][09251] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-25 14:14:10,435][09251] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-25 14:14:10,437][09251] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-25 14:14:10,439][09251] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-25 14:14:10,440][09251] Adding new argument 'hf_repository'='yizhangliu/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-25 14:14:10,441][09251] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-25 14:14:10,443][09251] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-25 14:14:10,445][09251] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-25 14:14:10,447][09251] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-25 14:14:10,448][09251] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-25 14:14:10,476][09251] RunningMeanStd input shape: (3, 72, 128)
[2023-02-25 14:14:10,478][09251] RunningMeanStd input shape: (1,)
[2023-02-25 14:14:10,491][09251] ConvEncoder: input_channels=3
[2023-02-25 14:14:10,528][09251] Conv encoder output size: 512
[2023-02-25 14:14:10,529][09251] Policy head output size: 512
[2023-02-25 14:14:10,549][09251] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-25 14:14:10,985][09251] Num frames 100...
[2023-02-25 14:14:11,096][09251] Num frames 200...
[2023-02-25 14:14:11,212][09251] Num frames 300...
[2023-02-25 14:14:11,332][09251] Num frames 400...
[2023-02-25 14:14:11,446][09251] Num frames 500...
[2023-02-25 14:14:11,560][09251] Num frames 600...
[2023-02-25 14:14:11,669][09251] Num frames 700...
[2023-02-25 14:14:11,785][09251] Num frames 800...
[2023-02-25 14:14:11,956][09251] Avg episode rewards: #0: 17.960, true rewards: #0: 8.960
[2023-02-25 14:14:11,960][09251] Avg episode reward: 17.960, avg true_objective: 8.960
[2023-02-25 14:14:11,968][09251] Num frames 900...
[2023-02-25 14:14:12,078][09251] Num frames 1000...
[2023-02-25 14:14:12,191][09251] Num frames 1100...
[2023-02-25 14:14:12,306][09251] Num frames 1200...
[2023-02-25 14:14:12,450][09251] Avg episode rewards: #0: 13.900, true rewards: #0: 6.400
[2023-02-25 14:14:12,452][09251] Avg episode reward: 13.900, avg true_objective: 6.400
[2023-02-25 14:14:12,480][09251] Num frames 1300...
[2023-02-25 14:14:12,594][09251] Num frames 1400...
[2023-02-25 14:14:12,703][09251] Num frames 1500...
[2023-02-25 14:14:12,821][09251] Num frames 1600...
[2023-02-25 14:14:12,938][09251] Num frames 1700...
[2023-02-25 14:14:13,061][09251] Avg episode rewards: #0: 11.867, true rewards: #0: 5.867
[2023-02-25 14:14:13,062][09251] Avg episode reward: 11.867, avg true_objective: 5.867
[2023-02-25 14:14:13,116][09251] Num frames 1800...
[2023-02-25 14:14:13,236][09251] Num frames 1900...
[2023-02-25 14:14:13,351][09251] Num frames 2000...
[2023-02-25 14:14:13,468][09251] Num frames 2100...
[2023-02-25 14:14:13,601][09251] Num frames 2200...
[2023-02-25 14:14:13,751][09251] Num frames 2300...
[2023-02-25 14:14:13,914][09251] Num frames 2400...
[2023-02-25 14:14:13,969][09251] Avg episode rewards: #0: 12.500, true rewards: #0: 6.000
[2023-02-25 14:14:13,972][09251] Avg episode reward: 12.500, avg true_objective: 6.000
[2023-02-25 14:14:14,126][09251] Num frames 2500...
[2023-02-25 14:14:14,279][09251] Num frames 2600...
[2023-02-25 14:14:14,436][09251] Num frames 2700...
[2023-02-25 14:14:14,603][09251] Num frames 2800...
[2023-02-25 14:14:14,764][09251] Num frames 2900...
[2023-02-25 14:14:14,922][09251] Num frames 3000...
[2023-02-25 14:14:14,993][09251] Avg episode rewards: #0: 12.214, true rewards: #0: 6.014
[2023-02-25 14:14:14,998][09251] Avg episode reward: 12.214, avg true_objective: 6.014
[2023-02-25 14:14:15,157][09251] Num frames 3100...
[2023-02-25 14:14:15,316][09251] Num frames 3200...
[2023-02-25 14:14:15,485][09251] Num frames 3300...
[2023-02-25 14:14:15,640][09251] Num frames 3400...
[2023-02-25 14:14:15,799][09251] Num frames 3500...
[2023-02-25 14:14:15,961][09251] Num frames 3600...
[2023-02-25 14:14:16,130][09251] Num frames 3700...
[2023-02-25 14:14:16,296][09251] Num frames 3800...
[2023-02-25 14:14:16,461][09251] Num frames 3900...
[2023-02-25 14:14:16,629][09251] Num frames 4000...
[2023-02-25 14:14:16,736][09251] Avg episode rewards: #0: 14.050, true rewards: #0: 6.717
[2023-02-25 14:14:16,737][09251] Avg episode reward: 14.050, avg true_objective: 6.717
[2023-02-25 14:14:16,859][09251] Num frames 4100...
[2023-02-25 14:14:17,018][09251] Num frames 4200...
[2023-02-25 14:14:17,184][09251] Num frames 4300...
[2023-02-25 14:14:17,306][09251] Num frames 4400...
[2023-02-25 14:14:17,421][09251] Num frames 4500...
[2023-02-25 14:14:17,539][09251] Num frames 4600...
[2023-02-25 14:14:17,653][09251] Num frames 4700...
[2023-02-25 14:14:17,764][09251] Num frames 4800...
[2023-02-25 14:14:17,895][09251] Avg episode rewards: #0: 14.803, true rewards: #0: 6.946
[2023-02-25 14:14:17,897][09251] Avg episode reward: 14.803, avg true_objective: 6.946
[2023-02-25 14:14:17,943][09251] Num frames 4900...
[2023-02-25 14:14:18,063][09251] Num frames 5000...
[2023-02-25 14:14:18,178][09251] Num frames 5100...
[2023-02-25 14:14:18,296][09251] Num frames 5200...
[2023-02-25 14:14:18,408][09251] Num frames 5300...
[2023-02-25 14:14:18,527][09251] Num frames 5400...
[2023-02-25 14:14:18,591][09251] Avg episode rewards: #0: 14.008, true rewards: #0: 6.757
[2023-02-25 14:14:18,593][09251] Avg episode reward: 14.008, avg true_objective: 6.757
[2023-02-25 14:14:18,701][09251] Num frames 5500...
[2023-02-25 14:14:18,829][09251] Num frames 5600...
[2023-02-25 14:14:18,949][09251] Num frames 5700...
[2023-02-25 14:14:19,065][09251] Num frames 5800...
[2023-02-25 14:14:19,180][09251] Num frames 5900...
[2023-02-25 14:14:19,296][09251] Num frames 6000...
[2023-02-25 14:14:19,413][09251] Num frames 6100...
[2023-02-25 14:14:19,526][09251] Num frames 6200...
[2023-02-25 14:14:19,644][09251] Num frames 6300...
[2023-02-25 14:14:19,762][09251] Num frames 6400...
[2023-02-25 14:14:19,922][09251] Avg episode rewards: #0: 15.216, true rewards: #0: 7.216
[2023-02-25 14:14:19,924][09251] Avg episode reward: 15.216, avg true_objective: 7.216
[2023-02-25 14:14:19,943][09251] Num frames 6500...
[2023-02-25 14:14:20,056][09251] Num frames 6600...
[2023-02-25 14:14:20,173][09251] Num frames 6700...
[2023-02-25 14:14:20,286][09251] Num frames 6800...
[2023-02-25 14:14:20,417][09251] Num frames 6900...
[2023-02-25 14:14:20,530][09251] Num frames 7000...
[2023-02-25 14:14:20,647][09251] Num frames 7100...
[2023-02-25 14:14:20,756][09251] Num frames 7200...
[2023-02-25 14:14:20,914][09251] Avg episode rewards: #0: 15.386, true rewards: #0: 7.286
[2023-02-25 14:14:20,916][09251] Avg episode reward: 15.386, avg true_objective: 7.286
[2023-02-25 14:15:04,551][09251] Replay video saved to /content/train_dir/default_experiment/replay.mp4!