[2023-11-22 03:55:25,159][05156] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-11-22 03:55:25,165][05156] Rollout worker 0 uses device cpu
[2023-11-22 03:55:25,168][05156] Rollout worker 1 uses device cpu
[2023-11-22 03:55:25,169][05156] Rollout worker 2 uses device cpu
[2023-11-22 03:55:25,171][05156] Rollout worker 3 uses device cpu
[2023-11-22 03:55:25,172][05156] Rollout worker 4 uses device cpu
[2023-11-22 03:55:25,173][05156] Rollout worker 5 uses device cpu
[2023-11-22 03:55:25,175][05156] Rollout worker 6 uses device cpu
[2023-11-22 03:55:25,176][05156] Rollout worker 7 uses device cpu
[2023-11-22 03:55:25,244][05156] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-22 03:55:25,246][05156] InferenceWorker_p0-w0: min num requests: 2
[2023-11-22 03:55:25,276][05156] Starting all processes...
[2023-11-22 03:55:25,277][05156] Starting process learner_proc0
[2023-11-22 03:55:25,334][05156] Starting all processes...
[2023-11-22 03:55:25,342][05156] Starting process inference_proc0-0
[2023-11-22 03:55:25,342][05156] Starting process rollout_proc0
[2023-11-22 03:55:25,344][05156] Starting process rollout_proc1
[2023-11-22 03:55:25,344][05156] Starting process rollout_proc2
[2023-11-22 03:55:25,344][05156] Starting process rollout_proc3
[2023-11-22 03:55:25,344][05156] Starting process rollout_proc4
[2023-11-22 03:55:25,344][05156] Starting process rollout_proc5
[2023-11-22 03:55:25,344][05156] Starting process rollout_proc6
[2023-11-22 03:55:25,344][05156] Starting process rollout_proc7
[2023-11-22 03:55:43,001][06878] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-22 03:55:43,002][06878] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-11-22 03:55:43,071][06878] Num visible devices: 1
[2023-11-22 03:55:43,083][06896] Worker 4 uses CPU cores [0]
[2023-11-22 03:55:43,085][06898] Worker 6 uses CPU cores [0]
[2023-11-22 03:55:43,096][06897] Worker 5 uses CPU cores [1]
[2023-11-22 03:55:43,101][06878] Starting seed is not provided
[2023-11-22 03:55:43,101][06878] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-22 03:55:43,101][06878] Initializing actor-critic model on device cuda:0
[2023-11-22 03:55:43,102][06878] RunningMeanStd input shape: (3, 72, 128)
[2023-11-22 03:55:43,103][06892] Worker 0 uses CPU cores [0]
[2023-11-22 03:55:43,103][06899] Worker 7 uses CPU cores [1]
[2023-11-22 03:55:43,108][06878] RunningMeanStd input shape: (1,)
[2023-11-22 03:55:43,112][06894] Worker 2 uses CPU cores [0]
[2023-11-22 03:55:43,134][06891] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-22 03:55:43,135][06891] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-11-22 03:55:43,150][06891] Num visible devices: 1
[2023-11-22 03:55:43,162][06878] ConvEncoder: input_channels=3
[2023-11-22 03:55:43,189][06895] Worker 3 uses CPU cores [1]
[2023-11-22 03:55:43,240][06893] Worker 1 uses CPU cores [1]
[2023-11-22 03:55:43,299][06878] Conv encoder output size: 512
[2023-11-22 03:55:43,300][06878] Policy head output size: 512
[2023-11-22 03:55:43,314][06878] Created Actor Critic model with architecture:
[2023-11-22 03:55:43,314][06878] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-11-22 03:55:43,434][06878] Using optimizer
[2023-11-22 03:55:44,393][06878] No checkpoints found
[2023-11-22 03:55:44,394][06878] Did not load from checkpoint, starting from scratch!
[2023-11-22 03:55:44,394][06878] Initialized policy 0 weights for model version 0
[2023-11-22 03:55:44,397][06878] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-11-22 03:55:44,402][06878] LearnerWorker_p0 finished initialization!
[2023-11-22 03:55:44,487][06891] RunningMeanStd input shape: (3, 72, 128)
[2023-11-22 03:55:44,488][06891] RunningMeanStd input shape: (1,)
[2023-11-22 03:55:44,500][06891] ConvEncoder: input_channels=3
[2023-11-22 03:55:44,610][06891] Conv encoder output size: 512
[2023-11-22 03:55:44,611][06891] Policy head output size: 512
[2023-11-22 03:55:44,702][05156] Inference worker 0-0 is ready!
[2023-11-22 03:55:44,704][05156] All inference workers are ready! Signal rollout workers to start!
[2023-11-22 03:55:44,895][06894] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:44,897][06892] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:44,899][06896] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:44,903][06898] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:45,062][06893] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:45,056][06895] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:45,061][06897] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:45,067][06899] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-11-22 03:55:45,238][05156] Heartbeat connected on Batcher_0
[2023-11-22 03:55:45,241][05156] Heartbeat connected on LearnerWorker_p0
[2023-11-22 03:55:45,282][05156] Heartbeat connected on InferenceWorker_p0-w0
[2023-11-22 03:55:46,195][05156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-22 03:55:46,612][06894] Decorrelating experience for 0 frames...
[2023-11-22 03:55:46,630][06892] Decorrelating experience for 0 frames...
[2023-11-22 03:55:46,621][06898] Decorrelating experience for 0 frames...
[2023-11-22 03:55:46,656][06897] Decorrelating experience for 0 frames...
[2023-11-22 03:55:46,659][06895] Decorrelating experience for 0 frames...
[2023-11-22 03:55:47,764][06899] Decorrelating experience for 0 frames...
[2023-11-22 03:55:47,783][06895] Decorrelating experience for 32 frames...
[2023-11-22 03:55:48,711][06896] Decorrelating experience for 0 frames...
[2023-11-22 03:55:48,719][06892] Decorrelating experience for 32 frames...
[2023-11-22 03:55:48,714][06898] Decorrelating experience for 32 frames...
[2023-11-22 03:55:48,822][06894] Decorrelating experience for 32 frames...
[2023-11-22 03:55:50,144][06893] Decorrelating experience for 0 frames...
[2023-11-22 03:55:50,163][06899] Decorrelating experience for 32 frames...
[2023-11-22 03:55:50,623][06895] Decorrelating experience for 64 frames...
[2023-11-22 03:55:50,959][06897] Decorrelating experience for 32 frames...
[2023-11-22 03:55:51,054][06898] Decorrelating experience for 64 frames...
[2023-11-22 03:55:51,109][06894] Decorrelating experience for 64 frames...
[2023-11-22 03:55:51,195][05156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-22 03:55:51,902][06896] Decorrelating experience for 32 frames...
[2023-11-22 03:55:52,309][06892] Decorrelating experience for 64 frames...
[2023-11-22 03:55:52,585][06895] Decorrelating experience for 96 frames...
[2023-11-22 03:55:52,715][06899] Decorrelating experience for 64 frames...
[2023-11-22 03:55:52,850][05156] Heartbeat connected on RolloutWorker_w3
[2023-11-22 03:55:53,091][06898] Decorrelating experience for 96 frames...
[2023-11-22 03:55:53,533][05156] Heartbeat connected on RolloutWorker_w6
[2023-11-22 03:55:54,926][06896] Decorrelating experience for 64 frames...
[2023-11-22 03:55:55,101][06892] Decorrelating experience for 96 frames...
[2023-11-22 03:55:55,428][06894] Decorrelating experience for 96 frames...
[2023-11-22 03:55:55,604][06893] Decorrelating experience for 32 frames...
[2023-11-22 03:55:55,675][05156] Heartbeat connected on RolloutWorker_w0
[2023-11-22 03:55:56,032][05156] Heartbeat connected on RolloutWorker_w2
[2023-11-22 03:55:56,196][05156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-11-22 03:55:56,197][05156] Avg episode reward: [(0, '1.927')]
[2023-11-22 03:55:56,242][06897] Decorrelating experience for 64 frames...
[2023-11-22 03:55:56,368][06899] Decorrelating experience for 96 frames...
[2023-11-22 03:55:56,666][05156] Heartbeat connected on RolloutWorker_w7
[2023-11-22 03:55:58,593][06893] Decorrelating experience for 64 frames...
[2023-11-22 03:55:58,926][06896] Decorrelating experience for 96 frames...
[2023-11-22 03:55:58,937][06897] Decorrelating experience for 96 frames...
[2023-11-22 03:55:59,566][05156] Heartbeat connected on RolloutWorker_w4
[2023-11-22 03:55:59,568][05156] Heartbeat connected on RolloutWorker_w5
[2023-11-22 03:55:59,896][06878] Signal inference workers to stop experience collection...
[2023-11-22 03:55:59,909][06891] InferenceWorker_p0-w0: stopping experience collection
[2023-11-22 03:56:00,160][06893] Decorrelating experience for 96 frames...
[2023-11-22 03:56:00,242][05156] Heartbeat connected on RolloutWorker_w1
[2023-11-22 03:56:00,793][06878] Signal inference workers to resume experience collection...
[2023-11-22 03:56:00,794][06891] InferenceWorker_p0-w0: resuming experience collection
[2023-11-22 03:56:01,195][05156] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 158.4. Samples: 2376. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-11-22 03:56:01,198][05156] Avg episode reward: [(0, '2.925')]
[2023-11-22 03:56:06,197][05156] Fps is (10 sec: 2457.3, 60 sec: 1228.7, 300 sec: 1228.7). Total num frames: 24576. Throughput: 0: 340.4. Samples: 6808. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-11-22 03:56:06,200][05156] Avg episode reward: [(0, '3.772')]
[2023-11-22 03:56:11,195][05156] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 352.2. Samples: 8806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:56:11,202][05156] Avg episode reward: [(0, '3.840')]
[2023-11-22 03:56:11,716][06891] Updated weights for policy 0, policy_version 10 (0.0021)
[2023-11-22 03:56:16,195][05156] Fps is (10 sec: 2458.0, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 49152. Throughput: 0: 422.9. Samples: 12688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:56:16,203][05156] Avg episode reward: [(0, '4.327')]
[2023-11-22 03:56:21,196][05156] Fps is (10 sec: 2457.4, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 470.2. Samples: 16458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:56:21,198][05156] Avg episode reward: [(0, '4.323')]
[2023-11-22 03:56:26,195][05156] Fps is (10 sec: 2867.2, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 77824. Throughput: 0: 464.7. Samples: 18588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:56:26,198][05156] Avg episode reward: [(0, '4.465')]
[2023-11-22 03:56:26,357][06891] Updated weights for policy 0, policy_version 20 (0.0027)
[2023-11-22 03:56:31,195][05156] Fps is (10 sec: 3686.7, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 98304. Throughput: 0: 541.6. Samples: 24372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 03:56:31,198][05156] Avg episode reward: [(0, '4.436')]
[2023-11-22 03:56:36,197][05156] Fps is (10 sec: 3685.7, 60 sec: 2293.7, 300 sec: 2293.7). Total num frames: 114688. Throughput: 0: 649.4. Samples: 29224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:56:36,204][05156] Avg episode reward: [(0, '4.324')]
[2023-11-22 03:56:36,210][06878] Saving new best policy, reward=4.324!
[2023-11-22 03:56:38,850][06891] Updated weights for policy 0, policy_version 30 (0.0019)
[2023-11-22 03:56:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2308.7, 300 sec: 2308.7). Total num frames: 126976. Throughput: 0: 691.3. Samples: 31128. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:56:41,198][05156] Avg episode reward: [(0, '4.422')]
[2023-11-22 03:56:41,216][06878] Saving new best policy, reward=4.422!
[2023-11-22 03:56:46,196][05156] Fps is (10 sec: 2457.9, 60 sec: 2321.0, 300 sec: 2321.0). Total num frames: 139264. Throughput: 0: 722.3. Samples: 34882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:56:46,199][05156] Avg episode reward: [(0, '4.500')]
[2023-11-22 03:56:46,204][06878] Saving new best policy, reward=4.500!
[2023-11-22 03:56:51,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 2394.6). Total num frames: 155648. Throughput: 0: 725.6. Samples: 39458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:56:51,198][05156] Avg episode reward: [(0, '4.462')]
[2023-11-22 03:56:52,718][06891] Updated weights for policy 0, policy_version 40 (0.0033)
[2023-11-22 03:56:56,195][05156] Fps is (10 sec: 3686.6, 60 sec: 2935.5, 300 sec: 2516.1). Total num frames: 176128. Throughput: 0: 748.0. Samples: 42464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:56:56,202][05156] Avg episode reward: [(0, '4.405')]
[2023-11-22 03:57:01,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2566.8). Total num frames: 192512. Throughput: 0: 781.9. Samples: 47872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:57:01,205][05156] Avg episode reward: [(0, '4.348')]
[2023-11-22 03:57:05,879][06891] Updated weights for policy 0, policy_version 50 (0.0016)
[2023-11-22 03:57:06,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2560.0). Total num frames: 204800. Throughput: 0: 780.5. Samples: 51582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:57:06,198][05156] Avg episode reward: [(0, '4.461')]
[2023-11-22 03:57:11,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2554.0). Total num frames: 217088. Throughput: 0: 774.5. Samples: 53440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 03:57:11,198][05156] Avg episode reward: [(0, '4.392')]
[2023-11-22 03:57:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2548.6). Total num frames: 229376. Throughput: 0: 730.8. Samples: 57256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:57:16,204][05156] Avg episode reward: [(0, '4.354')]
[2023-11-22 03:57:19,550][06891] Updated weights for policy 0, policy_version 60 (0.0020)
[2023-11-22 03:57:21,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2630.1). Total num frames: 249856. Throughput: 0: 753.2. Samples: 63118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:57:21,202][05156] Avg episode reward: [(0, '4.367')]
[2023-11-22 03:57:21,213][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000061_249856.pth...
[2023-11-22 03:57:26,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2662.4). Total num frames: 266240. Throughput: 0: 775.2. Samples: 66010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:57:26,198][05156] Avg episode reward: [(0, '4.447')]
[2023-11-22 03:57:31,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2652.6). Total num frames: 278528. Throughput: 0: 778.6. Samples: 69920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:57:31,199][05156] Avg episode reward: [(0, '4.333')]
[2023-11-22 03:57:33,633][06891] Updated weights for policy 0, policy_version 70 (0.0013)
[2023-11-22 03:57:36,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2935.5, 300 sec: 2643.7). Total num frames: 290816. Throughput: 0: 759.4. Samples: 73634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-11-22 03:57:36,200][05156] Avg episode reward: [(0, '4.436')]
[2023-11-22 03:57:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2671.3). Total num frames: 307200. Throughput: 0: 734.9. Samples: 75534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 03:57:41,200][05156] Avg episode reward: [(0, '4.325')]
[2023-11-22 03:57:46,195][05156] Fps is (10 sec: 3277.3, 60 sec: 3072.0, 300 sec: 2696.5). Total num frames: 323584. Throughput: 0: 735.2. Samples: 80958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:57:46,203][05156] Avg episode reward: [(0, '4.200')]
[2023-11-22 03:57:46,211][06891] Updated weights for policy 0, policy_version 80 (0.0018)
[2023-11-22 03:57:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2752.5). Total num frames: 344064. Throughput: 0: 780.4. Samples: 86702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:57:51,205][05156] Avg episode reward: [(0, '4.222')]
[2023-11-22 03:57:56,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 2741.2). Total num frames: 356352. Throughput: 0: 779.7. Samples: 88528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:57:56,198][05156] Avg episode reward: [(0, '4.314')]
[2023-11-22 03:58:00,086][06891] Updated weights for policy 0, policy_version 90 (0.0017)
[2023-11-22 03:58:01,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2935.5, 300 sec: 2730.7). Total num frames: 368640. Throughput: 0: 780.6. Samples: 92382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:58:01,204][05156] Avg episode reward: [(0, '4.416')]
[2023-11-22 03:58:06,195][05156] Fps is (10 sec: 2457.7, 60 sec: 2935.5, 300 sec: 2720.9). Total num frames: 380928. Throughput: 0: 733.6. Samples: 96132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:58:06,203][05156] Avg episode reward: [(0, '4.405')]
[2023-11-22 03:58:11,195][05156] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 2768.3). Total num frames: 401408. Throughput: 0: 733.9. Samples: 99036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:58:11,202][05156] Avg episode reward: [(0, '4.574')]
[2023-11-22 03:58:11,213][06878] Saving new best policy, reward=4.574!
[2023-11-22 03:58:12,693][06891] Updated weights for policy 0, policy_version 100 (0.0019)
[2023-11-22 03:58:16,198][05156] Fps is (10 sec: 3685.4, 60 sec: 3140.1, 300 sec: 2785.2). Total num frames: 417792. Throughput: 0: 777.8. Samples: 104922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:58:16,200][05156] Avg episode reward: [(0, '4.580')]
[2023-11-22 03:58:16,204][06878] Saving new best policy, reward=4.580!
[2023-11-22 03:58:21,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2774.7). Total num frames: 430080. Throughput: 0: 766.0. Samples: 108102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:58:21,198][05156] Avg episode reward: [(0, '4.412')]
[2023-11-22 03:58:26,195][05156] Fps is (10 sec: 2048.5, 60 sec: 2867.2, 300 sec: 2739.2). Total num frames: 438272. Throughput: 0: 757.4. Samples: 109618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:58:26,204][05156] Avg episode reward: [(0, '4.404')]
[2023-11-22 03:58:31,195][05156] Fps is (10 sec: 1638.4, 60 sec: 2798.9, 300 sec: 2705.8). Total num frames: 446464. Throughput: 0: 698.1. Samples: 112374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:58:31,209][05156] Avg episode reward: [(0, '4.585')]
[2023-11-22 03:58:31,227][06878] Saving new best policy, reward=4.585!
[2023-11-22 03:58:31,258][06891] Updated weights for policy 0, policy_version 110 (0.0015)
[2023-11-22 03:58:36,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2799.0, 300 sec: 2698.5). Total num frames: 458752. Throughput: 0: 633.6. Samples: 115214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 03:58:36,198][05156] Avg episode reward: [(0, '4.479')]
[2023-11-22 03:58:41,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2662.4, 300 sec: 2668.2). Total num frames: 466944. Throughput: 0: 618.7. Samples: 116368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:58:41,202][05156] Avg episode reward: [(0, '4.517')]
[2023-11-22 03:58:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2662.4, 300 sec: 2685.2). Total num frames: 483328. Throughput: 0: 629.8. Samples: 120724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:58:46,198][05156] Avg episode reward: [(0, '4.517')]
[2023-11-22 03:58:47,350][06891] Updated weights for policy 0, policy_version 120 (0.0029)
[2023-11-22 03:58:51,196][05156] Fps is (10 sec: 3686.2, 60 sec: 2662.4, 300 sec: 2723.3). Total num frames: 503808. Throughput: 0: 673.1. Samples: 126420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:58:51,202][05156] Avg episode reward: [(0, '4.571')]
[2023-11-22 03:58:56,197][05156] Fps is (10 sec: 3276.3, 60 sec: 2662.3, 300 sec: 2716.3). Total num frames: 516096. Throughput: 0: 646.8. Samples: 128144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:58:56,202][05156] Avg episode reward: [(0, '4.518')]
[2023-11-22 03:59:01,199][05156] Fps is (10 sec: 2456.9, 60 sec: 2662.3, 300 sec: 2709.6). Total num frames: 528384. Throughput: 0: 598.7. Samples: 131864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:59:01,201][05156] Avg episode reward: [(0, '4.593')]
[2023-11-22 03:59:01,221][06878] Saving new best policy, reward=4.593!
[2023-11-22 03:59:02,667][06891] Updated weights for policy 0, policy_version 130 (0.0019)
[2023-11-22 03:59:06,198][05156] Fps is (10 sec: 2457.4, 60 sec: 2662.3, 300 sec: 2703.3). Total num frames: 540672. Throughput: 0: 608.9. Samples: 135502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:59:06,205][05156] Avg episode reward: [(0, '4.560')]
[2023-11-22 03:59:11,195][05156] Fps is (10 sec: 2868.2, 60 sec: 2594.1, 300 sec: 2717.3). Total num frames: 557056. Throughput: 0: 632.2. Samples: 138068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:59:11,199][05156] Avg episode reward: [(0, '4.519')]
[2023-11-22 03:59:14,855][06891] Updated weights for policy 0, policy_version 140 (0.0027)
[2023-11-22 03:59:16,195][05156] Fps is (10 sec: 3687.3, 60 sec: 2662.5, 300 sec: 2750.2). Total num frames: 577536. Throughput: 0: 699.9. Samples: 143870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:59:16,198][05156] Avg episode reward: [(0, '4.445')]
[2023-11-22 03:59:21,196][05156] Fps is (10 sec: 3276.6, 60 sec: 2662.4, 300 sec: 2743.4). Total num frames: 589824. Throughput: 0: 731.9. Samples: 148150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:59:21,198][05156] Avg episode reward: [(0, '4.493')]
[2023-11-22 03:59:21,214][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000144_589824.pth...
[2023-11-22 03:59:26,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2730.6, 300 sec: 2736.9). Total num frames: 602112. Throughput: 0: 744.5. Samples: 149872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-11-22 03:59:26,199][05156] Avg episode reward: [(0, '4.466')]
[2023-11-22 03:59:31,000][06891] Updated weights for policy 0, policy_version 150 (0.0015)
[2023-11-22 03:59:31,195][05156] Fps is (10 sec: 2457.7, 60 sec: 2798.9, 300 sec: 2730.7). Total num frames: 614400. Throughput: 0: 728.8. Samples: 153522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 03:59:31,202][05156] Avg episode reward: [(0, '4.427')]
[2023-11-22 03:59:36,195][05156] Fps is (10 sec: 2867.3, 60 sec: 2867.2, 300 sec: 2742.5). Total num frames: 630784. Throughput: 0: 707.8. Samples: 158270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-22 03:59:36,206][05156] Avg episode reward: [(0, '4.415')]
[2023-11-22 03:59:41,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2771.3). Total num frames: 651264. Throughput: 0: 733.3. Samples: 161142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 03:59:41,198][05156] Avg episode reward: [(0, '4.517')]
[2023-11-22 03:59:41,988][06891] Updated weights for policy 0, policy_version 160 (0.0017)
[2023-11-22 03:59:46,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2764.8). Total num frames: 663552. Throughput: 0: 761.8. Samples: 166142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:59:46,198][05156] Avg episode reward: [(0, '4.493')]
[2023-11-22 03:59:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2758.5). Total num frames: 675840. Throughput: 0: 762.3. Samples: 169804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 03:59:51,202][05156] Avg episode reward: [(0, '4.483')]
[2023-11-22 03:59:56,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.3, 300 sec: 2752.5). Total num frames: 688128. Throughput: 0: 743.9. Samples: 171542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 03:59:56,201][05156] Avg episode reward: [(0, '4.430')]
[2023-11-22 03:59:58,550][06891] Updated weights for policy 0, policy_version 170 (0.0029)
[2023-11-22 04:00:01,210][05156] Fps is (10 sec: 2863.1, 60 sec: 2934.9, 300 sec: 2762.6). Total num frames: 704512. Throughput: 0: 705.2. Samples: 175614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:00:01,212][05156] Avg episode reward: [(0, '4.329')]
[2023-11-22 04:00:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.9, 300 sec: 2772.7). Total num frames: 720896. Throughput: 0: 735.5. Samples: 181246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:00:06,202][05156] Avg episode reward: [(0, '4.425')]
[2023-11-22 04:00:10,164][06891] Updated weights for policy 0, policy_version 180 (0.0047)
[2023-11-22 04:00:11,195][05156] Fps is (10 sec: 3281.5, 60 sec: 3003.7, 300 sec: 2782.2). Total num frames: 737280. Throughput: 0: 756.3. Samples: 183906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:00:11,200][05156] Avg episode reward: [(0, '4.550')]
[2023-11-22 04:00:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2776.2). Total num frames: 749568. Throughput: 0: 755.9. Samples: 187536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:00:16,201][05156] Avg episode reward: [(0, '4.637')]
[2023-11-22 04:00:16,206][06878] Saving new best policy, reward=4.637!
[2023-11-22 04:00:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2770.4). Total num frames: 761856. Throughput: 0: 731.5. Samples: 191188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:00:21,201][05156] Avg episode reward: [(0, '4.630')]
[2023-11-22 04:00:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2764.8). Total num frames: 774144. Throughput: 0: 707.4. Samples: 192976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 04:00:26,197][05156] Avg episode reward: [(0, '4.739')]
[2023-11-22 04:00:26,206][06878] Saving new best policy, reward=4.739!
[2023-11-22 04:00:26,499][06891] Updated weights for policy 0, policy_version 190 (0.0022)
[2023-11-22 04:00:31,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2788.2). Total num frames: 794624. Throughput: 0: 714.7. Samples: 198304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-11-22 04:00:31,204][05156] Avg episode reward: [(0, '4.658')]
[2023-11-22 04:00:36,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2796.6). Total num frames: 811008. Throughput: 0: 748.1. Samples: 203470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:00:36,202][05156] Avg episode reward: [(0, '4.511')]
[2023-11-22 04:00:38,687][06891] Updated weights for policy 0, policy_version 200 (0.0027)
[2023-11-22 04:00:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2790.8). Total num frames: 823296. Throughput: 0: 748.0. Samples: 205202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 04:00:41,198][05156] Avg episode reward: [(0, '4.491')]
[2023-11-22 04:00:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2832.5). Total num frames: 835584. Throughput: 0: 737.9. Samples: 208810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:00:46,200][05156] Avg episode reward: [(0, '4.425')]
[2023-11-22 04:00:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2874.1). Total num frames: 847872. Throughput: 0: 692.2. Samples: 212394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:00:51,203][05156] Avg episode reward: [(0, '4.399')]
[2023-11-22 04:00:54,114][06891] Updated weights for policy 0, policy_version 210 (0.0025)
[2023-11-22 04:00:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 864256. Throughput: 0: 693.6. Samples: 215116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:00:56,198][05156] Avg episode reward: [(0, '4.581')]
[2023-11-22 04:01:01,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3004.4, 300 sec: 2915.8). Total num frames: 884736. Throughput: 0: 740.0. Samples: 220836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:01:01,198][05156] Avg episode reward: [(0, '4.693')]
[2023-11-22 04:01:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 897024. Throughput: 0: 742.7. Samples: 224610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 04:01:06,203][05156] Avg episode reward: [(0, '4.592')]
[2023-11-22 04:01:07,697][06891] Updated weights for policy 0, policy_version 220 (0.0036)
[2023-11-22 04:01:11,199][05156] Fps is (10 sec: 2456.8, 60 sec: 2867.0, 300 sec: 2915.8). Total num frames: 909312. Throughput: 0: 743.5. Samples: 226434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:01:11,206][05156] Avg episode reward: [(0, '4.627')]
[2023-11-22 04:01:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 921600. Throughput: 0: 704.2. Samples: 229994. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-11-22 04:01:16,201][05156] Avg episode reward: [(0, '4.506')]
[2023-11-22 04:01:21,195][05156] Fps is (10 sec: 2868.2, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 937984. Throughput: 0: 702.2. Samples: 235068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:01:21,204][05156] Avg episode reward: [(0, '4.514')]
[2023-11-22 04:01:21,219][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000229_937984.pth...
[2023-11-22 04:01:21,371][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000061_249856.pth
[2023-11-22 04:01:21,699][06891] Updated weights for policy 0, policy_version 230 (0.0031)
[2023-11-22 04:01:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2901.9). Total num frames: 954368. Throughput: 0: 721.3. Samples: 237662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:01:26,198][05156] Avg episode reward: [(0, '4.429')]
[2023-11-22 04:01:31,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 966656. Throughput: 0: 740.8. Samples: 242148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 04:01:31,202][05156] Avg episode reward: [(0, '4.425')]
[2023-11-22 04:01:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 978944. Throughput: 0: 737.6. Samples: 245586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 04:01:36,198][05156] Avg episode reward: [(0, '4.429')]
[2023-11-22 04:01:36,968][06891] Updated weights for policy 0, policy_version 240 (0.0014)
[2023-11-22 04:01:41,196][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 991232. Throughput: 0: 716.3. Samples: 247348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:01:41,199][05156] Avg episode reward: [(0, '4.428')]
[2023-11-22 04:01:46,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1007616. Throughput: 0: 680.4. Samples: 251452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:01:46,202][05156] Avg episode reward: [(0, '4.508')]
[2023-11-22 04:01:50,172][06891] Updated weights for policy 0, policy_version 250 (0.0038)
[2023-11-22 04:01:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2888.0). Total num frames: 1028096. Throughput: 0: 720.8. Samples: 257046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:01:51,203][05156] Avg episode reward: [(0, '4.684')]
[2023-11-22 04:01:56,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 1040384. Throughput: 0: 738.1. Samples: 259646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 04:01:56,198][05156] Avg episode reward: [(0, '4.785')]
[2023-11-22 04:01:56,200][06878] Saving new best policy, reward=4.785!
[2023-11-22 04:02:01,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 1052672. Throughput: 0: 740.0. Samples: 263292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-11-22 04:02:01,199][05156] Avg episode reward: [(0, '4.928')]
[2023-11-22 04:02:01,214][06878] Saving new best policy, reward=4.928!
[2023-11-22 04:02:05,517][06891] Updated weights for policy 0, policy_version 260 (0.0015)
[2023-11-22 04:02:06,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2798.9, 300 sec: 2874.1). Total num frames: 1064960. Throughput: 0: 705.3. Samples: 266806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-11-22 04:02:06,200][05156] Avg episode reward: [(0, '4.729')]
[2023-11-22 04:02:11,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2799.1, 300 sec: 2874.1). Total num frames: 1077248. Throughput: 0: 685.5. Samples: 268508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-11-22 04:02:11,202][05156] Avg episode reward: [(0, '4.739')]
[2023-11-22 04:02:16,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 1097728. Throughput: 0: 710.1. Samples: 274102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 04:02:16,198][05156] Avg episode reward: [(0, '4.616')]
[2023-11-22 04:02:17,819][06891] Updated weights for policy 0, policy_version 270 (0.0031)
[2023-11-22 04:02:21,195][05156] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 2874.1). Total num frames: 1114112. Throughput: 0: 746.7. Samples: 279188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-11-22 04:02:21,202][05156] Avg episode reward: [(0, '4.669')]
[2023-11-22 04:02:26,201][05156] Fps is (10 sec: 2865.7, 60 sec: 2866.9, 300 sec: 2874.1). Total num frames: 1126400. Throughput: 0: 747.3. Samples: 280980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 04:02:26,208][05156] Avg episode reward: [(0, '4.663')]
[2023-11-22 04:02:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2874.2). Total num frames: 1138688. Throughput: 0: 736.4. Samples: 284588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-11-22 04:02:31,209][05156] Avg episode reward: [(0, '4.670')]
[2023-11-22 04:02:34,420][06891] Updated weights for policy 0, policy_version 280 (0.0026)
[2023-11-22 04:02:36,195][05156] Fps is (10 sec: 2458.9, 60 sec: 2867.2, 300 sec: 2860.3). Total num frames: 1150976. Throughput: 0: 694.1. Samples: 288282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-11-22 04:02:36,198][05156] Avg episode reward: [(0, '4.829')]
[2023-11-22 04:02:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2860.3). Total num frames: 1167360. Throughput: 0: 698.2. Samples: 291064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-11-22 04:02:41,203][05156] Avg episode reward: [(0, '5.031')]
[2023-11-22 04:02:41,214][06878] Saving new best policy, reward=5.031!
[2023-11-22 04:02:46,103][06891] Updated weights for policy 0, policy_version 290 (0.0024)
[2023-11-22 04:02:46,196][05156] Fps is (10 sec: 3686.2, 60 sec: 3003.7, 300 sec: 2860.3). Total num frames: 1187840. Throughput: 0: 740.5. Samples: 296616.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:02:46,198][05156] Avg episode reward: [(0, '5.181')] [2023-11-22 04:02:46,203][06878] Saving new best policy, reward=5.181! [2023-11-22 04:02:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2860.3). Total num frames: 1200128. Throughput: 0: 743.6. Samples: 300270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:02:51,198][05156] Avg episode reward: [(0, '5.253')] [2023-11-22 04:02:51,211][06878] Saving new best policy, reward=5.253! [2023-11-22 04:02:56,195][05156] Fps is (10 sec: 2048.1, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 1208320. Throughput: 0: 743.9. Samples: 301984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:02:56,202][05156] Avg episode reward: [(0, '5.151')] [2023-11-22 04:03:01,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 2846.4). Total num frames: 1220608. Throughput: 0: 700.6. Samples: 305628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:03:01,204][05156] Avg episode reward: [(0, '5.032')] [2023-11-22 04:03:02,474][06891] Updated weights for policy 0, policy_version 300 (0.0025) [2023-11-22 04:03:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2846.4). Total num frames: 1241088. Throughput: 0: 705.3. Samples: 310928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:03:06,200][05156] Avg episode reward: [(0, '5.244')] [2023-11-22 04:03:11,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2846.4). Total num frames: 1257472. Throughput: 0: 729.2. Samples: 313790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:11,198][05156] Avg episode reward: [(0, '5.407')] [2023-11-22 04:03:11,238][06878] Saving new best policy, reward=5.407! [2023-11-22 04:03:14,739][06891] Updated weights for policy 0, policy_version 310 (0.0044) [2023-11-22 04:03:16,197][05156] Fps is (10 sec: 2866.7, 60 sec: 2867.1, 300 sec: 2846.4). Total num frames: 1269760. 
Throughput: 0: 743.8. Samples: 318060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:16,207][05156] Avg episode reward: [(0, '5.536')] [2023-11-22 04:03:16,209][06878] Saving new best policy, reward=5.536! [2023-11-22 04:03:21,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2798.9, 300 sec: 2860.3). Total num frames: 1282048. Throughput: 0: 743.0. Samples: 321718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:21,202][05156] Avg episode reward: [(0, '5.291')] [2023-11-22 04:03:21,218][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000313_1282048.pth... [2023-11-22 04:03:21,358][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000144_589824.pth [2023-11-22 04:03:26,195][05156] Fps is (10 sec: 2458.0, 60 sec: 2799.2, 300 sec: 2874.1). Total num frames: 1294336. Throughput: 0: 719.4. Samples: 323436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:03:26,203][05156] Avg episode reward: [(0, '5.275')] [2023-11-22 04:03:29,798][06891] Updated weights for policy 0, policy_version 320 (0.0037) [2023-11-22 04:03:31,195][05156] Fps is (10 sec: 3277.0, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1314816. Throughput: 0: 697.7. Samples: 328012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:03:31,198][05156] Avg episode reward: [(0, '5.393')] [2023-11-22 04:03:36,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 1331200. Throughput: 0: 740.5. Samples: 333592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:36,201][05156] Avg episode reward: [(0, '5.568')] [2023-11-22 04:03:36,204][06878] Saving new best policy, reward=5.568! [2023-11-22 04:03:41,196][05156] Fps is (10 sec: 2867.1, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1343488. Throughput: 0: 751.8. Samples: 335816. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:41,198][05156] Avg episode reward: [(0, '5.622')] [2023-11-22 04:03:41,218][06878] Saving new best policy, reward=5.622! [2023-11-22 04:03:43,283][06891] Updated weights for policy 0, policy_version 330 (0.0014) [2023-11-22 04:03:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2799.0, 300 sec: 2888.0). Total num frames: 1355776. Throughput: 0: 751.2. Samples: 339432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:46,198][05156] Avg episode reward: [(0, '5.441')] [2023-11-22 04:03:51,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2798.9, 300 sec: 2888.0). Total num frames: 1368064. Throughput: 0: 714.4. Samples: 343078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:51,201][05156] Avg episode reward: [(0, '5.486')] [2023-11-22 04:03:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1384448. Throughput: 0: 695.4. Samples: 345082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:03:56,200][05156] Avg episode reward: [(0, '5.599')] [2023-11-22 04:03:57,635][06891] Updated weights for policy 0, policy_version 340 (0.0020) [2023-11-22 04:04:01,195][05156] Fps is (10 sec: 3687.0, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 1404928. Throughput: 0: 727.7. Samples: 350806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:04:01,198][05156] Avg episode reward: [(0, '5.884')] [2023-11-22 04:04:01,214][06878] Saving new best policy, reward=5.884! [2023-11-22 04:04:06,196][05156] Fps is (10 sec: 3276.6, 60 sec: 2935.4, 300 sec: 2915.8). Total num frames: 1417216. Throughput: 0: 749.8. Samples: 355460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:04:06,202][05156] Avg episode reward: [(0, '6.037')] [2023-11-22 04:04:06,207][06878] Saving new best policy, reward=6.037! [2023-11-22 04:04:11,201][05156] Fps is (10 sec: 2456.3, 60 sec: 2867.0, 300 sec: 2888.0). Total num frames: 1429504. 
Throughput: 0: 750.8. Samples: 357226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:04:11,203][05156] Avg episode reward: [(0, '6.006')] [2023-11-22 04:04:11,768][06891] Updated weights for policy 0, policy_version 350 (0.0022) [2023-11-22 04:04:16,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1441792. Throughput: 0: 729.8. Samples: 360854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:04:16,200][05156] Avg episode reward: [(0, '5.728')] [2023-11-22 04:04:21,195][05156] Fps is (10 sec: 2868.7, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1458176. Throughput: 0: 699.9. Samples: 365086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:04:21,202][05156] Avg episode reward: [(0, '5.719')] [2023-11-22 04:04:25,160][06891] Updated weights for policy 0, policy_version 360 (0.0019) [2023-11-22 04:04:26,195][05156] Fps is (10 sec: 3687.1, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 1478656. Throughput: 0: 715.1. Samples: 367994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:04:26,198][05156] Avg episode reward: [(0, '5.926')] [2023-11-22 04:04:31,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1490944. Throughput: 0: 757.0. Samples: 373496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:04:31,198][05156] Avg episode reward: [(0, '6.494')] [2023-11-22 04:04:31,216][06878] Saving new best policy, reward=6.494! [2023-11-22 04:04:36,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1503232. Throughput: 0: 754.1. Samples: 377010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:04:36,201][05156] Avg episode reward: [(0, '6.457')] [2023-11-22 04:04:40,131][06891] Updated weights for policy 0, policy_version 370 (0.0027) [2023-11-22 04:04:41,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1515520. Throughput: 0: 749.7. 
Samples: 378818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:04:41,199][05156] Avg episode reward: [(0, '6.776')] [2023-11-22 04:04:41,219][06878] Saving new best policy, reward=6.776! [2023-11-22 04:04:46,195][05156] Fps is (10 sec: 2457.7, 60 sec: 2867.2, 300 sec: 2888.0). Total num frames: 1527808. Throughput: 0: 704.2. Samples: 382496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-11-22 04:04:46,197][05156] Avg episode reward: [(0, '6.272')] [2023-11-22 04:04:51,197][05156] Fps is (10 sec: 3276.3, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 1548288. Throughput: 0: 724.4. Samples: 388058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:04:51,202][05156] Avg episode reward: [(0, '6.264')] [2023-11-22 04:04:52,281][06891] Updated weights for policy 0, policy_version 380 (0.0035) [2023-11-22 04:04:56,199][05156] Fps is (10 sec: 4094.5, 60 sec: 3071.8, 300 sec: 2929.8). Total num frames: 1568768. Throughput: 0: 749.0. Samples: 390932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:04:56,202][05156] Avg episode reward: [(0, '6.210')] [2023-11-22 04:05:01,195][05156] Fps is (10 sec: 3277.3, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1581056. Throughput: 0: 764.5. Samples: 395256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:05:01,202][05156] Avg episode reward: [(0, '6.588')] [2023-11-22 04:05:06,195][05156] Fps is (10 sec: 2458.5, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1593344. Throughput: 0: 752.4. Samples: 398942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:05:06,197][05156] Avg episode reward: [(0, '6.608')] [2023-11-22 04:05:07,122][06891] Updated weights for policy 0, policy_version 390 (0.0015) [2023-11-22 04:05:11,196][05156] Fps is (10 sec: 2457.4, 60 sec: 2935.7, 300 sec: 2901.9). Total num frames: 1605632. Throughput: 0: 728.6. Samples: 400780. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:05:11,198][05156] Avg episode reward: [(0, '6.683')] [2023-11-22 04:05:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.8, 300 sec: 2915.8). Total num frames: 1622016. Throughput: 0: 712.6. Samples: 405564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-11-22 04:05:16,198][05156] Avg episode reward: [(0, '6.637')] [2023-11-22 04:05:19,490][06891] Updated weights for policy 0, policy_version 400 (0.0022) [2023-11-22 04:05:21,198][05156] Fps is (10 sec: 3685.6, 60 sec: 3071.9, 300 sec: 2943.5). Total num frames: 1642496. Throughput: 0: 761.8. Samples: 411292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:05:21,202][05156] Avg episode reward: [(0, '6.839')] [2023-11-22 04:05:21,223][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000401_1642496.pth... [2023-11-22 04:05:21,365][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000229_937984.pth [2023-11-22 04:05:21,386][06878] Saving new best policy, reward=6.839! [2023-11-22 04:05:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1654784. Throughput: 0: 763.2. Samples: 413162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-11-22 04:05:26,198][05156] Avg episode reward: [(0, '6.863')] [2023-11-22 04:05:26,203][06878] Saving new best policy, reward=6.863! [2023-11-22 04:05:31,205][05156] Fps is (10 sec: 2455.8, 60 sec: 2935.0, 300 sec: 2901.8). Total num frames: 1667072. Throughput: 0: 762.4. Samples: 416812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:05:31,222][05156] Avg episode reward: [(0, '6.658')] [2023-11-22 04:05:36,107][06891] Updated weights for policy 0, policy_version 410 (0.0038) [2023-11-22 04:05:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2901.9). Total num frames: 1679360. Throughput: 0: 718.6. Samples: 420394. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:05:36,198][05156] Avg episode reward: [(0, '7.107')] [2023-11-22 04:05:36,200][06878] Saving new best policy, reward=7.107! [2023-11-22 04:05:41,195][05156] Fps is (10 sec: 2870.1, 60 sec: 3003.7, 300 sec: 2915.8). Total num frames: 1695744. Throughput: 0: 704.7. Samples: 422640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-11-22 04:05:41,199][05156] Avg episode reward: [(0, '7.421')] [2023-11-22 04:05:41,211][06878] Saving new best policy, reward=7.421! [2023-11-22 04:05:46,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2929.7). Total num frames: 1712128. Throughput: 0: 734.1. Samples: 428292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:05:46,198][05156] Avg episode reward: [(0, '7.633')] [2023-11-22 04:05:46,218][06878] Saving new best policy, reward=7.633! [2023-11-22 04:05:47,399][06891] Updated weights for policy 0, policy_version 420 (0.0018) [2023-11-22 04:05:51,201][05156] Fps is (10 sec: 3275.1, 60 sec: 3003.5, 300 sec: 2929.6). Total num frames: 1728512. Throughput: 0: 750.3. Samples: 432708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:05:51,208][05156] Avg episode reward: [(0, '7.561')] [2023-11-22 04:05:56,196][05156] Fps is (10 sec: 2867.1, 60 sec: 2867.4, 300 sec: 2901.9). Total num frames: 1740800. Throughput: 0: 751.6. Samples: 434604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:05:56,202][05156] Avg episode reward: [(0, '7.104')] [2023-11-22 04:06:01,197][05156] Fps is (10 sec: 2458.5, 60 sec: 2867.1, 300 sec: 2901.9). Total num frames: 1753088. Throughput: 0: 726.1. Samples: 438238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:06:01,201][05156] Avg episode reward: [(0, '7.215')] [2023-11-22 04:06:03,638][06891] Updated weights for policy 0, policy_version 430 (0.0035) [2023-11-22 04:06:06,195][05156] Fps is (10 sec: 2867.3, 60 sec: 2935.5, 300 sec: 2915.8). Total num frames: 1769472. 
Throughput: 0: 700.7. Samples: 442822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:06:06,200][05156] Avg episode reward: [(0, '7.219')] [2023-11-22 04:06:11,195][05156] Fps is (10 sec: 3687.0, 60 sec: 3072.0, 300 sec: 2943.6). Total num frames: 1789952. Throughput: 0: 724.5. Samples: 445766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:06:11,205][05156] Avg episode reward: [(0, '7.815')] [2023-11-22 04:06:11,215][06878] Saving new best policy, reward=7.815! [2023-11-22 04:06:15,213][06891] Updated weights for policy 0, policy_version 440 (0.0018) [2023-11-22 04:06:16,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 1802240. Throughput: 0: 756.7. Samples: 450856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:06:16,202][05156] Avg episode reward: [(0, '8.087')] [2023-11-22 04:06:16,206][06878] Saving new best policy, reward=8.087! [2023-11-22 04:06:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.3, 300 sec: 2915.8). Total num frames: 1814528. Throughput: 0: 757.4. Samples: 454478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:06:21,200][05156] Avg episode reward: [(0, '9.065')] [2023-11-22 04:06:21,213][06878] Saving new best policy, reward=9.065! [2023-11-22 04:06:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2915.8). Total num frames: 1826816. Throughput: 0: 748.5. Samples: 456324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:06:26,198][05156] Avg episode reward: [(0, '8.924')] [2023-11-22 04:06:30,778][06891] Updated weights for policy 0, policy_version 450 (0.0037) [2023-11-22 04:06:31,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2936.0, 300 sec: 2929.7). Total num frames: 1843200. Throughput: 0: 712.4. Samples: 460352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:06:31,198][05156] Avg episode reward: [(0, '9.232')] [2023-11-22 04:06:31,221][06878] Saving new best policy, reward=9.232! 
[2023-11-22 04:06:36,196][05156] Fps is (10 sec: 3686.3, 60 sec: 3072.0, 300 sec: 2957.4). Total num frames: 1863680. Throughput: 0: 743.7. Samples: 466170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:06:36,201][05156] Avg episode reward: [(0, '9.514')] [2023-11-22 04:06:36,203][06878] Saving new best policy, reward=9.514! [2023-11-22 04:06:41,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 1880064. Throughput: 0: 764.8. Samples: 469020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:06:41,199][05156] Avg episode reward: [(0, '9.404')] [2023-11-22 04:06:42,597][06891] Updated weights for policy 0, policy_version 460 (0.0023) [2023-11-22 04:06:46,196][05156] Fps is (10 sec: 2867.1, 60 sec: 3003.7, 300 sec: 2929.7). Total num frames: 1892352. Throughput: 0: 767.4. Samples: 472772. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:06:46,204][05156] Avg episode reward: [(0, '10.100')] [2023-11-22 04:06:46,205][06878] Saving new best policy, reward=10.100! [2023-11-22 04:06:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.7, 300 sec: 2929.7). Total num frames: 1904640. Throughput: 0: 749.1. Samples: 476532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:06:51,209][05156] Avg episode reward: [(0, '10.266')] [2023-11-22 04:06:51,232][06878] Saving new best policy, reward=10.266! [2023-11-22 04:06:56,195][05156] Fps is (10 sec: 2457.8, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 1916928. Throughput: 0: 724.9. Samples: 478388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:06:56,198][05156] Avg episode reward: [(0, '10.107')] [2023-11-22 04:06:57,698][06891] Updated weights for policy 0, policy_version 470 (0.0013) [2023-11-22 04:07:01,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 2957.5). Total num frames: 1937408. Throughput: 0: 731.9. Samples: 483790. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:07:01,198][05156] Avg episode reward: [(0, '10.392')] [2023-11-22 04:07:01,208][06878] Saving new best policy, reward=10.392! [2023-11-22 04:07:06,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2971.3). Total num frames: 1953792. Throughput: 0: 770.5. Samples: 489150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:07:06,202][05156] Avg episode reward: [(0, '10.153')] [2023-11-22 04:07:10,438][06891] Updated weights for policy 0, policy_version 480 (0.0014) [2023-11-22 04:07:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2943.6). Total num frames: 1966080. Throughput: 0: 771.4. Samples: 491036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:07:11,200][05156] Avg episode reward: [(0, '10.279')] [2023-11-22 04:07:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 1978368. Throughput: 0: 766.2. Samples: 494832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:07:16,198][05156] Avg episode reward: [(0, '10.458')] [2023-11-22 04:07:16,199][06878] Saving new best policy, reward=10.458! [2023-11-22 04:07:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2929.7). Total num frames: 1990656. Throughput: 0: 717.7. Samples: 498468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:07:21,202][05156] Avg episode reward: [(0, '10.927')] [2023-11-22 04:07:21,214][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000486_1990656.pth... [2023-11-22 04:07:21,331][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000313_1282048.pth [2023-11-22 04:07:21,343][06878] Saving new best policy, reward=10.927! [2023-11-22 04:07:24,675][06891] Updated weights for policy 0, policy_version 490 (0.0014) [2023-11-22 04:07:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2957.5). Total num frames: 2011136. Throughput: 0: 717.5. 
Samples: 501308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:07:26,198][05156] Avg episode reward: [(0, '11.591')] [2023-11-22 04:07:26,200][06878] Saving new best policy, reward=11.591! [2023-11-22 04:07:31,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 2985.2). Total num frames: 2031616. Throughput: 0: 766.9. Samples: 507280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:07:31,203][05156] Avg episode reward: [(0, '12.051')] [2023-11-22 04:07:31,217][06878] Saving new best policy, reward=12.051! [2023-11-22 04:07:36,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2043904. Throughput: 0: 771.2. Samples: 511234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:07:36,205][05156] Avg episode reward: [(0, '12.286')] [2023-11-22 04:07:36,206][06878] Saving new best policy, reward=12.286! [2023-11-22 04:07:38,097][06891] Updated weights for policy 0, policy_version 500 (0.0020) [2023-11-22 04:07:41,195][05156] Fps is (10 sec: 2048.0, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 2052096. Throughput: 0: 766.7. Samples: 512890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:07:41,201][05156] Avg episode reward: [(0, '11.796')] [2023-11-22 04:07:46,196][05156] Fps is (10 sec: 2048.0, 60 sec: 2867.2, 300 sec: 2929.7). Total num frames: 2064384. Throughput: 0: 729.3. Samples: 516608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:07:46,198][05156] Avg episode reward: [(0, '12.246')] [2023-11-22 04:07:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2084864. Throughput: 0: 724.6. Samples: 521756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:07:51,198][05156] Avg episode reward: [(0, '13.190')] [2023-11-22 04:07:51,206][06878] Saving new best policy, reward=13.190! 
[2023-11-22 04:07:51,962][06891] Updated weights for policy 0, policy_version 510 (0.0020) [2023-11-22 04:07:56,195][05156] Fps is (10 sec: 4096.1, 60 sec: 3140.3, 300 sec: 2999.1). Total num frames: 2105344. Throughput: 0: 748.4. Samples: 524712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:07:56,198][05156] Avg episode reward: [(0, '13.524')] [2023-11-22 04:07:56,200][06878] Saving new best policy, reward=13.524! [2023-11-22 04:08:01,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2117632. Throughput: 0: 765.8. Samples: 529292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:08:01,197][05156] Avg episode reward: [(0, '14.561')] [2023-11-22 04:08:01,210][06878] Saving new best policy, reward=14.561! [2023-11-22 04:08:05,532][06891] Updated weights for policy 0, policy_version 520 (0.0023) [2023-11-22 04:08:06,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2957.5). Total num frames: 2129920. Throughput: 0: 767.2. Samples: 532992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:08:06,198][05156] Avg episode reward: [(0, '15.548')] [2023-11-22 04:08:06,203][06878] Saving new best policy, reward=15.548! [2023-11-22 04:08:11,197][05156] Fps is (10 sec: 2457.2, 60 sec: 2935.4, 300 sec: 2957.4). Total num frames: 2142208. Throughput: 0: 743.8. Samples: 534780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:08:11,203][05156] Avg episode reward: [(0, '15.024')] [2023-11-22 04:08:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2158592. Throughput: 0: 711.7. Samples: 539308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:08:16,198][05156] Avg episode reward: [(0, '15.533')] [2023-11-22 04:08:18,760][06891] Updated weights for policy 0, policy_version 530 (0.0023) [2023-11-22 04:08:21,195][05156] Fps is (10 sec: 3687.0, 60 sec: 3140.3, 300 sec: 2999.1). Total num frames: 2179072. Throughput: 0: 755.1. 
Samples: 545214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:08:21,201][05156] Avg episode reward: [(0, '15.017')] [2023-11-22 04:08:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2191360. Throughput: 0: 769.8. Samples: 547532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:08:26,200][05156] Avg episode reward: [(0, '13.726')] [2023-11-22 04:08:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2957.5). Total num frames: 2203648. Throughput: 0: 772.7. Samples: 551378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:08:31,203][05156] Avg episode reward: [(0, '13.754')] [2023-11-22 04:08:33,140][06891] Updated weights for policy 0, policy_version 540 (0.0021) [2023-11-22 04:08:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 2957.5). Total num frames: 2215936. Throughput: 0: 739.6. Samples: 555036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:08:36,201][05156] Avg episode reward: [(0, '13.851')] [2023-11-22 04:08:41,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2232320. Throughput: 0: 720.0. Samples: 557112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:08:41,202][05156] Avg episode reward: [(0, '12.784')] [2023-11-22 04:08:45,470][06891] Updated weights for policy 0, policy_version 550 (0.0019) [2023-11-22 04:08:46,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2999.1). Total num frames: 2252800. Throughput: 0: 751.0. Samples: 563086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:08:46,205][05156] Avg episode reward: [(0, '13.759')] [2023-11-22 04:08:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 2269184. Throughput: 0: 780.8. Samples: 568128. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:08:51,202][05156] Avg episode reward: [(0, '14.106')] [2023-11-22 04:08:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 2281472. Throughput: 0: 783.4. Samples: 570032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:08:56,202][05156] Avg episode reward: [(0, '14.623')] [2023-11-22 04:09:00,313][06891] Updated weights for policy 0, policy_version 560 (0.0013) [2023-11-22 04:09:01,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 2293760. Throughput: 0: 765.6. Samples: 573762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:09:01,204][05156] Avg episode reward: [(0, '15.614')] [2023-11-22 04:09:01,218][06878] Saving new best policy, reward=15.614! [2023-11-22 04:09:06,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2985.3). Total num frames: 2310144. Throughput: 0: 735.3. Samples: 578304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:09:06,202][05156] Avg episode reward: [(0, '15.894')] [2023-11-22 04:09:06,205][06878] Saving new best policy, reward=15.894! [2023-11-22 04:09:11,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.4, 300 sec: 3013.0). Total num frames: 2330624. Throughput: 0: 747.2. Samples: 581154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:09:11,201][05156] Avg episode reward: [(0, '15.337')] [2023-11-22 04:09:11,881][06891] Updated weights for policy 0, policy_version 570 (0.0033) [2023-11-22 04:09:16,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2347008. Throughput: 0: 781.3. Samples: 586538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:09:16,198][05156] Avg episode reward: [(0, '14.867')] [2023-11-22 04:09:21,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2359296. Throughput: 0: 786.9. Samples: 590446. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:09:21,203][05156] Avg episode reward: [(0, '13.953')] [2023-11-22 04:09:21,212][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth... [2023-11-22 04:09:21,359][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000401_1642496.pth [2023-11-22 04:09:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2371584. Throughput: 0: 782.6. Samples: 592330. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:09:26,198][05156] Avg episode reward: [(0, '14.771')] [2023-11-22 04:09:27,271][06891] Updated weights for policy 0, policy_version 580 (0.0018) [2023-11-22 04:09:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2383872. Throughput: 0: 731.3. Samples: 595994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:09:31,201][05156] Avg episode reward: [(0, '14.867')] [2023-11-22 04:09:36,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2404352. Throughput: 0: 751.5. Samples: 601944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:09:36,198][05156] Avg episode reward: [(0, '15.682')] [2023-11-22 04:09:38,567][06891] Updated weights for policy 0, policy_version 590 (0.0030) [2023-11-22 04:09:41,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 2424832. Throughput: 0: 776.5. Samples: 604974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:09:41,205][05156] Avg episode reward: [(0, '15.999')] [2023-11-22 04:09:41,219][06878] Saving new best policy, reward=15.999! [2023-11-22 04:09:46,199][05156] Fps is (10 sec: 3275.6, 60 sec: 3071.8, 300 sec: 3013.0). Total num frames: 2437120. Throughput: 0: 784.0. Samples: 609046. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:09:46,201][05156] Avg episode reward: [(0, '16.978')] [2023-11-22 04:09:46,204][06878] Saving new best policy, reward=16.978! [2023-11-22 04:09:51,199][05156] Fps is (10 sec: 2456.7, 60 sec: 3003.6, 300 sec: 2985.2). Total num frames: 2449408. Throughput: 0: 768.7. Samples: 612896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:09:51,201][05156] Avg episode reward: [(0, '16.994')] [2023-11-22 04:09:51,218][06878] Saving new best policy, reward=16.994! [2023-11-22 04:09:54,168][06891] Updated weights for policy 0, policy_version 600 (0.0019) [2023-11-22 04:09:56,195][05156] Fps is (10 sec: 2458.5, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2461696. Throughput: 0: 744.5. Samples: 614658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:09:56,198][05156] Avg episode reward: [(0, '16.785')] [2023-11-22 04:10:01,195][05156] Fps is (10 sec: 3278.0, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2482176. Throughput: 0: 740.8. Samples: 619874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:10:01,198][05156] Avg episode reward: [(0, '18.278')] [2023-11-22 04:10:01,215][06878] Saving new best policy, reward=18.278! [2023-11-22 04:10:05,177][06891] Updated weights for policy 0, policy_version 610 (0.0021) [2023-11-22 04:10:06,197][05156] Fps is (10 sec: 3685.8, 60 sec: 3140.2, 300 sec: 3026.9). Total num frames: 2498560. Throughput: 0: 783.5. Samples: 625706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:10:06,203][05156] Avg episode reward: [(0, '18.083')] [2023-11-22 04:10:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 2510848. Throughput: 0: 785.1. Samples: 627658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:10:11,204][05156] Avg episode reward: [(0, '18.486')] [2023-11-22 04:10:11,217][06878] Saving new best policy, reward=18.486! 
[2023-11-22 04:10:16,195][05156] Fps is (10 sec: 2458.0, 60 sec: 2935.5, 300 sec: 2985.2). Total num frames: 2523136. Throughput: 0: 787.0. Samples: 631410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:10:16,203][05156] Avg episode reward: [(0, '18.291')] [2023-11-22 04:10:20,878][06891] Updated weights for policy 0, policy_version 620 (0.0016) [2023-11-22 04:10:21,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 2539520. Throughput: 0: 741.2. Samples: 635300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:10:21,199][05156] Avg episode reward: [(0, '18.189')] [2023-11-22 04:10:26,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3027.0). Total num frames: 2560000. Throughput: 0: 737.7. Samples: 638172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:10:26,197][05156] Avg episode reward: [(0, '17.159')] [2023-11-22 04:10:31,038][06891] Updated weights for policy 0, policy_version 630 (0.0020) [2023-11-22 04:10:31,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3054.6). Total num frames: 2580480. Throughput: 0: 784.8. Samples: 644358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:10:31,203][05156] Avg episode reward: [(0, '16.638')] [2023-11-22 04:10:36,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3040.8). Total num frames: 2592768. Throughput: 0: 798.3. Samples: 648816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:10:36,202][05156] Avg episode reward: [(0, '17.171')] [2023-11-22 04:10:41,197][05156] Fps is (10 sec: 2457.2, 60 sec: 3003.7, 300 sec: 3026.9). Total num frames: 2605056. Throughput: 0: 801.1. Samples: 650710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:10:41,202][05156] Avg episode reward: [(0, '17.343')] [2023-11-22 04:10:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.9, 300 sec: 3013.0). Total num frames: 2617344. Throughput: 0: 768.9. Samples: 654474. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:10:46,200][05156] Avg episode reward: [(0, '18.108')] [2023-11-22 04:10:46,970][06891] Updated weights for policy 0, policy_version 640 (0.0035) [2023-11-22 04:10:51,195][05156] Fps is (10 sec: 2867.6, 60 sec: 3072.2, 300 sec: 3026.9). Total num frames: 2633728. Throughput: 0: 744.9. Samples: 659224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:10:51,198][05156] Avg episode reward: [(0, '17.344')] [2023-11-22 04:10:56,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3054.7). Total num frames: 2654208. Throughput: 0: 766.0. Samples: 662126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:10:56,198][05156] Avg episode reward: [(0, '18.163')] [2023-11-22 04:10:57,688][06891] Updated weights for policy 0, policy_version 650 (0.0018) [2023-11-22 04:11:01,200][05156] Fps is (10 sec: 3685.3, 60 sec: 3140.1, 300 sec: 3054.6). Total num frames: 2670592. Throughput: 0: 801.9. Samples: 667496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:11:01,205][05156] Avg episode reward: [(0, '18.739')] [2023-11-22 04:11:01,222][06878] Saving new best policy, reward=18.739! [2023-11-22 04:11:06,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.1, 300 sec: 3026.9). Total num frames: 2682880. Throughput: 0: 798.7. Samples: 671240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:11:06,201][05156] Avg episode reward: [(0, '18.660')] [2023-11-22 04:11:11,195][05156] Fps is (10 sec: 2458.4, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 2695168. Throughput: 0: 775.1. Samples: 673050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:11:11,198][05156] Avg episode reward: [(0, '20.152')] [2023-11-22 04:11:11,206][06878] Saving new best policy, reward=20.152! [2023-11-22 04:11:14,164][06891] Updated weights for policy 0, policy_version 660 (0.0020) [2023-11-22 04:11:16,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3026.9). 
Total num frames: 2707456. Throughput: 0: 723.4. Samples: 676912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:11:16,198][05156] Avg episode reward: [(0, '20.790')] [2023-11-22 04:11:16,206][06878] Saving new best policy, reward=20.790! [2023-11-22 04:11:21,197][05156] Fps is (10 sec: 2457.1, 60 sec: 3003.6, 300 sec: 3026.9). Total num frames: 2719744. Throughput: 0: 707.4. Samples: 680652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:11:21,202][05156] Avg episode reward: [(0, '20.220')] [2023-11-22 04:11:21,214][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth... [2023-11-22 04:11:21,375][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000486_1990656.pth [2023-11-22 04:11:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 3013.0). Total num frames: 2732032. Throughput: 0: 706.4. Samples: 682496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:11:26,198][05156] Avg episode reward: [(0, '20.614')] [2023-11-22 04:11:31,195][05156] Fps is (10 sec: 2048.4, 60 sec: 2662.4, 300 sec: 2971.3). Total num frames: 2740224. Throughput: 0: 689.3. Samples: 685494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:11:31,198][05156] Avg episode reward: [(0, '20.620')] [2023-11-22 04:11:31,791][06891] Updated weights for policy 0, policy_version 670 (0.0052) [2023-11-22 04:11:36,200][05156] Fps is (10 sec: 2047.1, 60 sec: 2662.2, 300 sec: 2957.4). Total num frames: 2752512. Throughput: 0: 660.5. Samples: 688948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:11:36,202][05156] Avg episode reward: [(0, '20.278')] [2023-11-22 04:11:41,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2662.5, 300 sec: 2957.5). Total num frames: 2764800. Throughput: 0: 640.5. Samples: 690950. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-11-22 04:11:41,202][05156] Avg episode reward: [(0, '20.879')] [2023-11-22 04:11:41,216][06878] Saving new best policy, reward=20.879! [2023-11-22 04:11:46,195][05156] Fps is (10 sec: 2868.5, 60 sec: 2730.7, 300 sec: 2971.3). Total num frames: 2781184. Throughput: 0: 614.4. Samples: 695140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:11:46,204][05156] Avg episode reward: [(0, '21.041')] [2023-11-22 04:11:46,206][06878] Saving new best policy, reward=21.041! [2023-11-22 04:11:46,441][06891] Updated weights for policy 0, policy_version 680 (0.0014) [2023-11-22 04:11:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 2798.9, 300 sec: 2999.1). Total num frames: 2801664. Throughput: 0: 664.0. Samples: 701118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:11:51,200][05156] Avg episode reward: [(0, '21.091')] [2023-11-22 04:11:51,215][06878] Saving new best policy, reward=21.091! [2023-11-22 04:11:56,196][05156] Fps is (10 sec: 3686.2, 60 sec: 2730.6, 300 sec: 2985.2). Total num frames: 2818048. Throughput: 0: 685.4. Samples: 703894. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:11:56,200][05156] Avg episode reward: [(0, '21.150')] [2023-11-22 04:11:56,202][06878] Saving new best policy, reward=21.150! [2023-11-22 04:11:58,873][06891] Updated weights for policy 0, policy_version 690 (0.0031) [2023-11-22 04:12:01,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2662.5, 300 sec: 2971.3). Total num frames: 2830336. Throughput: 0: 681.8. Samples: 707594. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-11-22 04:12:01,200][05156] Avg episode reward: [(0, '21.972')] [2023-11-22 04:12:01,214][06878] Saving new best policy, reward=21.972! [2023-11-22 04:12:06,197][05156] Fps is (10 sec: 2457.3, 60 sec: 2662.3, 300 sec: 2971.3). Total num frames: 2842624. Throughput: 0: 680.8. Samples: 711290. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:12:06,200][05156] Avg episode reward: [(0, '22.124')] [2023-11-22 04:12:06,203][06878] Saving new best policy, reward=22.124! [2023-11-22 04:12:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 2730.7, 300 sec: 2985.2). Total num frames: 2859008. Throughput: 0: 679.5. Samples: 713072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:12:11,198][05156] Avg episode reward: [(0, '21.421')] [2023-11-22 04:12:13,174][06891] Updated weights for policy 0, policy_version 700 (0.0014) [2023-11-22 04:12:16,195][05156] Fps is (10 sec: 3277.3, 60 sec: 2798.9, 300 sec: 2999.1). Total num frames: 2875392. Throughput: 0: 740.4. Samples: 718814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:12:16,198][05156] Avg episode reward: [(0, '20.765')] [2023-11-22 04:12:21,195][05156] Fps is (10 sec: 3686.5, 60 sec: 2935.6, 300 sec: 2999.1). Total num frames: 2895872. Throughput: 0: 785.2. Samples: 724280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:12:21,205][05156] Avg episode reward: [(0, '20.099')] [2023-11-22 04:12:25,932][06891] Updated weights for policy 0, policy_version 710 (0.0018) [2023-11-22 04:12:26,195][05156] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 2971.3). Total num frames: 2908160. Throughput: 0: 782.8. Samples: 726176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:12:26,203][05156] Avg episode reward: [(0, '20.335')] [2023-11-22 04:12:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 2971.3). Total num frames: 2920448. Throughput: 0: 776.4. Samples: 730076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:12:31,204][05156] Avg episode reward: [(0, '21.765')] [2023-11-22 04:12:36,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3004.0, 300 sec: 2985.2). Total num frames: 2932736. Throughput: 0: 733.3. Samples: 734116. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:12:36,197][05156] Avg episode reward: [(0, '21.044')] [2023-11-22 04:12:39,397][06891] Updated weights for policy 0, policy_version 720 (0.0025) [2023-11-22 04:12:41,195][05156] Fps is (10 sec: 3276.7, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 2953216. Throughput: 0: 738.1. Samples: 737110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:12:41,203][05156] Avg episode reward: [(0, '21.669')] [2023-11-22 04:12:46,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3013.0). Total num frames: 2973696. Throughput: 0: 788.0. Samples: 743056. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:12:46,199][05156] Avg episode reward: [(0, '22.084')] [2023-11-22 04:12:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 2985984. Throughput: 0: 792.8. Samples: 746966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:12:51,198][05156] Avg episode reward: [(0, '23.145')] [2023-11-22 04:12:51,210][06878] Saving new best policy, reward=23.145! [2023-11-22 04:12:52,408][06891] Updated weights for policy 0, policy_version 730 (0.0024) [2023-11-22 04:12:56,196][05156] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 2998272. Throughput: 0: 796.2. Samples: 748902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:12:56,203][05156] Avg episode reward: [(0, '23.776')] [2023-11-22 04:12:56,206][06878] Saving new best policy, reward=23.776! [2023-11-22 04:13:01,196][05156] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 2985.2). Total num frames: 3010560. Throughput: 0: 749.9. Samples: 752558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:13:01,198][05156] Avg episode reward: [(0, '24.874')] [2023-11-22 04:13:01,213][06878] Saving new best policy, reward=24.874! 
[2023-11-22 04:13:05,975][06891] Updated weights for policy 0, policy_version 740 (0.0024) [2023-11-22 04:13:06,195][05156] Fps is (10 sec: 3276.9, 60 sec: 3140.4, 300 sec: 3013.0). Total num frames: 3031040. Throughput: 0: 750.0. Samples: 758030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:13:06,201][05156] Avg episode reward: [(0, '25.598')] [2023-11-22 04:13:06,207][06878] Saving new best policy, reward=25.598! [2023-11-22 04:13:11,195][05156] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 3047424. Throughput: 0: 773.1. Samples: 760964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:13:11,201][05156] Avg episode reward: [(0, '24.715')] [2023-11-22 04:13:16,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2985.2). Total num frames: 3059712. Throughput: 0: 786.4. Samples: 765466. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:13:16,199][05156] Avg episode reward: [(0, '23.849')] [2023-11-22 04:13:19,516][06891] Updated weights for policy 0, policy_version 750 (0.0014) [2023-11-22 04:13:21,196][05156] Fps is (10 sec: 2457.5, 60 sec: 2935.4, 300 sec: 2985.2). Total num frames: 3072000. Throughput: 0: 781.9. Samples: 769302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:13:21,202][05156] Avg episode reward: [(0, '23.683')] [2023-11-22 04:13:21,258][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000751_3076096.pth... [2023-11-22 04:13:21,393][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000576_2359296.pth [2023-11-22 04:13:26,195][05156] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2985.2). Total num frames: 3084288. Throughput: 0: 755.4. Samples: 771102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:13:26,198][05156] Avg episode reward: [(0, '22.674')] [2023-11-22 04:13:31,195][05156] Fps is (10 sec: 3276.9, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3104768. 
Throughput: 0: 732.0. Samples: 775994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:13:31,198][05156] Avg episode reward: [(0, '21.411')] [2023-11-22 04:13:32,461][06891] Updated weights for policy 0, policy_version 760 (0.0014) [2023-11-22 04:13:36,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 3125248. Throughput: 0: 778.0. Samples: 781974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:13:36,198][05156] Avg episode reward: [(0, '20.716')] [2023-11-22 04:13:41,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3137536. Throughput: 0: 783.4. Samples: 784156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:13:41,202][05156] Avg episode reward: [(0, '21.679')] [2023-11-22 04:13:46,048][06891] Updated weights for policy 0, policy_version 770 (0.0038) [2023-11-22 04:13:46,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 3153920. Throughput: 0: 787.9. Samples: 788014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:13:46,201][05156] Avg episode reward: [(0, '23.157')] [2023-11-22 04:13:51,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2999.1). Total num frames: 3166208. Throughput: 0: 754.6. Samples: 791988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:13:51,203][05156] Avg episode reward: [(0, '22.659')] [2023-11-22 04:13:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3182592. Throughput: 0: 738.5. Samples: 794196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:13:56,198][05156] Avg episode reward: [(0, '22.170')] [2023-11-22 04:13:58,685][06891] Updated weights for policy 0, policy_version 780 (0.0016) [2023-11-22 04:14:01,196][05156] Fps is (10 sec: 3686.3, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 3203072. Throughput: 0: 772.9. Samples: 800246. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:14:01,205][05156] Avg episode reward: [(0, '21.700')] [2023-11-22 04:14:06,201][05156] Fps is (10 sec: 3684.4, 60 sec: 3140.0, 300 sec: 3012.9). Total num frames: 3219456. Throughput: 0: 801.6. Samples: 805380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:14:06,203][05156] Avg episode reward: [(0, '22.940')] [2023-11-22 04:14:11,196][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3231744. Throughput: 0: 803.7. Samples: 807268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:14:11,198][05156] Avg episode reward: [(0, '22.366')] [2023-11-22 04:14:12,249][06891] Updated weights for policy 0, policy_version 790 (0.0012) [2023-11-22 04:14:16,195][05156] Fps is (10 sec: 2458.9, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3244032. Throughput: 0: 780.4. Samples: 811114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:14:16,202][05156] Avg episode reward: [(0, '22.147')] [2023-11-22 04:14:21,195][05156] Fps is (10 sec: 2867.3, 60 sec: 3140.3, 300 sec: 3013.0). Total num frames: 3260416. Throughput: 0: 750.6. Samples: 815750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:14:21,204][05156] Avg episode reward: [(0, '22.734')] [2023-11-22 04:14:24,699][06891] Updated weights for policy 0, policy_version 800 (0.0029) [2023-11-22 04:14:26,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 3280896. Throughput: 0: 769.9. Samples: 818800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:14:26,198][05156] Avg episode reward: [(0, '24.043')] [2023-11-22 04:14:31,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3026.9). Total num frames: 3297280. Throughput: 0: 811.6. Samples: 824534. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:14:31,198][05156] Avg episode reward: [(0, '24.913')] [2023-11-22 04:14:36,196][05156] Fps is (10 sec: 2867.0, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3309568. Throughput: 0: 810.5. Samples: 828460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:14:36,201][05156] Avg episode reward: [(0, '24.507')] [2023-11-22 04:14:38,265][06891] Updated weights for policy 0, policy_version 810 (0.0019) [2023-11-22 04:14:41,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2999.1). Total num frames: 3321856. Throughput: 0: 803.7. Samples: 830362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:14:41,203][05156] Avg episode reward: [(0, '24.406')] [2023-11-22 04:14:46,195][05156] Fps is (10 sec: 2867.4, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3338240. Throughput: 0: 755.6. Samples: 834246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:14:46,198][05156] Avg episode reward: [(0, '23.702')] [2023-11-22 04:14:50,763][06891] Updated weights for policy 0, policy_version 820 (0.0022) [2023-11-22 04:14:51,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 3358720. Throughput: 0: 775.9. Samples: 840292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:14:51,198][05156] Avg episode reward: [(0, '23.647')] [2023-11-22 04:14:56,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 3379200. Throughput: 0: 801.3. Samples: 843328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:14:56,202][05156] Avg episode reward: [(0, '25.418')] [2023-11-22 04:15:01,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3140.3, 300 sec: 3026.9). Total num frames: 3391488. Throughput: 0: 807.5. Samples: 847454. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:15:01,198][05156] Avg episode reward: [(0, '25.987')] [2023-11-22 04:15:01,209][06878] Saving new best policy, reward=25.987! [2023-11-22 04:15:04,587][06891] Updated weights for policy 0, policy_version 830 (0.0025) [2023-11-22 04:15:06,199][05156] Fps is (10 sec: 2047.2, 60 sec: 3003.8, 300 sec: 3013.0). Total num frames: 3399680. Throughput: 0: 790.7. Samples: 851336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:15:06,211][05156] Avg episode reward: [(0, '24.891')] [2023-11-22 04:15:11,195][05156] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 3026.9). Total num frames: 3416064. Throughput: 0: 766.5. Samples: 853294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:15:11,205][05156] Avg episode reward: [(0, '24.704')] [2023-11-22 04:15:16,195][05156] Fps is (10 sec: 3687.8, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 3436544. Throughput: 0: 760.8. Samples: 858772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:15:16,198][05156] Avg episode reward: [(0, '23.919')] [2023-11-22 04:15:16,796][06891] Updated weights for policy 0, policy_version 840 (0.0013) [2023-11-22 04:15:21,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3040.8). Total num frames: 3457024. Throughput: 0: 806.3. Samples: 864742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:15:21,200][05156] Avg episode reward: [(0, '25.798')] [2023-11-22 04:15:21,219][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth... [2023-11-22 04:15:21,360][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000664_2719744.pth [2023-11-22 04:15:26,196][05156] Fps is (10 sec: 3276.7, 60 sec: 3140.2, 300 sec: 3013.0). Total num frames: 3469312. Throughput: 0: 805.4. Samples: 866606. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:15:26,203][05156] Avg episode reward: [(0, '26.161')] [2023-11-22 04:15:26,205][06878] Saving new best policy, reward=26.161! [2023-11-22 04:15:30,685][06891] Updated weights for policy 0, policy_version 850 (0.0015) [2023-11-22 04:15:31,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3481600. Throughput: 0: 803.3. Samples: 870394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:15:31,198][05156] Avg episode reward: [(0, '24.897')] [2023-11-22 04:15:36,195][05156] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 3013.0). Total num frames: 3493888. Throughput: 0: 753.9. Samples: 874216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:15:36,198][05156] Avg episode reward: [(0, '23.802')] [2023-11-22 04:15:41,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 3514368. Throughput: 0: 751.1. Samples: 877126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:15:41,198][05156] Avg episode reward: [(0, '24.381')] [2023-11-22 04:15:42,658][06891] Updated weights for policy 0, policy_version 860 (0.0024) [2023-11-22 04:15:46,195][05156] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3054.6). Total num frames: 3534848. Throughput: 0: 795.3. Samples: 883242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:15:46,201][05156] Avg episode reward: [(0, '24.265')] [2023-11-22 04:15:51,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3026.9). Total num frames: 3547136. Throughput: 0: 803.3. Samples: 887482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:15:51,204][05156] Avg episode reward: [(0, '25.170')] [2023-11-22 04:15:56,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 3559424. Throughput: 0: 803.6. Samples: 889458. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-11-22 04:15:56,200][05156] Avg episode reward: [(0, '24.861')] [2023-11-22 04:15:56,818][06891] Updated weights for policy 0, policy_version 870 (0.0032) [2023-11-22 04:16:01,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3013.0). Total num frames: 3571712. Throughput: 0: 767.6. Samples: 893316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:16:01,204][05156] Avg episode reward: [(0, '24.435')] [2023-11-22 04:16:06,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3208.7, 300 sec: 3040.8). Total num frames: 3592192. Throughput: 0: 751.1. Samples: 898542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:16:06,204][05156] Avg episode reward: [(0, '23.138')] [2023-11-22 04:16:08,890][06891] Updated weights for policy 0, policy_version 880 (0.0034) [2023-11-22 04:16:11,195][05156] Fps is (10 sec: 4096.1, 60 sec: 3276.8, 300 sec: 3068.5). Total num frames: 3612672. Throughput: 0: 777.8. Samples: 901608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:16:11,198][05156] Avg episode reward: [(0, '21.819')] [2023-11-22 04:16:16,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3068.5). Total num frames: 3624960. Throughput: 0: 799.1. Samples: 906352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:16:16,203][05156] Avg episode reward: [(0, '22.080')] [2023-11-22 04:16:21,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3068.5). Total num frames: 3637248. Throughput: 0: 800.9. Samples: 910256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:16:21,206][05156] Avg episode reward: [(0, '19.197')] [2023-11-22 04:16:23,279][06891] Updated weights for policy 0, policy_version 890 (0.0035) [2023-11-22 04:16:26,196][05156] Fps is (10 sec: 2457.5, 60 sec: 3003.7, 300 sec: 3082.4). Total num frames: 3649536. Throughput: 0: 778.3. Samples: 912152. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:16:26,198][05156] Avg episode reward: [(0, '19.435')] [2023-11-22 04:16:31,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 3670016. Throughput: 0: 748.9. Samples: 916942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:16:31,204][05156] Avg episode reward: [(0, '19.907')] [2023-11-22 04:16:34,819][06891] Updated weights for policy 0, policy_version 900 (0.0027) [2023-11-22 04:16:36,195][05156] Fps is (10 sec: 4096.2, 60 sec: 3276.8, 300 sec: 3138.0). Total num frames: 3690496. Throughput: 0: 784.6. Samples: 922790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:16:36,198][05156] Avg episode reward: [(0, '21.841')] [2023-11-22 04:16:41,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3702784. Throughput: 0: 795.3. Samples: 925248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:16:41,200][05156] Avg episode reward: [(0, '22.503')] [2023-11-22 04:16:46,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 3715072. Throughput: 0: 796.1. Samples: 929142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:16:46,203][05156] Avg episode reward: [(0, '23.755')] [2023-11-22 04:16:49,777][06891] Updated weights for policy 0, policy_version 910 (0.0013) [2023-11-22 04:16:51,195][05156] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3082.4). Total num frames: 3727360. Throughput: 0: 762.3. Samples: 932844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-11-22 04:16:51,205][05156] Avg episode reward: [(0, '22.778')] [2023-11-22 04:16:56,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 3743744. Throughput: 0: 740.3. Samples: 934922. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:16:56,204][05156] Avg episode reward: [(0, '23.357')] [2023-11-22 04:17:01,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3764224. Throughput: 0: 770.3. Samples: 941014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:17:01,203][05156] Avg episode reward: [(0, '25.043')] [2023-11-22 04:17:01,428][06891] Updated weights for policy 0, policy_version 920 (0.0028) [2023-11-22 04:17:06,195][05156] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3780608. Throughput: 0: 798.6. Samples: 946192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-11-22 04:17:06,198][05156] Avg episode reward: [(0, '24.528')] [2023-11-22 04:17:11,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 3796992. Throughput: 0: 800.9. Samples: 948194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:17:11,198][05156] Avg episode reward: [(0, '23.349')] [2023-11-22 04:17:16,014][06891] Updated weights for policy 0, policy_version 930 (0.0055) [2023-11-22 04:17:16,197][05156] Fps is (10 sec: 2866.8, 60 sec: 3071.9, 300 sec: 3096.3). Total num frames: 3809280. Throughput: 0: 781.2. Samples: 952098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:17:16,205][05156] Avg episode reward: [(0, '23.855')] [2023-11-22 04:17:21,196][05156] Fps is (10 sec: 2867.1, 60 sec: 3140.2, 300 sec: 3110.2). Total num frames: 3825664. Throughput: 0: 752.8. Samples: 956668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:17:21,201][05156] Avg episode reward: [(0, '23.433')] [2023-11-22 04:17:21,212][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000934_3825664.pth... 
[2023-11-22 04:17:21,328][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000751_3076096.pth [2023-11-22 04:17:26,195][05156] Fps is (10 sec: 3686.9, 60 sec: 3276.8, 300 sec: 3138.0). Total num frames: 3846144. Throughput: 0: 764.5. Samples: 959650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:17:26,198][05156] Avg episode reward: [(0, '24.205')] [2023-11-22 04:17:27,031][06891] Updated weights for policy 0, policy_version 940 (0.0023) [2023-11-22 04:17:31,195][05156] Fps is (10 sec: 3686.6, 60 sec: 3208.5, 300 sec: 3151.8). Total num frames: 3862528. Throughput: 0: 803.9. Samples: 965318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:17:31,201][05156] Avg episode reward: [(0, '24.389')] [2023-11-22 04:17:36,197][05156] Fps is (10 sec: 2866.8, 60 sec: 3071.9, 300 sec: 3124.1). Total num frames: 3874816. Throughput: 0: 807.0. Samples: 969162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:17:36,200][05156] Avg episode reward: [(0, '23.201')] [2023-11-22 04:17:41,203][05156] Fps is (10 sec: 2455.8, 60 sec: 3071.6, 300 sec: 3096.2). Total num frames: 3887104. Throughput: 0: 803.6. Samples: 971088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:17:41,209][05156] Avg episode reward: [(0, '23.697')] [2023-11-22 04:17:42,255][06891] Updated weights for policy 0, policy_version 950 (0.0024) [2023-11-22 04:17:46,195][05156] Fps is (10 sec: 2867.6, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 3903488. Throughput: 0: 754.9. Samples: 974984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:17:46,205][05156] Avg episode reward: [(0, '23.941')] [2023-11-22 04:17:51,195][05156] Fps is (10 sec: 3279.2, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3919872. Throughput: 0: 770.3. Samples: 980854. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:17:51,206][05156] Avg episode reward: [(0, '24.826')] [2023-11-22 04:17:53,453][06891] Updated weights for policy 0, policy_version 960 (0.0017) [2023-11-22 04:17:56,195][05156] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3151.8). Total num frames: 3940352. Throughput: 0: 793.5. Samples: 983902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-11-22 04:17:56,199][05156] Avg episode reward: [(0, '24.592')] [2023-11-22 04:18:01,195][05156] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3952640. Throughput: 0: 801.1. Samples: 988146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-11-22 04:18:01,199][05156] Avg episode reward: [(0, '24.691')] [2023-11-22 04:18:06,202][05156] Fps is (10 sec: 2456.0, 60 sec: 3071.7, 300 sec: 3110.1). Total num frames: 3964928. Throughput: 0: 784.9. Samples: 991994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-11-22 04:18:06,204][05156] Avg episode reward: [(0, '25.224')] [2023-11-22 04:18:08,143][06891] Updated weights for policy 0, policy_version 970 (0.0022) [2023-11-22 04:18:11,195][05156] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 3981312. Throughput: 0: 762.2. Samples: 993950. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2023-11-22 04:18:11,200][05156] Avg episode reward: [(0, '24.930')] [2023-11-22 04:18:16,195][05156] Fps is (10 sec: 3278.9, 60 sec: 3140.3, 300 sec: 3138.0). Total num frames: 3997696. Throughput: 0: 756.6. Samples: 999366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-11-22 04:18:16,198][05156] Avg episode reward: [(0, '24.555')] [2023-11-22 04:18:17,314][06878] Stopping Batcher_0... [2023-11-22 04:18:17,314][05156] Component Batcher_0 stopped! [2023-11-22 04:18:17,315][06878] Loop batcher_evt_loop terminating... [2023-11-22 04:18:17,322][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
[2023-11-22 04:18:17,367][05156] Component RolloutWorker_w1 stopped! [2023-11-22 04:18:17,380][06893] Stopping RolloutWorker_w1... [2023-11-22 04:18:17,386][05156] Component RolloutWorker_w2 stopped! [2023-11-22 04:18:17,394][05156] Component RolloutWorker_w0 stopped! [2023-11-22 04:18:17,388][06894] Stopping RolloutWorker_w2... [2023-11-22 04:18:17,382][06893] Loop rollout_proc1_evt_loop terminating... [2023-11-22 04:18:17,396][06892] Stopping RolloutWorker_w0... [2023-11-22 04:18:17,406][06897] Stopping RolloutWorker_w5... [2023-11-22 04:18:17,406][05156] Component RolloutWorker_w5 stopped! [2023-11-22 04:18:17,412][05156] Component RolloutWorker_w4 stopped! [2023-11-22 04:18:17,414][06896] Stopping RolloutWorker_w4... [2023-11-22 04:18:17,397][06894] Loop rollout_proc2_evt_loop terminating... [2023-11-22 04:18:17,418][06891] Weights refcount: 2 0 [2023-11-22 04:18:17,407][06897] Loop rollout_proc5_evt_loop terminating... [2023-11-22 04:18:17,421][05156] Component InferenceWorker_p0-w0 stopped! [2023-11-22 04:18:17,404][06892] Loop rollout_proc0_evt_loop terminating... [2023-11-22 04:18:17,420][06891] Stopping InferenceWorker_p0-w0... [2023-11-22 04:18:17,426][06891] Loop inference_proc0-0_evt_loop terminating... [2023-11-22 04:18:17,426][05156] Component RolloutWorker_w6 stopped! [2023-11-22 04:18:17,428][06898] Stopping RolloutWorker_w6... [2023-11-22 04:18:17,433][06899] Stopping RolloutWorker_w7... [2023-11-22 04:18:17,433][05156] Component RolloutWorker_w7 stopped! [2023-11-22 04:18:17,423][06896] Loop rollout_proc4_evt_loop terminating... [2023-11-22 04:18:17,433][06899] Loop rollout_proc7_evt_loop terminating... [2023-11-22 04:18:17,429][06898] Loop rollout_proc6_evt_loop terminating... [2023-11-22 04:18:17,469][05156] Component RolloutWorker_w3 stopped! [2023-11-22 04:18:17,469][06895] Stopping RolloutWorker_w3... [2023-11-22 04:18:17,476][06895] Loop rollout_proc3_evt_loop terminating... 
[2023-11-22 04:18:17,491][06878] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth [2023-11-22 04:18:17,499][06878] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-11-22 04:18:17,651][05156] Component LearnerWorker_p0 stopped! [2023-11-22 04:18:17,651][06878] Stopping LearnerWorker_p0... [2023-11-22 04:18:17,654][06878] Loop learner_proc0_evt_loop terminating... [2023-11-22 04:18:17,654][05156] Waiting for process learner_proc0 to stop... [2023-11-22 04:18:19,169][05156] Waiting for process inference_proc0-0 to join... [2023-11-22 04:18:19,173][05156] Waiting for process rollout_proc0 to join... [2023-11-22 04:18:20,972][05156] Waiting for process rollout_proc1 to join... [2023-11-22 04:18:21,240][05156] Waiting for process rollout_proc2 to join... [2023-11-22 04:18:21,247][05156] Waiting for process rollout_proc3 to join... [2023-11-22 04:18:21,249][05156] Waiting for process rollout_proc4 to join... [2023-11-22 04:18:21,252][05156] Waiting for process rollout_proc5 to join... [2023-11-22 04:18:21,254][05156] Waiting for process rollout_proc6 to join... [2023-11-22 04:18:21,255][05156] Waiting for process rollout_proc7 to join... 
[2023-11-22 04:18:21,256][05156] Batcher 0 profile tree view:
batching: 28.1666, releasing_batches: 0.0388
[2023-11-22 04:18:21,258][05156] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 637.9062
update_model: 9.6179
  weight_update: 0.0025
one_step: 0.0026
  handle_policy_step: 656.3959
    deserialize: 18.3586, stack: 3.5409, obs_to_device_normalize: 127.5471, forward: 363.0852, send_messages: 30.5910
    prepare_outputs: 81.3628
      to_cpu: 45.3745
[2023-11-22 04:18:21,259][05156] Learner 0 profile tree view:
misc: 0.0057, prepare_batch: 13.6861
train: 74.5323
  epoch_init: 0.0066, minibatch_init: 0.0068, losses_postprocess: 0.6013, kl_divergence: 0.6899, after_optimizer: 33.8195
  calculate_losses: 26.9243
    losses_init: 0.0036, forward_head: 1.5239, bptt_initial: 17.3557, tail: 1.3661, advantages_returns: 0.3833, losses: 3.6936
    bptt: 2.2215
      bptt_forward_core: 2.1011
  update: 11.8193
    clip: 0.9213
[2023-11-22 04:18:21,261][05156] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4902, enqueue_policy_requests: 195.7657, env_step: 987.9004, overhead: 29.8512, complete_rollouts: 8.2389
save_policy_outputs: 25.3051
  split_output_tensors: 12.0139
[2023-11-22 04:18:21,263][05156] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3827, enqueue_policy_requests: 199.8480, env_step: 980.8307, overhead: 29.6204, complete_rollouts: 8.4801
save_policy_outputs: 26.1206
  split_output_tensors: 12.0324
[2023-11-22 04:18:21,265][05156] Loop Runner_EvtLoop terminating...
[2023-11-22 04:18:21,267][05156] Runner profile tree view:
main_loop: 1375.9910
[2023-11-22 04:18:21,272][05156] Collected {0: 4005888}, FPS: 2911.3
[2023-11-22 04:18:21,293][05156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-11-22 04:18:21,298][05156] Overriding arg 'num_workers' with value 1 passed from command line
[2023-11-22 04:18:21,299][05156] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-11-22 04:18:21,301][05156] Adding new argument 'save_video'=True that is not in the saved config file! [2023-11-22 04:18:21,302][05156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-11-22 04:18:21,304][05156] Adding new argument 'video_name'=None that is not in the saved config file! [2023-11-22 04:18:21,305][05156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-11-22 04:18:21,310][05156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-11-22 04:18:21,312][05156] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-11-22 04:18:21,313][05156] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-11-22 04:18:21,313][05156] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-11-22 04:18:21,317][05156] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-11-22 04:18:21,318][05156] Adding new argument 'train_script'=None that is not in the saved config file! [2023-11-22 04:18:21,319][05156] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-11-22 04:18:21,320][05156] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-11-22 04:18:21,376][05156] RunningMeanStd input shape: (3, 72, 128) [2023-11-22 04:18:21,381][05156] RunningMeanStd input shape: (1,) [2023-11-22 04:18:21,410][05156] ConvEncoder: input_channels=3 [2023-11-22 04:18:21,485][05156] Conv encoder output size: 512 [2023-11-22 04:18:21,487][05156] Policy head output size: 512 [2023-11-22 04:18:21,517][05156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-11-22 04:18:22,193][05156] Num frames 100... [2023-11-22 04:18:22,372][05156] Num frames 200... [2023-11-22 04:18:22,559][05156] Num frames 300... 
[2023-11-22 04:18:22,737][05156] Num frames 400... [2023-11-22 04:18:22,924][05156] Num frames 500... [2023-11-22 04:18:23,102][05156] Num frames 600... [2023-11-22 04:18:23,287][05156] Num frames 700... [2023-11-22 04:18:23,473][05156] Num frames 800... [2023-11-22 04:18:23,661][05156] Num frames 900... [2023-11-22 04:18:23,853][05156] Num frames 1000... [2023-11-22 04:18:24,036][05156] Num frames 1100... [2023-11-22 04:18:24,215][05156] Num frames 1200... [2023-11-22 04:18:24,398][05156] Num frames 1300... [2023-11-22 04:18:24,586][05156] Num frames 1400... [2023-11-22 04:18:24,773][05156] Num frames 1500... [2023-11-22 04:18:24,965][05156] Num frames 1600... [2023-11-22 04:18:25,150][05156] Num frames 1700... [2023-11-22 04:18:25,340][05156] Num frames 1800... [2023-11-22 04:18:25,533][05156] Num frames 1900... [2023-11-22 04:18:25,716][05156] Num frames 2000... [2023-11-22 04:18:25,907][05156] Num frames 2100... [2023-11-22 04:18:25,960][05156] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000 [2023-11-22 04:18:25,963][05156] Avg episode reward: 58.999, avg true_objective: 21.000 [2023-11-22 04:18:26,153][05156] Num frames 2200... [2023-11-22 04:18:26,315][05156] Num frames 2300... [2023-11-22 04:18:26,449][05156] Num frames 2400... [2023-11-22 04:18:26,574][05156] Num frames 2500... [2023-11-22 04:18:26,691][05156] Avg episode rewards: #0: 33.739, true rewards: #0: 12.740 [2023-11-22 04:18:26,693][05156] Avg episode reward: 33.739, avg true_objective: 12.740 [2023-11-22 04:18:26,778][05156] Num frames 2600... [2023-11-22 04:18:26,905][05156] Num frames 2700... [2023-11-22 04:18:27,029][05156] Num frames 2800... [2023-11-22 04:18:27,157][05156] Num frames 2900... [2023-11-22 04:18:27,286][05156] Num frames 3000... [2023-11-22 04:18:27,420][05156] Num frames 3100... [2023-11-22 04:18:27,548][05156] Num frames 3200... [2023-11-22 04:18:27,682][05156] Num frames 3300... [2023-11-22 04:18:27,807][05156] Num frames 3400... 
[2023-11-22 04:18:27,936][05156] Num frames 3500... [2023-11-22 04:18:28,068][05156] Num frames 3600... [2023-11-22 04:18:28,198][05156] Num frames 3700... [2023-11-22 04:18:28,337][05156] Num frames 3800... [2023-11-22 04:18:28,466][05156] Num frames 3900... [2023-11-22 04:18:28,595][05156] Num frames 4000... [2023-11-22 04:18:28,724][05156] Num frames 4100... [2023-11-22 04:18:28,853][05156] Num frames 4200... [2023-11-22 04:18:28,968][05156] Avg episode rewards: #0: 37.146, true rewards: #0: 14.147 [2023-11-22 04:18:28,970][05156] Avg episode reward: 37.146, avg true_objective: 14.147 [2023-11-22 04:18:29,059][05156] Num frames 4300... [2023-11-22 04:18:29,186][05156] Num frames 4400... [2023-11-22 04:18:29,344][05156] Num frames 4500... [2023-11-22 04:18:29,476][05156] Num frames 4600... [2023-11-22 04:18:29,602][05156] Num frames 4700... [2023-11-22 04:18:29,729][05156] Num frames 4800... [2023-11-22 04:18:29,860][05156] Num frames 4900... [2023-11-22 04:18:29,987][05156] Num frames 5000... [2023-11-22 04:18:30,117][05156] Num frames 5100... [2023-11-22 04:18:30,242][05156] Num frames 5200... [2023-11-22 04:18:30,375][05156] Num frames 5300... [2023-11-22 04:18:30,502][05156] Num frames 5400... [2023-11-22 04:18:30,634][05156] Num frames 5500... [2023-11-22 04:18:30,770][05156] Num frames 5600... [2023-11-22 04:18:30,911][05156] Num frames 5700... [2023-11-22 04:18:31,045][05156] Num frames 5800... [2023-11-22 04:18:31,170][05156] Num frames 5900... [2023-11-22 04:18:31,295][05156] Num frames 6000... [2023-11-22 04:18:31,434][05156] Num frames 6100... [2023-11-22 04:18:31,565][05156] Num frames 6200... [2023-11-22 04:18:31,698][05156] Num frames 6300... [2023-11-22 04:18:31,809][05156] Avg episode rewards: #0: 42.359, true rewards: #0: 15.860 [2023-11-22 04:18:31,811][05156] Avg episode reward: 42.359, avg true_objective: 15.860 [2023-11-22 04:18:31,893][05156] Num frames 6400... [2023-11-22 04:18:32,019][05156] Num frames 6500... 
[2023-11-22 04:18:32,145][05156] Num frames 6600... [2023-11-22 04:18:32,272][05156] Num frames 6700... [2023-11-22 04:18:32,405][05156] Num frames 6800... [2023-11-22 04:18:32,532][05156] Num frames 6900... [2023-11-22 04:18:32,658][05156] Num frames 7000... [2023-11-22 04:18:32,780][05156] Num frames 7100... [2023-11-22 04:18:32,904][05156] Num frames 7200... [2023-11-22 04:18:33,032][05156] Num frames 7300... [2023-11-22 04:18:33,158][05156] Num frames 7400... [2023-11-22 04:18:33,282][05156] Num frames 7500... [2023-11-22 04:18:33,417][05156] Num frames 7600... [2023-11-22 04:18:33,545][05156] Num frames 7700... [2023-11-22 04:18:33,670][05156] Num frames 7800... [2023-11-22 04:18:33,799][05156] Num frames 7900... [2023-11-22 04:18:33,927][05156] Num frames 8000... [2023-11-22 04:18:34,065][05156] Num frames 8100... [2023-11-22 04:18:34,189][05156] Num frames 8200... [2023-11-22 04:18:34,316][05156] Num frames 8300... [2023-11-22 04:18:34,481][05156] Avg episode rewards: #0: 44.971, true rewards: #0: 16.772 [2023-11-22 04:18:34,483][05156] Avg episode reward: 44.971, avg true_objective: 16.772 [2023-11-22 04:18:34,506][05156] Num frames 8400... [2023-11-22 04:18:34,631][05156] Num frames 8500... [2023-11-22 04:18:34,756][05156] Num frames 8600... [2023-11-22 04:18:34,887][05156] Num frames 8700... [2023-11-22 04:18:35,013][05156] Num frames 8800... [2023-11-22 04:18:35,140][05156] Num frames 8900... [2023-11-22 04:18:35,277][05156] Num frames 9000... [2023-11-22 04:18:35,449][05156] Avg episode rewards: #0: 39.816, true rewards: #0: 15.150 [2023-11-22 04:18:35,451][05156] Avg episode reward: 39.816, avg true_objective: 15.150 [2023-11-22 04:18:35,474][05156] Num frames 9100... [2023-11-22 04:18:35,611][05156] Num frames 9200... [2023-11-22 04:18:35,747][05156] Num frames 9300... [2023-11-22 04:18:35,886][05156] Num frames 9400... [2023-11-22 04:18:36,025][05156] Num frames 9500... [2023-11-22 04:18:36,174][05156] Num frames 9600... 
[2023-11-22 04:18:36,322][05156] Num frames 9700... [2023-11-22 04:18:36,511][05156] Num frames 9800... [2023-11-22 04:18:36,693][05156] Num frames 9900... [2023-11-22 04:18:36,879][05156] Num frames 10000... [2023-11-22 04:18:37,067][05156] Num frames 10100... [2023-11-22 04:18:37,255][05156] Num frames 10200... [2023-11-22 04:18:37,390][05156] Avg episode rewards: #0: 37.631, true rewards: #0: 14.631 [2023-11-22 04:18:37,392][05156] Avg episode reward: 37.631, avg true_objective: 14.631 [2023-11-22 04:18:37,514][05156] Num frames 10300... [2023-11-22 04:18:37,697][05156] Num frames 10400... [2023-11-22 04:18:37,878][05156] Num frames 10500... [2023-11-22 04:18:38,059][05156] Num frames 10600... [2023-11-22 04:18:38,245][05156] Num frames 10700... [2023-11-22 04:18:38,423][05156] Num frames 10800... [2023-11-22 04:18:38,624][05156] Num frames 10900... [2023-11-22 04:18:38,808][05156] Num frames 11000... [2023-11-22 04:18:38,996][05156] Num frames 11100... [2023-11-22 04:18:39,190][05156] Num frames 11200... [2023-11-22 04:18:39,371][05156] Num frames 11300... [2023-11-22 04:18:39,487][05156] Avg episode rewards: #0: 36.162, true rewards: #0: 14.163 [2023-11-22 04:18:39,490][05156] Avg episode reward: 36.162, avg true_objective: 14.163 [2023-11-22 04:18:39,643][05156] Num frames 11400... [2023-11-22 04:18:39,823][05156] Num frames 11500... [2023-11-22 04:18:40,025][05156] Num frames 11600... [2023-11-22 04:18:40,213][05156] Num frames 11700... [2023-11-22 04:18:40,413][05156] Num frames 11800... [2023-11-22 04:18:40,597][05156] Num frames 11900... [2023-11-22 04:18:40,789][05156] Num frames 12000... [2023-11-22 04:18:40,973][05156] Num frames 12100... [2023-11-22 04:18:41,157][05156] Num frames 12200... [2023-11-22 04:18:41,342][05156] Num frames 12300... [2023-11-22 04:18:41,523][05156] Num frames 12400... [2023-11-22 04:18:41,704][05156] Num frames 12500... [2023-11-22 04:18:41,826][05156] Num frames 12600... 
[2023-11-22 04:18:41,948][05156] Avg episode rewards: #0: 36.725, true rewards: #0: 14.059 [2023-11-22 04:18:41,950][05156] Avg episode reward: 36.725, avg true_objective: 14.059 [2023-11-22 04:18:42,015][05156] Num frames 12700... [2023-11-22 04:18:42,147][05156] Num frames 12800... [2023-11-22 04:18:42,279][05156] Num frames 12900... [2023-11-22 04:18:42,404][05156] Num frames 13000... [2023-11-22 04:18:42,539][05156] Num frames 13100... [2023-11-22 04:18:42,644][05156] Avg episode rewards: #0: 34.037, true rewards: #0: 13.137 [2023-11-22 04:18:42,646][05156] Avg episode reward: 34.037, avg true_objective: 13.137 [2023-11-22 04:20:10,649][05156] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-11-22 04:20:15,985][05156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-11-22 04:20:15,989][05156] Overriding arg 'num_workers' with value 1 passed from command line [2023-11-22 04:20:15,994][05156] Adding new argument 'no_render'=True that is not in the saved config file! [2023-11-22 04:20:15,995][05156] Adding new argument 'save_video'=True that is not in the saved config file! [2023-11-22 04:20:15,997][05156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-11-22 04:20:16,001][05156] Adding new argument 'video_name'=None that is not in the saved config file! [2023-11-22 04:20:16,002][05156] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-11-22 04:20:16,003][05156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-11-22 04:20:16,004][05156] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-11-22 04:20:16,007][05156] Adding new argument 'hf_repository'='tommylam/PPO-doomHealthGatheringSupreme' that is not in the saved config file! [2023-11-22 04:20:16,015][05156] Adding new argument 'policy_index'=0 that is not in the saved config file! 
[2023-11-22 04:20:16,018][05156] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-11-22 04:20:16,019][05156] Adding new argument 'train_script'=None that is not in the saved config file! [2023-11-22 04:20:16,020][05156] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-11-22 04:20:16,021][05156] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-11-22 04:20:16,076][05156] RunningMeanStd input shape: (3, 72, 128) [2023-11-22 04:20:16,079][05156] RunningMeanStd input shape: (1,) [2023-11-22 04:20:16,097][05156] ConvEncoder: input_channels=3 [2023-11-22 04:20:16,158][05156] Conv encoder output size: 512 [2023-11-22 04:20:16,160][05156] Policy head output size: 512 [2023-11-22 04:20:16,188][05156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-11-22 04:20:16,812][05156] Num frames 100... [2023-11-22 04:20:16,991][05156] Num frames 200... [2023-11-22 04:20:17,168][05156] Num frames 300... [2023-11-22 04:20:17,353][05156] Num frames 400... [2023-11-22 04:20:17,545][05156] Num frames 500... [2023-11-22 04:20:17,729][05156] Num frames 600... [2023-11-22 04:20:17,936][05156] Avg episode rewards: #0: 12.720, true rewards: #0: 6.720 [2023-11-22 04:20:17,942][05156] Avg episode reward: 12.720, avg true_objective: 6.720 [2023-11-22 04:20:18,004][05156] Num frames 700... [2023-11-22 04:20:18,193][05156] Num frames 800... [2023-11-22 04:20:18,377][05156] Num frames 900... [2023-11-22 04:20:18,569][05156] Num frames 1000... [2023-11-22 04:20:18,760][05156] Num frames 1100... [2023-11-22 04:20:18,952][05156] Num frames 1200... [2023-11-22 04:20:19,153][05156] Num frames 1300... [2023-11-22 04:20:19,339][05156] Num frames 1400... [2023-11-22 04:20:19,525][05156] Num frames 1500... [2023-11-22 04:20:19,707][05156] Num frames 1600... [2023-11-22 04:20:19,889][05156] Num frames 1700... 
[2023-11-22 04:20:20,069][05156] Num frames 1800... [2023-11-22 04:20:20,250][05156] Num frames 1900... [2023-11-22 04:20:20,378][05156] Num frames 2000... [2023-11-22 04:20:20,509][05156] Num frames 2100... [2023-11-22 04:20:20,634][05156] Num frames 2200... [2023-11-22 04:20:20,761][05156] Num frames 2300... [2023-11-22 04:20:20,898][05156] Num frames 2400... [2023-11-22 04:20:21,040][05156] Num frames 2500... [2023-11-22 04:20:21,181][05156] Num frames 2600... [2023-11-22 04:20:21,320][05156] Num frames 2700... [2023-11-22 04:20:21,475][05156] Avg episode rewards: #0: 30.860, true rewards: #0: 13.860 [2023-11-22 04:20:21,477][05156] Avg episode reward: 30.860, avg true_objective: 13.860 [2023-11-22 04:20:21,521][05156] Num frames 2800... [2023-11-22 04:20:21,659][05156] Num frames 2900... [2023-11-22 04:20:21,796][05156] Num frames 3000... [2023-11-22 04:20:21,928][05156] Num frames 3100... [2023-11-22 04:20:22,066][05156] Num frames 3200... [2023-11-22 04:20:22,191][05156] Num frames 3300... [2023-11-22 04:20:22,319][05156] Num frames 3400... [2023-11-22 04:20:22,451][05156] Num frames 3500... [2023-11-22 04:20:22,585][05156] Num frames 3600... [2023-11-22 04:20:22,710][05156] Num frames 3700... [2023-11-22 04:20:22,835][05156] Num frames 3800... [2023-11-22 04:20:22,967][05156] Avg episode rewards: #0: 28.866, true rewards: #0: 12.867 [2023-11-22 04:20:22,969][05156] Avg episode reward: 28.866, avg true_objective: 12.867 [2023-11-22 04:20:23,034][05156] Num frames 3900... [2023-11-22 04:20:23,157][05156] Num frames 4000... [2023-11-22 04:20:23,290][05156] Num frames 4100... [2023-11-22 04:20:23,417][05156] Num frames 4200... [2023-11-22 04:20:23,547][05156] Num frames 4300... [2023-11-22 04:20:23,670][05156] Num frames 4400... [2023-11-22 04:20:23,799][05156] Num frames 4500... [2023-11-22 04:20:23,925][05156] Num frames 4600... [2023-11-22 04:20:24,060][05156] Num frames 4700... [2023-11-22 04:20:24,186][05156] Num frames 4800... 
[2023-11-22 04:20:24,313][05156] Num frames 4900... [2023-11-22 04:20:24,390][05156] Avg episode rewards: #0: 26.790, true rewards: #0: 12.290 [2023-11-22 04:20:24,391][05156] Avg episode reward: 26.790, avg true_objective: 12.290 [2023-11-22 04:20:24,506][05156] Num frames 5000... [2023-11-22 04:20:24,634][05156] Num frames 5100... [2023-11-22 04:20:24,759][05156] Num frames 5200... [2023-11-22 04:20:24,886][05156] Num frames 5300... [2023-11-22 04:20:25,012][05156] Num frames 5400... [2023-11-22 04:20:25,146][05156] Num frames 5500... [2023-11-22 04:20:25,271][05156] Num frames 5600... [2023-11-22 04:20:25,401][05156] Num frames 5700... [2023-11-22 04:20:25,528][05156] Num frames 5800... [2023-11-22 04:20:25,660][05156] Num frames 5900... [2023-11-22 04:20:25,785][05156] Num frames 6000... [2023-11-22 04:20:25,913][05156] Num frames 6100... [2023-11-22 04:20:26,051][05156] Num frames 6200... [2023-11-22 04:20:26,186][05156] Num frames 6300... [2023-11-22 04:20:26,303][05156] Avg episode rewards: #0: 28.688, true rewards: #0: 12.688 [2023-11-22 04:20:26,305][05156] Avg episode reward: 28.688, avg true_objective: 12.688 [2023-11-22 04:20:26,384][05156] Num frames 6400... [2023-11-22 04:20:26,520][05156] Num frames 6500... [2023-11-22 04:20:26,652][05156] Num frames 6600... [2023-11-22 04:20:26,785][05156] Num frames 6700... [2023-11-22 04:20:26,909][05156] Num frames 6800... [2023-11-22 04:20:27,035][05156] Num frames 6900... [2023-11-22 04:20:27,173][05156] Num frames 7000... [2023-11-22 04:20:27,299][05156] Num frames 7100... [2023-11-22 04:20:27,433][05156] Num frames 7200... [2023-11-22 04:20:27,569][05156] Num frames 7300... [2023-11-22 04:20:27,697][05156] Num frames 7400... [2023-11-22 04:20:27,822][05156] Num frames 7500... [2023-11-22 04:20:27,951][05156] Num frames 7600... [2023-11-22 04:20:28,088][05156] Num frames 7700... [2023-11-22 04:20:28,212][05156] Num frames 7800... 
[2023-11-22 04:20:28,288][05156] Avg episode rewards: #0: 29.693, true rewards: #0: 13.027 [2023-11-22 04:20:28,290][05156] Avg episode reward: 29.693, avg true_objective: 13.027 [2023-11-22 04:20:28,399][05156] Num frames 7900... [2023-11-22 04:20:28,529][05156] Num frames 8000... [2023-11-22 04:20:28,654][05156] Num frames 8100... [2023-11-22 04:20:28,779][05156] Num frames 8200... [2023-11-22 04:20:28,907][05156] Num frames 8300... [2023-11-22 04:20:29,031][05156] Num frames 8400... [2023-11-22 04:20:29,164][05156] Num frames 8500... [2023-11-22 04:20:29,297][05156] Num frames 8600... [2023-11-22 04:20:29,388][05156] Avg episode rewards: #0: 27.754, true rewards: #0: 12.326 [2023-11-22 04:20:29,390][05156] Avg episode reward: 27.754, avg true_objective: 12.326 [2023-11-22 04:20:29,490][05156] Num frames 8700... [2023-11-22 04:20:29,613][05156] Num frames 8800... [2023-11-22 04:20:29,740][05156] Num frames 8900... [2023-11-22 04:20:29,901][05156] Num frames 9000... [2023-11-22 04:20:30,024][05156] Num frames 9100... [2023-11-22 04:20:30,161][05156] Num frames 9200... [2023-11-22 04:20:30,313][05156] Num frames 9300... [2023-11-22 04:20:30,502][05156] Num frames 9400... [2023-11-22 04:20:30,692][05156] Num frames 9500... [2023-11-22 04:20:30,869][05156] Num frames 9600... [2023-11-22 04:20:31,062][05156] Num frames 9700... [2023-11-22 04:20:31,150][05156] Avg episode rewards: #0: 27.145, true rewards: #0: 12.145 [2023-11-22 04:20:31,152][05156] Avg episode reward: 27.145, avg true_objective: 12.145 [2023-11-22 04:20:31,314][05156] Num frames 9800... [2023-11-22 04:20:31,495][05156] Num frames 9900... [2023-11-22 04:20:31,680][05156] Num frames 10000... [2023-11-22 04:20:31,868][05156] Num frames 10100... [2023-11-22 04:20:32,053][05156] Num frames 10200... [2023-11-22 04:20:32,251][05156] Num frames 10300... [2023-11-22 04:20:32,438][05156] Num frames 10400... [2023-11-22 04:20:32,636][05156] Num frames 10500... [2023-11-22 04:20:32,817][05156] Num frames 10600... 
[2023-11-22 04:20:32,996][05156] Num frames 10700... [2023-11-22 04:20:33,183][05156] Num frames 10800... [2023-11-22 04:20:33,369][05156] Num frames 10900... [2023-11-22 04:20:33,561][05156] Num frames 11000... [2023-11-22 04:20:33,748][05156] Num frames 11100... [2023-11-22 04:20:33,933][05156] Num frames 11200... [2023-11-22 04:20:34,122][05156] Num frames 11300... [2023-11-22 04:20:34,311][05156] Num frames 11400... [2023-11-22 04:20:34,516][05156] Avg episode rewards: #0: 29.639, true rewards: #0: 12.750 [2023-11-22 04:20:34,518][05156] Avg episode reward: 29.639, avg true_objective: 12.750 [2023-11-22 04:20:34,572][05156] Num frames 11500... [2023-11-22 04:20:34,754][05156] Num frames 11600... [2023-11-22 04:20:34,949][05156] Num frames 11700... [2023-11-22 04:20:35,132][05156] Num frames 11800... [2023-11-22 04:20:35,338][05156] Num frames 11900... [2023-11-22 04:20:35,524][05156] Num frames 12000... [2023-11-22 04:20:35,713][05156] Num frames 12100... [2023-11-22 04:20:35,844][05156] Num frames 12200... [2023-11-22 04:20:35,969][05156] Num frames 12300... [2023-11-22 04:20:36,097][05156] Num frames 12400... [2023-11-22 04:20:36,230][05156] Num frames 12500... [2023-11-22 04:20:36,359][05156] Num frames 12600... [2023-11-22 04:20:36,488][05156] Num frames 12700... [2023-11-22 04:20:36,616][05156] Num frames 12800... [2023-11-22 04:20:36,745][05156] Num frames 12900... [2023-11-22 04:20:36,868][05156] Num frames 13000... [2023-11-22 04:20:37,025][05156] Avg episode rewards: #0: 30.880, true rewards: #0: 13.080 [2023-11-22 04:20:37,026][05156] Avg episode reward: 30.880, avg true_objective: 13.080 [2023-11-22 04:22:06,299][05156] Replay video saved to /content/train_dir/default_experiment/replay.mp4!