[2023-02-24 21:21:07,483][01818] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-24 21:21:07,487][01818] Rollout worker 0 uses device cpu
[2023-02-24 21:21:07,488][01818] Rollout worker 1 uses device cpu
[2023-02-24 21:21:07,490][01818] Rollout worker 2 uses device cpu
[2023-02-24 21:21:07,491][01818] Rollout worker 3 uses device cpu
[2023-02-24 21:21:07,495][01818] Rollout worker 4 uses device cpu
[2023-02-24 21:21:07,497][01818] Rollout worker 5 uses device cpu
[2023-02-24 21:21:07,498][01818] Rollout worker 6 uses device cpu
[2023-02-24 21:21:07,499][01818] Rollout worker 7 uses device cpu
[2023-02-24 21:21:07,717][01818] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 21:21:07,721][01818] InferenceWorker_p0-w0: min num requests: 2
[2023-02-24 21:21:07,766][01818] Starting all processes...
[2023-02-24 21:21:07,767][01818] Starting process learner_proc0
[2023-02-24 21:21:07,839][01818] Starting all processes...
[2023-02-24 21:21:07,848][01818] Starting process inference_proc0-0
[2023-02-24 21:21:07,849][01818] Starting process rollout_proc0
[2023-02-24 21:21:07,849][01818] Starting process rollout_proc1
[2023-02-24 21:21:07,849][01818] Starting process rollout_proc2
[2023-02-24 21:21:07,849][01818] Starting process rollout_proc3
[2023-02-24 21:21:07,849][01818] Starting process rollout_proc4
[2023-02-24 21:21:07,849][01818] Starting process rollout_proc5
[2023-02-24 21:21:07,849][01818] Starting process rollout_proc6
[2023-02-24 21:21:07,870][01818] Starting process rollout_proc7
[2023-02-24 21:21:17,820][14660] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 21:21:17,823][14660] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-24 21:21:18,381][14669] Worker 7 uses CPU cores [1]
[2023-02-24 21:21:18,445][14668] Worker 6 uses CPU cores [0]
[2023-02-24 21:21:18,508][14665] Worker 3 uses CPU cores [1]
[2023-02-24 21:21:18,523][14647] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 21:21:18,531][14647] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-24 21:21:18,561][14666] Worker 4 uses CPU cores [0]
[2023-02-24 21:21:18,616][14663] Worker 2 uses CPU cores [0]
[2023-02-24 21:21:18,652][14661] Worker 0 uses CPU cores [0]
[2023-02-24 21:21:18,667][14662] Worker 1 uses CPU cores [1]
[2023-02-24 21:21:18,673][14667] Worker 5 uses CPU cores [1]
[2023-02-24 21:21:18,980][14660] Num visible devices: 1
[2023-02-24 21:21:18,980][14647] Num visible devices: 1
[2023-02-24 21:21:18,998][14647] Starting seed is not provided
[2023-02-24 21:21:18,998][14647] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 21:21:18,999][14647] Initializing actor-critic model on device cuda:0
[2023-02-24 21:21:18,999][14647] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 21:21:19,001][14647] RunningMeanStd input shape: (1,)
[2023-02-24 21:21:19,013][14647] ConvEncoder: input_channels=3
[2023-02-24 21:21:19,281][14647] Conv encoder output size: 512
[2023-02-24 21:21:19,281][14647] Policy head output size: 512
[2023-02-24 21:21:19,330][14647] Created Actor Critic model with architecture:
[2023-02-24 21:21:19,330][14647] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-24 21:21:27,083][14647] Using optimizer
[2023-02-24 21:21:27,085][14647] No checkpoints found
[2023-02-24 21:21:27,085][14647] Did not load from checkpoint, starting from scratch!
[2023-02-24 21:21:27,085][14647] Initialized policy 0 weights for model version 0
[2023-02-24 21:21:27,088][14647] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-24 21:21:27,095][14647] LearnerWorker_p0 finished initialization!
[2023-02-24 21:21:27,186][14660] RunningMeanStd input shape: (3, 72, 128)
[2023-02-24 21:21:27,188][14660] RunningMeanStd input shape: (1,)
[2023-02-24 21:21:27,205][14660] ConvEncoder: input_channels=3
[2023-02-24 21:21:27,317][14660] Conv encoder output size: 512
[2023-02-24 21:21:27,318][14660] Policy head output size: 512
[2023-02-24 21:21:27,702][01818] Heartbeat connected on Batcher_0
[2023-02-24 21:21:27,706][01818] Heartbeat connected on LearnerWorker_p0
[2023-02-24 21:21:27,733][01818] Heartbeat connected on RolloutWorker_w0
[2023-02-24 21:21:27,742][01818] Heartbeat connected on RolloutWorker_w1
[2023-02-24 21:21:27,746][01818] Heartbeat connected on RolloutWorker_w2
[2023-02-24 21:21:27,749][01818] Heartbeat connected on RolloutWorker_w3
[2023-02-24 21:21:27,753][01818] Heartbeat connected on RolloutWorker_w4
[2023-02-24 21:21:27,757][01818] Heartbeat connected on RolloutWorker_w5
[2023-02-24 21:21:27,761][01818] Heartbeat connected on RolloutWorker_w6
[2023-02-24 21:21:27,764][01818] Heartbeat connected on RolloutWorker_w7
[2023-02-24 21:21:28,183][01818] Fps is (10 sec: nan,
60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-24 21:21:29,624][01818] Inference worker 0-0 is ready! [2023-02-24 21:21:29,626][01818] All inference workers are ready! Signal rollout workers to start! [2023-02-24 21:21:29,632][01818] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-24 21:21:29,746][14668] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:29,750][14663] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:29,780][14662] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:29,777][14661] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:29,781][14665] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:29,782][14669] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:29,790][14667] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:29,799][14666] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:21:30,941][14662] Decorrelating experience for 0 frames... [2023-02-24 21:21:30,942][14667] Decorrelating experience for 0 frames... [2023-02-24 21:21:30,940][14669] Decorrelating experience for 0 frames... [2023-02-24 21:21:30,940][14668] Decorrelating experience for 0 frames... [2023-02-24 21:21:30,940][14663] Decorrelating experience for 0 frames... [2023-02-24 21:21:30,942][14661] Decorrelating experience for 0 frames... [2023-02-24 21:21:31,622][14663] Decorrelating experience for 32 frames... [2023-02-24 21:21:31,661][14666] Decorrelating experience for 0 frames... [2023-02-24 21:21:31,942][14667] Decorrelating experience for 32 frames... [2023-02-24 21:21:31,945][14669] Decorrelating experience for 32 frames... [2023-02-24 21:21:31,958][14662] Decorrelating experience for 32 frames... [2023-02-24 21:21:32,224][14666] Decorrelating experience for 32 frames... 
[2023-02-24 21:21:32,797][14663] Decorrelating experience for 64 frames... [2023-02-24 21:21:32,830][14668] Decorrelating experience for 32 frames... [2023-02-24 21:21:32,841][14669] Decorrelating experience for 64 frames... [2023-02-24 21:21:32,843][14667] Decorrelating experience for 64 frames... [2023-02-24 21:21:33,183][01818] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-24 21:21:33,640][14666] Decorrelating experience for 64 frames... [2023-02-24 21:21:33,653][14662] Decorrelating experience for 64 frames... [2023-02-24 21:21:33,691][14663] Decorrelating experience for 96 frames... [2023-02-24 21:21:33,742][14667] Decorrelating experience for 96 frames... [2023-02-24 21:21:34,211][14661] Decorrelating experience for 32 frames... [2023-02-24 21:21:34,568][14665] Decorrelating experience for 0 frames... [2023-02-24 21:21:34,595][14662] Decorrelating experience for 96 frames... [2023-02-24 21:21:35,093][14669] Decorrelating experience for 96 frames... [2023-02-24 21:21:35,939][14666] Decorrelating experience for 96 frames... [2023-02-24 21:21:36,259][14661] Decorrelating experience for 64 frames... [2023-02-24 21:21:36,436][14665] Decorrelating experience for 32 frames... [2023-02-24 21:21:36,877][14665] Decorrelating experience for 64 frames... [2023-02-24 21:21:37,283][14665] Decorrelating experience for 96 frames... [2023-02-24 21:21:37,545][14668] Decorrelating experience for 64 frames... [2023-02-24 21:21:38,096][14661] Decorrelating experience for 96 frames... [2023-02-24 21:21:38,183][01818] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-24 21:21:39,294][14668] Decorrelating experience for 96 frames... [2023-02-24 21:21:43,183][01818] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 95.3. 
Samples: 1430. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-24 21:21:43,194][01818] Avg episode reward: [(0, '1.724')] [2023-02-24 21:21:45,101][14647] Signal inference workers to stop experience collection... [2023-02-24 21:21:45,112][14660] InferenceWorker_p0-w0: stopping experience collection [2023-02-24 21:21:47,760][14647] Signal inference workers to resume experience collection... [2023-02-24 21:21:47,761][14660] InferenceWorker_p0-w0: resuming experience collection [2023-02-24 21:21:48,183][01818] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 131.8. Samples: 2636. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-02-24 21:21:48,188][01818] Avg episode reward: [(0, '2.282')] [2023-02-24 21:21:53,183][01818] Fps is (10 sec: 2457.6, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 191.0. Samples: 4774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-02-24 21:21:53,185][01818] Avg episode reward: [(0, '3.791')] [2023-02-24 21:21:56,136][14660] Updated weights for policy 0, policy_version 10 (0.0017) [2023-02-24 21:21:58,191][01818] Fps is (10 sec: 4502.0, 60 sec: 1638.0, 300 sec: 1638.0). Total num frames: 49152. Throughput: 0: 377.0. Samples: 11314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:21:58,194][01818] Avg episode reward: [(0, '4.440')] [2023-02-24 21:22:03,191][01818] Fps is (10 sec: 3683.5, 60 sec: 1755.0, 300 sec: 1755.0). Total num frames: 61440. Throughput: 0: 462.5. Samples: 16190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:22:03,198][01818] Avg episode reward: [(0, '4.422')] [2023-02-24 21:22:08,183][01818] Fps is (10 sec: 2869.4, 60 sec: 1945.6, 300 sec: 1945.6). Total num frames: 77824. Throughput: 0: 456.7. Samples: 18270. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:22:08,190][01818] Avg episode reward: [(0, '4.476')] [2023-02-24 21:22:09,303][14660] Updated weights for policy 0, policy_version 20 (0.0031) [2023-02-24 21:22:13,185][01818] Fps is (10 sec: 3688.6, 60 sec: 2184.4, 300 sec: 2184.4). Total num frames: 98304. Throughput: 0: 523.4. Samples: 23552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:22:13,189][01818] Avg episode reward: [(0, '4.417')] [2023-02-24 21:22:18,183][01818] Fps is (10 sec: 4096.1, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 118784. Throughput: 0: 673.7. Samples: 30316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:22:18,186][01818] Avg episode reward: [(0, '4.487')] [2023-02-24 21:22:18,191][14647] Saving new best policy, reward=4.487! [2023-02-24 21:22:18,671][14660] Updated weights for policy 0, policy_version 30 (0.0015) [2023-02-24 21:22:23,183][01818] Fps is (10 sec: 3687.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 135168. Throughput: 0: 730.1. Samples: 32856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:22:23,185][01818] Avg episode reward: [(0, '4.654')] [2023-02-24 21:22:23,201][14647] Saving new best policy, reward=4.654! [2023-02-24 21:22:28,183][01818] Fps is (10 sec: 2867.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 147456. Throughput: 0: 788.8. Samples: 36928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:22:28,190][01818] Avg episode reward: [(0, '4.622')] [2023-02-24 21:22:31,525][14660] Updated weights for policy 0, policy_version 40 (0.0027) [2023-02-24 21:22:33,183][01818] Fps is (10 sec: 3276.8, 60 sec: 2798.9, 300 sec: 2583.6). Total num frames: 167936. Throughput: 0: 892.6. Samples: 42804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:22:33,185][01818] Avg episode reward: [(0, '4.507')] [2023-02-24 21:22:38,183][01818] Fps is (10 sec: 4505.4, 60 sec: 3208.5, 300 sec: 2750.2). Total num frames: 192512. 
Throughput: 0: 918.3. Samples: 46096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:22:38,188][01818] Avg episode reward: [(0, '4.475')] [2023-02-24 21:22:41,539][14660] Updated weights for policy 0, policy_version 50 (0.0018) [2023-02-24 21:22:43,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2785.3). Total num frames: 208896. Throughput: 0: 898.4. Samples: 51736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:22:43,185][01818] Avg episode reward: [(0, '4.475')] [2023-02-24 21:22:48,183][01818] Fps is (10 sec: 2867.3, 60 sec: 3618.1, 300 sec: 2764.8). Total num frames: 221184. Throughput: 0: 887.4. Samples: 56116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:22:48,188][01818] Avg episode reward: [(0, '4.526')] [2023-02-24 21:22:53,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 2843.1). Total num frames: 241664. Throughput: 0: 898.6. Samples: 58708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:22:53,186][01818] Avg episode reward: [(0, '4.751')] [2023-02-24 21:22:53,197][14647] Saving new best policy, reward=4.751! [2023-02-24 21:22:53,718][14660] Updated weights for policy 0, policy_version 60 (0.0027) [2023-02-24 21:22:58,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3550.3, 300 sec: 2912.7). Total num frames: 262144. Throughput: 0: 926.3. Samples: 65234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:22:58,186][01818] Avg episode reward: [(0, '4.612')] [2023-02-24 21:23:03,184][01818] Fps is (10 sec: 4095.5, 60 sec: 3686.8, 300 sec: 2975.0). Total num frames: 282624. Throughput: 0: 902.6. Samples: 70932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:23:03,187][01818] Avg episode reward: [(0, '4.382')] [2023-02-24 21:23:03,202][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth... 
[2023-02-24 21:23:04,495][14660] Updated weights for policy 0, policy_version 70 (0.0015) [2023-02-24 21:23:08,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 2949.1). Total num frames: 294912. Throughput: 0: 892.5. Samples: 73018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:23:08,190][01818] Avg episode reward: [(0, '4.559')] [2023-02-24 21:23:13,183][01818] Fps is (10 sec: 3277.2, 60 sec: 3618.3, 300 sec: 3003.7). Total num frames: 315392. Throughput: 0: 908.1. Samples: 77792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:23:13,190][01818] Avg episode reward: [(0, '4.414')] [2023-02-24 21:23:15,736][14660] Updated weights for policy 0, policy_version 80 (0.0031) [2023-02-24 21:23:18,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3053.4). Total num frames: 335872. Throughput: 0: 932.0. Samples: 84742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:23:18,191][01818] Avg episode reward: [(0, '4.484')] [2023-02-24 21:23:23,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3098.7). Total num frames: 356352. Throughput: 0: 929.3. Samples: 87914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:23:23,190][01818] Avg episode reward: [(0, '4.698')] [2023-02-24 21:23:27,564][14660] Updated weights for policy 0, policy_version 90 (0.0020) [2023-02-24 21:23:28,184][01818] Fps is (10 sec: 3276.3, 60 sec: 3686.3, 300 sec: 3072.0). Total num frames: 368640. Throughput: 0: 896.4. Samples: 92076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:23:28,187][01818] Avg episode reward: [(0, '4.757')] [2023-02-24 21:23:28,190][14647] Saving new best policy, reward=4.757! [2023-02-24 21:23:33,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3080.2). Total num frames: 385024. Throughput: 0: 915.0. Samples: 97290. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:23:33,185][01818] Avg episode reward: [(0, '4.684')] [2023-02-24 21:23:37,978][14660] Updated weights for policy 0, policy_version 100 (0.0015) [2023-02-24 21:23:38,183][01818] Fps is (10 sec: 4096.6, 60 sec: 3618.2, 300 sec: 3150.8). Total num frames: 409600. Throughput: 0: 931.4. Samples: 100622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:23:38,185][01818] Avg episode reward: [(0, '4.703')] [2023-02-24 21:23:43,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3155.4). Total num frames: 425984. Throughput: 0: 919.3. Samples: 106604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:23:43,190][01818] Avg episode reward: [(0, '4.884')] [2023-02-24 21:23:43,203][14647] Saving new best policy, reward=4.884! [2023-02-24 21:23:48,185][01818] Fps is (10 sec: 2866.6, 60 sec: 3618.0, 300 sec: 3130.5). Total num frames: 438272. Throughput: 0: 884.5. Samples: 110736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:23:48,187][01818] Avg episode reward: [(0, '4.863')] [2023-02-24 21:23:51,105][14660] Updated weights for policy 0, policy_version 110 (0.0065) [2023-02-24 21:23:53,185][01818] Fps is (10 sec: 3276.1, 60 sec: 3618.0, 300 sec: 3163.8). Total num frames: 458752. Throughput: 0: 886.5. Samples: 112912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:23:53,188][01818] Avg episode reward: [(0, '4.624')] [2023-02-24 21:23:58,183][01818] Fps is (10 sec: 4096.8, 60 sec: 3618.1, 300 sec: 3194.9). Total num frames: 479232. Throughput: 0: 919.7. Samples: 119180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-24 21:23:58,185][01818] Avg episode reward: [(0, '4.508')] [2023-02-24 21:24:00,673][14660] Updated weights for policy 0, policy_version 120 (0.0012) [2023-02-24 21:24:03,183][01818] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3197.5). Total num frames: 495616. Throughput: 0: 897.8. Samples: 125144. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:24:03,190][01818] Avg episode reward: [(0, '4.466')] [2023-02-24 21:24:08,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3200.0). Total num frames: 512000. Throughput: 0: 874.5. Samples: 127266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:24:08,189][01818] Avg episode reward: [(0, '4.509')] [2023-02-24 21:24:13,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3202.3). Total num frames: 528384. Throughput: 0: 877.8. Samples: 131574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-24 21:24:13,191][01818] Avg episode reward: [(0, '4.601')] [2023-02-24 21:24:13,741][14660] Updated weights for policy 0, policy_version 130 (0.0023) [2023-02-24 21:24:18,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3228.6). Total num frames: 548864. Throughput: 0: 903.9. Samples: 137964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:24:18,185][01818] Avg episode reward: [(0, '4.543')] [2023-02-24 21:24:23,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3253.4). Total num frames: 569344. Throughput: 0: 898.7. Samples: 141064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:24:23,189][01818] Avg episode reward: [(0, '4.383')] [2023-02-24 21:24:24,494][14660] Updated weights for policy 0, policy_version 140 (0.0013) [2023-02-24 21:24:28,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3231.3). Total num frames: 581632. Throughput: 0: 858.2. Samples: 145222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:24:28,187][01818] Avg episode reward: [(0, '4.372')] [2023-02-24 21:24:33,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3232.5). Total num frames: 598016. Throughput: 0: 863.5. Samples: 149590. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:24:33,185][01818] Avg episode reward: [(0, '4.173')] [2023-02-24 21:24:37,240][14660] Updated weights for policy 0, policy_version 150 (0.0015) [2023-02-24 21:24:38,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3233.7). Total num frames: 614400. Throughput: 0: 882.4. Samples: 152620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:24:38,189][01818] Avg episode reward: [(0, '4.353')] [2023-02-24 21:24:43,183][01818] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3255.8). Total num frames: 634880. Throughput: 0: 881.0. Samples: 158824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:24:43,188][01818] Avg episode reward: [(0, '4.596')] [2023-02-24 21:24:48,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3235.8). Total num frames: 647168. Throughput: 0: 841.0. Samples: 162988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:24:48,190][01818] Avg episode reward: [(0, '4.563')] [2023-02-24 21:24:49,770][14660] Updated weights for policy 0, policy_version 160 (0.0025) [2023-02-24 21:24:53,183][01818] Fps is (10 sec: 2867.3, 60 sec: 3413.5, 300 sec: 3236.8). Total num frames: 663552. Throughput: 0: 839.2. Samples: 165030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:24:53,189][01818] Avg episode reward: [(0, '4.481')] [2023-02-24 21:24:58,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 688128. Throughput: 0: 874.0. Samples: 170904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:24:58,190][01818] Avg episode reward: [(0, '4.441')] [2023-02-24 21:25:00,145][14660] Updated weights for policy 0, policy_version 170 (0.0023) [2023-02-24 21:25:03,184][01818] Fps is (10 sec: 4095.5, 60 sec: 3481.5, 300 sec: 3276.8). Total num frames: 704512. Throughput: 0: 869.1. Samples: 177074. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:25:03,191][01818] Avg episode reward: [(0, '4.553')] [2023-02-24 21:25:03,332][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000173_708608.pth... [2023-02-24 21:25:08,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3258.2). Total num frames: 716800. Throughput: 0: 841.5. Samples: 178932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:25:08,185][01818] Avg episode reward: [(0, '4.651')] [2023-02-24 21:25:13,183][01818] Fps is (10 sec: 2457.9, 60 sec: 3345.1, 300 sec: 3240.4). Total num frames: 729088. Throughput: 0: 834.8. Samples: 182790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:25:13,186][01818] Avg episode reward: [(0, '4.734')] [2023-02-24 21:25:14,247][14660] Updated weights for policy 0, policy_version 180 (0.0017) [2023-02-24 21:25:18,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3259.0). Total num frames: 749568. Throughput: 0: 857.1. Samples: 188158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:25:18,191][01818] Avg episode reward: [(0, '4.675')] [2023-02-24 21:25:23,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3276.8). Total num frames: 770048. Throughput: 0: 856.2. Samples: 191148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:25:23,191][01818] Avg episode reward: [(0, '4.610')] [2023-02-24 21:25:25,012][14660] Updated weights for policy 0, policy_version 190 (0.0021) [2023-02-24 21:25:28,185][01818] Fps is (10 sec: 3276.1, 60 sec: 3344.9, 300 sec: 3259.7). Total num frames: 782336. Throughput: 0: 829.3. Samples: 196146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:25:28,188][01818] Avg episode reward: [(0, '4.724')] [2023-02-24 21:25:33,183][01818] Fps is (10 sec: 2457.6, 60 sec: 3276.8, 300 sec: 3243.4). Total num frames: 794624. Throughput: 0: 803.6. Samples: 199150. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:25:33,187][01818] Avg episode reward: [(0, '4.669')] [2023-02-24 21:25:38,183][01818] Fps is (10 sec: 2048.4, 60 sec: 3140.3, 300 sec: 3211.3). Total num frames: 802816. Throughput: 0: 789.8. Samples: 200572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:25:38,189][01818] Avg episode reward: [(0, '4.588')] [2023-02-24 21:25:42,126][14660] Updated weights for policy 0, policy_version 200 (0.0040) [2023-02-24 21:25:43,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3228.6). Total num frames: 823296. Throughput: 0: 750.7. Samples: 204686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 21:25:43,189][01818] Avg episode reward: [(0, '4.462')] [2023-02-24 21:25:48,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3229.5). Total num frames: 839680. Throughput: 0: 745.2. Samples: 210606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:25:48,187][01818] Avg episode reward: [(0, '4.535')] [2023-02-24 21:25:53,186][01818] Fps is (10 sec: 2866.3, 60 sec: 3140.1, 300 sec: 3214.9). Total num frames: 851968. Throughput: 0: 748.6. Samples: 212620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:25:53,190][01818] Avg episode reward: [(0, '4.474')] [2023-02-24 21:25:55,232][14660] Updated weights for policy 0, policy_version 210 (0.0017) [2023-02-24 21:25:58,183][01818] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 3200.9). Total num frames: 864256. Throughput: 0: 746.0. Samples: 216362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:25:58,185][01818] Avg episode reward: [(0, '4.361')] [2023-02-24 21:26:03,183][01818] Fps is (10 sec: 3277.8, 60 sec: 3003.8, 300 sec: 3217.2). Total num frames: 884736. Throughput: 0: 752.8. Samples: 222032. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:26:03,189][01818] Avg episode reward: [(0, '4.438')] [2023-02-24 21:26:06,556][14660] Updated weights for policy 0, policy_version 220 (0.0018) [2023-02-24 21:26:08,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3232.9). Total num frames: 905216. Throughput: 0: 750.5. Samples: 224922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:26:08,188][01818] Avg episode reward: [(0, '4.481')] [2023-02-24 21:26:13,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3219.3). Total num frames: 917504. Throughput: 0: 739.7. Samples: 229430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:26:13,190][01818] Avg episode reward: [(0, '4.562')] [2023-02-24 21:26:18,184][01818] Fps is (10 sec: 2457.2, 60 sec: 3003.7, 300 sec: 3206.2). Total num frames: 929792. Throughput: 0: 756.9. Samples: 233210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:26:18,189][01818] Avg episode reward: [(0, '4.559')] [2023-02-24 21:26:20,721][14660] Updated weights for policy 0, policy_version 230 (0.0038) [2023-02-24 21:26:23,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3221.3). Total num frames: 950272. Throughput: 0: 786.9. Samples: 235982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:26:23,190][01818] Avg episode reward: [(0, '4.862')] [2023-02-24 21:26:28,183][01818] Fps is (10 sec: 4096.6, 60 sec: 3140.4, 300 sec: 3290.7). Total num frames: 970752. Throughput: 0: 832.5. Samples: 242148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:26:28,186][01818] Avg episode reward: [(0, '4.937')] [2023-02-24 21:26:28,188][14647] Saving new best policy, reward=4.937! [2023-02-24 21:26:31,881][14660] Updated weights for policy 0, policy_version 240 (0.0028) [2023-02-24 21:26:33,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3332.3). Total num frames: 983040. Throughput: 0: 805.6. Samples: 246860. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:26:33,195][01818] Avg episode reward: [(0, '4.719')] [2023-02-24 21:26:38,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3387.9). Total num frames: 999424. Throughput: 0: 805.0. Samples: 248842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:26:38,191][01818] Avg episode reward: [(0, '4.629')] [2023-02-24 21:26:43,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3443.4). Total num frames: 1019904. Throughput: 0: 834.8. Samples: 253926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:26:43,184][01818] Avg episode reward: [(0, '4.475')] [2023-02-24 21:26:43,986][14660] Updated weights for policy 0, policy_version 250 (0.0023) [2023-02-24 21:26:48,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 1040384. Throughput: 0: 854.4. Samples: 260480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:26:48,186][01818] Avg episode reward: [(0, '4.611')] [2023-02-24 21:26:53,186][01818] Fps is (10 sec: 3275.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 1052672. Throughput: 0: 846.5. Samples: 263018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:26:53,189][01818] Avg episode reward: [(0, '4.650')] [2023-02-24 21:26:56,076][14660] Updated weights for policy 0, policy_version 260 (0.0020) [2023-02-24 21:26:58,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 1069056. Throughput: 0: 841.0. Samples: 267274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:26:58,186][01818] Avg episode reward: [(0, '4.689')] [2023-02-24 21:27:03,183][01818] Fps is (10 sec: 3687.6, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1089536. Throughput: 0: 878.3. Samples: 272734. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:27:03,186][01818] Avg episode reward: [(0, '4.628')] [2023-02-24 21:27:03,199][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000266_1089536.pth... [2023-02-24 21:27:03,325][14647] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth [2023-02-24 21:27:06,704][14660] Updated weights for policy 0, policy_version 270 (0.0018) [2023-02-24 21:27:08,184][01818] Fps is (10 sec: 4095.5, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 1110016. Throughput: 0: 887.3. Samples: 275912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:27:08,188][01818] Avg episode reward: [(0, '4.548')] [2023-02-24 21:27:13,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 1126400. Throughput: 0: 882.8. Samples: 281876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:27:13,185][01818] Avg episode reward: [(0, '4.524')] [2023-02-24 21:27:18,183][01818] Fps is (10 sec: 3277.2, 60 sec: 3550.0, 300 sec: 3415.6). Total num frames: 1142784. Throughput: 0: 870.5. Samples: 286032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:27:18,186][01818] Avg episode reward: [(0, '4.553')] [2023-02-24 21:27:19,548][14660] Updated weights for policy 0, policy_version 280 (0.0018) [2023-02-24 21:27:23,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 1159168. Throughput: 0: 874.3. Samples: 288186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:27:23,190][01818] Avg episode reward: [(0, '4.671')] [2023-02-24 21:27:28,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 1179648. Throughput: 0: 908.8. Samples: 294822. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:27:28,186][01818] Avg episode reward: [(0, '4.754')] [2023-02-24 21:27:29,104][14660] Updated weights for policy 0, policy_version 290 (0.0013) [2023-02-24 21:27:33,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3415.7). Total num frames: 1200128. Throughput: 0: 894.1. Samples: 300714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:27:33,189][01818] Avg episode reward: [(0, '4.743')] [2023-02-24 21:27:38,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1212416. Throughput: 0: 883.5. Samples: 302774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:27:38,186][01818] Avg episode reward: [(0, '4.615')] [2023-02-24 21:27:42,043][14660] Updated weights for policy 0, policy_version 300 (0.0027) [2023-02-24 21:27:43,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 1232896. Throughput: 0: 889.0. Samples: 307278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:27:43,189][01818] Avg episode reward: [(0, '4.700')] [2023-02-24 21:27:48,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 1253376. Throughput: 0: 914.9. Samples: 313906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:27:48,190][01818] Avg episode reward: [(0, '4.725')] [2023-02-24 21:27:51,587][14660] Updated weights for policy 0, policy_version 310 (0.0013) [2023-02-24 21:27:53,186][01818] Fps is (10 sec: 4094.7, 60 sec: 3686.4, 300 sec: 3429.5). Total num frames: 1273856. Throughput: 0: 916.4. Samples: 317154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:27:53,193][01818] Avg episode reward: [(0, '4.695')] [2023-02-24 21:27:58,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3401.8). Total num frames: 1286144. Throughput: 0: 881.2. Samples: 321528. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:27:58,185][01818] Avg episode reward: [(0, '4.647')] [2023-02-24 21:28:03,183][01818] Fps is (10 sec: 2868.1, 60 sec: 3549.9, 300 sec: 3415.6). Total num frames: 1302528. Throughput: 0: 891.3. Samples: 326142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:28:03,191][01818] Avg episode reward: [(0, '4.753')] [2023-02-24 21:28:04,669][14660] Updated weights for policy 0, policy_version 320 (0.0012) [2023-02-24 21:28:08,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3415.6). Total num frames: 1323008. Throughput: 0: 916.3. Samples: 329418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:28:08,186][01818] Avg episode reward: [(0, '4.731')] [2023-02-24 21:28:13,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3415.6). Total num frames: 1343488. Throughput: 0: 912.0. Samples: 335864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:28:13,190][01818] Avg episode reward: [(0, '4.572')] [2023-02-24 21:28:15,333][14660] Updated weights for policy 0, policy_version 330 (0.0037) [2023-02-24 21:28:18,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 1355776. Throughput: 0: 872.3. Samples: 339966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:28:18,186][01818] Avg episode reward: [(0, '4.778')] [2023-02-24 21:28:23,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1372160. Throughput: 0: 873.9. Samples: 342100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:28:23,185][01818] Avg episode reward: [(0, '4.779')] [2023-02-24 21:28:27,270][14660] Updated weights for policy 0, policy_version 340 (0.0022) [2023-02-24 21:28:28,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3429.5). Total num frames: 1396736. Throughput: 0: 907.5. Samples: 348116. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:28:28,191][01818] Avg episode reward: [(0, '4.606')] [2023-02-24 21:28:33,184][01818] Fps is (10 sec: 4505.1, 60 sec: 3618.1, 300 sec: 3415.6). Total num frames: 1417216. Throughput: 0: 907.5. Samples: 354744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:28:33,187][01818] Avg episode reward: [(0, '4.784')] [2023-02-24 21:28:38,187][01818] Fps is (10 sec: 3275.2, 60 sec: 3617.9, 300 sec: 3401.7). Total num frames: 1429504. Throughput: 0: 881.1. Samples: 356806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:28:38,190][01818] Avg episode reward: [(0, '4.990')] [2023-02-24 21:28:38,196][14647] Saving new best policy, reward=4.990! [2023-02-24 21:28:38,723][14660] Updated weights for policy 0, policy_version 350 (0.0020) [2023-02-24 21:28:43,183][01818] Fps is (10 sec: 2867.5, 60 sec: 3549.9, 300 sec: 3415.7). Total num frames: 1445888. Throughput: 0: 875.2. Samples: 360914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:28:43,189][01818] Avg episode reward: [(0, '4.808')] [2023-02-24 21:28:48,183][01818] Fps is (10 sec: 3688.1, 60 sec: 3549.9, 300 sec: 3415.7). Total num frames: 1466368. Throughput: 0: 909.2. Samples: 367054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:28:48,192][01818] Avg episode reward: [(0, '4.607')] [2023-02-24 21:28:49,697][14660] Updated weights for policy 0, policy_version 360 (0.0018) [2023-02-24 21:28:53,183][01818] Fps is (10 sec: 4096.1, 60 sec: 3550.1, 300 sec: 3415.6). Total num frames: 1486848. Throughput: 0: 908.2. Samples: 370288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:28:53,192][01818] Avg episode reward: [(0, '4.537')] [2023-02-24 21:28:58,189][01818] Fps is (10 sec: 3274.8, 60 sec: 3549.5, 300 sec: 3401.7). Total num frames: 1499136. Throughput: 0: 874.7. Samples: 375232. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:28:58,191][01818] Avg episode reward: [(0, '4.556')] [2023-02-24 21:29:02,600][14660] Updated weights for policy 0, policy_version 370 (0.0034) [2023-02-24 21:29:03,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1515520. Throughput: 0: 876.3. Samples: 379398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:29:03,185][01818] Avg episode reward: [(0, '4.542')] [2023-02-24 21:29:03,199][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000370_1515520.pth... [2023-02-24 21:29:03,387][14647] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000173_708608.pth [2023-02-24 21:29:08,183][01818] Fps is (10 sec: 3688.7, 60 sec: 3549.9, 300 sec: 3415.6). Total num frames: 1536000. Throughput: 0: 894.7. Samples: 382362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:29:08,190][01818] Avg episode reward: [(0, '4.604')] [2023-02-24 21:29:12,381][14660] Updated weights for policy 0, policy_version 380 (0.0030) [2023-02-24 21:29:13,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3415.6). Total num frames: 1556480. Throughput: 0: 905.2. Samples: 388848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:29:13,187][01818] Avg episode reward: [(0, '4.662')] [2023-02-24 21:29:18,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3401.8). Total num frames: 1572864. Throughput: 0: 865.1. Samples: 393674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:29:18,188][01818] Avg episode reward: [(0, '4.643')] [2023-02-24 21:29:23,184][01818] Fps is (10 sec: 2866.8, 60 sec: 3549.8, 300 sec: 3401.7). Total num frames: 1585152. Throughput: 0: 863.6. Samples: 395666. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:29:23,192][01818] Avg episode reward: [(0, '4.775')] [2023-02-24 21:29:25,567][14660] Updated weights for policy 0, policy_version 390 (0.0042) [2023-02-24 21:29:28,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 1605632. Throughput: 0: 892.7. Samples: 401084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:29:28,190][01818] Avg episode reward: [(0, '4.608')] [2023-02-24 21:29:33,183][01818] Fps is (10 sec: 4506.2, 60 sec: 3549.9, 300 sec: 3443.4). Total num frames: 1630208. Throughput: 0: 905.9. Samples: 407820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:29:33,185][01818] Avg episode reward: [(0, '4.513')] [2023-02-24 21:29:35,507][14660] Updated weights for policy 0, policy_version 400 (0.0014) [2023-02-24 21:29:38,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3415.7). Total num frames: 1642496. Throughput: 0: 887.6. Samples: 410232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:29:38,188][01818] Avg episode reward: [(0, '4.470')] [2023-02-24 21:29:43,186][01818] Fps is (10 sec: 2456.8, 60 sec: 3481.4, 300 sec: 3415.6). Total num frames: 1654784. Throughput: 0: 855.1. Samples: 413708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:29:43,194][01818] Avg episode reward: [(0, '4.549')] [2023-02-24 21:29:48,186][01818] Fps is (10 sec: 2456.8, 60 sec: 3344.9, 300 sec: 3401.7). Total num frames: 1667072. Throughput: 0: 836.1. Samples: 417024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:29:48,190][01818] Avg episode reward: [(0, '4.684')] [2023-02-24 21:29:51,825][14660] Updated weights for policy 0, policy_version 410 (0.0038) [2023-02-24 21:29:53,183][01818] Fps is (10 sec: 2868.1, 60 sec: 3276.8, 300 sec: 3374.0). Total num frames: 1683456. Throughput: 0: 818.5. Samples: 419196. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:29:53,186][01818] Avg episode reward: [(0, '4.911')] [2023-02-24 21:29:58,183][01818] Fps is (10 sec: 3687.5, 60 sec: 3413.7, 300 sec: 3387.9). Total num frames: 1703936. Throughput: 0: 819.6. Samples: 425730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:29:58,185][01818] Avg episode reward: [(0, '4.899')] [2023-02-24 21:30:03,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3387.9). Total num frames: 1716224. Throughput: 0: 814.0. Samples: 430304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:30:03,187][01818] Avg episode reward: [(0, '5.091')] [2023-02-24 21:30:03,211][14647] Saving new best policy, reward=5.091! [2023-02-24 21:30:03,223][14660] Updated weights for policy 0, policy_version 420 (0.0020) [2023-02-24 21:30:08,184][01818] Fps is (10 sec: 2866.9, 60 sec: 3276.7, 300 sec: 3401.7). Total num frames: 1732608. Throughput: 0: 814.0. Samples: 432294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:30:08,190][01818] Avg episode reward: [(0, '5.248')] [2023-02-24 21:30:08,199][14647] Saving new best policy, reward=5.248! [2023-02-24 21:30:13,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3401.8). Total num frames: 1753088. Throughput: 0: 814.0. Samples: 437716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:30:13,190][01818] Avg episode reward: [(0, '5.146')] [2023-02-24 21:30:14,594][14660] Updated weights for policy 0, policy_version 430 (0.0024) [2023-02-24 21:30:18,183][01818] Fps is (10 sec: 4096.5, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 1773568. Throughput: 0: 807.2. Samples: 444142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:30:18,190][01818] Avg episode reward: [(0, '5.200')] [2023-02-24 21:30:23,183][01818] Fps is (10 sec: 3686.3, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 1789952. Throughput: 0: 804.1. Samples: 446416. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:30:23,186][01818] Avg episode reward: [(0, '5.132')] [2023-02-24 21:30:27,559][14660] Updated weights for policy 0, policy_version 440 (0.0025) [2023-02-24 21:30:28,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3415.6). Total num frames: 1802240. Throughput: 0: 817.8. Samples: 450506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:30:28,186][01818] Avg episode reward: [(0, '5.217')] [2023-02-24 21:30:33,183][01818] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3457.3). Total num frames: 1822720. Throughput: 0: 868.8. Samples: 456118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:30:33,185][01818] Avg episode reward: [(0, '5.740')] [2023-02-24 21:30:33,200][14647] Saving new best policy, reward=5.740! [2023-02-24 21:30:37,651][14660] Updated weights for policy 0, policy_version 450 (0.0015) [2023-02-24 21:30:38,183][01818] Fps is (10 sec: 4095.9, 60 sec: 3345.1, 300 sec: 3457.3). Total num frames: 1843200. Throughput: 0: 891.2. Samples: 459302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:30:38,186][01818] Avg episode reward: [(0, '6.250')] [2023-02-24 21:30:38,192][14647] Saving new best policy, reward=6.250! [2023-02-24 21:30:43,189][01818] Fps is (10 sec: 3684.1, 60 sec: 3413.2, 300 sec: 3457.2). Total num frames: 1859584. Throughput: 0: 860.0. Samples: 464434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:30:43,194][01818] Avg episode reward: [(0, '5.823')] [2023-02-24 21:30:48,185][01818] Fps is (10 sec: 2867.0, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 1871872. Throughput: 0: 848.0. Samples: 468466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:30:48,189][01818] Avg episode reward: [(0, '5.806')] [2023-02-24 21:30:51,225][14660] Updated weights for policy 0, policy_version 460 (0.0024) [2023-02-24 21:30:53,183][01818] Fps is (10 sec: 3278.8, 60 sec: 3481.6, 300 sec: 3485.1). 
Total num frames: 1892352. Throughput: 0: 861.1. Samples: 471042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:30:53,186][01818] Avg episode reward: [(0, '5.637')] [2023-02-24 21:30:58,183][01818] Fps is (10 sec: 4096.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1912832. Throughput: 0: 885.4. Samples: 477558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:30:58,189][01818] Avg episode reward: [(0, '5.862')] [2023-02-24 21:31:01,913][14660] Updated weights for policy 0, policy_version 470 (0.0021) [2023-02-24 21:31:03,187][01818] Fps is (10 sec: 3275.4, 60 sec: 3481.4, 300 sec: 3457.3). Total num frames: 1925120. Throughput: 0: 852.2. Samples: 482494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:31:03,195][01818] Avg episode reward: [(0, '6.111')] [2023-02-24 21:31:03,274][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000471_1929216.pth... [2023-02-24 21:31:03,457][14647] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000266_1089536.pth [2023-02-24 21:31:08,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 1941504. Throughput: 0: 846.6. Samples: 484512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:31:08,188][01818] Avg episode reward: [(0, '5.983')] [2023-02-24 21:31:13,183][01818] Fps is (10 sec: 3687.9, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1961984. Throughput: 0: 867.2. Samples: 489530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:31:13,185][01818] Avg episode reward: [(0, '5.684')] [2023-02-24 21:31:13,835][14660] Updated weights for policy 0, policy_version 480 (0.0019) [2023-02-24 21:31:18,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1982464. Throughput: 0: 891.6. Samples: 496242. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:31:18,188][01818] Avg episode reward: [(0, '5.690')] [2023-02-24 21:31:23,185][01818] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3485.0). Total num frames: 1998848. Throughput: 0: 885.7. Samples: 499158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:31:23,187][01818] Avg episode reward: [(0, '5.895')] [2023-02-24 21:31:25,268][14660] Updated weights for policy 0, policy_version 490 (0.0025) [2023-02-24 21:31:28,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2011136. Throughput: 0: 863.7. Samples: 503296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:31:28,189][01818] Avg episode reward: [(0, '6.285')] [2023-02-24 21:31:28,281][14647] Saving new best policy, reward=6.285! [2023-02-24 21:31:33,183][01818] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2031616. Throughput: 0: 884.6. Samples: 508272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:31:33,185][01818] Avg episode reward: [(0, '6.384')] [2023-02-24 21:31:33,201][14647] Saving new best policy, reward=6.384! [2023-02-24 21:31:36,879][14660] Updated weights for policy 0, policy_version 500 (0.0017) [2023-02-24 21:31:38,183][01818] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2052096. Throughput: 0: 894.7. Samples: 511304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:31:38,186][01818] Avg episode reward: [(0, '6.580')] [2023-02-24 21:31:38,194][14647] Saving new best policy, reward=6.580! [2023-02-24 21:31:43,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3482.0, 300 sec: 3485.1). Total num frames: 2068480. Throughput: 0: 877.5. Samples: 517044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:31:43,189][01818] Avg episode reward: [(0, '6.757')] [2023-02-24 21:31:43,200][14647] Saving new best policy, reward=6.757! 
[2023-02-24 21:31:48,183][01818] Fps is (10 sec: 2867.3, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 2080768. Throughput: 0: 858.3. Samples: 521114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:31:48,191][01818] Avg episode reward: [(0, '7.111')] [2023-02-24 21:31:48,196][14647] Saving new best policy, reward=7.111! [2023-02-24 21:31:50,200][14660] Updated weights for policy 0, policy_version 510 (0.0028) [2023-02-24 21:31:53,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2101248. Throughput: 0: 861.4. Samples: 523276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:31:53,185][01818] Avg episode reward: [(0, '7.335')] [2023-02-24 21:31:53,192][14647] Saving new best policy, reward=7.335! [2023-02-24 21:31:58,183][01818] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2121728. Throughput: 0: 901.2. Samples: 530084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:31:58,186][01818] Avg episode reward: [(0, '7.572')] [2023-02-24 21:31:58,190][14647] Saving new best policy, reward=7.572! [2023-02-24 21:31:59,385][14660] Updated weights for policy 0, policy_version 520 (0.0021) [2023-02-24 21:32:03,185][01818] Fps is (10 sec: 4095.1, 60 sec: 3618.3, 300 sec: 3498.9). Total num frames: 2142208. Throughput: 0: 879.2. Samples: 535808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:32:03,190][01818] Avg episode reward: [(0, '7.422')] [2023-02-24 21:32:08,183][01818] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2154496. Throughput: 0: 861.6. Samples: 537928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:32:08,185][01818] Avg episode reward: [(0, '7.191')] [2023-02-24 21:32:12,303][14660] Updated weights for policy 0, policy_version 530 (0.0015) [2023-02-24 21:32:13,183][01818] Fps is (10 sec: 3277.5, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2174976. Throughput: 0: 872.8. Samples: 542570. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:32:13,185][01818] Avg episode reward: [(0, '7.183')] [2023-02-24 21:32:18,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2195456. Throughput: 0: 914.2. Samples: 549410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:32:18,186][01818] Avg episode reward: [(0, '7.829')] [2023-02-24 21:32:18,190][14647] Saving new best policy, reward=7.829! [2023-02-24 21:32:21,472][14660] Updated weights for policy 0, policy_version 540 (0.0016) [2023-02-24 21:32:23,183][01818] Fps is (10 sec: 4095.9, 60 sec: 3618.2, 300 sec: 3512.8). Total num frames: 2215936. Throughput: 0: 919.1. Samples: 552664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:32:23,188][01818] Avg episode reward: [(0, '8.051')] [2023-02-24 21:32:23,204][14647] Saving new best policy, reward=8.051! [2023-02-24 21:32:28,183][01818] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2228224. Throughput: 0: 891.3. Samples: 557152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:32:28,187][01818] Avg episode reward: [(0, '8.661')] [2023-02-24 21:32:28,191][14647] Saving new best policy, reward=8.661! [2023-02-24 21:32:33,183][01818] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2244608. Throughput: 0: 904.9. Samples: 561834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:32:33,186][01818] Avg episode reward: [(0, '8.160')] [2023-02-24 21:32:34,415][14660] Updated weights for policy 0, policy_version 550 (0.0016) [2023-02-24 21:32:38,183][01818] Fps is (10 sec: 4096.1, 60 sec: 3618.2, 300 sec: 3512.8). Total num frames: 2269184. Throughput: 0: 931.0. Samples: 565172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:32:38,192][01818] Avg episode reward: [(0, '8.025')] [2023-02-24 21:32:43,183][01818] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3512.8). Total num frames: 2289664. 
Throughput: 0: 930.3. Samples: 571948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:32:43,192][01818] Avg episode reward: [(0, '7.568')] [2023-02-24 21:32:44,540][14660] Updated weights for policy 0, policy_version 560 (0.0023) [2023-02-24 21:32:48,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3485.1). Total num frames: 2301952. Throughput: 0: 900.1. Samples: 576310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:32:48,184][01818] Avg episode reward: [(0, '7.727')] [2023-02-24 21:32:53,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2318336. Throughput: 0: 901.3. Samples: 578488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:32:53,188][01818] Avg episode reward: [(0, '8.292')] [2023-02-24 21:32:56,082][14660] Updated weights for policy 0, policy_version 570 (0.0027) [2023-02-24 21:32:58,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 2342912. Throughput: 0: 940.0. Samples: 584868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:32:58,185][01818] Avg episode reward: [(0, '8.487')] [2023-02-24 21:33:03,183][01818] Fps is (10 sec: 4505.5, 60 sec: 3686.5, 300 sec: 3526.7). Total num frames: 2363392. Throughput: 0: 934.5. Samples: 591462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 21:33:03,185][01818] Avg episode reward: [(0, '8.637')] [2023-02-24 21:33:03,199][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000577_2363392.pth... [2023-02-24 21:33:03,355][14647] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000370_1515520.pth [2023-02-24 21:33:06,857][14660] Updated weights for policy 0, policy_version 580 (0.0029) [2023-02-24 21:33:08,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 2375680. Throughput: 0: 908.8. Samples: 593558. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:33:08,185][01818] Avg episode reward: [(0, '8.879')] [2023-02-24 21:33:08,261][14647] Saving new best policy, reward=8.879! [2023-02-24 21:33:13,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2392064. Throughput: 0: 905.1. Samples: 597880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:33:13,186][01818] Avg episode reward: [(0, '9.375')] [2023-02-24 21:33:13,200][14647] Saving new best policy, reward=9.375! [2023-02-24 21:33:17,990][14660] Updated weights for policy 0, policy_version 590 (0.0028) [2023-02-24 21:33:18,183][01818] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 2416640. Throughput: 0: 946.2. Samples: 604412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:33:18,185][01818] Avg episode reward: [(0, '10.796')] [2023-02-24 21:33:18,190][14647] Saving new best policy, reward=10.796! [2023-02-24 21:33:23,183][01818] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 2437120. Throughput: 0: 945.2. Samples: 607708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:33:23,189][01818] Avg episode reward: [(0, '10.537')] [2023-02-24 21:33:28,183][01818] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3499.0). Total num frames: 2449408. Throughput: 0: 905.8. Samples: 612710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:33:28,193][01818] Avg episode reward: [(0, '10.363')] [2023-02-24 21:33:29,851][14660] Updated weights for policy 0, policy_version 600 (0.0012) [2023-02-24 21:33:33,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3512.9). Total num frames: 2465792. Throughput: 0: 903.9. Samples: 616986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:33:33,188][01818] Avg episode reward: [(0, '9.421')] [2023-02-24 21:33:38,183][01818] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 2490368. 
Throughput: 0: 928.8. Samples: 620282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:33:38,192][01818] Avg episode reward: [(0, '9.331')] [2023-02-24 21:33:39,877][14660] Updated weights for policy 0, policy_version 610 (0.0020) [2023-02-24 21:33:43,183][01818] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 2510848. Throughput: 0: 941.2. Samples: 627220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:33:43,189][01818] Avg episode reward: [(0, '10.182')] [2023-02-24 21:33:48,191][01818] Fps is (10 sec: 3274.1, 60 sec: 3685.9, 300 sec: 3512.7). Total num frames: 2523136. Throughput: 0: 899.4. Samples: 631942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:33:48,194][01818] Avg episode reward: [(0, '10.384')] [2023-02-24 21:33:52,656][14660] Updated weights for policy 0, policy_version 620 (0.0024) [2023-02-24 21:33:53,183][01818] Fps is (10 sec: 2867.1, 60 sec: 3686.4, 300 sec: 3526.8). Total num frames: 2539520. Throughput: 0: 899.3. Samples: 634028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:33:53,190][01818] Avg episode reward: [(0, '10.650')] [2023-02-24 21:33:58,183][01818] Fps is (10 sec: 4099.3, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 2564096. Throughput: 0: 930.3. Samples: 639742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:33:58,185][01818] Avg episode reward: [(0, '11.445')] [2023-02-24 21:33:58,189][14647] Saving new best policy, reward=11.445! [2023-02-24 21:34:03,036][14660] Updated weights for policy 0, policy_version 630 (0.0019) [2023-02-24 21:34:03,183][01818] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2580480. Throughput: 0: 908.8. Samples: 645310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:34:03,185][01818] Avg episode reward: [(0, '10.902')] [2023-02-24 21:34:08,183][01818] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2588672. Throughput: 0: 873.5. 
Samples: 647014. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:34:08,190][01818] Avg episode reward: [(0, '11.294')] [2023-02-24 21:34:13,183][01818] Fps is (10 sec: 2048.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2600960. Throughput: 0: 832.8. Samples: 650184. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-24 21:34:13,186][01818] Avg episode reward: [(0, '11.709')] [2023-02-24 21:34:13,201][14647] Saving new best policy, reward=11.709! [2023-02-24 21:34:18,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 2617344. Throughput: 0: 830.7. Samples: 654368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 21:34:18,192][01818] Avg episode reward: [(0, '12.177')] [2023-02-24 21:34:18,198][14647] Saving new best policy, reward=12.177! [2023-02-24 21:34:19,257][14660] Updated weights for policy 0, policy_version 640 (0.0033) [2023-02-24 21:34:23,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 2637824. Throughput: 0: 827.8. Samples: 657534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:34:23,190][01818] Avg episode reward: [(0, '12.261')] [2023-02-24 21:34:23,205][14647] Saving new best policy, reward=12.261! [2023-02-24 21:34:28,183][01818] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2654208. Throughput: 0: 816.7. Samples: 663972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:34:28,187][01818] Avg episode reward: [(0, '12.294')] [2023-02-24 21:34:28,269][14647] Saving new best policy, reward=12.294! [2023-02-24 21:34:29,534][14660] Updated weights for policy 0, policy_version 650 (0.0014) [2023-02-24 21:34:33,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 2670592. Throughput: 0: 802.5. Samples: 668046. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-24 21:34:33,185][01818] Avg episode reward: [(0, '12.853')] [2023-02-24 21:34:33,204][14647] Saving new best policy, reward=12.853! [2023-02-24 21:34:38,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3485.1). Total num frames: 2682880. Throughput: 0: 800.6. Samples: 670054. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:34:38,190][01818] Avg episode reward: [(0, '13.139')] [2023-02-24 21:34:38,228][14647] Saving new best policy, reward=13.139! [2023-02-24 21:34:42,134][14660] Updated weights for policy 0, policy_version 660 (0.0012) [2023-02-24 21:34:43,183][01818] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3526.8). Total num frames: 2707456. Throughput: 0: 803.6. Samples: 675902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:34:43,189][01818] Avg episode reward: [(0, '14.523')] [2023-02-24 21:34:43,201][14647] Saving new best policy, reward=14.523! [2023-02-24 21:34:48,183][01818] Fps is (10 sec: 4505.6, 60 sec: 3413.8, 300 sec: 3540.6). Total num frames: 2727936. Throughput: 0: 817.3. Samples: 682090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:34:48,186][01818] Avg episode reward: [(0, '15.878')] [2023-02-24 21:34:48,192][14647] Saving new best policy, reward=15.878! [2023-02-24 21:34:53,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 2740224. Throughput: 0: 823.1. Samples: 684052. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 21:34:53,185][01818] Avg episode reward: [(0, '16.446')] [2023-02-24 21:34:53,200][14647] Saving new best policy, reward=16.446! [2023-02-24 21:34:54,286][14660] Updated weights for policy 0, policy_version 670 (0.0034) [2023-02-24 21:34:58,184][01818] Fps is (10 sec: 2866.8, 60 sec: 3208.5, 300 sec: 3526.7). Total num frames: 2756608. Throughput: 0: 846.8. Samples: 688290. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 21:34:58,187][01818] Avg episode reward: [(0, '15.481')] [2023-02-24 21:35:03,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3540.6). Total num frames: 2777088. Throughput: 0: 900.2. Samples: 694878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:35:03,185][01818] Avg episode reward: [(0, '13.858')] [2023-02-24 21:35:03,202][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000678_2777088.pth... [2023-02-24 21:35:03,326][14647] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000471_1929216.pth [2023-02-24 21:35:04,476][14660] Updated weights for policy 0, policy_version 680 (0.0027) [2023-02-24 21:35:08,183][01818] Fps is (10 sec: 4096.6, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 2797568. Throughput: 0: 906.0. Samples: 698304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:35:08,188][01818] Avg episode reward: [(0, '13.014')] [2023-02-24 21:35:13,187][01818] Fps is (10 sec: 3685.0, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 2813952. Throughput: 0: 877.0. Samples: 703440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:35:13,191][01818] Avg episode reward: [(0, '13.145')] [2023-02-24 21:35:16,763][14660] Updated weights for policy 0, policy_version 690 (0.0013) [2023-02-24 21:35:18,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2830336. Throughput: 0: 884.2. Samples: 707834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:35:18,191][01818] Avg episode reward: [(0, '13.445')] [2023-02-24 21:35:23,183][01818] Fps is (10 sec: 3687.9, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2850816. Throughput: 0: 914.2. Samples: 711194. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:35:23,189][01818] Avg episode reward: [(0, '14.560')] [2023-02-24 21:35:26,209][14660] Updated weights for policy 0, policy_version 700 (0.0018) [2023-02-24 21:35:28,190][01818] Fps is (10 sec: 4502.4, 60 sec: 3686.0, 300 sec: 3568.3). Total num frames: 2875392. Throughput: 0: 933.8. Samples: 717928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:35:28,193][01818] Avg episode reward: [(0, '14.614')] [2023-02-24 21:35:33,183][01818] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2887680. Throughput: 0: 903.6. Samples: 722752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:35:33,188][01818] Avg episode reward: [(0, '14.878')] [2023-02-24 21:35:38,183][01818] Fps is (10 sec: 2869.2, 60 sec: 3686.4, 300 sec: 3540.7). Total num frames: 2904064. Throughput: 0: 909.0. Samples: 724956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:35:38,190][01818] Avg episode reward: [(0, '14.490')] [2023-02-24 21:35:38,884][14660] Updated weights for policy 0, policy_version 710 (0.0012) [2023-02-24 21:35:43,183][01818] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2924544. Throughput: 0: 941.7. Samples: 730666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:35:43,185][01818] Avg episode reward: [(0, '14.499')] [2023-02-24 21:35:47,875][14660] Updated weights for policy 0, policy_version 720 (0.0012) [2023-02-24 21:35:48,183][01818] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2949120. Throughput: 0: 950.9. Samples: 737668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:35:48,190][01818] Avg episode reward: [(0, '15.255')] [2023-02-24 21:35:53,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 2965504. Throughput: 0: 928.9. Samples: 740106. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:35:53,188][01818] Avg episode reward: [(0, '15.189')] [2023-02-24 21:35:58,183][01818] Fps is (10 sec: 2867.1, 60 sec: 3686.5, 300 sec: 3568.4). Total num frames: 2977792. Throughput: 0: 909.1. Samples: 744346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:35:58,189][01818] Avg episode reward: [(0, '14.591')] [2023-02-24 21:36:00,837][14660] Updated weights for policy 0, policy_version 730 (0.0035) [2023-02-24 21:36:03,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2998272. Throughput: 0: 941.6. Samples: 750206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:36:03,190][01818] Avg episode reward: [(0, '15.074')] [2023-02-24 21:36:08,183][01818] Fps is (10 sec: 4505.7, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 3022848. Throughput: 0: 943.2. Samples: 753638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:36:08,186][01818] Avg episode reward: [(0, '15.224')] [2023-02-24 21:36:10,125][14660] Updated weights for policy 0, policy_version 740 (0.0016) [2023-02-24 21:36:13,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3568.4). Total num frames: 3035136. Throughput: 0: 917.7. Samples: 759218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2023-02-24 21:36:13,186][01818] Avg episode reward: [(0, '15.145')] [2023-02-24 21:36:18,183][01818] Fps is (10 sec: 2867.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3051520. Throughput: 0: 901.1. Samples: 763300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:36:18,191][01818] Avg episode reward: [(0, '14.847')] [2023-02-24 21:36:22,976][14660] Updated weights for policy 0, policy_version 750 (0.0022) [2023-02-24 21:36:23,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3072000. Throughput: 0: 910.7. Samples: 765936. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:36:23,185][01818] Avg episode reward: [(0, '16.168')] [2023-02-24 21:36:28,183][01818] Fps is (10 sec: 4096.3, 60 sec: 3618.6, 300 sec: 3596.2). Total num frames: 3092480. Throughput: 0: 932.2. Samples: 772616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:36:28,185][01818] Avg episode reward: [(0, '15.144')] [2023-02-24 21:36:33,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3108864. Throughput: 0: 897.6. Samples: 778058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:36:33,186][01818] Avg episode reward: [(0, '14.696')] [2023-02-24 21:36:33,645][14660] Updated weights for policy 0, policy_version 760 (0.0026) [2023-02-24 21:36:38,185][01818] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3582.2). Total num frames: 3125248. Throughput: 0: 893.2. Samples: 780302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:36:38,188][01818] Avg episode reward: [(0, '16.089')] [2023-02-24 21:36:43,183][01818] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3145728. Throughput: 0: 913.8. Samples: 785468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:36:43,190][01818] Avg episode reward: [(0, '16.020')] [2023-02-24 21:36:44,734][14660] Updated weights for policy 0, policy_version 770 (0.0023) [2023-02-24 21:36:48,183][01818] Fps is (10 sec: 4506.6, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3170304. Throughput: 0: 942.0. Samples: 792594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:36:48,185][01818] Avg episode reward: [(0, '17.512')] [2023-02-24 21:36:48,193][14647] Saving new best policy, reward=17.512! [2023-02-24 21:36:53,183][01818] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3186688. Throughput: 0: 931.4. Samples: 795550. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2023-02-24 21:36:53,187][01818] Avg episode reward: [(0, '17.732')] [2023-02-24 21:36:53,202][14647] Saving new best policy, reward=17.732! [2023-02-24 21:36:55,767][14660] Updated weights for policy 0, policy_version 780 (0.0025) [2023-02-24 21:36:58,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3198976. Throughput: 0: 902.5. Samples: 799832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:36:58,185][01818] Avg episode reward: [(0, '18.518')] [2023-02-24 21:36:58,195][14647] Saving new best policy, reward=18.518! [2023-02-24 21:37:03,183][01818] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3219456. Throughput: 0: 932.0. Samples: 805238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:37:03,190][01818] Avg episode reward: [(0, '17.965')] [2023-02-24 21:37:03,204][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000786_3219456.pth... [2023-02-24 21:37:03,327][14647] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000577_2363392.pth [2023-02-24 21:37:06,469][14660] Updated weights for policy 0, policy_version 790 (0.0015) [2023-02-24 21:37:08,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3239936. Throughput: 0: 947.6. Samples: 808576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:37:08,185][01818] Avg episode reward: [(0, '17.763')] [2023-02-24 21:37:13,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 3260416. Throughput: 0: 934.3. Samples: 814658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:37:13,190][01818] Avg episode reward: [(0, '18.219')] [2023-02-24 21:37:18,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3272704. Throughput: 0: 911.6. Samples: 819078. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:37:18,188][01818] Avg episode reward: [(0, '17.732')] [2023-02-24 21:37:18,622][14660] Updated weights for policy 0, policy_version 800 (0.0034) [2023-02-24 21:37:23,183][01818] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3293184. Throughput: 0: 911.2. Samples: 821304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:37:23,186][01818] Avg episode reward: [(0, '18.475')] [2023-02-24 21:37:28,183][01818] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3313664. Throughput: 0: 944.4. Samples: 827966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:37:28,185][01818] Avg episode reward: [(0, '17.618')] [2023-02-24 21:37:28,417][14660] Updated weights for policy 0, policy_version 810 (0.0036) [2023-02-24 21:37:33,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 3334144. Throughput: 0: 920.0. Samples: 833992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:37:33,190][01818] Avg episode reward: [(0, '18.021')] [2023-02-24 21:37:38,183][01818] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3582.3). Total num frames: 3346432. Throughput: 0: 901.2. Samples: 836104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:37:38,185][01818] Avg episode reward: [(0, '17.971')] [2023-02-24 21:37:41,425][14660] Updated weights for policy 0, policy_version 820 (0.0022) [2023-02-24 21:37:43,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3362816. Throughput: 0: 905.1. Samples: 840560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:37:43,190][01818] Avg episode reward: [(0, '17.917')] [2023-02-24 21:37:48,183][01818] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3387392. Throughput: 0: 936.6. Samples: 847384. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2023-02-24 21:37:48,186][01818] Avg episode reward: [(0, '16.586')] [2023-02-24 21:37:50,353][14660] Updated weights for policy 0, policy_version 830 (0.0019) [2023-02-24 21:37:53,183][01818] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3407872. Throughput: 0: 939.6. Samples: 850860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-24 21:37:53,188][01818] Avg episode reward: [(0, '17.387')] [2023-02-24 21:37:58,185][01818] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3596.1). Total num frames: 3424256. Throughput: 0: 906.3. Samples: 855444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:37:58,192][01818] Avg episode reward: [(0, '17.470')] [2023-02-24 21:38:03,183][01818] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3436544. Throughput: 0: 913.4. Samples: 860182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-24 21:38:03,191][01818] Avg episode reward: [(0, '18.360')] [2023-02-24 21:38:03,245][14660] Updated weights for policy 0, policy_version 840 (0.0027) [2023-02-24 21:38:08,183][01818] Fps is (10 sec: 3687.3, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3461120. Throughput: 0: 938.4. Samples: 863530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:38:08,188][01818] Avg episode reward: [(0, '19.523')] [2023-02-24 21:38:08,192][14647] Saving new best policy, reward=19.523! [2023-02-24 21:38:12,426][14660] Updated weights for policy 0, policy_version 850 (0.0029) [2023-02-24 21:38:13,183][01818] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 3481600. Throughput: 0: 945.2. Samples: 870502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-24 21:38:13,190][01818] Avg episode reward: [(0, '21.283')] [2023-02-24 21:38:13,204][14647] Saving new best policy, reward=21.283! [2023-02-24 21:38:18,183][01818] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3596.1). 
Total num frames: 3497984. Throughput: 0: 905.9. Samples: 874756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-24 21:38:18,185][01818] Avg episode reward: [(0, '22.163')] [2023-02-24 21:38:18,193][14647] Saving new best policy, reward=22.163! [2023-02-24 21:38:20,832][14647] Stopping Batcher_0... [2023-02-24 21:38:20,833][14647] Loop batcher_evt_loop terminating... [2023-02-24 21:38:20,834][01818] Component Batcher_0 stopped! [2023-02-24 21:38:20,843][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000856_3506176.pth... [2023-02-24 21:38:20,915][14660] Weights refcount: 2 0 [2023-02-24 21:38:20,924][01818] Component InferenceWorker_p0-w0 stopped! [2023-02-24 21:38:20,933][14667] Stopping RolloutWorker_w5... [2023-02-24 21:38:20,923][14660] Stopping InferenceWorker_p0-w0... [2023-02-24 21:38:20,935][14660] Loop inference_proc0-0_evt_loop terminating... [2023-02-24 21:38:20,933][01818] Component RolloutWorker_w5 stopped! [2023-02-24 21:38:20,933][14667] Loop rollout_proc5_evt_loop terminating... [2023-02-24 21:38:20,952][14669] Stopping RolloutWorker_w7... [2023-02-24 21:38:20,951][01818] Component RolloutWorker_w6 stopped! [2023-02-24 21:38:20,957][01818] Component RolloutWorker_w7 stopped! [2023-02-24 21:38:20,959][14668] Stopping RolloutWorker_w6... [2023-02-24 21:38:20,974][01818] Component RolloutWorker_w0 stopped! [2023-02-24 21:38:20,975][01818] Component RolloutWorker_w2 stopped! [2023-02-24 21:38:20,983][14662] Stopping RolloutWorker_w1... [2023-02-24 21:38:20,974][14663] Stopping RolloutWorker_w2... [2023-02-24 21:38:20,984][14663] Loop rollout_proc2_evt_loop terminating... [2023-02-24 21:38:20,986][14669] Loop rollout_proc7_evt_loop terminating... [2023-02-24 21:38:20,984][01818] Component RolloutWorker_w1 stopped! [2023-02-24 21:38:20,989][14661] Stopping RolloutWorker_w0... [2023-02-24 21:38:20,990][14661] Loop rollout_proc0_evt_loop terminating... 
[2023-02-24 21:38:20,972][14668] Loop rollout_proc6_evt_loop terminating... [2023-02-24 21:38:21,023][01818] Component RolloutWorker_w4 stopped! [2023-02-24 21:38:21,030][14666] Stopping RolloutWorker_w4... [2023-02-24 21:38:21,042][14666] Loop rollout_proc4_evt_loop terminating... [2023-02-24 21:38:21,053][14665] Stopping RolloutWorker_w3... [2023-02-24 21:38:21,054][14665] Loop rollout_proc3_evt_loop terminating... [2023-02-24 21:38:20,983][14662] Loop rollout_proc1_evt_loop terminating... [2023-02-24 21:38:21,054][01818] Component RolloutWorker_w3 stopped! [2023-02-24 21:38:21,166][14647] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000678_2777088.pth [2023-02-24 21:38:21,194][14647] Saving new best policy, reward=22.303! [2023-02-24 21:38:21,403][14647] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000856_3506176.pth... [2023-02-24 21:38:21,720][01818] Component LearnerWorker_p0 stopped! [2023-02-24 21:38:21,722][01818] Waiting for process learner_proc0 to stop... [2023-02-24 21:38:21,726][14647] Stopping LearnerWorker_p0... [2023-02-24 21:38:21,727][14647] Loop learner_proc0_evt_loop terminating... [2023-02-24 21:38:23,561][01818] Waiting for process inference_proc0-0 to join... [2023-02-24 21:38:23,907][01818] Waiting for process rollout_proc0 to join... [2023-02-24 21:38:24,150][01818] Waiting for process rollout_proc1 to join... [2023-02-24 21:38:24,152][01818] Waiting for process rollout_proc2 to join... [2023-02-24 21:38:24,174][01818] Waiting for process rollout_proc3 to join... [2023-02-24 21:38:24,175][01818] Waiting for process rollout_proc4 to join... [2023-02-24 21:38:24,178][01818] Waiting for process rollout_proc5 to join... [2023-02-24 21:38:24,181][01818] Waiting for process rollout_proc6 to join... [2023-02-24 21:38:24,184][01818] Waiting for process rollout_proc7 to join... 
[2023-02-24 21:38:24,186][01818] Batcher 0 profile tree view: batching: 23.4631, releasing_batches: 0.0200 [2023-02-24 21:38:24,188][01818] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 486.0066 update_model: 6.9912 weight_update: 0.0025 one_step: 0.0033 handle_policy_step: 479.0235 deserialize: 13.7442, stack: 2.7340, obs_to_device_normalize: 103.1918, forward: 234.9875, send_messages: 22.6309 prepare_outputs: 78.0531 to_cpu: 48.9591 [2023-02-24 21:38:24,190][01818] Learner 0 profile tree view: misc: 0.0052, prepare_batch: 16.3556 train: 67.3034 epoch_init: 0.0098, minibatch_init: 0.0082, losses_postprocess: 0.4303, kl_divergence: 0.5409, after_optimizer: 28.6990 calculate_losses: 24.1881 losses_init: 0.0167, forward_head: 1.5432, bptt_initial: 15.8958, tail: 1.0545, advantages_returns: 0.2436, losses: 3.1979 bptt: 1.9428 bptt_forward_core: 1.8471 update: 12.9027 clip: 1.2816 [2023-02-24 21:38:24,193][01818] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2528, enqueue_policy_requests: 135.8919, env_step: 756.1142, overhead: 19.9025, complete_rollouts: 6.3166 save_policy_outputs: 19.1322 split_output_tensors: 8.9058 [2023-02-24 21:38:24,195][01818] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3679, enqueue_policy_requests: 135.9258, env_step: 754.6938, overhead: 19.9694, complete_rollouts: 6.1935 save_policy_outputs: 18.4899 split_output_tensors: 9.1868 [2023-02-24 21:38:24,198][01818] Loop Runner_EvtLoop terminating... [2023-02-24 21:38:24,204][01818] Runner profile tree view: main_loop: 1036.4389 [2023-02-24 21:38:24,210][01818] Collected {0: 3506176}, FPS: 3382.9 [2023-02-24 21:38:31,931][01818] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 21:38:31,933][01818] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-24 21:38:31,936][01818] Adding new argument 'no_render'=True that is not in the saved config file! 
[2023-02-24 21:38:31,938][01818] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-24 21:38:31,940][01818] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 21:38:31,942][01818] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 21:38:31,947][01818] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 21:38:31,949][01818] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 21:38:31,951][01818] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-24 21:38:31,952][01818] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-24 21:38:31,953][01818] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 21:38:31,955][01818] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-24 21:38:31,956][01818] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-24 21:38:31,958][01818] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-24 21:38:31,960][01818] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-24 21:38:31,982][01818] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-24 21:38:31,986][01818] RunningMeanStd input shape: (3, 72, 128) [2023-02-24 21:38:31,989][01818] RunningMeanStd input shape: (1,) [2023-02-24 21:38:32,005][01818] ConvEncoder: input_channels=3 [2023-02-24 21:38:32,706][01818] Conv encoder output size: 512 [2023-02-24 21:38:32,712][01818] Policy head output size: 512 [2023-02-24 21:38:35,785][01818] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000856_3506176.pth... [2023-02-24 21:38:37,023][01818] Num frames 100... 
[2023-02-24 21:38:37,133][01818] Num frames 200... [2023-02-24 21:38:37,242][01818] Num frames 300... [2023-02-24 21:38:37,354][01818] Num frames 400... [2023-02-24 21:38:37,470][01818] Num frames 500... [2023-02-24 21:38:37,580][01818] Num frames 600... [2023-02-24 21:38:37,697][01818] Num frames 700... [2023-02-24 21:38:37,787][01818] Avg episode rewards: #0: 16.320, true rewards: #0: 7.320 [2023-02-24 21:38:37,789][01818] Avg episode reward: 16.320, avg true_objective: 7.320 [2023-02-24 21:38:37,868][01818] Num frames 800... [2023-02-24 21:38:37,978][01818] Num frames 900... [2023-02-24 21:38:38,087][01818] Num frames 1000... [2023-02-24 21:38:38,196][01818] Num frames 1100... [2023-02-24 21:38:38,303][01818] Num frames 1200... [2023-02-24 21:38:38,419][01818] Num frames 1300... [2023-02-24 21:38:38,547][01818] Num frames 1400... [2023-02-24 21:38:38,668][01818] Num frames 1500... [2023-02-24 21:38:38,778][01818] Num frames 1600... [2023-02-24 21:38:38,940][01818] Avg episode rewards: #0: 18.990, true rewards: #0: 8.490 [2023-02-24 21:38:38,941][01818] Avg episode reward: 18.990, avg true_objective: 8.490 [2023-02-24 21:38:38,948][01818] Num frames 1700... [2023-02-24 21:38:39,057][01818] Num frames 1800... [2023-02-24 21:38:39,170][01818] Num frames 1900... [2023-02-24 21:38:39,279][01818] Num frames 2000... [2023-02-24 21:38:39,401][01818] Num frames 2100... [2023-02-24 21:38:39,515][01818] Num frames 2200... [2023-02-24 21:38:39,643][01818] Avg episode rewards: #0: 17.223, true rewards: #0: 7.557 [2023-02-24 21:38:39,645][01818] Avg episode reward: 17.223, avg true_objective: 7.557 [2023-02-24 21:38:39,692][01818] Num frames 2300... [2023-02-24 21:38:39,802][01818] Num frames 2400... [2023-02-24 21:38:39,918][01818] Num frames 2500... [2023-02-24 21:38:40,032][01818] Num frames 2600... [2023-02-24 21:38:40,142][01818] Num frames 2700... [2023-02-24 21:38:40,251][01818] Num frames 2800... [2023-02-24 21:38:40,373][01818] Num frames 2900... 
[2023-02-24 21:38:40,482][01818] Num frames 3000... [2023-02-24 21:38:40,592][01818] Num frames 3100... [2023-02-24 21:38:40,709][01818] Num frames 3200... [2023-02-24 21:38:40,816][01818] Num frames 3300... [2023-02-24 21:38:40,930][01818] Num frames 3400... [2023-02-24 21:38:41,044][01818] Num frames 3500... [2023-02-24 21:38:41,155][01818] Num frames 3600... [2023-02-24 21:38:41,292][01818] Avg episode rewards: #0: 22.188, true rewards: #0: 9.187 [2023-02-24 21:38:41,294][01818] Avg episode reward: 22.188, avg true_objective: 9.187 [2023-02-24 21:38:41,329][01818] Num frames 3700... [2023-02-24 21:38:41,440][01818] Num frames 3800... [2023-02-24 21:38:41,548][01818] Num frames 3900... [2023-02-24 21:38:41,657][01818] Num frames 4000... [2023-02-24 21:38:41,773][01818] Num frames 4100... [2023-02-24 21:38:41,883][01818] Num frames 4200... [2023-02-24 21:38:41,996][01818] Num frames 4300... [2023-02-24 21:38:42,120][01818] Num frames 4400... [2023-02-24 21:38:42,230][01818] Num frames 4500... [2023-02-24 21:38:42,344][01818] Num frames 4600... [2023-02-24 21:38:42,454][01818] Num frames 4700... [2023-02-24 21:38:42,581][01818] Num frames 4800... [2023-02-24 21:38:42,691][01818] Num frames 4900... [2023-02-24 21:38:42,814][01818] Num frames 5000... [2023-02-24 21:38:42,925][01818] Num frames 5100... [2023-02-24 21:38:43,044][01818] Num frames 5200... [2023-02-24 21:38:43,155][01818] Num frames 5300... [2023-02-24 21:38:43,317][01818] Avg episode rewards: #0: 26.190, true rewards: #0: 10.790 [2023-02-24 21:38:43,321][01818] Avg episode reward: 26.190, avg true_objective: 10.790 [2023-02-24 21:38:43,329][01818] Num frames 5400... [2023-02-24 21:38:43,445][01818] Num frames 5500... [2023-02-24 21:38:43,556][01818] Num frames 5600... [2023-02-24 21:38:43,666][01818] Num frames 5700... [2023-02-24 21:38:43,783][01818] Num frames 5800... [2023-02-24 21:38:43,894][01818] Num frames 5900... [2023-02-24 21:38:44,012][01818] Num frames 6000... 
[2023-02-24 21:38:44,123][01818] Num frames 6100... [2023-02-24 21:38:44,234][01818] Num frames 6200... [2023-02-24 21:38:44,346][01818] Num frames 6300... [2023-02-24 21:38:44,499][01818] Avg episode rewards: #0: 24.972, true rewards: #0: 10.638 [2023-02-24 21:38:44,503][01818] Avg episode reward: 24.972, avg true_objective: 10.638 [2023-02-24 21:38:44,527][01818] Num frames 6400... [2023-02-24 21:38:44,643][01818] Num frames 6500... [2023-02-24 21:38:44,751][01818] Num frames 6600... [2023-02-24 21:38:44,870][01818] Num frames 6700... [2023-02-24 21:38:44,979][01818] Num frames 6800... [2023-02-24 21:38:45,087][01818] Num frames 6900... [2023-02-24 21:38:45,199][01818] Num frames 7000... [2023-02-24 21:38:45,308][01818] Num frames 7100... [2023-02-24 21:38:45,423][01818] Num frames 7200... [2023-02-24 21:38:45,531][01818] Num frames 7300... [2023-02-24 21:38:45,641][01818] Num frames 7400... [2023-02-24 21:38:45,788][01818] Num frames 7500... [2023-02-24 21:38:45,953][01818] Num frames 7600... [2023-02-24 21:38:46,116][01818] Num frames 7700... [2023-02-24 21:38:46,274][01818] Num frames 7800... [2023-02-24 21:38:46,438][01818] Num frames 7900... [2023-02-24 21:38:46,595][01818] Num frames 8000... [2023-02-24 21:38:46,754][01818] Num frames 8100... [2023-02-24 21:38:46,914][01818] Num frames 8200... [2023-02-24 21:38:47,072][01818] Num frames 8300... [2023-02-24 21:38:47,228][01818] Num frames 8400... [2023-02-24 21:38:47,419][01818] Avg episode rewards: #0: 28.976, true rewards: #0: 12.119 [2023-02-24 21:38:47,425][01818] Avg episode reward: 28.976, avg true_objective: 12.119 [2023-02-24 21:38:47,453][01818] Num frames 8500... [2023-02-24 21:38:47,606][01818] Num frames 8600... [2023-02-24 21:38:47,766][01818] Num frames 8700... [2023-02-24 21:38:47,927][01818] Num frames 8800... [2023-02-24 21:38:48,086][01818] Num frames 8900... [2023-02-24 21:38:48,249][01818] Num frames 9000... [2023-02-24 21:38:48,408][01818] Num frames 9100... 
[2023-02-24 21:38:48,572][01818] Num frames 9200... [2023-02-24 21:38:48,730][01818] Num frames 9300... [2023-02-24 21:38:48,896][01818] Num frames 9400... [2023-02-24 21:38:49,064][01818] Num frames 9500... [2023-02-24 21:38:49,223][01818] Num frames 9600... [2023-02-24 21:38:49,340][01818] Num frames 9700... [2023-02-24 21:38:49,505][01818] Avg episode rewards: #0: 29.119, true rewards: #0: 12.244 [2023-02-24 21:38:49,507][01818] Avg episode reward: 29.119, avg true_objective: 12.244 [2023-02-24 21:38:49,517][01818] Num frames 9800... [2023-02-24 21:38:49,634][01818] Num frames 9900... [2023-02-24 21:38:49,744][01818] Num frames 10000... [2023-02-24 21:38:49,854][01818] Num frames 10100... [2023-02-24 21:38:49,973][01818] Num frames 10200... [2023-02-24 21:38:50,093][01818] Num frames 10300... [2023-02-24 21:38:50,226][01818] Avg episode rewards: #0: 26.968, true rewards: #0: 11.523 [2023-02-24 21:38:50,228][01818] Avg episode reward: 26.968, avg true_objective: 11.523 [2023-02-24 21:38:50,264][01818] Num frames 10400... [2023-02-24 21:38:50,382][01818] Num frames 10500... [2023-02-24 21:38:50,498][01818] Num frames 10600... [2023-02-24 21:38:50,607][01818] Num frames 10700... [2023-02-24 21:38:50,721][01818] Num frames 10800... [2023-02-24 21:38:50,788][01818] Avg episode rewards: #0: 24.910, true rewards: #0: 10.810 [2023-02-24 21:38:50,791][01818] Avg episode reward: 24.910, avg true_objective: 10.810 [2023-02-24 21:39:56,757][01818] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2023-02-24 21:51:35,425][01818] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-24 21:51:35,432][01818] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-24 21:51:35,434][01818] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-24 21:51:35,436][01818] Adding new argument 'save_video'=True that is not in the saved config file! 
[2023-02-24 21:51:35,442][01818] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-24 21:51:35,443][01818] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-24 21:51:35,444][01818] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-02-24 21:51:35,446][01818] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-24 21:51:35,447][01818] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-02-24 21:51:35,448][01818] Adding new argument 'hf_repository'='kinkpunk/rl-doom-health-gathering-supreme' that is not in the saved config file! [2023-02-24 21:51:35,451][01818] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-24 21:51:35,452][01818] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-24 21:51:35,453][01818] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-24 21:51:35,458][01818] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-02-24 21:51:35,459][01818] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-24 21:51:35,502][01818] RunningMeanStd input shape: (3, 72, 128) [2023-02-24 21:51:35,505][01818] RunningMeanStd input shape: (1,) [2023-02-24 21:51:35,525][01818] ConvEncoder: input_channels=3 [2023-02-24 21:51:35,589][01818] Conv encoder output size: 512 [2023-02-24 21:51:35,591][01818] Policy head output size: 512 [2023-02-24 21:51:35,622][01818] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000856_3506176.pth... [2023-02-24 21:51:36,261][01818] Num frames 100... [2023-02-24 21:51:36,419][01818] Num frames 200... [2023-02-24 21:51:36,578][01818] Num frames 300... [2023-02-24 21:51:36,734][01818] Num frames 400... [2023-02-24 21:51:36,890][01818] Num frames 500... 
[2023-02-24 21:51:37,047][01818] Num frames 600... [2023-02-24 21:51:37,201][01818] Num frames 700... [2023-02-24 21:51:37,318][01818] Num frames 800... [2023-02-24 21:51:37,448][01818] Num frames 900... [2023-02-24 21:51:37,547][01818] Avg episode rewards: #0: 19.330, true rewards: #0: 9.330 [2023-02-24 21:51:37,552][01818] Avg episode reward: 19.330, avg true_objective: 9.330 [2023-02-24 21:51:37,636][01818] Num frames 1000... [2023-02-24 21:51:37,751][01818] Num frames 1100... [2023-02-24 21:51:37,867][01818] Num frames 1200... [2023-02-24 21:51:37,991][01818] Num frames 1300... [2023-02-24 21:51:38,099][01818] Num frames 1400... [2023-02-24 21:51:38,218][01818] Num frames 1500... [2023-02-24 21:51:38,329][01818] Num frames 1600... [2023-02-24 21:51:38,439][01818] Num frames 1700... [2023-02-24 21:51:38,552][01818] Num frames 1800... [2023-02-24 21:51:38,665][01818] Num frames 1900... [2023-02-24 21:51:38,749][01818] Avg episode rewards: #0: 20.125, true rewards: #0: 9.625 [2023-02-24 21:51:38,751][01818] Avg episode reward: 20.125, avg true_objective: 9.625 [2023-02-24 21:51:38,836][01818] Num frames 2000... [2023-02-24 21:51:38,958][01818] Num frames 2100... [2023-02-24 21:51:39,086][01818] Num frames 2200... [2023-02-24 21:51:39,217][01818] Num frames 2300... [2023-02-24 21:51:39,334][01818] Num frames 2400... [2023-02-24 21:51:39,452][01818] Num frames 2500... [2023-02-24 21:51:39,566][01818] Num frames 2600... [2023-02-24 21:51:39,675][01818] Num frames 2700... [2023-02-24 21:51:39,787][01818] Num frames 2800... [2023-02-24 21:51:39,934][01818] Avg episode rewards: #0: 20.617, true rewards: #0: 9.617 [2023-02-24 21:51:39,937][01818] Avg episode reward: 20.617, avg true_objective: 9.617 [2023-02-24 21:51:39,956][01818] Num frames 2900... [2023-02-24 21:51:40,072][01818] Num frames 3000... [2023-02-24 21:51:40,187][01818] Num frames 3100... 
[2023-02-24 21:51:40,334][01818] Avg episode rewards: #0: 16.683, true rewards: #0: 7.932 [2023-02-24 21:51:40,336][01818] Avg episode reward: 16.683, avg true_objective: 7.932 [2023-02-24 21:51:40,371][01818] Num frames 3200... [2023-02-24 21:51:40,492][01818] Num frames 3300... [2023-02-24 21:51:40,615][01818] Num frames 3400... [2023-02-24 21:51:40,733][01818] Num frames 3500... [2023-02-24 21:51:40,852][01818] Num frames 3600... [2023-02-24 21:51:40,969][01818] Num frames 3700... [2023-02-24 21:51:41,091][01818] Num frames 3800... [2023-02-24 21:51:41,215][01818] Num frames 3900... [2023-02-24 21:51:41,340][01818] Num frames 4000... [2023-02-24 21:51:41,453][01818] Num frames 4100... [2023-02-24 21:51:41,580][01818] Avg episode rewards: #0: 17.930, true rewards: #0: 8.330 [2023-02-24 21:51:41,583][01818] Avg episode reward: 17.930, avg true_objective: 8.330 [2023-02-24 21:51:41,627][01818] Num frames 4200... [2023-02-24 21:51:41,745][01818] Num frames 4300... [2023-02-24 21:51:41,863][01818] Num frames 4400... [2023-02-24 21:51:41,985][01818] Num frames 4500... [2023-02-24 21:51:42,100][01818] Num frames 4600... [2023-02-24 21:51:42,214][01818] Num frames 4700... [2023-02-24 21:51:42,282][01818] Avg episode rewards: #0: 16.682, true rewards: #0: 7.848 [2023-02-24 21:51:42,283][01818] Avg episode reward: 16.682, avg true_objective: 7.848 [2023-02-24 21:51:42,391][01818] Num frames 4800... [2023-02-24 21:51:42,505][01818] Num frames 4900... [2023-02-24 21:51:42,617][01818] Num frames 5000... [2023-02-24 21:51:42,725][01818] Num frames 5100... [2023-02-24 21:51:42,834][01818] Num frames 5200... [2023-02-24 21:51:42,944][01818] Num frames 5300... [2023-02-24 21:51:43,053][01818] Num frames 5400... [2023-02-24 21:51:43,173][01818] Num frames 5500... [2023-02-24 21:51:43,286][01818] Num frames 5600... [2023-02-24 21:51:43,399][01818] Num frames 5700... [2023-02-24 21:51:43,510][01818] Num frames 5800... [2023-02-24 21:51:43,636][01818] Num frames 5900... 
[2023-02-24 21:51:43,831][01818] Num frames 6000... [2023-02-24 21:51:43,968][01818] Avg episode rewards: #0: 19.247, true rewards: #0: 8.676 [2023-02-24 21:51:43,970][01818] Avg episode reward: 19.247, avg true_objective: 8.676 [2023-02-24 21:51:44,006][01818] Num frames 6100... [2023-02-24 21:51:44,118][01818] Num frames 6200... [2023-02-24 21:51:44,226][01818] Num frames 6300... [2023-02-24 21:51:44,348][01818] Num frames 6400... [2023-02-24 21:51:44,457][01818] Num frames 6500... [2023-02-24 21:51:44,567][01818] Num frames 6600... [2023-02-24 21:51:44,675][01818] Num frames 6700... [2023-02-24 21:51:44,746][01818] Avg episode rewards: #0: 18.266, true rewards: #0: 8.391 [2023-02-24 21:51:44,748][01818] Avg episode reward: 18.266, avg true_objective: 8.391 [2023-02-24 21:51:44,844][01818] Num frames 6800... [2023-02-24 21:51:44,955][01818] Num frames 6900... [2023-02-24 21:51:45,066][01818] Num frames 7000... [2023-02-24 21:51:45,174][01818] Num frames 7100... [2023-02-24 21:51:45,292][01818] Num frames 7200... [2023-02-24 21:51:45,380][01818] Avg episode rewards: #0: 17.250, true rewards: #0: 8.028 [2023-02-24 21:51:45,382][01818] Avg episode reward: 17.250, avg true_objective: 8.028 [2023-02-24 21:51:45,466][01818] Num frames 7300... [2023-02-24 21:51:45,578][01818] Num frames 7400... [2023-02-24 21:51:45,687][01818] Num frames 7500... [2023-02-24 21:51:45,802][01818] Num frames 7600... [2023-02-24 21:51:45,910][01818] Num frames 7700... [2023-02-24 21:51:46,020][01818] Num frames 7800... [2023-02-24 21:51:46,133][01818] Num frames 7900... [2023-02-24 21:51:46,244][01818] Num frames 8000... [2023-02-24 21:51:46,361][01818] Num frames 8100... [2023-02-24 21:51:46,472][01818] Num frames 8200... [2023-02-24 21:51:46,584][01818] Num frames 8300... [2023-02-24 21:51:46,694][01818] Num frames 8400... 
[2023-02-24 21:51:46,805][01818] Avg episode rewards: #0: 18.652, true rewards: #0: 8.452 [2023-02-24 21:51:46,807][01818] Avg episode reward: 18.652, avg true_objective: 8.452 [2023-02-24 21:52:39,632][01818] Replay video saved to /content/train_dir/default_experiment/replay.mp4!