[2024-08-05 08:25:46,188][00332] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-05 08:25:46,191][00332] Rollout worker 0 uses device cpu
[2024-08-05 08:25:46,193][00332] Rollout worker 1 uses device cpu
[2024-08-05 08:25:46,194][00332] Rollout worker 2 uses device cpu
[2024-08-05 08:25:46,195][00332] Rollout worker 3 uses device cpu
[2024-08-05 08:25:46,197][00332] Rollout worker 4 uses device cpu
[2024-08-05 08:25:46,198][00332] Rollout worker 5 uses device cpu
[2024-08-05 08:25:46,200][00332] Rollout worker 6 uses device cpu
[2024-08-05 08:25:46,201][00332] Rollout worker 7 uses device cpu
[2024-08-05 08:25:46,346][00332] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-05 08:25:46,347][00332] InferenceWorker_p0-w0: min num requests: 2
[2024-08-05 08:25:46,380][00332] Starting all processes...
[2024-08-05 08:25:46,381][00332] Starting process learner_proc0
[2024-08-05 08:25:46,430][00332] Starting all processes...
[2024-08-05 08:25:46,439][00332] Starting process inference_proc0-0
[2024-08-05 08:25:46,439][00332] Starting process rollout_proc0
[2024-08-05 08:25:46,441][00332] Starting process rollout_proc1
[2024-08-05 08:25:46,441][00332] Starting process rollout_proc2
[2024-08-05 08:25:46,441][00332] Starting process rollout_proc3
[2024-08-05 08:25:46,441][00332] Starting process rollout_proc4
[2024-08-05 08:25:46,441][00332] Starting process rollout_proc5
[2024-08-05 08:25:46,441][00332] Starting process rollout_proc6
[2024-08-05 08:25:46,441][00332] Starting process rollout_proc7
[2024-08-05 08:25:57,320][04095] Worker 6 uses CPU cores [0]
[2024-08-05 08:25:57,514][04089] Worker 0 uses CPU cores [0]
[2024-08-05 08:25:57,593][04088] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-05 08:25:57,594][04088] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-05 08:25:57,595][04075] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-05 08:25:57,599][04075] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-05 08:25:57,644][04075] Num visible devices: 1
[2024-08-05 08:25:57,660][04088] Num visible devices: 1
[2024-08-05 08:25:57,682][04093] Worker 4 uses CPU cores [0]
[2024-08-05 08:25:57,690][04075] Starting seed is not provided
[2024-08-05 08:25:57,691][04075] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-05 08:25:57,692][04075] Initializing actor-critic model on device cuda:0
[2024-08-05 08:25:57,693][04075] RunningMeanStd input shape: (3, 72, 128)
[2024-08-05 08:25:57,694][04075] RunningMeanStd input shape: (1,)
[2024-08-05 08:25:57,717][04075] ConvEncoder: input_channels=3
[2024-08-05 08:25:57,735][04090] Worker 1 uses CPU cores [1]
[2024-08-05 08:25:57,759][04091] Worker 2 uses CPU cores [0]
[2024-08-05 08:25:57,772][04094] Worker 5 uses CPU cores [1]
[2024-08-05 08:25:57,839][04096] Worker 7 uses CPU cores [1]
[2024-08-05 08:25:57,843][04092] Worker 3 uses CPU cores [1]
[2024-08-05 08:25:57,942][04075] Conv encoder output size: 512
[2024-08-05 08:25:57,942][04075] Policy head output size: 512
[2024-08-05 08:25:57,958][04075] Created Actor Critic model with architecture:
[2024-08-05 08:25:57,958][04075] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-08-05 08:26:01,724][04075] Using optimizer
[2024-08-05 08:26:01,726][04075] No checkpoints found
[2024-08-05 08:26:01,727][04075] Did not load from checkpoint, starting from scratch!
[2024-08-05 08:26:01,727][04075] Initialized policy 0 weights for model version 0
[2024-08-05 08:26:01,732][04075] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-05 08:26:01,742][04075] LearnerWorker_p0 finished initialization!
[2024-08-05 08:26:01,840][04088] RunningMeanStd input shape: (3, 72, 128)
[2024-08-05 08:26:01,842][04088] RunningMeanStd input shape: (1,)
[2024-08-05 08:26:01,858][04088] ConvEncoder: input_channels=3
[2024-08-05 08:26:01,966][04088] Conv encoder output size: 512
[2024-08-05 08:26:01,966][04088] Policy head output size: 512
[2024-08-05 08:26:02,697][00332] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-05 08:26:03,484][00332] Inference worker 0-0 is ready!
[2024-08-05 08:26:03,486][00332] All inference workers are ready! Signal rollout workers to start!
[2024-08-05 08:26:03,602][04093] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:03,603][04091] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:03,635][04089] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:03,642][04095] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:03,692][04094] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:03,703][04096] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:03,702][04092] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:03,725][04090] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-05 08:26:04,832][04092] Decorrelating experience for 0 frames...
[2024-08-05 08:26:04,830][04096] Decorrelating experience for 0 frames...
[2024-08-05 08:26:05,088][04095] Decorrelating experience for 0 frames...
[2024-08-05 08:26:05,092][04091] Decorrelating experience for 0 frames...
[2024-08-05 08:26:05,099][04093] Decorrelating experience for 0 frames...
[2024-08-05 08:26:05,101][04089] Decorrelating experience for 0 frames...
[2024-08-05 08:26:05,398][04092] Decorrelating experience for 32 frames...
[2024-08-05 08:26:06,074][04090] Decorrelating experience for 0 frames...
[2024-08-05 08:26:06,099][04094] Decorrelating experience for 0 frames...
[2024-08-05 08:26:06,224][04095] Decorrelating experience for 32 frames...
[2024-08-05 08:26:06,250][04091] Decorrelating experience for 32 frames...
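The model summary printed above fixes the data flow even though the TorchScript repr hides layer sizes. Below is a minimal plain-PyTorch sketch of the same pipeline; only the (3, 72, 128) observation shape, the 512-wide encoder and GRU(512, 512) core, and the 1-dim value / 5-way action heads come from the log, while the conv kernels, strides and channel counts are assumptions (the observation and returns normalizers are omitted):

```python
import torch
from torch import nn

class ConvEncoder(nn.Module):
    def __init__(self, input_channels=3, obs_shape=(3, 72, 128), out_size=512):
        super().__init__()
        # Three Conv2d+ELU stages, mirroring the printed conv_head; exact
        # kernels/strides are assumed, not shown in the log.
        self.conv_head = nn.Sequential(
            nn.Conv2d(input_channels, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened size from a dummy observation
            n = self.conv_head(torch.zeros(1, *obs_shape)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU projecting to the 512-d encoder output.
        self.mlp_layers = nn.Sequential(nn.Linear(n, out_size), nn.ELU())

    def forward(self, obs):
        return self.mlp_layers(self.conv_head(obs).flatten(1))

class ActorCriticSharedWeights(nn.Module):
    def __init__(self, num_actions=5, core_size=512):
        super().__init__()
        self.encoder = ConvEncoder()
        self.core = nn.GRU(core_size, core_size)               # ModelCoreRNN in the log
        self.critic_linear = nn.Linear(core_size, 1)           # value head
        self.distribution_linear = nn.Linear(core_size, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        x = self.encoder(obs).unsqueeze(0)                     # (seq=1, batch, 512)
        x, rnn_state = self.core(x, rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = ActorCriticSharedWeights()
obs = torch.zeros(4, 3, 72, 128)                               # batch of 4 resized Doom frames
logits, value, h = model(obs, torch.zeros(1, 4, 512))
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```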
[2024-08-05 08:26:06,339][00332] Heartbeat connected on Batcher_0
[2024-08-05 08:26:06,342][00332] Heartbeat connected on LearnerWorker_p0
[2024-08-05 08:26:06,390][00332] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-05 08:26:07,011][04089] Decorrelating experience for 32 frames...
[2024-08-05 08:26:07,697][00332] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-05 08:26:08,017][04094] Decorrelating experience for 32 frames...
[2024-08-05 08:26:08,030][04096] Decorrelating experience for 32 frames...
[2024-08-05 08:26:08,127][04093] Decorrelating experience for 32 frames...
[2024-08-05 08:26:08,300][04090] Decorrelating experience for 32 frames...
[2024-08-05 08:26:08,356][04091] Decorrelating experience for 64 frames...
[2024-08-05 08:26:08,360][04095] Decorrelating experience for 64 frames...
[2024-08-05 08:26:09,496][04092] Decorrelating experience for 64 frames...
[2024-08-05 08:26:09,686][04089] Decorrelating experience for 64 frames...
[2024-08-05 08:26:09,756][04096] Decorrelating experience for 64 frames...
[2024-08-05 08:26:09,995][04091] Decorrelating experience for 96 frames...
[2024-08-05 08:26:09,998][04095] Decorrelating experience for 96 frames...
[2024-08-05 08:26:10,228][00332] Heartbeat connected on RolloutWorker_w6
[2024-08-05 08:26:10,236][00332] Heartbeat connected on RolloutWorker_w2
[2024-08-05 08:26:11,355][04094] Decorrelating experience for 64 frames...
[2024-08-05 08:26:11,530][04090] Decorrelating experience for 64 frames...
[2024-08-05 08:26:11,708][04096] Decorrelating experience for 96 frames...
[2024-08-05 08:26:12,120][00332] Heartbeat connected on RolloutWorker_w7
[2024-08-05 08:26:12,137][04093] Decorrelating experience for 64 frames...
[2024-08-05 08:26:12,697][00332] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-05 08:26:12,753][04089] Decorrelating experience for 96 frames...
[2024-08-05 08:26:12,874][00332] Heartbeat connected on RolloutWorker_w0
[2024-08-05 08:26:13,165][04094] Decorrelating experience for 96 frames...
[2024-08-05 08:26:13,171][04092] Decorrelating experience for 96 frames...
[2024-08-05 08:26:13,352][04090] Decorrelating experience for 96 frames...
[2024-08-05 08:26:13,486][00332] Heartbeat connected on RolloutWorker_w5
[2024-08-05 08:26:13,497][00332] Heartbeat connected on RolloutWorker_w3
[2024-08-05 08:26:13,776][00332] Heartbeat connected on RolloutWorker_w1
[2024-08-05 08:26:16,171][04093] Decorrelating experience for 96 frames...
[2024-08-05 08:26:16,760][00332] Heartbeat connected on RolloutWorker_w4
[2024-08-05 08:26:17,074][04075] Signal inference workers to stop experience collection...
[2024-08-05 08:26:17,091][04088] InferenceWorker_p0-w0: stopping experience collection
[2024-08-05 08:26:17,697][00332] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 106.3. Samples: 1594. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-05 08:26:17,702][00332] Avg episode reward: [(0, '2.463')]
[2024-08-05 08:26:19,383][04075] Signal inference workers to resume experience collection...
[2024-08-05 08:26:19,383][04088] InferenceWorker_p0-w0: resuming experience collection
[2024-08-05 08:26:22,698][00332] Fps is (10 sec: 1638.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 194.9. Samples: 3898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0)
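The "Decorrelating experience for N frames..." messages show each rollout worker stepping its environments through a warm-up phase, reported in 32-frame increments, before regular collection begins, so the eight workers' episodes start out of phase rather than in lockstep. A sketch of that idea under assumed names (the env follows the Gymnasium step/reset API; this is an illustration, not Sample Factory's actual decorrelation code):

```python
def decorrelate(env, worker_idx, frames=96, report_every=32):
    # Step the env with random actions before real collection so that workers'
    # trajectories start out of phase; progress is reported every 32 frames,
    # matching the 0/32/64/96 messages in the log above.
    obs, _ = env.reset(seed=worker_idx)
    for frame in range(frames):
        if frame % report_every == 0:
            print(f"Decorrelating experience for {frame} frames...")
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, _ = env.reset()
    print(f"Decorrelating experience for {frames} frames...")
    return obs
```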
[2024-08-05 08:26:22,709][00332] Avg episode reward: [(0, '3.351')]
[2024-08-05 08:26:27,697][00332] Fps is (10 sec: 3276.8, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 339.4. Samples: 8484. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:26:27,706][00332] Avg episode reward: [(0, '3.870')]
[2024-08-05 08:26:29,904][04088] Updated weights for policy 0, policy_version 10 (0.0019)
[2024-08-05 08:26:32,697][00332] Fps is (10 sec: 3686.7, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 53248. Throughput: 0: 377.7. Samples: 11332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:26:32,700][00332] Avg episode reward: [(0, '4.395')]
[2024-08-05 08:26:37,698][00332] Fps is (10 sec: 4505.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 526.9. Samples: 18440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:26:37,704][00332] Avg episode reward: [(0, '4.328')]
[2024-08-05 08:26:38,644][04088] Updated weights for policy 0, policy_version 20 (0.0021)
[2024-08-05 08:26:42,697][00332] Fps is (10 sec: 4096.1, 60 sec: 2355.2, 300 sec: 2355.2). Total num frames: 94208. Throughput: 0: 599.1. Samples: 23964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:26:42,700][00332] Avg episode reward: [(0, '4.332')]
[2024-08-05 08:26:47,697][00332] Fps is (10 sec: 3276.9, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 110592. Throughput: 0: 583.5. Samples: 26258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:26:47,705][00332] Avg episode reward: [(0, '4.295')]
[2024-08-05 08:26:47,708][04075] Saving new best policy, reward=4.295!
[2024-08-05 08:26:50,247][04088] Updated weights for policy 0, policy_version 30 (0.0032)
[2024-08-05 08:26:52,697][00332] Fps is (10 sec: 3686.4, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 131072. Throughput: 0: 729.6. Samples: 32834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:26:52,706][00332] Avg episode reward: [(0, '4.409')]
[2024-08-05 08:26:52,780][04075] Saving new best policy, reward=4.409!
[2024-08-05 08:26:57,698][00332] Fps is (10 sec: 4095.7, 60 sec: 2755.4, 300 sec: 2755.4). Total num frames: 151552. Throughput: 0: 876.2. Samples: 39428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:26:57,700][00332] Avg episode reward: [(0, '4.511')]
[2024-08-05 08:26:57,802][04075] Saving new best policy, reward=4.511!
[2024-08-05 08:27:00,349][04088] Updated weights for policy 0, policy_version 40 (0.0032)
[2024-08-05 08:27:02,701][00332] Fps is (10 sec: 3684.9, 60 sec: 2798.8, 300 sec: 2798.8). Total num frames: 167936. Throughput: 0: 886.5. Samples: 41492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:27:02,706][00332] Avg episode reward: [(0, '4.580')]
[2024-08-05 08:27:02,718][04075] Saving new best policy, reward=4.580!
[2024-08-05 08:27:07,697][00332] Fps is (10 sec: 4096.4, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 965.0. Samples: 47322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:27:07,703][00332] Avg episode reward: [(0, '4.430')]
[2024-08-05 08:27:10,198][04088] Updated weights for policy 0, policy_version 50 (0.0023)
[2024-08-05 08:27:12,697][00332] Fps is (10 sec: 4507.4, 60 sec: 3549.9, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 1021.0. Samples: 54428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
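Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports frame throughput averaged over three trailing windows; the earliest reports are nan because the windows are still empty. A sketch of one way to produce such numbers from (timestamp, total_frames) samples (an illustration, not the framework's exact implementation):

```python
import time
from collections import deque

class FpsMeter:
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.history = deque()  # chronological (timestamp, total_frames) samples

    def record(self, total_frames):
        now = time.time()
        self.history.append((now, total_frames))
        # Keep just enough history for the longest window.
        while now - self.history[0][0] > max(self.windows):
            self.history.popleft()

    def fps(self):
        now, latest = self.history[-1]
        out = {}
        for w in self.windows:
            # Oldest sample still inside this trailing window.
            past = next(((t, f) for t, f in self.history if now - t <= w), None)
            if past is None or now == past[0]:
                out[w] = float("nan")  # window empty: matches the first nan reports
            else:
                out[w] = (latest - past[1]) / (now - past[0])
        return out
```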
[2024-08-05 08:27:12,704][00332] Avg episode reward: [(0, '4.261')]
[2024-08-05 08:27:17,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3058.4). Total num frames: 229376. Throughput: 0: 1015.2. Samples: 57018. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:27:17,700][00332] Avg episode reward: [(0, '4.316')]
[2024-08-05 08:27:21,640][04088] Updated weights for policy 0, policy_version 60 (0.0014)
[2024-08-05 08:27:22,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 964.5. Samples: 61844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:27:22,699][00332] Avg episode reward: [(0, '4.376')]
[2024-08-05 08:27:27,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 997.9. Samples: 68868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:27:27,700][00332] Avg episode reward: [(0, '4.557')]
[2024-08-05 08:27:30,373][04088] Updated weights for policy 0, policy_version 70 (0.0014)
[2024-08-05 08:27:32,701][00332] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3231.1). Total num frames: 290816. Throughput: 0: 1026.0. Samples: 72430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-05 08:27:32,709][00332] Avg episode reward: [(0, '4.452')]
[2024-08-05 08:27:37,702][00332] Fps is (10 sec: 3684.7, 60 sec: 3822.7, 300 sec: 3233.5). Total num frames: 307200. Throughput: 0: 982.5. Samples: 77050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:27:37,704][00332] Avg episode reward: [(0, '4.423')]
[2024-08-05 08:27:41,763][04088] Updated weights for policy 0, policy_version 80 (0.0037)
[2024-08-05 08:27:42,697][00332] Fps is (10 sec: 4097.7, 60 sec: 3959.5, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 980.1. Samples: 83532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:27:42,702][00332] Avg episode reward: [(0, '4.532')]
[2024-08-05 08:27:42,714][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000081_331776.pth...
[2024-08-05 08:27:47,697][00332] Fps is (10 sec: 4507.7, 60 sec: 4027.7, 300 sec: 3354.8). Total num frames: 352256. Throughput: 0: 1010.2. Samples: 86948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:27:47,700][00332] Avg episode reward: [(0, '4.367')]
[2024-08-05 08:27:52,054][04088] Updated weights for policy 0, policy_version 90 (0.0018)
[2024-08-05 08:27:52,704][00332] Fps is (10 sec: 3683.8, 60 sec: 3959.0, 300 sec: 3351.1). Total num frames: 368640. Throughput: 0: 1000.0. Samples: 92328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:27:52,707][00332] Avg episode reward: [(0, '4.484')]
[2024-08-05 08:27:57,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 966.8. Samples: 97932. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:27:57,702][00332] Avg episode reward: [(0, '4.797')]
[2024-08-05 08:27:57,707][04075] Saving new best policy, reward=4.797!
[2024-08-05 08:28:01,914][04088] Updated weights for policy 0, policy_version 100 (0.0032)
[2024-08-05 08:28:02,697][00332] Fps is (10 sec: 4098.9, 60 sec: 4028.0, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 986.8. Samples: 101424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:28:02,702][00332] Avg episode reward: [(0, '4.616')]
[2024-08-05 08:28:07,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 1023.6. Samples: 107904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
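"Policy #0 lag" measures how stale the experience in the learner's current batch is: the difference between the learner's newest policy version (the one in the "Updated weights for policy 0, policy_version N" messages) and the version that generated each sample. A minimal sketch; the -1.0 triple matches what the log reports before any samples exist:

```python
def policy_lag(current_version, batch_versions):
    # batch_versions: policy version that produced each trajectory in the batch.
    if not batch_versions:
        return {"min": -1.0, "avg": -1.0, "max": -1.0}  # no samples yet
    lags = [current_version - v for v in batch_versions]
    return {"min": min(lags), "avg": sum(lags) / len(lags), "max": max(lags)}

print(policy_lag(40, [40, 40, 39, 38]))  # {'min': 0, 'avg': 0.75, 'max': 2}
```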
[2024-08-05 08:28:07,699][00332] Avg episode reward: [(0, '4.658')]
[2024-08-05 08:28:12,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3434.3). Total num frames: 446464. Throughput: 0: 971.0. Samples: 112564. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:28:12,704][00332] Avg episode reward: [(0, '4.747')]
[2024-08-05 08:28:12,970][04088] Updated weights for policy 0, policy_version 110 (0.0031)
[2024-08-05 08:28:17,697][00332] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3489.2). Total num frames: 471040. Throughput: 0: 971.3. Samples: 116134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:28:17,700][00332] Avg episode reward: [(0, '4.821')]
[2024-08-05 08:28:17,705][04075] Saving new best policy, reward=4.821!
[2024-08-05 08:28:21,696][04088] Updated weights for policy 0, policy_version 120 (0.0019)
[2024-08-05 08:28:22,697][00332] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3510.9). Total num frames: 491520. Throughput: 0: 1022.5. Samples: 123060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:28:22,702][00332] Avg episode reward: [(0, '4.706')]
[2024-08-05 08:28:27,699][00332] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 3502.7). Total num frames: 507904. Throughput: 0: 982.4. Samples: 127740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:28:27,703][00332] Avg episode reward: [(0, '4.715')]
[2024-08-05 08:28:32,697][00332] Fps is (10 sec: 3686.5, 60 sec: 3959.7, 300 sec: 3522.6). Total num frames: 528384. Throughput: 0: 969.4. Samples: 130570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:28:32,699][00332] Avg episode reward: [(0, '4.749')]
[2024-08-05 08:28:33,186][04088] Updated weights for policy 0, policy_version 130 (0.0025)
[2024-08-05 08:28:37,697][00332] Fps is (10 sec: 4506.5, 60 sec: 4096.3, 300 sec: 3567.5). Total num frames: 552960. Throughput: 0: 1010.6. Samples: 137798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:28:37,701][00332] Avg episode reward: [(0, '4.403')]
[2024-08-05 08:28:42,698][00332] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3558.4). Total num frames: 569344. Throughput: 0: 1008.6. Samples: 143320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:28:42,703][00332] Avg episode reward: [(0, '4.554')]
[2024-08-05 08:28:43,049][04088] Updated weights for policy 0, policy_version 140 (0.0019)
[2024-08-05 08:28:47,698][00332] Fps is (10 sec: 3276.5, 60 sec: 3891.1, 300 sec: 3549.8). Total num frames: 585728. Throughput: 0: 980.0. Samples: 145524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:28:47,706][00332] Avg episode reward: [(0, '4.700')]
[2024-08-05 08:28:52,697][00332] Fps is (10 sec: 4096.1, 60 sec: 4028.2, 300 sec: 3590.0). Total num frames: 610304. Throughput: 0: 984.3. Samples: 152196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:28:52,701][00332] Avg episode reward: [(0, '4.654')]
[2024-08-05 08:28:53,065][04088] Updated weights for policy 0, policy_version 150 (0.0026)
[2024-08-05 08:28:57,697][00332] Fps is (10 sec: 4506.1, 60 sec: 4027.7, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 1024.3. Samples: 158656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:28:57,701][00332] Avg episode reward: [(0, '4.906')]
[2024-08-05 08:28:57,712][04075] Saving new best policy, reward=4.906!
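"Saving new best policy, reward=X!" fires whenever the average episode reward exceeds the best value seen so far, i.e. simple high-watermark tracking on the learner. A sketch of that logic; the file name and save mechanism here are purely illustrative:

```python
import torch

class BestPolicyTracker:
    def __init__(self):
        self.best_reward = float("-inf")

    def maybe_save(self, avg_reward, model, path="best_policy.pth"):
        # Save only when the running average episode reward improves.
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            torch.save(model.state_dict(), path)
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
```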
[2024-08-05 08:29:02,698][00332] Fps is (10 sec: 3276.6, 60 sec: 3891.2, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 992.9. Samples: 160814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:29:02,706][00332] Avg episode reward: [(0, '4.975')]
[2024-08-05 08:29:02,721][04075] Saving new best policy, reward=4.975!
[2024-08-05 08:29:04,661][04088] Updated weights for policy 0, policy_version 160 (0.0015)
[2024-08-05 08:29:07,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3608.9). Total num frames: 667648. Throughput: 0: 965.2. Samples: 166496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:29:07,699][00332] Avg episode reward: [(0, '4.988')]
[2024-08-05 08:29:07,707][04075] Saving new best policy, reward=4.988!
[2024-08-05 08:29:12,697][00332] Fps is (10 sec: 4915.6, 60 sec: 4096.0, 300 sec: 3643.3). Total num frames: 692224. Throughput: 0: 1018.9. Samples: 173590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:29:12,699][00332] Avg episode reward: [(0, '4.917')]
[2024-08-05 08:29:13,770][04088] Updated weights for policy 0, policy_version 170 (0.0025)
[2024-08-05 08:29:17,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3612.9). Total num frames: 704512. Throughput: 0: 1012.1. Samples: 176116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:29:17,701][00332] Avg episode reward: [(0, '4.893')]
[2024-08-05 08:29:22,697][00332] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3625.0). Total num frames: 724992. Throughput: 0: 959.1. Samples: 180956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-05 08:29:22,703][00332] Avg episode reward: [(0, '4.733')]
[2024-08-05 08:29:24,953][04088] Updated weights for policy 0, policy_version 180 (0.0021)
[2024-08-05 08:29:27,697][00332] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3656.4). Total num frames: 749568. Throughput: 0: 993.5. Samples: 188026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:29:27,703][00332] Avg episode reward: [(0, '4.723')]
[2024-08-05 08:29:32,697][00332] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3666.9). Total num frames: 770048. Throughput: 0: 1024.8. Samples: 191638. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:29:32,705][00332] Avg episode reward: [(0, '4.898')]
[2024-08-05 08:29:35,182][04088] Updated weights for policy 0, policy_version 190 (0.0017)
[2024-08-05 08:29:37,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3638.8). Total num frames: 782336. Throughput: 0: 973.7. Samples: 196012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:29:37,699][00332] Avg episode reward: [(0, '5.105')]
[2024-08-05 08:29:37,772][04075] Saving new best policy, reward=5.105!
[2024-08-05 08:29:42,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3667.8). Total num frames: 806912. Throughput: 0: 975.0. Samples: 202532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:29:42,700][00332] Avg episode reward: [(0, '5.128')]
[2024-08-05 08:29:42,711][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000197_806912.pth...
[2024-08-05 08:29:42,833][04075] Saving new best policy, reward=5.128!
[2024-08-05 08:29:45,082][04088] Updated weights for policy 0, policy_version 200 (0.0017)
[2024-08-05 08:29:47,697][00332] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3677.3). Total num frames: 827392. Throughput: 0: 1001.7. Samples: 205888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
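The "Updated weights for policy 0, policy_version N (0.00xx)" lines come from the inference worker refreshing its copy of the policy after the learner publishes a new version; the trailing number is presumably the time the refresh took, in seconds. A sketch of that polling pattern under assumed names (not Sample Factory's actual API):

```python
import time

def maybe_update_weights(local_version, shared_version, local_policy, shared_state_dict):
    # Refresh the inference worker's policy copy when the learner's published
    # version is newer than the local one; report the copy duration.
    if shared_version > local_version:
        start = time.time()
        local_policy.load_state_dict(shared_state_dict)
        elapsed = time.time() - start
        print(f"Updated weights for policy 0, policy_version {shared_version} ({elapsed:.4f})")
        return shared_version
    return local_version
```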
[2024-08-05 08:29:47,704][00332] Avg episode reward: [(0, '5.223')]
[2024-08-05 08:29:47,799][04075] Saving new best policy, reward=5.223!
[2024-08-05 08:29:52,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3668.6). Total num frames: 843776. Throughput: 0: 992.6. Samples: 211164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:29:52,704][00332] Avg episode reward: [(0, '5.295')]
[2024-08-05 08:29:52,718][04075] Saving new best policy, reward=5.295!
[2024-08-05 08:29:56,653][04088] Updated weights for policy 0, policy_version 210 (0.0022)
[2024-08-05 08:29:57,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3660.3). Total num frames: 860160. Throughput: 0: 951.5. Samples: 216406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:29:57,701][00332] Avg episode reward: [(0, '5.272')]
[2024-08-05 08:30:02,697][00332] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3686.4). Total num frames: 884736. Throughput: 0: 971.8. Samples: 219846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:30:02,701][00332] Avg episode reward: [(0, '5.480')]
[2024-08-05 08:30:02,713][04075] Saving new best policy, reward=5.480!
[2024-08-05 08:30:05,967][04088] Updated weights for policy 0, policy_version 220 (0.0012)
[2024-08-05 08:30:07,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3694.8). Total num frames: 905216. Throughput: 0: 1004.7. Samples: 226166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:30:07,702][00332] Avg episode reward: [(0, '5.373')]
[2024-08-05 08:30:12,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 935.8. Samples: 230138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:30:12,700][00332] Avg episode reward: [(0, '5.109')]
[2024-08-05 08:30:17,697][00332] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3678.4). Total num frames: 937984. Throughput: 0: 926.2. Samples: 233318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:30:17,700][00332] Avg episode reward: [(0, '5.340')]
[2024-08-05 08:30:18,151][04088] Updated weights for policy 0, policy_version 230 (0.0029)
[2024-08-05 08:30:22,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 958464. Throughput: 0: 971.6. Samples: 239734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:30:22,699][00332] Avg episode reward: [(0, '5.529')]
[2024-08-05 08:30:22,713][04075] Saving new best policy, reward=5.529!
[2024-08-05 08:30:27,699][00332] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3678.6). Total num frames: 974848. Throughput: 0: 928.7. Samples: 244324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:30:27,705][00332] Avg episode reward: [(0, '5.616')]
[2024-08-05 08:30:27,710][04075] Saving new best policy, reward=5.616!
[2024-08-05 08:30:30,748][04088] Updated weights for policy 0, policy_version 240 (0.0028)
[2024-08-05 08:30:32,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3671.2). Total num frames: 991232. Throughput: 0: 896.3. Samples: 246222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:30:32,701][00332] Avg episode reward: [(0, '5.807')]
[2024-08-05 08:30:32,710][04075] Saving new best policy, reward=5.807!
[2024-08-05 08:30:37,698][00332] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3678.9). Total num frames: 1011712. Throughput: 0: 923.5. Samples: 252722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:30:37,701][00332] Avg episode reward: [(0, '5.888')]
[2024-08-05 08:30:37,706][04075] Saving new best policy, reward=5.888!
[2024-08-05 08:30:40,138][04088] Updated weights for policy 0, policy_version 250 (0.0021)
[2024-08-05 08:30:42,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3671.8). Total num frames: 1028096. Throughput: 0: 936.2. Samples: 258536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:30:42,705][00332] Avg episode reward: [(0, '6.046')]
[2024-08-05 08:30:42,806][04075] Saving new best policy, reward=6.046!
[2024-08-05 08:30:47,698][00332] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3664.8). Total num frames: 1044480. Throughput: 0: 901.0. Samples: 260394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:30:47,701][00332] Avg episode reward: [(0, '6.510')]
[2024-08-05 08:30:47,703][04075] Saving new best policy, reward=6.510!
[2024-08-05 08:30:52,510][04088] Updated weights for policy 0, policy_version 260 (0.0018)
[2024-08-05 08:30:52,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3672.3). Total num frames: 1064960. Throughput: 0: 876.9. Samples: 265626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:30:52,705][00332] Avg episode reward: [(0, '7.304')]
[2024-08-05 08:30:52,719][04075] Saving new best policy, reward=7.304!
[2024-08-05 08:30:57,697][00332] Fps is (10 sec: 4096.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 933.4. Samples: 272140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:30:57,699][00332] Avg episode reward: [(0, '7.578')]
[2024-08-05 08:30:57,708][04075] Saving new best policy, reward=7.578!
[2024-08-05 08:31:02,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 913.1. Samples: 274406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:31:02,703][00332] Avg episode reward: [(0, '8.368')]
[2024-08-05 08:31:02,712][04075] Saving new best policy, reward=8.368!
[2024-08-05 08:31:04,383][04088] Updated weights for policy 0, policy_version 270 (0.0024)
[2024-08-05 08:31:07,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 870.8. Samples: 278920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:31:07,699][00332] Avg episode reward: [(0, '8.107')]
[2024-08-05 08:31:12,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1138688. Throughput: 0: 916.8. Samples: 285580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:31:12,700][00332] Avg episode reward: [(0, '8.367')]
[2024-08-05 08:31:13,837][04088] Updated weights for policy 0, policy_version 280 (0.0019)
[2024-08-05 08:31:17,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3873.9). Total num frames: 1159168. Throughput: 0: 949.8. Samples: 288964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:31:17,702][00332] Avg episode reward: [(0, '7.747')]
[2024-08-05 08:31:22,700][00332] Fps is (10 sec: 3275.8, 60 sec: 3549.7, 300 sec: 3859.9). Total num frames: 1171456. Throughput: 0: 899.5. Samples: 293200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:31:22,710][00332] Avg episode reward: [(0, '7.525')]
[2024-08-05 08:31:25,665][04088] Updated weights for policy 0, policy_version 290 (0.0036)
[2024-08-05 08:31:27,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3873.8). Total num frames: 1196032. Throughput: 0: 912.1. Samples: 299580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:31:27,705][00332] Avg episode reward: [(0, '7.032')]
[2024-08-05 08:31:32,697][00332] Fps is (10 sec: 4916.7, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1220608. Throughput: 0: 949.7. Samples: 303130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:31:32,702][00332] Avg episode reward: [(0, '7.234')]
[2024-08-05 08:31:35,331][04088] Updated weights for policy 0, policy_version 300 (0.0025)
[2024-08-05 08:31:37,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1232896. Throughput: 0: 954.8. Samples: 308590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:31:37,704][00332] Avg episode reward: [(0, '7.785')]
[2024-08-05 08:31:42,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1253376. Throughput: 0: 932.3. Samples: 314094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-05 08:31:42,703][00332] Avg episode reward: [(0, '8.571')]
[2024-08-05 08:31:42,714][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000306_1253376.pth...
[2024-08-05 08:31:42,852][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000081_331776.pth
[2024-08-05 08:31:42,861][04075] Saving new best policy, reward=8.571!
[2024-08-05 08:31:45,881][04088] Updated weights for policy 0, policy_version 310 (0.0016)
[2024-08-05 08:31:47,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 1273856. Throughput: 0: 955.0. Samples: 317380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:31:47,699][00332] Avg episode reward: [(0, '8.879')]
[2024-08-05 08:31:47,774][04075] Saving new best policy, reward=8.879!
[2024-08-05 08:31:52,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 1294336. Throughput: 0: 990.4. Samples: 323486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:31:52,703][00332] Avg episode reward: [(0, '8.843')]
[2024-08-05 08:31:57,527][04088] Updated weights for policy 0, policy_version 320 (0.0027)
[2024-08-05 08:31:57,697][00332] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 1310720. Throughput: 0: 942.7. Samples: 328000. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:31:57,700][00332] Avg episode reward: [(0, '8.718')]
[2024-08-05 08:32:02,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1331200. Throughput: 0: 946.0. Samples: 331536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:32:02,699][00332] Avg episode reward: [(0, '8.990')]
[2024-08-05 08:32:02,714][04075] Saving new best policy, reward=8.990!
[2024-08-05 08:32:06,332][04088] Updated weights for policy 0, policy_version 330 (0.0022)
[2024-08-05 08:32:07,697][00332] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1355776. Throughput: 0: 1007.8. Samples: 338546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:32:07,701][00332] Avg episode reward: [(0, '10.185')]
[2024-08-05 08:32:07,706][04075] Saving new best policy, reward=10.185!
[2024-08-05 08:32:12,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1368064. Throughput: 0: 967.1. Samples: 343098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
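The checkpoint lines above follow a keep-newest rotation: files are named by zero-padded policy version and environment frame count (e.g. checkpoint_000000306_1253376.pth), and once a new checkpoint is written the oldest is removed, so two stay on disk alongside the separately tracked best policy. A sketch assuming exactly that naming scheme:

```python
import glob
import os

import torch

def save_checkpoint(model, version, env_frames, ckpt_dir, keep=2):
    # checkpoint_{version:09d}_{env_frames}.pth, as seen in the log.
    path = os.path.join(ckpt_dir, f"checkpoint_{version:09d}_{env_frames}.pth")
    print(f"Saving {path}...")
    torch.save(model.state_dict(), path)
    # Zero-padding makes lexicographic order match version order.
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in ckpts[:-keep]:
        print(f"Removing {old}")
        os.remove(old)
```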
[2024-08-05 08:32:12,701][00332] Avg episode reward: [(0, '10.937')]
[2024-08-05 08:32:12,715][04075] Saving new best policy, reward=10.937!
[2024-08-05 08:32:17,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1388544. Throughput: 0: 945.2. Samples: 345662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-05 08:32:17,704][00332] Avg episode reward: [(0, '11.296')]
[2024-08-05 08:32:17,707][04075] Saving new best policy, reward=11.296!
[2024-08-05 08:32:17,993][04088] Updated weights for policy 0, policy_version 340 (0.0012)
[2024-08-05 08:32:22,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3860.0). Total num frames: 1409024. Throughput: 0: 975.2. Samples: 352474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:32:22,701][00332] Avg episode reward: [(0, '11.625')]
[2024-08-05 08:32:22,709][04075] Saving new best policy, reward=11.625!
[2024-08-05 08:32:27,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1429504. Throughput: 0: 978.4. Samples: 358122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:32:27,700][00332] Avg episode reward: [(0, '12.038')]
[2024-08-05 08:32:27,705][04075] Saving new best policy, reward=12.038!
[2024-08-05 08:32:28,825][04088] Updated weights for policy 0, policy_version 350 (0.0016)
[2024-08-05 08:32:32,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1445888. Throughput: 0: 952.0. Samples: 360222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-05 08:32:32,700][00332] Avg episode reward: [(0, '12.269')]
[2024-08-05 08:32:32,711][04075] Saving new best policy, reward=12.269!
[2024-08-05 08:32:37,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1466368. Throughput: 0: 962.7. Samples: 366806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:32:37,701][00332] Avg episode reward: [(0, '11.333')]
[2024-08-05 08:32:38,463][04088] Updated weights for policy 0, policy_version 360 (0.0016)
[2024-08-05 08:32:42,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1490944. Throughput: 0: 1010.5. Samples: 373472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:32:42,706][00332] Avg episode reward: [(0, '11.694')]
[2024-08-05 08:32:47,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.2). Total num frames: 1503232. Throughput: 0: 979.4. Samples: 375608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:32:47,700][00332] Avg episode reward: [(0, '11.559')]
[2024-08-05 08:32:49,994][04088] Updated weights for policy 0, policy_version 370 (0.0033)
[2024-08-05 08:32:52,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1527808. Throughput: 0: 945.7. Samples: 381104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:32:52,704][00332] Avg episode reward: [(0, '11.610')]
[2024-08-05 08:32:57,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1548288. Throughput: 0: 1000.3. Samples: 388110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:32:57,700][00332] Avg episode reward: [(0, '12.697')]
[2024-08-05 08:32:57,708][04075] Saving new best policy, reward=12.697!
[2024-08-05 08:32:59,037][04088] Updated weights for policy 0, policy_version 380 (0.0023)
[2024-08-05 08:33:02,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1564672. Throughput: 0: 1002.6. Samples: 390780. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:33:02,703][00332] Avg episode reward: [(0, '13.145')]
[2024-08-05 08:33:02,724][04075] Saving new best policy, reward=13.145!
[2024-08-05 08:33:07,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1581056. Throughput: 0: 952.8. Samples: 395348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:33:07,700][00332] Avg episode reward: [(0, '13.483')]
[2024-08-05 08:33:07,704][04075] Saving new best policy, reward=13.483!
[2024-08-05 08:33:10,470][04088] Updated weights for policy 0, policy_version 390 (0.0033)
[2024-08-05 08:33:12,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1605632. Throughput: 0: 982.0. Samples: 402314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:33:12,700][00332] Avg episode reward: [(0, '12.357')]
[2024-08-05 08:33:17,699][00332] Fps is (10 sec: 4504.8, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 1626112. Throughput: 0: 1014.1. Samples: 405860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:33:17,701][00332] Avg episode reward: [(0, '10.607')]
[2024-08-05 08:33:21,127][04088] Updated weights for policy 0, policy_version 400 (0.0014)
[2024-08-05 08:33:22,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1642496. Throughput: 0: 968.2. Samples: 410374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:33:22,702][00332] Avg episode reward: [(0, '10.714')]
[2024-08-05 08:33:27,697][00332] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1662976. Throughput: 0: 960.5. Samples: 416696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-05 08:33:27,700][00332] Avg episode reward: [(0, '10.781')]
[2024-08-05 08:33:30,726][04088] Updated weights for policy 0, policy_version 410 (0.0017)
[2024-08-05 08:33:32,697][00332] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1687552. Throughput: 0: 990.8. Samples: 420196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-05 08:33:32,702][00332] Avg episode reward: [(0, '12.692')]
[2024-08-05 08:33:37,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1703936. Throughput: 0: 992.3. Samples: 425756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:33:37,703][00332] Avg episode reward: [(0, '12.734')]
[2024-08-05 08:33:42,004][04088] Updated weights for policy 0, policy_version 420 (0.0029)
[2024-08-05 08:33:42,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1720320. Throughput: 0: 954.8. Samples: 431076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:33:42,707][00332] Avg episode reward: [(0, '14.219')]
[2024-08-05 08:33:42,717][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000420_1720320.pth...
[2024-08-05 08:33:42,876][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000197_806912.pth
[2024-08-05 08:33:42,895][04075] Saving new best policy, reward=14.219!
[2024-08-05 08:33:47,697][00332] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1744896. Throughput: 0: 970.7. Samples: 434462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:33:47,705][00332] Avg episode reward: [(0, '15.489')]
[2024-08-05 08:33:47,708][04075] Saving new best policy, reward=15.489!
[2024-08-05 08:33:51,234][04088] Updated weights for policy 0, policy_version 430 (0.0020)
[2024-08-05 08:33:52,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1765376. Throughput: 0: 1012.5. Samples: 440910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:33:52,705][00332] Avg episode reward: [(0, '15.396')]
[2024-08-05 08:33:57,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1777664. Throughput: 0: 953.6. Samples: 445226. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:33:57,702][00332] Avg episode reward: [(0, '15.311')]
[2024-08-05 08:34:02,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1798144. Throughput: 0: 947.7. Samples: 448504. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:34:02,705][00332] Avg episode reward: [(0, '15.755')]
[2024-08-05 08:34:02,712][04075] Saving new best policy, reward=15.755!
[2024-08-05 08:34:02,992][04088] Updated weights for policy 0, policy_version 440 (0.0014)
[2024-08-05 08:34:07,697][00332] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1822720. Throughput: 0: 995.7. Samples: 455180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:34:07,705][00332] Avg episode reward: [(0, '17.525')]
[2024-08-05 08:34:07,707][04075] Saving new best policy, reward=17.525!
[2024-08-05 08:34:12,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1835008. Throughput: 0: 962.4. Samples: 460004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:34:12,710][00332] Avg episode reward: [(0, '16.484')]
[2024-08-05 08:34:14,453][04088] Updated weights for policy 0, policy_version 450 (0.0017)
[2024-08-05 08:34:17,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 1855488. Throughput: 0: 936.5. Samples: 462338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:34:17,701][00332] Avg episode reward: [(0, '16.137')]
[2024-08-05 08:34:22,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1875968. Throughput: 0: 964.9. Samples: 469176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:34:22,700][00332] Avg episode reward: [(0, '17.757')]
[2024-08-05 08:34:22,712][04075] Saving new best policy, reward=17.757!
[2024-08-05 08:34:24,004][04088] Updated weights for policy 0, policy_version 460 (0.0015)
[2024-08-05 08:34:27,698][00332] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 1896448. Throughput: 0: 971.4. Samples: 474792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:34:27,704][00332] Avg episode reward: [(0, '17.737')]
[2024-08-05 08:34:32,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1908736. Throughput: 0: 939.6. Samples: 476746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:34:32,700][00332] Avg episode reward: [(0, '16.743')]
[2024-08-05 08:34:35,931][04088] Updated weights for policy 0, policy_version 470 (0.0019)
[2024-08-05 08:34:37,697][00332] Fps is (10 sec: 3277.1, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1929216. Throughput: 0: 922.8. Samples: 482434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:34:37,699][00332] Avg episode reward: [(0, '17.132')]
[2024-08-05 08:34:42,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1953792. Throughput: 0: 976.1. Samples: 489150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:34:42,702][00332] Avg episode reward: [(0, '17.448')]
[2024-08-05 08:34:46,480][04088] Updated weights for policy 0, policy_version 480 (0.0021)
[2024-08-05 08:34:47,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1966080. Throughput: 0: 949.7. Samples: 491242. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:34:47,703][00332] Avg episode reward: [(0, '16.227')]
[2024-08-05 08:34:52,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1986560. Throughput: 0: 914.1. Samples: 496314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:34:52,700][00332] Avg episode reward: [(0, '15.973')]
[2024-08-05 08:34:56,702][04088] Updated weights for policy 0, policy_version 490 (0.0041)
[2024-08-05 08:34:57,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2011136. Throughput: 0: 958.8. Samples: 503150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:34:57,703][00332] Avg episode reward: [(0, '17.162')]
[2024-08-05 08:35:02,698][00332] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2027520. Throughput: 0: 972.3. Samples: 506094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:35:02,700][00332] Avg episode reward: [(0, '16.266')]
[2024-08-05 08:35:07,697][00332] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2039808. Throughput: 0: 915.5. Samples: 510372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:35:07,699][00332] Avg episode reward: [(0, '15.152')]
[2024-08-05 08:35:08,511][04088] Updated weights for policy 0, policy_version 500 (0.0026)
[2024-08-05 08:35:12,697][00332] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2064384. Throughput: 0: 938.8. Samples: 517036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:35:12,706][00332] Avg episode reward: [(0, '15.444')]
[2024-08-05 08:35:17,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2084864. Throughput: 0: 972.6. Samples: 520514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:35:17,700][00332] Avg episode reward: [(0, '14.480')]
[2024-08-05 08:35:17,721][04088] Updated weights for policy 0, policy_version 510 (0.0019)
[2024-08-05 08:35:22,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2101248. Throughput: 0: 951.3. Samples: 525244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:35:22,700][00332] Avg episode reward: [(0, '15.630')]
[2024-08-05 08:35:27,697][00332] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2121728. Throughput: 0: 927.3. Samples: 530878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:35:27,704][00332] Avg episode reward: [(0, '16.250')]
[2024-08-05 08:35:29,226][04088] Updated weights for policy 0, policy_version 520 (0.0018)
[2024-08-05 08:35:32,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2142208. Throughput: 0: 958.8. Samples: 534386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:35:32,705][00332] Avg episode reward: [(0, '16.736')]
[2024-08-05 08:35:37,697][00332] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2158592. Throughput: 0: 977.7. Samples: 540312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:35:37,705][00332] Avg episode reward: [(0, '17.605')]
[2024-08-05 08:35:40,832][04088] Updated weights for policy 0, policy_version 530 (0.0028)
[2024-08-05 08:35:42,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2174976. Throughput: 0: 929.9. Samples: 544994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:35:42,706][00332] Avg episode reward: [(0, '17.160')]
[2024-08-05 08:35:42,716][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth...
[2024-08-05 08:35:42,836][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000306_1253376.pth
[2024-08-05 08:35:47,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2199552. Throughput: 0: 936.8. Samples: 548250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:35:47,700][00332] Avg episode reward: [(0, '17.264')]
[2024-08-05 08:35:50,022][04088] Updated weights for policy 0, policy_version 540 (0.0018)
[2024-08-05 08:35:52,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2220032. Throughput: 0: 995.2. Samples: 555156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:35:52,700][00332] Avg episode reward: [(0, '17.987')]
[2024-08-05 08:35:52,718][04075] Saving new best policy, reward=17.987!
[2024-08-05 08:35:57,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2236416. Throughput: 0: 941.7. Samples: 559414. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:35:57,701][00332] Avg episode reward: [(0, '17.701')]
[2024-08-05 08:36:01,857][04088] Updated weights for policy 0, policy_version 550 (0.0018)
[2024-08-05 08:36:02,699][00332] Fps is (10 sec: 3276.2, 60 sec: 3754.6, 300 sec: 3846.0). Total num frames: 2252800. Throughput: 0: 924.0. Samples: 562096. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:36:02,706][00332] Avg episode reward: [(0, '18.408')]
[2024-08-05 08:36:02,714][04075] Saving new best policy, reward=18.408!
[2024-08-05 08:36:07,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2277376. Throughput: 0: 967.1. Samples: 568764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:36:07,706][00332] Avg episode reward: [(0, '17.999')]
[2024-08-05 08:36:12,281][04088] Updated weights for policy 0, policy_version 560 (0.0023)
[2024-08-05 08:36:12,697][00332] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2293760. Throughput: 0: 962.3. Samples: 574180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:36:12,705][00332] Avg episode reward: [(0, '19.214')]
[2024-08-05 08:36:12,719][04075] Saving new best policy, reward=19.214!
[2024-08-05 08:36:17,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2310144. Throughput: 0: 931.0. Samples: 576280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:36:17,703][00332] Avg episode reward: [(0, '18.475')]
[2024-08-05 08:36:22,542][04088] Updated weights for policy 0, policy_version 570 (0.0019)
[2024-08-05 08:36:22,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2334720. Throughput: 0: 945.0. Samples: 582836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:36:22,699][00332] Avg episode reward: [(0, '18.577')]
[2024-08-05 08:36:27,699][00332] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 2351104. Throughput: 0: 978.6. Samples: 589032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:36:27,704][00332] Avg episode reward: [(0, '17.430')]
[2024-08-05 08:36:32,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2367488. Throughput: 0: 949.8. Samples: 590992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:36:32,701][00332] Avg episode reward: [(0, '17.096')]
[2024-08-05 08:36:34,950][04088] Updated weights for policy 0, policy_version 580 (0.0040)
[2024-08-05 08:36:37,697][00332] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2383872. Throughput: 0: 914.4. Samples: 596306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-05 08:36:37,701][00332] Avg episode reward: [(0, '17.764')]
[2024-08-05 08:36:42,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2408448. Throughput: 0: 965.0. Samples: 602840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-05 08:36:42,699][00332] Avg episode reward: [(0, '17.505')]
[2024-08-05 08:36:44,739][04088] Updated weights for policy 0, policy_version 590 (0.0033)
[2024-08-05 08:36:47,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2424832. Throughput: 0: 960.8. Samples: 605332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:36:47,702][00332] Avg episode reward: [(0, '17.443')]
[2024-08-05 08:36:52,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 2441216. Throughput: 0: 915.4. Samples: 609956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:36:52,703][00332] Avg episode reward: [(0, '18.985')]
[2024-08-05 08:36:56,086][04088] Updated weights for policy 0, policy_version 600 (0.0019)
[2024-08-05 08:36:57,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2461696. Throughput: 0: 943.2. Samples: 616624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:36:57,704][00332] Avg episode reward: [(0, '18.908')]
[2024-08-05 08:37:02,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2482176. Throughput: 0: 968.8. Samples: 619874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-05 08:37:02,699][00332] Avg episode reward: [(0, '18.665')]
[2024-08-05 08:37:07,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 2494464. Throughput: 0: 908.4. Samples: 623712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:37:07,699][00332] Avg episode reward: [(0, '18.727')]
[2024-08-05 08:37:08,134][04088] Updated weights for policy 0, policy_version 610 (0.0037)
[2024-08-05 08:37:12,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 2514944. Throughput: 0: 901.8. Samples: 629612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:37:12,706][00332] Avg episode reward: [(0, '20.209')]
[2024-08-05 08:37:12,718][04075] Saving new best policy, reward=20.209!
[2024-08-05 08:37:17,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2535424. Throughput: 0: 928.8. Samples: 632790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:37:17,702][00332] Avg episode reward: [(0, '18.899')]
[2024-08-05 08:37:18,081][04088] Updated weights for policy 0, policy_version 620 (0.0012)
[2024-08-05 08:37:22,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2551808. Throughput: 0: 916.7. Samples: 637558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:37:22,701][00332] Avg episode reward: [(0, '19.177')]
[2024-08-05 08:37:27,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3804.4). Total num frames: 2568192. Throughput: 0: 895.0. Samples: 643116. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:37:27,700][00332] Avg episode reward: [(0, '18.548')]
[2024-08-05 08:37:29,667][04088] Updated weights for policy 0, policy_version 630 (0.0020)
[2024-08-05 08:37:32,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2592768. Throughput: 0: 915.5. Samples: 646530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-05 08:37:32,705][00332] Avg episode reward: [(0, '19.849')]
[2024-08-05 08:37:37,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2609152. Throughput: 0: 944.3. Samples: 652450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:37:37,700][00332] Avg episode reward: [(0, '18.894')]
[2024-08-05 08:37:41,274][04088] Updated weights for policy 0, policy_version 640 (0.0027)
[2024-08-05 08:37:42,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2625536. Throughput: 0: 898.2. Samples: 657044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:37:42,702][00332] Avg episode reward: [(0, '18.510')]
[2024-08-05 08:37:42,715][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth...
[2024-08-05 08:37:42,837][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000420_1720320.pth
[2024-08-05 08:37:47,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2646016. Throughput: 0: 898.7. Samples: 660316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-08-05 08:37:47,704][00332] Avg episode reward: [(0, '18.904')]
[2024-08-05 08:37:50,575][04088] Updated weights for policy 0, policy_version 650 (0.0013)
[2024-08-05 08:37:52,700][00332] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 2666496. Throughput: 0: 964.9. Samples: 667136. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-05 08:37:52,706][00332] Avg episode reward: [(0, '19.178')]
[2024-08-05 08:37:57,697][00332] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2682880. Throughput: 0: 927.8. Samples: 671362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-05 08:37:57,702][00332] Avg episode reward: [(0, '18.834')]
[2024-08-05 08:38:02,272][04088] Updated weights for policy 0, policy_version 660 (0.0020)
[2024-08-05 08:38:02,697][00332] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2703360. Throughput: 0: 922.3. Samples: 674292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-05 08:38:02,699][00332] Avg episode reward: [(0, '19.937')]
[2024-08-05 08:38:07,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2723840. Throughput: 0: 966.7. Samples: 681060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:38:07,700][00332] Avg episode reward: [(0, '20.674')] [2024-08-05 08:38:07,703][04075] Saving new best policy, reward=20.674! [2024-08-05 08:38:12,699][00332] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 2740224. Throughput: 0: 950.8. Samples: 685906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:38:12,702][00332] Avg episode reward: [(0, '19.540')] [2024-08-05 08:38:13,703][04088] Updated weights for policy 0, policy_version 670 (0.0019) [2024-08-05 08:38:17,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 2756608. Throughput: 0: 920.8. Samples: 687968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:38:17,705][00332] Avg episode reward: [(0, '18.784')] [2024-08-05 08:38:22,697][00332] Fps is (10 sec: 3687.2, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2777088. Throughput: 0: 936.4. Samples: 694586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:38:22,700][00332] Avg episode reward: [(0, '17.794')] [2024-08-05 08:38:23,642][04088] Updated weights for policy 0, policy_version 680 (0.0015) [2024-08-05 08:38:27,697][00332] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2797568. Throughput: 0: 966.3. Samples: 700528. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-05 08:38:27,704][00332] Avg episode reward: [(0, '17.633')] [2024-08-05 08:38:32,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2813952. Throughput: 0: 939.3. Samples: 702586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-05 08:38:32,702][00332] Avg episode reward: [(0, '17.466')] [2024-08-05 08:38:35,467][04088] Updated weights for policy 0, policy_version 690 (0.0025) [2024-08-05 08:38:37,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2834432. Throughput: 0: 914.7. Samples: 708294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:38:37,700][00332] Avg episode reward: [(0, '16.799')] [2024-08-05 08:38:42,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2859008. Throughput: 0: 974.4. Samples: 715210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:38:42,704][00332] Avg episode reward: [(0, '16.665')] [2024-08-05 08:38:45,351][04088] Updated weights for policy 0, policy_version 700 (0.0015) [2024-08-05 08:38:47,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2871296. Throughput: 0: 959.2. Samples: 717456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:38:47,701][00332] Avg episode reward: [(0, '16.212')] [2024-08-05 08:38:52,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2891776. Throughput: 0: 919.4. Samples: 722434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:38:52,705][00332] Avg episode reward: [(0, '15.241')] [2024-08-05 08:38:56,092][04088] Updated weights for policy 0, policy_version 710 (0.0019) [2024-08-05 08:38:57,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2912256. Throughput: 0: 965.2. Samples: 729338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:38:57,701][00332] Avg episode reward: [(0, '16.091')] [2024-08-05 08:39:02,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2932736. Throughput: 0: 994.0. Samples: 732700. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-05 08:39:02,704][00332] Avg episode reward: [(0, '16.955')] [2024-08-05 08:39:07,387][04088] Updated weights for policy 0, policy_version 720 (0.0018) [2024-08-05 08:39:07,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2949120. Throughput: 0: 944.3. Samples: 737080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:39:07,704][00332] Avg episode reward: [(0, '17.687')] [2024-08-05 08:39:12,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3790.5). Total num frames: 2973696. Throughput: 0: 964.5. Samples: 743932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:39:12,704][00332] Avg episode reward: [(0, '18.140')] [2024-08-05 08:39:16,214][04088] Updated weights for policy 0, policy_version 730 (0.0015) [2024-08-05 08:39:17,699][00332] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3790.5). Total num frames: 2994176. Throughput: 0: 996.1. Samples: 747412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:39:17,704][00332] Avg episode reward: [(0, '17.934')] [2024-08-05 08:39:22,699][00332] Fps is (10 sec: 3276.1, 60 sec: 3822.8, 300 sec: 3762.8). Total num frames: 3006464. Throughput: 0: 979.2. Samples: 752358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:39:22,702][00332] Avg episode reward: [(0, '19.402')] [2024-08-05 08:39:27,542][04088] Updated weights for policy 0, policy_version 740 (0.0019) [2024-08-05 08:39:27,697][00332] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3031040. Throughput: 0: 956.0. Samples: 758228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-05 08:39:27,704][00332] Avg episode reward: [(0, '18.774')] [2024-08-05 08:39:32,697][00332] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3051520. Throughput: 0: 982.3. Samples: 761660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:39:32,700][00332] Avg episode reward: [(0, '19.417')] [2024-08-05 08:39:37,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 3067904. Throughput: 0: 996.8. Samples: 767290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:39:37,701][00332] Avg episode reward: [(0, '20.280')] [2024-08-05 08:39:38,449][04088] Updated weights for policy 0, policy_version 750 (0.0019) [2024-08-05 08:39:42,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3084288. Throughput: 0: 950.6. Samples: 772116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-05 08:39:42,705][00332] Avg episode reward: [(0, '21.072')] [2024-08-05 08:39:42,717][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth... [2024-08-05 08:39:42,839][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000531_2174976.pth [2024-08-05 08:39:42,868][04075] Saving new best policy, reward=21.072! [2024-08-05 08:39:47,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3104768. Throughput: 0: 945.1. Samples: 775230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:39:47,700][00332] Avg episode reward: [(0, '20.114')] [2024-08-05 08:39:48,601][04088] Updated weights for policy 0, policy_version 760 (0.0012) [2024-08-05 08:39:52,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 3125248. Throughput: 0: 991.9. Samples: 781716. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:39:52,704][00332] Avg episode reward: [(0, '21.051')] [2024-08-05 08:39:57,705][00332] Fps is (10 sec: 3274.2, 60 sec: 3754.2, 300 sec: 3762.7). Total num frames: 3137536. Throughput: 0: 927.2. Samples: 785662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-05 08:39:57,709][00332] Avg episode reward: [(0, '20.487')] [2024-08-05 08:40:00,855][04088] Updated weights for policy 0, policy_version 770 (0.0033) [2024-08-05 08:40:02,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3162112. Throughput: 0: 919.0. Samples: 788766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:02,704][00332] Avg episode reward: [(0, '21.059')] [2024-08-05 08:40:07,697][00332] Fps is (10 sec: 4509.2, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3182592. Throughput: 0: 961.7. Samples: 795632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:07,705][00332] Avg episode reward: [(0, '21.146')] [2024-08-05 08:40:07,708][04075] Saving new best policy, reward=21.146! [2024-08-05 08:40:10,742][04088] Updated weights for policy 0, policy_version 780 (0.0015) [2024-08-05 08:40:12,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3198976. Throughput: 0: 938.0. Samples: 800436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-05 08:40:12,705][00332] Avg episode reward: [(0, '22.265')] [2024-08-05 08:40:12,716][04075] Saving new best policy, reward=22.265! [2024-08-05 08:40:17,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3776.7). Total num frames: 3215360. Throughput: 0: 911.0. Samples: 802656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:40:17,700][00332] Avg episode reward: [(0, '21.429')] [2024-08-05 08:40:21,641][04088] Updated weights for policy 0, policy_version 790 (0.0029) [2024-08-05 08:40:22,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3790.5). Total num frames: 3239936. Throughput: 0: 937.9. Samples: 809496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:22,700][00332] Avg episode reward: [(0, '22.377')] [2024-08-05 08:40:22,710][04075] Saving new best policy, reward=22.377! [2024-08-05 08:40:27,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3256320. Throughput: 0: 956.9. Samples: 815178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:27,700][00332] Avg episode reward: [(0, '21.220')] [2024-08-05 08:40:32,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3272704. Throughput: 0: 933.2. Samples: 817222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:32,702][00332] Avg episode reward: [(0, '22.062')] [2024-08-05 08:40:33,632][04088] Updated weights for policy 0, policy_version 800 (0.0035) [2024-08-05 08:40:37,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3293184. Throughput: 0: 917.1. Samples: 822986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:40:37,703][00332] Avg episode reward: [(0, '20.345')] [2024-08-05 08:40:42,700][00332] Fps is (10 sec: 4094.8, 60 sec: 3822.7, 300 sec: 3776.6). Total num frames: 3313664. Throughput: 0: 979.4. Samples: 829730. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:42,703][00332] Avg episode reward: [(0, '19.133')] [2024-08-05 08:40:42,930][04088] Updated weights for policy 0, policy_version 810 (0.0022) [2024-08-05 08:40:47,700][00332] Fps is (10 sec: 3685.5, 60 sec: 3754.5, 300 sec: 3762.7). Total num frames: 3330048. Throughput: 0: 955.2. Samples: 831754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:47,708][00332] Avg episode reward: [(0, '18.285')] [2024-08-05 08:40:52,697][00332] Fps is (10 sec: 3277.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3346432. Throughput: 0: 916.8. Samples: 836890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-05 08:40:52,705][00332] Avg episode reward: [(0, '17.912')] [2024-08-05 08:40:54,590][04088] Updated weights for policy 0, policy_version 820 (0.0014) [2024-08-05 08:40:57,697][00332] Fps is (10 sec: 4097.0, 60 sec: 3891.7, 300 sec: 3790.6). Total num frames: 3371008. Throughput: 0: 957.2. Samples: 843510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:40:57,703][00332] Avg episode reward: [(0, '18.223')] [2024-08-05 08:41:02,697][00332] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3387392. Throughput: 0: 971.8. Samples: 846388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:41:02,706][00332] Avg episode reward: [(0, '19.308')] [2024-08-05 08:41:06,415][04088] Updated weights for policy 0, policy_version 830 (0.0018) [2024-08-05 08:41:07,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3403776. Throughput: 0: 915.0. Samples: 850670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:41:07,702][00332] Avg episode reward: [(0, '18.579')] [2024-08-05 08:41:12,697][00332] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 3424256. Throughput: 0: 938.1. Samples: 857392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:41:12,700][00332] Avg episode reward: [(0, '20.268')] [2024-08-05 08:41:15,462][04088] Updated weights for policy 0, policy_version 840 (0.0026) [2024-08-05 08:41:17,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3444736. Throughput: 0: 967.6. Samples: 860766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:41:17,700][00332] Avg episode reward: [(0, '20.505')] [2024-08-05 08:41:22,698][00332] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3762.8). Total num frames: 3461120. Throughput: 0: 939.7. Samples: 865272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:41:22,702][00332] Avg episode reward: [(0, '20.158')] [2024-08-05 08:41:27,310][04088] Updated weights for policy 0, policy_version 850 (0.0020) [2024-08-05 08:41:27,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3481600. Throughput: 0: 920.1. Samples: 871130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:41:27,704][00332] Avg episode reward: [(0, '19.955')] [2024-08-05 08:41:32,697][00332] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3502080. Throughput: 0: 951.1. Samples: 874550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-05 08:41:32,701][00332] Avg episode reward: [(0, '20.603')] [2024-08-05 08:41:37,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3518464. Throughput: 0: 956.7. Samples: 879940. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:41:37,700][00332] Avg episode reward: [(0, '20.476')] [2024-08-05 08:41:38,177][04088] Updated weights for policy 0, policy_version 860 (0.0014) [2024-08-05 08:41:42,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3776.7). Total num frames: 3538944. Throughput: 0: 924.2. Samples: 885098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:41:42,702][00332] Avg episode reward: [(0, '19.621')] [2024-08-05 08:41:42,710][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000864_3538944.pth... [2024-08-05 08:41:42,826][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth [2024-08-05 08:41:47,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 3559424. Throughput: 0: 933.7. Samples: 888404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:41:47,700][00332] Avg episode reward: [(0, '20.735')] [2024-08-05 08:41:48,229][04088] Updated weights for policy 0, policy_version 870 (0.0015) [2024-08-05 08:41:52,697][00332] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3575808. Throughput: 0: 975.5. Samples: 894566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:41:52,705][00332] Avg episode reward: [(0, '19.833')] [2024-08-05 08:41:57,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3592192. Throughput: 0: 918.1. Samples: 898706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:41:57,702][00332] Avg episode reward: [(0, '19.512')] [2024-08-05 08:42:00,099][04088] Updated weights for policy 0, policy_version 880 (0.0020) [2024-08-05 08:42:02,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3612672. Throughput: 0: 918.1. Samples: 902082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:42:02,699][00332] Avg episode reward: [(0, '20.233')] [2024-08-05 08:42:07,697][00332] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3637248. Throughput: 0: 968.2. Samples: 908840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:42:07,702][00332] Avg episode reward: [(0, '19.944')] [2024-08-05 08:42:10,387][04088] Updated weights for policy 0, policy_version 890 (0.0022) [2024-08-05 08:42:12,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 3649536. Throughput: 0: 938.3. Samples: 913352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:42:12,703][00332] Avg episode reward: [(0, '21.399')] [2024-08-05 08:42:17,697][00332] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 3670016. Throughput: 0: 921.3. Samples: 916008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-05 08:42:17,700][00332] Avg episode reward: [(0, '21.045')] [2024-08-05 08:42:21,038][04088] Updated weights for policy 0, policy_version 900 (0.0025) [2024-08-05 08:42:22,697][00332] Fps is (10 sec: 4096.1, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 3690496. Throughput: 0: 951.1. Samples: 922738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:42:22,700][00332] Avg episode reward: [(0, '21.412')] [2024-08-05 08:42:27,697][00332] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3710976. Throughput: 0: 958.0. Samples: 928210. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-05 08:42:27,701][00332] Avg episode reward: [(0, '21.493')] [2024-08-05 08:42:32,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3723264. Throughput: 0: 930.9. Samples: 930296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:42:32,702][00332] Avg episode reward: [(0, '20.887')] [2024-08-05 08:42:32,724][04088] Updated weights for policy 0, policy_version 910 (0.0013) [2024-08-05 08:42:37,697][00332] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3747840. Throughput: 0: 931.9. Samples: 936500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:42:37,701][00332] Avg episode reward: [(0, '19.520')] [2024-08-05 08:42:42,201][04088] Updated weights for policy 0, policy_version 920 (0.0015) [2024-08-05 08:42:42,699][00332] Fps is (10 sec: 4504.7, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 3768320. Throughput: 0: 981.4. Samples: 942872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:42:42,704][00332] Avg episode reward: [(0, '19.702')] [2024-08-05 08:42:47,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3780608. Throughput: 0: 952.0. Samples: 944922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-05 08:42:47,704][00332] Avg episode reward: [(0, '18.742')] [2024-08-05 08:42:52,697][00332] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3801088. Throughput: 0: 921.2. Samples: 950294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:42:52,703][00332] Avg episode reward: [(0, '17.977')] [2024-08-05 08:42:53,875][04088] Updated weights for policy 0, policy_version 930 (0.0020) [2024-08-05 08:42:57,698][00332] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3825664. Throughput: 0: 969.1. Samples: 956960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:42:57,700][00332] Avg episode reward: [(0, '19.354')] [2024-08-05 08:43:02,697][00332] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3837952. Throughput: 0: 962.6. Samples: 959324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:43:02,704][00332] Avg episode reward: [(0, '19.825')] [2024-08-05 08:43:05,825][04088] Updated weights for policy 0, policy_version 940 (0.0032) [2024-08-05 08:43:07,697][00332] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 3858432. Throughput: 0: 916.9. Samples: 964000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:43:07,699][00332] Avg episode reward: [(0, '19.947')] [2024-08-05 08:43:12,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3878912. Throughput: 0: 951.0. Samples: 971004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:43:12,704][00332] Avg episode reward: [(0, '20.063')] [2024-08-05 08:43:14,582][04088] Updated weights for policy 0, policy_version 950 (0.0015) [2024-08-05 08:43:17,697][00332] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 3899392. Throughput: 0: 981.0. Samples: 974440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:43:17,699][00332] Avg episode reward: [(0, '21.369')] [2024-08-05 08:43:22,697][00332] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3915776. Throughput: 0: 937.6. Samples: 978692. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:43:22,700][00332] Avg episode reward: [(0, '21.865')] [2024-08-05 08:43:26,092][04088] Updated weights for policy 0, policy_version 960 (0.0032) [2024-08-05 08:43:27,699][00332] Fps is (10 sec: 3685.7, 60 sec: 3754.5, 300 sec: 3804.4). Total num frames: 3936256. Throughput: 0: 937.7. Samples: 985068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-05 08:43:27,705][00332] Avg episode reward: [(0, '21.427')] [2024-08-05 08:43:32,697][00332] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3960832. Throughput: 0: 965.0. Samples: 988346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:43:32,703][00332] Avg episode reward: [(0, '21.999')] [2024-08-05 08:43:37,156][04088] Updated weights for policy 0, policy_version 970 (0.0025) [2024-08-05 08:43:37,697][00332] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3973120. Throughput: 0: 957.6. Samples: 993384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-05 08:43:37,702][00332] Avg episode reward: [(0, '21.369')] [2024-08-05 08:43:42,697][00332] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 3993600. Throughput: 0: 929.4. Samples: 998784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-05 08:43:42,705][00332] Avg episode reward: [(0, '21.221')] [2024-08-05 08:43:42,718][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000975_3993600.pth... [2024-08-05 08:43:42,845][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth [2024-08-05 08:43:45,538][04075] Stopping Batcher_0... [2024-08-05 08:43:45,539][04075] Loop batcher_evt_loop terminating... [2024-08-05 08:43:45,548][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-05 08:43:45,547][00332] Component Batcher_0 stopped! [2024-08-05 08:43:45,615][04094] Stopping RolloutWorker_w5... [2024-08-05 08:43:45,615][00332] Component RolloutWorker_w5 stopped! [2024-08-05 08:43:45,613][04088] Weights refcount: 2 0 [2024-08-05 08:43:45,628][04094] Loop rollout_proc5_evt_loop terminating... [2024-08-05 08:43:45,629][00332] Component InferenceWorker_p0-w0 stopped! [2024-08-05 08:43:45,636][00332] Component RolloutWorker_w6 stopped! [2024-08-05 08:43:45,635][04095] Stopping RolloutWorker_w6... [2024-08-05 08:43:45,642][04088] Stopping InferenceWorker_p0-w0... [2024-08-05 08:43:45,643][04088] Loop inference_proc0-0_evt_loop terminating... [2024-08-05 08:43:45,656][04092] Stopping RolloutWorker_w3... [2024-08-05 08:43:45,650][00332] Component RolloutWorker_w2 stopped! [2024-08-05 08:43:45,662][04091] Stopping RolloutWorker_w2... [2024-08-05 08:43:45,662][04091] Loop rollout_proc2_evt_loop terminating... [2024-08-05 08:43:45,663][04090] Stopping RolloutWorker_w1... [2024-08-05 08:43:45,663][04090] Loop rollout_proc1_evt_loop terminating... [2024-08-05 08:43:45,664][04092] Loop rollout_proc3_evt_loop terminating... [2024-08-05 08:43:45,661][00332] Component RolloutWorker_w3 stopped! [2024-08-05 08:43:45,665][00332] Component RolloutWorker_w1 stopped! [2024-08-05 08:43:45,669][04096] Stopping RolloutWorker_w7... [2024-08-05 08:43:45,670][04096] Loop rollout_proc7_evt_loop terminating... [2024-08-05 08:43:45,670][00332] Component RolloutWorker_w7 stopped! [2024-08-05 08:43:45,680][04089] Stopping RolloutWorker_w0... [2024-08-05 08:43:45,680][04089] Loop rollout_proc0_evt_loop terminating... 
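The checkpoint traffic above follows a simple keep-the-latest rotation: each periodic save writes a new checkpoint_<version>_<frames>.pth and the oldest surviving file is removed (saving 000000975 evicts 000000753, leaving two regular checkpoints on disk), while "new best policy" snapshots are written separately whenever the average reward improves. Below is a minimal sketch of that rotation in plain Python; the directory layout and the keep-two behaviour are taken from the log, but save_checkpoint and its raw-bytes payload are illustrative assumptions, not sample-factory's actual implementation.

    # Sketch of keep-the-latest-N checkpoint rotation as seen in the log above.
    # save_checkpoint() and the bytes payload are illustrative assumptions; in
    # the real trainer the payload would be torch.save(...) of the model state.
    import re
    from pathlib import Path

    CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")

    def save_checkpoint(ckpt_dir: Path, version: int, frames: int,
                        state: bytes, keep: int = 2) -> Path:
        ckpt_dir.mkdir(parents=True, exist_ok=True)
        path = ckpt_dir / f"checkpoint_{version:09d}_{frames}.pth"
        path.write_bytes(state)  # "Saving .../checkpoint_000000975_3993600.pth..."
        # Sort surviving checkpoints by policy version; evict all but the newest `keep`.
        ckpts = sorted(
            (p for p in ckpt_dir.glob("checkpoint_*.pth") if CKPT_RE.search(p.name)),
            key=lambda p: int(CKPT_RE.search(p.name).group(1)),
        )
        for old in ckpts[:-keep]:
            old.unlink()  # "Removing .../checkpoint_000000753_3084288.pth"
        return path

Best-policy saves ("Saving new best policy, reward=22.377!") sit outside this rotation, which is why an improving policy is never evicted by the regular checkpoint pruning.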
[2024-08-05 08:43:45,680][00332] Component RolloutWorker_w0 stopped! [2024-08-05 08:43:45,641][04095] Loop rollout_proc6_evt_loop terminating... [2024-08-05 08:43:45,696][00332] Component RolloutWorker_w4 stopped! [2024-08-05 08:43:45,701][04093] Stopping RolloutWorker_w4... [2024-08-05 08:43:45,708][04093] Loop rollout_proc4_evt_loop terminating... [2024-08-05 08:43:45,760][04075] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000864_3538944.pth [2024-08-05 08:43:45,783][04075] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-05 08:43:45,998][00332] Component LearnerWorker_p0 stopped! [2024-08-05 08:43:46,005][00332] Waiting for process learner_proc0 to stop... [2024-08-05 08:43:46,011][04075] Stopping LearnerWorker_p0... [2024-08-05 08:43:46,012][04075] Loop learner_proc0_evt_loop terminating... [2024-08-05 08:43:47,408][00332] Waiting for process inference_proc0-0 to join... [2024-08-05 08:43:47,707][00332] Waiting for process rollout_proc0 to join... [2024-08-05 08:43:48,957][00332] Waiting for process rollout_proc1 to join... [2024-08-05 08:43:48,969][00332] Waiting for process rollout_proc2 to join... [2024-08-05 08:43:48,972][00332] Waiting for process rollout_proc3 to join... [2024-08-05 08:43:48,977][00332] Waiting for process rollout_proc4 to join... [2024-08-05 08:43:48,986][00332] Waiting for process rollout_proc5 to join... [2024-08-05 08:43:48,989][00332] Waiting for process rollout_proc6 to join... [2024-08-05 08:43:48,994][00332] Waiting for process rollout_proc7 to join...
[2024-08-05 08:43:48,997][00332] Batcher 0 profile tree view:
batching: 27.0407, releasing_batches: 0.0257
[2024-08-05 08:43:49,000][00332] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 458.6430
update_model: 7.7215
weight_update: 0.0034
one_step: 0.0177
handle_policy_step: 551.4574
  deserialize: 14.7483, stack: 3.0024, obs_to_device_normalize: 113.8637, forward: 277.7050, send_messages: 27.4755
  prepare_outputs: 86.9610
    to_cpu: 54.3376
[2024-08-05 08:43:49,001][00332] Learner 0 profile tree view:
misc: 0.0070, prepare_batch: 15.6585
train: 73.3646
  epoch_init: 0.0057, minibatch_init: 0.0114, losses_postprocess: 0.6115, kl_divergence: 0.5526, after_optimizer: 32.6467
  calculate_losses: 24.6168
    losses_init: 0.0048, forward_head: 1.8740, bptt_initial: 15.6387, tail: 1.0922, advantages_returns: 0.2898, losses: 3.1023
    bptt: 2.2729
      bptt_forward_core: 2.1612
  update: 14.2556
    clip: 1.4335
[2024-08-05 08:43:49,003][00332] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3227, enqueue_policy_requests: 111.4475, env_step: 824.1719, overhead: 13.6516, complete_rollouts: 6.7499
save_policy_outputs: 24.9013
  split_output_tensors: 8.8601
[2024-08-05 08:43:49,004][00332] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3118, enqueue_policy_requests: 116.7175, env_step: 819.3675, overhead: 13.7976, complete_rollouts: 6.6041
save_policy_outputs: 25.7951
  split_output_tensors: 8.8932
[2024-08-05 08:43:49,006][00332] Loop Runner_EvtLoop terminating...
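These per-component "profile tree view" blocks are nested wall-clock totals: each named section accumulates seconds, and the indentation shows which sections ran inside which. The toy context-manager timer below reproduces that bookkeeping; it is purely an illustration of the idea, not sample-factory's own timing utility.

    # Toy hierarchical timer that prints a "profile tree view" in the spirit
    # of the blocks above. Illustrative only; not sample-factory's internals.
    import time
    from contextlib import contextmanager

    class ProfileTree:
        def __init__(self):
            self.totals = {}  # path tuple -> accumulated seconds
            self.stack = []   # current nesting, e.g. ["train", "update"]

        @contextmanager
        def section(self, name):
            self.stack.append(name)
            path, start = tuple(self.stack), time.perf_counter()
            try:
                yield
            finally:
                self.totals[path] = self.totals.get(path, 0.0) + time.perf_counter() - start
                self.stack.pop()

        def report(self, title):
            print(f"{title} profile tree view:")
            for path, total in sorted(self.totals.items()):
                print(f"{'  ' * (len(path) - 1)}{path[-1]}: {total:.4f}")

    timing = ProfileTree()
    with timing.section("train"):
        with timing.section("calculate_losses"):
            time.sleep(0.01)   # stand-in for loss computation
        with timing.section("update"):
            time.sleep(0.005)  # stand-in for the optimizer step
    timing.report("Learner 0")

Read this way, the real trees say the learner spent most of its 73.4 s of train time after the optimizer step (32.6 s) and in loss calculation (24.6 s), while the rollout workers were dominated by env_step (roughly 820 s each).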
[2024-08-05 08:43:49,007][00332] Runner profile tree view: main_loop: 1082.6273 [2024-08-05 08:43:49,009][00332] Collected {0: 4005888}, FPS: 3700.2 [2024-08-05 08:43:58,520][00332] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-05 08:43:58,522][00332] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-05 08:43:58,525][00332] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-05 08:43:58,527][00332] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-05 08:43:58,529][00332] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-05 08:43:58,531][00332] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-05 08:43:58,532][00332] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-08-05 08:43:58,535][00332] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-05 08:43:58,536][00332] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-08-05 08:43:58,537][00332] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-08-05 08:43:58,538][00332] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-05 08:43:58,540][00332] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-05 08:43:58,541][00332] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-05 08:43:58,542][00332] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-05 08:43:58,543][00332] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-05 08:43:58,560][00332] Doom resolution: 160x120, resize resolution: (128, 72) [2024-08-05 08:43:58,562][00332] RunningMeanStd input shape: (3, 72, 128) [2024-08-05 08:43:58,564][00332] RunningMeanStd input shape: (1,) [2024-08-05 08:43:58,581][00332] ConvEncoder: input_channels=3 [2024-08-05 08:43:58,700][00332] Conv encoder output size: 512 [2024-08-05 08:43:58,702][00332] Policy head output size: 512 [2024-08-05 08:44:00,279][00332] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-05 08:44:01,193][00332] Num frames 100... [2024-08-05 08:44:01,355][00332] Num frames 200... [2024-08-05 08:44:01,522][00332] Num frames 300... [2024-08-05 08:44:01,680][00332] Num frames 400... [2024-08-05 08:44:01,842][00332] Num frames 500... [2024-08-05 08:44:02,000][00332] Num frames 600... [2024-08-05 08:44:02,071][00332] Avg episode rewards: #0: 9.080, true rewards: #0: 6.080 [2024-08-05 08:44:02,074][00332] Avg episode reward: 9.080, avg true_objective: 6.080 [2024-08-05 08:44:02,226][00332] Num frames 700... [2024-08-05 08:44:02,391][00332] Num frames 800... [2024-08-05 08:44:02,571][00332] Num frames 900... [2024-08-05 08:44:02,744][00332] Num frames 1000... [2024-08-05 08:44:02,919][00332] Num frames 1100... [2024-08-05 08:44:03,092][00332] Num frames 1200... [2024-08-05 08:44:03,248][00332] Num frames 1300... [2024-08-05 08:44:03,365][00332] Num frames 1400... [2024-08-05 08:44:03,431][00332] Avg episode rewards: #0: 13.040, true rewards: #0: 7.040 [2024-08-05 08:44:03,433][00332] Avg episode reward: 13.040, avg true_objective: 7.040 [2024-08-05 08:44:03,555][00332] Num frames 1500... 
[2024-08-05 08:44:03,672][00332] Num frames 1600... [2024-08-05 08:44:03,790][00332] Num frames 1700... [2024-08-05 08:44:03,918][00332] Num frames 1800... [2024-08-05 08:44:04,037][00332] Num frames 1900... [2024-08-05 08:44:04,156][00332] Num frames 2000... [2024-08-05 08:44:04,275][00332] Num frames 2100... [2024-08-05 08:44:04,395][00332] Num frames 2200... [2024-08-05 08:44:04,522][00332] Num frames 2300... [2024-08-05 08:44:04,642][00332] Num frames 2400... [2024-08-05 08:44:04,763][00332] Num frames 2500... [2024-08-05 08:44:04,885][00332] Num frames 2600... [2024-08-05 08:44:04,944][00332] Avg episode rewards: #0: 17.670, true rewards: #0: 8.670 [2024-08-05 08:44:04,945][00332] Avg episode reward: 17.670, avg true_objective: 8.670 [2024-08-05 08:44:05,062][00332] Num frames 2700... [2024-08-05 08:44:05,177][00332] Num frames 2800... [2024-08-05 08:44:05,293][00332] Num frames 2900... [2024-08-05 08:44:05,414][00332] Num frames 3000... [2024-08-05 08:44:05,536][00332] Num frames 3100... [2024-08-05 08:44:05,653][00332] Num frames 3200... [2024-08-05 08:44:05,772][00332] Num frames 3300... [2024-08-05 08:44:05,894][00332] Num frames 3400... [2024-08-05 08:44:06,058][00332] Avg episode rewards: #0: 18.743, true rewards: #0: 8.742 [2024-08-05 08:44:06,060][00332] Avg episode reward: 18.743, avg true_objective: 8.742 [2024-08-05 08:44:06,066][00332] Num frames 3500... [2024-08-05 08:44:06,185][00332] Num frames 3600... [2024-08-05 08:44:06,302][00332] Num frames 3700... [2024-08-05 08:44:06,421][00332] Num frames 3800... [2024-08-05 08:44:06,546][00332] Num frames 3900... [2024-08-05 08:44:06,671][00332] Num frames 4000... [2024-08-05 08:44:06,790][00332] Num frames 4100... [2024-08-05 08:44:06,916][00332] Num frames 4200... [2024-08-05 08:44:07,032][00332] Num frames 4300... [2024-08-05 08:44:07,155][00332] Num frames 4400... [2024-08-05 08:44:07,279][00332] Avg episode rewards: #0: 19.114, true rewards: #0: 8.914 [2024-08-05 08:44:07,280][00332] Avg episode reward: 19.114, avg true_objective: 8.914 [2024-08-05 08:44:07,332][00332] Num frames 4500... [2024-08-05 08:44:07,448][00332] Num frames 4600... [2024-08-05 08:44:07,569][00332] Num frames 4700... [2024-08-05 08:44:07,690][00332] Num frames 4800... [2024-08-05 08:44:07,814][00332] Num frames 4900... [2024-08-05 08:44:07,939][00332] Num frames 5000... [2024-08-05 08:44:08,055][00332] Num frames 5100... [2024-08-05 08:44:08,177][00332] Num frames 5200... [2024-08-05 08:44:08,292][00332] Num frames 5300... [2024-08-05 08:44:08,413][00332] Num frames 5400... [2024-08-05 08:44:08,489][00332] Avg episode rewards: #0: 19.195, true rewards: #0: 9.028 [2024-08-05 08:44:08,491][00332] Avg episode reward: 19.195, avg true_objective: 9.028 [2024-08-05 08:44:08,594][00332] Num frames 5500... [2024-08-05 08:44:08,722][00332] Num frames 5600... [2024-08-05 08:44:08,838][00332] Num frames 5700... [2024-08-05 08:44:08,962][00332] Num frames 5800... [2024-08-05 08:44:09,077][00332] Num frames 5900... [2024-08-05 08:44:09,197][00332] Num frames 6000... [2024-08-05 08:44:09,346][00332] Avg episode rewards: #0: 18.402, true rewards: #0: 8.687 [2024-08-05 08:44:09,348][00332] Avg episode reward: 18.402, avg true_objective: 8.687 [2024-08-05 08:44:09,375][00332] Num frames 6100... [2024-08-05 08:44:09,490][00332] Num frames 6200... [2024-08-05 08:44:09,606][00332] Num frames 6300... [2024-08-05 08:44:09,729][00332] Num frames 6400... [2024-08-05 08:44:09,845][00332] Num frames 6500... [2024-08-05 08:44:09,968][00332] Num frames 6600... 
[2024-08-05 08:44:10,100][00332] Num frames 6700... [2024-08-05 08:44:10,219][00332] Num frames 6800... [2024-08-05 08:44:10,336][00332] Num frames 6900... [2024-08-05 08:44:10,455][00332] Num frames 7000... [2024-08-05 08:44:10,524][00332] Avg episode rewards: #0: 18.886, true rewards: #0: 8.761 [2024-08-05 08:44:10,525][00332] Avg episode reward: 18.886, avg true_objective: 8.761 [2024-08-05 08:44:10,633][00332] Num frames 7100... [2024-08-05 08:44:10,757][00332] Num frames 7200... [2024-08-05 08:44:10,881][00332] Num frames 7300... [2024-08-05 08:44:10,998][00332] Num frames 7400... [2024-08-05 08:44:11,114][00332] Num frames 7500... [2024-08-05 08:44:11,228][00332] Num frames 7600... [2024-08-05 08:44:11,288][00332] Avg episode rewards: #0: 17.892, true rewards: #0: 8.448 [2024-08-05 08:44:11,290][00332] Avg episode reward: 17.892, avg true_objective: 8.448 [2024-08-05 08:44:11,403][00332] Num frames 7700... [2024-08-05 08:44:11,525][00332] Num frames 7800... [2024-08-05 08:44:11,639][00332] Num frames 7900... [2024-08-05 08:44:11,766][00332] Num frames 8000... [2024-08-05 08:44:11,897][00332] Num frames 8100... [2024-08-05 08:44:12,014][00332] Num frames 8200... [2024-08-05 08:44:12,132][00332] Num frames 8300... [2024-08-05 08:44:12,249][00332] Num frames 8400... [2024-08-05 08:44:12,368][00332] Num frames 8500... [2024-08-05 08:44:12,491][00332] Num frames 8600... [2024-08-05 08:44:12,607][00332] Num frames 8700... [2024-08-05 08:44:12,733][00332] Num frames 8800... [2024-08-05 08:44:12,853][00332] Num frames 8900... [2024-08-05 08:44:12,978][00332] Num frames 9000... [2024-08-05 08:44:13,096][00332] Num frames 9100... [2024-08-05 08:44:13,221][00332] Avg episode rewards: #0: 20.158, true rewards: #0: 9.158 [2024-08-05 08:44:13,225][00332] Avg episode reward: 20.158, avg true_objective: 9.158 [2024-08-05 08:45:08,896][00332] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-05 08:46:53,398][00332] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-05 08:46:53,400][00332] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-05 08:46:53,402][00332] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-05 08:46:53,404][00332] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-05 08:46:53,405][00332] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-05 08:46:53,407][00332] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-05 08:46:53,409][00332] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-05 08:46:53,410][00332] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-05 08:46:53,411][00332] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-05 08:46:53,412][00332] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-05 08:46:53,413][00332] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-05 08:46:53,414][00332] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-05 08:46:53,415][00332] Adding new argument 'train_script'=None that is not in the saved config file! 
[2024-08-05 08:46:53,416][00332] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-05 08:46:53,417][00332] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-05 08:46:53,428][00332] RunningMeanStd input shape: (3, 72, 128) [2024-08-05 08:46:53,430][00332] RunningMeanStd input shape: (1,) [2024-08-05 08:46:53,449][00332] ConvEncoder: input_channels=3 [2024-08-05 08:46:53,485][00332] Conv encoder output size: 512 [2024-08-05 08:46:53,487][00332] Policy head output size: 512 [2024-08-05 08:46:53,505][00332] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-05 08:46:53,988][00332] Num frames 100... [2024-08-05 08:46:54,109][00332] Num frames 200... [2024-08-05 08:46:54,223][00332] Num frames 300... [2024-08-05 08:46:54,343][00332] Num frames 400... [2024-08-05 08:46:54,462][00332] Num frames 500... [2024-08-05 08:46:54,605][00332] Avg episode rewards: #0: 9.760, true rewards: #0: 5.760 [2024-08-05 08:46:54,607][00332] Avg episode reward: 9.760, avg true_objective: 5.760 [2024-08-05 08:46:54,638][00332] Num frames 600... [2024-08-05 08:46:54,756][00332] Num frames 700... [2024-08-05 08:46:54,883][00332] Num frames 800... [2024-08-05 08:46:54,999][00332] Num frames 900... [2024-08-05 08:46:55,116][00332] Num frames 1000... [2024-08-05 08:46:55,233][00332] Num frames 1100... [2024-08-05 08:46:55,353][00332] Num frames 1200... [2024-08-05 08:46:55,480][00332] Num frames 1300... [2024-08-05 08:46:55,599][00332] Num frames 1400... [2024-08-05 08:46:55,719][00332] Num frames 1500... [2024-08-05 08:46:55,843][00332] Num frames 1600... [2024-08-05 08:46:55,895][00332] Avg episode rewards: #0: 18.500, true rewards: #0: 8.000 [2024-08-05 08:46:55,897][00332] Avg episode reward: 18.500, avg true_objective: 8.000 [2024-08-05 08:46:56,010][00332] Num frames 1700... [2024-08-05 08:46:56,128][00332] Num frames 1800... [2024-08-05 08:46:56,250][00332] Num frames 1900... [2024-08-05 08:46:56,367][00332] Num frames 2000... [2024-08-05 08:46:56,491][00332] Num frames 2100... [2024-08-05 08:46:56,610][00332] Num frames 2200... [2024-08-05 08:46:56,729][00332] Num frames 2300... [2024-08-05 08:46:56,843][00332] Num frames 2400... [2024-08-05 08:46:56,967][00332] Num frames 2500... [2024-08-05 08:46:57,127][00332] Avg episode rewards: #0: 18.640, true rewards: #0: 8.640 [2024-08-05 08:46:57,129][00332] Avg episode reward: 18.640, avg true_objective: 8.640 [2024-08-05 08:46:57,141][00332] Num frames 2600... [2024-08-05 08:46:57,259][00332] Num frames 2700... [2024-08-05 08:46:57,377][00332] Num frames 2800... [2024-08-05 08:46:57,501][00332] Num frames 2900... [2024-08-05 08:46:57,618][00332] Num frames 3000... [2024-08-05 08:46:57,737][00332] Num frames 3100... [2024-08-05 08:46:57,856][00332] Num frames 3200... [2024-08-05 08:46:57,978][00332] Num frames 3300... [2024-08-05 08:46:58,094][00332] Num frames 3400... [2024-08-05 08:46:58,212][00332] Num frames 3500... [2024-08-05 08:46:58,327][00332] Num frames 3600... [2024-08-05 08:46:58,431][00332] Avg episode rewards: #0: 19.603, true rewards: #0: 9.102 [2024-08-05 08:46:58,433][00332] Avg episode reward: 19.603, avg true_objective: 9.102 [2024-08-05 08:46:58,507][00332] Num frames 3700... [2024-08-05 08:46:58,662][00332] Num frames 3800... [2024-08-05 08:46:58,781][00332] Num frames 3900... [2024-08-05 08:46:58,916][00332] Num frames 4000... [2024-08-05 08:46:59,031][00332] Num frames 4100... 
[2024-08-05 08:46:59,147][00332] Num frames 4200... [2024-08-05 08:46:59,290][00332] Num frames 4300... [2024-08-05 08:46:59,455][00332] Num frames 4400... [2024-08-05 08:46:59,618][00332] Num frames 4500... [2024-08-05 08:46:59,774][00332] Num frames 4600... [2024-08-05 08:46:59,884][00332] Avg episode rewards: #0: 19.466, true rewards: #0: 9.266 [2024-08-05 08:46:59,885][00332] Avg episode reward: 19.466, avg true_objective: 9.266 [2024-08-05 08:46:59,988][00332] Num frames 4700... [2024-08-05 08:47:00,151][00332] Num frames 4800... [2024-08-05 08:47:00,315][00332] Num frames 4900... [2024-08-05 08:47:00,473][00332] Num frames 5000... [2024-08-05 08:47:00,641][00332] Num frames 5100... [2024-08-05 08:47:00,807][00332] Num frames 5200... [2024-08-05 08:47:00,975][00332] Num frames 5300... [2024-08-05 08:47:01,143][00332] Num frames 5400... [2024-08-05 08:47:01,312][00332] Num frames 5500... [2024-08-05 08:47:01,477][00332] Num frames 5600... [2024-08-05 08:47:01,601][00332] Num frames 5700... [2024-08-05 08:47:01,726][00332] Num frames 5800... [2024-08-05 08:47:01,841][00332] Num frames 5900... [2024-08-05 08:47:01,990][00332] Avg episode rewards: #0: 21.124, true rewards: #0: 9.957 [2024-08-05 08:47:01,991][00332] Avg episode reward: 21.124, avg true_objective: 9.957 [2024-08-05 08:47:02,027][00332] Num frames 6000... [2024-08-05 08:47:02,143][00332] Num frames 6100... [2024-08-05 08:47:02,263][00332] Num frames 6200... [2024-08-05 08:47:02,380][00332] Num frames 6300... [2024-08-05 08:47:02,495][00332] Num frames 6400... [2024-08-05 08:47:02,620][00332] Num frames 6500... [2024-08-05 08:47:02,735][00332] Num frames 6600... [2024-08-05 08:47:02,857][00332] Num frames 6700... [2024-08-05 08:47:02,979][00332] Num frames 6800... [2024-08-05 08:47:03,097][00332] Num frames 6900... [2024-08-05 08:47:03,220][00332] Num frames 7000... [2024-08-05 08:47:03,339][00332] Num frames 7100... [2024-08-05 08:47:03,458][00332] Num frames 7200... [2024-08-05 08:47:03,577][00332] Num frames 7300... [2024-08-05 08:47:03,700][00332] Num frames 7400... [2024-08-05 08:47:03,823][00332] Num frames 7500... [2024-08-05 08:47:03,948][00332] Num frames 7600... [2024-08-05 08:47:04,065][00332] Num frames 7700... [2024-08-05 08:47:04,184][00332] Num frames 7800... [2024-08-05 08:47:04,303][00332] Num frames 7900... [2024-08-05 08:47:04,422][00332] Num frames 8000... [2024-08-05 08:47:04,565][00332] Avg episode rewards: #0: 26.249, true rewards: #0: 11.534 [2024-08-05 08:47:04,566][00332] Avg episode reward: 26.249, avg true_objective: 11.534 [2024-08-05 08:47:04,598][00332] Num frames 8100... [2024-08-05 08:47:04,722][00332] Num frames 8200... [2024-08-05 08:47:04,838][00332] Num frames 8300... [2024-08-05 08:47:04,962][00332] Num frames 8400... [2024-08-05 08:47:05,083][00332] Avg episode rewards: #0: 23.447, true rewards: #0: 10.572 [2024-08-05 08:47:05,085][00332] Avg episode reward: 23.447, avg true_objective: 10.572 [2024-08-05 08:47:05,136][00332] Num frames 8500... [2024-08-05 08:47:05,252][00332] Num frames 8600... [2024-08-05 08:47:05,369][00332] Num frames 8700... [2024-08-05 08:47:05,490][00332] Num frames 8800... [2024-08-05 08:47:05,608][00332] Num frames 8900... [2024-08-05 08:47:05,730][00332] Num frames 9000... [2024-08-05 08:47:05,882][00332] Avg episode rewards: #0: 22.094, true rewards: #0: 10.094 [2024-08-05 08:47:05,883][00332] Avg episode reward: 22.094, avg true_objective: 10.094 [2024-08-05 08:47:05,903][00332] Num frames 9100... [2024-08-05 08:47:06,023][00332] Num frames 9200... 
[2024-08-05 08:47:06,139][00332] Num frames 9300... [2024-08-05 08:47:06,251][00332] Num frames 9400... [2024-08-05 08:47:06,369][00332] Num frames 9500... [2024-08-05 08:47:06,468][00332] Avg episode rewards: #0: 20.433, true rewards: #0: 9.533 [2024-08-05 08:47:06,469][00332] Avg episode reward: 20.433, avg true_objective: 9.533 [2024-08-05 08:47:33,337][00332] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-05 08:47:38,371][00332] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-05 08:47:38,372][00332] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-05 08:47:38,375][00332] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-05 08:47:38,377][00332] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-05 08:47:38,379][00332] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-05 08:47:38,381][00332] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-05 08:47:38,382][00332] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-05 08:47:38,384][00332] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-05 08:47:38,385][00332] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-05 08:47:38,386][00332] Adding new argument 'hf_repository'='Lyuhong/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-05 08:47:38,387][00332] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-05 08:47:38,388][00332] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-05 08:47:38,389][00332] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-05 08:47:38,390][00332] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-05 08:47:38,391][00332] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-05 08:47:38,413][00332] RunningMeanStd input shape: (3, 72, 128) [2024-08-05 08:47:38,414][00332] RunningMeanStd input shape: (1,) [2024-08-05 08:47:38,428][00332] ConvEncoder: input_channels=3 [2024-08-05 08:47:38,464][00332] Conv encoder output size: 512 [2024-08-05 08:47:38,465][00332] Policy head output size: 512 [2024-08-05 08:47:38,483][00332] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-05 08:47:38,977][00332] Num frames 100... [2024-08-05 08:47:39,093][00332] Num frames 200... [2024-08-05 08:47:39,209][00332] Num frames 300... [2024-08-05 08:47:39,329][00332] Num frames 400... [2024-08-05 08:47:39,453][00332] Num frames 500... [2024-08-05 08:47:39,571][00332] Num frames 600... [2024-08-05 08:47:39,689][00332] Num frames 700... [2024-08-05 08:47:39,818][00332] Num frames 800... [2024-08-05 08:47:39,949][00332] Num frames 900... [2024-08-05 08:47:40,079][00332] Num frames 1000... [2024-08-05 08:47:40,201][00332] Avg episode rewards: #0: 23.560, true rewards: #0: 10.560 [2024-08-05 08:47:40,203][00332] Avg episode reward: 23.560, avg true_objective: 10.560 [2024-08-05 08:47:40,258][00332] Num frames 1100... [2024-08-05 08:47:40,384][00332] Num frames 1200... [2024-08-05 08:47:40,511][00332] Num frames 1300... [2024-08-05 08:47:40,632][00332] Num frames 1400... 
[2024-08-05 08:47:40,752][00332] Num frames 1500... [2024-08-05 08:47:40,880][00332] Num frames 1600... [2024-08-05 08:47:40,999][00332] Num frames 1700... [2024-08-05 08:47:41,117][00332] Num frames 1800... [2024-08-05 08:47:41,247][00332] Num frames 1900... [2024-08-05 08:47:41,363][00332] Num frames 2000... [2024-08-05 08:47:41,476][00332] Avg episode rewards: #0: 22.740, true rewards: #0: 10.240 [2024-08-05 08:47:41,478][00332] Avg episode reward: 22.740, avg true_objective: 10.240 [2024-08-05 08:47:41,543][00332] Num frames 2100... [2024-08-05 08:47:41,663][00332] Num frames 2200... [2024-08-05 08:47:41,781][00332] Num frames 2300... [2024-08-05 08:47:41,914][00332] Num frames 2400... [2024-08-05 08:47:42,033][00332] Num frames 2500... [2024-08-05 08:47:42,152][00332] Num frames 2600... [2024-08-05 08:47:42,272][00332] Avg episode rewards: #0: 18.187, true rewards: #0: 8.853 [2024-08-05 08:47:42,274][00332] Avg episode reward: 18.187, avg true_objective: 8.853 [2024-08-05 08:47:42,328][00332] Num frames 2700... [2024-08-05 08:47:42,447][00332] Num frames 2800... [2024-08-05 08:47:42,562][00332] Num frames 2900... [2024-08-05 08:47:42,680][00332] Num frames 3000... [2024-08-05 08:47:42,798][00332] Num frames 3100... [2024-08-05 08:47:42,934][00332] Num frames 3200... [2024-08-05 08:47:43,053][00332] Num frames 3300... [2024-08-05 08:47:43,173][00332] Num frames 3400... [2024-08-05 08:47:43,293][00332] Num frames 3500... [2024-08-05 08:47:43,418][00332] Num frames 3600... [2024-08-05 08:47:43,538][00332] Num frames 3700... [2024-08-05 08:47:43,655][00332] Num frames 3800... [2024-08-05 08:47:43,777][00332] Num frames 3900... [2024-08-05 08:47:43,916][00332] Num frames 4000... [2024-08-05 08:47:44,039][00332] Num frames 4100... [2024-08-05 08:47:44,162][00332] Num frames 4200... [2024-08-05 08:47:44,282][00332] Num frames 4300... [2024-08-05 08:47:44,400][00332] Num frames 4400... [2024-08-05 08:47:44,548][00332] Avg episode rewards: #0: 26.440, true rewards: #0: 11.190 [2024-08-05 08:47:44,550][00332] Avg episode reward: 26.440, avg true_objective: 11.190 [2024-08-05 08:47:44,582][00332] Num frames 4500... [2024-08-05 08:47:44,699][00332] Num frames 4600... [2024-08-05 08:47:44,817][00332] Num frames 4700... [2024-08-05 08:47:44,946][00332] Num frames 4800... [2024-08-05 08:47:45,067][00332] Num frames 4900... [2024-08-05 08:47:45,185][00332] Num frames 5000... [2024-08-05 08:47:45,303][00332] Num frames 5100... [2024-08-05 08:47:45,422][00332] Num frames 5200... [2024-08-05 08:47:45,540][00332] Num frames 5300... [2024-08-05 08:47:45,656][00332] Num frames 5400... [2024-08-05 08:47:45,775][00332] Num frames 5500... [2024-08-05 08:47:45,902][00332] Num frames 5600... [2024-08-05 08:47:46,030][00332] Num frames 5700... [2024-08-05 08:47:46,150][00332] Num frames 5800... [2024-08-05 08:47:46,270][00332] Num frames 5900... [2024-08-05 08:47:46,388][00332] Num frames 6000... [2024-08-05 08:47:46,508][00332] Num frames 6100... [2024-08-05 08:47:46,568][00332] Avg episode rewards: #0: 29.604, true rewards: #0: 12.204 [2024-08-05 08:47:46,569][00332] Avg episode reward: 29.604, avg true_objective: 12.204 [2024-08-05 08:47:46,686][00332] Num frames 6200... [2024-08-05 08:47:46,804][00332] Num frames 6300... [2024-08-05 08:47:46,926][00332] Num frames 6400... [2024-08-05 08:47:47,044][00332] Num frames 6500... [2024-08-05 08:47:47,160][00332] Num frames 6600... [2024-08-05 08:47:47,277][00332] Num frames 6700... [2024-08-05 08:47:47,392][00332] Num frames 6800... 
[2024-08-05 08:47:47,510][00332] Num frames 6900... [2024-08-05 08:47:47,642][00332] Avg episode rewards: #0: 27.110, true rewards: #0: 11.610 [2024-08-05 08:47:47,644][00332] Avg episode reward: 27.110, avg true_objective: 11.610 [2024-08-05 08:47:47,688][00332] Num frames 7000... [2024-08-05 08:47:47,806][00332] Num frames 7100... [2024-08-05 08:47:47,930][00332] Num frames 7200... [2024-08-05 08:47:48,056][00332] Num frames 7300... [2024-08-05 08:47:48,216][00332] Num frames 7400... [2024-08-05 08:47:48,384][00332] Num frames 7500... [2024-08-05 08:47:48,544][00332] Num frames 7600... [2024-08-05 08:47:48,708][00332] Num frames 7700... [2024-08-05 08:47:48,924][00332] Avg episode rewards: #0: 25.854, true rewards: #0: 11.140 [2024-08-05 08:47:48,926][00332] Avg episode reward: 25.854, avg true_objective: 11.140 [2024-08-05 08:47:48,931][00332] Num frames 7800... [2024-08-05 08:47:49,101][00332] Num frames 7900... [2024-08-05 08:47:49,256][00332] Num frames 8000... [2024-08-05 08:47:49,421][00332] Num frames 8100... [2024-08-05 08:47:49,583][00332] Num frames 8200... [2024-08-05 08:47:49,754][00332] Num frames 8300... [2024-08-05 08:47:49,927][00332] Num frames 8400... [2024-08-05 08:47:50,102][00332] Num frames 8500... [2024-08-05 08:47:50,270][00332] Num frames 8600... [2024-08-05 08:47:50,428][00332] Num frames 8700... [2024-08-05 08:47:50,559][00332] Avg episode rewards: #0: 25.206, true rewards: #0: 10.956 [2024-08-05 08:47:50,560][00332] Avg episode reward: 25.206, avg true_objective: 10.956 [2024-08-05 08:47:50,603][00332] Num frames 8800... [2024-08-05 08:47:50,722][00332] Num frames 8900... [2024-08-05 08:47:50,839][00332] Num frames 9000... [2024-08-05 08:47:50,964][00332] Num frames 9100... [2024-08-05 08:47:51,080][00332] Num frames 9200... [2024-08-05 08:47:51,206][00332] Num frames 9300... [2024-08-05 08:47:51,325][00332] Num frames 9400... [2024-08-05 08:47:51,441][00332] Num frames 9500... [2024-08-05 08:47:51,605][00332] Avg episode rewards: #0: 23.997, true rewards: #0: 10.663 [2024-08-05 08:47:51,607][00332] Avg episode reward: 23.997, avg true_objective: 10.663 [2024-08-05 08:47:51,613][00332] Num frames 9600... [2024-08-05 08:47:51,739][00332] Num frames 9700... [2024-08-05 08:47:51,856][00332] Num frames 9800... [2024-08-05 08:47:51,983][00332] Num frames 9900... [2024-08-05 08:47:52,102][00332] Num frames 10000... [2024-08-05 08:47:52,225][00332] Num frames 10100... [2024-08-05 08:47:52,341][00332] Num frames 10200... [2024-08-05 08:47:52,456][00332] Num frames 10300... [2024-08-05 08:47:52,576][00332] Num frames 10400... [2024-08-05 08:47:52,694][00332] Avg episode rewards: #0: 23.354, true rewards: #0: 10.454 [2024-08-05 08:47:52,696][00332] Avg episode reward: 23.354, avg true_objective: 10.454 [2024-08-05 08:48:54,918][00332] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
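All three evaluation passes above follow the same recipe: reload config.json from the experiment directory, override it with the evaluation-only arguments reported as "Adding new argument ... that is not in the saved config file!" (no_render, save_video, max_num_episodes=10, and, for the last two runs, push_to_hub with an hf_repository), restore checkpoint_000000978_4005888.pth, roll ten episodes, and write replay.mp4. Below is a sketch of how such a run is typically launched with the sample-factory 2.x entry points parse_sf_args, parse_full_cfg, and enjoy; the env name is inferred from the repository name, and registration of the VizDoom environment is assumed to have been performed beforehand, as it was for training.

    # Sketch of the evaluation/push runs above, assuming sample-factory 2.x.
    # Registration of the Doom envs (done by the training setup) is assumed to
    # have run already; the flags mirror the overrides recorded in the log.
    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy

    argv = [
        "--env=doom_health_gathering_supreme",  # inferred from the repo name
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",      # "Overriding arg 'num_workers' with value 1"
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=Lyuhong/rl_course_vizdoom_health_gathering_supreme",
    ]
    # evaluation=True adds the eval-only arguments that the log reports as
    # not present in the saved config file.
    parser, _ = parse_sf_args(argv=argv, evaluation=True)
    cfg = parse_full_cfg(parser, argv)
    status = enjoy(cfg)  # rolls episodes, saves replay.mp4, pushes the model

With push_to_hub set, enjoy() prints the same per-episode statistics seen above (the "Num frames" ticks and the avg episode reward against the true objective) before saving the replay and uploading the checkpoint and video to the given repository.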