[2024-07-27 17:05:45,138][00473] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-07-27 17:05:45,142][00473] Rollout worker 0 uses device cpu
[2024-07-27 17:05:45,143][00473] Rollout worker 1 uses device cpu
[2024-07-27 17:05:45,145][00473] Rollout worker 2 uses device cpu
[2024-07-27 17:05:45,146][00473] Rollout worker 3 uses device cpu
[2024-07-27 17:05:45,148][00473] Rollout worker 4 uses device cpu
[2024-07-27 17:05:45,149][00473] Rollout worker 5 uses device cpu
[2024-07-27 17:05:45,150][00473] Rollout worker 6 uses device cpu
[2024-07-27 17:05:45,151][00473] Rollout worker 7 uses device cpu
[2024-07-27 17:05:45,305][00473] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-27 17:05:45,307][00473] InferenceWorker_p0-w0: min num requests: 2
[2024-07-27 17:05:45,339][00473] Starting all processes...
[2024-07-27 17:05:45,340][00473] Starting process learner_proc0
[2024-07-27 17:05:46,572][00473] Starting all processes...
[2024-07-27 17:05:46,582][00473] Starting process inference_proc0-0
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc0
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc1
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc2
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc3
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc4
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc5
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc6
[2024-07-27 17:05:46,583][00473] Starting process rollout_proc7
[2024-07-27 17:06:01,927][02681] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-27 17:06:01,928][02681] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-07-27 17:06:02,003][02681] Num visible devices: 1
[2024-07-27 17:06:02,030][02696] Worker 1 uses CPU cores [1]
[2024-07-27 17:06:02,052][02681] Starting seed is not provided
[2024-07-27 17:06:02,053][02681] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-27 17:06:02,053][02681] Initializing actor-critic model on device cuda:0
[2024-07-27 17:06:02,054][02681] RunningMeanStd input shape: (3, 72, 128)
[2024-07-27 17:06:02,057][02681] RunningMeanStd input shape: (1,)
[2024-07-27 17:06:02,071][02705] Worker 6 uses CPU cores [0]
[2024-07-27 17:06:02,133][02681] ConvEncoder: input_channels=3
[2024-07-27 17:06:02,170][02702] Worker 4 uses CPU cores [0]
[2024-07-27 17:06:02,315][02703] Worker 3 uses CPU cores [1]
[2024-07-27 17:06:02,329][02694] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-27 17:06:02,330][02694] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-07-27 17:06:02,396][02701] Worker 2 uses CPU cores [0]
[2024-07-27 17:06:02,406][02704] Worker 5 uses CPU cores [1]
[2024-07-27 17:06:02,413][02694] Num visible devices: 1
[2024-07-27 17:06:02,419][02695] Worker 0 uses CPU cores [0]
[2024-07-27 17:06:02,441][02706] Worker 7 uses CPU cores [1]
[2024-07-27 17:06:02,515][02681] Conv encoder output size: 512
[2024-07-27 17:06:02,515][02681] Policy head output size: 512
[2024-07-27 17:06:02,570][02681] Created Actor Critic model with architecture:
[2024-07-27 17:06:02,570][02681] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-07-27 17:06:02,834][02681] Using optimizer
[2024-07-27 17:06:03,620][02681] No checkpoints found
[2024-07-27 17:06:03,621][02681] Did not load from checkpoint, starting from scratch!
[2024-07-27 17:06:03,621][02681] Initialized policy 0 weights for model version 0
[2024-07-27 17:06:03,624][02681] LearnerWorker_p0 finished initialization!
[2024-07-27 17:06:03,625][02681] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-07-27 17:06:03,764][02694] RunningMeanStd input shape: (3, 72, 128)
[2024-07-27 17:06:03,765][02694] RunningMeanStd input shape: (1,)
[2024-07-27 17:06:03,778][02694] ConvEncoder: input_channels=3
[2024-07-27 17:06:03,882][02694] Conv encoder output size: 512
[2024-07-27 17:06:03,882][02694] Policy head output size: 512
[2024-07-27 17:06:03,936][00473] Inference worker 0-0 is ready!
[2024-07-27 17:06:03,938][00473] All inference workers are ready! Signal rollout workers to start!
[2024-07-27 17:06:04,166][02705] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:04,180][02695] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:04,206][02704] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:04,214][02703] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:04,230][02702] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:04,243][02696] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:04,272][02706] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:04,351][02701] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-07-27 17:06:05,298][00473] Heartbeat connected on Batcher_0
[2024-07-27 17:06:05,301][00473] Heartbeat connected on LearnerWorker_p0
[2024-07-27 17:06:05,350][00473] Heartbeat connected on InferenceWorker_p0-w0
[2024-07-27 17:06:05,423][00473] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-07-27 17:06:05,877][02701] Decorrelating experience for 0 frames...
[2024-07-27 17:06:05,877][02695] Decorrelating experience for 0 frames...
[2024-07-27 17:06:05,879][02705] Decorrelating experience for 0 frames...
[2024-07-27 17:06:05,924][02704] Decorrelating experience for 0 frames...
[2024-07-27 17:06:05,944][02703] Decorrelating experience for 0 frames...
[2024-07-27 17:06:05,964][02696] Decorrelating experience for 0 frames...
[2024-07-27 17:06:05,969][02706] Decorrelating experience for 0 frames...
[2024-07-27 17:06:06,734][02704] Decorrelating experience for 32 frames...
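The module dump above is easier to read as plain PyTorch. Below is an illustrative sketch of the same shared-weights actor-critic: three Conv2d+ELU blocks, a Linear+ELU projection to 512 units, a GRU(512, 512) core, a 1-unit value head and a 5-unit action-logit head. It is not Sample Factory's actual implementation; the convolution kernel sizes and strides (8/4, 4/2, 3/2) are assumptions, and the observation/return normalizers are omitted.

```python
# Illustrative re-creation of the actor-critic printed above (not Sample Factory's real classes).
# Only the layer names and sizes come from the log; kernel/stride values are assumed.
import torch
import torch.nn as nn


class SketchActorCritic(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden_size=512):
        super().__init__()
        c, h, w = obs_shape
        # conv_head: three Conv2d + ELU blocks, mirroring the printed ConvEncoderImpl
        self.conv_head = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened size instead of hard-coding it
            n_flat = self.conv_head(torch.zeros(1, c, h, w)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU projecting to the 512-dim "Conv encoder output size"
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden_size), nn.ELU())
        # core: single-layer GRU(512, 512), as in ModelCoreRNN
        self.core = nn.GRU(hidden_size, hidden_size)
        # heads: value estimate and logits over the 5 discrete actions seen in the log
        self.critic_linear = nn.Linear(hidden_size, 1)
        self.distribution_linear = nn.Linear(hidden_size, num_actions)

    def forward(self, obs, rnn_state=None):
        # obs: (batch, 3, 72, 128) frames; rnn_state: (1, batch, 512) or None
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state


if __name__ == "__main__":
    model = SketchActorCritic()
    logits, value, state = model(torch.zeros(4, 3, 72, 128))
    print(logits.shape, value.shape, state.shape)  # [4, 5], [4, 1], [1, 4, 512]
```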
[2024-07-27 17:06:06,766][02696] Decorrelating experience for 32 frames... [2024-07-27 17:06:07,452][02695] Decorrelating experience for 32 frames... [2024-07-27 17:06:07,456][02701] Decorrelating experience for 32 frames... [2024-07-27 17:06:07,716][02702] Decorrelating experience for 0 frames... [2024-07-27 17:06:07,766][02705] Decorrelating experience for 32 frames... [2024-07-27 17:06:08,500][02706] Decorrelating experience for 32 frames... [2024-07-27 17:06:09,028][02696] Decorrelating experience for 64 frames... [2024-07-27 17:06:09,371][02703] Decorrelating experience for 32 frames... [2024-07-27 17:06:09,623][02701] Decorrelating experience for 64 frames... [2024-07-27 17:06:09,640][02695] Decorrelating experience for 64 frames... [2024-07-27 17:06:10,423][00473] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:06:10,528][02704] Decorrelating experience for 64 frames... [2024-07-27 17:06:10,535][02702] Decorrelating experience for 32 frames... [2024-07-27 17:06:11,076][02705] Decorrelating experience for 64 frames... [2024-07-27 17:06:11,171][02696] Decorrelating experience for 96 frames... [2024-07-27 17:06:11,509][00473] Heartbeat connected on RolloutWorker_w1 [2024-07-27 17:06:11,561][02706] Decorrelating experience for 64 frames... [2024-07-27 17:06:11,628][02695] Decorrelating experience for 96 frames... [2024-07-27 17:06:12,118][00473] Heartbeat connected on RolloutWorker_w0 [2024-07-27 17:06:12,916][02704] Decorrelating experience for 96 frames... [2024-07-27 17:06:13,292][02705] Decorrelating experience for 96 frames... [2024-07-27 17:06:13,300][00473] Heartbeat connected on RolloutWorker_w5 [2024-07-27 17:06:13,623][00473] Heartbeat connected on RolloutWorker_w6 [2024-07-27 17:06:13,758][02703] Decorrelating experience for 64 frames... [2024-07-27 17:06:15,174][02703] Decorrelating experience for 96 frames... [2024-07-27 17:06:15,428][00473] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1.2. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:06:15,438][00473] Avg episode reward: [(0, '2.384')] [2024-07-27 17:06:15,722][00473] Heartbeat connected on RolloutWorker_w3 [2024-07-27 17:06:16,559][02702] Decorrelating experience for 64 frames... [2024-07-27 17:06:17,158][02681] Signal inference workers to stop experience collection... [2024-07-27 17:06:17,195][02701] Decorrelating experience for 96 frames... [2024-07-27 17:06:17,214][02694] InferenceWorker_p0-w0: stopping experience collection [2024-07-27 17:06:17,373][00473] Heartbeat connected on RolloutWorker_w2 [2024-07-27 17:06:17,758][02706] Decorrelating experience for 96 frames... [2024-07-27 17:06:17,992][02702] Decorrelating experience for 96 frames... [2024-07-27 17:06:18,076][00473] Heartbeat connected on RolloutWorker_w7 [2024-07-27 17:06:18,120][00473] Heartbeat connected on RolloutWorker_w4 [2024-07-27 17:06:19,595][02681] Signal inference workers to resume experience collection... [2024-07-27 17:06:19,597][02694] InferenceWorker_p0-w0: resuming experience collection [2024-07-27 17:06:20,423][00473] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 156.3. Samples: 2344. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-07-27 17:06:20,425][00473] Avg episode reward: [(0, '3.050')] [2024-07-27 17:06:25,424][00473] Fps is (10 sec: 2458.5, 60 sec: 1228.7, 300 sec: 1228.7). 
Total num frames: 24576. Throughput: 0: 340.8. Samples: 6816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:06:25,428][00473] Avg episode reward: [(0, '3.530')] [2024-07-27 17:06:30,423][00473] Fps is (10 sec: 3276.8, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 346.3. Samples: 8658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-27 17:06:30,428][00473] Avg episode reward: [(0, '3.838')] [2024-07-27 17:06:31,049][02694] Updated weights for policy 0, policy_version 10 (0.0199) [2024-07-27 17:06:35,423][00473] Fps is (10 sec: 3277.4, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 458.7. Samples: 13760. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-07-27 17:06:35,425][00473] Avg episode reward: [(0, '4.404')] [2024-07-27 17:06:40,423][00473] Fps is (10 sec: 4096.0, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 581.0. Samples: 20336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:06:40,425][00473] Avg episode reward: [(0, '4.570')] [2024-07-27 17:06:40,734][02694] Updated weights for policy 0, policy_version 20 (0.0020) [2024-07-27 17:06:45,425][00473] Fps is (10 sec: 3685.7, 60 sec: 2355.1, 300 sec: 2355.1). Total num frames: 94208. Throughput: 0: 575.5. Samples: 23022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:06:45,429][00473] Avg episode reward: [(0, '4.529')] [2024-07-27 17:06:50,424][00473] Fps is (10 sec: 2866.8, 60 sec: 2366.5, 300 sec: 2366.5). Total num frames: 106496. Throughput: 0: 586.6. Samples: 26400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:06:50,432][00473] Avg episode reward: [(0, '4.500')] [2024-07-27 17:06:50,444][02681] Saving new best policy, reward=4.500! [2024-07-27 17:06:53,432][02694] Updated weights for policy 0, policy_version 30 (0.0029) [2024-07-27 17:06:55,423][00473] Fps is (10 sec: 3277.4, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 126976. Throughput: 0: 730.1. Samples: 32854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:06:55,425][00473] Avg episode reward: [(0, '4.430')] [2024-07-27 17:07:00,423][00473] Fps is (10 sec: 4096.5, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 147456. Throughput: 0: 798.7. Samples: 35948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:07:00,425][00473] Avg episode reward: [(0, '4.413')] [2024-07-27 17:07:05,423][00473] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 159744. Throughput: 0: 842.0. Samples: 40234. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:07:05,430][00473] Avg episode reward: [(0, '4.371')] [2024-07-27 17:07:05,609][02694] Updated weights for policy 0, policy_version 40 (0.0026) [2024-07-27 17:07:10,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 2772.7). Total num frames: 180224. Throughput: 0: 871.0. Samples: 46010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:07:10,425][00473] Avg episode reward: [(0, '4.442')] [2024-07-27 17:07:15,106][02694] Updated weights for policy 0, policy_version 50 (0.0022) [2024-07-27 17:07:15,423][00473] Fps is (10 sec: 4505.6, 60 sec: 3413.7, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 903.6. Samples: 49320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:07:15,428][00473] Avg episode reward: [(0, '4.606')] [2024-07-27 17:07:15,433][02681] Saving new best policy, reward=4.606! [2024-07-27 17:07:20,423][00473] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 2894.5). 
Total num frames: 217088. Throughput: 0: 906.3. Samples: 54542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:07:20,429][00473] Avg episode reward: [(0, '4.598')] [2024-07-27 17:07:25,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 2969.6). Total num frames: 237568. Throughput: 0: 868.4. Samples: 59416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:07:25,425][00473] Avg episode reward: [(0, '4.452')] [2024-07-27 17:07:27,342][02694] Updated weights for policy 0, policy_version 60 (0.0021) [2024-07-27 17:07:30,423][00473] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3035.9). Total num frames: 258048. Throughput: 0: 878.3. Samples: 62544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:07:30,429][00473] Avg episode reward: [(0, '4.553')] [2024-07-27 17:07:35,423][00473] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3049.2). Total num frames: 274432. Throughput: 0: 941.0. Samples: 68744. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:07:35,431][00473] Avg episode reward: [(0, '4.513')] [2024-07-27 17:07:39,288][02694] Updated weights for policy 0, policy_version 70 (0.0035) [2024-07-27 17:07:40,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3018.1). Total num frames: 286720. Throughput: 0: 885.9. Samples: 72720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:07:40,425][00473] Avg episode reward: [(0, '4.307')] [2024-07-27 17:07:40,478][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth... [2024-07-27 17:07:45,423][00473] Fps is (10 sec: 3686.6, 60 sec: 3618.3, 300 sec: 3113.0). Total num frames: 311296. Throughput: 0: 885.6. Samples: 75802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:07:45,425][00473] Avg episode reward: [(0, '4.235')] [2024-07-27 17:07:49,129][02694] Updated weights for policy 0, policy_version 80 (0.0021) [2024-07-27 17:07:50,425][00473] Fps is (10 sec: 4504.3, 60 sec: 3754.6, 300 sec: 3159.7). Total num frames: 331776. Throughput: 0: 936.7. Samples: 82386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:07:50,428][00473] Avg episode reward: [(0, '4.510')] [2024-07-27 17:07:55,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3127.9). Total num frames: 344064. Throughput: 0: 909.8. Samples: 86950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:07:55,425][00473] Avg episode reward: [(0, '4.469')] [2024-07-27 17:08:00,423][00473] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3169.9). Total num frames: 364544. Throughput: 0: 884.7. Samples: 89130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:08:00,425][00473] Avg episode reward: [(0, '4.396')] [2024-07-27 17:08:01,229][02694] Updated weights for policy 0, policy_version 90 (0.0036) [2024-07-27 17:08:05,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3208.5). Total num frames: 385024. Throughput: 0: 917.7. Samples: 95838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:08:05,425][00473] Avg episode reward: [(0, '4.338')] [2024-07-27 17:08:10,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3211.3). Total num frames: 401408. Throughput: 0: 933.5. Samples: 101424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:08:10,426][00473] Avg episode reward: [(0, '4.413')] [2024-07-27 17:08:12,147][02694] Updated weights for policy 0, policy_version 100 (0.0017) [2024-07-27 17:08:15,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3213.8). 
Total num frames: 417792. Throughput: 0: 908.2. Samples: 103414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:08:15,426][00473] Avg episode reward: [(0, '4.540')] [2024-07-27 17:08:20,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3246.5). Total num frames: 438272. Throughput: 0: 901.6. Samples: 109314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:08:20,429][00473] Avg episode reward: [(0, '4.546')] [2024-07-27 17:08:22,483][02694] Updated weights for policy 0, policy_version 110 (0.0029) [2024-07-27 17:08:25,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 458752. Throughput: 0: 960.8. Samples: 115954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:08:25,430][00473] Avg episode reward: [(0, '4.348')] [2024-07-27 17:08:30,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 475136. Throughput: 0: 935.5. Samples: 117900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:08:30,429][00473] Avg episode reward: [(0, '4.318')] [2024-07-27 17:08:34,971][02694] Updated weights for policy 0, policy_version 120 (0.0044) [2024-07-27 17:08:35,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3276.8). Total num frames: 491520. Throughput: 0: 891.9. Samples: 122518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:08:35,428][00473] Avg episode reward: [(0, '4.599')] [2024-07-27 17:08:40,423][00473] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3329.6). Total num frames: 516096. Throughput: 0: 934.4. Samples: 129000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:08:40,425][00473] Avg episode reward: [(0, '4.738')] [2024-07-27 17:08:40,436][02681] Saving new best policy, reward=4.738! [2024-07-27 17:08:45,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3302.4). Total num frames: 528384. Throughput: 0: 950.8. Samples: 131916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:08:45,425][00473] Avg episode reward: [(0, '4.622')] [2024-07-27 17:08:45,777][02694] Updated weights for policy 0, policy_version 130 (0.0022) [2024-07-27 17:08:50,423][00473] Fps is (10 sec: 2867.3, 60 sec: 3550.0, 300 sec: 3301.6). Total num frames: 544768. Throughput: 0: 890.3. Samples: 135900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:08:50,425][00473] Avg episode reward: [(0, '4.610')] [2024-07-27 17:08:55,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3325.0). Total num frames: 565248. Throughput: 0: 896.8. Samples: 141780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:08:55,425][00473] Avg episode reward: [(0, '4.508')] [2024-07-27 17:08:56,941][02694] Updated weights for policy 0, policy_version 140 (0.0036) [2024-07-27 17:09:00,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3347.0). Total num frames: 585728. Throughput: 0: 924.4. Samples: 145010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:09:00,427][00473] Avg episode reward: [(0, '4.482')] [2024-07-27 17:09:05,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3322.3). Total num frames: 598016. Throughput: 0: 895.4. Samples: 149608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:09:05,429][00473] Avg episode reward: [(0, '4.515')] [2024-07-27 17:09:09,641][02694] Updated weights for policy 0, policy_version 150 (0.0036) [2024-07-27 17:09:10,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3321.1). Total num frames: 614400. Throughput: 0: 860.0. 
Samples: 154652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:09:10,430][00473] Avg episode reward: [(0, '4.494')] [2024-07-27 17:09:15,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3363.0). Total num frames: 638976. Throughput: 0: 886.8. Samples: 157806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:09:15,427][00473] Avg episode reward: [(0, '4.495')] [2024-07-27 17:09:20,064][02694] Updated weights for policy 0, policy_version 160 (0.0026) [2024-07-27 17:09:20,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3360.8). Total num frames: 655360. Throughput: 0: 912.4. Samples: 163576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:09:20,428][00473] Avg episode reward: [(0, '4.510')] [2024-07-27 17:09:25,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3338.2). Total num frames: 667648. Throughput: 0: 860.1. Samples: 167706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:09:25,425][00473] Avg episode reward: [(0, '4.481')] [2024-07-27 17:09:30,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3356.7). Total num frames: 688128. Throughput: 0: 867.7. Samples: 170964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:09:30,428][00473] Avg episode reward: [(0, '4.536')] [2024-07-27 17:09:31,567][02694] Updated weights for policy 0, policy_version 170 (0.0019) [2024-07-27 17:09:35,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3374.3). Total num frames: 708608. Throughput: 0: 920.9. Samples: 177340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:09:35,429][00473] Avg episode reward: [(0, '4.587')] [2024-07-27 17:09:40,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3372.1). Total num frames: 724992. Throughput: 0: 887.6. Samples: 181722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:09:40,425][00473] Avg episode reward: [(0, '4.596')] [2024-07-27 17:09:40,437][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth... [2024-07-27 17:09:43,894][02694] Updated weights for policy 0, policy_version 180 (0.0035) [2024-07-27 17:09:45,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3369.9). Total num frames: 741376. Throughput: 0: 867.1. Samples: 184030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:09:45,425][00473] Avg episode reward: [(0, '4.607')] [2024-07-27 17:09:50,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3386.0). Total num frames: 761856. Throughput: 0: 904.2. Samples: 190298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:09:50,425][00473] Avg episode reward: [(0, '4.604')] [2024-07-27 17:09:54,605][02694] Updated weights for policy 0, policy_version 190 (0.0020) [2024-07-27 17:09:55,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3383.7). Total num frames: 778240. Throughput: 0: 907.6. Samples: 195496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:09:55,425][00473] Avg episode reward: [(0, '4.504')] [2024-07-27 17:10:00,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3363.9). Total num frames: 790528. Throughput: 0: 880.2. Samples: 197414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:10:00,432][00473] Avg episode reward: [(0, '4.468')] [2024-07-27 17:10:05,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3396.3). Total num frames: 815104. Throughput: 0: 874.7. Samples: 202938. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:10:05,427][00473] Avg episode reward: [(0, '4.530')] [2024-07-27 17:10:06,508][02694] Updated weights for policy 0, policy_version 200 (0.0019) [2024-07-27 17:10:10,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3393.8). Total num frames: 831488. Throughput: 0: 920.0. Samples: 209108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:10:10,427][00473] Avg episode reward: [(0, '4.561')] [2024-07-27 17:10:15,424][00473] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3391.5). Total num frames: 847872. Throughput: 0: 890.0. Samples: 211016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:10:15,427][00473] Avg episode reward: [(0, '4.468')] [2024-07-27 17:10:18,636][02694] Updated weights for policy 0, policy_version 210 (0.0031) [2024-07-27 17:10:20,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3389.2). Total num frames: 864256. Throughput: 0: 856.8. Samples: 215896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:10:20,430][00473] Avg episode reward: [(0, '4.470')] [2024-07-27 17:10:25,423][00473] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3418.6). Total num frames: 888832. Throughput: 0: 904.7. Samples: 222432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:10:25,424][00473] Avg episode reward: [(0, '4.594')] [2024-07-27 17:10:28,758][02694] Updated weights for policy 0, policy_version 220 (0.0024) [2024-07-27 17:10:30,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3415.9). Total num frames: 905216. Throughput: 0: 913.9. Samples: 225156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:10:30,425][00473] Avg episode reward: [(0, '4.484')] [2024-07-27 17:10:35,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3398.2). Total num frames: 917504. Throughput: 0: 862.5. Samples: 229112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:10:35,429][00473] Avg episode reward: [(0, '4.488')] [2024-07-27 17:10:40,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3410.9). Total num frames: 937984. Throughput: 0: 890.0. Samples: 235548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:10:40,425][00473] Avg episode reward: [(0, '4.522')] [2024-07-27 17:10:40,492][02694] Updated weights for policy 0, policy_version 230 (0.0024) [2024-07-27 17:10:45,430][00473] Fps is (10 sec: 4093.2, 60 sec: 3617.7, 300 sec: 3423.0). Total num frames: 958464. Throughput: 0: 919.5. Samples: 238800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:10:45,438][00473] Avg episode reward: [(0, '4.583')] [2024-07-27 17:10:50,426][00473] Fps is (10 sec: 3275.5, 60 sec: 3481.4, 300 sec: 3406.1). Total num frames: 970752. Throughput: 0: 892.7. Samples: 243112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:10:50,429][00473] Avg episode reward: [(0, '4.486')] [2024-07-27 17:10:52,858][02694] Updated weights for policy 0, policy_version 240 (0.0040) [2024-07-27 17:10:55,427][00473] Fps is (10 sec: 3277.7, 60 sec: 3549.6, 300 sec: 3418.0). Total num frames: 991232. Throughput: 0: 878.8. Samples: 248656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:10:55,431][00473] Avg episode reward: [(0, '4.616')] [2024-07-27 17:11:00,423][00473] Fps is (10 sec: 4097.4, 60 sec: 3686.4, 300 sec: 3429.5). Total num frames: 1011712. Throughput: 0: 905.2. Samples: 251748. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:11:00,430][00473] Avg episode reward: [(0, '4.683')] [2024-07-27 17:11:03,561][02694] Updated weights for policy 0, policy_version 250 (0.0050) [2024-07-27 17:11:05,425][00473] Fps is (10 sec: 3687.2, 60 sec: 3549.8, 300 sec: 3485.0). Total num frames: 1028096. Throughput: 0: 908.5. Samples: 256782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:11:05,431][00473] Avg episode reward: [(0, '4.872')] [2024-07-27 17:11:05,434][02681] Saving new best policy, reward=4.872! [2024-07-27 17:11:10,423][00473] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3526.8). Total num frames: 1040384. Throughput: 0: 861.6. Samples: 261202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:11:10,428][00473] Avg episode reward: [(0, '4.892')] [2024-07-27 17:11:10,444][02681] Saving new best policy, reward=4.892! [2024-07-27 17:11:15,380][02694] Updated weights for policy 0, policy_version 260 (0.0021) [2024-07-27 17:11:15,423][00473] Fps is (10 sec: 3687.1, 60 sec: 3618.2, 300 sec: 3596.1). Total num frames: 1064960. Throughput: 0: 868.4. Samples: 264236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:11:15,427][00473] Avg episode reward: [(0, '4.919')] [2024-07-27 17:11:15,429][02681] Saving new best policy, reward=4.919! [2024-07-27 17:11:20,424][00473] Fps is (10 sec: 4095.3, 60 sec: 3618.0, 300 sec: 3582.3). Total num frames: 1081344. Throughput: 0: 914.3. Samples: 270258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:11:20,428][00473] Avg episode reward: [(0, '4.719')] [2024-07-27 17:11:25,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3582.3). Total num frames: 1093632. Throughput: 0: 858.8. Samples: 274194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:11:25,426][00473] Avg episode reward: [(0, '4.506')] [2024-07-27 17:11:27,913][02694] Updated weights for policy 0, policy_version 270 (0.0042) [2024-07-27 17:11:30,423][00473] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 1114112. Throughput: 0: 851.4. Samples: 277106. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:11:30,428][00473] Avg episode reward: [(0, '4.616')] [2024-07-27 17:11:35,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1134592. Throughput: 0: 898.8. Samples: 283554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:11:35,430][00473] Avg episode reward: [(0, '4.641')] [2024-07-27 17:11:38,517][02694] Updated weights for policy 0, policy_version 280 (0.0030) [2024-07-27 17:11:40,425][00473] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3582.3). Total num frames: 1150976. Throughput: 0: 880.8. Samples: 288292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:11:40,429][00473] Avg episode reward: [(0, '4.609')] [2024-07-27 17:11:40,441][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000281_1150976.pth... [2024-07-27 17:11:40,607][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth [2024-07-27 17:11:45,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3482.0, 300 sec: 3596.2). Total num frames: 1167360. Throughput: 0: 855.9. Samples: 290264. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:11:45,431][00473] Avg episode reward: [(0, '4.729')] [2024-07-27 17:11:49,691][02694] Updated weights for policy 0, policy_version 290 (0.0048) [2024-07-27 17:11:50,423][00473] Fps is (10 sec: 3687.1, 60 sec: 3618.4, 300 sec: 3596.1). Total num frames: 1187840. Throughput: 0: 888.0. Samples: 296742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:11:50,427][00473] Avg episode reward: [(0, '4.641')] [2024-07-27 17:11:55,423][00473] Fps is (10 sec: 4095.7, 60 sec: 3618.3, 300 sec: 3596.1). Total num frames: 1208320. Throughput: 0: 917.9. Samples: 302510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:11:55,426][00473] Avg episode reward: [(0, '4.605')] [2024-07-27 17:12:00,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 1220608. Throughput: 0: 895.1. Samples: 304516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:12:00,425][00473] Avg episode reward: [(0, '4.658')] [2024-07-27 17:12:01,731][02694] Updated weights for policy 0, policy_version 300 (0.0029) [2024-07-27 17:12:05,423][00473] Fps is (10 sec: 3686.7, 60 sec: 3618.3, 300 sec: 3610.0). Total num frames: 1245184. Throughput: 0: 885.1. Samples: 310088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:12:05,424][00473] Avg episode reward: [(0, '4.782')] [2024-07-27 17:12:10,426][00473] Fps is (10 sec: 4503.9, 60 sec: 3754.4, 300 sec: 3596.1). Total num frames: 1265664. Throughput: 0: 944.5. Samples: 316698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:12:10,429][00473] Avg episode reward: [(0, '4.919')] [2024-07-27 17:12:11,439][02694] Updated weights for policy 0, policy_version 310 (0.0023) [2024-07-27 17:12:15,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 1277952. Throughput: 0: 926.0. Samples: 318774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:12:15,427][00473] Avg episode reward: [(0, '4.999')] [2024-07-27 17:12:15,431][02681] Saving new best policy, reward=4.999! [2024-07-27 17:12:20,423][00473] Fps is (10 sec: 2868.3, 60 sec: 3550.0, 300 sec: 3582.3). Total num frames: 1294336. Throughput: 0: 888.2. Samples: 323522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:12:20,425][00473] Avg episode reward: [(0, '4.789')] [2024-07-27 17:12:23,414][02694] Updated weights for policy 0, policy_version 320 (0.0021) [2024-07-27 17:12:25,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 1318912. Throughput: 0: 927.9. Samples: 330046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:12:25,427][00473] Avg episode reward: [(0, '4.828')] [2024-07-27 17:12:30,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 1335296. Throughput: 0: 949.7. Samples: 333002. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:12:30,427][00473] Avg episode reward: [(0, '4.670')] [2024-07-27 17:12:35,296][02694] Updated weights for policy 0, policy_version 330 (0.0029) [2024-07-27 17:12:35,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1351680. Throughput: 0: 896.3. Samples: 337074. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:12:35,426][00473] Avg episode reward: [(0, '4.727')] [2024-07-27 17:12:40,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3596.1). Total num frames: 1372160. Throughput: 0: 907.1. Samples: 343328. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:12:40,428][00473] Avg episode reward: [(0, '4.756')] [2024-07-27 17:12:44,814][02694] Updated weights for policy 0, policy_version 340 (0.0017) [2024-07-27 17:12:45,425][00473] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3596.1). Total num frames: 1392640. Throughput: 0: 935.1. Samples: 346596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:12:45,431][00473] Avg episode reward: [(0, '4.642')] [2024-07-27 17:12:50,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1404928. Throughput: 0: 917.2. Samples: 351364. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-07-27 17:12:50,425][00473] Avg episode reward: [(0, '4.483')] [2024-07-27 17:12:55,424][00473] Fps is (10 sec: 3277.1, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1425408. Throughput: 0: 889.6. Samples: 356726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:12:55,427][00473] Avg episode reward: [(0, '4.444')] [2024-07-27 17:12:56,965][02694] Updated weights for policy 0, policy_version 350 (0.0041) [2024-07-27 17:13:00,423][00473] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3596.1). Total num frames: 1445888. Throughput: 0: 916.6. Samples: 360020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:13:00,429][00473] Avg episode reward: [(0, '4.606')] [2024-07-27 17:13:05,423][00473] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1462272. Throughput: 0: 937.0. Samples: 365688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:13:05,425][00473] Avg episode reward: [(0, '4.589')] [2024-07-27 17:13:09,396][02694] Updated weights for policy 0, policy_version 360 (0.0042) [2024-07-27 17:13:10,423][00473] Fps is (10 sec: 2867.3, 60 sec: 3481.8, 300 sec: 3582.3). Total num frames: 1474560. Throughput: 0: 885.6. Samples: 369900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:13:10,425][00473] Avg episode reward: [(0, '4.586')] [2024-07-27 17:13:15,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1499136. Throughput: 0: 890.8. Samples: 373086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:13:15,425][00473] Avg episode reward: [(0, '4.620')] [2024-07-27 17:13:18,710][02694] Updated weights for policy 0, policy_version 370 (0.0020) [2024-07-27 17:13:20,423][00473] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 1519616. Throughput: 0: 945.1. Samples: 379602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-07-27 17:13:20,435][00473] Avg episode reward: [(0, '4.439')] [2024-07-27 17:13:25,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1531904. Throughput: 0: 899.0. Samples: 383784. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:13:25,430][00473] Avg episode reward: [(0, '4.600')] [2024-07-27 17:13:30,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1552384. Throughput: 0: 881.7. Samples: 386270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:13:30,427][00473] Avg episode reward: [(0, '4.544')] [2024-07-27 17:13:31,210][02694] Updated weights for policy 0, policy_version 380 (0.0038) [2024-07-27 17:13:35,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1572864. Throughput: 0: 915.1. Samples: 392542. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:13:35,425][00473] Avg episode reward: [(0, '4.481')] [2024-07-27 17:13:40,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1589248. Throughput: 0: 910.3. Samples: 397690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:13:40,429][00473] Avg episode reward: [(0, '4.327')] [2024-07-27 17:13:40,445][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth... [2024-07-27 17:13:40,600][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth [2024-07-27 17:13:43,401][02694] Updated weights for policy 0, policy_version 390 (0.0039) [2024-07-27 17:13:45,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3582.3). Total num frames: 1601536. Throughput: 0: 880.7. Samples: 399652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:13:45,431][00473] Avg episode reward: [(0, '4.247')] [2024-07-27 17:13:50,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1622016. Throughput: 0: 884.2. Samples: 405478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:13:50,425][00473] Avg episode reward: [(0, '4.432')] [2024-07-27 17:13:53,328][02694] Updated weights for policy 0, policy_version 400 (0.0029) [2024-07-27 17:13:55,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 1642496. Throughput: 0: 927.4. Samples: 411632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:13:55,428][00473] Avg episode reward: [(0, '4.306')] [2024-07-27 17:14:00,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 1654784. Throughput: 0: 899.2. Samples: 413548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:14:00,432][00473] Avg episode reward: [(0, '4.465')] [2024-07-27 17:14:05,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 1675264. Throughput: 0: 865.8. Samples: 418562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:14:05,430][00473] Avg episode reward: [(0, '4.765')] [2024-07-27 17:14:05,736][02694] Updated weights for policy 0, policy_version 410 (0.0036) [2024-07-27 17:14:10,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1695744. Throughput: 0: 915.7. Samples: 424990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:14:10,428][00473] Avg episode reward: [(0, '5.112')] [2024-07-27 17:14:10,445][02681] Saving new best policy, reward=5.112! [2024-07-27 17:14:15,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1712128. Throughput: 0: 917.6. Samples: 427564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:14:15,432][00473] Avg episode reward: [(0, '5.122')] [2024-07-27 17:14:15,433][02681] Saving new best policy, reward=5.122! [2024-07-27 17:14:17,683][02694] Updated weights for policy 0, policy_version 420 (0.0022) [2024-07-27 17:14:20,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 1728512. Throughput: 0: 867.7. Samples: 431590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:14:20,427][00473] Avg episode reward: [(0, '4.918')] [2024-07-27 17:14:25,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1748992. Throughput: 0: 896.1. Samples: 438016. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:14:25,428][00473] Avg episode reward: [(0, '4.987')] [2024-07-27 17:14:27,733][02694] Updated weights for policy 0, policy_version 430 (0.0023) [2024-07-27 17:14:30,423][00473] Fps is (10 sec: 4095.7, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1769472. Throughput: 0: 923.8. Samples: 441222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:14:30,429][00473] Avg episode reward: [(0, '4.943')] [2024-07-27 17:14:35,423][00473] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 1781760. Throughput: 0: 889.6. Samples: 445510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:14:35,428][00473] Avg episode reward: [(0, '4.956')] [2024-07-27 17:14:39,899][02694] Updated weights for policy 0, policy_version 440 (0.0030) [2024-07-27 17:14:40,423][00473] Fps is (10 sec: 3277.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 1802240. Throughput: 0: 878.3. Samples: 451154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:14:40,425][00473] Avg episode reward: [(0, '4.795')] [2024-07-27 17:14:45,423][00473] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1822720. Throughput: 0: 908.2. Samples: 454416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:14:45,425][00473] Avg episode reward: [(0, '4.596')] [2024-07-27 17:14:50,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1839104. Throughput: 0: 913.2. Samples: 459658. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:14:50,425][00473] Avg episode reward: [(0, '4.751')] [2024-07-27 17:14:51,484][02694] Updated weights for policy 0, policy_version 450 (0.0026) [2024-07-27 17:14:55,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1855488. Throughput: 0: 875.9. Samples: 464406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:14:55,425][00473] Avg episode reward: [(0, '4.583')] [2024-07-27 17:15:00,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1875968. Throughput: 0: 889.6. Samples: 467594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:15:00,425][00473] Avg episode reward: [(0, '4.600')] [2024-07-27 17:15:01,527][02694] Updated weights for policy 0, policy_version 460 (0.0034) [2024-07-27 17:15:05,423][00473] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3610.0). Total num frames: 1896448. Throughput: 0: 942.8. Samples: 474014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:15:05,429][00473] Avg episode reward: [(0, '4.765')] [2024-07-27 17:15:10,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 1908736. Throughput: 0: 889.0. Samples: 478020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:15:10,428][00473] Avg episode reward: [(0, '4.700')] [2024-07-27 17:15:13,786][02694] Updated weights for policy 0, policy_version 470 (0.0024) [2024-07-27 17:15:15,423][00473] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1929216. Throughput: 0: 884.5. Samples: 481026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:15:15,425][00473] Avg episode reward: [(0, '4.700')] [2024-07-27 17:15:20,423][00473] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 1953792. Throughput: 0: 939.9. Samples: 487806. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:15:20,425][00473] Avg episode reward: [(0, '4.924')] [2024-07-27 17:15:24,155][02694] Updated weights for policy 0, policy_version 480 (0.0031) [2024-07-27 17:15:25,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1966080. Throughput: 0: 918.6. Samples: 492490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:15:25,427][00473] Avg episode reward: [(0, '4.842')] [2024-07-27 17:15:30,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 1986560. Throughput: 0: 893.4. Samples: 494618. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:15:30,429][00473] Avg episode reward: [(0, '4.768')] [2024-07-27 17:15:34,734][02694] Updated weights for policy 0, policy_version 490 (0.0023) [2024-07-27 17:15:35,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 2007040. Throughput: 0: 924.5. Samples: 501262. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:15:35,432][00473] Avg episode reward: [(0, '4.837')] [2024-07-27 17:15:40,424][00473] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3610.1). Total num frames: 2023424. Throughput: 0: 943.2. Samples: 506852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:15:40,426][00473] Avg episode reward: [(0, '4.941')] [2024-07-27 17:15:40,442][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth... [2024-07-27 17:15:40,613][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000281_1150976.pth [2024-07-27 17:15:45,423][00473] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3624.0). Total num frames: 2039808. Throughput: 0: 912.6. Samples: 508660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:15:45,425][00473] Avg episode reward: [(0, '4.865')] [2024-07-27 17:15:47,317][02694] Updated weights for policy 0, policy_version 500 (0.0024) [2024-07-27 17:15:50,423][00473] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3624.0). Total num frames: 2060288. Throughput: 0: 895.6. Samples: 514316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:15:50,427][00473] Avg episode reward: [(0, '4.636')] [2024-07-27 17:15:55,423][00473] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 2080768. Throughput: 0: 952.0. Samples: 520858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:15:55,427][00473] Avg episode reward: [(0, '4.920')] [2024-07-27 17:15:57,724][02694] Updated weights for policy 0, policy_version 510 (0.0025) [2024-07-27 17:16:00,425][00473] Fps is (10 sec: 3276.1, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 2093056. Throughput: 0: 931.0. Samples: 522924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:16:00,427][00473] Avg episode reward: [(0, '5.156')] [2024-07-27 17:16:00,439][02681] Saving new best policy, reward=5.156! [2024-07-27 17:16:05,423][00473] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2113536. Throughput: 0: 887.1. Samples: 527726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:16:05,429][00473] Avg episode reward: [(0, '4.938')] [2024-07-27 17:16:08,703][02694] Updated weights for policy 0, policy_version 520 (0.0017) [2024-07-27 17:16:10,423][00473] Fps is (10 sec: 4096.9, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 2134016. Throughput: 0: 927.9. Samples: 534244. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:16:10,426][00473] Avg episode reward: [(0, '4.756')] [2024-07-27 17:16:15,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2150400. Throughput: 0: 948.1. Samples: 537282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:16:15,424][00473] Avg episode reward: [(0, '4.861')] [2024-07-27 17:16:20,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 2166784. Throughput: 0: 891.6. Samples: 541382. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:16:20,425][00473] Avg episode reward: [(0, '4.858')] [2024-07-27 17:16:20,843][02694] Updated weights for policy 0, policy_version 530 (0.0016) [2024-07-27 17:16:25,423][00473] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2191360. Throughput: 0: 911.2. Samples: 547854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:16:25,425][00473] Avg episode reward: [(0, '4.827')] [2024-07-27 17:16:30,423][00473] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2211840. Throughput: 0: 944.7. Samples: 551170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:16:30,425][00473] Avg episode reward: [(0, '4.831')] [2024-07-27 17:16:30,446][02694] Updated weights for policy 0, policy_version 540 (0.0041) [2024-07-27 17:16:35,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2224128. Throughput: 0: 921.9. Samples: 555800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:16:35,428][00473] Avg episode reward: [(0, '4.757')] [2024-07-27 17:16:40,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3651.7). Total num frames: 2244608. Throughput: 0: 900.0. Samples: 561358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:16:40,425][00473] Avg episode reward: [(0, '4.667')] [2024-07-27 17:16:42,097][02694] Updated weights for policy 0, policy_version 550 (0.0024) [2024-07-27 17:16:45,423][00473] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 2265088. Throughput: 0: 929.2. Samples: 564734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:16:45,425][00473] Avg episode reward: [(0, '4.530')] [2024-07-27 17:16:50,423][00473] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2281472. Throughput: 0: 956.3. Samples: 570758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:16:50,428][00473] Avg episode reward: [(0, '4.711')] [2024-07-27 17:16:53,537][02694] Updated weights for policy 0, policy_version 560 (0.0044) [2024-07-27 17:16:55,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2297856. Throughput: 0: 910.4. Samples: 575210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:16:55,428][00473] Avg episode reward: [(0, '4.730')] [2024-07-27 17:17:00,423][00473] Fps is (10 sec: 4096.1, 60 sec: 3823.1, 300 sec: 3651.7). Total num frames: 2322432. Throughput: 0: 918.5. Samples: 578614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:17:00,425][00473] Avg episode reward: [(0, '4.876')] [2024-07-27 17:17:02,827][02694] Updated weights for policy 0, policy_version 570 (0.0030) [2024-07-27 17:17:05,423][00473] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 2342912. Throughput: 0: 980.0. Samples: 585484. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:17:05,431][00473] Avg episode reward: [(0, '5.094')] [2024-07-27 17:17:10,426][00473] Fps is (10 sec: 3275.9, 60 sec: 3686.2, 300 sec: 3651.7). Total num frames: 2355200. Throughput: 0: 932.5. Samples: 589820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:17:10,428][00473] Avg episode reward: [(0, '4.944')] [2024-07-27 17:17:14,768][02694] Updated weights for policy 0, policy_version 580 (0.0030) [2024-07-27 17:17:15,423][00473] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2375680. Throughput: 0: 916.8. Samples: 592424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:17:15,425][00473] Avg episode reward: [(0, '4.769')] [2024-07-27 17:17:20,423][00473] Fps is (10 sec: 4506.9, 60 sec: 3891.2, 300 sec: 3665.6). Total num frames: 2400256. Throughput: 0: 967.2. Samples: 599322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:17:20,427][00473] Avg episode reward: [(0, '4.788')] [2024-07-27 17:17:24,884][02694] Updated weights for policy 0, policy_version 590 (0.0029) [2024-07-27 17:17:25,426][00473] Fps is (10 sec: 4094.5, 60 sec: 3754.4, 300 sec: 3665.5). Total num frames: 2416640. Throughput: 0: 962.1. Samples: 604658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:17:25,428][00473] Avg episode reward: [(0, '4.732')] [2024-07-27 17:17:30,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2433024. Throughput: 0: 934.7. Samples: 606796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:17:30,424][00473] Avg episode reward: [(0, '4.660')] [2024-07-27 17:17:35,306][02694] Updated weights for policy 0, policy_version 600 (0.0019) [2024-07-27 17:17:35,426][00473] Fps is (10 sec: 4096.3, 60 sec: 3891.0, 300 sec: 3679.4). Total num frames: 2457600. Throughput: 0: 942.4. Samples: 613168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:17:35,428][00473] Avg episode reward: [(0, '4.789')] [2024-07-27 17:17:40,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3665.6). Total num frames: 2473984. Throughput: 0: 983.4. Samples: 619464. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:17:40,429][00473] Avg episode reward: [(0, '5.038')] [2024-07-27 17:17:40,447][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000604_2473984.pth... [2024-07-27 17:17:40,612][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth [2024-07-27 17:17:45,423][00473] Fps is (10 sec: 2868.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2486272. Throughput: 0: 949.2. Samples: 621326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:17:45,428][00473] Avg episode reward: [(0, '5.179')] [2024-07-27 17:17:45,438][02681] Saving new best policy, reward=5.179! [2024-07-27 17:17:47,763][02694] Updated weights for policy 0, policy_version 610 (0.0017) [2024-07-27 17:17:50,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2510848. Throughput: 0: 916.7. Samples: 626736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:17:50,429][00473] Avg episode reward: [(0, '5.870')] [2024-07-27 17:17:50,440][02681] Saving new best policy, reward=5.870! [2024-07-27 17:17:55,425][00473] Fps is (10 sec: 4504.7, 60 sec: 3891.1, 300 sec: 3679.4). Total num frames: 2531328. Throughput: 0: 965.8. Samples: 633280. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:17:55,429][00473] Avg episode reward: [(0, '6.357')] [2024-07-27 17:17:55,439][02681] Saving new best policy, reward=6.357! [2024-07-27 17:17:57,366][02694] Updated weights for policy 0, policy_version 620 (0.0025) [2024-07-27 17:18:00,423][00473] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2547712. Throughput: 0: 962.8. Samples: 635752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:18:00,430][00473] Avg episode reward: [(0, '6.328')] [2024-07-27 17:18:05,423][00473] Fps is (10 sec: 3277.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2564096. Throughput: 0: 908.8. Samples: 640220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:18:05,425][00473] Avg episode reward: [(0, '6.486')] [2024-07-27 17:18:05,427][02681] Saving new best policy, reward=6.486! [2024-07-27 17:18:09,136][02694] Updated weights for policy 0, policy_version 630 (0.0032) [2024-07-27 17:18:10,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3679.5). Total num frames: 2584576. Throughput: 0: 934.4. Samples: 646704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:18:10,425][00473] Avg episode reward: [(0, '6.601')] [2024-07-27 17:18:10,436][02681] Saving new best policy, reward=6.601! [2024-07-27 17:18:15,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2600960. Throughput: 0: 959.4. Samples: 649968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:18:15,427][00473] Avg episode reward: [(0, '6.772')] [2024-07-27 17:18:15,472][02681] Saving new best policy, reward=6.772! [2024-07-27 17:18:20,432][00473] Fps is (10 sec: 3273.9, 60 sec: 3617.6, 300 sec: 3679.3). Total num frames: 2617344. Throughput: 0: 910.4. Samples: 654140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:18:20,438][00473] Avg episode reward: [(0, '6.788')] [2024-07-27 17:18:20,452][02681] Saving new best policy, reward=6.788! [2024-07-27 17:18:21,260][02694] Updated weights for policy 0, policy_version 640 (0.0024) [2024-07-27 17:18:25,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 2637824. Throughput: 0: 902.5. Samples: 660076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:18:25,430][00473] Avg episode reward: [(0, '6.249')] [2024-07-27 17:18:30,423][00473] Fps is (10 sec: 4099.6, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2658304. Throughput: 0: 934.4. Samples: 663374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:18:30,425][00473] Avg episode reward: [(0, '6.396')] [2024-07-27 17:18:30,494][02694] Updated weights for policy 0, policy_version 650 (0.0022) [2024-07-27 17:18:35,423][00473] Fps is (10 sec: 3686.2, 60 sec: 3618.3, 300 sec: 3679.5). Total num frames: 2674688. Throughput: 0: 930.4. Samples: 668606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:18:35,426][00473] Avg episode reward: [(0, '6.827')] [2024-07-27 17:18:35,433][02681] Saving new best policy, reward=6.827! [2024-07-27 17:18:40,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2691072. Throughput: 0: 892.7. Samples: 673452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:18:40,425][00473] Avg episode reward: [(0, '6.968')] [2024-07-27 17:18:40,441][02681] Saving new best policy, reward=6.968! 
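As the log shows, the learner keeps a rolling set of checkpoints under /content/train_dir/default_experiment/checkpoint_p0/ (saving a new .pth file and removing the oldest) and additionally records a new best policy whenever the average episode reward improves. A minimal sketch for inspecting one of those files, assuming only that it is a regular torch-serialized object and without relying on any particular key layout:

```python
# Minimal sketch for inspecting one of the checkpoints referenced in the log.
# The path is taken from the log above; the dict layout is not assumed, only printed.
import torch

ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000604_2473984.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")

if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        if isinstance(value, dict):
            print(f"{key}: dict with {len(value)} entries")
        elif torch.is_tensor(value):
            print(f"{key}: tensor {tuple(value.shape)}")
        else:
            print(f"{key}: {type(value).__name__} = {value!r}")
else:
    print(type(ckpt))
```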
[2024-07-27 17:18:42,641][02694] Updated weights for policy 0, policy_version 660 (0.0026) [2024-07-27 17:18:45,423][00473] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2711552. Throughput: 0: 910.1. Samples: 676708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:18:45,425][00473] Avg episode reward: [(0, '6.988')] [2024-07-27 17:18:45,465][02681] Saving new best policy, reward=6.988! [2024-07-27 17:18:50,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2732032. Throughput: 0: 945.2. Samples: 682756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:18:50,425][00473] Avg episode reward: [(0, '6.813')] [2024-07-27 17:18:54,731][02694] Updated weights for policy 0, policy_version 670 (0.0025) [2024-07-27 17:18:55,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3693.3). Total num frames: 2744320. Throughput: 0: 890.7. Samples: 686786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:18:55,424][00473] Avg episode reward: [(0, '6.627')] [2024-07-27 17:19:00,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2768896. Throughput: 0: 889.6. Samples: 690000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:19:00,429][00473] Avg episode reward: [(0, '6.634')] [2024-07-27 17:19:04,023][02694] Updated weights for policy 0, policy_version 680 (0.0039) [2024-07-27 17:19:05,423][00473] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2789376. Throughput: 0: 949.5. Samples: 696858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:19:05,426][00473] Avg episode reward: [(0, '6.569')] [2024-07-27 17:19:10,427][00473] Fps is (10 sec: 3275.5, 60 sec: 3617.9, 300 sec: 3693.3). Total num frames: 2801664. Throughput: 0: 916.9. Samples: 701338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:19:10,429][00473] Avg episode reward: [(0, '6.382')] [2024-07-27 17:19:15,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2822144. Throughput: 0: 892.9. Samples: 703556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:19:15,427][00473] Avg episode reward: [(0, '6.476')] [2024-07-27 17:19:16,145][02694] Updated weights for policy 0, policy_version 690 (0.0023) [2024-07-27 17:19:20,423][00473] Fps is (10 sec: 4097.6, 60 sec: 3755.2, 300 sec: 3707.2). Total num frames: 2842624. Throughput: 0: 927.5. Samples: 710342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:19:20,428][00473] Avg episode reward: [(0, '6.381')] [2024-07-27 17:19:25,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 2859008. Throughput: 0: 948.0. Samples: 716114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:19:25,427][00473] Avg episode reward: [(0, '6.615')] [2024-07-27 17:19:26,928][02694] Updated weights for policy 0, policy_version 700 (0.0026) [2024-07-27 17:19:30,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2875392. Throughput: 0: 920.2. Samples: 718118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:19:30,424][00473] Avg episode reward: [(0, '7.176')] [2024-07-27 17:19:30,441][02681] Saving new best policy, reward=7.176! [2024-07-27 17:19:35,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2895872. Throughput: 0: 912.7. Samples: 723828. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:19:35,425][00473] Avg episode reward: [(0, '7.229')] [2024-07-27 17:19:35,472][02681] Saving new best policy, reward=7.229! [2024-07-27 17:19:37,354][02694] Updated weights for policy 0, policy_version 710 (0.0042) [2024-07-27 17:19:40,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2916352. Throughput: 0: 965.9. Samples: 730250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:19:40,425][00473] Avg episode reward: [(0, '7.483')] [2024-07-27 17:19:40,527][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000713_2920448.pth... [2024-07-27 17:19:40,711][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000494_2023424.pth [2024-07-27 17:19:40,730][02681] Saving new best policy, reward=7.483! [2024-07-27 17:19:45,423][00473] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2932736. Throughput: 0: 933.4. Samples: 732002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:19:45,425][00473] Avg episode reward: [(0, '7.620')] [2024-07-27 17:19:45,427][02681] Saving new best policy, reward=7.620! [2024-07-27 17:19:49,857][02694] Updated weights for policy 0, policy_version 720 (0.0041) [2024-07-27 17:19:50,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 2949120. Throughput: 0: 891.2. Samples: 736960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:19:50,430][00473] Avg episode reward: [(0, '7.794')] [2024-07-27 17:19:50,438][02681] Saving new best policy, reward=7.794! [2024-07-27 17:19:55,423][00473] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 2969600. Throughput: 0: 938.7. Samples: 743574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:19:55,425][00473] Avg episode reward: [(0, '7.726')] [2024-07-27 17:20:00,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2985984. Throughput: 0: 951.7. Samples: 746384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:20:00,425][00473] Avg episode reward: [(0, '7.987')] [2024-07-27 17:20:00,437][02681] Saving new best policy, reward=7.987! [2024-07-27 17:20:00,442][02694] Updated weights for policy 0, policy_version 730 (0.0031) [2024-07-27 17:20:05,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3707.2). Total num frames: 3002368. Throughput: 0: 892.4. Samples: 750498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:20:05,426][00473] Avg episode reward: [(0, '7.766')] [2024-07-27 17:20:10,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3721.1). Total num frames: 3026944. Throughput: 0: 907.3. Samples: 756942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:20:10,427][00473] Avg episode reward: [(0, '8.046')] [2024-07-27 17:20:10,436][02681] Saving new best policy, reward=8.046! [2024-07-27 17:20:11,335][02694] Updated weights for policy 0, policy_version 740 (0.0019) [2024-07-27 17:20:15,423][00473] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3047424. Throughput: 0: 936.0. Samples: 760236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:20:15,438][00473] Avg episode reward: [(0, '8.178')] [2024-07-27 17:20:15,443][02681] Saving new best policy, reward=8.178! [2024-07-27 17:20:20,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 3055616. Throughput: 0: 902.8. Samples: 764452. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:20:20,428][00473] Avg episode reward: [(0, '8.423')] [2024-07-27 17:20:20,519][02681] Saving new best policy, reward=8.423! [2024-07-27 17:20:23,821][02694] Updated weights for policy 0, policy_version 750 (0.0036) [2024-07-27 17:20:25,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 3076096. Throughput: 0: 877.0. Samples: 769716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:20:25,432][00473] Avg episode reward: [(0, '8.212')] [2024-07-27 17:20:30,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3096576. Throughput: 0: 905.8. Samples: 772762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:20:30,429][00473] Avg episode reward: [(0, '8.359')] [2024-07-27 17:20:35,247][02694] Updated weights for policy 0, policy_version 760 (0.0022) [2024-07-27 17:20:35,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 3112960. Throughput: 0: 910.9. Samples: 777952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:20:35,430][00473] Avg episode reward: [(0, '8.337')] [2024-07-27 17:20:40,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 3125248. Throughput: 0: 862.9. Samples: 782404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:20:40,427][00473] Avg episode reward: [(0, '8.460')] [2024-07-27 17:20:40,438][02681] Saving new best policy, reward=8.460! [2024-07-27 17:20:45,423][00473] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3679.5). Total num frames: 3145728. Throughput: 0: 864.1. Samples: 785268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:20:45,426][00473] Avg episode reward: [(0, '9.060')] [2024-07-27 17:20:45,430][02681] Saving new best policy, reward=9.060! [2024-07-27 17:20:46,506][02694] Updated weights for policy 0, policy_version 770 (0.0031) [2024-07-27 17:20:50,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3166208. Throughput: 0: 907.0. Samples: 791314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:20:50,430][00473] Avg episode reward: [(0, '9.216')] [2024-07-27 17:20:50,444][02681] Saving new best policy, reward=9.216! [2024-07-27 17:20:55,423][00473] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3679.5). Total num frames: 3178496. Throughput: 0: 848.4. Samples: 795122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:20:55,426][00473] Avg episode reward: [(0, '9.390')] [2024-07-27 17:20:55,430][02681] Saving new best policy, reward=9.390! [2024-07-27 17:20:59,449][02694] Updated weights for policy 0, policy_version 780 (0.0027) [2024-07-27 17:21:00,426][00473] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3679.4). Total num frames: 3198976. Throughput: 0: 833.1. Samples: 797728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:21:00,430][00473] Avg episode reward: [(0, '8.793')] [2024-07-27 17:21:05,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3219456. Throughput: 0: 879.6. Samples: 804032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:21:05,425][00473] Avg episode reward: [(0, '8.777')] [2024-07-27 17:21:10,428][00473] Fps is (10 sec: 3276.1, 60 sec: 3413.1, 300 sec: 3665.5). Total num frames: 3231744. Throughput: 0: 865.9. Samples: 808684. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:21:10,437][00473] Avg episode reward: [(0, '9.141')] [2024-07-27 17:21:11,368][02694] Updated weights for policy 0, policy_version 790 (0.0014) [2024-07-27 17:21:15,423][00473] Fps is (10 sec: 2867.1, 60 sec: 3345.0, 300 sec: 3665.6). Total num frames: 3248128. Throughput: 0: 840.7. Samples: 810596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:21:15,430][00473] Avg episode reward: [(0, '9.852')] [2024-07-27 17:21:15,432][02681] Saving new best policy, reward=9.852! [2024-07-27 17:21:20,423][00473] Fps is (10 sec: 3688.2, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 3268608. Throughput: 0: 859.7. Samples: 816638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:21:20,429][00473] Avg episode reward: [(0, '10.074')] [2024-07-27 17:21:20,439][02681] Saving new best policy, reward=10.074! [2024-07-27 17:21:21,888][02694] Updated weights for policy 0, policy_version 800 (0.0035) [2024-07-27 17:21:25,426][00473] Fps is (10 sec: 3685.1, 60 sec: 3481.4, 300 sec: 3637.8). Total num frames: 3284992. Throughput: 0: 886.2. Samples: 822286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:21:25,429][00473] Avg episode reward: [(0, '10.047')] [2024-07-27 17:21:30,424][00473] Fps is (10 sec: 3276.5, 60 sec: 3413.3, 300 sec: 3651.7). Total num frames: 3301376. Throughput: 0: 864.8. Samples: 824184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:21:30,426][00473] Avg episode reward: [(0, '10.714')] [2024-07-27 17:21:30,443][02681] Saving new best policy, reward=10.714! [2024-07-27 17:21:34,506][02694] Updated weights for policy 0, policy_version 810 (0.0022) [2024-07-27 17:21:35,423][00473] Fps is (10 sec: 3278.0, 60 sec: 3413.3, 300 sec: 3637.8). Total num frames: 3317760. Throughput: 0: 845.5. Samples: 829360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:21:35,428][00473] Avg episode reward: [(0, '10.907')] [2024-07-27 17:21:35,431][02681] Saving new best policy, reward=10.907! [2024-07-27 17:21:40,423][00473] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3342336. Throughput: 0: 900.6. Samples: 835650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:21:40,425][00473] Avg episode reward: [(0, '11.375')] [2024-07-27 17:21:40,434][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000816_3342336.pth... [2024-07-27 17:21:40,563][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000604_2473984.pth [2024-07-27 17:21:40,576][02681] Saving new best policy, reward=11.375! [2024-07-27 17:21:45,423][00473] Fps is (10 sec: 3276.5, 60 sec: 3413.3, 300 sec: 3623.9). Total num frames: 3350528. Throughput: 0: 885.8. Samples: 837586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:21:45,427][00473] Avg episode reward: [(0, '12.212')] [2024-07-27 17:21:45,433][02681] Saving new best policy, reward=12.212! [2024-07-27 17:21:47,288][02694] Updated weights for policy 0, policy_version 820 (0.0025) [2024-07-27 17:21:50,423][00473] Fps is (10 sec: 2457.7, 60 sec: 3345.1, 300 sec: 3623.9). Total num frames: 3366912. Throughput: 0: 839.1. Samples: 841790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:21:50,426][00473] Avg episode reward: [(0, '12.037')] [2024-07-27 17:21:55,423][00473] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3391488. Throughput: 0: 877.0. Samples: 848146. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:21:55,425][00473] Avg episode reward: [(0, '12.404')] [2024-07-27 17:21:55,428][02681] Saving new best policy, reward=12.404! [2024-07-27 17:21:57,336][02694] Updated weights for policy 0, policy_version 830 (0.0049) [2024-07-27 17:22:00,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3610.0). Total num frames: 3407872. Throughput: 0: 902.9. Samples: 851224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-07-27 17:22:00,425][00473] Avg episode reward: [(0, '11.551')] [2024-07-27 17:22:05,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3610.1). Total num frames: 3420160. Throughput: 0: 855.3. Samples: 855126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:22:05,425][00473] Avg episode reward: [(0, '12.267')] [2024-07-27 17:22:09,787][02694] Updated weights for policy 0, policy_version 840 (0.0034) [2024-07-27 17:22:10,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3610.0). Total num frames: 3440640. Throughput: 0: 858.9. Samples: 860934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:22:10,427][00473] Avg episode reward: [(0, '11.338')] [2024-07-27 17:22:15,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3461120. Throughput: 0: 884.6. Samples: 863992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:22:15,427][00473] Avg episode reward: [(0, '11.636')] [2024-07-27 17:22:20,431][00473] Fps is (10 sec: 3274.2, 60 sec: 3412.9, 300 sec: 3582.2). Total num frames: 3473408. Throughput: 0: 875.9. Samples: 868784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:22:20,436][00473] Avg episode reward: [(0, '11.520')] [2024-07-27 17:22:22,099][02694] Updated weights for policy 0, policy_version 850 (0.0030) [2024-07-27 17:22:25,423][00473] Fps is (10 sec: 3276.7, 60 sec: 3481.8, 300 sec: 3596.1). Total num frames: 3493888. Throughput: 0: 846.7. Samples: 873750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:22:25,425][00473] Avg episode reward: [(0, '11.333')] [2024-07-27 17:22:30,423][00473] Fps is (10 sec: 4099.2, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3514368. Throughput: 0: 873.7. Samples: 876902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:22:30,425][00473] Avg episode reward: [(0, '12.107')] [2024-07-27 17:22:32,029][02694] Updated weights for policy 0, policy_version 860 (0.0027) [2024-07-27 17:22:35,423][00473] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3530752. Throughput: 0: 907.1. Samples: 882610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:22:35,431][00473] Avg episode reward: [(0, '12.257')] [2024-07-27 17:22:40,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3582.3). Total num frames: 3543040. Throughput: 0: 854.7. Samples: 886608. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:22:40,425][00473] Avg episode reward: [(0, '12.354')] [2024-07-27 17:22:44,545][02694] Updated weights for policy 0, policy_version 870 (0.0019) [2024-07-27 17:22:45,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3563520. Throughput: 0: 852.6. Samples: 889590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:22:45,425][00473] Avg episode reward: [(0, '12.496')] [2024-07-27 17:22:45,428][02681] Saving new best policy, reward=12.496! [2024-07-27 17:22:50,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3584000. 
Throughput: 0: 906.2. Samples: 895906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:22:50,427][00473] Avg episode reward: [(0, '11.868')] [2024-07-27 17:22:55,430][00473] Fps is (10 sec: 3274.2, 60 sec: 3412.9, 300 sec: 3554.4). Total num frames: 3596288. Throughput: 0: 866.9. Samples: 899950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:22:55,433][00473] Avg episode reward: [(0, '11.785')] [2024-07-27 17:22:57,091][02694] Updated weights for policy 0, policy_version 880 (0.0024) [2024-07-27 17:23:00,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 3616768. Throughput: 0: 855.1. Samples: 902472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:23:00,427][00473] Avg episode reward: [(0, '12.265')] [2024-07-27 17:23:05,423][00473] Fps is (10 sec: 4099.1, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3637248. Throughput: 0: 896.5. Samples: 909118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:23:05,425][00473] Avg episode reward: [(0, '12.613')] [2024-07-27 17:23:05,431][02681] Saving new best policy, reward=12.613! [2024-07-27 17:23:06,484][02694] Updated weights for policy 0, policy_version 890 (0.0023) [2024-07-27 17:23:10,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3653632. Throughput: 0: 901.8. Samples: 914330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:23:10,427][00473] Avg episode reward: [(0, '12.829')] [2024-07-27 17:23:10,439][02681] Saving new best policy, reward=12.829! [2024-07-27 17:23:15,423][00473] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3568.5). Total num frames: 3670016. Throughput: 0: 874.4. Samples: 916248. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:23:15,426][00473] Avg episode reward: [(0, '12.698')] [2024-07-27 17:23:18,983][02694] Updated weights for policy 0, policy_version 900 (0.0032) [2024-07-27 17:23:20,423][00473] Fps is (10 sec: 3686.5, 60 sec: 3618.6, 300 sec: 3568.4). Total num frames: 3690496. Throughput: 0: 880.0. Samples: 922212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:23:20,428][00473] Avg episode reward: [(0, '14.200')] [2024-07-27 17:23:20,438][02681] Saving new best policy, reward=14.200! [2024-07-27 17:23:25,423][00473] Fps is (10 sec: 4096.1, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 3710976. Throughput: 0: 929.1. Samples: 928416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:23:25,424][00473] Avg episode reward: [(0, '14.889')] [2024-07-27 17:23:25,432][02681] Saving new best policy, reward=14.889! [2024-07-27 17:23:30,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 3723264. Throughput: 0: 905.2. Samples: 930324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:23:30,428][00473] Avg episode reward: [(0, '14.785')] [2024-07-27 17:23:31,149][02694] Updated weights for policy 0, policy_version 910 (0.0035) [2024-07-27 17:23:35,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3743744. Throughput: 0: 882.0. Samples: 935596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:23:35,428][00473] Avg episode reward: [(0, '15.575')] [2024-07-27 17:23:35,431][02681] Saving new best policy, reward=15.575! [2024-07-27 17:23:40,383][02694] Updated weights for policy 0, policy_version 920 (0.0029) [2024-07-27 17:23:40,423][00473] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 3768320. 
Throughput: 0: 941.3. Samples: 942300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:23:40,429][00473] Avg episode reward: [(0, '15.318')] [2024-07-27 17:23:40,440][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000920_3768320.pth... [2024-07-27 17:23:40,574][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000713_2920448.pth [2024-07-27 17:23:45,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3780608. Throughput: 0: 939.3. Samples: 944742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:23:45,428][00473] Avg episode reward: [(0, '15.089')] [2024-07-27 17:23:50,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3801088. Throughput: 0: 888.6. Samples: 949106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:23:50,430][00473] Avg episode reward: [(0, '15.998')] [2024-07-27 17:23:50,437][02681] Saving new best policy, reward=15.998! [2024-07-27 17:23:52,346][02694] Updated weights for policy 0, policy_version 930 (0.0025) [2024-07-27 17:23:55,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3755.2, 300 sec: 3568.4). Total num frames: 3821568. Throughput: 0: 920.0. Samples: 955732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:23:55,432][00473] Avg episode reward: [(0, '16.761')] [2024-07-27 17:23:55,434][02681] Saving new best policy, reward=16.761! [2024-07-27 17:24:00,423][00473] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 3837952. Throughput: 0: 949.5. Samples: 958978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:24:00,429][00473] Avg episode reward: [(0, '16.454')] [2024-07-27 17:24:03,578][02694] Updated weights for policy 0, policy_version 940 (0.0017) [2024-07-27 17:24:05,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 3854336. Throughput: 0: 913.2. Samples: 963308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:24:05,424][00473] Avg episode reward: [(0, '17.265')] [2024-07-27 17:24:05,427][02681] Saving new best policy, reward=17.265! [2024-07-27 17:24:10,423][00473] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3874816. Throughput: 0: 909.4. Samples: 969340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:24:10,427][00473] Avg episode reward: [(0, '15.939')] [2024-07-27 17:24:13,522][02694] Updated weights for policy 0, policy_version 950 (0.0026) [2024-07-27 17:24:15,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3568.4). Total num frames: 3895296. Throughput: 0: 943.4. Samples: 972778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:24:15,425][00473] Avg episode reward: [(0, '15.842')] [2024-07-27 17:24:20,423][00473] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3911680. Throughput: 0: 936.5. Samples: 977738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:24:20,425][00473] Avg episode reward: [(0, '15.613')] [2024-07-27 17:24:25,423][00473] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3928064. Throughput: 0: 901.6. Samples: 982870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:24:25,429][00473] Avg episode reward: [(0, '17.186')] [2024-07-27 17:24:25,655][02694] Updated weights for policy 0, policy_version 960 (0.0017) [2024-07-27 17:24:30,423][00473] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3582.3). Total num frames: 3952640. 
Throughput: 0: 921.1. Samples: 986190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:24:30,430][00473] Avg episode reward: [(0, '17.618')] [2024-07-27 17:24:30,440][02681] Saving new best policy, reward=17.618! [2024-07-27 17:24:35,426][00473] Fps is (10 sec: 4094.8, 60 sec: 3754.5, 300 sec: 3568.3). Total num frames: 3969024. Throughput: 0: 963.0. Samples: 992444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:24:35,428][00473] Avg episode reward: [(0, '17.326')] [2024-07-27 17:24:35,975][02694] Updated weights for policy 0, policy_version 970 (0.0026) [2024-07-27 17:24:40,423][00473] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3981312. Throughput: 0: 905.1. Samples: 996460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:24:40,425][00473] Avg episode reward: [(0, '17.323')] [2024-07-27 17:24:45,325][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-27 17:24:45,326][00473] Component Batcher_0 stopped! [2024-07-27 17:24:45,331][02681] Stopping Batcher_0... [2024-07-27 17:24:45,332][02681] Loop batcher_evt_loop terminating... [2024-07-27 17:24:45,390][00473] Component RolloutWorker_w4 stopped! [2024-07-27 17:24:45,396][02702] Stopping RolloutWorker_w4... [2024-07-27 17:24:45,400][02703] Stopping RolloutWorker_w3... [2024-07-27 17:24:45,400][00473] Component RolloutWorker_w3 stopped! [2024-07-27 17:24:45,407][02703] Loop rollout_proc3_evt_loop terminating... [2024-07-27 17:24:45,412][02696] Stopping RolloutWorker_w1... [2024-07-27 17:24:45,412][00473] Component RolloutWorker_w1 stopped! [2024-07-27 17:24:45,397][02702] Loop rollout_proc4_evt_loop terminating... [2024-07-27 17:24:45,420][02706] Stopping RolloutWorker_w7... [2024-07-27 17:24:45,423][02696] Loop rollout_proc1_evt_loop terminating... [2024-07-27 17:24:45,421][00473] Component RolloutWorker_w7 stopped! [2024-07-27 17:24:45,428][02704] Stopping RolloutWorker_w5... [2024-07-27 17:24:45,428][00473] Component RolloutWorker_w5 stopped! [2024-07-27 17:24:45,426][02706] Loop rollout_proc7_evt_loop terminating... [2024-07-27 17:24:45,436][02694] Weights refcount: 2 0 [2024-07-27 17:24:45,429][02704] Loop rollout_proc5_evt_loop terminating... [2024-07-27 17:24:45,449][00473] Component RolloutWorker_w0 stopped! [2024-07-27 17:24:45,449][02695] Stopping RolloutWorker_w0... [2024-07-27 17:24:45,458][00473] Component InferenceWorker_p0-w0 stopped! [2024-07-27 17:24:45,458][02694] Stopping InferenceWorker_p0-w0... [2024-07-27 17:24:45,466][00473] Component RolloutWorker_w2 stopped! [2024-07-27 17:24:45,466][02701] Stopping RolloutWorker_w2... [2024-07-27 17:24:45,464][02694] Loop inference_proc0-0_evt_loop terminating... [2024-07-27 17:24:45,456][02695] Loop rollout_proc0_evt_loop terminating... [2024-07-27 17:24:45,474][02701] Loop rollout_proc2_evt_loop terminating... [2024-07-27 17:24:45,499][02681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000816_3342336.pth [2024-07-27 17:24:45,500][02705] Stopping RolloutWorker_w6... [2024-07-27 17:24:45,501][00473] Component RolloutWorker_w6 stopped! [2024-07-27 17:24:45,506][02705] Loop rollout_proc6_evt_loop terminating... [2024-07-27 17:24:45,517][02681] Saving new best policy, reward=17.859! [2024-07-27 17:24:45,714][02681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-27 17:24:45,905][00473] Component LearnerWorker_p0 stopped! 
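Aside on the "Policy #0 lag" fields printed with every report above: in asynchronous PPO the lag of a rollout is the difference between the learner's current policy version and the version the actors used when collecting it, and the config's max_policy_lag=1000 bounds how stale a rollout may get. A rough sketch of how the (min, avg, max) statistics could be computed (illustrative only, not the library's code):

```python
def policy_lag_stats(learner_version: int, rollout_versions: list[int]):
    """(min, avg, max) staleness of the rollouts in a training batch."""
    lags = [learner_version - v for v in rollout_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# e.g. the learner is at policy_version 660 while the batch was collected
# with versions 658-660:
print(policy_lag_stats(660, [660, 660, 659, 658]))   # (0, 0.75, 2)
```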
[2024-07-27 17:24:45,905][02681] Stopping LearnerWorker_p0... [2024-07-27 17:24:45,908][02681] Loop learner_proc0_evt_loop terminating... [2024-07-27 17:24:45,908][00473] Waiting for process learner_proc0 to stop... [2024-07-27 17:24:47,320][00473] Waiting for process inference_proc0-0 to join... [2024-07-27 17:24:47,326][00473] Waiting for process rollout_proc0 to join... [2024-07-27 17:24:49,129][00473] Waiting for process rollout_proc1 to join... [2024-07-27 17:24:49,138][00473] Waiting for process rollout_proc2 to join... [2024-07-27 17:24:49,143][00473] Waiting for process rollout_proc3 to join... [2024-07-27 17:24:49,147][00473] Waiting for process rollout_proc4 to join... [2024-07-27 17:24:49,151][00473] Waiting for process rollout_proc5 to join... [2024-07-27 17:24:49,156][00473] Waiting for process rollout_proc6 to join... [2024-07-27 17:24:49,159][00473] Waiting for process rollout_proc7 to join... [2024-07-27 17:24:49,162][00473] Batcher 0 profile tree view: batching: 27.1624, releasing_batches: 0.0312 [2024-07-27 17:24:49,165][00473] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 447.1244 update_model: 9.3599 weight_update: 0.0025 one_step: 0.0111 handle_policy_step: 617.3348 deserialize: 16.1793, stack: 3.3750, obs_to_device_normalize: 126.9088, forward: 327.2743, send_messages: 29.2454 prepare_outputs: 84.0883 to_cpu: 48.5907 [2024-07-27 17:24:49,166][00473] Learner 0 profile tree view: misc: 0.0071, prepare_batch: 13.6679 train: 74.3020 epoch_init: 0.0105, minibatch_init: 0.0078, losses_postprocess: 0.6546, kl_divergence: 0.7281, after_optimizer: 34.1525 calculate_losses: 27.0153 losses_init: 0.0107, forward_head: 1.3915, bptt_initial: 17.8882, tail: 1.0502, advantages_returns: 0.3568, losses: 3.9737 bptt: 2.0836 bptt_forward_core: 1.9663 update: 11.0577 clip: 0.9391 [2024-07-27 17:24:49,169][00473] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.3453, enqueue_policy_requests: 117.0330, env_step: 869.8972, overhead: 15.2483, complete_rollouts: 7.5061 save_policy_outputs: 21.3860 split_output_tensors: 8.5389 [2024-07-27 17:24:49,170][00473] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3175, enqueue_policy_requests: 119.3852, env_step: 863.6148, overhead: 15.2210, complete_rollouts: 7.2825 save_policy_outputs: 20.0485 split_output_tensors: 8.2469 [2024-07-27 17:24:49,172][00473] Loop Runner_EvtLoop terminating... [2024-07-27 17:24:49,173][00473] Runner profile tree view: main_loop: 1143.8343 [2024-07-27 17:24:49,174][00473] Collected {0: 4005888}, FPS: 3502.2 [2024-07-27 17:24:58,502][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:24:58,505][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:24:58,507][00473] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-27 17:24:58,509][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:24:58,511][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:24:58,514][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:24:58,515][00473] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:24:58,517][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! 
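The closing "Collected {0: 4005888}, FPS: 3502.2" line above is simply the total number of environment frames divided by the runner's wall-clock time (main_loop: 1143.8343 s). A quick check:

```python
total_env_frames = 4_005_888          # "Collected {0: 4005888}"
main_loop_seconds = 1143.8343         # "Runner profile tree view: main_loop"
print(round(total_env_frames / main_loop_seconds, 1))   # 3502.2
```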
[2024-07-27 17:24:58,519][00473] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-27 17:24:58,520][00473] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-27 17:24:58,525][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:24:58,526][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:24:58,530][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:24:58,531][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-27 17:24:58,532][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:24:58,563][00473] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:24:58,567][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:24:58,568][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:24:58,586][00473] ConvEncoder: input_channels=3 [2024-07-27 17:24:58,699][00473] Conv encoder output size: 512 [2024-07-27 17:24:58,700][00473] Policy head output size: 512 [2024-07-27 17:24:58,883][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-27 17:24:59,636][00473] Num frames 100... [2024-07-27 17:24:59,782][00473] Num frames 200... [2024-07-27 17:24:59,908][00473] Num frames 300... [2024-07-27 17:25:00,038][00473] Num frames 400... [2024-07-27 17:25:00,165][00473] Num frames 500... [2024-07-27 17:25:00,299][00473] Num frames 600... [2024-07-27 17:25:00,428][00473] Num frames 700... [2024-07-27 17:25:00,574][00473] Avg episode rewards: #0: 14.680, true rewards: #0: 7.680 [2024-07-27 17:25:00,576][00473] Avg episode reward: 14.680, avg true_objective: 7.680 [2024-07-27 17:25:00,625][00473] Num frames 800... [2024-07-27 17:25:00,768][00473] Num frames 900... [2024-07-27 17:25:00,896][00473] Num frames 1000... [2024-07-27 17:25:01,025][00473] Num frames 1100... [2024-07-27 17:25:01,151][00473] Num frames 1200... [2024-07-27 17:25:01,228][00473] Avg episode rewards: #0: 10.080, true rewards: #0: 6.080 [2024-07-27 17:25:01,229][00473] Avg episode reward: 10.080, avg true_objective: 6.080 [2024-07-27 17:25:01,336][00473] Num frames 1300... [2024-07-27 17:25:01,472][00473] Num frames 1400... [2024-07-27 17:25:01,601][00473] Num frames 1500... [2024-07-27 17:25:01,753][00473] Num frames 1600... [2024-07-27 17:25:01,937][00473] Num frames 1700... [2024-07-27 17:25:02,125][00473] Num frames 1800... [2024-07-27 17:25:02,227][00473] Avg episode rewards: #0: 10.747, true rewards: #0: 6.080 [2024-07-27 17:25:02,229][00473] Avg episode reward: 10.747, avg true_objective: 6.080 [2024-07-27 17:25:02,366][00473] Num frames 1900... [2024-07-27 17:25:02,546][00473] Num frames 2000... [2024-07-27 17:25:02,733][00473] Num frames 2100... [2024-07-27 17:25:02,933][00473] Num frames 2200... [2024-07-27 17:25:03,115][00473] Num frames 2300... [2024-07-27 17:25:03,301][00473] Num frames 2400... [2024-07-27 17:25:03,496][00473] Num frames 2500... [2024-07-27 17:25:03,713][00473] Num frames 2600... [2024-07-27 17:25:03,912][00473] Num frames 2700... [2024-07-27 17:25:04,075][00473] Avg episode rewards: #0: 13.380, true rewards: #0: 6.880 [2024-07-27 17:25:04,078][00473] Avg episode reward: 13.380, avg true_objective: 6.880 [2024-07-27 17:25:04,147][00473] Num frames 2800... [2024-07-27 17:25:04,274][00473] Num frames 2900... 
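A note on "Using frameskip 1 and render_action_repeat=4 for evaluation": training sampled the environment with env_frameskip=4, so during evaluation every frame is rendered for the video while each selected action is repeated 4 times, keeping the effective control rate unchanged. A generic action-repeat loop looks roughly like this (old-style Gym step API assumed; not Sample Factory's actual enjoy loop):

```python
def step_with_action_repeat(env, action, repeat=4):
    """Apply the same action for `repeat` rendered frames, summing the reward."""
    total_reward, obs, done, info = 0.0, None, False, {}
    for _ in range(repeat):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```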
[2024-07-27 17:25:04,403][00473] Num frames 3000... [2024-07-27 17:25:04,532][00473] Num frames 3100... [2024-07-27 17:25:04,659][00473] Num frames 3200... [2024-07-27 17:25:04,802][00473] Num frames 3300... [2024-07-27 17:25:04,943][00473] Num frames 3400... [2024-07-27 17:25:05,073][00473] Num frames 3500... [2024-07-27 17:25:05,207][00473] Num frames 3600... [2024-07-27 17:25:05,335][00473] Num frames 3700... [2024-07-27 17:25:05,466][00473] Num frames 3800... [2024-07-27 17:25:05,536][00473] Avg episode rewards: #0: 14.818, true rewards: #0: 7.618 [2024-07-27 17:25:05,537][00473] Avg episode reward: 14.818, avg true_objective: 7.618 [2024-07-27 17:25:05,659][00473] Num frames 3900... [2024-07-27 17:25:05,794][00473] Num frames 4000... [2024-07-27 17:25:05,927][00473] Num frames 4100... [2024-07-27 17:25:06,063][00473] Num frames 4200... [2024-07-27 17:25:06,190][00473] Num frames 4300... [2024-07-27 17:25:06,314][00473] Avg episode rewards: #0: 13.588, true rewards: #0: 7.255 [2024-07-27 17:25:06,316][00473] Avg episode reward: 13.588, avg true_objective: 7.255 [2024-07-27 17:25:06,377][00473] Num frames 4400... [2024-07-27 17:25:06,504][00473] Num frames 4500... [2024-07-27 17:25:06,631][00473] Num frames 4600... [2024-07-27 17:25:06,767][00473] Num frames 4700... [2024-07-27 17:25:06,893][00473] Num frames 4800... [2024-07-27 17:25:07,031][00473] Num frames 4900... [2024-07-27 17:25:07,163][00473] Num frames 5000... [2024-07-27 17:25:07,287][00473] Num frames 5100... [2024-07-27 17:25:07,413][00473] Num frames 5200... [2024-07-27 17:25:07,547][00473] Avg episode rewards: #0: 14.230, true rewards: #0: 7.516 [2024-07-27 17:25:07,549][00473] Avg episode reward: 14.230, avg true_objective: 7.516 [2024-07-27 17:25:07,603][00473] Num frames 5300... [2024-07-27 17:25:07,739][00473] Num frames 5400... [2024-07-27 17:25:07,862][00473] Num frames 5500... [2024-07-27 17:25:08,002][00473] Num frames 5600... [2024-07-27 17:25:08,129][00473] Num frames 5700... [2024-07-27 17:25:08,255][00473] Num frames 5800... [2024-07-27 17:25:08,357][00473] Avg episode rewards: #0: 13.421, true rewards: #0: 7.296 [2024-07-27 17:25:08,359][00473] Avg episode reward: 13.421, avg true_objective: 7.296 [2024-07-27 17:25:08,445][00473] Num frames 5900... [2024-07-27 17:25:08,574][00473] Num frames 6000... [2024-07-27 17:25:08,707][00473] Num frames 6100... [2024-07-27 17:25:08,842][00473] Num frames 6200... [2024-07-27 17:25:08,955][00473] Avg episode rewards: #0: 13.160, true rewards: #0: 6.938 [2024-07-27 17:25:08,956][00473] Avg episode reward: 13.160, avg true_objective: 6.938 [2024-07-27 17:25:09,041][00473] Num frames 6300... [2024-07-27 17:25:09,173][00473] Num frames 6400... [2024-07-27 17:25:09,300][00473] Num frames 6500... [2024-07-27 17:25:09,426][00473] Num frames 6600... [2024-07-27 17:25:09,559][00473] Num frames 6700... [2024-07-27 17:25:09,685][00473] Num frames 6800... [2024-07-27 17:25:09,819][00473] Num frames 6900... [2024-07-27 17:25:09,974][00473] Avg episode rewards: #0: 13.380, true rewards: #0: 6.980 [2024-07-27 17:25:09,976][00473] Avg episode reward: 13.380, avg true_objective: 6.980 [2024-07-27 17:25:51,842][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
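The evaluation above prints a cumulative mean after each finished episode rather than the individual scores, but the per-episode rewards can be recovered from consecutive averages: the first episode scored 14.680, and the mean falling to 10.080 after two episodes implies the second scored 2 × 10.080 − 14.680 = 5.480 (up to the rounding of the printed values). A small helper:

```python
def per_episode_rewards(cumulative_means):
    """Recover individual episode rewards from the running means printed
    after each evaluation episode (approximate, since the means are rounded)."""
    rewards, prev_total = [], 0.0
    for k, mean in enumerate(cumulative_means, start=1):
        total = mean * k
        rewards.append(round(total - prev_total, 3))
        prev_total = total
    return rewards

# Means reported for the ten episodes of the first evaluation run above:
means = [14.680, 10.080, 10.747, 13.380, 14.818, 13.588, 14.230, 13.421, 13.160, 13.380]
print(per_episode_rewards(means))
```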
[2024-07-27 17:27:58,334][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:27:58,336][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:27:58,338][00473] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-27 17:27:58,341][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:27:58,343][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:27:58,344][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:27:58,346][00473] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-07-27 17:27:58,350][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-27 17:27:58,352][00473] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-27 17:27:58,354][00473] Adding new argument 'hf_repository'='rishisim/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-27 17:27:58,355][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:27:58,356][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:27:58,357][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:27:58,358][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-27 17:27:58,359][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:27:58,396][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:27:58,400][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:27:58,424][00473] ConvEncoder: input_channels=3 [2024-07-27 17:27:58,537][00473] Conv encoder output size: 512 [2024-07-27 17:27:58,541][00473] Policy head output size: 512 [2024-07-27 17:27:58,597][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-27 17:27:59,826][00473] Num frames 100... [2024-07-27 17:28:00,015][00473] Num frames 200... [2024-07-27 17:28:00,202][00473] Num frames 300... [2024-07-27 17:28:00,398][00473] Num frames 400... [2024-07-27 17:28:00,587][00473] Num frames 500... [2024-07-27 17:28:00,776][00473] Num frames 600... [2024-07-27 17:28:00,850][00473] Avg episode rewards: #0: 10.080, true rewards: #0: 6.080 [2024-07-27 17:28:00,852][00473] Avg episode reward: 10.080, avg true_objective: 6.080 [2024-07-27 17:28:01,041][00473] Num frames 700... [2024-07-27 17:28:01,193][00473] Num frames 800... [2024-07-27 17:28:01,321][00473] Num frames 900... [2024-07-27 17:28:01,448][00473] Num frames 1000... [2024-07-27 17:28:01,578][00473] Avg episode rewards: #0: 7.780, true rewards: #0: 5.280 [2024-07-27 17:28:01,580][00473] Avg episode reward: 7.780, avg true_objective: 5.280 [2024-07-27 17:28:01,646][00473] Num frames 1100... [2024-07-27 17:28:01,795][00473] Num frames 1200... [2024-07-27 17:28:01,926][00473] Num frames 1300... [2024-07-27 17:28:02,069][00473] Num frames 1400... [2024-07-27 17:28:02,201][00473] Num frames 1500... [2024-07-27 17:28:02,335][00473] Num frames 1600... [2024-07-27 17:28:02,467][00473] Num frames 1700... [2024-07-27 17:28:02,591][00473] Num frames 1800... [2024-07-27 17:28:02,729][00473] Num frames 1900... 
[2024-07-27 17:28:02,864][00473] Num frames 2000... [2024-07-27 17:28:02,998][00473] Num frames 2100... [2024-07-27 17:28:03,110][00473] Avg episode rewards: #0: 12.480, true rewards: #0: 7.147 [2024-07-27 17:28:03,112][00473] Avg episode reward: 12.480, avg true_objective: 7.147 [2024-07-27 17:28:03,186][00473] Num frames 2200... [2024-07-27 17:28:03,318][00473] Num frames 2300... [2024-07-27 17:28:03,448][00473] Num frames 2400... [2024-07-27 17:28:03,575][00473] Num frames 2500... [2024-07-27 17:28:03,709][00473] Num frames 2600... [2024-07-27 17:28:03,842][00473] Num frames 2700... [2024-07-27 17:28:03,963][00473] Avg episode rewards: #0: 12.380, true rewards: #0: 6.880 [2024-07-27 17:28:03,965][00473] Avg episode reward: 12.380, avg true_objective: 6.880 [2024-07-27 17:28:04,039][00473] Num frames 2800... [2024-07-27 17:28:04,167][00473] Num frames 2900... [2024-07-27 17:28:04,296][00473] Num frames 3000... [2024-07-27 17:28:04,429][00473] Num frames 3100... [2024-07-27 17:28:04,556][00473] Num frames 3200... [2024-07-27 17:28:04,685][00473] Num frames 3300... [2024-07-27 17:28:04,824][00473] Num frames 3400... [2024-07-27 17:28:04,947][00473] Avg episode rewards: #0: 12.306, true rewards: #0: 6.906 [2024-07-27 17:28:04,948][00473] Avg episode reward: 12.306, avg true_objective: 6.906 [2024-07-27 17:28:05,009][00473] Num frames 3500... [2024-07-27 17:28:05,152][00473] Num frames 3600... [2024-07-27 17:28:05,280][00473] Num frames 3700... [2024-07-27 17:28:05,409][00473] Num frames 3800... [2024-07-27 17:28:05,544][00473] Num frames 3900... [2024-07-27 17:28:05,674][00473] Num frames 4000... [2024-07-27 17:28:05,773][00473] Avg episode rewards: #0: 11.715, true rewards: #0: 6.715 [2024-07-27 17:28:05,775][00473] Avg episode reward: 11.715, avg true_objective: 6.715 [2024-07-27 17:28:05,870][00473] Num frames 4100... [2024-07-27 17:28:06,000][00473] Num frames 4200... [2024-07-27 17:28:06,137][00473] Num frames 4300... [2024-07-27 17:28:06,270][00473] Num frames 4400... [2024-07-27 17:28:06,400][00473] Num frames 4500... [2024-07-27 17:28:06,527][00473] Num frames 4600... [2024-07-27 17:28:06,659][00473] Num frames 4700... [2024-07-27 17:28:06,798][00473] Num frames 4800... [2024-07-27 17:28:06,890][00473] Avg episode rewards: #0: 12.042, true rewards: #0: 6.899 [2024-07-27 17:28:06,892][00473] Avg episode reward: 12.042, avg true_objective: 6.899 [2024-07-27 17:28:06,982][00473] Num frames 4900... [2024-07-27 17:28:07,118][00473] Num frames 5000... [2024-07-27 17:28:07,246][00473] Num frames 5100... [2024-07-27 17:28:07,386][00473] Num frames 5200... [2024-07-27 17:28:07,517][00473] Num frames 5300... [2024-07-27 17:28:07,645][00473] Num frames 5400... [2024-07-27 17:28:07,778][00473] Num frames 5500... [2024-07-27 17:28:07,838][00473] Avg episode rewards: #0: 12.003, true rewards: #0: 6.877 [2024-07-27 17:28:07,840][00473] Avg episode reward: 12.003, avg true_objective: 6.877 [2024-07-27 17:28:07,964][00473] Num frames 5600... [2024-07-27 17:28:08,093][00473] Num frames 5700... [2024-07-27 17:28:08,227][00473] Num frames 5800... [2024-07-27 17:28:08,355][00473] Num frames 5900... [2024-07-27 17:28:08,484][00473] Num frames 6000... [2024-07-27 17:28:08,622][00473] Num frames 6100... [2024-07-27 17:28:08,760][00473] Num frames 6200... [2024-07-27 17:28:08,891][00473] Num frames 6300... 
[2024-07-27 17:28:09,076][00473] Avg episode rewards: #0: 12.442, true rewards: #0: 7.109 [2024-07-27 17:28:09,077][00473] Avg episode reward: 12.442, avg true_objective: 7.109 [2024-07-27 17:28:09,083][00473] Num frames 6400... [2024-07-27 17:28:09,232][00473] Num frames 6500... [2024-07-27 17:28:09,361][00473] Num frames 6600... [2024-07-27 17:28:09,490][00473] Num frames 6700... [2024-07-27 17:28:09,617][00473] Num frames 6800... [2024-07-27 17:28:09,732][00473] Avg episode rewards: #0: 11.746, true rewards: #0: 6.846 [2024-07-27 17:28:09,733][00473] Avg episode reward: 11.746, avg true_objective: 6.846 [2024-07-27 17:28:51,831][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-27 17:29:03,865][00473] The model has been pushed to https://huggingface.co/rishisim/rl_course_vizdoom_health_gathering_supreme [2024-07-27 17:30:06,616][00473] Environment doom_basic already registered, overwriting... [2024-07-27 17:30:06,619][00473] Environment doom_two_colors_easy already registered, overwriting... [2024-07-27 17:30:06,621][00473] Environment doom_two_colors_hard already registered, overwriting... [2024-07-27 17:30:06,624][00473] Environment doom_dm already registered, overwriting... [2024-07-27 17:30:06,626][00473] Environment doom_dwango5 already registered, overwriting... [2024-07-27 17:30:06,627][00473] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-07-27 17:30:06,629][00473] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-07-27 17:30:06,630][00473] Environment doom_my_way_home already registered, overwriting... [2024-07-27 17:30:06,633][00473] Environment doom_deadly_corridor already registered, overwriting... [2024-07-27 17:30:06,636][00473] Environment doom_defend_the_center already registered, overwriting... [2024-07-27 17:30:06,639][00473] Environment doom_defend_the_line already registered, overwriting... [2024-07-27 17:30:06,640][00473] Environment doom_health_gathering already registered, overwriting... [2024-07-27 17:30:06,641][00473] Environment doom_health_gathering_supreme already registered, overwriting... [2024-07-27 17:30:06,643][00473] Environment doom_battle already registered, overwriting... [2024-07-27 17:30:06,645][00473] Environment doom_battle2 already registered, overwriting... [2024-07-27 17:30:06,647][00473] Environment doom_duel_bots already registered, overwriting... [2024-07-27 17:30:06,648][00473] Environment doom_deathmatch_bots already registered, overwriting... [2024-07-27 17:30:06,650][00473] Environment doom_duel already registered, overwriting... [2024-07-27 17:30:06,652][00473] Environment doom_deathmatch_full already registered, overwriting... [2024-07-27 17:30:06,653][00473] Environment doom_benchmark already registered, overwriting... [2024-07-27 17:30:06,655][00473] register_encoder_factory: [2024-07-27 17:30:06,673][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:30:06,674][00473] Overriding arg 'train_for_env_steps' with value 4005000 passed from command line [2024-07-27 17:30:06,682][00473] Experiment dir /content/train_dir/default_experiment already exists! [2024-07-27 17:30:06,684][00473] Resuming existing experiment from /content/train_dir/default_experiment... 
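The second evaluation run above was launched with push_to_hub=True and hf_repository='rishisim/rl_course_vizdoom_health_gathering_supreme', which is what produced the replay video and the Hugging Face upload. The invocation can be reconstructed from the overrides logged at load time, assuming the command-line flags mirror the config keys; the enjoy entry point itself is not shown in this log, so treat the command as a sketch:

```python
# Hypothetical reconstruction of the evaluation/upload command implied by the
# overrides above; replace <enjoy_entry_point> with whatever enjoy script your
# Sample Factory setup uses.
eval_args = [
    "--env=doom_health_gathering_supreme",
    "--train_dir=/content/train_dir",
    "--experiment=default_experiment",
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
    "--max_num_frames=100000",
    "--push_to_hub",
    "--hf_repository=rishisim/rl_course_vizdoom_health_gathering_supreme",
]
print("python -m <enjoy_entry_point> " + " ".join(eval_args))
```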
[2024-07-27 17:30:06,685][00473] Weights and Biases integration disabled [2024-07-27 17:30:06,690][00473] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-07-27 17:30:08,714][00473] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4005000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-07-27 17:30:08,716][00473] Saving configuration to /content/train_dir/default_experiment/config.json... 
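With restart_behavior=resume and load_checkpoint_kind=latest in the configuration above, the learner below restores the latest checkpoint (train_step=978, env_steps=4005888) instead of starting from scratch. The .pth files are torch-serialized dictionaries, so they can be inspected directly; the key names here are assumptions inferred from the fields echoed in the log:

```python
import torch

# Inspect the checkpoint that the resumed run loads; key names are assumptions
# based on "Loaded experiment state at self.train_step=978, self.env_steps=4005888".
ckpt_path = ("/content/train_dir/default_experiment/checkpoint_p0/"
             "checkpoint_000000978_4005888.pth")
ckpt = torch.load(ckpt_path, map_location="cpu")
print(sorted(ckpt.keys()))
print(ckpt.get("train_step"), ckpt.get("env_steps"))
```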
[2024-07-27 17:30:08,720][00473] Rollout worker 0 uses device cpu [2024-07-27 17:30:08,723][00473] Rollout worker 1 uses device cpu [2024-07-27 17:30:08,724][00473] Rollout worker 2 uses device cpu [2024-07-27 17:30:08,725][00473] Rollout worker 3 uses device cpu [2024-07-27 17:30:08,726][00473] Rollout worker 4 uses device cpu [2024-07-27 17:30:08,727][00473] Rollout worker 5 uses device cpu [2024-07-27 17:30:08,729][00473] Rollout worker 6 uses device cpu [2024-07-27 17:30:08,730][00473] Rollout worker 7 uses device cpu [2024-07-27 17:30:08,803][00473] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:30:08,806][00473] InferenceWorker_p0-w0: min num requests: 2 [2024-07-27 17:30:08,840][00473] Starting all processes... [2024-07-27 17:30:08,842][00473] Starting process learner_proc0 [2024-07-27 17:30:08,890][00473] Starting all processes... [2024-07-27 17:30:08,895][00473] Starting process inference_proc0-0 [2024-07-27 17:30:08,896][00473] Starting process rollout_proc0 [2024-07-27 17:30:08,897][00473] Starting process rollout_proc1 [2024-07-27 17:30:08,919][00473] Starting process rollout_proc2 [2024-07-27 17:30:08,920][00473] Starting process rollout_proc3 [2024-07-27 17:30:08,924][00473] Starting process rollout_proc4 [2024-07-27 17:30:08,924][00473] Starting process rollout_proc5 [2024-07-27 17:30:08,924][00473] Starting process rollout_proc6 [2024-07-27 17:30:08,924][00473] Starting process rollout_proc7 [2024-07-27 17:30:23,411][12669] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:30:23,418][12669] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-07-27 17:30:23,483][12669] Num visible devices: 1 [2024-07-27 17:30:23,532][12669] Starting seed is not provided [2024-07-27 17:30:23,533][12669] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:30:23,534][12669] Initializing actor-critic model on device cuda:0 [2024-07-27 17:30:23,535][12669] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:30:23,537][12669] RunningMeanStd input shape: (1,) [2024-07-27 17:30:23,627][12669] ConvEncoder: input_channels=3 [2024-07-27 17:30:23,818][12683] Worker 0 uses CPU cores [0] [2024-07-27 17:30:23,823][12684] Worker 1 uses CPU cores [1] [2024-07-27 17:30:23,898][12685] Worker 2 uses CPU cores [0] [2024-07-27 17:30:23,911][12686] Worker 3 uses CPU cores [1] [2024-07-27 17:30:23,960][12689] Worker 7 uses CPU cores [1] [2024-07-27 17:30:24,003][12688] Worker 5 uses CPU cores [1] [2024-07-27 17:30:24,092][12682] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:30:24,093][12682] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-07-27 17:30:24,124][12687] Worker 4 uses CPU cores [0] [2024-07-27 17:30:24,136][12682] Num visible devices: 1 [2024-07-27 17:30:24,144][12690] Worker 6 uses CPU cores [0] [2024-07-27 17:30:24,164][12669] Conv encoder output size: 512 [2024-07-27 17:30:24,164][12669] Policy head output size: 512 [2024-07-27 17:30:24,179][12669] Created Actor Critic model with architecture: [2024-07-27 17:30:24,179][12669] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): 
RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-07-27 17:30:24,309][12669] Using optimizer [2024-07-27 17:30:25,221][12669] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-07-27 17:30:25,265][12669] Loading model from checkpoint [2024-07-27 17:30:25,268][12669] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2024-07-27 17:30:25,269][12669] Initialized policy 0 weights for model version 978 [2024-07-27 17:30:25,272][12669] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:30:25,281][12669] LearnerWorker_p0 finished initialization! [2024-07-27 17:30:25,458][12682] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:30:25,460][12682] RunningMeanStd input shape: (1,) [2024-07-27 17:30:25,479][12682] ConvEncoder: input_channels=3 [2024-07-27 17:30:25,639][12682] Conv encoder output size: 512 [2024-07-27 17:30:25,640][12682] Policy head output size: 512 [2024-07-27 17:30:25,718][00473] Inference worker 0-0 is ready! [2024-07-27 17:30:25,720][00473] All inference workers are ready! Signal rollout workers to start! [2024-07-27 17:30:26,051][12689] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,072][12688] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,081][12684] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,127][12686] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,170][12690] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,225][12687] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,228][12683] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,260][12685] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:30:26,691][00473] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:30:28,217][12690] Decorrelating experience for 0 frames... [2024-07-27 17:30:28,221][12689] Decorrelating experience for 0 frames... [2024-07-27 17:30:28,234][12683] Decorrelating experience for 0 frames... [2024-07-27 17:30:28,235][12688] Decorrelating experience for 0 frames... [2024-07-27 17:30:28,272][12684] Decorrelating experience for 0 frames... [2024-07-27 17:30:28,328][12686] Decorrelating experience for 0 frames... [2024-07-27 17:30:28,795][00473] Heartbeat connected on Batcher_0 [2024-07-27 17:30:28,801][00473] Heartbeat connected on LearnerWorker_p0 [2024-07-27 17:30:28,832][00473] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-27 17:30:29,500][12689] Decorrelating experience for 32 frames... 
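The "Conv encoder output size: 512" / "Policy head output size: 512" lines come from flattening the conv head's output for a (3, 72, 128) observation and passing it through the encoder MLP (encoder_conv_mlp_layers=[512] in the config). A generic way to arrive at such a number is to forward a dummy observation; the filter sizes below are illustrative, not necessarily convnet_simple's exact layout:

```python
import torch
import torch.nn as nn

# Illustrative conv stack for a (3, 72, 128) observation; the exact filters of
# Sample Factory's convnet_simple may differ, but the size computation is the same.
conv_head = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
)

with torch.no_grad():
    flat = conv_head(torch.zeros(1, 3, 72, 128)).flatten(1)

mlp = nn.Sequential(nn.Linear(flat.shape[1], 512), nn.ELU())  # encoder_conv_mlp_layers=[512]
print(flat.shape[1], "->", mlp(flat).shape[1])                # flattened conv features -> 512
```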
[2024-07-27 17:30:29,502][12688] Decorrelating experience for 32 frames... [2024-07-27 17:30:30,113][12685] Decorrelating experience for 0 frames... [2024-07-27 17:30:30,135][12683] Decorrelating experience for 32 frames... [2024-07-27 17:30:30,230][12690] Decorrelating experience for 32 frames... [2024-07-27 17:30:30,251][12687] Decorrelating experience for 0 frames... [2024-07-27 17:30:30,891][12686] Decorrelating experience for 32 frames... [2024-07-27 17:30:31,287][12689] Decorrelating experience for 64 frames... [2024-07-27 17:30:31,365][12685] Decorrelating experience for 32 frames... [2024-07-27 17:30:31,433][12687] Decorrelating experience for 32 frames... [2024-07-27 17:30:31,589][12688] Decorrelating experience for 64 frames... [2024-07-27 17:30:31,690][00473] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:30:31,888][12683] Decorrelating experience for 64 frames... [2024-07-27 17:30:32,695][12685] Decorrelating experience for 64 frames... [2024-07-27 17:30:32,778][12683] Decorrelating experience for 96 frames... [2024-07-27 17:30:32,805][12686] Decorrelating experience for 64 frames... [2024-07-27 17:30:32,920][00473] Heartbeat connected on RolloutWorker_w0 [2024-07-27 17:30:32,984][12689] Decorrelating experience for 96 frames... [2024-07-27 17:30:33,019][12684] Decorrelating experience for 32 frames... [2024-07-27 17:30:33,305][00473] Heartbeat connected on RolloutWorker_w7 [2024-07-27 17:30:33,372][12688] Decorrelating experience for 96 frames... [2024-07-27 17:30:33,539][00473] Heartbeat connected on RolloutWorker_w5 [2024-07-27 17:30:34,480][12690] Decorrelating experience for 64 frames... [2024-07-27 17:30:34,581][12684] Decorrelating experience for 64 frames... [2024-07-27 17:30:35,103][12687] Decorrelating experience for 64 frames... [2024-07-27 17:30:36,183][12690] Decorrelating experience for 96 frames... [2024-07-27 17:30:36,690][00473] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 54.4. Samples: 544. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:30:36,694][00473] Avg episode reward: [(0, '4.500')] [2024-07-27 17:30:36,883][00473] Heartbeat connected on RolloutWorker_w6 [2024-07-27 17:30:37,686][12685] Decorrelating experience for 96 frames... [2024-07-27 17:30:38,124][00473] Heartbeat connected on RolloutWorker_w2 [2024-07-27 17:30:38,851][12669] Signal inference workers to stop experience collection... [2024-07-27 17:30:38,872][12682] InferenceWorker_p0-w0: stopping experience collection [2024-07-27 17:30:38,997][12687] Decorrelating experience for 96 frames... [2024-07-27 17:30:39,154][00473] Heartbeat connected on RolloutWorker_w4 [2024-07-27 17:30:39,620][12686] Decorrelating experience for 96 frames... [2024-07-27 17:30:39,773][00473] Heartbeat connected on RolloutWorker_w3 [2024-07-27 17:30:39,855][12684] Decorrelating experience for 96 frames... [2024-07-27 17:30:39,950][00473] Heartbeat connected on RolloutWorker_w1 [2024-07-27 17:30:40,565][12669] Signal inference workers to resume experience collection... [2024-07-27 17:30:40,566][12682] InferenceWorker_p0-w0: resuming experience collection [2024-07-27 17:30:40,567][12669] Stopping Batcher_0... [2024-07-27 17:30:40,568][12669] Loop batcher_evt_loop terminating... [2024-07-27 17:30:40,568][00473] Component Batcher_0 stopped! [2024-07-27 17:30:40,616][00473] Component RolloutWorker_w5 stopped! 
[2024-07-27 17:30:40,621][12688] Stopping RolloutWorker_w5... [2024-07-27 17:30:40,621][12688] Loop rollout_proc5_evt_loop terminating... [2024-07-27 17:30:40,627][00473] Component RolloutWorker_w1 stopped! [2024-07-27 17:30:40,631][12684] Stopping RolloutWorker_w1... [2024-07-27 17:30:40,631][12684] Loop rollout_proc1_evt_loop terminating... [2024-07-27 17:30:40,664][00473] Component RolloutWorker_w7 stopped! [2024-07-27 17:30:40,668][12689] Stopping RolloutWorker_w7... [2024-07-27 17:30:40,669][12687] Stopping RolloutWorker_w4... [2024-07-27 17:30:40,669][12689] Loop rollout_proc7_evt_loop terminating... [2024-07-27 17:30:40,660][12682] Weights refcount: 2 0 [2024-07-27 17:30:40,673][12687] Loop rollout_proc4_evt_loop terminating... [2024-07-27 17:30:40,672][00473] Component RolloutWorker_w4 stopped! [2024-07-27 17:30:40,683][12686] Stopping RolloutWorker_w3... [2024-07-27 17:30:40,682][12682] Stopping InferenceWorker_p0-w0... [2024-07-27 17:30:40,687][12682] Loop inference_proc0-0_evt_loop terminating... [2024-07-27 17:30:40,688][00473] Component InferenceWorker_p0-w0 stopped! [2024-07-27 17:30:40,690][00473] Component RolloutWorker_w3 stopped! [2024-07-27 17:30:40,691][12683] Stopping RolloutWorker_w0... [2024-07-27 17:30:40,700][12683] Loop rollout_proc0_evt_loop terminating... [2024-07-27 17:30:40,684][12686] Loop rollout_proc3_evt_loop terminating... [2024-07-27 17:30:40,699][00473] Component RolloutWorker_w0 stopped! [2024-07-27 17:30:40,728][12690] Stopping RolloutWorker_w6... [2024-07-27 17:30:40,728][12690] Loop rollout_proc6_evt_loop terminating... [2024-07-27 17:30:40,727][00473] Component RolloutWorker_w6 stopped! [2024-07-27 17:30:40,743][00473] Component RolloutWorker_w2 stopped! [2024-07-27 17:30:40,744][12685] Stopping RolloutWorker_w2... [2024-07-27 17:30:40,749][12685] Loop rollout_proc2_evt_loop terminating... [2024-07-27 17:30:42,375][12669] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2024-07-27 17:30:42,523][12669] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000920_3768320.pth [2024-07-27 17:30:42,541][12669] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2024-07-27 17:30:42,839][12669] Stopping LearnerWorker_p0... [2024-07-27 17:30:42,840][00473] Component LearnerWorker_p0 stopped! [2024-07-27 17:30:42,842][00473] Waiting for process learner_proc0 to stop... [2024-07-27 17:30:42,850][12669] Loop learner_proc0_evt_loop terminating... [2024-07-27 17:30:44,880][00473] Waiting for process inference_proc0-0 to join... [2024-07-27 17:30:44,882][00473] Waiting for process rollout_proc0 to join... [2024-07-27 17:30:46,377][00473] Waiting for process rollout_proc1 to join... [2024-07-27 17:30:46,382][00473] Waiting for process rollout_proc2 to join... [2024-07-27 17:30:46,386][00473] Waiting for process rollout_proc3 to join... [2024-07-27 17:30:46,388][00473] Waiting for process rollout_proc4 to join... [2024-07-27 17:30:46,393][00473] Waiting for process rollout_proc5 to join... [2024-07-27 17:30:46,396][00473] Waiting for process rollout_proc6 to join... [2024-07-27 17:30:46,399][00473] Waiting for process rollout_proc7 to join... 
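Note: the save/remove pair above (checkpoint_000000980_4014080.pth written, checkpoint_000000920_3768320.pth deleted) matches the keep_checkpoints=2 setting visible in the configuration dump later in this log. A hypothetical helper, not Sample Factory's code, illustrating the naming pattern inferred from those paths (checkpoint_{train_step:09d}_{env_steps}.pth) and the pruning of older files:

```python
from pathlib import Path

def save_and_prune(ckpt_dir: str, state: bytes, train_step: int, env_steps: int,
                   keep: int = 2) -> Path:
    """Write checkpoint_{train_step:09d}_{env_steps}.pth and keep only the newest `keep` files."""
    d = Path(ckpt_dir)
    d.mkdir(parents=True, exist_ok=True)
    path = d / f"checkpoint_{train_step:09d}_{env_steps}.pth"
    path.write_bytes(state)  # stand-in for torch.save(checkpoint_dict, path)
    # zero-padded train_step makes lexicographic order == chronological order
    for old in sorted(d.glob("checkpoint_*.pth"))[:-keep]:
        old.unlink()         # mirrors the "Removing ..." line above
    return path
```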
[2024-07-27 17:30:46,403][00473] Batcher 0 profile tree view: batching: 0.8286, releasing_batches: 0.0004 [2024-07-27 17:30:46,404][00473] InferenceWorker_p0-w0 profile tree view: update_model: 0.0262 wait_policy: 0.0000 wait_policy_total: 9.8374 one_step: 0.0100 handle_policy_step: 3.0564 deserialize: 0.0610, stack: 0.0112, obs_to_device_normalize: 0.6159, forward: 1.9583, send_messages: 0.0789 prepare_outputs: 0.2438 to_cpu: 0.1458 [2024-07-27 17:30:46,407][00473] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 3.1277 train: 3.3533 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0253, after_optimizer: 0.0913 calculate_losses: 1.7805 losses_init: 0.0000, forward_head: 0.3276, bptt_initial: 1.2620, tail: 0.1100, advantages_returns: 0.0033, losses: 0.0663 bptt: 0.0046 bptt_forward_core: 0.0044 update: 1.4546 clip: 0.0720 [2024-07-27 17:30:46,410][00473] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0015, enqueue_policy_requests: 0.8496, env_step: 3.4516, overhead: 0.0908, complete_rollouts: 0.0138 save_policy_outputs: 0.0975 split_output_tensors: 0.0465 [2024-07-27 17:30:46,411][00473] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0019, enqueue_policy_requests: 1.4317, env_step: 3.1135, overhead: 0.1307, complete_rollouts: 0.0017 save_policy_outputs: 0.1117 split_output_tensors: 0.0469 [2024-07-27 17:30:46,414][00473] Loop Runner_EvtLoop terminating... [2024-07-27 17:30:46,415][00473] Runner profile tree view: main_loop: 37.5755 [2024-07-27 17:30:46,419][00473] Collected {0: 4014080}, FPS: 218.0 [2024-07-27 17:30:54,683][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:30:54,686][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:30:54,687][00473] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-27 17:30:54,690][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:30:54,691][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:30:54,694][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:30:54,696][00473] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:30:54,698][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-27 17:30:54,700][00473] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-27 17:30:54,702][00473] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-27 17:30:54,703][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:30:54,704][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:30:54,706][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:30:54,707][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-07-27 17:30:54,709][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:30:54,757][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:30:54,760][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:30:54,779][00473] ConvEncoder: input_channels=3 [2024-07-27 17:30:54,842][00473] Conv encoder output size: 512 [2024-07-27 17:30:54,845][00473] Policy head output size: 512 [2024-07-27 17:30:54,873][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2024-07-27 17:30:55,520][00473] Num frames 100... [2024-07-27 17:30:55,702][00473] Num frames 200... [2024-07-27 17:30:55,888][00473] Num frames 300... [2024-07-27 17:30:56,082][00473] Num frames 400... [2024-07-27 17:30:56,279][00473] Num frames 500... [2024-07-27 17:30:56,363][00473] Avg episode rewards: #0: 7.120, true rewards: #0: 5.120 [2024-07-27 17:30:56,366][00473] Avg episode reward: 7.120, avg true_objective: 5.120 [2024-07-27 17:30:56,548][00473] Num frames 600... [2024-07-27 17:30:56,750][00473] Num frames 700... [2024-07-27 17:30:56,940][00473] Num frames 800... [2024-07-27 17:30:57,154][00473] Num frames 900... [2024-07-27 17:30:57,283][00473] Num frames 1000... [2024-07-27 17:30:57,414][00473] Num frames 1100... [2024-07-27 17:30:57,538][00473] Num frames 1200... [2024-07-27 17:30:57,700][00473] Avg episode rewards: #0: 11.900, true rewards: #0: 6.400 [2024-07-27 17:30:57,702][00473] Avg episode reward: 11.900, avg true_objective: 6.400 [2024-07-27 17:30:57,741][00473] Num frames 1300... [2024-07-27 17:30:57,873][00473] Num frames 1400... [2024-07-27 17:30:58,008][00473] Num frames 1500... [2024-07-27 17:30:58,149][00473] Num frames 1600... [2024-07-27 17:30:58,281][00473] Num frames 1700... [2024-07-27 17:30:58,409][00473] Num frames 1800... [2024-07-27 17:30:58,544][00473] Num frames 1900... [2024-07-27 17:30:58,675][00473] Num frames 2000... [2024-07-27 17:30:58,812][00473] Num frames 2100... [2024-07-27 17:30:58,973][00473] Avg episode rewards: #0: 13.920, true rewards: #0: 7.253 [2024-07-27 17:30:58,975][00473] Avg episode reward: 13.920, avg true_objective: 7.253 [2024-07-27 17:30:59,011][00473] Num frames 2200... [2024-07-27 17:30:59,150][00473] Num frames 2300... [2024-07-27 17:30:59,301][00473] Num frames 2400... [2024-07-27 17:30:59,442][00473] Num frames 2500... [2024-07-27 17:30:59,577][00473] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400 [2024-07-27 17:30:59,578][00473] Avg episode reward: 11.400, avg true_objective: 6.400 [2024-07-27 17:30:59,637][00473] Num frames 2600... [2024-07-27 17:30:59,773][00473] Num frames 2700... [2024-07-27 17:30:59,902][00473] Num frames 2800... [2024-07-27 17:31:00,038][00473] Num frames 2900... [2024-07-27 17:31:00,176][00473] Num frames 3000... [2024-07-27 17:31:00,310][00473] Num frames 3100... [2024-07-27 17:31:00,443][00473] Num frames 3200... [2024-07-27 17:31:00,731][00473] Num frames 3300... [2024-07-27 17:31:00,872][00473] Num frames 3400... [2024-07-27 17:31:01,010][00473] Num frames 3500... [2024-07-27 17:31:01,145][00473] Num frames 3600... [2024-07-27 17:31:01,282][00473] Num frames 3700... [2024-07-27 17:31:01,515][00473] Num frames 3800... [2024-07-27 17:31:01,642][00473] Num frames 3900... [2024-07-27 17:31:01,781][00473] Num frames 4000... [2024-07-27 17:31:01,911][00473] Num frames 4100... [2024-07-27 17:31:02,050][00473] Num frames 4200... [2024-07-27 17:31:02,181][00473] Num frames 4300... 
[2024-07-27 17:31:02,324][00473] Num frames 4400... [2024-07-27 17:31:02,401][00473] Avg episode rewards: #0: 16.832, true rewards: #0: 8.832 [2024-07-27 17:31:02,403][00473] Avg episode reward: 16.832, avg true_objective: 8.832 [2024-07-27 17:31:02,509][00473] Num frames 4500... [2024-07-27 17:31:02,636][00473] Num frames 4600... [2024-07-27 17:31:02,769][00473] Num frames 4700... [2024-07-27 17:31:02,895][00473] Num frames 4800... [2024-07-27 17:31:03,028][00473] Num frames 4900... [2024-07-27 17:31:03,123][00473] Avg episode rewards: #0: 15.547, true rewards: #0: 8.213 [2024-07-27 17:31:03,124][00473] Avg episode reward: 15.547, avg true_objective: 8.213 [2024-07-27 17:31:03,220][00473] Num frames 5000... [2024-07-27 17:31:03,353][00473] Num frames 5100... [2024-07-27 17:31:03,483][00473] Num frames 5200... [2024-07-27 17:31:03,613][00473] Num frames 5300... [2024-07-27 17:31:03,750][00473] Num frames 5400... [2024-07-27 17:31:03,882][00473] Num frames 5500... [2024-07-27 17:31:04,012][00473] Num frames 5600... [2024-07-27 17:31:04,141][00473] Num frames 5700... [2024-07-27 17:31:04,281][00473] Num frames 5800... [2024-07-27 17:31:04,411][00473] Avg episode rewards: #0: 16.223, true rewards: #0: 8.366 [2024-07-27 17:31:04,412][00473] Avg episode reward: 16.223, avg true_objective: 8.366 [2024-07-27 17:31:04,473][00473] Num frames 5900... [2024-07-27 17:31:04,607][00473] Num frames 6000... [2024-07-27 17:31:04,748][00473] Num frames 6100... [2024-07-27 17:31:04,878][00473] Num frames 6200... [2024-07-27 17:31:05,015][00473] Num frames 6300... [2024-07-27 17:31:05,148][00473] Num frames 6400... [2024-07-27 17:31:05,285][00473] Num frames 6500... [2024-07-27 17:31:05,418][00473] Avg episode rewards: #0: 15.575, true rewards: #0: 8.200 [2024-07-27 17:31:05,419][00473] Avg episode reward: 15.575, avg true_objective: 8.200 [2024-07-27 17:31:05,476][00473] Num frames 6600... [2024-07-27 17:31:05,607][00473] Num frames 6700... [2024-07-27 17:31:05,748][00473] Num frames 6800... [2024-07-27 17:31:05,876][00473] Num frames 6900... [2024-07-27 17:31:06,030][00473] Avg episode rewards: #0: 14.418, true rewards: #0: 7.751 [2024-07-27 17:31:06,031][00473] Avg episode reward: 14.418, avg true_objective: 7.751 [2024-07-27 17:31:06,068][00473] Num frames 7000... [2024-07-27 17:31:06,196][00473] Num frames 7100... [2024-07-27 17:31:06,334][00473] Num frames 7200... [2024-07-27 17:31:06,468][00473] Num frames 7300... [2024-07-27 17:31:06,599][00473] Num frames 7400... [2024-07-27 17:31:06,733][00473] Num frames 7500... [2024-07-27 17:31:06,867][00473] Num frames 7600... [2024-07-27 17:31:06,999][00473] Num frames 7700... [2024-07-27 17:31:07,127][00473] Num frames 7800... [2024-07-27 17:31:07,306][00473] Num frames 7900... [2024-07-27 17:31:07,498][00473] Num frames 8000... [2024-07-27 17:31:07,678][00473] Num frames 8100... [2024-07-27 17:31:07,809][00473] Avg episode rewards: #0: 15.439, true rewards: #0: 8.139 [2024-07-27 17:31:07,811][00473] Avg episode reward: 15.439, avg true_objective: 8.139 [2024-07-27 17:31:56,547][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-27 17:32:08,223][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:32:08,225][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:32:08,227][00473] Adding new argument 'no_render'=True that is not in the saved config file! 
[2024-07-27 17:32:08,229][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:32:08,230][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:32:08,232][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:32:08,233][00473] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-07-27 17:32:08,236][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-27 17:32:08,236][00473] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-27 17:32:08,239][00473] Adding new argument 'hf_repository'='rishisim/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-27 17:32:08,240][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:32:08,241][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:32:08,244][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:32:08,246][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-27 17:32:08,248][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:32:08,282][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:32:08,285][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:32:08,297][00473] ConvEncoder: input_channels=3 [2024-07-27 17:32:08,346][00473] Conv encoder output size: 512 [2024-07-27 17:32:08,348][00473] Policy head output size: 512 [2024-07-27 17:32:08,367][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2024-07-27 17:32:08,805][00473] Num frames 100... [2024-07-27 17:32:08,939][00473] Num frames 200... [2024-07-27 17:32:09,068][00473] Num frames 300... [2024-07-27 17:32:09,195][00473] Num frames 400... [2024-07-27 17:32:09,324][00473] Num frames 500... [2024-07-27 17:32:09,463][00473] Num frames 600... [2024-07-27 17:32:09,588][00473] Num frames 700... [2024-07-27 17:32:09,720][00473] Num frames 800... [2024-07-27 17:32:09,849][00473] Num frames 900... [2024-07-27 17:32:09,975][00473] Num frames 1000... [2024-07-27 17:32:10,104][00473] Num frames 1100... [2024-07-27 17:32:10,263][00473] Avg episode rewards: #0: 22.840, true rewards: #0: 11.840 [2024-07-27 17:32:10,265][00473] Avg episode reward: 22.840, avg true_objective: 11.840 [2024-07-27 17:32:10,290][00473] Num frames 1200... [2024-07-27 17:32:10,435][00473] Num frames 1300... [2024-07-27 17:32:10,565][00473] Num frames 1400... [2024-07-27 17:32:10,697][00473] Num frames 1500... [2024-07-27 17:32:10,836][00473] Num frames 1600... [2024-07-27 17:32:10,945][00473] Avg episode rewards: #0: 15.160, true rewards: #0: 8.160 [2024-07-27 17:32:10,948][00473] Avg episode reward: 15.160, avg true_objective: 8.160 [2024-07-27 17:32:11,076][00473] Num frames 1700... [2024-07-27 17:32:11,256][00473] Num frames 1800... [2024-07-27 17:32:11,439][00473] Num frames 1900... [2024-07-27 17:32:11,645][00473] Num frames 2000... [2024-07-27 17:32:11,834][00473] Num frames 2100... [2024-07-27 17:32:12,013][00473] Num frames 2200... [2024-07-27 17:32:12,203][00473] Num frames 2300... [2024-07-27 17:32:12,394][00473] Num frames 2400... [2024-07-27 17:32:12,592][00473] Num frames 2500... 
[2024-07-27 17:32:12,832][00473] Avg episode rewards: #0: 16.640, true rewards: #0: 8.640 [2024-07-27 17:32:12,833][00473] Avg episode reward: 16.640, avg true_objective: 8.640 [2024-07-27 17:32:12,851][00473] Num frames 2600... [2024-07-27 17:32:13,039][00473] Num frames 2700... [2024-07-27 17:32:13,228][00473] Num frames 2800... [2024-07-27 17:32:13,383][00473] Num frames 2900... [2024-07-27 17:32:13,518][00473] Num frames 3000... [2024-07-27 17:32:13,649][00473] Num frames 3100... [2024-07-27 17:32:13,786][00473] Num frames 3200... [2024-07-27 17:32:13,915][00473] Num frames 3300... [2024-07-27 17:32:14,027][00473] Avg episode rewards: #0: 16.108, true rewards: #0: 8.357 [2024-07-27 17:32:14,028][00473] Avg episode reward: 16.108, avg true_objective: 8.357 [2024-07-27 17:32:14,110][00473] Num frames 3400... [2024-07-27 17:32:14,238][00473] Num frames 3500... [2024-07-27 17:32:14,369][00473] Num frames 3600... [2024-07-27 17:32:14,505][00473] Num frames 3700... [2024-07-27 17:32:14,638][00473] Num frames 3800... [2024-07-27 17:32:14,775][00473] Num frames 3900... [2024-07-27 17:32:14,904][00473] Num frames 4000... [2024-07-27 17:32:15,049][00473] Num frames 4100... [2024-07-27 17:32:15,180][00473] Num frames 4200... [2024-07-27 17:32:15,312][00473] Num frames 4300... [2024-07-27 17:32:15,443][00473] Num frames 4400... [2024-07-27 17:32:15,582][00473] Num frames 4500... [2024-07-27 17:32:15,722][00473] Num frames 4600... [2024-07-27 17:32:15,831][00473] Avg episode rewards: #0: 18.474, true rewards: #0: 9.274 [2024-07-27 17:32:15,833][00473] Avg episode reward: 18.474, avg true_objective: 9.274 [2024-07-27 17:32:15,915][00473] Num frames 4700... [2024-07-27 17:32:16,047][00473] Num frames 4800... [2024-07-27 17:32:16,175][00473] Num frames 4900... [2024-07-27 17:32:16,301][00473] Num frames 5000... [2024-07-27 17:32:16,434][00473] Num frames 5100... [2024-07-27 17:32:16,573][00473] Num frames 5200... [2024-07-27 17:32:16,706][00473] Num frames 5300... [2024-07-27 17:32:16,866][00473] Avg episode rewards: #0: 18.122, true rewards: #0: 8.955 [2024-07-27 17:32:16,868][00473] Avg episode reward: 18.122, avg true_objective: 8.955 [2024-07-27 17:32:16,904][00473] Num frames 5400... [2024-07-27 17:32:17,035][00473] Num frames 5500... [2024-07-27 17:32:17,172][00473] Num frames 5600... [2024-07-27 17:32:17,302][00473] Num frames 5700... [2024-07-27 17:32:17,431][00473] Num frames 5800... [2024-07-27 17:32:17,596][00473] Avg episode rewards: #0: 16.836, true rewards: #0: 8.407 [2024-07-27 17:32:17,597][00473] Avg episode reward: 16.836, avg true_objective: 8.407 [2024-07-27 17:32:17,620][00473] Num frames 5900... [2024-07-27 17:32:17,759][00473] Num frames 6000... [2024-07-27 17:32:17,883][00473] Num frames 6100... [2024-07-27 17:32:18,009][00473] Num frames 6200... [2024-07-27 17:32:18,142][00473] Num frames 6300... [2024-07-27 17:32:18,269][00473] Num frames 6400... [2024-07-27 17:32:18,364][00473] Avg episode rewards: #0: 15.661, true rewards: #0: 8.036 [2024-07-27 17:32:18,365][00473] Avg episode reward: 15.661, avg true_objective: 8.036 [2024-07-27 17:32:18,460][00473] Num frames 6500... [2024-07-27 17:32:18,591][00473] Num frames 6600... [2024-07-27 17:32:18,730][00473] Num frames 6700... [2024-07-27 17:32:18,864][00473] Num frames 6800... [2024-07-27 17:32:18,979][00473] Avg episode rewards: #0: 14.383, true rewards: #0: 7.606 [2024-07-27 17:32:18,981][00473] Avg episode reward: 14.383, avg true_objective: 7.606 [2024-07-27 17:32:19,053][00473] Num frames 6900... 
[2024-07-27 17:32:19,183][00473] Num frames 7000... [2024-07-27 17:32:19,311][00473] Num frames 7100... [2024-07-27 17:32:19,439][00473] Num frames 7200... [2024-07-27 17:32:19,570][00473] Num frames 7300... [2024-07-27 17:32:19,749][00473] Num frames 7400... [2024-07-27 17:32:19,881][00473] Num frames 7500... [2024-07-27 17:32:19,960][00473] Avg episode rewards: #0: 14.017, true rewards: #0: 7.517 [2024-07-27 17:32:19,961][00473] Avg episode reward: 14.017, avg true_objective: 7.517 [2024-07-27 17:33:06,680][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-27 17:33:17,918][00473] The model has been pushed to https://huggingface.co/rishisim/rl_course_vizdoom_health_gathering_supreme [2024-07-27 17:33:52,349][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:33:52,350][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:33:52,352][00473] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-27 17:33:52,354][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:33:52,356][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:33:52,358][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:33:52,360][00473] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-07-27 17:33:52,361][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-27 17:33:52,362][00473] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-27 17:33:52,363][00473] Adding new argument 'hf_repository'='rishisim/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-27 17:33:52,364][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:33:52,365][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:33:52,366][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:33:52,367][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-27 17:33:52,368][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:33:52,399][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:33:52,400][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:33:52,421][00473] ConvEncoder: input_channels=3 [2024-07-27 17:33:52,464][00473] Conv encoder output size: 512 [2024-07-27 17:33:52,466][00473] Policy head output size: 512 [2024-07-27 17:33:52,486][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2024-07-27 17:33:52,952][00473] Num frames 100... [2024-07-27 17:33:53,094][00473] Num frames 200... [2024-07-27 17:33:53,230][00473] Num frames 300... [2024-07-27 17:33:53,368][00473] Num frames 400... [2024-07-27 17:33:53,510][00473] Num frames 500... [2024-07-27 17:33:53,647][00473] Num frames 600... [2024-07-27 17:33:53,810][00473] Avg episode rewards: #0: 11.720, true rewards: #0: 6.720 [2024-07-27 17:33:53,812][00473] Avg episode reward: 11.720, avg true_objective: 6.720 [2024-07-27 17:33:53,859][00473] Num frames 700... [2024-07-27 17:33:53,991][00473] Num frames 800... 
[2024-07-27 17:33:54,123][00473] Num frames 900... [2024-07-27 17:33:54,259][00473] Num frames 1000... [2024-07-27 17:33:54,392][00473] Num frames 1100... [2024-07-27 17:33:54,531][00473] Num frames 1200... [2024-07-27 17:33:54,670][00473] Num frames 1300... [2024-07-27 17:33:54,811][00473] Num frames 1400... [2024-07-27 17:33:54,945][00473] Num frames 1500... [2024-07-27 17:33:55,046][00473] Avg episode rewards: #0: 13.675, true rewards: #0: 7.675 [2024-07-27 17:33:55,048][00473] Avg episode reward: 13.675, avg true_objective: 7.675 [2024-07-27 17:33:55,133][00473] Num frames 1600... [2024-07-27 17:33:55,260][00473] Num frames 1700... [2024-07-27 17:33:55,398][00473] Num frames 1800... [2024-07-27 17:33:55,534][00473] Num frames 1900... [2024-07-27 17:33:55,664][00473] Num frames 2000... [2024-07-27 17:33:55,805][00473] Num frames 2100... [2024-07-27 17:33:55,945][00473] Num frames 2200... [2024-07-27 17:33:56,073][00473] Num frames 2300... [2024-07-27 17:33:56,205][00473] Num frames 2400... [2024-07-27 17:33:56,339][00473] Num frames 2500... [2024-07-27 17:33:56,477][00473] Num frames 2600... [2024-07-27 17:33:56,623][00473] Num frames 2700... [2024-07-27 17:33:56,770][00473] Num frames 2800... [2024-07-27 17:33:56,912][00473] Num frames 2900... [2024-07-27 17:33:57,047][00473] Num frames 3000... [2024-07-27 17:33:57,181][00473] Avg episode rewards: #0: 20.530, true rewards: #0: 10.197 [2024-07-27 17:33:57,183][00473] Avg episode reward: 20.530, avg true_objective: 10.197 [2024-07-27 17:33:57,245][00473] Num frames 3100... [2024-07-27 17:33:57,377][00473] Num frames 3200... [2024-07-27 17:33:57,511][00473] Num frames 3300... [2024-07-27 17:33:57,656][00473] Num frames 3400... [2024-07-27 17:33:57,801][00473] Num frames 3500... [2024-07-27 17:33:57,935][00473] Num frames 3600... [2024-07-27 17:33:58,070][00473] Num frames 3700... [2024-07-27 17:33:58,202][00473] Num frames 3800... [2024-07-27 17:33:58,337][00473] Num frames 3900... [2024-07-27 17:33:58,471][00473] Num frames 4000... [2024-07-27 17:33:58,552][00473] Avg episode rewards: #0: 19.798, true rewards: #0: 10.047 [2024-07-27 17:33:58,554][00473] Avg episode reward: 19.798, avg true_objective: 10.047 [2024-07-27 17:33:58,671][00473] Num frames 4100... [2024-07-27 17:33:58,815][00473] Num frames 4200... [2024-07-27 17:33:58,948][00473] Num frames 4300... [2024-07-27 17:33:59,083][00473] Num frames 4400... [2024-07-27 17:33:59,214][00473] Num frames 4500... [2024-07-27 17:33:59,347][00473] Num frames 4600... [2024-07-27 17:33:59,479][00473] Num frames 4700... [2024-07-27 17:33:59,618][00473] Num frames 4800... [2024-07-27 17:33:59,754][00473] Num frames 4900... [2024-07-27 17:33:59,830][00473] Avg episode rewards: #0: 19.230, true rewards: #0: 9.830 [2024-07-27 17:33:59,832][00473] Avg episode reward: 19.230, avg true_objective: 9.830 [2024-07-27 17:33:59,947][00473] Num frames 5000... [2024-07-27 17:34:00,077][00473] Num frames 5100... [2024-07-27 17:34:00,207][00473] Num frames 5200... [2024-07-27 17:34:00,342][00473] Num frames 5300... [2024-07-27 17:34:00,474][00473] Num frames 5400... [2024-07-27 17:34:00,604][00473] Num frames 5500... [2024-07-27 17:34:00,758][00473] Num frames 5600... [2024-07-27 17:34:00,892][00473] Num frames 5700... [2024-07-27 17:34:01,025][00473] Num frames 5800... [2024-07-27 17:34:01,160][00473] Num frames 5900... [2024-07-27 17:34:01,291][00473] Num frames 6000... [2024-07-27 17:34:01,426][00473] Num frames 6100... 
[2024-07-27 17:34:01,603][00473] Avg episode rewards: #0: 20.825, true rewards: #0: 10.325 [2024-07-27 17:34:01,605][00473] Avg episode reward: 20.825, avg true_objective: 10.325 [2024-07-27 17:34:01,616][00473] Num frames 6200... [2024-07-27 17:34:01,765][00473] Num frames 6300... [2024-07-27 17:34:01,899][00473] Num frames 6400... [2024-07-27 17:34:02,028][00473] Num frames 6500... [2024-07-27 17:34:02,182][00473] Num frames 6600... [2024-07-27 17:34:02,379][00473] Num frames 6700... [2024-07-27 17:34:02,597][00473] Num frames 6800... [2024-07-27 17:34:02,805][00473] Num frames 6900... [2024-07-27 17:34:03,010][00473] Num frames 7000... [2024-07-27 17:34:03,244][00473] Avg episode rewards: #0: 20.273, true rewards: #0: 10.130 [2024-07-27 17:34:03,249][00473] Avg episode reward: 20.273, avg true_objective: 10.130 [2024-07-27 17:34:03,269][00473] Num frames 7100... [2024-07-27 17:34:03,457][00473] Num frames 7200... [2024-07-27 17:34:03,642][00473] Num frames 7300... [2024-07-27 17:34:03,856][00473] Num frames 7400... [2024-07-27 17:34:04,050][00473] Num frames 7500... [2024-07-27 17:34:04,246][00473] Num frames 7600... [2024-07-27 17:34:04,444][00473] Num frames 7700... [2024-07-27 17:34:04,645][00473] Num frames 7800... [2024-07-27 17:34:04,820][00473] Num frames 7900... [2024-07-27 17:34:04,959][00473] Num frames 8000... [2024-07-27 17:34:05,092][00473] Num frames 8100... [2024-07-27 17:34:05,224][00473] Num frames 8200... [2024-07-27 17:34:05,301][00473] Avg episode rewards: #0: 20.770, true rewards: #0: 10.270 [2024-07-27 17:34:05,302][00473] Avg episode reward: 20.770, avg true_objective: 10.270 [2024-07-27 17:34:05,417][00473] Num frames 8300... [2024-07-27 17:34:05,551][00473] Num frames 8400... [2024-07-27 17:34:05,685][00473] Num frames 8500... [2024-07-27 17:34:05,830][00473] Num frames 8600... [2024-07-27 17:34:05,973][00473] Avg episode rewards: #0: 19.071, true rewards: #0: 9.627 [2024-07-27 17:34:05,975][00473] Avg episode reward: 19.071, avg true_objective: 9.627 [2024-07-27 17:34:06,028][00473] Num frames 8700... [2024-07-27 17:34:06,154][00473] Num frames 8800... [2024-07-27 17:34:06,283][00473] Num frames 8900... [2024-07-27 17:34:06,422][00473] Num frames 9000... [2024-07-27 17:34:06,549][00473] Num frames 9100... [2024-07-27 17:34:06,680][00473] Num frames 9200... [2024-07-27 17:34:06,821][00473] Num frames 9300... [2024-07-27 17:34:06,965][00473] Num frames 9400... [2024-07-27 17:34:07,094][00473] Num frames 9500... [2024-07-27 17:34:07,227][00473] Num frames 9600... [2024-07-27 17:34:07,288][00473] Avg episode rewards: #0: 19.003, true rewards: #0: 9.603 [2024-07-27 17:34:07,289][00473] Avg episode reward: 19.003, avg true_objective: 9.603 [2024-07-27 17:35:07,703][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-27 17:35:15,032][00473] The model has been pushed to https://huggingface.co/rishisim/rl_course_vizdoom_health_gathering_supreme [2024-07-27 17:38:03,587][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:38:03,589][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:38:03,591][00473] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-27 17:38:03,593][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:38:03,594][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2024-07-27 17:38:03,595][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:38:03,597][00473] Adding new argument 'max_num_frames'=1000000 that is not in the saved config file! [2024-07-27 17:38:03,600][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-27 17:38:03,602][00473] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-27 17:38:03,603][00473] Adding new argument 'hf_repository'='rishisim/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-27 17:38:03,604][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:38:03,605][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:38:03,609][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:38:03,610][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-27 17:38:03,611][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:38:03,647][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:38:03,649][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:38:03,665][00473] ConvEncoder: input_channels=3 [2024-07-27 17:38:03,707][00473] Conv encoder output size: 512 [2024-07-27 17:38:03,709][00473] Policy head output size: 512 [2024-07-27 17:38:03,732][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2024-07-27 17:38:04,174][00473] Num frames 100... [2024-07-27 17:38:04,309][00473] Num frames 200... [2024-07-27 17:38:04,439][00473] Num frames 300... [2024-07-27 17:38:04,584][00473] Num frames 400... [2024-07-27 17:38:04,714][00473] Num frames 500... [2024-07-27 17:38:04,843][00473] Num frames 600... [2024-07-27 17:38:04,983][00473] Num frames 700... [2024-07-27 17:38:05,117][00473] Num frames 800... [2024-07-27 17:38:05,255][00473] Num frames 900... [2024-07-27 17:38:05,402][00473] Num frames 1000... [2024-07-27 17:38:05,544][00473] Num frames 1100... [2024-07-27 17:38:05,679][00473] Num frames 1200... [2024-07-27 17:38:05,807][00473] Avg episode rewards: #0: 29.480, true rewards: #0: 12.480 [2024-07-27 17:38:05,809][00473] Avg episode reward: 29.480, avg true_objective: 12.480 [2024-07-27 17:38:05,886][00473] Num frames 1300... [2024-07-27 17:38:06,028][00473] Num frames 1400... [2024-07-27 17:38:06,160][00473] Num frames 1500... [2024-07-27 17:38:06,304][00473] Num frames 1600... [2024-07-27 17:38:06,437][00473] Num frames 1700... [2024-07-27 17:38:06,573][00473] Num frames 1800... [2024-07-27 17:38:06,661][00473] Avg episode rewards: #0: 19.620, true rewards: #0: 9.120 [2024-07-27 17:38:06,662][00473] Avg episode reward: 19.620, avg true_objective: 9.120 [2024-07-27 17:38:06,771][00473] Num frames 1900... [2024-07-27 17:38:06,911][00473] Num frames 2000... [2024-07-27 17:38:07,044][00473] Num frames 2100... [2024-07-27 17:38:07,171][00473] Num frames 2200... [2024-07-27 17:38:07,278][00473] Avg episode rewards: #0: 15.133, true rewards: #0: 7.467 [2024-07-27 17:38:07,280][00473] Avg episode reward: 15.133, avg true_objective: 7.467 [2024-07-27 17:38:07,369][00473] Num frames 2300... [2024-07-27 17:38:07,499][00473] Num frames 2400... [2024-07-27 17:38:07,631][00473] Num frames 2500... [2024-07-27 17:38:07,769][00473] Num frames 2600... 
[2024-07-27 17:38:07,900][00473] Num frames 2700... [2024-07-27 17:38:08,026][00473] Avg episode rewards: #0: 13.380, true rewards: #0: 6.880 [2024-07-27 17:38:08,028][00473] Avg episode reward: 13.380, avg true_objective: 6.880 [2024-07-27 17:38:08,095][00473] Num frames 2800... [2024-07-27 17:38:08,229][00473] Num frames 2900... [2024-07-27 17:38:08,367][00473] Num frames 3000... [2024-07-27 17:38:08,504][00473] Num frames 3100... [2024-07-27 17:38:08,634][00473] Num frames 3200... [2024-07-27 17:38:08,776][00473] Num frames 3300... [2024-07-27 17:38:08,909][00473] Num frames 3400... [2024-07-27 17:38:09,081][00473] Avg episode rewards: #0: 13.376, true rewards: #0: 6.976 [2024-07-27 17:38:09,082][00473] Avg episode reward: 13.376, avg true_objective: 6.976 [2024-07-27 17:38:09,104][00473] Num frames 3500... [2024-07-27 17:38:09,241][00473] Num frames 3600... [2024-07-27 17:38:09,385][00473] Num frames 3700... [2024-07-27 17:38:09,522][00473] Num frames 3800... [2024-07-27 17:38:09,666][00473] Num frames 3900... [2024-07-27 17:38:09,805][00473] Num frames 4000... [2024-07-27 17:38:09,944][00473] Avg episode rewards: #0: 12.940, true rewards: #0: 6.773 [2024-07-27 17:38:09,946][00473] Avg episode reward: 12.940, avg true_objective: 6.773 [2024-07-27 17:38:09,997][00473] Num frames 4100... [2024-07-27 17:38:10,122][00473] Num frames 4200... [2024-07-27 17:38:10,247][00473] Num frames 4300... [2024-07-27 17:38:10,402][00473] Num frames 4400... [2024-07-27 17:38:10,542][00473] Num frames 4500... [2024-07-27 17:38:10,677][00473] Num frames 4600... [2024-07-27 17:38:10,842][00473] Num frames 4700... [2024-07-27 17:38:10,999][00473] Num frames 4800... [2024-07-27 17:38:11,151][00473] Num frames 4900... [2024-07-27 17:38:11,286][00473] Num frames 5000... [2024-07-27 17:38:11,425][00473] Num frames 5100... [2024-07-27 17:38:11,600][00473] Avg episode rewards: #0: 14.406, true rewards: #0: 7.406 [2024-07-27 17:38:11,602][00473] Avg episode reward: 14.406, avg true_objective: 7.406 [2024-07-27 17:38:11,630][00473] Num frames 5200... [2024-07-27 17:38:11,770][00473] Num frames 5300... [2024-07-27 17:38:11,914][00473] Num frames 5400... [2024-07-27 17:38:12,053][00473] Num frames 5500... [2024-07-27 17:38:12,307][00473] Num frames 5600... [2024-07-27 17:38:12,458][00473] Num frames 5700... [2024-07-27 17:38:12,591][00473] Num frames 5800... [2024-07-27 17:38:12,782][00473] Num frames 5900... [2024-07-27 17:38:12,980][00473] Num frames 6000... [2024-07-27 17:38:13,170][00473] Num frames 6100... [2024-07-27 17:38:13,368][00473] Num frames 6200... [2024-07-27 17:38:13,443][00473] Avg episode rewards: #0: 15.260, true rewards: #0: 7.760 [2024-07-27 17:38:13,445][00473] Avg episode reward: 15.260, avg true_objective: 7.760 [2024-07-27 17:38:13,619][00473] Num frames 6300... [2024-07-27 17:38:13,830][00473] Num frames 6400... [2024-07-27 17:38:14,045][00473] Num frames 6500... [2024-07-27 17:38:14,254][00473] Num frames 6600... [2024-07-27 17:38:14,445][00473] Num frames 6700... [2024-07-27 17:38:14,645][00473] Num frames 6800... [2024-07-27 17:38:14,847][00473] Num frames 6900... [2024-07-27 17:38:15,043][00473] Num frames 7000... [2024-07-27 17:38:15,237][00473] Num frames 7100... [2024-07-27 17:38:15,305][00473] Avg episode rewards: #0: 15.671, true rewards: #0: 7.893 [2024-07-27 17:38:15,307][00473] Avg episode reward: 15.671, avg true_objective: 7.893 [2024-07-27 17:38:15,441][00473] Num frames 7200... [2024-07-27 17:38:15,577][00473] Num frames 7300... 
[2024-07-27 17:38:15,715][00473] Num frames 7400... [2024-07-27 17:38:15,846][00473] Num frames 7500... [2024-07-27 17:38:15,970][00473] Avg episode rewards: #0: 14.652, true rewards: #0: 7.552 [2024-07-27 17:38:15,974][00473] Avg episode reward: 14.652, avg true_objective: 7.552 [2024-07-27 17:39:03,628][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-27 17:39:07,781][00473] The model has been pushed to https://huggingface.co/rishisim/rl_course_vizdoom_health_gathering_supreme [2024-07-27 17:39:30,317][00473] Environment doom_basic already registered, overwriting... [2024-07-27 17:39:30,320][00473] Environment doom_two_colors_easy already registered, overwriting... [2024-07-27 17:39:30,322][00473] Environment doom_two_colors_hard already registered, overwriting... [2024-07-27 17:39:30,325][00473] Environment doom_dm already registered, overwriting... [2024-07-27 17:39:30,329][00473] Environment doom_dwango5 already registered, overwriting... [2024-07-27 17:39:30,330][00473] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-07-27 17:39:30,331][00473] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-07-27 17:39:30,332][00473] Environment doom_my_way_home already registered, overwriting... [2024-07-27 17:39:30,333][00473] Environment doom_deadly_corridor already registered, overwriting... [2024-07-27 17:39:30,334][00473] Environment doom_defend_the_center already registered, overwriting... [2024-07-27 17:39:30,342][00473] Environment doom_defend_the_line already registered, overwriting... [2024-07-27 17:39:30,344][00473] Environment doom_health_gathering already registered, overwriting... [2024-07-27 17:39:30,345][00473] Environment doom_health_gathering_supreme already registered, overwriting... [2024-07-27 17:39:30,347][00473] Environment doom_battle already registered, overwriting... [2024-07-27 17:39:30,349][00473] Environment doom_battle2 already registered, overwriting... [2024-07-27 17:39:30,351][00473] Environment doom_duel_bots already registered, overwriting... [2024-07-27 17:39:30,352][00473] Environment doom_deathmatch_bots already registered, overwriting... [2024-07-27 17:39:30,354][00473] Environment doom_duel already registered, overwriting... [2024-07-27 17:39:30,356][00473] Environment doom_deathmatch_full already registered, overwriting... [2024-07-27 17:39:30,357][00473] Environment doom_benchmark already registered, overwriting... [2024-07-27 17:39:30,359][00473] register_encoder_factory: [2024-07-27 17:39:30,381][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:39:30,383][00473] Overriding arg 'train_for_env_steps' with value 4505000 passed from command line [2024-07-27 17:39:30,392][00473] Experiment dir /content/train_dir/default_experiment already exists! [2024-07-27 17:39:30,395][00473] Resuming existing experiment from /content/train_dir/default_experiment... 
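Note: before training resumes below, the evaluation runs logged above (config overrides, 10 episodes, replay.mp4, push to the Hub) correspond to the course notebook's "enjoy" step. A hedged sketch of that call; parse_vizdoom_cfg is assumed to be the notebook-defined helper that registers the Doom envs and parses CLI-style arguments, and the enjoy entry point is assumed from Sample Factory 2.x, neither confirmed by this log:

```python
from sample_factory.enjoy import enjoy  # assumed Sample Factory 2.x entry point

cfg = parse_vizdoom_cfg(  # notebook helper (assumption)
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",   # matches "Overriding arg 'num_workers' with value 1"
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=rishisim/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # writes replay.mp4 and pushes the model, as logged above
```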
[2024-07-27 17:39:30,397][00473] Weights and Biases integration disabled [2024-07-27 17:39:30,403][00473] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-07-27 17:39:32,519][00473] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=4505000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-07-27 17:39:32,521][00473] Saving configuration to /content/train_dir/default_experiment/config.json... 
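Note: the configuration dump above is also persisted to config.json (the "Saving configuration" line). A minimal, self-contained way to inspect the hyperparameters that matter most for this run; the path and key names are taken verbatim from the log, everything else is illustrative:

```python
import json

with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

# keys below appear verbatim in the configuration dump above
for key in ("env", "num_workers", "num_envs_per_worker", "batch_size",
            "rollout", "recurrence", "gamma", "learning_rate",
            "train_for_env_steps", "keep_checkpoints"):
    print(f"{key} = {cfg[key]}")
```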
[2024-07-27 17:39:32,525][00473] Rollout worker 0 uses device cpu [2024-07-27 17:39:32,526][00473] Rollout worker 1 uses device cpu [2024-07-27 17:39:32,528][00473] Rollout worker 2 uses device cpu [2024-07-27 17:39:32,530][00473] Rollout worker 3 uses device cpu [2024-07-27 17:39:32,531][00473] Rollout worker 4 uses device cpu [2024-07-27 17:39:32,532][00473] Rollout worker 5 uses device cpu [2024-07-27 17:39:32,534][00473] Rollout worker 6 uses device cpu [2024-07-27 17:39:32,535][00473] Rollout worker 7 uses device cpu [2024-07-27 17:39:32,635][00473] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:39:32,637][00473] InferenceWorker_p0-w0: min num requests: 2 [2024-07-27 17:39:32,671][00473] Starting all processes... [2024-07-27 17:39:32,674][00473] Starting process learner_proc0 [2024-07-27 17:39:32,721][00473] Starting all processes... [2024-07-27 17:39:32,729][00473] Starting process inference_proc0-0 [2024-07-27 17:39:32,730][00473] Starting process rollout_proc0 [2024-07-27 17:39:32,730][00473] Starting process rollout_proc1 [2024-07-27 17:39:32,730][00473] Starting process rollout_proc2 [2024-07-27 17:39:32,730][00473] Starting process rollout_proc3 [2024-07-27 17:39:32,732][00473] Starting process rollout_proc4 [2024-07-27 17:39:32,739][00473] Starting process rollout_proc5 [2024-07-27 17:39:32,739][00473] Starting process rollout_proc6 [2024-07-27 17:39:32,739][00473] Starting process rollout_proc7 [2024-07-27 17:39:48,066][18884] Worker 6 uses CPU cores [0] [2024-07-27 17:39:48,151][18863] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:39:48,154][18863] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-07-27 17:39:48,204][18881] Worker 4 uses CPU cores [0] [2024-07-27 17:39:48,214][18863] Num visible devices: 1 [2024-07-27 17:39:48,243][18863] Starting seed is not provided [2024-07-27 17:39:48,245][18863] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:39:48,245][18863] Initializing actor-critic model on device cuda:0 [2024-07-27 17:39:48,246][18863] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:39:48,248][18863] RunningMeanStd input shape: (1,) [2024-07-27 17:39:48,261][18883] Worker 7 uses CPU cores [1] [2024-07-27 17:39:48,285][18863] ConvEncoder: input_channels=3 [2024-07-27 17:39:48,441][18876] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:39:48,445][18876] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-07-27 17:39:48,482][18879] Worker 2 uses CPU cores [0] [2024-07-27 17:39:48,495][18877] Worker 1 uses CPU cores [1] [2024-07-27 17:39:48,510][18876] Num visible devices: 1 [2024-07-27 17:39:48,577][18882] Worker 5 uses CPU cores [1] [2024-07-27 17:39:48,630][18880] Worker 3 uses CPU cores [1] [2024-07-27 17:39:48,639][18878] Worker 0 uses CPU cores [0] [2024-07-27 17:39:48,678][18863] Conv encoder output size: 512 [2024-07-27 17:39:48,679][18863] Policy head output size: 512 [2024-07-27 17:39:48,694][18863] Created Actor Critic model with architecture: [2024-07-27 17:39:48,694][18863] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): 
RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-07-27 17:39:48,850][18863] Using optimizer [2024-07-27 17:39:49,619][18863] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2024-07-27 17:39:49,654][18863] Loading model from checkpoint [2024-07-27 17:39:49,656][18863] Loaded experiment state at self.train_step=980, self.env_steps=4014080 [2024-07-27 17:39:49,657][18863] Initialized policy 0 weights for model version 980 [2024-07-27 17:39:49,659][18863] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-07-27 17:39:49,666][18863] LearnerWorker_p0 finished initialization! [2024-07-27 17:39:49,767][18876] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:39:49,768][18876] RunningMeanStd input shape: (1,) [2024-07-27 17:39:49,780][18876] ConvEncoder: input_channels=3 [2024-07-27 17:39:49,886][18876] Conv encoder output size: 512 [2024-07-27 17:39:49,887][18876] Policy head output size: 512 [2024-07-27 17:39:49,942][00473] Inference worker 0-0 is ready! [2024-07-27 17:39:49,944][00473] All inference workers are ready! Signal rollout workers to start! [2024-07-27 17:39:50,170][18880] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,173][18877] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,174][18883] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,169][18882] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,181][18881] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,194][18879] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,195][18878] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,192][18884] Doom resolution: 160x120, resize resolution: (128, 72) [2024-07-27 17:39:50,403][00473] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:39:50,826][18878] Decorrelating experience for 0 frames... [2024-07-27 17:39:51,217][18878] Decorrelating experience for 32 frames... [2024-07-27 17:39:51,585][18883] Decorrelating experience for 0 frames... [2024-07-27 17:39:51,588][18877] Decorrelating experience for 0 frames... [2024-07-27 17:39:51,590][18880] Decorrelating experience for 0 frames... [2024-07-27 17:39:52,097][18878] Decorrelating experience for 64 frames... [2024-07-27 17:39:52,623][18881] Decorrelating experience for 0 frames... [2024-07-27 17:39:52,626][18884] Decorrelating experience for 0 frames... 
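Note: the "Decorrelating experience for N frames" lines are the rollout workers warming up by different amounts (0, 32, 64, 96 frames) so their episodes start out of phase rather than all resetting in lockstep. An illustrative sketch of that idea for a Gym-style environment, not Sample Factory's implementation:

```python
def decorrelate(env, worker_idx: int, frames_per_split: int = 32):
    """Step the env with random actions for worker_idx * frames_per_split frames
    before real collection starts, so workers' episodes are offset from each other."""
    obs, _ = env.reset()
    for _ in range(worker_idx * frames_per_split):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, _ = env.reset()
    return obs
```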
[2024-07-27 17:39:52,628][00473] Heartbeat connected on Batcher_0 [2024-07-27 17:39:52,638][00473] Heartbeat connected on LearnerWorker_p0 [2024-07-27 17:39:52,674][00473] Heartbeat connected on InferenceWorker_p0-w0 [2024-07-27 17:39:53,133][18880] Decorrelating experience for 32 frames... [2024-07-27 17:39:53,135][18877] Decorrelating experience for 32 frames... [2024-07-27 17:39:53,137][18883] Decorrelating experience for 32 frames... [2024-07-27 17:39:53,198][18882] Decorrelating experience for 0 frames... [2024-07-27 17:39:54,076][18884] Decorrelating experience for 32 frames... [2024-07-27 17:39:54,079][18881] Decorrelating experience for 32 frames... [2024-07-27 17:39:54,081][18879] Decorrelating experience for 0 frames... [2024-07-27 17:39:54,288][18882] Decorrelating experience for 32 frames... [2024-07-27 17:39:54,934][18883] Decorrelating experience for 64 frames... [2024-07-27 17:39:54,971][18878] Decorrelating experience for 96 frames... [2024-07-27 17:39:55,345][00473] Heartbeat connected on RolloutWorker_w0 [2024-07-27 17:39:55,406][00473] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:39:56,651][18880] Decorrelating experience for 64 frames... [2024-07-27 17:39:56,744][18879] Decorrelating experience for 32 frames... [2024-07-27 17:39:57,140][18877] Decorrelating experience for 64 frames... [2024-07-27 17:39:57,686][18882] Decorrelating experience for 64 frames... [2024-07-27 17:39:57,938][18883] Decorrelating experience for 96 frames... [2024-07-27 17:39:58,470][00473] Heartbeat connected on RolloutWorker_w7 [2024-07-27 17:39:59,430][18881] Decorrelating experience for 64 frames... [2024-07-27 17:39:59,624][18880] Decorrelating experience for 96 frames... [2024-07-27 17:39:59,864][18884] Decorrelating experience for 64 frames... [2024-07-27 17:40:00,058][00473] Heartbeat connected on RolloutWorker_w3 [2024-07-27 17:40:00,403][00473] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 42.0. Samples: 420. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:40:00,407][00473] Avg episode reward: [(0, '2.560')] [2024-07-27 17:40:00,645][18877] Decorrelating experience for 96 frames... [2024-07-27 17:40:00,943][18882] Decorrelating experience for 96 frames... [2024-07-27 17:40:01,245][00473] Heartbeat connected on RolloutWorker_w1 [2024-07-27 17:40:01,957][00473] Heartbeat connected on RolloutWorker_w5 [2024-07-27 17:40:02,674][18879] Decorrelating experience for 64 frames... [2024-07-27 17:40:03,967][18884] Decorrelating experience for 96 frames... [2024-07-27 17:40:04,683][00473] Heartbeat connected on RolloutWorker_w6 [2024-07-27 17:40:05,052][18863] Signal inference workers to stop experience collection... [2024-07-27 17:40:05,066][18876] InferenceWorker_p0-w0: stopping experience collection [2024-07-27 17:40:05,286][18881] Decorrelating experience for 96 frames... [2024-07-27 17:40:05,389][18879] Decorrelating experience for 96 frames... [2024-07-27 17:40:05,403][00473] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 160.1. Samples: 2402. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-07-27 17:40:05,407][00473] Avg episode reward: [(0, '5.740')] [2024-07-27 17:40:05,455][00473] Heartbeat connected on RolloutWorker_w4 [2024-07-27 17:40:05,545][00473] Heartbeat connected on RolloutWorker_w2 [2024-07-27 17:40:06,486][18863] Signal inference workers to resume experience collection... [2024-07-27 17:40:06,488][18876] InferenceWorker_p0-w0: resuming experience collection [2024-07-27 17:40:10,403][00473] Fps is (10 sec: 2048.1, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 4034560. Throughput: 0: 171.6. Samples: 3432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:40:10,408][00473] Avg episode reward: [(0, '8.034')] [2024-07-27 17:40:15,403][00473] Fps is (10 sec: 3686.4, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 4050944. Throughput: 0: 367.2. Samples: 9180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:40:15,406][00473] Avg episode reward: [(0, '10.297')] [2024-07-27 17:40:16,092][18876] Updated weights for policy 0, policy_version 990 (0.0219) [2024-07-27 17:40:20,403][00473] Fps is (10 sec: 2867.2, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 4063232. Throughput: 0: 433.3. Samples: 13000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-07-27 17:40:20,410][00473] Avg episode reward: [(0, '12.401')] [2024-07-27 17:40:25,403][00473] Fps is (10 sec: 3276.8, 60 sec: 1989.5, 300 sec: 1989.5). Total num frames: 4083712. Throughput: 0: 452.9. Samples: 15850. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:40:25,406][00473] Avg episode reward: [(0, '14.255')] [2024-07-27 17:40:28,173][18876] Updated weights for policy 0, policy_version 1000 (0.0015) [2024-07-27 17:40:30,403][00473] Fps is (10 sec: 4096.0, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 4104192. Throughput: 0: 544.3. Samples: 21770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:40:30,407][00473] Avg episode reward: [(0, '15.281')] [2024-07-27 17:40:35,408][00473] Fps is (10 sec: 3275.3, 60 sec: 2275.3, 300 sec: 2275.3). Total num frames: 4116480. Throughput: 0: 583.3. Samples: 26252. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:40:35,410][00473] Avg episode reward: [(0, '16.210')] [2024-07-27 17:40:40,403][00473] Fps is (10 sec: 2867.1, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 4132864. Throughput: 0: 627.2. Samples: 28222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:40:40,409][00473] Avg episode reward: [(0, '17.626')] [2024-07-27 17:40:40,827][18876] Updated weights for policy 0, policy_version 1010 (0.0024) [2024-07-27 17:40:45,403][00473] Fps is (10 sec: 3688.1, 60 sec: 2532.1, 300 sec: 2532.1). Total num frames: 4153344. Throughput: 0: 751.2. Samples: 34222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-07-27 17:40:45,408][00473] Avg episode reward: [(0, '18.829')] [2024-07-27 17:40:45,412][18863] Saving new best policy, reward=18.829! [2024-07-27 17:40:50,404][00473] Fps is (10 sec: 3686.3, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 4169728. Throughput: 0: 824.3. Samples: 39496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-07-27 17:40:50,408][00473] Avg episode reward: [(0, '18.931')] [2024-07-27 17:40:50,426][18863] Saving new best policy, reward=18.931! [2024-07-27 17:40:53,148][18876] Updated weights for policy 0, policy_version 1020 (0.0033) [2024-07-27 17:40:55,403][00473] Fps is (10 sec: 2867.2, 60 sec: 2799.1, 300 sec: 2583.6). Total num frames: 4182016. Throughput: 0: 840.8. Samples: 41266. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:40:55,405][00473] Avg episode reward: [(0, '18.576')] [2024-07-27 17:41:00,403][00473] Fps is (10 sec: 3277.0, 60 sec: 3140.3, 300 sec: 2691.7). Total num frames: 4202496. Throughput: 0: 822.0. Samples: 46170. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:41:00,405][00473] Avg episode reward: [(0, '17.721')] [2024-07-27 17:41:04,256][18876] Updated weights for policy 0, policy_version 1030 (0.0017) [2024-07-27 17:41:05,403][00473] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2785.3). Total num frames: 4222976. Throughput: 0: 876.9. Samples: 52462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:41:05,406][00473] Avg episode reward: [(0, '17.163')] [2024-07-27 17:41:10,409][00473] Fps is (10 sec: 3274.9, 60 sec: 3344.8, 300 sec: 2764.6). Total num frames: 4235264. Throughput: 0: 861.1. Samples: 54606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:41:10,416][00473] Avg episode reward: [(0, '17.196')] [2024-07-27 17:41:15,403][00473] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2794.9). Total num frames: 4251648. Throughput: 0: 821.3. Samples: 58728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:41:15,405][00473] Avg episode reward: [(0, '17.044')] [2024-07-27 17:41:17,167][18876] Updated weights for policy 0, policy_version 1040 (0.0015) [2024-07-27 17:41:20,403][00473] Fps is (10 sec: 3688.4, 60 sec: 3481.6, 300 sec: 2867.2). Total num frames: 4272128. Throughput: 0: 857.2. Samples: 64820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:41:20,405][00473] Avg episode reward: [(0, '17.607')] [2024-07-27 17:41:25,403][00473] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2888.8). Total num frames: 4288512. Throughput: 0: 883.3. Samples: 67970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:41:25,407][00473] Avg episode reward: [(0, '18.281')] [2024-07-27 17:41:29,513][18876] Updated weights for policy 0, policy_version 1050 (0.0016) [2024-07-27 17:41:30,403][00473] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 2867.2). Total num frames: 4300800. Throughput: 0: 834.9. Samples: 71794. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-07-27 17:41:30,410][00473] Avg episode reward: [(0, '18.080')] [2024-07-27 17:41:30,425][18863] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001050_4300800.pth... [2024-07-27 17:41:30,599][18863] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2024-07-27 17:41:35,403][00473] Fps is (10 sec: 3276.8, 60 sec: 3413.6, 300 sec: 2925.7). Total num frames: 4321280. Throughput: 0: 840.5. Samples: 77316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:41:35,411][00473] Avg episode reward: [(0, '20.107')] [2024-07-27 17:41:35,413][18863] Saving new best policy, reward=20.107! [2024-07-27 17:41:40,160][18876] Updated weights for policy 0, policy_version 1060 (0.0026) [2024-07-27 17:41:40,403][00473] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 2978.9). Total num frames: 4341760. Throughput: 0: 867.3. Samples: 80296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:41:40,409][00473] Avg episode reward: [(0, '19.189')] [2024-07-27 17:41:45,404][00473] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 2956.2). Total num frames: 4354048. Throughput: 0: 865.1. Samples: 85102. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:41:45,406][00473] Avg episode reward: [(0, '18.620')] [2024-07-27 17:41:50,403][00473] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2969.6). Total num frames: 4370432. Throughput: 0: 822.8. Samples: 89488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:41:50,411][00473] Avg episode reward: [(0, '18.749')] [2024-07-27 17:41:53,106][18876] Updated weights for policy 0, policy_version 1070 (0.0021) [2024-07-27 17:41:55,403][00473] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3014.6). Total num frames: 4390912. Throughput: 0: 844.9. Samples: 92620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-07-27 17:41:55,413][00473] Avg episode reward: [(0, '17.540')] [2024-07-27 17:42:00,404][00473] Fps is (10 sec: 3686.0, 60 sec: 3413.3, 300 sec: 3024.7). Total num frames: 4407296. Throughput: 0: 887.8. Samples: 98678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:42:00,408][00473] Avg episode reward: [(0, '16.795')] [2024-07-27 17:42:05,403][00473] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 3003.7). Total num frames: 4419584. Throughput: 0: 836.8. Samples: 102474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:42:05,405][00473] Avg episode reward: [(0, '17.038')] [2024-07-27 17:42:05,599][18876] Updated weights for policy 0, policy_version 1080 (0.0013) [2024-07-27 17:42:10,403][00473] Fps is (10 sec: 3277.1, 60 sec: 3413.6, 300 sec: 3042.7). Total num frames: 4440064. Throughput: 0: 826.2. Samples: 105150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-07-27 17:42:10,406][00473] Avg episode reward: [(0, '19.360')] [2024-07-27 17:42:15,403][00473] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3079.1). Total num frames: 4460544. Throughput: 0: 868.5. Samples: 110876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-07-27 17:42:15,410][00473] Avg episode reward: [(0, '19.294')] [2024-07-27 17:42:16,213][18876] Updated weights for policy 0, policy_version 1090 (0.0024) [2024-07-27 17:42:20,403][00473] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3058.3). Total num frames: 4472832. Throughput: 0: 848.5. Samples: 115500. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-07-27 17:42:20,411][00473] Avg episode reward: [(0, '18.425')] [2024-07-27 17:42:25,404][00473] Fps is (10 sec: 2457.3, 60 sec: 3276.7, 300 sec: 3038.9). Total num frames: 4485120. Throughput: 0: 823.6. Samples: 117358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-07-27 17:42:25,406][00473] Avg episode reward: [(0, '18.760')] [2024-07-27 17:42:29,257][18876] Updated weights for policy 0, policy_version 1100 (0.0030) [2024-07-27 17:42:30,403][00473] Fps is (10 sec: 3276.6, 60 sec: 3413.3, 300 sec: 3072.0). Total num frames: 4505600. Throughput: 0: 845.9. Samples: 123168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-07-27 17:42:30,407][00473] Avg episode reward: [(0, '18.340')] [2024-07-27 17:42:30,534][18863] Stopping Batcher_0... [2024-07-27 17:42:30,535][18863] Loop batcher_evt_loop terminating... [2024-07-27 17:42:30,534][00473] Component Batcher_0 stopped! [2024-07-27 17:42:30,539][18863] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001101_4509696.pth... [2024-07-27 17:42:30,601][00473] Component RolloutWorker_w3 stopped! [2024-07-27 17:42:30,603][18880] Stopping RolloutWorker_w3... [2024-07-27 17:42:30,612][18881] Stopping RolloutWorker_w4... [2024-07-27 17:42:30,612][00473] Component RolloutWorker_w4 stopped! 
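Alongside stopping the workers, the learner writes a final checkpoint, checkpoint_000001101_4509696.pth, whose filename encodes the training step (1101) and the total env steps (4509696). The file can be opened with torch.load for inspection; in the sketch below the key names ("train_step", "env_steps", "model") are assumptions inferred from the log messages above ("Loaded experiment state at self.train_step=980, self.env_steps=4014080"), not confirmed API.

import torch

# Path taken from the log above; dict key names are assumptions.
ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001101_4509696.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")

print(ckpt.get("train_step"), ckpt.get("env_steps"))  # expected: 1101 4509696
print(type(ckpt.get("model")))                        # presumably the actor-critic state_dict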
[2024-07-27 17:42:30,609][18880] Loop rollout_proc3_evt_loop terminating... [2024-07-27 17:42:30,620][18881] Loop rollout_proc4_evt_loop terminating... [2024-07-27 17:42:30,624][00473] Component RolloutWorker_w1 stopped! [2024-07-27 17:42:30,627][18877] Stopping RolloutWorker_w1... [2024-07-27 17:42:30,656][18877] Loop rollout_proc1_evt_loop terminating... [2024-07-27 17:42:30,659][18884] Stopping RolloutWorker_w6... [2024-07-27 17:42:30,660][18884] Loop rollout_proc6_evt_loop terminating... [2024-07-27 17:42:30,662][00473] Component RolloutWorker_w6 stopped! [2024-07-27 17:42:30,641][18876] Weights refcount: 2 0 [2024-07-27 17:42:30,681][00473] Component InferenceWorker_p0-w0 stopped! [2024-07-27 17:42:30,686][18876] Stopping InferenceWorker_p0-w0... [2024-07-27 17:42:30,689][18876] Loop inference_proc0-0_evt_loop terminating... [2024-07-27 17:42:30,692][00473] Component RolloutWorker_w7 stopped! [2024-07-27 17:42:30,695][18883] Stopping RolloutWorker_w7... [2024-07-27 17:42:30,696][18883] Loop rollout_proc7_evt_loop terminating... [2024-07-27 17:42:30,706][00473] Component RolloutWorker_w5 stopped! [2024-07-27 17:42:30,711][18882] Stopping RolloutWorker_w5... [2024-07-27 17:42:30,718][18878] Stopping RolloutWorker_w0... [2024-07-27 17:42:30,719][18878] Loop rollout_proc0_evt_loop terminating... [2024-07-27 17:42:30,722][18879] Stopping RolloutWorker_w2... [2024-07-27 17:42:30,718][00473] Component RolloutWorker_w0 stopped! [2024-07-27 17:42:30,725][18879] Loop rollout_proc2_evt_loop terminating... [2024-07-27 17:42:30,725][00473] Component RolloutWorker_w2 stopped! [2024-07-27 17:42:30,712][18882] Loop rollout_proc5_evt_loop terminating... [2024-07-27 17:42:30,754][18863] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth [2024-07-27 17:42:30,768][18863] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001101_4509696.pth... [2024-07-27 17:42:31,000][00473] Component LearnerWorker_p0 stopped! [2024-07-27 17:42:31,001][00473] Waiting for process learner_proc0 to stop... [2024-07-27 17:42:30,999][18863] Stopping LearnerWorker_p0... [2024-07-27 17:42:31,018][18863] Loop learner_proc0_evt_loop terminating... [2024-07-27 17:42:32,425][00473] Waiting for process inference_proc0-0 to join... [2024-07-27 17:42:32,428][00473] Waiting for process rollout_proc0 to join... [2024-07-27 17:42:34,518][00473] Waiting for process rollout_proc1 to join... [2024-07-27 17:42:34,558][00473] Waiting for process rollout_proc2 to join... [2024-07-27 17:42:34,561][00473] Waiting for process rollout_proc3 to join... [2024-07-27 17:42:34,564][00473] Waiting for process rollout_proc4 to join... [2024-07-27 17:42:34,572][00473] Waiting for process rollout_proc5 to join... [2024-07-27 17:42:34,575][00473] Waiting for process rollout_proc6 to join... [2024-07-27 17:42:34,583][00473] Waiting for process rollout_proc7 to join... 
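With all processes joined, the runner prints the profiling summary below. The periodic "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines earlier in the log are simply frame-count deltas over the corresponding reporting windows, and the session-average figure in the summary that follows is consistent with frames collected this session divided by the main-loop time. A quick check with numbers taken from this log:

# (1) The first non-zero "Fps is (10 sec: 2048.1, ...)" report: frame delta over the 10 s window.
frames_at_17_40_10 = 4_034_560   # "Total num frames" reported at 17:40:10
frames_at_17_40_00 = 4_014_080   # "Total num frames" reported at 17:40:00
print((frames_at_17_40_10 - frames_at_17_40_00) / 10.0)   # 2048.0 (log shows ~2048.1, timing jitter)

# (2) The summary below reports "Collected {0: 4509696}, FPS: 2724.2" with main_loop: 181.9323 s.
frames_this_session = 4_509_696 - 4_014_080               # frames added since resuming
print(frames_this_session / 181.9323)                      # ~2724.2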
[2024-07-27 17:42:34,589][00473] Batcher 0 profile tree view: batching: 3.2638, releasing_batches: 0.0064 [2024-07-27 17:42:34,593][00473] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 67.6765 update_model: 1.3268 weight_update: 0.0039 one_step: 0.0049 handle_policy_step: 84.2539 deserialize: 2.1267, stack: 0.4427, obs_to_device_normalize: 17.0462, forward: 45.6140, send_messages: 3.9128 prepare_outputs: 11.0594 to_cpu: 6.4650 [2024-07-27 17:42:34,595][00473] Learner 0 profile tree view: misc: 0.0007, prepare_batch: 4.2685 train: 12.2778 epoch_init: 0.0007, minibatch_init: 0.0009, losses_postprocess: 0.0813, kl_divergence: 0.1217, after_optimizer: 0.5455 calculate_losses: 5.0106 losses_init: 0.0005, forward_head: 0.5302, bptt_initial: 3.3199, tail: 0.3049, advantages_returns: 0.0295, losses: 0.5035 bptt: 0.2665 bptt_forward_core: 0.2500 update: 6.4516 clip: 0.1431 [2024-07-27 17:42:34,597][00473] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0341, enqueue_policy_requests: 17.7626, env_step: 117.1943, overhead: 2.2659, complete_rollouts: 1.0716 save_policy_outputs: 2.8331 split_output_tensors: 1.1090 [2024-07-27 17:42:34,599][00473] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0488, enqueue_policy_requests: 17.4188, env_step: 116.0178, overhead: 2.1658, complete_rollouts: 0.9147 save_policy_outputs: 3.1246 split_output_tensors: 1.1803 [2024-07-27 17:42:34,601][00473] Loop Runner_EvtLoop terminating... [2024-07-27 17:42:34,603][00473] Runner profile tree view: main_loop: 181.9323 [2024-07-27 17:42:34,609][00473] Collected {0: 4509696}, FPS: 2724.2 [2024-07-27 17:42:34,644][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:42:34,647][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:42:34,649][00473] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-27 17:42:34,650][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:42:34,651][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:42:34,652][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:42:34,653][00473] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:42:34,654][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-27 17:42:34,655][00473] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-07-27 17:42:34,656][00473] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-07-27 17:42:34,658][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:42:34,659][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:42:34,660][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:42:34,661][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
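The argument overrides above correspond to the first evaluation pass: the saved training config is reloaded, a single worker is used, rendering is disabled, and a video of 10 episodes is recorded. A sketch of the call that typically produces this output is shown here; parse_vizdoom_cfg() is again an assumed notebook helper, while enjoy() is Sample Factory's evaluation entry point.

from sample_factory.enjoy import enjoy

env = "doom_health_gathering_supreme"   # inferred from the Hub repo name used later in the log
cfg = parse_vizdoom_cfg(                # assumed notebook helper, as above
    argv=[f"--env={env}", "--num_workers=1", "--save_video", "--no_render", "--max_num_episodes=10"],
    evaluation=True,
)

# Loads the newest checkpoint (checkpoint_000001101_4509696.pth), rolls out 10
# episodes with frameskip 1 / render_action_repeat=4, and writes replay.mp4
# into the experiment directory, matching the messages that follow.
status = enjoy(cfg)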
[2024-07-27 17:42:34,662][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:42:34,712][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:42:34,715][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:42:34,738][00473] ConvEncoder: input_channels=3 [2024-07-27 17:42:34,806][00473] Conv encoder output size: 512 [2024-07-27 17:42:34,808][00473] Policy head output size: 512 [2024-07-27 17:42:34,838][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001101_4509696.pth... [2024-07-27 17:42:35,509][00473] Num frames 100... [2024-07-27 17:42:35,712][00473] Num frames 200... [2024-07-27 17:42:35,902][00473] Num frames 300... [2024-07-27 17:42:36,103][00473] Num frames 400... [2024-07-27 17:42:36,292][00473] Num frames 500... [2024-07-27 17:42:36,481][00473] Num frames 600... [2024-07-27 17:42:36,678][00473] Num frames 700... [2024-07-27 17:42:36,821][00473] Num frames 800... [2024-07-27 17:42:36,957][00473] Num frames 900... [2024-07-27 17:42:37,095][00473] Num frames 1000... [2024-07-27 17:42:37,222][00473] Num frames 1100... [2024-07-27 17:42:37,349][00473] Num frames 1200... [2024-07-27 17:42:37,474][00473] Num frames 1300... [2024-07-27 17:42:37,606][00473] Num frames 1400... [2024-07-27 17:42:37,740][00473] Num frames 1500... [2024-07-27 17:42:37,870][00473] Num frames 1600... [2024-07-27 17:42:37,999][00473] Num frames 1700... [2024-07-27 17:42:38,141][00473] Num frames 1800... [2024-07-27 17:42:38,274][00473] Avg episode rewards: #0: 51.599, true rewards: #0: 18.600 [2024-07-27 17:42:38,276][00473] Avg episode reward: 51.599, avg true_objective: 18.600 [2024-07-27 17:42:38,330][00473] Num frames 1900... [2024-07-27 17:42:38,463][00473] Num frames 2000... [2024-07-27 17:42:38,590][00473] Num frames 2100... [2024-07-27 17:42:38,732][00473] Num frames 2200... [2024-07-27 17:42:38,863][00473] Num frames 2300... [2024-07-27 17:42:38,992][00473] Num frames 2400... [2024-07-27 17:42:39,133][00473] Num frames 2500... [2024-07-27 17:42:39,260][00473] Num frames 2600... [2024-07-27 17:42:39,387][00473] Num frames 2700... [2024-07-27 17:42:39,520][00473] Num frames 2800... [2024-07-27 17:42:39,657][00473] Num frames 2900... [2024-07-27 17:42:39,798][00473] Num frames 3000... [2024-07-27 17:42:39,933][00473] Num frames 3100... [2024-07-27 17:42:40,001][00473] Avg episode rewards: #0: 38.540, true rewards: #0: 15.540 [2024-07-27 17:42:40,002][00473] Avg episode reward: 38.540, avg true_objective: 15.540 [2024-07-27 17:42:40,136][00473] Num frames 3200... [2024-07-27 17:42:40,269][00473] Num frames 3300... [2024-07-27 17:42:40,402][00473] Num frames 3400... [2024-07-27 17:42:40,538][00473] Num frames 3500... [2024-07-27 17:42:40,669][00473] Num frames 3600... [2024-07-27 17:42:40,811][00473] Num frames 3700... [2024-07-27 17:42:40,938][00473] Num frames 3800... [2024-07-27 17:42:41,071][00473] Num frames 3900... [2024-07-27 17:42:41,215][00473] Num frames 4000... [2024-07-27 17:42:41,344][00473] Num frames 4100... [2024-07-27 17:42:41,475][00473] Num frames 4200... [2024-07-27 17:42:41,651][00473] Avg episode rewards: #0: 33.306, true rewards: #0: 14.307 [2024-07-27 17:42:41,652][00473] Avg episode reward: 33.306, avg true_objective: 14.307 [2024-07-27 17:42:41,666][00473] Num frames 4300... [2024-07-27 17:42:41,799][00473] Num frames 4400... [2024-07-27 17:42:41,931][00473] Num frames 4500... [2024-07-27 17:42:42,059][00473] Num frames 4600... [2024-07-27 17:42:42,196][00473] Num frames 4700... 
[2024-07-27 17:42:42,322][00473] Num frames 4800... [2024-07-27 17:42:42,448][00473] Num frames 4900... [2024-07-27 17:42:42,578][00473] Num frames 5000... [2024-07-27 17:42:42,712][00473] Num frames 5100... [2024-07-27 17:42:42,885][00473] Avg episode rewards: #0: 30.717, true rewards: #0: 12.967 [2024-07-27 17:42:42,887][00473] Avg episode reward: 30.717, avg true_objective: 12.967 [2024-07-27 17:42:42,906][00473] Num frames 5200... [2024-07-27 17:42:43,035][00473] Num frames 5300... [2024-07-27 17:42:43,172][00473] Num frames 5400... [2024-07-27 17:42:43,299][00473] Num frames 5500... [2024-07-27 17:42:43,426][00473] Num frames 5600... [2024-07-27 17:42:43,525][00473] Avg episode rewards: #0: 25.670, true rewards: #0: 11.270 [2024-07-27 17:42:43,528][00473] Avg episode reward: 25.670, avg true_objective: 11.270 [2024-07-27 17:42:43,613][00473] Num frames 5700... [2024-07-27 17:42:43,749][00473] Num frames 5800... [2024-07-27 17:42:43,879][00473] Num frames 5900... [2024-07-27 17:42:44,007][00473] Num frames 6000... [2024-07-27 17:42:44,134][00473] Num frames 6100... [2024-07-27 17:42:44,277][00473] Num frames 6200... [2024-07-27 17:42:44,406][00473] Num frames 6300... [2024-07-27 17:42:44,536][00473] Num frames 6400... [2024-07-27 17:42:44,678][00473] Num frames 6500... [2024-07-27 17:42:44,772][00473] Avg episode rewards: #0: 24.043, true rewards: #0: 10.877 [2024-07-27 17:42:44,774][00473] Avg episode reward: 24.043, avg true_objective: 10.877 [2024-07-27 17:42:44,868][00473] Num frames 6600... [2024-07-27 17:42:44,998][00473] Num frames 6700... [2024-07-27 17:42:45,132][00473] Num frames 6800... [2024-07-27 17:42:45,271][00473] Num frames 6900... [2024-07-27 17:42:45,403][00473] Num frames 7000... [2024-07-27 17:42:45,534][00473] Num frames 7100... [2024-07-27 17:42:45,664][00473] Num frames 7200... [2024-07-27 17:42:45,802][00473] Num frames 7300... [2024-07-27 17:42:45,929][00473] Num frames 7400... [2024-07-27 17:42:46,060][00473] Num frames 7500... [2024-07-27 17:42:46,198][00473] Num frames 7600... [2024-07-27 17:42:46,339][00473] Num frames 7700... [2024-07-27 17:42:46,470][00473] Num frames 7800... [2024-07-27 17:42:46,606][00473] Num frames 7900... [2024-07-27 17:42:46,763][00473] Num frames 8000... [2024-07-27 17:42:46,956][00473] Num frames 8100... [2024-07-27 17:42:47,147][00473] Num frames 8200... [2024-07-27 17:42:47,338][00473] Num frames 8300... [2024-07-27 17:42:47,525][00473] Num frames 8400... [2024-07-27 17:42:47,706][00473] Num frames 8500... [2024-07-27 17:42:47,893][00473] Num frames 8600... [2024-07-27 17:42:48,004][00473] Avg episode rewards: #0: 28.180, true rewards: #0: 12.323 [2024-07-27 17:42:48,007][00473] Avg episode reward: 28.180, avg true_objective: 12.323 [2024-07-27 17:42:48,143][00473] Num frames 8700... [2024-07-27 17:42:48,340][00473] Num frames 8800... [2024-07-27 17:42:48,529][00473] Num frames 8900... [2024-07-27 17:42:48,729][00473] Num frames 9000... [2024-07-27 17:42:48,916][00473] Num frames 9100... [2024-07-27 17:42:49,108][00473] Num frames 9200... [2024-07-27 17:42:49,293][00473] Num frames 9300... [2024-07-27 17:42:49,429][00473] Num frames 9400... [2024-07-27 17:42:49,560][00473] Num frames 9500... [2024-07-27 17:42:49,645][00473] Avg episode rewards: #0: 27.027, true rewards: #0: 11.902 [2024-07-27 17:42:49,647][00473] Avg episode reward: 27.027, avg true_objective: 11.902 [2024-07-27 17:42:49,752][00473] Num frames 9600... [2024-07-27 17:42:49,883][00473] Num frames 9700... [2024-07-27 17:42:50,013][00473] Num frames 9800... 
[2024-07-27 17:42:50,141][00473] Num frames 9900... [2024-07-27 17:42:50,269][00473] Num frames 10000... [2024-07-27 17:42:50,404][00473] Num frames 10100... [2024-07-27 17:42:50,581][00473] Avg episode rewards: #0: 25.215, true rewards: #0: 11.327 [2024-07-27 17:42:50,583][00473] Avg episode reward: 25.215, avg true_objective: 11.327 [2024-07-27 17:42:50,594][00473] Num frames 10200... [2024-07-27 17:42:50,729][00473] Num frames 10300... [2024-07-27 17:42:50,863][00473] Num frames 10400... [2024-07-27 17:42:50,998][00473] Num frames 10500... [2024-07-27 17:42:51,128][00473] Num frames 10600... [2024-07-27 17:42:51,259][00473] Num frames 10700... [2024-07-27 17:42:51,399][00473] Num frames 10800... [2024-07-27 17:42:51,535][00473] Num frames 10900... [2024-07-27 17:42:51,672][00473] Num frames 11000... [2024-07-27 17:42:51,815][00473] Num frames 11100... [2024-07-27 17:42:51,949][00473] Num frames 11200... [2024-07-27 17:42:52,118][00473] Avg episode rewards: #0: 25.187, true rewards: #0: 11.287 [2024-07-27 17:42:52,120][00473] Avg episode reward: 25.187, avg true_objective: 11.287 [2024-07-27 17:44:02,468][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-07-27 17:45:25,025][00473] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-07-27 17:45:25,027][00473] Overriding arg 'num_workers' with value 1 passed from command line [2024-07-27 17:45:25,029][00473] Adding new argument 'no_render'=True that is not in the saved config file! [2024-07-27 17:45:25,031][00473] Adding new argument 'save_video'=True that is not in the saved config file! [2024-07-27 17:45:25,033][00473] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-07-27 17:45:25,034][00473] Adding new argument 'video_name'=None that is not in the saved config file! [2024-07-27 17:45:25,035][00473] Adding new argument 'max_num_frames'=1000000 that is not in the saved config file! [2024-07-27 17:45:25,037][00473] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-07-27 17:45:25,039][00473] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-07-27 17:45:25,041][00473] Adding new argument 'hf_repository'='rishisim/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-07-27 17:45:25,042][00473] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-07-27 17:45:25,044][00473] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-07-27 17:45:25,045][00473] Adding new argument 'train_script'=None that is not in the saved config file! [2024-07-27 17:45:25,046][00473] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-07-27 17:45:25,049][00473] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-07-27 17:45:25,077][00473] RunningMeanStd input shape: (3, 72, 128) [2024-07-27 17:45:25,079][00473] RunningMeanStd input shape: (1,) [2024-07-27 17:45:25,093][00473] ConvEncoder: input_channels=3 [2024-07-27 17:45:25,131][00473] Conv encoder output size: 512 [2024-07-27 17:45:25,132][00473] Policy head output size: 512 [2024-07-27 17:45:25,151][00473] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001101_4509696.pth... [2024-07-27 17:45:25,589][00473] Num frames 100... [2024-07-27 17:45:25,727][00473] Num frames 200... [2024-07-27 17:45:25,876][00473] Num frames 300... 
[2024-07-27 17:45:26,006][00473] Num frames 400... [2024-07-27 17:45:26,135][00473] Num frames 500... [2024-07-27 17:45:26,272][00473] Num frames 600... [2024-07-27 17:45:26,429][00473] Avg episode rewards: #0: 13.730, true rewards: #0: 6.730 [2024-07-27 17:45:26,431][00473] Avg episode reward: 13.730, avg true_objective: 6.730 [2024-07-27 17:45:26,468][00473] Num frames 700... [2024-07-27 17:45:26,626][00473] Num frames 800... [2024-07-27 17:45:26,818][00473] Num frames 900... [2024-07-27 17:45:27,007][00473] Num frames 1000... [2024-07-27 17:45:27,193][00473] Num frames 1100... [2024-07-27 17:45:27,377][00473] Num frames 1200... [2024-07-27 17:45:27,567][00473] Num frames 1300... [2024-07-27 17:45:27,763][00473] Num frames 1400... [2024-07-27 17:45:27,976][00473] Num frames 1500... [2024-07-27 17:45:28,169][00473] Num frames 1600... [2024-07-27 17:45:28,231][00473] Avg episode rewards: #0: 15.505, true rewards: #0: 8.005 [2024-07-27 17:45:28,234][00473] Avg episode reward: 15.505, avg true_objective: 8.005 [2024-07-27 17:45:28,443][00473] Num frames 1700... [2024-07-27 17:45:28,632][00473] Num frames 1800... [2024-07-27 17:45:28,817][00473] Num frames 1900... [2024-07-27 17:45:29,018][00473] Num frames 2000... [2024-07-27 17:45:29,198][00473] Num frames 2100... [2024-07-27 17:45:29,330][00473] Num frames 2200... [2024-07-27 17:45:29,459][00473] Num frames 2300... [2024-07-27 17:45:29,603][00473] Num frames 2400... [2024-07-27 17:45:29,744][00473] Num frames 2500... [2024-07-27 17:45:29,876][00473] Num frames 2600... [2024-07-27 17:45:30,013][00473] Num frames 2700... [2024-07-27 17:45:30,073][00473] Avg episode rewards: #0: 18.003, true rewards: #0: 9.003 [2024-07-27 17:45:30,075][00473] Avg episode reward: 18.003, avg true_objective: 9.003 [2024-07-27 17:45:30,217][00473] Num frames 2800... [2024-07-27 17:45:30,350][00473] Num frames 2900... [2024-07-27 17:45:30,479][00473] Num frames 3000... [2024-07-27 17:45:30,618][00473] Num frames 3100... [2024-07-27 17:45:30,758][00473] Num frames 3200... [2024-07-27 17:45:30,900][00473] Num frames 3300... [2024-07-27 17:45:31,030][00473] Num frames 3400... [2024-07-27 17:45:31,157][00473] Num frames 3500... [2024-07-27 17:45:31,288][00473] Num frames 3600... [2024-07-27 17:45:31,420][00473] Num frames 3700... [2024-07-27 17:45:31,556][00473] Num frames 3800... [2024-07-27 17:45:31,689][00473] Num frames 3900... [2024-07-27 17:45:31,827][00473] Num frames 4000... [2024-07-27 17:45:31,958][00473] Num frames 4100... [2024-07-27 17:45:32,088][00473] Num frames 4200... [2024-07-27 17:45:32,257][00473] Num frames 4300... [2024-07-27 17:45:32,357][00473] Avg episode rewards: #0: 23.080, true rewards: #0: 10.830 [2024-07-27 17:45:32,358][00473] Avg episode reward: 23.080, avg true_objective: 10.830 [2024-07-27 17:45:32,448][00473] Num frames 4400... [2024-07-27 17:45:32,586][00473] Num frames 4500... [2024-07-27 17:45:32,719][00473] Num frames 4600... [2024-07-27 17:45:32,850][00473] Num frames 4700... [2024-07-27 17:45:32,981][00473] Num frames 4800... [2024-07-27 17:45:33,108][00473] Num frames 4900... [2024-07-27 17:45:33,237][00473] Num frames 5000... [2024-07-27 17:45:33,376][00473] Num frames 5100... [2024-07-27 17:45:33,507][00473] Num frames 5200... [2024-07-27 17:45:33,646][00473] Num frames 5300... [2024-07-27 17:45:33,785][00473] Num frames 5400... [2024-07-27 17:45:33,914][00473] Num frames 5500... [2024-07-27 17:45:34,044][00473] Num frames 5600... [2024-07-27 17:45:34,172][00473] Num frames 5700... 
[2024-07-27 17:45:34,307][00473] Num frames 5800... [2024-07-27 17:45:34,442][00473] Num frames 5900... [2024-07-27 17:45:34,577][00473] Num frames 6000... [2024-07-27 17:45:34,727][00473] Num frames 6100... [2024-07-27 17:45:34,858][00473] Num frames 6200... [2024-07-27 17:45:34,989][00473] Num frames 6300... [2024-07-27 17:45:35,118][00473] Num frames 6400... [2024-07-27 17:45:35,215][00473] Avg episode rewards: #0: 29.664, true rewards: #0: 12.864 [2024-07-27 17:45:35,217][00473] Avg episode reward: 29.664, avg true_objective: 12.864 [2024-07-27 17:45:35,312][00473] Num frames 6500... [2024-07-27 17:45:35,440][00473] Num frames 6600... [2024-07-27 17:45:35,570][00473] Num frames 6700... [2024-07-27 17:45:35,714][00473] Num frames 6800... [2024-07-27 17:45:35,886][00473] Avg episode rewards: #0: 25.633, true rewards: #0: 11.467 [2024-07-27 17:45:35,888][00473] Avg episode reward: 25.633, avg true_objective: 11.467 [2024-07-27 17:45:35,918][00473] Num frames 6900... [2024-07-27 17:45:36,046][00473] Num frames 7000... [2024-07-27 17:45:36,176][00473] Num frames 7100... [2024-07-27 17:45:36,307][00473] Num frames 7200... [2024-07-27 17:45:36,436][00473] Num frames 7300... [2024-07-27 17:45:36,566][00473] Num frames 7400... [2024-07-27 17:45:36,703][00473] Num frames 7500... [2024-07-27 17:45:36,790][00473] Avg episode rewards: #0: 24.028, true rewards: #0: 10.743 [2024-07-27 17:45:36,791][00473] Avg episode reward: 24.028, avg true_objective: 10.743 [2024-07-27 17:45:36,898][00473] Num frames 7600... [2024-07-27 17:45:37,026][00473] Num frames 7700... [2024-07-27 17:45:37,151][00473] Num frames 7800... [2024-07-27 17:45:37,286][00473] Num frames 7900... [2024-07-27 17:45:37,415][00473] Num frames 8000... [2024-07-27 17:45:37,543][00473] Num frames 8100... [2024-07-27 17:45:37,673][00473] Num frames 8200... [2024-07-27 17:45:37,820][00473] Num frames 8300... [2024-07-27 17:45:37,988][00473] Avg episode rewards: #0: 23.360, true rewards: #0: 10.485 [2024-07-27 17:45:37,990][00473] Avg episode reward: 23.360, avg true_objective: 10.485 [2024-07-27 17:45:38,012][00473] Num frames 8400... [2024-07-27 17:45:38,147][00473] Num frames 8500... [2024-07-27 17:45:38,278][00473] Num frames 8600... [2024-07-27 17:45:38,409][00473] Num frames 8700... [2024-07-27 17:45:38,541][00473] Num frames 8800... [2024-07-27 17:45:38,675][00473] Num frames 8900... [2024-07-27 17:45:38,819][00473] Num frames 9000... [2024-07-27 17:45:38,948][00473] Num frames 9100... [2024-07-27 17:45:39,079][00473] Num frames 9200... [2024-07-27 17:45:39,237][00473] Num frames 9300... [2024-07-27 17:45:39,429][00473] Num frames 9400... [2024-07-27 17:45:39,567][00473] Avg episode rewards: #0: 23.160, true rewards: #0: 10.493 [2024-07-27 17:45:39,569][00473] Avg episode reward: 23.160, avg true_objective: 10.493 [2024-07-27 17:45:39,673][00473] Num frames 9500... [2024-07-27 17:45:39,885][00473] Num frames 9600... [2024-07-27 17:45:40,068][00473] Num frames 9700... [2024-07-27 17:45:40,251][00473] Num frames 9800... [2024-07-27 17:45:40,438][00473] Num frames 9900... [2024-07-27 17:45:40,627][00473] Num frames 10000... [2024-07-27 17:45:40,837][00473] Num frames 10100... [2024-07-27 17:45:41,032][00473] Num frames 10200... [2024-07-27 17:45:41,228][00473] Num frames 10300... [2024-07-27 17:45:41,425][00473] Num frames 10400... [2024-07-27 17:45:41,622][00473] Num frames 10500... [2024-07-27 17:45:41,784][00473] Num frames 10600... [2024-07-27 17:45:41,937][00473] Num frames 10700... 
[2024-07-27 17:45:42,070][00473] Num frames 10800... [2024-07-27 17:45:42,198][00473] Num frames 10900... [2024-07-27 17:45:42,333][00473] Num frames 11000... [2024-07-27 17:45:42,463][00473] Num frames 11100... [2024-07-27 17:45:42,596][00473] Num frames 11200... [2024-07-27 17:45:42,732][00473] Num frames 11300... [2024-07-27 17:45:42,863][00473] Num frames 11400... [2024-07-27 17:45:43,003][00473] Num frames 11500... [2024-07-27 17:45:43,116][00473] Avg episode rewards: #0: 26.544, true rewards: #0: 11.544 [2024-07-27 17:45:43,118][00473] Avg episode reward: 26.544, avg true_objective: 11.544 [2024-07-27 17:46:54,499][00473] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
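The second evaluation pass above differs from the first only in capping max_num_frames at 1000000 and enabling push_to_hub with hf_repository='rishisim/rl_course_vizdoom_health_gathering_supreme', so that after rendering replay.mp4 the experiment can be uploaded to the Hugging Face Hub (the upload itself is not shown in this excerpt). A hedged sketch of that call, using the same assumed parse_vizdoom_cfg() helper and assuming a prior Hugging Face login:

from sample_factory.enjoy import enjoy

env = "doom_health_gathering_supreme"
cfg = parse_vizdoom_cfg(   # assumed notebook helper
    argv=[
        f"--env={env}",
        "--num_workers=1",
        "--save_video",
        "--no_render",
        "--max_num_episodes=10",
        "--max_num_frames=1000000",
        "--push_to_hub",
        "--hf_repository=rishisim/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)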