[2023-02-22 17:07:47,866][00749] Saving configuration to /content/train_dir/default_experiment/config.json...
[2023-02-22 17:07:47,869][00749] Rollout worker 0 uses device cpu
[2023-02-22 17:07:47,870][00749] Rollout worker 1 uses device cpu
[2023-02-22 17:07:47,872][00749] Rollout worker 2 uses device cpu
[2023-02-22 17:07:47,875][00749] Rollout worker 3 uses device cpu
[2023-02-22 17:07:47,876][00749] Rollout worker 4 uses device cpu
[2023-02-22 17:07:47,877][00749] Rollout worker 5 uses device cpu
[2023-02-22 17:07:47,879][00749] Rollout worker 6 uses device cpu
[2023-02-22 17:07:47,881][00749] Rollout worker 7 uses device cpu
[2023-02-22 17:07:47,996][00749] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 17:07:47,998][00749] InferenceWorker_p0-w0: min num requests: 2
[2023-02-22 17:07:48,029][00749] Starting all processes...
[2023-02-22 17:07:48,031][00749] Starting process learner_proc0
[2023-02-22 17:07:48,090][00749] Starting all processes...
[2023-02-22 17:07:48,099][00749] Starting process inference_proc0-0
[2023-02-22 17:07:48,100][00749] Starting process rollout_proc0
[2023-02-22 17:07:48,101][00749] Starting process rollout_proc1
[2023-02-22 17:07:48,102][00749] Starting process rollout_proc2
[2023-02-22 17:07:48,103][00749] Starting process rollout_proc3
[2023-02-22 17:07:48,104][00749] Starting process rollout_proc4
[2023-02-22 17:07:48,105][00749] Starting process rollout_proc5
[2023-02-22 17:07:48,105][00749] Starting process rollout_proc6
[2023-02-22 17:07:48,105][00749] Starting process rollout_proc7
[2023-02-22 17:07:50,175][09796] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,214][09794] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 17:07:50,214][09794] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-22 17:07:50,278][09821] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,477][09780] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 17:07:50,477][09780] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-22 17:07:50,509][09819] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,550][09798] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,557][09797] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,565][09816] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,595][09795] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,683][09818] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2023-02-22 17:07:50,992][09794] Num visible devices: 1
[2023-02-22 17:07:50,992][09780] Num visible devices: 1
[2023-02-22 17:07:51,010][09780] Starting seed is not provided
[2023-02-22 17:07:51,011][09780] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 17:07:51,011][09780] Initializing actor-critic model on device cuda:0
[2023-02-22 17:07:51,011][09780] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 17:07:51,013][09780] RunningMeanStd input shape: (1,)
[2023-02-22 17:07:51,026][09780] ConvEncoder: input_channels=3
[2023-02-22 17:07:51,307][09780] Conv encoder output size: 512
[2023-02-22 17:07:51,307][09780] Policy head output size: 512
[2023-02-22 17:07:51,352][09780] Created Actor Critic model with architecture:
[2023-02-22 17:07:51,352][09780]
ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-22 17:07:58,864][09780] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-02-22 17:07:58,865][09780] No checkpoints found
[2023-02-22 17:07:58,865][09780] Did not load from checkpoint, starting from scratch!
[2023-02-22 17:07:58,866][09780] Initialized policy 0 weights for model version 0
[2023-02-22 17:07:58,868][09780] LearnerWorker_p0 finished initialization!
[2023-02-22 17:07:58,868][09780] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 17:07:58,979][09794] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 17:07:58,980][09794] RunningMeanStd input shape: (1,)
[2023-02-22 17:07:58,995][09794] ConvEncoder: input_channels=3
[2023-02-22 17:07:59,103][09794] Conv encoder output size: 512
[2023-02-22 17:07:59,104][09794] Policy head output size: 512
[2023-02-22 17:08:01,903][00749] Inference worker 0-0 is ready!
[2023-02-22 17:08:01,905][00749] All inference workers are ready! Signal rollout workers to start!
[2023-02-22 17:08:01,927][09819] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:01,928][09797] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:01,928][09818] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:01,928][09796] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:01,932][09795] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:01,932][09821] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:01,933][09816] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:01,933][09798] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:08:02,001][09821] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
[2023-02-22 17:08:02,001][09797] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
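Two of the eight rollout workers (w7 and w2, per the tracebacks that follow) die at this point: their `game.init()` call raised `ViZDoomUnexpectedExitException`, which can happen when many ViZDoom instances start simultaneously on a resource-constrained machine. The run continues on the six surviving workers. A retry guard along the lines of the sketch below is one common mitigation; `init_game_with_retries` is a hypothetical helper written here for illustration, not part of Sample Factory:

```python
# Hypothetical mitigation sketch (not part of Sample Factory): retry ViZDoom
# initialization a few times before giving up, since starting many instances
# at once can make game.init() exit unexpectedly.
import time

import vizdoom as vzd


def init_game_with_retries(game: "vzd.DoomGame", attempts: int = 3, delay_s: float = 1.0) -> None:
    for attempt in range(attempts):
        try:
            game.init()
            return
        except vzd.ViZDoomUnexpectedExitException:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller terminate the worker
            time.sleep(delay_s)  # give the crashed instance time to release resources
```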
[2023-02-22 17:08:02,002][09821] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2023-02-22 17:08:02,003][09821] Unhandled exception in evt loop rollout_proc7_evt_loop
[2023-02-22 17:08:02,002][09797] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 379, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
    obs, info = self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2023-02-22 17:08:02,004][09797] Unhandled exception in evt loop rollout_proc2_evt_loop
[2023-02-22 17:08:02,269][09795] Decorrelating experience for 0 frames...
[2023-02-22 17:08:02,343][09818] Decorrelating experience for 0 frames...
[2023-02-22 17:08:02,378][09798] Decorrelating experience for 0 frames...
[2023-02-22 17:08:02,378][09819] Decorrelating experience for 0 frames...
[2023-02-22 17:08:02,378][09816] Decorrelating experience for 0 frames...
[2023-02-22 17:08:02,378][09796] Decorrelating experience for 0 frames...
[2023-02-22 17:08:02,607][09818] Decorrelating experience for 32 frames...
[2023-02-22 17:08:02,668][09795] Decorrelating experience for 32 frames...
[2023-02-22 17:08:02,675][09796] Decorrelating experience for 32 frames...
[2023-02-22 17:08:02,677][09798] Decorrelating experience for 32 frames...
[2023-02-22 17:08:02,678][09819] Decorrelating experience for 32 frames...
[2023-02-22 17:08:02,691][09816] Decorrelating experience for 32 frames...
[2023-02-22 17:08:02,920][09818] Decorrelating experience for 64 frames...
[2023-02-22 17:08:03,005][00749] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 17:08:03,018][09819] Decorrelating experience for 64 frames...
[2023-02-22 17:08:03,024][09798] Decorrelating experience for 64 frames...
[2023-02-22 17:08:03,028][09796] Decorrelating experience for 64 frames...
[2023-02-22 17:08:03,029][09795] Decorrelating experience for 64 frames...
[2023-02-22 17:08:03,034][09816] Decorrelating experience for 64 frames...
[2023-02-22 17:08:03,355][09818] Decorrelating experience for 96 frames...
[2023-02-22 17:08:03,359][09819] Decorrelating experience for 96 frames...
[2023-02-22 17:08:03,363][09798] Decorrelating experience for 96 frames...
[2023-02-22 17:08:03,365][09795] Decorrelating experience for 96 frames...
[2023-02-22 17:08:03,367][09796] Decorrelating experience for 96 frames...
[2023-02-22 17:08:03,372][09816] Decorrelating experience for 96 frames...
[2023-02-22 17:08:07,999][00749] Heartbeat connected on InferenceWorker_p0-w0
[2023-02-22 17:08:08,005][00749] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 416.0. Samples: 2080. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-02-22 17:08:08,008][00749] Avg episode reward: [(0, '2.501')]
[2023-02-22 17:08:08,011][00749] Heartbeat connected on RolloutWorker_w1
[2023-02-22 17:08:08,016][09780] Signal inference workers to stop experience collection...
[2023-02-22 17:08:08,014][00749] Heartbeat connected on RolloutWorker_w0
[2023-02-22 17:08:08,017][00749] Heartbeat connected on RolloutWorker_w3
[2023-02-22 17:08:08,019][09794] InferenceWorker_p0-w0: stopping experience collection
[2023-02-22 17:08:08,019][00749] Heartbeat connected on Batcher_0
[2023-02-22 17:08:08,023][00749] Heartbeat connected on RolloutWorker_w4
[2023-02-22 17:08:08,026][00749] Heartbeat connected on RolloutWorker_w5
[2023-02-22 17:08:08,028][00749] Heartbeat connected on RolloutWorker_w6
[2023-02-22 17:08:11,435][09780] Signal inference workers to resume experience collection...
[2023-02-22 17:08:11,436][09794] InferenceWorker_p0-w0: resuming experience collection
[2023-02-22 17:08:12,372][00749] Heartbeat connected on LearnerWorker_p0
[2023-02-22 17:08:13,005][00749] Fps is (10 sec: 1638.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 16384. Throughput: 0: 246.6. Samples: 2466. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-02-22 17:08:13,008][00749] Avg episode reward: [(0, '3.595')]
[2023-02-22 17:08:14,454][09794] Updated weights for policy 0, policy_version 10 (0.0366)
[2023-02-22 17:08:17,174][09794] Updated weights for policy 0, policy_version 20 (0.0011)
[2023-02-22 17:08:18,005][00749] Fps is (10 sec: 9420.9, 60 sec: 6280.5, 300 sec: 6280.5). Total num frames: 94208. Throughput: 0: 1329.6. Samples: 19944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:08:18,008][00749] Avg episode reward: [(0, '4.345')]
[2023-02-22 17:08:19,764][09794] Updated weights for policy 0, policy_version 30 (0.0011)
[2023-02-22 17:08:22,497][09794] Updated weights for policy 0, policy_version 40 (0.0011)
[2023-02-22 17:08:23,005][00749] Fps is (10 sec: 15564.9, 60 sec: 8601.6, 300 sec: 8601.6). Total num frames: 172032. Throughput: 0: 2158.7. Samples: 43174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 17:08:23,008][00749] Avg episode reward: [(0, '4.411')]
[2023-02-22 17:08:23,016][09780] Saving new best policy, reward=4.411!
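For context, the `command_line` recorded in the configuration dump near the end of this log (`--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000`) indicates the run was launched roughly as in the sketch below. `parse_vizdoom_cfg` is an illustrative wrapper written here for clarity; the imported entry points are Sample Factory's, assuming a 2.x installation:

```python
# Launch sketch reconstructed from the command_line stored in config.json;
# parse_vizdoom_cfg is an illustrative helper, not a Sample Factory API.
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components


def parse_vizdoom_cfg(argv=None, evaluation=False):
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    add_doom_env_args(parser)       # Doom-specific CLI arguments
    doom_override_defaults(parser)  # Doom-tuned defaults for the algo parameters
    return parse_full_cfg(parser, argv)


register_vizdoom_components()
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=4000000",
    ]
)
status = run_rl(cfg)  # blocks until the env-step budget is reached
```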
[2023-02-22 17:08:25,097][09794] Updated weights for policy 0, policy_version 50 (0.0011)
[2023-02-22 17:08:27,690][09794] Updated weights for policy 0, policy_version 60 (0.0011)
[2023-02-22 17:08:28,005][00749] Fps is (10 sec: 15155.2, 60 sec: 9830.4, 300 sec: 9830.4). Total num frames: 245760. Throughput: 0: 2197.2. Samples: 54930. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:08:28,008][00749] Avg episode reward: [(0, '4.532')]
[2023-02-22 17:08:28,011][09780] Saving new best policy, reward=4.532!
[2023-02-22 17:08:30,522][09794] Updated weights for policy 0, policy_version 70 (0.0012)
[2023-02-22 17:08:33,005][00749] Fps is (10 sec: 14745.5, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 319488. Throughput: 0: 2581.6. Samples: 77448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:08:33,008][00749] Avg episode reward: [(0, '4.488')]
[2023-02-22 17:08:33,220][09794] Updated weights for policy 0, policy_version 80 (0.0011)
[2023-02-22 17:08:35,906][09794] Updated weights for policy 0, policy_version 90 (0.0010)
[2023-02-22 17:08:38,005][00749] Fps is (10 sec: 15155.2, 60 sec: 11351.8, 300 sec: 11351.8). Total num frames: 397312. Throughput: 0: 2866.2. Samples: 100316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:08:38,008][00749] Avg episode reward: [(0, '4.653')]
[2023-02-22 17:08:38,037][09780] Saving new best policy, reward=4.653!
[2023-02-22 17:08:38,587][09794] Updated weights for policy 0, policy_version 100 (0.0011)
[2023-02-22 17:08:41,222][09794] Updated weights for policy 0, policy_version 110 (0.0011)
[2023-02-22 17:08:43,005][00749] Fps is (10 sec: 15564.9, 60 sec: 11878.4, 300 sec: 11878.4). Total num frames: 475136. Throughput: 0: 2799.4. Samples: 111976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:08:43,008][00749] Avg episode reward: [(0, '4.463')]
[2023-02-22 17:08:43,992][09794] Updated weights for policy 0, policy_version 120 (0.0011)
[2023-02-22 17:08:46,813][09794] Updated weights for policy 0, policy_version 130 (0.0011)
[2023-02-22 17:08:48,005][00749] Fps is (10 sec: 15155.3, 60 sec: 12197.0, 300 sec: 12197.0). Total num frames: 548864. Throughput: 0: 2978.6. Samples: 134036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:08:48,008][00749] Avg episode reward: [(0, '4.481')]
[2023-02-22 17:08:49,586][09794] Updated weights for policy 0, policy_version 140 (0.0011)
[2023-02-22 17:08:52,272][09794] Updated weights for policy 0, policy_version 150 (0.0010)
[2023-02-22 17:08:53,005][00749] Fps is (10 sec: 14745.4, 60 sec: 12451.8, 300 sec: 12451.8). Total num frames: 622592. Throughput: 0: 3436.2. Samples: 156708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:08:53,007][00749] Avg episode reward: [(0, '4.402')]
[2023-02-22 17:08:54,830][09794] Updated weights for policy 0, policy_version 160 (0.0010)
[2023-02-22 17:08:57,485][09794] Updated weights for policy 0, policy_version 170 (0.0011)
[2023-02-22 17:08:58,005][00749] Fps is (10 sec: 15564.8, 60 sec: 12809.3, 300 sec: 12809.3). Total num frames: 704512. Throughput: 0: 3690.6. Samples: 168542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:08:58,008][00749] Avg episode reward: [(0, '4.561')]
[2023-02-22 17:09:00,147][09794] Updated weights for policy 0, policy_version 180 (0.0011)
[2023-02-22 17:09:02,870][09794] Updated weights for policy 0, policy_version 190 (0.0011)
[2023-02-22 17:09:03,005][00749] Fps is (10 sec: 15565.0, 60 sec: 12970.7, 300 sec: 12970.7). Total num frames: 778240. Throughput: 0: 3812.7. Samples: 191514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:09:03,008][00749] Avg episode reward: [(0, '4.500')]
[2023-02-22 17:09:05,640][09794] Updated weights for policy 0, policy_version 200 (0.0011)
[2023-02-22 17:09:08,005][00749] Fps is (10 sec: 14745.6, 60 sec: 14199.5, 300 sec: 13107.2). Total num frames: 851968. Throughput: 0: 3796.2. Samples: 214002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:09:08,007][00749] Avg episode reward: [(0, '4.885')]
[2023-02-22 17:09:08,011][09780] Saving new best policy, reward=4.885!
[2023-02-22 17:09:08,341][09794] Updated weights for policy 0, policy_version 210 (0.0011)
[2023-02-22 17:09:10,912][09794] Updated weights for policy 0, policy_version 220 (0.0011)
[2023-02-22 17:09:13,005][00749] Fps is (10 sec: 15564.7, 60 sec: 15291.7, 300 sec: 13341.3). Total num frames: 933888. Throughput: 0: 3796.6. Samples: 225778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:09:13,008][00749] Avg episode reward: [(0, '5.027')]
[2023-02-22 17:09:13,016][09780] Saving new best policy, reward=5.027!
[2023-02-22 17:09:13,497][09794] Updated weights for policy 0, policy_version 230 (0.0011)
[2023-02-22 17:09:16,161][09794] Updated weights for policy 0, policy_version 240 (0.0011)
[2023-02-22 17:09:18,005][00749] Fps is (10 sec: 15564.7, 60 sec: 15223.5, 300 sec: 13434.9). Total num frames: 1007616. Throughput: 0: 3816.6. Samples: 249196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:09:18,008][00749] Avg episode reward: [(0, '5.004')]
[2023-02-22 17:09:18,862][09794] Updated weights for policy 0, policy_version 250 (0.0011)
[2023-02-22 17:09:21,397][09794] Updated weights for policy 0, policy_version 260 (0.0011)
[2023-02-22 17:09:23,005][00749] Fps is (10 sec: 15565.0, 60 sec: 15291.7, 300 sec: 13619.2). Total num frames: 1089536. Throughput: 0: 3833.6. Samples: 272828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:09:23,008][00749] Avg episode reward: [(0, '4.915')]
[2023-02-22 17:09:23,944][09794] Updated weights for policy 0, policy_version 270 (0.0011)
[2023-02-22 17:09:26,554][09794] Updated weights for policy 0, policy_version 280 (0.0011)
[2023-02-22 17:09:28,005][00749] Fps is (10 sec: 15974.5, 60 sec: 15360.0, 300 sec: 13733.7). Total num frames: 1167360. Throughput: 0: 3838.7. Samples: 284716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:09:28,008][00749] Avg episode reward: [(0, '6.150')]
[2023-02-22 17:09:28,010][09780] Saving new best policy, reward=6.150!
[2023-02-22 17:09:29,087][09794] Updated weights for policy 0, policy_version 290 (0.0010)
[2023-02-22 17:09:31,775][09794] Updated weights for policy 0, policy_version 300 (0.0011)
[2023-02-22 17:09:33,005][00749] Fps is (10 sec: 15564.7, 60 sec: 15428.3, 300 sec: 13835.4). Total num frames: 1245184. Throughput: 0: 3870.0. Samples: 308188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:09:33,007][00749] Avg episode reward: [(0, '6.143')]
[2023-02-22 17:09:34,488][09794] Updated weights for policy 0, policy_version 310 (0.0011)
[2023-02-22 17:09:37,062][09794] Updated weights for policy 0, policy_version 320 (0.0010)
[2023-02-22 17:09:38,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15428.3, 300 sec: 13926.4). Total num frames: 1323008. Throughput: 0: 3883.6. Samples: 331470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:09:38,007][00749] Avg episode reward: [(0, '7.452')]
[2023-02-22 17:09:38,011][09780] Saving new best policy, reward=7.452!
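The telemetry above repeats every five seconds: `Fps is (10 sec: …, 60 sec: …, 300 sec: …)` reports throughput averaged over 10/60/300-second sliding windows, `Total num frames` is the cumulative environment frame count, and `Policy #0 lag` tracks how many versions behind the learner the sampled actions were. To turn a raw log like this one into a learning curve, a small parser suffices; the sketch below assumes only the line formats visible in this section:

```python
# Minimal sketch: extract (total_frames, avg_episode_reward) pairs from a
# Sample Factory console log like the one in this section.
import re

FPS_RE = re.compile(r"Fps is .*? Total num frames: (\d+)\.")
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([-\d.]+)'\)\]")


def parse_learning_curve(log_text: str):
    frames, rewards = [], []
    pending_frames = None
    for line in log_text.splitlines():
        m = FPS_RE.search(line)
        if m:
            # remember the frame count; the matching reward line follows
            pending_frames = int(m.group(1))
            continue
        m = REWARD_RE.search(line)
        if m and pending_frames is not None:
            frames.append(pending_frames)
            rewards.append(float(m.group(1)))
            pending_frames = None
    return frames, rewards
```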
[2023-02-22 17:09:39,662][09794] Updated weights for policy 0, policy_version 330 (0.0012)
[2023-02-22 17:09:42,190][09794] Updated weights for policy 0, policy_version 340 (0.0010)
[2023-02-22 17:09:43,005][00749] Fps is (10 sec: 15974.2, 60 sec: 15496.5, 300 sec: 14049.3). Total num frames: 1404928. Throughput: 0: 3888.8. Samples: 343538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:09:43,008][00749] Avg episode reward: [(0, '8.480')]
[2023-02-22 17:09:43,017][09780] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000343_1404928.pth...
[2023-02-22 17:09:43,080][09780] Saving new best policy, reward=8.480!
[2023-02-22 17:09:44,782][09794] Updated weights for policy 0, policy_version 350 (0.0010)
[2023-02-22 17:09:47,465][09794] Updated weights for policy 0, policy_version 360 (0.0012)
[2023-02-22 17:09:48,005][00749] Fps is (10 sec: 15974.5, 60 sec: 15564.8, 300 sec: 14121.5). Total num frames: 1482752. Throughput: 0: 3899.5. Samples: 366990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:09:48,008][00749] Avg episode reward: [(0, '10.903')]
[2023-02-22 17:09:48,010][09780] Saving new best policy, reward=10.903!
[2023-02-22 17:09:50,131][09794] Updated weights for policy 0, policy_version 370 (0.0011)
[2023-02-22 17:09:52,678][09794] Updated weights for policy 0, policy_version 380 (0.0011)
[2023-02-22 17:09:53,005][00749] Fps is (10 sec: 15565.1, 60 sec: 15633.1, 300 sec: 14187.1). Total num frames: 1560576. Throughput: 0: 3923.2. Samples: 390546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:09:53,007][00749] Avg episode reward: [(0, '10.680')]
[2023-02-22 17:09:55,227][09794] Updated weights for policy 0, policy_version 390 (0.0011)
[2023-02-22 17:09:57,774][09794] Updated weights for policy 0, policy_version 400 (0.0011)
[2023-02-22 17:09:58,005][00749] Fps is (10 sec: 15564.7, 60 sec: 15564.8, 300 sec: 14247.0). Total num frames: 1638400. Throughput: 0: 3929.9. Samples: 402624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:09:58,008][00749] Avg episode reward: [(0, '11.824')]
[2023-02-22 17:09:58,011][09780] Saving new best policy, reward=11.824!
[2023-02-22 17:10:00,439][09794] Updated weights for policy 0, policy_version 410 (0.0011)
[2023-02-22 17:10:03,005][00749] Fps is (10 sec: 15564.7, 60 sec: 15633.1, 300 sec: 14301.9). Total num frames: 1716224. Throughput: 0: 3929.7. Samples: 426032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:10:03,007][00749] Avg episode reward: [(0, '12.799')]
[2023-02-22 17:10:03,017][09780] Saving new best policy, reward=12.799!
[2023-02-22 17:10:03,131][09794] Updated weights for policy 0, policy_version 420 (0.0011)
[2023-02-22 17:10:05,848][09794] Updated weights for policy 0, policy_version 430 (0.0010)
[2023-02-22 17:10:08,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15701.3, 300 sec: 14352.4). Total num frames: 1794048. Throughput: 0: 3909.2. Samples: 448744. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:10:08,008][00749] Avg episode reward: [(0, '13.948')]
[2023-02-22 17:10:08,011][09780] Saving new best policy, reward=13.948!
[2023-02-22 17:10:08,538][09794] Updated weights for policy 0, policy_version 440 (0.0010)
[2023-02-22 17:10:11,210][09794] Updated weights for policy 0, policy_version 450 (0.0010)
[2023-02-22 17:10:13,005][00749] Fps is (10 sec: 15155.2, 60 sec: 15564.8, 300 sec: 14367.5). Total num frames: 1867776. Throughput: 0: 3897.8. Samples: 460116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:10:13,007][00749] Avg episode reward: [(0, '14.487')]
[2023-02-22 17:10:13,014][09780] Saving new best policy, reward=14.487!
[2023-02-22 17:10:13,853][09794] Updated weights for policy 0, policy_version 460 (0.0011)
[2023-02-22 17:10:16,477][09794] Updated weights for policy 0, policy_version 470 (0.0011)
[2023-02-22 17:10:18,005][00749] Fps is (10 sec: 15155.2, 60 sec: 15633.1, 300 sec: 14411.9). Total num frames: 1945600. Throughput: 0: 3894.8. Samples: 483456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:10:18,008][00749] Avg episode reward: [(0, '17.009')]
[2023-02-22 17:10:18,011][09780] Saving new best policy, reward=17.009!
[2023-02-22 17:10:19,125][09794] Updated weights for policy 0, policy_version 480 (0.0011)
[2023-02-22 17:10:21,766][09794] Updated weights for policy 0, policy_version 490 (0.0010)
[2023-02-22 17:10:23,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15564.8, 300 sec: 14453.0). Total num frames: 2023424. Throughput: 0: 3898.6. Samples: 506906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:10:23,008][00749] Avg episode reward: [(0, '15.362')]
[2023-02-22 17:10:24,301][09794] Updated weights for policy 0, policy_version 500 (0.0010)
[2023-02-22 17:10:26,900][09794] Updated weights for policy 0, policy_version 510 (0.0010)
[2023-02-22 17:10:28,005][00749] Fps is (10 sec: 15974.4, 60 sec: 15633.1, 300 sec: 14519.6). Total num frames: 2105344. Throughput: 0: 3894.8. Samples: 518804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:10:28,008][00749] Avg episode reward: [(0, '16.369')]
[2023-02-22 17:10:29,471][09794] Updated weights for policy 0, policy_version 520 (0.0011)
[2023-02-22 17:10:32,150][09794] Updated weights for policy 0, policy_version 530 (0.0011)
[2023-02-22 17:10:33,005][00749] Fps is (10 sec: 15974.1, 60 sec: 15633.0, 300 sec: 14554.4). Total num frames: 2183168. Throughput: 0: 3895.7. Samples: 542296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:10:33,008][00749] Avg episode reward: [(0, '16.850')]
[2023-02-22 17:10:34,855][09794] Updated weights for policy 0, policy_version 540 (0.0011)
[2023-02-22 17:10:37,425][09794] Updated weights for policy 0, policy_version 550 (0.0011)
[2023-02-22 17:10:38,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15633.1, 300 sec: 14587.1). Total num frames: 2260992. Throughput: 0: 3888.2. Samples: 565514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:10:38,008][00749] Avg episode reward: [(0, '14.840')]
[2023-02-22 17:10:40,015][09794] Updated weights for policy 0, policy_version 560 (0.0011)
[2023-02-22 17:10:42,616][09794] Updated weights for policy 0, policy_version 570 (0.0010)
[2023-02-22 17:10:43,005][00749] Fps is (10 sec: 15565.1, 60 sec: 15564.8, 300 sec: 14617.6). Total num frames: 2338816. Throughput: 0: 3884.2. Samples: 577414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 17:10:43,007][00749] Avg episode reward: [(0, '17.547')]
[2023-02-22 17:10:43,018][09780] Saving new best policy, reward=17.547!
[2023-02-22 17:10:45,141][09794] Updated weights for policy 0, policy_version 580 (0.0011)
[2023-02-22 17:10:47,847][09794] Updated weights for policy 0, policy_version 590 (0.0011)
[2023-02-22 17:10:48,005][00749] Fps is (10 sec: 15564.6, 60 sec: 15564.8, 300 sec: 14646.3). Total num frames: 2416640. Throughput: 0: 3887.7. Samples: 600978. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 17:10:48,008][00749] Avg episode reward: [(0, '17.049')]
[2023-02-22 17:10:50,644][09794] Updated weights for policy 0, policy_version 600 (0.0011)
[2023-02-22 17:10:53,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15564.8, 300 sec: 14673.3). Total num frames: 2494464. Throughput: 0: 3893.2. Samples: 623936. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2023-02-22 17:10:53,007][00749] Avg episode reward: [(0, '18.617')]
[2023-02-22 17:10:53,015][09780] Saving new best policy, reward=18.617!
[2023-02-22 17:10:53,185][09794] Updated weights for policy 0, policy_version 610 (0.0011)
[2023-02-22 17:10:55,762][09794] Updated weights for policy 0, policy_version 620 (0.0011)
[2023-02-22 17:10:58,005][00749] Fps is (10 sec: 15565.0, 60 sec: 15564.8, 300 sec: 14698.8). Total num frames: 2572288. Throughput: 0: 3904.8. Samples: 635830. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:10:58,008][00749] Avg episode reward: [(0, '16.774')]
[2023-02-22 17:10:58,404][09794] Updated weights for policy 0, policy_version 630 (0.0011)
[2023-02-22 17:11:00,940][09794] Updated weights for policy 0, policy_version 640 (0.0011)
[2023-02-22 17:11:03,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15564.8, 300 sec: 14722.9). Total num frames: 2650112. Throughput: 0: 3910.8. Samples: 659444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:11:03,009][00749] Avg episode reward: [(0, '23.356')]
[2023-02-22 17:11:03,018][09780] Saving new best policy, reward=23.356!
[2023-02-22 17:11:03,670][09794] Updated weights for policy 0, policy_version 650 (0.0011)
[2023-02-22 17:11:06,310][09794] Updated weights for policy 0, policy_version 660 (0.0010)
[2023-02-22 17:11:08,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15564.8, 300 sec: 14745.6). Total num frames: 2727936. Throughput: 0: 3906.8. Samples: 682712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:11:08,008][00749] Avg episode reward: [(0, '18.035')]
[2023-02-22 17:11:08,842][09794] Updated weights for policy 0, policy_version 670 (0.0011)
[2023-02-22 17:11:11,493][09794] Updated weights for policy 0, policy_version 680 (0.0010)
[2023-02-22 17:11:13,005][00749] Fps is (10 sec: 15564.6, 60 sec: 15633.0, 300 sec: 14767.2). Total num frames: 2805760. Throughput: 0: 3909.6. Samples: 694738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 17:11:13,008][00749] Avg episode reward: [(0, '18.036')]
[2023-02-22 17:11:14,032][09794] Updated weights for policy 0, policy_version 690 (0.0011)
[2023-02-22 17:11:16,790][09794] Updated weights for policy 0, policy_version 700 (0.0011)
[2023-02-22 17:11:18,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15633.1, 300 sec: 14787.6). Total num frames: 2883584. Throughput: 0: 3896.1. Samples: 717622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:11:18,007][00749] Avg episode reward: [(0, '21.296')]
[2023-02-22 17:11:19,529][09794] Updated weights for policy 0, policy_version 710 (0.0011)
[2023-02-22 17:11:22,253][09794] Updated weights for policy 0, policy_version 720 (0.0010)
[2023-02-22 17:11:23,005][00749] Fps is (10 sec: 15155.1, 60 sec: 15564.8, 300 sec: 14786.6). Total num frames: 2957312. Throughput: 0: 3887.7. Samples: 740462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:11:23,009][00749] Avg episode reward: [(0, '19.292')]
[2023-02-22 17:11:24,899][09794] Updated weights for policy 0, policy_version 730 (0.0011)
[2023-02-22 17:11:27,506][09794] Updated weights for policy 0, policy_version 740 (0.0011)
[2023-02-22 17:11:28,005][00749] Fps is (10 sec: 15155.3, 60 sec: 15496.6, 300 sec: 14805.6). Total num frames: 3035136. Throughput: 0: 3878.4. Samples: 751944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:11:28,008][00749] Avg episode reward: [(0, '20.338')]
[2023-02-22 17:11:30,072][09794] Updated weights for policy 0, policy_version 750 (0.0011)
[2023-02-22 17:11:32,730][09794] Updated weights for policy 0, policy_version 760 (0.0011)
[2023-02-22 17:11:33,005][00749] Fps is (10 sec: 15974.6, 60 sec: 15564.8, 300 sec: 14843.1). Total num frames: 3117056. Throughput: 0: 3882.3. Samples: 775680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:11:33,008][00749] Avg episode reward: [(0, '23.196')]
[2023-02-22 17:11:35,406][09794] Updated weights for policy 0, policy_version 770 (0.0011)
[2023-02-22 17:11:38,005][00749] Fps is (10 sec: 15564.7, 60 sec: 15496.5, 300 sec: 14840.9). Total num frames: 3190784. Throughput: 0: 3881.5. Samples: 798602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:11:38,007][00749] Avg episode reward: [(0, '27.011')]
[2023-02-22 17:11:38,011][09780] Saving new best policy, reward=27.011!
[2023-02-22 17:11:38,129][09794] Updated weights for policy 0, policy_version 780 (0.0011)
[2023-02-22 17:11:40,650][09794] Updated weights for policy 0, policy_version 790 (0.0010)
[2023-02-22 17:11:43,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15564.8, 300 sec: 14875.9). Total num frames: 3272704. Throughput: 0: 3879.1. Samples: 810392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:11:43,008][00749] Avg episode reward: [(0, '26.513')]
[2023-02-22 17:11:43,020][09780] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000799_3272704.pth...
[2023-02-22 17:11:43,278][09794] Updated weights for policy 0, policy_version 800 (0.0011)
[2023-02-22 17:11:45,833][09794] Updated weights for policy 0, policy_version 810 (0.0010)
[2023-02-22 17:11:48,005][00749] Fps is (10 sec: 15974.4, 60 sec: 15564.8, 300 sec: 14891.2). Total num frames: 3350528. Throughput: 0: 3881.9. Samples: 834130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:11:48,008][00749] Avg episode reward: [(0, '23.306')]
[2023-02-22 17:11:48,506][09794] Updated weights for policy 0, policy_version 820 (0.0011)
[2023-02-22 17:11:51,199][09794] Updated weights for policy 0, policy_version 830 (0.0011)
[2023-02-22 17:11:53,005][00749] Fps is (10 sec: 15155.2, 60 sec: 15496.5, 300 sec: 14888.1). Total num frames: 3424256. Throughput: 0: 3874.8. Samples: 857080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:11:53,008][00749] Avg episode reward: [(0, '23.699')]
[2023-02-22 17:11:53,817][09794] Updated weights for policy 0, policy_version 840 (0.0011)
[2023-02-22 17:11:56,387][09794] Updated weights for policy 0, policy_version 850 (0.0011)
[2023-02-22 17:11:58,005][00749] Fps is (10 sec: 15564.7, 60 sec: 15564.8, 300 sec: 14919.9). Total num frames: 3506176. Throughput: 0: 3873.4. Samples: 869040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:11:58,007][00749] Avg episode reward: [(0, '23.796')]
[2023-02-22 17:11:58,993][09794] Updated weights for policy 0, policy_version 860 (0.0011)
[2023-02-22 17:12:01,531][09794] Updated weights for policy 0, policy_version 870 (0.0010)
[2023-02-22 17:12:03,005][00749] Fps is (10 sec: 15974.3, 60 sec: 15564.8, 300 sec: 14933.3). Total num frames: 3584000. Throughput: 0: 3893.9. Samples: 892848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:12:03,008][00749] Avg episode reward: [(0, '22.995')]
[2023-02-22 17:12:04,223][09794] Updated weights for policy 0, policy_version 880 (0.0010)
[2023-02-22 17:12:06,877][09794] Updated weights for policy 0, policy_version 890 (0.0011)
[2023-02-22 17:12:08,005][00749] Fps is (10 sec: 15564.9, 60 sec: 15564.8, 300 sec: 14946.2). Total num frames: 3661824. Throughput: 0: 3902.0. Samples: 916050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:12:08,008][00749] Avg episode reward: [(0, '26.580')]
[2023-02-22 17:12:09,401][09794] Updated weights for policy 0, policy_version 900 (0.0011)
[2023-02-22 17:12:11,982][09794] Updated weights for policy 0, policy_version 910 (0.0011)
[2023-02-22 17:12:13,005][00749] Fps is (10 sec: 15564.8, 60 sec: 15564.8, 300 sec: 14958.6). Total num frames: 3739648. Throughput: 0: 3915.8. Samples: 928156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-02-22 17:12:13,008][00749] Avg episode reward: [(0, '25.689')]
[2023-02-22 17:12:14,585][09794] Updated weights for policy 0, policy_version 920 (0.0010)
[2023-02-22 17:12:17,173][09794] Updated weights for policy 0, policy_version 930 (0.0011)
[2023-02-22 17:12:18,005][00749] Fps is (10 sec: 15974.5, 60 sec: 15633.1, 300 sec: 14986.5). Total num frames: 3821568. Throughput: 0: 3911.6. Samples: 951700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 17:12:18,008][00749] Avg episode reward: [(0, '25.178')]
[2023-02-22 17:12:19,958][09794] Updated weights for policy 0, policy_version 940 (0.0011)
[2023-02-22 17:12:22,648][09794] Updated weights for policy 0, policy_version 950 (0.0011)
[2023-02-22 17:12:23,005][00749] Fps is (10 sec: 15564.9, 60 sec: 15633.1, 300 sec: 14981.9). Total num frames: 3895296. Throughput: 0: 3904.1. Samples: 974288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2023-02-22 17:12:23,008][00749] Avg episode reward: [(0, '23.471')]
[2023-02-22 17:12:25,216][09794] Updated weights for policy 0, policy_version 960 (0.0010)
[2023-02-22 17:12:27,845][09794] Updated weights for policy 0, policy_version 970 (0.0011)
[2023-02-22 17:12:28,005][00749] Fps is (10 sec: 15155.2, 60 sec: 15633.1, 300 sec: 14992.9). Total num frames: 3973120. Throughput: 0: 3907.3. Samples: 986218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2023-02-22 17:12:28,008][00749] Avg episode reward: [(0, '25.868')]
[2023-02-22 17:12:29,876][09780] Stopping Batcher_0...
[2023-02-22 17:12:29,876][00749] Component Batcher_0 stopped!
[2023-02-22 17:12:29,876][09780] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 17:12:29,878][00749] Component RolloutWorker_w2 process died already! Don't wait for it.
[2023-02-22 17:12:29,881][00749] Component RolloutWorker_w7 process died already! Don't wait for it.
[2023-02-22 17:12:29,877][09780] Loop batcher_evt_loop terminating...
[2023-02-22 17:12:29,887][09816] Stopping RolloutWorker_w6...
[2023-02-22 17:12:29,887][09816] Loop rollout_proc6_evt_loop terminating...
[2023-02-22 17:12:29,890][09798] Stopping RolloutWorker_w3...
[2023-02-22 17:12:29,890][09795] Stopping RolloutWorker_w1...
[2023-02-22 17:12:29,887][00749] Component RolloutWorker_w6 stopped!
[2023-02-22 17:12:29,891][09798] Loop rollout_proc3_evt_loop terminating...
[2023-02-22 17:12:29,891][09794] Weights refcount: 2 0
[2023-02-22 17:12:29,891][09795] Loop rollout_proc1_evt_loop terminating...
[2023-02-22 17:12:29,891][09819] Stopping RolloutWorker_w5...
[2023-02-22 17:12:29,891][09819] Loop rollout_proc5_evt_loop terminating...
[2023-02-22 17:12:29,893][09794] Stopping InferenceWorker_p0-w0...
[2023-02-22 17:12:29,893][09796] Stopping RolloutWorker_w0...
[2023-02-22 17:12:29,893][09794] Loop inference_proc0-0_evt_loop terminating...
[2023-02-22 17:12:29,891][00749] Component RolloutWorker_w3 stopped!
[2023-02-22 17:12:29,894][09796] Loop rollout_proc0_evt_loop terminating...
[2023-02-22 17:12:29,894][00749] Component RolloutWorker_w1 stopped!
[2023-02-22 17:12:29,896][09818] Stopping RolloutWorker_w4...
[2023-02-22 17:12:29,897][09818] Loop rollout_proc4_evt_loop terminating...
[2023-02-22 17:12:29,896][00749] Component RolloutWorker_w5 stopped!
[2023-02-22 17:12:29,899][00749] Component InferenceWorker_p0-w0 stopped!
[2023-02-22 17:12:29,902][00749] Component RolloutWorker_w0 stopped!
[2023-02-22 17:12:29,904][00749] Component RolloutWorker_w4 stopped!
[2023-02-22 17:12:29,935][09780] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000343_1404928.pth
[2023-02-22 17:12:29,941][09780] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 17:12:30,034][09780] Stopping LearnerWorker_p0...
[2023-02-22 17:12:30,035][09780] Loop learner_proc0_evt_loop terminating...
[2023-02-22 17:12:30,035][00749] Component LearnerWorker_p0 stopped!
[2023-02-22 17:12:30,037][00749] Waiting for process learner_proc0 to stop...
[2023-02-22 17:12:31,740][00749] Waiting for process inference_proc0-0 to join...
[2023-02-22 17:12:31,742][00749] Waiting for process rollout_proc0 to join...
[2023-02-22 17:12:31,744][00749] Waiting for process rollout_proc1 to join...
[2023-02-22 17:12:31,746][00749] Waiting for process rollout_proc2 to join...
[2023-02-22 17:12:31,747][00749] Waiting for process rollout_proc3 to join...
[2023-02-22 17:12:31,749][00749] Waiting for process rollout_proc4 to join...
[2023-02-22 17:12:31,751][00749] Waiting for process rollout_proc5 to join...
[2023-02-22 17:12:31,752][00749] Waiting for process rollout_proc6 to join...
[2023-02-22 17:12:31,754][00749] Waiting for process rollout_proc7 to join...
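Training stops just past the 4,000,000-step budget, with 4,005,888 frames collected. The learner keeps only the two most recent checkpoints (`keep_checkpoints=2`), which is why `checkpoint_000000343_1404928.pth` is removed once `checkpoint_000000978_4005888.pth` is written; the filename encodes the policy version and the cumulative env-step count. A quick way to inspect what such a checkpoint contains (the exact dictionary keys depend on the Sample Factory version, so print them rather than assuming):

```python
# Inspection sketch: Sample Factory checkpoints are plain torch pickles.
import torch

ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")
print(sorted(ckpt.keys()))  # model/optimizer state plus training counters; keys vary by version
```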
[2023-02-22 17:12:31,755][00749] Batcher 0 profile tree view:
batching: 15.6698, releasing_batches: 0.0232
[2023-02-22 17:12:31,756][00749] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 4.6958
update_model: 4.0603
  weight_update: 0.0010
one_step: 0.0026
  handle_policy_step: 242.2526
    deserialize: 9.1828, stack: 1.5350, obs_to_device_normalize: 56.1599, forward: 112.3785, send_messages: 16.7795
    prepare_outputs: 34.6596
      to_cpu: 21.1398
[2023-02-22 17:12:31,758][00749] Learner 0 profile tree view:
misc: 0.0062, prepare_batch: 10.1016
train: 20.1066
  epoch_init: 0.0056, minibatch_init: 0.0060, losses_postprocess: 0.3786, kl_divergence: 0.5390, after_optimizer: 1.1418
  calculate_losses: 7.7604
    losses_init: 0.0036, forward_head: 1.1111, bptt_initial: 3.2359, tail: 0.6397, advantages_returns: 0.1729, losses: 1.0538
    bptt: 1.3628
      bptt_forward_core: 1.3112
  update: 9.9265
    clip: 1.1066
[2023-02-22 17:12:31,760][00749] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1947, enqueue_policy_requests: 10.0101, env_step: 162.3796, overhead: 13.3595, complete_rollouts: 0.3324
save_policy_outputs: 11.1506
  split_output_tensors: 5.4350
[2023-02-22 17:12:31,762][00749] Loop Runner_EvtLoop terminating...
[2023-02-22 17:12:31,764][00749] Runner profile tree view:
main_loop: 283.7353
[2023-02-22 17:12:31,766][00749] Collected {0: 4005888}, FPS: 14118.4
[2023-02-22 17:13:02,218][00749] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-22 17:13:02,220][00749] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 17:13:02,222][00749] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 17:13:02,224][00749] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 17:13:02,226][00749] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 17:13:02,227][00749] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 17:13:02,230][00749] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 17:13:02,231][00749] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 17:13:02,233][00749] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-22 17:13:02,234][00749] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-22 17:13:02,235][00749] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 17:13:02,238][00749] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 17:13:02,239][00749] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 17:13:02,241][00749] Adding new argument 'enjoy_script'=None that is not in the saved config file!
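The overrides above (`num_workers=1`, `no_render`, `save_video`, `max_num_episodes=10`, and so on) are what the evaluation entry point layers on top of the saved training config. An evaluation cell consistent with this log would look roughly like the sketch below, reusing the illustrative `parse_vizdoom_cfg` helper from the earlier training sketch:

```python
# Evaluation sketch matching the overrides logged above; parse_vizdoom_cfg is
# the illustrative helper defined in the training sketch earlier in this log.
from sample_factory.enjoy import enjoy

cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--save_video",
        "--no_render",
        "--max_num_episodes=10",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # loads the latest checkpoint, rolls out 10 episodes, writes replay.mp4
```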
[2023-02-22 17:13:02,242][00749] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 17:13:02,260][00749] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 17:13:02,263][00749] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 17:13:02,267][00749] RunningMeanStd input shape: (1,)
[2023-02-22 17:13:02,288][00749] ConvEncoder: input_channels=3
[2023-02-22 17:13:03,190][00749] Conv encoder output size: 512
[2023-02-22 17:13:03,193][00749] Policy head output size: 512
[2023-02-22 17:13:06,134][00749] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 17:13:08,006][00749] Num frames 100...
[2023-02-22 17:13:08,126][00749] Num frames 200...
[2023-02-22 17:13:08,247][00749] Num frames 300...
[2023-02-22 17:13:08,367][00749] Num frames 400...
[2023-02-22 17:13:08,489][00749] Num frames 500...
[2023-02-22 17:13:08,606][00749] Num frames 600...
[2023-02-22 17:13:08,730][00749] Num frames 700...
[2023-02-22 17:13:08,866][00749] Avg episode rewards: #0: 12.680, true rewards: #0: 7.680
[2023-02-22 17:13:08,868][00749] Avg episode reward: 12.680, avg true_objective: 7.680
[2023-02-22 17:13:08,912][00749] Num frames 800...
[2023-02-22 17:13:09,035][00749] Num frames 900...
[2023-02-22 17:13:09,161][00749] Num frames 1000...
[2023-02-22 17:13:09,278][00749] Num frames 1100...
[2023-02-22 17:13:09,400][00749] Num frames 1200...
[2023-02-22 17:13:09,520][00749] Num frames 1300...
[2023-02-22 17:13:09,636][00749] Num frames 1400...
[2023-02-22 17:13:09,787][00749] Num frames 1500...
[2023-02-22 17:13:09,909][00749] Num frames 1600...
[2023-02-22 17:13:10,031][00749] Num frames 1700...
[2023-02-22 17:13:10,154][00749] Num frames 1800...
[2023-02-22 17:13:10,274][00749] Num frames 1900...
[2023-02-22 17:13:10,400][00749] Num frames 2000...
[2023-02-22 17:13:10,525][00749] Num frames 2100...
[2023-02-22 17:13:10,649][00749] Num frames 2200...
[2023-02-22 17:13:10,778][00749] Num frames 2300...
[2023-02-22 17:13:10,902][00749] Num frames 2400...
[2023-02-22 17:13:11,029][00749] Num frames 2500...
[2023-02-22 17:13:11,149][00749] Num frames 2600...
[2023-02-22 17:13:11,272][00749] Num frames 2700...
[2023-02-22 17:13:11,401][00749] Num frames 2800...
[2023-02-22 17:13:11,542][00749] Avg episode rewards: #0: 37.339, true rewards: #0: 14.340
[2023-02-22 17:13:11,544][00749] Avg episode reward: 37.339, avg true_objective: 14.340
[2023-02-22 17:13:11,589][00749] Num frames 2900...
[2023-02-22 17:13:11,719][00749] Num frames 3000...
[2023-02-22 17:13:11,859][00749] Num frames 3100...
[2023-02-22 17:13:11,984][00749] Num frames 3200...
[2023-02-22 17:13:12,106][00749] Num frames 3300...
[2023-02-22 17:13:12,226][00749] Num frames 3400...
[2023-02-22 17:13:12,353][00749] Num frames 3500...
[2023-02-22 17:13:12,471][00749] Num frames 3600...
[2023-02-22 17:13:12,590][00749] Num frames 3700...
[2023-02-22 17:13:12,712][00749] Num frames 3800...
[2023-02-22 17:13:12,835][00749] Num frames 3900...
[2023-02-22 17:13:12,956][00749] Num frames 4000...
[2023-02-22 17:13:13,077][00749] Num frames 4100...
[2023-02-22 17:13:13,191][00749] Num frames 4200...
[2023-02-22 17:13:13,310][00749] Num frames 4300...
[2023-02-22 17:13:13,428][00749] Num frames 4400...
[2023-02-22 17:13:13,544][00749] Num frames 4500...
[2023-02-22 17:13:13,662][00749] Num frames 4600...
[2023-02-22 17:13:13,754][00749] Avg episode rewards: #0: 39.426, true rewards: #0: 15.427
[2023-02-22 17:13:13,757][00749] Avg episode reward: 39.426, avg true_objective: 15.427
[2023-02-22 17:13:13,848][00749] Num frames 4700...
[2023-02-22 17:13:13,971][00749] Num frames 4800...
[2023-02-22 17:13:14,086][00749] Num frames 4900...
[2023-02-22 17:13:14,211][00749] Num frames 5000...
[2023-02-22 17:13:14,328][00749] Num frames 5100...
[2023-02-22 17:13:14,441][00749] Num frames 5200...
[2023-02-22 17:13:14,563][00749] Num frames 5300...
[2023-02-22 17:13:14,687][00749] Num frames 5400...
[2023-02-22 17:13:14,807][00749] Num frames 5500...
[2023-02-22 17:13:14,934][00749] Num frames 5600...
[2023-02-22 17:13:15,050][00749] Num frames 5700...
[2023-02-22 17:13:15,173][00749] Num frames 5800...
[2023-02-22 17:13:15,293][00749] Num frames 5900...
[2023-02-22 17:13:15,417][00749] Num frames 6000...
[2023-02-22 17:13:15,543][00749] Num frames 6100...
[2023-02-22 17:13:15,664][00749] Num frames 6200...
[2023-02-22 17:13:15,785][00749] Num frames 6300...
[2023-02-22 17:13:15,908][00749] Num frames 6400...
[2023-02-22 17:13:16,028][00749] Num frames 6500...
[2023-02-22 17:13:16,103][00749] Avg episode rewards: #0: 42.539, true rewards: #0: 16.290
[2023-02-22 17:13:16,105][00749] Avg episode reward: 42.539, avg true_objective: 16.290
[2023-02-22 17:13:16,205][00749] Num frames 6600...
[2023-02-22 17:13:16,324][00749] Num frames 6700...
[2023-02-22 17:13:16,440][00749] Num frames 6800...
[2023-02-22 17:13:16,553][00749] Num frames 6900...
[2023-02-22 17:13:16,666][00749] Num frames 7000...
[2023-02-22 17:13:16,781][00749] Num frames 7100...
[2023-02-22 17:13:16,895][00749] Num frames 7200...
[2023-02-22 17:13:17,049][00749] Avg episode rewards: #0: 37.567, true rewards: #0: 14.568
[2023-02-22 17:13:17,051][00749] Avg episode reward: 37.567, avg true_objective: 14.568
[2023-02-22 17:13:17,074][00749] Num frames 7300...
[2023-02-22 17:13:17,191][00749] Num frames 7400...
[2023-02-22 17:13:17,312][00749] Num frames 7500...
[2023-02-22 17:13:17,430][00749] Num frames 7600...
[2023-02-22 17:13:17,545][00749] Num frames 7700...
[2023-02-22 17:13:17,668][00749] Num frames 7800...
[2023-02-22 17:13:17,791][00749] Num frames 7900...
[2023-02-22 17:13:17,915][00749] Num frames 8000...
[2023-02-22 17:13:18,035][00749] Num frames 8100...
[2023-02-22 17:13:18,149][00749] Avg episode rewards: #0: 34.746, true rewards: #0: 13.580
[2023-02-22 17:13:18,151][00749] Avg episode reward: 34.746, avg true_objective: 13.580
[2023-02-22 17:13:18,218][00749] Num frames 8200...
[2023-02-22 17:13:18,337][00749] Num frames 8300...
[2023-02-22 17:13:18,459][00749] Num frames 8400...
[2023-02-22 17:13:18,591][00749] Avg episode rewards: #0: 30.525, true rewards: #0: 12.097
[2023-02-22 17:13:18,593][00749] Avg episode reward: 30.525, avg true_objective: 12.097
[2023-02-22 17:13:18,634][00749] Num frames 8500...
[2023-02-22 17:13:18,760][00749] Num frames 8600...
[2023-02-22 17:13:18,881][00749] Num frames 8700...
[2023-02-22 17:13:18,997][00749] Num frames 8800...
[2023-02-22 17:13:19,116][00749] Num frames 8900...
[2023-02-22 17:13:19,232][00749] Num frames 9000...
[2023-02-22 17:13:19,352][00749] Num frames 9100...
[2023-02-22 17:13:19,476][00749] Num frames 9200...
[2023-02-22 17:13:19,595][00749] Num frames 9300...
[2023-02-22 17:13:19,719][00749] Num frames 9400...
[2023-02-22 17:13:19,841][00749] Num frames 9500...
[2023-02-22 17:13:19,960][00749] Num frames 9600...
[2023-02-22 17:13:20,088][00749] Num frames 9700...
[2023-02-22 17:13:20,208][00749] Num frames 9800...
[2023-02-22 17:13:20,331][00749] Num frames 9900...
[2023-02-22 17:13:20,474][00749] Avg episode rewards: #0: 30.965, true rewards: #0: 12.465
[2023-02-22 17:13:20,476][00749] Avg episode reward: 30.965, avg true_objective: 12.465
[2023-02-22 17:13:20,511][00749] Num frames 10000...
[2023-02-22 17:13:20,627][00749] Num frames 10100...
[2023-02-22 17:13:20,747][00749] Num frames 10200...
[2023-02-22 17:13:20,861][00749] Num frames 10300...
[2023-02-22 17:13:20,974][00749] Num frames 10400...
[2023-02-22 17:13:21,093][00749] Num frames 10500...
[2023-02-22 17:13:21,214][00749] Num frames 10600...
[2023-02-22 17:13:21,336][00749] Num frames 10700...
[2023-02-22 17:13:21,456][00749] Num frames 10800...
[2023-02-22 17:13:21,581][00749] Num frames 10900...
[2023-02-22 17:13:21,704][00749] Num frames 11000...
[2023-02-22 17:13:21,827][00749] Num frames 11100...
[2023-02-22 17:13:21,911][00749] Avg episode rewards: #0: 30.582, true rewards: #0: 12.360
[2023-02-22 17:13:21,913][00749] Avg episode reward: 30.582, avg true_objective: 12.360
[2023-02-22 17:13:22,010][00749] Num frames 11200...
[2023-02-22 17:13:22,133][00749] Num frames 11300...
[2023-02-22 17:13:22,247][00749] Num frames 11400...
[2023-02-22 17:13:22,364][00749] Num frames 11500...
[2023-02-22 17:13:22,494][00749] Num frames 11600...
[2023-02-22 17:13:22,621][00749] Num frames 11700...
[2023-02-22 17:13:22,745][00749] Num frames 11800...
[2023-02-22 17:13:22,868][00749] Num frames 11900...
[2023-02-22 17:13:22,993][00749] Num frames 12000...
[2023-02-22 17:13:23,118][00749] Num frames 12100...
[2023-02-22 17:13:23,240][00749] Num frames 12200...
[2023-02-22 17:13:23,371][00749] Num frames 12300...
[2023-02-22 17:13:23,501][00749] Num frames 12400...
[2023-02-22 17:13:23,629][00749] Num frames 12500...
[2023-02-22 17:13:23,809][00749] Avg episode rewards: #0: 31.395, true rewards: #0: 12.595
[2023-02-22 17:13:23,811][00749] Avg episode reward: 31.395, avg true_objective: 12.595
[2023-02-22 17:13:53,927][00749] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-22 17:14:49,518][00749] Environment doom_basic already registered, overwriting...
[2023-02-22 17:14:49,520][00749] Environment doom_two_colors_easy already registered, overwriting...
[2023-02-22 17:14:49,522][00749] Environment doom_two_colors_hard already registered, overwriting...
[2023-02-22 17:14:49,524][00749] Environment doom_dm already registered, overwriting...
[2023-02-22 17:14:49,525][00749] Environment doom_dwango5 already registered, overwriting...
[2023-02-22 17:14:49,527][00749] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-02-22 17:14:49,528][00749] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-02-22 17:14:49,531][00749] Environment doom_my_way_home already registered, overwriting...
[2023-02-22 17:14:49,532][00749] Environment doom_deadly_corridor already registered, overwriting...
[2023-02-22 17:14:49,533][00749] Environment doom_defend_the_center already registered, overwriting...
[2023-02-22 17:14:49,534][00749] Environment doom_defend_the_line already registered, overwriting...
[2023-02-22 17:14:49,537][00749] Environment doom_health_gathering already registered, overwriting...
[2023-02-22 17:14:49,539][00749] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-02-22 17:14:49,541][00749] Environment doom_battle already registered, overwriting...
[2023-02-22 17:14:49,542][00749] Environment doom_battle2 already registered, overwriting... [2023-02-22 17:14:49,544][00749] Environment doom_duel_bots already registered, overwriting... [2023-02-22 17:14:49,546][00749] Environment doom_deathmatch_bots already registered, overwriting... [2023-02-22 17:14:49,547][00749] Environment doom_duel already registered, overwriting... [2023-02-22 17:14:49,548][00749] Environment doom_deathmatch_full already registered, overwriting... [2023-02-22 17:14:49,550][00749] Environment doom_benchmark already registered, overwriting... [2023-02-22 17:14:49,552][00749] register_encoder_factory: [2023-02-22 17:14:49,563][00749] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-22 17:14:49,564][00749] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line [2023-02-22 17:14:49,570][00749] Experiment dir /content/train_dir/default_experiment already exists! [2023-02-22 17:14:49,571][00749] Resuming existing experiment from /content/train_dir/default_experiment... [2023-02-22 17:14:49,572][00749] Weights and Biases integration disabled [2023-02-22 17:14:49,578][00749] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-02-22 17:14:51,165][00749] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=10000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF 
wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2023-02-22 17:14:51,168][00749] Saving configuration to /content/train_dir/default_experiment/config.json... [2023-02-22 17:14:51,171][00749] Rollout worker 0 uses device cpu [2023-02-22 17:14:51,173][00749] Rollout worker 1 uses device cpu [2023-02-22 17:14:51,174][00749] Rollout worker 2 uses device cpu [2023-02-22 17:14:51,176][00749] Rollout worker 3 uses device cpu [2023-02-22 17:14:51,177][00749] Rollout worker 4 uses device cpu [2023-02-22 17:14:51,179][00749] Rollout worker 5 uses device cpu [2023-02-22 17:14:51,180][00749] Rollout worker 6 uses device cpu [2023-02-22 17:14:51,182][00749] Rollout worker 7 uses device cpu [2023-02-22 17:14:51,224][00749] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 17:14:51,226][00749] InferenceWorker_p0-w0: min num requests: 2 [2023-02-22 17:14:51,257][00749] Starting all processes... [2023-02-22 17:14:51,258][00749] Starting process learner_proc0 [2023-02-22 17:14:51,396][00749] Starting all processes... [2023-02-22 17:14:51,402][00749] Starting process inference_proc0-0 [2023-02-22 17:14:51,403][00749] Starting process rollout_proc0 [2023-02-22 17:14:51,404][00749] Starting process rollout_proc1 [2023-02-22 17:14:51,405][00749] Starting process rollout_proc2 [2023-02-22 17:14:51,406][00749] Starting process rollout_proc3 [2023-02-22 17:14:51,407][00749] Starting process rollout_proc4 [2023-02-22 17:14:51,409][00749] Starting process rollout_proc5 [2023-02-22 17:14:51,412][00749] Starting process rollout_proc6 [2023-02-22 17:14:51,413][00749] Starting process rollout_proc7 [2023-02-22 17:14:53,642][12651] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 17:14:53,642][12651] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-02-22 17:14:53,655][12651] Num visible devices: 1 [2023-02-22 17:14:53,681][12651] Starting seed is not provided [2023-02-22 17:14:53,681][12651] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 17:14:53,681][12651] Initializing actor-critic model on device cuda:0 [2023-02-22 17:14:53,682][12651] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 17:14:53,683][12651] RunningMeanStd input shape: (1,) [2023-02-22 17:14:53,708][12651] ConvEncoder: input_channels=3 [2023-02-22 17:14:53,909][12651] Conv encoder output size: 512 [2023-02-22 17:14:53,910][12651] Policy head output size: 512 [2023-02-22 17:14:53,940][12651] Created Actor Critic model with architecture: [2023-02-22 17:14:53,940][12651] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) 
(encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-02-22 17:14:53,976][12666] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:53,987][12669] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:53,995][12667] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 17:14:53,995][12667] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-02-22 17:14:54,006][12665] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:54,018][12667] Num visible devices: 1 [2023-02-22 17:14:54,029][12688] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:54,080][12676] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:54,325][12691] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:54,349][12690] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:54,368][12694] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2023-02-22 17:14:56,948][12651] Using optimizer [2023-02-22 17:14:56,948][12651] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2023-02-22 17:14:56,976][12651] Loading model from checkpoint [2023-02-22 17:14:56,981][12651] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2023-02-22 17:14:56,981][12651] Initialized policy 0 weights for model version 978 [2023-02-22 17:14:56,983][12651] LearnerWorker_p0 finished initialization! [2023-02-22 17:14:56,983][12651] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-02-22 17:14:57,093][12667] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 17:14:57,094][12667] RunningMeanStd input shape: (1,) [2023-02-22 17:14:57,108][12667] ConvEncoder: input_channels=3 [2023-02-22 17:14:57,214][12667] Conv encoder output size: 512 [2023-02-22 17:14:57,215][12667] Policy head output size: 512 [2023-02-22 17:14:59,577][00749] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 17:15:00,036][00749] Inference worker 0-0 is ready! [2023-02-22 17:15:00,038][00749] All inference workers are ready! Signal rollout workers to start! 
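A note on the resume above: checkpoints in this log are named checkpoint_<train_step>_<env_steps>.pth, and with this run's batch_size=1024 and env_frameskip=4 (both visible in the config dump), each policy version corresponds to 1024 * 4 = 4096 environment steps, which is why train_step=978 pairs with env_steps=978 * 4096 = 4005888. A minimal sketch that checks this relationship against the naming convention visible in this log (plain Python, not a Sample Factory API):

    import re

    def parse_checkpoint(path):
        # Names follow checkpoint_<train_step>_<env_steps>.pth, as logged above.
        m = re.search(r"checkpoint_(\d+)_(\d+)\.pth", path)
        return int(m.group(1)), int(m.group(2))

    step, env_steps = parse_checkpoint("checkpoint_p0/checkpoint_000000978_4005888.pth")
    # Assumes batch_size=1024 and env_frameskip=4, taken from the config dump above.
    assert env_steps == step * 1024 * 4  # 978 * 4096 == 4005888

The same arithmetic holds for every checkpoint later in this log (1438 * 4096 = 5890048, 1968 * 4096 = 8060928, 2443 * 4096 = 10006528).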
[2023-02-22 17:15:00,058][12688] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,060][12694] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,063][12691] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,064][12665] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,064][12666] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,064][12676] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,064][12669] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,065][12690] Doom resolution: 160x120, resize resolution: (128, 72) [2023-02-22 17:15:00,369][12688] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,369][12694] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,385][12690] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,386][12676] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,387][12666] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,387][12691] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,641][12694] Decorrelating experience for 32 frames... [2023-02-22 17:15:00,644][12688] Decorrelating experience for 32 frames... [2023-02-22 17:15:00,677][12676] Decorrelating experience for 32 frames... [2023-02-22 17:15:00,677][12691] Decorrelating experience for 32 frames... [2023-02-22 17:15:00,679][12665] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,683][12690] Decorrelating experience for 32 frames... [2023-02-22 17:15:00,909][12669] Decorrelating experience for 0 frames... [2023-02-22 17:15:00,965][12666] Decorrelating experience for 32 frames... [2023-02-22 17:15:00,972][12694] Decorrelating experience for 64 frames... [2023-02-22 17:15:00,974][12688] Decorrelating experience for 64 frames... [2023-02-22 17:15:01,006][12676] Decorrelating experience for 64 frames... [2023-02-22 17:15:01,015][12690] Decorrelating experience for 64 frames... [2023-02-22 17:15:01,193][12665] Decorrelating experience for 32 frames... [2023-02-22 17:15:01,243][12691] Decorrelating experience for 64 frames... [2023-02-22 17:15:01,331][12690] Decorrelating experience for 96 frames... [2023-02-22 17:15:01,333][12694] Decorrelating experience for 96 frames... [2023-02-22 17:15:01,342][12669] Decorrelating experience for 32 frames... [2023-02-22 17:15:01,433][12688] Decorrelating experience for 96 frames... [2023-02-22 17:15:01,439][12666] Decorrelating experience for 64 frames... [2023-02-22 17:15:01,499][12676] Decorrelating experience for 96 frames... [2023-02-22 17:15:01,641][12665] Decorrelating experience for 64 frames... [2023-02-22 17:15:01,697][12691] Decorrelating experience for 96 frames... [2023-02-22 17:15:01,750][12666] Decorrelating experience for 96 frames... [2023-02-22 17:15:01,928][12665] Decorrelating experience for 96 frames... [2023-02-22 17:15:01,928][12669] Decorrelating experience for 64 frames... [2023-02-22 17:15:02,213][12669] Decorrelating experience for 96 frames... [2023-02-22 17:15:03,441][12651] Signal inference workers to stop experience collection... [2023-02-22 17:15:03,446][12667] InferenceWorker_p0-w0: stopping experience collection [2023-02-22 17:15:04,577][00749] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 119.6. Samples: 598. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-02-22 17:15:04,579][00749] Avg episode reward: [(0, '3.663')] [2023-02-22 17:15:05,446][12651] Signal inference workers to resume experience collection... [2023-02-22 17:15:05,447][12667] InferenceWorker_p0-w0: resuming experience collection [2023-02-22 17:15:08,087][12667] Updated weights for policy 0, policy_version 988 (0.0392) [2023-02-22 17:15:09,577][00749] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6553.6). Total num frames: 4071424. Throughput: 0: 1483.8. Samples: 14838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2023-02-22 17:15:09,581][00749] Avg episode reward: [(0, '21.836')] [2023-02-22 17:15:10,395][12667] Updated weights for policy 0, policy_version 998 (0.0012) [2023-02-22 17:15:11,217][00749] Heartbeat connected on Batcher_0 [2023-02-22 17:15:11,220][00749] Heartbeat connected on LearnerWorker_p0 [2023-02-22 17:15:11,228][00749] Heartbeat connected on InferenceWorker_p0-w0 [2023-02-22 17:15:11,237][00749] Heartbeat connected on RolloutWorker_w1 [2023-02-22 17:15:11,240][00749] Heartbeat connected on RolloutWorker_w0 [2023-02-22 17:15:11,242][00749] Heartbeat connected on RolloutWorker_w2 [2023-02-22 17:15:11,246][00749] Heartbeat connected on RolloutWorker_w4 [2023-02-22 17:15:11,249][00749] Heartbeat connected on RolloutWorker_w3 [2023-02-22 17:15:11,251][00749] Heartbeat connected on RolloutWorker_w5 [2023-02-22 17:15:11,259][00749] Heartbeat connected on RolloutWorker_w6 [2023-02-22 17:15:11,264][00749] Heartbeat connected on RolloutWorker_w7 [2023-02-22 17:15:12,626][12667] Updated weights for policy 0, policy_version 1008 (0.0012) [2023-02-22 17:15:14,577][00749] Fps is (10 sec: 15974.3, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 4165632. Throughput: 0: 1898.7. Samples: 28480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 17:15:14,580][00749] Avg episode reward: [(0, '28.559')] [2023-02-22 17:15:14,583][12651] Saving new best policy, reward=28.559! [2023-02-22 17:15:14,792][12667] Updated weights for policy 0, policy_version 1018 (0.0012) [2023-02-22 17:15:17,011][12667] Updated weights for policy 0, policy_version 1028 (0.0012) [2023-02-22 17:15:19,265][12667] Updated weights for policy 0, policy_version 1038 (0.0011) [2023-02-22 17:15:19,577][00749] Fps is (10 sec: 18432.1, 60 sec: 12492.8, 300 sec: 12492.8). Total num frames: 4255744. Throughput: 0: 2842.5. Samples: 56850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:15:19,580][00749] Avg episode reward: [(0, '25.155')] [2023-02-22 17:15:21,552][12667] Updated weights for policy 0, policy_version 1048 (0.0012) [2023-02-22 17:15:23,856][12667] Updated weights for policy 0, policy_version 1058 (0.0011) [2023-02-22 17:15:24,577][00749] Fps is (10 sec: 18022.2, 60 sec: 13598.7, 300 sec: 13598.7). Total num frames: 4345856. Throughput: 0: 3341.7. Samples: 83542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:15:24,580][00749] Avg episode reward: [(0, '25.223')] [2023-02-22 17:15:26,188][12667] Updated weights for policy 0, policy_version 1068 (0.0012) [2023-02-22 17:15:28,404][12667] Updated weights for policy 0, policy_version 1078 (0.0012) [2023-02-22 17:15:29,577][00749] Fps is (10 sec: 18022.3, 60 sec: 14336.0, 300 sec: 14336.0). Total num frames: 4435968. Throughput: 0: 3227.7. Samples: 96830. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:15:29,579][00749] Avg episode reward: [(0, '24.037')] [2023-02-22 17:15:30,564][12667] Updated weights for policy 0, policy_version 1088 (0.0011) [2023-02-22 17:15:32,720][12667] Updated weights for policy 0, policy_version 1098 (0.0011) [2023-02-22 17:15:34,577][00749] Fps is (10 sec: 18432.2, 60 sec: 14979.7, 300 sec: 14979.7). Total num frames: 4530176. Throughput: 0: 3579.1. Samples: 125268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:15:34,580][00749] Avg episode reward: [(0, '23.371')] [2023-02-22 17:15:34,959][12667] Updated weights for policy 0, policy_version 1108 (0.0011) [2023-02-22 17:15:37,190][12667] Updated weights for policy 0, policy_version 1118 (0.0012) [2023-02-22 17:15:39,577][00749] Fps is (10 sec: 18022.5, 60 sec: 15257.6, 300 sec: 15257.6). Total num frames: 4616192. Throughput: 0: 3801.4. Samples: 152056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:15:39,579][00749] Avg episode reward: [(0, '24.356')] [2023-02-22 17:15:39,615][12667] Updated weights for policy 0, policy_version 1128 (0.0013) [2023-02-22 17:15:41,987][12667] Updated weights for policy 0, policy_version 1138 (0.0012) [2023-02-22 17:15:44,372][12667] Updated weights for policy 0, policy_version 1148 (0.0012) [2023-02-22 17:15:44,577][00749] Fps is (10 sec: 17612.8, 60 sec: 15564.8, 300 sec: 15564.8). Total num frames: 4706304. Throughput: 0: 3665.6. Samples: 164950. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-22 17:15:44,579][00749] Avg episode reward: [(0, '27.289')] [2023-02-22 17:15:46,521][12667] Updated weights for policy 0, policy_version 1158 (0.0011) [2023-02-22 17:15:48,703][12667] Updated weights for policy 0, policy_version 1168 (0.0012) [2023-02-22 17:15:49,577][00749] Fps is (10 sec: 18431.8, 60 sec: 15892.4, 300 sec: 15892.4). Total num frames: 4800512. Throughput: 0: 4261.9. Samples: 192386. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-22 17:15:49,579][00749] Avg episode reward: [(0, '26.497')] [2023-02-22 17:15:50,855][12667] Updated weights for policy 0, policy_version 1178 (0.0012) [2023-02-22 17:15:52,975][12667] Updated weights for policy 0, policy_version 1188 (0.0012) [2023-02-22 17:15:54,577][00749] Fps is (10 sec: 18841.5, 60 sec: 16160.6, 300 sec: 16160.6). Total num frames: 4894720. Throughput: 0: 4571.6. Samples: 220560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 17:15:54,580][00749] Avg episode reward: [(0, '26.735')] [2023-02-22 17:15:55,233][12667] Updated weights for policy 0, policy_version 1198 (0.0012) [2023-02-22 17:15:57,559][12667] Updated weights for policy 0, policy_version 1208 (0.0012) [2023-02-22 17:15:59,577][00749] Fps is (10 sec: 18022.1, 60 sec: 16247.4, 300 sec: 16247.4). Total num frames: 4980736. Throughput: 0: 4567.4. Samples: 234014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:15:59,582][00749] Avg episode reward: [(0, '27.523')] [2023-02-22 17:15:59,839][12667] Updated weights for policy 0, policy_version 1218 (0.0011) [2023-02-22 17:16:02,023][12667] Updated weights for policy 0, policy_version 1228 (0.0011) [2023-02-22 17:16:04,192][12667] Updated weights for policy 0, policy_version 1238 (0.0011) [2023-02-22 17:16:04,577][00749] Fps is (10 sec: 18022.5, 60 sec: 17817.6, 300 sec: 16447.0). Total num frames: 5074944. Throughput: 0: 4549.5. Samples: 261578. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 17:16:04,580][00749] Avg episode reward: [(0, '29.141')] [2023-02-22 17:16:04,609][12651] Saving new best policy, reward=29.141! [2023-02-22 17:16:06,339][12667] Updated weights for policy 0, policy_version 1248 (0.0011) [2023-02-22 17:16:08,448][12667] Updated weights for policy 0, policy_version 1258 (0.0012) [2023-02-22 17:16:09,577][00749] Fps is (10 sec: 19251.7, 60 sec: 18363.7, 300 sec: 16676.6). Total num frames: 5173248. Throughput: 0: 4595.4. Samples: 290334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:16:09,580][00749] Avg episode reward: [(0, '26.210')] [2023-02-22 17:16:10,603][12667] Updated weights for policy 0, policy_version 1268 (0.0011) [2023-02-22 17:16:12,880][12667] Updated weights for policy 0, policy_version 1278 (0.0011) [2023-02-22 17:16:14,577][00749] Fps is (10 sec: 18841.6, 60 sec: 18295.5, 300 sec: 16766.3). Total num frames: 5263360. Throughput: 0: 4606.2. Samples: 304110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:16:14,580][00749] Avg episode reward: [(0, '23.567')] [2023-02-22 17:16:15,161][12667] Updated weights for policy 0, policy_version 1288 (0.0012) [2023-02-22 17:16:17,450][12667] Updated weights for policy 0, policy_version 1298 (0.0012) [2023-02-22 17:16:19,577][00749] Fps is (10 sec: 18022.4, 60 sec: 18295.5, 300 sec: 16844.8). Total num frames: 5353472. Throughput: 0: 4579.4. Samples: 331340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:16:19,580][00749] Avg episode reward: [(0, '25.683')] [2023-02-22 17:16:19,606][12667] Updated weights for policy 0, policy_version 1308 (0.0011) [2023-02-22 17:16:21,751][12667] Updated weights for policy 0, policy_version 1318 (0.0011) [2023-02-22 17:16:23,927][12667] Updated weights for policy 0, policy_version 1328 (0.0012) [2023-02-22 17:16:24,577][00749] Fps is (10 sec: 18432.0, 60 sec: 18363.8, 300 sec: 16962.3). Total num frames: 5447680. Throughput: 0: 4613.5. Samples: 359662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 17:16:24,579][00749] Avg episode reward: [(0, '28.218')] [2023-02-22 17:16:26,117][12667] Updated weights for policy 0, policy_version 1338 (0.0012) [2023-02-22 17:16:28,410][12667] Updated weights for policy 0, policy_version 1348 (0.0012) [2023-02-22 17:16:29,577][00749] Fps is (10 sec: 18431.9, 60 sec: 18363.7, 300 sec: 17021.1). Total num frames: 5537792. Throughput: 0: 4636.4. Samples: 373588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 17:16:29,580][00749] Avg episode reward: [(0, '28.872')] [2023-02-22 17:16:30,863][12667] Updated weights for policy 0, policy_version 1358 (0.0012) [2023-02-22 17:16:33,273][12667] Updated weights for policy 0, policy_version 1368 (0.0012) [2023-02-22 17:16:34,577][00749] Fps is (10 sec: 17612.7, 60 sec: 18227.2, 300 sec: 17030.7). Total num frames: 5623808. Throughput: 0: 4590.6. Samples: 398962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:16:34,580][00749] Avg episode reward: [(0, '27.851')] [2023-02-22 17:16:35,530][12667] Updated weights for policy 0, policy_version 1378 (0.0012) [2023-02-22 17:16:37,738][12667] Updated weights for policy 0, policy_version 1388 (0.0011) [2023-02-22 17:16:39,577][00749] Fps is (10 sec: 18022.5, 60 sec: 18363.7, 300 sec: 17121.3). Total num frames: 5718016. Throughput: 0: 4578.7. Samples: 426600. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 17:16:39,580][00749] Avg episode reward: [(0, '28.400')] [2023-02-22 17:16:39,893][12667] Updated weights for policy 0, policy_version 1398 (0.0011) [2023-02-22 17:16:42,153][12667] Updated weights for policy 0, policy_version 1408 (0.0011) [2023-02-22 17:16:44,494][12667] Updated weights for policy 0, policy_version 1418 (0.0013) [2023-02-22 17:16:44,577][00749] Fps is (10 sec: 18432.1, 60 sec: 18363.7, 300 sec: 17164.2). Total num frames: 5808128. Throughput: 0: 4586.9. Samples: 440422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-02-22 17:16:44,579][00749] Avg episode reward: [(0, '28.873')] [2023-02-22 17:16:47,005][12667] Updated weights for policy 0, policy_version 1428 (0.0012) [2023-02-22 17:16:49,415][12667] Updated weights for policy 0, policy_version 1438 (0.0012) [2023-02-22 17:16:49,577][00749] Fps is (10 sec: 17203.2, 60 sec: 18159.0, 300 sec: 17128.7). Total num frames: 5890048. Throughput: 0: 4539.0. Samples: 465832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 17:16:49,580][00749] Avg episode reward: [(0, '27.286')] [2023-02-22 17:16:49,588][12651] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001438_5890048.pth... [2023-02-22 17:16:49,663][12651] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000799_3272704.pth [2023-02-22 17:16:51,691][12667] Updated weights for policy 0, policy_version 1448 (0.0012) [2023-02-22 17:16:53,942][12667] Updated weights for policy 0, policy_version 1458 (0.0012) [2023-02-22 17:16:54,577][00749] Fps is (10 sec: 17203.2, 60 sec: 18090.7, 300 sec: 17167.6). Total num frames: 5980160. Throughput: 0: 4497.2. Samples: 492710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-02-22 17:16:54,580][00749] Avg episode reward: [(0, '24.833')] [2023-02-22 17:16:56,179][12667] Updated weights for policy 0, policy_version 1468 (0.0012) [2023-02-22 17:16:58,380][12667] Updated weights for policy 0, policy_version 1478 (0.0012) [2023-02-22 17:16:59,577][00749] Fps is (10 sec: 18432.0, 60 sec: 18227.3, 300 sec: 17237.3). Total num frames: 6074368. Throughput: 0: 4498.0. Samples: 506522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:16:59,579][00749] Avg episode reward: [(0, '29.474')] [2023-02-22 17:16:59,588][12651] Saving new best policy, reward=29.474! [2023-02-22 17:17:00,715][12667] Updated weights for policy 0, policy_version 1488 (0.0012) [2023-02-22 17:17:03,053][12667] Updated weights for policy 0, policy_version 1498 (0.0011) [2023-02-22 17:17:04,577][00749] Fps is (10 sec: 18022.4, 60 sec: 18090.7, 300 sec: 17236.0). Total num frames: 6160384. Throughput: 0: 4486.9. Samples: 533248. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:17:04,579][00749] Avg episode reward: [(0, '31.539')] [2023-02-22 17:17:04,582][12651] Saving new best policy, reward=31.539! [2023-02-22 17:17:05,441][12667] Updated weights for policy 0, policy_version 1508 (0.0012) [2023-02-22 17:17:07,768][12667] Updated weights for policy 0, policy_version 1518 (0.0012) [2023-02-22 17:17:09,577][00749] Fps is (10 sec: 17203.2, 60 sec: 17885.9, 300 sec: 17234.7). Total num frames: 6246400. Throughput: 0: 4440.7. Samples: 559494. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-22 17:17:09,579][00749] Avg episode reward: [(0, '28.257')] [2023-02-22 17:17:10,043][12667] Updated weights for policy 0, policy_version 1528 (0.0012) [2023-02-22 17:17:12,327][12667] Updated weights for policy 0, policy_version 1538 (0.0012) [2023-02-22 17:17:14,547][12667] Updated weights for policy 0, policy_version 1548 (0.0012) [2023-02-22 17:17:14,577][00749] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17294.2). Total num frames: 6340608. Throughput: 0: 4432.2. Samples: 573036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:17:14,580][00749] Avg episode reward: [(0, '27.934')] [2023-02-22 17:17:16,913][12667] Updated weights for policy 0, policy_version 1558 (0.0012) [2023-02-22 17:17:19,289][12667] Updated weights for policy 0, policy_version 1568 (0.0013) [2023-02-22 17:17:19,577][00749] Fps is (10 sec: 18022.4, 60 sec: 17885.9, 300 sec: 17291.0). Total num frames: 6426624. Throughput: 0: 4459.9. Samples: 599658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:17:19,580][00749] Avg episode reward: [(0, '27.044')] [2023-02-22 17:17:21,693][12667] Updated weights for policy 0, policy_version 1578 (0.0012) [2023-02-22 17:17:23,883][12667] Updated weights for policy 0, policy_version 1588 (0.0012) [2023-02-22 17:17:24,577][00749] Fps is (10 sec: 17612.9, 60 sec: 17817.6, 300 sec: 17316.2). Total num frames: 6516736. Throughput: 0: 4437.6. Samples: 626290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:17:24,580][00749] Avg episode reward: [(0, '26.844')] [2023-02-22 17:17:26,022][12667] Updated weights for policy 0, policy_version 1598 (0.0011) [2023-02-22 17:17:28,146][12667] Updated weights for policy 0, policy_version 1608 (0.0011) [2023-02-22 17:17:29,577][00749] Fps is (10 sec: 18432.0, 60 sec: 17885.9, 300 sec: 17367.0). Total num frames: 6610944. Throughput: 0: 4448.6. Samples: 640610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:17:29,580][00749] Avg episode reward: [(0, '29.013')] [2023-02-22 17:17:30,300][12667] Updated weights for policy 0, policy_version 1618 (0.0012) [2023-02-22 17:17:32,493][12667] Updated weights for policy 0, policy_version 1628 (0.0012) [2023-02-22 17:17:34,577][00749] Fps is (10 sec: 18841.6, 60 sec: 18022.4, 300 sec: 17414.6). Total num frames: 6705152. Throughput: 0: 4508.9. Samples: 668732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:17:34,580][00749] Avg episode reward: [(0, '29.995')] [2023-02-22 17:17:34,785][12667] Updated weights for policy 0, policy_version 1638 (0.0012) [2023-02-22 17:17:37,086][12667] Updated weights for policy 0, policy_version 1648 (0.0012) [2023-02-22 17:17:39,329][12667] Updated weights for policy 0, policy_version 1658 (0.0012) [2023-02-22 17:17:39,577][00749] Fps is (10 sec: 18432.0, 60 sec: 17954.1, 300 sec: 17433.6). Total num frames: 6795264. Throughput: 0: 4513.4. Samples: 695812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:17:39,579][00749] Avg episode reward: [(0, '27.739')] [2023-02-22 17:17:41,562][12667] Updated weights for policy 0, policy_version 1668 (0.0012) [2023-02-22 17:17:43,706][12667] Updated weights for policy 0, policy_version 1678 (0.0012) [2023-02-22 17:17:44,577][00749] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17451.4). Total num frames: 6885376. Throughput: 0: 4512.4. Samples: 709582. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-02-22 17:17:44,580][00749] Avg episode reward: [(0, '28.337')] [2023-02-22 17:17:45,988][12667] Updated weights for policy 0, policy_version 1688 (0.0012) [2023-02-22 17:17:48,333][12667] Updated weights for policy 0, policy_version 1698 (0.0012) [2023-02-22 17:17:49,577][00749] Fps is (10 sec: 17612.7, 60 sec: 18022.4, 300 sec: 17444.1). Total num frames: 6971392. Throughput: 0: 4523.0. Samples: 736784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:17:49,579][00749] Avg episode reward: [(0, '28.114')] [2023-02-22 17:17:50,843][12667] Updated weights for policy 0, policy_version 1708 (0.0011) [2023-02-22 17:17:53,176][12667] Updated weights for policy 0, policy_version 1718 (0.0011) [2023-02-22 17:17:54,577][00749] Fps is (10 sec: 17203.2, 60 sec: 17954.1, 300 sec: 17437.3). Total num frames: 7057408. Throughput: 0: 4509.1. Samples: 762402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:17:54,580][00749] Avg episode reward: [(0, '32.414')] [2023-02-22 17:17:54,583][12651] Saving new best policy, reward=32.414! [2023-02-22 17:17:55,501][12667] Updated weights for policy 0, policy_version 1728 (0.0012) [2023-02-22 17:17:57,701][12667] Updated weights for policy 0, policy_version 1738 (0.0012) [2023-02-22 17:17:59,577][00749] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17476.3). Total num frames: 7151616. Throughput: 0: 4507.6. Samples: 775880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 17:17:59,580][00749] Avg episode reward: [(0, '28.919')] [2023-02-22 17:17:59,845][12667] Updated weights for policy 0, policy_version 1748 (0.0011) [2023-02-22 17:18:02,052][12667] Updated weights for policy 0, policy_version 1758 (0.0011) [2023-02-22 17:18:04,246][12667] Updated weights for policy 0, policy_version 1768 (0.0012) [2023-02-22 17:18:04,577][00749] Fps is (10 sec: 18841.6, 60 sec: 18090.7, 300 sec: 17513.2). Total num frames: 7245824. Throughput: 0: 4544.5. Samples: 804160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:04,580][00749] Avg episode reward: [(0, '28.644')] [2023-02-22 17:18:06,555][12667] Updated weights for policy 0, policy_version 1778 (0.0011) [2023-02-22 17:18:08,850][12667] Updated weights for policy 0, policy_version 1788 (0.0012) [2023-02-22 17:18:09,577][00749] Fps is (10 sec: 18432.0, 60 sec: 18158.9, 300 sec: 17526.6). Total num frames: 7335936. Throughput: 0: 4552.0. Samples: 831130. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:09,579][00749] Avg episode reward: [(0, '30.284')] [2023-02-22 17:18:11,187][12667] Updated weights for policy 0, policy_version 1798 (0.0012) [2023-02-22 17:18:13,364][12667] Updated weights for policy 0, policy_version 1808 (0.0011) [2023-02-22 17:18:14,577][00749] Fps is (10 sec: 18022.3, 60 sec: 18090.7, 300 sec: 17539.3). Total num frames: 7426048. Throughput: 0: 4531.5. Samples: 844526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:14,580][00749] Avg episode reward: [(0, '29.378')] [2023-02-22 17:18:15,621][12667] Updated weights for policy 0, policy_version 1818 (0.0011) [2023-02-22 17:18:17,847][12667] Updated weights for policy 0, policy_version 1828 (0.0012) [2023-02-22 17:18:19,577][00749] Fps is (10 sec: 18022.4, 60 sec: 18158.9, 300 sec: 17551.4). Total num frames: 7516160. Throughput: 0: 4519.9. Samples: 872128. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:18:19,579][00749] Avg episode reward: [(0, '29.589')] [2023-02-22 17:18:20,064][12667] Updated weights for policy 0, policy_version 1838 (0.0011) [2023-02-22 17:18:22,368][12667] Updated weights for policy 0, policy_version 1848 (0.0012) [2023-02-22 17:18:24,577][00749] Fps is (10 sec: 18022.4, 60 sec: 18158.9, 300 sec: 17562.8). Total num frames: 7606272. Throughput: 0: 4520.6. Samples: 899238. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:24,579][00749] Avg episode reward: [(0, '26.927')] [2023-02-22 17:18:24,673][12667] Updated weights for policy 0, policy_version 1858 (0.0012) [2023-02-22 17:18:27,058][12667] Updated weights for policy 0, policy_version 1868 (0.0013) [2023-02-22 17:18:29,269][12667] Updated weights for policy 0, policy_version 1878 (0.0012) [2023-02-22 17:18:29,577][00749] Fps is (10 sec: 18022.6, 60 sec: 18090.7, 300 sec: 17573.8). Total num frames: 7696384. Throughput: 0: 4499.2. Samples: 912046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:29,580][00749] Avg episode reward: [(0, '29.100')] [2023-02-22 17:18:31,440][12667] Updated weights for policy 0, policy_version 1888 (0.0011) [2023-02-22 17:18:33,641][12667] Updated weights for policy 0, policy_version 1898 (0.0012) [2023-02-22 17:18:34,577][00749] Fps is (10 sec: 18432.1, 60 sec: 18090.7, 300 sec: 17603.3). Total num frames: 7790592. Throughput: 0: 4519.6. Samples: 940164. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:34,580][00749] Avg episode reward: [(0, '31.288')] [2023-02-22 17:18:35,884][12667] Updated weights for policy 0, policy_version 1908 (0.0011) [2023-02-22 17:18:38,093][12667] Updated weights for policy 0, policy_version 1918 (0.0011) [2023-02-22 17:18:39,577][00749] Fps is (10 sec: 18431.8, 60 sec: 18090.7, 300 sec: 17612.8). Total num frames: 7880704. Throughput: 0: 4557.6. Samples: 967494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:39,579][00749] Avg episode reward: [(0, '32.494')] [2023-02-22 17:18:39,589][12651] Saving new best policy, reward=32.494! [2023-02-22 17:18:40,430][12667] Updated weights for policy 0, policy_version 1928 (0.0012) [2023-02-22 17:18:42,813][12667] Updated weights for policy 0, policy_version 1938 (0.0011) [2023-02-22 17:18:44,577][00749] Fps is (10 sec: 17612.7, 60 sec: 18022.4, 300 sec: 17603.7). Total num frames: 7966720. Throughput: 0: 4548.8. Samples: 980576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:44,579][00749] Avg episode reward: [(0, '29.406')] [2023-02-22 17:18:45,116][12667] Updated weights for policy 0, policy_version 1948 (0.0011) [2023-02-22 17:18:47,365][12667] Updated weights for policy 0, policy_version 1958 (0.0012) [2023-02-22 17:18:49,539][12667] Updated weights for policy 0, policy_version 1968 (0.0011) [2023-02-22 17:18:49,577][00749] Fps is (10 sec: 18022.5, 60 sec: 18158.9, 300 sec: 17630.6). Total num frames: 8060928. Throughput: 0: 4516.9. Samples: 1007422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-02-22 17:18:49,580][00749] Avg episode reward: [(0, '27.639')] [2023-02-22 17:18:49,590][12651] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001968_8060928.pth...
[2023-02-22 17:18:49,659][12651] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2023-02-22 17:18:51,771][12667] Updated weights for policy 0, policy_version 1978 (0.0011) [2023-02-22 17:18:53,890][12667] Updated weights for policy 0, policy_version 1988 (0.0011) [2023-02-22 17:18:54,577][00749] Fps is (10 sec: 18432.0, 60 sec: 18227.2, 300 sec: 17638.9). Total num frames: 8151040. Throughput: 0: 4544.4. Samples: 1035628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:18:54,580][00749] Avg episode reward: [(0, '29.411')] [2023-02-22 17:18:56,244][12667] Updated weights for policy 0, policy_version 1998 (0.0012) [2023-02-22 17:18:58,601][12667] Updated weights for policy 0, policy_version 2008 (0.0012) [2023-02-22 17:18:59,577][00749] Fps is (10 sec: 18022.4, 60 sec: 18159.0, 300 sec: 17646.9). Total num frames: 8241152. Throughput: 0: 4534.8. Samples: 1048592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:18:59,579][00749] Avg episode reward: [(0, '30.931')] [2023-02-22 17:19:00,851][12667] Updated weights for policy 0, policy_version 2018 (0.0012) [2023-02-22 17:19:03,046][12667] Updated weights for policy 0, policy_version 2028 (0.0011) [2023-02-22 17:19:04,577][00749] Fps is (10 sec: 18432.0, 60 sec: 18158.9, 300 sec: 17671.3). Total num frames: 8335360. Throughput: 0: 4532.5. Samples: 1076092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:19:04,579][00749] Avg episode reward: [(0, '30.807')] [2023-02-22 17:19:05,188][12667] Updated weights for policy 0, policy_version 2038 (0.0011) [2023-02-22 17:19:07,363][12667] Updated weights for policy 0, policy_version 2048 (0.0011) [2023-02-22 17:19:09,461][12667] Updated weights for policy 0, policy_version 2058 (0.0011) [2023-02-22 17:19:09,577][00749] Fps is (10 sec: 18841.7, 60 sec: 18227.2, 300 sec: 17694.7). Total num frames: 8429568. Throughput: 0: 4565.2. Samples: 1104670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:19:09,579][00749] Avg episode reward: [(0, '28.827')] [2023-02-22 17:19:11,724][12667] Updated weights for policy 0, policy_version 2068 (0.0012) [2023-02-22 17:19:14,033][12667] Updated weights for policy 0, policy_version 2078 (0.0012) [2023-02-22 17:19:14,577][00749] Fps is (10 sec: 18431.9, 60 sec: 18227.2, 300 sec: 17701.1). Total num frames: 8519680. Throughput: 0: 4585.8. Samples: 1118406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:19:14,580][00749] Avg episode reward: [(0, '28.564')] [2023-02-22 17:19:16,264][12667] Updated weights for policy 0, policy_version 2088 (0.0012) [2023-02-22 17:19:18,449][12667] Updated weights for policy 0, policy_version 2098 (0.0011) [2023-02-22 17:19:19,577][00749] Fps is (10 sec: 18431.9, 60 sec: 18295.5, 300 sec: 17723.1). Total num frames: 8613888. Throughput: 0: 4569.0. Samples: 1145770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:19:19,579][00749] Avg episode reward: [(0, '30.227')] [2023-02-22 17:19:20,669][12667] Updated weights for policy 0, policy_version 2108 (0.0012) [2023-02-22 17:19:22,816][12667] Updated weights for policy 0, policy_version 2118 (0.0012) [2023-02-22 17:19:24,577][00749] Fps is (10 sec: 18841.7, 60 sec: 18363.7, 300 sec: 17744.2). Total num frames: 8708096. Throughput: 0: 4586.8. Samples: 1173898. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:19:24,581][00749] Avg episode reward: [(0, '31.986')] [2023-02-22 17:19:24,965][12667] Updated weights for policy 0, policy_version 2128 (0.0011) [2023-02-22 17:19:27,183][12667] Updated weights for policy 0, policy_version 2138 (0.0012) [2023-02-22 17:19:29,462][12667] Updated weights for policy 0, policy_version 2148 (0.0012) [2023-02-22 17:19:29,577][00749] Fps is (10 sec: 18432.0, 60 sec: 18363.7, 300 sec: 17749.3). Total num frames: 8798208. Throughput: 0: 4608.6. Samples: 1187962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-02-22 17:19:29,580][00749] Avg episode reward: [(0, '28.608')] [2023-02-22 17:19:31,779][12667] Updated weights for policy 0, policy_version 2158 (0.0012) [2023-02-22 17:19:34,102][12667] Updated weights for policy 0, policy_version 2168 (0.0012) [2023-02-22 17:19:34,577][00749] Fps is (10 sec: 18022.4, 60 sec: 18295.5, 300 sec: 17754.3). Total num frames: 8888320. Throughput: 0: 4602.4. Samples: 1214530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:19:34,580][00749] Avg episode reward: [(0, '29.909')] [2023-02-22 17:19:36,277][12667] Updated weights for policy 0, policy_version 2178 (0.0012) [2023-02-22 17:19:38,406][12667] Updated weights for policy 0, policy_version 2188 (0.0011) [2023-02-22 17:19:39,577][00749] Fps is (10 sec: 18432.0, 60 sec: 18363.7, 300 sec: 17773.7). Total num frames: 8982528. Throughput: 0: 4605.9. Samples: 1242892. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:19:39,580][00749] Avg episode reward: [(0, '29.624')] [2023-02-22 17:19:40,610][12667] Updated weights for policy 0, policy_version 2198 (0.0012) [2023-02-22 17:19:42,787][12667] Updated weights for policy 0, policy_version 2208 (0.0011) [2023-02-22 17:19:44,577][00749] Fps is (10 sec: 18431.9, 60 sec: 18432.0, 300 sec: 17778.1). Total num frames: 9072640. Throughput: 0: 4626.1. Samples: 1256766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:19:44,579][00749] Avg episode reward: [(0, '33.328')] [2023-02-22 17:19:44,581][12651] Saving new best policy, reward=33.328! [2023-02-22 17:19:45,108][12667] Updated weights for policy 0, policy_version 2218 (0.0012) [2023-02-22 17:19:47,517][12667] Updated weights for policy 0, policy_version 2228 (0.0012) [2023-02-22 17:19:49,577][00749] Fps is (10 sec: 17612.8, 60 sec: 18295.5, 300 sec: 17768.2). Total num frames: 9158656. Throughput: 0: 4602.1. Samples: 1283186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:19:49,579][00749] Avg episode reward: [(0, '31.378')] [2023-02-22 17:19:49,824][12667] Updated weights for policy 0, policy_version 2238 (0.0012) [2023-02-22 17:19:52,022][12667] Updated weights for policy 0, policy_version 2248 (0.0011) [2023-02-22 17:19:54,231][12667] Updated weights for policy 0, policy_version 2258 (0.0012) [2023-02-22 17:19:54,577][00749] Fps is (10 sec: 18022.6, 60 sec: 18363.8, 300 sec: 17786.4). Total num frames: 9252864. Throughput: 0: 4578.4. Samples: 1310700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:19:54,579][00749] Avg episode reward: [(0, '30.788')] [2023-02-22 17:19:56,405][12667] Updated weights for policy 0, policy_version 2268 (0.0011) [2023-02-22 17:19:58,587][12667] Updated weights for policy 0, policy_version 2278 (0.0012) [2023-02-22 17:19:59,577][00749] Fps is (10 sec: 18841.5, 60 sec: 18432.0, 300 sec: 18105.7). Total num frames: 9347072. Throughput: 0: 4587.8. Samples: 1324858. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-22 17:19:59,581][00749] Avg episode reward: [(0, '28.207')] [2023-02-22 17:20:00,925][12667] Updated weights for policy 0, policy_version 2288 (0.0012) [2023-02-22 17:20:03,246][12667] Updated weights for policy 0, policy_version 2298 (0.0012) [2023-02-22 17:20:04,577][00749] Fps is (10 sec: 18022.3, 60 sec: 18295.5, 300 sec: 18175.1). Total num frames: 9433088. Throughput: 0: 4573.4. Samples: 1351574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:20:04,579][00749] Avg episode reward: [(0, '26.676')] [2023-02-22 17:20:05,561][12667] Updated weights for policy 0, policy_version 2308 (0.0012) [2023-02-22 17:20:07,782][12667] Updated weights for policy 0, policy_version 2318 (0.0012) [2023-02-22 17:20:09,577][00749] Fps is (10 sec: 18022.5, 60 sec: 18295.4, 300 sec: 18175.1). Total num frames: 9527296. Throughput: 0: 4556.0. Samples: 1378920. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-22 17:20:09,585][00749] Avg episode reward: [(0, '28.470')] [2023-02-22 17:20:09,991][12667] Updated weights for policy 0, policy_version 2328 (0.0012) [2023-02-22 17:20:12,136][12667] Updated weights for policy 0, policy_version 2338 (0.0011) [2023-02-22 17:20:14,275][12667] Updated weights for policy 0, policy_version 2348 (0.0012) [2023-02-22 17:20:14,577][00749] Fps is (10 sec: 18841.6, 60 sec: 18363.7, 300 sec: 18189.0). Total num frames: 9621504. Throughput: 0: 4560.5. Samples: 1393186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-22 17:20:14,579][00749] Avg episode reward: [(0, '28.858')] [2023-02-22 17:20:16,534][12667] Updated weights for policy 0, policy_version 2358 (0.0012) [2023-02-22 17:20:18,944][12667] Updated weights for policy 0, policy_version 2368 (0.0012) [2023-02-22 17:20:19,577][00749] Fps is (10 sec: 18022.5, 60 sec: 18227.2, 300 sec: 18175.1). Total num frames: 9707520. Throughput: 0: 4577.1. Samples: 1420498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:20:19,580][00749] Avg episode reward: [(0, '29.435')] [2023-02-22 17:20:21,316][12667] Updated weights for policy 0, policy_version 2378 (0.0012) [2023-02-22 17:20:23,584][12667] Updated weights for policy 0, policy_version 2388 (0.0011) [2023-02-22 17:20:24,577][00749] Fps is (10 sec: 17612.8, 60 sec: 18158.9, 300 sec: 18175.1). Total num frames: 9797632. Throughput: 0: 4529.2. Samples: 1446706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:20:24,580][00749] Avg episode reward: [(0, '29.524')] [2023-02-22 17:20:25,865][12667] Updated weights for policy 0, policy_version 2398 (0.0011) [2023-02-22 17:20:28,114][12667] Updated weights for policy 0, policy_version 2408 (0.0012) [2023-02-22 17:20:29,577][00749] Fps is (10 sec: 18022.3, 60 sec: 18158.9, 300 sec: 18161.2). Total num frames: 9887744. Throughput: 0: 4520.3. Samples: 1460178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-02-22 17:20:29,580][00749] Avg episode reward: [(0, '32.055')] [2023-02-22 17:20:30,347][12667] Updated weights for policy 0, policy_version 2418 (0.0011) [2023-02-22 17:20:32,616][12667] Updated weights for policy 0, policy_version 2428 (0.0012) [2023-02-22 17:20:34,577][00749] Fps is (10 sec: 18022.4, 60 sec: 18158.9, 300 sec: 18175.1). Total num frames: 9977856. Throughput: 0: 4540.8. Samples: 1487524. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-02-22 17:20:34,580][00749] Avg episode reward: [(0, '29.049')] [2023-02-22 17:20:35,006][12667] Updated weights for policy 0, policy_version 2438 (0.0012) [2023-02-22 17:20:36,175][00749] Component Batcher_0 stopped! [2023-02-22 17:20:36,175][12651] Stopping Batcher_0... [2023-02-22 17:20:36,176][12651] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-22 17:20:36,179][12651] Loop batcher_evt_loop terminating... [2023-02-22 17:20:36,188][12666] Stopping RolloutWorker_w0... [2023-02-22 17:20:36,189][12666] Loop rollout_proc0_evt_loop terminating... [2023-02-22 17:20:36,189][12694] Stopping RolloutWorker_w6... [2023-02-22 17:20:36,190][12694] Loop rollout_proc6_evt_loop terminating... [2023-02-22 17:20:36,190][12676] Stopping RolloutWorker_w4... [2023-02-22 17:20:36,191][12676] Loop rollout_proc4_evt_loop terminating... [2023-02-22 17:20:36,191][12667] Weights refcount: 2 0 [2023-02-22 17:20:36,189][00749] Component RolloutWorker_w0 stopped! [2023-02-22 17:20:36,192][12665] Stopping RolloutWorker_w1... [2023-02-22 17:20:36,193][12667] Stopping InferenceWorker_p0-w0... [2023-02-22 17:20:36,193][12665] Loop rollout_proc1_evt_loop terminating... [2023-02-22 17:20:36,193][12667] Loop inference_proc0-0_evt_loop terminating... [2023-02-22 17:20:36,192][00749] Component RolloutWorker_w6 stopped! [2023-02-22 17:20:36,195][12691] Stopping RolloutWorker_w7... [2023-02-22 17:20:36,196][12691] Loop rollout_proc7_evt_loop terminating... [2023-02-22 17:20:36,197][12669] Stopping RolloutWorker_w2... [2023-02-22 17:20:36,198][12669] Loop rollout_proc2_evt_loop terminating... [2023-02-22 17:20:36,195][00749] Component RolloutWorker_w4 stopped! [2023-02-22 17:20:36,201][12688] Stopping RolloutWorker_w5... [2023-02-22 17:20:36,201][12688] Loop rollout_proc5_evt_loop terminating... [2023-02-22 17:20:36,199][00749] Component RolloutWorker_w1 stopped! [2023-02-22 17:20:36,202][00749] Component InferenceWorker_p0-w0 stopped! [2023-02-22 17:20:36,209][00749] Component RolloutWorker_w7 stopped! [2023-02-22 17:20:36,212][00749] Component RolloutWorker_w2 stopped! [2023-02-22 17:20:36,214][00749] Component RolloutWorker_w5 stopped! [2023-02-22 17:20:36,220][12690] Stopping RolloutWorker_w3... [2023-02-22 17:20:36,221][12690] Loop rollout_proc3_evt_loop terminating... [2023-02-22 17:20:36,220][00749] Component RolloutWorker_w3 stopped! [2023-02-22 17:20:36,262][12651] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001438_5890048.pth [2023-02-22 17:20:36,270][12651] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-22 17:20:36,378][12651] Stopping LearnerWorker_p0... [2023-02-22 17:20:36,379][12651] Loop learner_proc0_evt_loop terminating... [2023-02-22 17:20:36,378][00749] Component LearnerWorker_p0 stopped! [2023-02-22 17:20:36,381][00749] Waiting for process learner_proc0 to stop... [2023-02-22 17:20:37,344][00749] Waiting for process inference_proc0-0 to join... [2023-02-22 17:20:37,347][00749] Waiting for process rollout_proc0 to join... [2023-02-22 17:20:37,349][00749] Waiting for process rollout_proc1 to join... [2023-02-22 17:20:37,351][00749] Waiting for process rollout_proc2 to join... [2023-02-22 17:20:37,353][00749] Waiting for process rollout_proc3 to join... [2023-02-22 17:20:37,355][00749] Waiting for process rollout_proc4 to join... [2023-02-22 17:20:37,357][00749] Waiting for process rollout_proc5 to join... 
[2023-02-22 17:20:37,359][00749] Waiting for process rollout_proc6 to join... [2023-02-22 17:20:37,360][00749] Waiting for process rollout_proc7 to join... [2023-02-22 17:20:37,362][00749] Batcher 0 profile tree view: batching: 24.2280, releasing_batches: 0.0354 [2023-02-22 17:20:37,364][00749] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 5.5906 update_model: 5.5410 weight_update: 0.0012 one_step: 0.0027 handle_policy_step: 306.3254 deserialize: 13.2310, stack: 2.0325, obs_to_device_normalize: 73.5097, forward: 139.4729, send_messages: 23.7876 prepare_outputs: 40.5463 to_cpu: 24.4073 [2023-02-22 17:20:37,366][00749] Learner 0 profile tree view: misc: 0.0084, prepare_batch: 13.7076 train: 35.2174 epoch_init: 0.0095, minibatch_init: 0.0091, losses_postprocess: 0.4432, kl_divergence: 0.5678, after_optimizer: 1.1132 calculate_losses: 11.8007 losses_init: 0.0050, forward_head: 1.7413, bptt_initial: 4.6428, tail: 1.0238, advantages_returns: 0.2733, losses: 1.7093 bptt: 2.1245 bptt_forward_core: 2.0417 update: 20.7020 clip: 1.8659 [2023-02-22 17:20:37,367][00749] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2373, enqueue_policy_requests: 12.2633, env_step: 210.3821, overhead: 15.8349, complete_rollouts: 0.4234 save_policy_outputs: 13.8777 split_output_tensors: 6.7645 [2023-02-22 17:20:37,369][00749] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2384, enqueue_policy_requests: 12.2324, env_step: 210.3702, overhead: 15.7652, complete_rollouts: 0.3998 save_policy_outputs: 13.9366 split_output_tensors: 6.7449 [2023-02-22 17:20:37,370][00749] Loop Runner_EvtLoop terminating... [2023-02-22 17:20:37,372][00749] Runner profile tree view: main_loop: 346.1152 [2023-02-22 17:20:37,373][00749] Collected {0: 10006528}, FPS: 17337.1 [2023-02-22 17:20:50,104][00749] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2023-02-22 17:20:50,106][00749] Overriding arg 'num_workers' with value 1 passed from command line [2023-02-22 17:20:50,107][00749] Adding new argument 'no_render'=True that is not in the saved config file! [2023-02-22 17:20:50,109][00749] Adding new argument 'save_video'=True that is not in the saved config file! [2023-02-22 17:20:50,110][00749] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 17:20:50,112][00749] Adding new argument 'video_name'=None that is not in the saved config file! [2023-02-22 17:20:50,114][00749] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2023-02-22 17:20:50,115][00749] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-02-22 17:20:50,117][00749] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2023-02-22 17:20:50,119][00749] Adding new argument 'hf_repository'=None that is not in the saved config file! [2023-02-22 17:20:50,121][00749] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-02-22 17:20:50,122][00749] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-02-22 17:20:50,124][00749] Adding new argument 'train_script'=None that is not in the saved config file! [2023-02-22 17:20:50,126][00749] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
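The summary above ("Collected {0: 10006528}, FPS: 17337.1") covers this session only: the run resumed at 4,005,888 env steps, so the 346.1152 s main loop collected 10,006,528 - 4,005,888 = 6,000,640 new frames. A quick check of that figure (plain Python, values copied from the log):

    collected, resumed_from = 10_006_528, 4_005_888
    main_loop_seconds = 346.1152
    print(f"FPS: {(collected - resumed_from) / main_loop_seconds:.1f}")  # FPS: 17337.1

The block directly above is also where the evaluation ("enjoy") run is configured: the saved config.json is reloaded, 'num_workers' is overridden from the command line, and enjoy-only arguments such as 'save_video' and 'max_num_episodes' are layered on top of the training config before the policy is evaluated below.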
[2023-02-22 17:20:50,128][00749] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-02-22 17:20:50,145][00749] RunningMeanStd input shape: (3, 72, 128) [2023-02-22 17:20:50,148][00749] RunningMeanStd input shape: (1,) [2023-02-22 17:20:50,163][00749] ConvEncoder: input_channels=3 [2023-02-22 17:20:50,207][00749] Conv encoder output size: 512 [2023-02-22 17:20:50,209][00749] Policy head output size: 512 [2023-02-22 17:20:50,233][00749] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-02-22 17:20:50,732][00749] Num frames 100... [2023-02-22 17:20:50,859][00749] Num frames 200... [2023-02-22 17:20:50,987][00749] Num frames 300... [2023-02-22 17:20:51,114][00749] Num frames 400... [2023-02-22 17:20:51,236][00749] Num frames 500... [2023-02-22 17:20:51,344][00749] Avg episode rewards: #0: 10.440, true rewards: #0: 5.440 [2023-02-22 17:20:51,346][00749] Avg episode reward: 10.440, avg true_objective: 5.440 [2023-02-22 17:20:51,417][00749] Num frames 600... [2023-02-22 17:20:51,537][00749] Num frames 700... [2023-02-22 17:20:51,685][00749] Num frames 800... [2023-02-22 17:20:51,805][00749] Num frames 900... [2023-02-22 17:20:51,925][00749] Num frames 1000... [2023-02-22 17:20:52,048][00749] Num frames 1100... [2023-02-22 17:20:52,172][00749] Num frames 1200... [2023-02-22 17:20:52,287][00749] Avg episode rewards: #0: 10.740, true rewards: #0: 6.240 [2023-02-22 17:20:52,289][00749] Avg episode reward: 10.740, avg true_objective: 6.240 [2023-02-22 17:20:52,358][00749] Num frames 1300... [2023-02-22 17:20:52,480][00749] Num frames 1400... [2023-02-22 17:20:52,597][00749] Num frames 1500... [2023-02-22 17:20:52,718][00749] Num frames 1600... [2023-02-22 17:20:52,836][00749] Num frames 1700... [2023-02-22 17:20:52,955][00749] Num frames 1800... [2023-02-22 17:20:53,073][00749] Num frames 1900... [2023-02-22 17:20:53,196][00749] Num frames 2000... [2023-02-22 17:20:53,318][00749] Num frames 2100... [2023-02-22 17:20:53,437][00749] Num frames 2200... [2023-02-22 17:20:53,555][00749] Num frames 2300... [2023-02-22 17:20:53,679][00749] Num frames 2400... [2023-02-22 17:20:53,799][00749] Num frames 2500... [2023-02-22 17:20:53,965][00749] Avg episode rewards: #0: 17.307, true rewards: #0: 8.640 [2023-02-22 17:20:53,967][00749] Avg episode reward: 17.307, avg true_objective: 8.640 [2023-02-22 17:20:53,980][00749] Num frames 2600... [2023-02-22 17:20:54,101][00749] Num frames 2700... [2023-02-22 17:20:54,218][00749] Num frames 2800... [2023-02-22 17:20:54,338][00749] Num frames 2900... [2023-02-22 17:20:54,458][00749] Num frames 3000... [2023-02-22 17:20:54,573][00749] Num frames 3100... [2023-02-22 17:20:54,702][00749] Num frames 3200... [2023-02-22 17:20:54,873][00749] Avg episode rewards: #0: 15.990, true rewards: #0: 8.240 [2023-02-22 17:20:54,875][00749] Avg episode reward: 15.990, avg true_objective: 8.240 [2023-02-22 17:20:54,882][00749] Num frames 3300... [2023-02-22 17:20:55,004][00749] Num frames 3400... [2023-02-22 17:20:55,127][00749] Num frames 3500... [2023-02-22 17:20:55,250][00749] Num frames 3600... [2023-02-22 17:20:55,372][00749] Num frames 3700... [2023-02-22 17:20:55,491][00749] Num frames 3800... [2023-02-22 17:20:55,610][00749] Num frames 3900... [2023-02-22 17:20:55,732][00749] Num frames 4000... [2023-02-22 17:20:55,849][00749] Num frames 4100... [2023-02-22 17:20:55,969][00749] Num frames 4200... [2023-02-22 17:20:56,086][00749] Num frames 4300... 
[2023-02-22 17:20:56,205][00749] Num frames 4400...
[2023-02-22 17:20:56,280][00749] Avg episode rewards: #0: 17.632, true rewards: #0: 8.832
[2023-02-22 17:20:56,283][00749] Avg episode reward: 17.632, avg true_objective: 8.832
[2023-02-22 17:20:56,386][00749] Num frames 4500...
[2023-02-22 17:20:56,507][00749] Num frames 4600...
[2023-02-22 17:20:56,629][00749] Num frames 4700...
[2023-02-22 17:20:56,751][00749] Num frames 4800...
[2023-02-22 17:20:56,866][00749] Num frames 4900...
[2023-02-22 17:20:56,983][00749] Num frames 5000...
[2023-02-22 17:20:57,102][00749] Num frames 5100...
[2023-02-22 17:20:57,227][00749] Num frames 5200...
[2023-02-22 17:20:57,350][00749] Num frames 5300...
[2023-02-22 17:20:57,474][00749] Num frames 5400...
[2023-02-22 17:20:57,596][00749] Num frames 5500...
[2023-02-22 17:20:57,734][00749] Num frames 5600...
[2023-02-22 17:20:57,857][00749] Num frames 5700...
[2023-02-22 17:20:57,981][00749] Num frames 5800...
[2023-02-22 17:20:58,112][00749] Num frames 5900...
[2023-02-22 17:20:58,252][00749] Avg episode rewards: #0: 21.442, true rewards: #0: 9.942
[2023-02-22 17:20:58,254][00749] Avg episode reward: 21.442, avg true_objective: 9.942
[2023-02-22 17:20:58,299][00749] Num frames 6000...
[2023-02-22 17:20:58,420][00749] Num frames 6100...
[2023-02-22 17:20:58,542][00749] Num frames 6200...
[2023-02-22 17:20:58,662][00749] Num frames 6300...
[2023-02-22 17:20:58,794][00749] Num frames 6400...
[2023-02-22 17:20:58,910][00749] Num frames 6500...
[2023-02-22 17:20:59,033][00749] Num frames 6600...
[2023-02-22 17:20:59,159][00749] Num frames 6700...
[2023-02-22 17:20:59,280][00749] Num frames 6800...
[2023-02-22 17:20:59,409][00749] Num frames 6900...
[2023-02-22 17:20:59,538][00749] Num frames 7000...
[2023-02-22 17:20:59,668][00749] Num frames 7100...
[2023-02-22 17:20:59,798][00749] Num frames 7200...
[2023-02-22 17:20:59,893][00749] Avg episode rewards: #0: 22.760, true rewards: #0: 10.331
[2023-02-22 17:20:59,896][00749] Avg episode reward: 22.760, avg true_objective: 10.331
[2023-02-22 17:20:59,987][00749] Num frames 7300...
[2023-02-22 17:21:00,117][00749] Num frames 7400...
[2023-02-22 17:21:00,242][00749] Num frames 7500...
[2023-02-22 17:21:00,376][00749] Num frames 7600...
[2023-02-22 17:21:00,507][00749] Num frames 7700...
[2023-02-22 17:21:00,639][00749] Num frames 7800...
[2023-02-22 17:21:00,776][00749] Num frames 7900...
[2023-02-22 17:21:00,899][00749] Num frames 8000...
[2023-02-22 17:21:01,025][00749] Num frames 8100...
[2023-02-22 17:21:01,155][00749] Num frames 8200...
[2023-02-22 17:21:01,292][00749] Num frames 8300...
[2023-02-22 17:21:01,459][00749] Avg episode rewards: #0: 23.481, true rewards: #0: 10.481
[2023-02-22 17:21:01,461][00749] Avg episode reward: 23.481, avg true_objective: 10.481
[2023-02-22 17:21:01,483][00749] Num frames 8400...
[2023-02-22 17:21:01,613][00749] Num frames 8500...
[2023-02-22 17:21:01,735][00749] Num frames 8600...
[2023-02-22 17:21:01,862][00749] Num frames 8700...
[2023-02-22 17:21:01,989][00749] Num frames 8800...
[2023-02-22 17:21:02,123][00749] Num frames 8900...
[2023-02-22 17:21:02,254][00749] Num frames 9000...
[2023-02-22 17:21:02,381][00749] Num frames 9100...
[2023-02-22 17:21:02,507][00749] Num frames 9200...
[2023-02-22 17:21:02,632][00749] Num frames 9300...
[2023-02-22 17:21:02,767][00749] Num frames 9400...
[2023-02-22 17:21:02,901][00749] Num frames 9500...
[2023-02-22 17:21:03,073][00749] Avg episode rewards: #0: 24.428, true rewards: #0: 10.650
[2023-02-22 17:21:03,075][00749] Avg episode reward: 24.428, avg true_objective: 10.650
[2023-02-22 17:21:03,097][00749] Num frames 9600...
[2023-02-22 17:21:03,230][00749] Num frames 9700...
[2023-02-22 17:21:03,355][00749] Num frames 9800...
[2023-02-22 17:21:03,477][00749] Num frames 9900...
[2023-02-22 17:21:03,604][00749] Num frames 10000...
[2023-02-22 17:21:03,723][00749] Num frames 10100...
[2023-02-22 17:21:03,844][00749] Num frames 10200...
[2023-02-22 17:21:03,987][00749] Num frames 10300...
[2023-02-22 17:21:04,117][00749] Num frames 10400...
[2023-02-22 17:21:04,195][00749] Avg episode rewards: #0: 23.617, true rewards: #0: 10.417
[2023-02-22 17:21:04,197][00749] Avg episode reward: 23.617, avg true_objective: 10.417
[2023-02-22 17:21:29,053][00749] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2023-02-22 17:22:18,221][00749] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2023-02-22 17:22:18,223][00749] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 17:22:18,224][00749] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 17:22:18,226][00749] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 17:22:18,229][00749] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 17:22:18,232][00749] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 17:22:18,234][00749] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-22 17:22:18,235][00749] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 17:22:18,238][00749] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-22 17:22:18,239][00749] Adding new argument 'hf_repository'='marik0/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-22 17:22:18,241][00749] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 17:22:18,242][00749] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 17:22:18,246][00749] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 17:22:18,248][00749] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-22 17:22:18,250][00749] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 17:22:18,269][00749] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 17:22:18,273][00749] RunningMeanStd input shape: (1,)
[2023-02-22 17:22:18,288][00749] ConvEncoder: input_channels=3
[2023-02-22 17:22:18,334][00749] Conv encoder output size: 512
[2023-02-22 17:22:18,336][00749] Policy head output size: 512
[2023-02-22 17:22:18,359][00749] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2023-02-22 17:22:18,843][00749] Num frames 100...
[2023-02-22 17:22:18,969][00749] Num frames 200...
[2023-02-22 17:22:19,092][00749] Num frames 300...
[2023-02-22 17:22:19,219][00749] Num frames 400...
[2023-02-22 17:22:19,357][00749] Num frames 500...
[2023-02-22 17:22:19,481][00749] Num frames 600...
[2023-02-22 17:22:19,597][00749] Num frames 700...
[2023-02-22 17:22:19,713][00749] Num frames 800...
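This second evaluation pass was configured above with push_to_hub=True and hf_repository='marik0/rl_course_vizdoom_health_gathering_supreme', so the replay and model files are uploaded once the run completes. A sketch of how such a run is typically launched from the Hugging Face Deep RL course notebook, assuming Sample Factory 2.x; parse_vizdoom_cfg and register_vizdoom_components are helpers defined in the notebook, not part of Sample Factory itself:

from sample_factory.enjoy import enjoy  # evaluation entry point in Sample Factory 2.x

# Notebook-defined helpers (assumed): register the VizDoom envs/models with SF,
# then build a config from evaluation-time flags like those logged above.
register_vizdoom_components()
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_frames=100000",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=marik0/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)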
[2023-02-22 17:22:19,824][00749] Num frames 900...
[2023-02-22 17:22:19,945][00749] Num frames 1000...
[2023-02-22 17:22:20,060][00749] Num frames 1100...
[2023-02-22 17:22:20,178][00749] Num frames 1200...
[2023-02-22 17:22:20,296][00749] Num frames 1300...
[2023-02-22 17:22:20,413][00749] Num frames 1400...
[2023-02-22 17:22:20,512][00749] Avg episode rewards: #0: 35.400, true rewards: #0: 14.400
[2023-02-22 17:22:20,514][00749] Avg episode reward: 35.400, avg true_objective: 14.400
[2023-02-22 17:22:20,582][00749] Num frames 1500...
[2023-02-22 17:22:20,709][00749] Num frames 1600...
[2023-02-22 17:22:20,829][00749] Num frames 1700...
[2023-02-22 17:22:20,947][00749] Num frames 1800...
[2023-02-22 17:22:21,059][00749] Num frames 1900...
[2023-02-22 17:22:21,178][00749] Num frames 2000...
[2023-02-22 17:22:21,295][00749] Num frames 2100...
[2023-02-22 17:22:21,414][00749] Num frames 2200...
[2023-02-22 17:22:21,533][00749] Num frames 2300...
[2023-02-22 17:22:21,651][00749] Num frames 2400...
[2023-02-22 17:22:21,772][00749] Num frames 2500...
[2023-02-22 17:22:21,892][00749] Num frames 2600...
[2023-02-22 17:22:22,007][00749] Num frames 2700...
[2023-02-22 17:22:22,127][00749] Num frames 2800...
[2023-02-22 17:22:22,244][00749] Num frames 2900...
[2023-02-22 17:22:22,316][00749] Avg episode rewards: #0: 33.560, true rewards: #0: 14.560
[2023-02-22 17:22:22,318][00749] Avg episode reward: 33.560, avg true_objective: 14.560
[2023-02-22 17:22:22,425][00749] Num frames 3000...
[2023-02-22 17:22:22,542][00749] Num frames 3100...
[2023-02-22 17:22:22,662][00749] Num frames 3200...
[2023-02-22 17:22:22,779][00749] Num frames 3300...
[2023-02-22 17:22:22,891][00749] Num frames 3400...
[2023-02-22 17:22:23,011][00749] Num frames 3500...
[2023-02-22 17:22:23,128][00749] Num frames 3600...
[2023-02-22 17:22:23,244][00749] Num frames 3700...
[2023-02-22 17:22:23,358][00749] Num frames 3800...
[2023-02-22 17:22:23,470][00749] Num frames 3900...
[2023-02-22 17:22:23,592][00749] Num frames 4000...
[2023-02-22 17:22:23,712][00749] Num frames 4100...
[2023-02-22 17:22:23,832][00749] Num frames 4200...
[2023-02-22 17:22:23,946][00749] Num frames 4300...
[2023-02-22 17:22:24,065][00749] Num frames 4400...
[2023-02-22 17:22:24,178][00749] Avg episode rewards: #0: 35.493, true rewards: #0: 14.827
[2023-02-22 17:22:24,180][00749] Avg episode reward: 35.493, avg true_objective: 14.827
[2023-02-22 17:22:24,242][00749] Num frames 4500...
[2023-02-22 17:22:24,356][00749] Num frames 4600...
[2023-02-22 17:22:24,473][00749] Num frames 4700...
[2023-02-22 17:22:24,588][00749] Num frames 4800...
[2023-02-22 17:22:24,707][00749] Num frames 4900...
[2023-02-22 17:22:24,823][00749] Num frames 5000...
[2023-02-22 17:22:24,942][00749] Avg episode rewards: #0: 30.140, true rewards: #0: 12.640
[2023-02-22 17:22:24,944][00749] Avg episode reward: 30.140, avg true_objective: 12.640
[2023-02-22 17:22:25,000][00749] Num frames 5100...
[2023-02-22 17:22:25,123][00749] Num frames 5200...
[2023-02-22 17:22:25,238][00749] Num frames 5300...
[2023-02-22 17:22:25,361][00749] Num frames 5400...
[2023-02-22 17:22:25,481][00749] Num frames 5500...
[2023-02-22 17:22:25,599][00749] Num frames 5600...
[2023-02-22 17:22:25,718][00749] Num frames 5700...
[2023-02-22 17:22:25,849][00749] Num frames 5800...
[2023-02-22 17:22:25,961][00749] Num frames 5900...
[2023-02-22 17:22:26,075][00749] Num frames 6000...
[2023-02-22 17:22:26,198][00749] Num frames 6100...
[2023-02-22 17:22:26,273][00749] Avg episode rewards: #0: 29.428, true rewards: #0: 12.228
[2023-02-22 17:22:26,276][00749] Avg episode reward: 29.428, avg true_objective: 12.228
[2023-02-22 17:22:26,378][00749] Num frames 6200...
[2023-02-22 17:22:26,495][00749] Num frames 6300...
[2023-02-22 17:22:26,611][00749] Num frames 6400...
[2023-02-22 17:22:26,727][00749] Num frames 6500...
[2023-02-22 17:22:26,843][00749] Num frames 6600...
[2023-02-22 17:22:26,955][00749] Num frames 6700...
[2023-02-22 17:22:27,073][00749] Num frames 6800...
[2023-02-22 17:22:27,190][00749] Num frames 6900...
[2023-02-22 17:22:27,305][00749] Num frames 7000...
[2023-02-22 17:22:27,418][00749] Num frames 7100...
[2023-02-22 17:22:27,539][00749] Num frames 7200...
[2023-02-22 17:22:27,602][00749] Avg episode rewards: #0: 29.008, true rewards: #0: 12.008
[2023-02-22 17:22:27,603][00749] Avg episode reward: 29.008, avg true_objective: 12.008
[2023-02-22 17:22:27,712][00749] Num frames 7300...
[2023-02-22 17:22:27,825][00749] Num frames 7400...
[2023-02-22 17:22:27,941][00749] Num frames 7500...
[2023-02-22 17:22:28,054][00749] Num frames 7600...
[2023-02-22 17:22:28,167][00749] Num frames 7700...
[2023-02-22 17:22:28,283][00749] Num frames 7800...
[2023-02-22 17:22:28,400][00749] Num frames 7900...
[2023-02-22 17:22:28,520][00749] Num frames 8000...
[2023-02-22 17:22:28,634][00749] Num frames 8100...
[2023-02-22 17:22:28,755][00749] Num frames 8200...
[2023-02-22 17:22:28,880][00749] Num frames 8300...
[2023-02-22 17:22:28,999][00749] Num frames 8400...
[2023-02-22 17:22:29,114][00749] Num frames 8500...
[2023-02-22 17:22:29,230][00749] Num frames 8600...
[2023-02-22 17:22:29,344][00749] Num frames 8700...
[2023-02-22 17:22:29,465][00749] Num frames 8800...
[2023-02-22 17:22:29,594][00749] Num frames 8900...
[2023-02-22 17:22:29,730][00749] Avg episode rewards: #0: 31.664, true rewards: #0: 12.807
[2023-02-22 17:22:29,732][00749] Avg episode reward: 31.664, avg true_objective: 12.807
[2023-02-22 17:22:29,773][00749] Num frames 9000...
[2023-02-22 17:22:29,887][00749] Num frames 9100...
[2023-02-22 17:22:30,005][00749] Num frames 9200...
[2023-02-22 17:22:30,122][00749] Num frames 9300...
[2023-02-22 17:22:30,241][00749] Num frames 9400...
[2023-02-22 17:22:30,355][00749] Num frames 9500...
[2023-02-22 17:22:30,468][00749] Num frames 9600...
[2023-02-22 17:22:30,586][00749] Num frames 9700...
[2023-02-22 17:22:30,709][00749] Num frames 9800...
[2023-02-22 17:22:30,827][00749] Num frames 9900...
[2023-02-22 17:22:30,948][00749] Num frames 10000...
[2023-02-22 17:22:31,074][00749] Num frames 10100...
[2023-02-22 17:22:31,198][00749] Num frames 10200...
[2023-02-22 17:22:31,320][00749] Num frames 10300...
[2023-02-22 17:22:31,442][00749] Num frames 10400...
[2023-02-22 17:22:31,564][00749] Num frames 10500...
[2023-02-22 17:22:31,685][00749] Num frames 10600...
[2023-02-22 17:22:31,817][00749] Num frames 10700...
[2023-02-22 17:22:31,963][00749] Num frames 10800...
[2023-02-22 17:22:32,089][00749] Num frames 10900...
[2023-02-22 17:22:32,169][00749] Avg episode rewards: #0: 33.646, true rewards: #0: 13.646
[2023-02-22 17:22:32,171][00749] Avg episode reward: 33.646, avg true_objective: 13.646
[2023-02-22 17:22:32,271][00749] Num frames 11000...
[2023-02-22 17:22:32,392][00749] Num frames 11100...
[2023-02-22 17:22:32,511][00749] Num frames 11200...
[2023-02-22 17:22:32,627][00749] Num frames 11300...
[2023-02-22 17:22:32,746][00749] Num frames 11400...
[2023-02-22 17:22:32,863][00749] Num frames 11500...
[2023-02-22 17:22:32,982][00749] Num frames 11600...
[2023-02-22 17:22:33,098][00749] Num frames 11700...
[2023-02-22 17:22:33,211][00749] Num frames 11800...
[2023-02-22 17:22:33,329][00749] Num frames 11900...
[2023-02-22 17:22:33,451][00749] Num frames 12000...
[2023-02-22 17:22:33,569][00749] Num frames 12100...
[2023-02-22 17:22:33,685][00749] Num frames 12200...
[2023-02-22 17:22:33,802][00749] Num frames 12300...
[2023-02-22 17:22:33,918][00749] Num frames 12400...
[2023-02-22 17:22:34,043][00749] Num frames 12500...
[2023-02-22 17:22:34,161][00749] Num frames 12600...
[2023-02-22 17:22:34,280][00749] Num frames 12700...
[2023-02-22 17:22:34,398][00749] Num frames 12800...
[2023-02-22 17:22:34,518][00749] Num frames 12900...
[2023-02-22 17:22:34,651][00749] Avg episode rewards: #0: 35.294, true rewards: #0: 14.406
[2023-02-22 17:22:34,654][00749] Avg episode reward: 35.294, avg true_objective: 14.406
[2023-02-22 17:22:34,698][00749] Num frames 13000...
[2023-02-22 17:22:34,811][00749] Num frames 13100...
[2023-02-22 17:22:34,927][00749] Num frames 13200...
[2023-02-22 17:22:35,040][00749] Num frames 13300...
[2023-02-22 17:22:35,159][00749] Num frames 13400...
[2023-02-22 17:22:35,281][00749] Num frames 13500...
[2023-02-22 17:22:35,401][00749] Num frames 13600...
[2023-02-22 17:22:35,519][00749] Num frames 13700...
[2023-02-22 17:22:35,633][00749] Num frames 13800...
[2023-02-22 17:22:35,755][00749] Num frames 13900...
[2023-02-22 17:22:35,886][00749] Num frames 14000...
[2023-02-22 17:22:36,007][00749] Num frames 14100...
[2023-02-22 17:22:36,161][00749] Avg episode rewards: #0: 34.181, true rewards: #0: 14.181
[2023-02-22 17:22:36,163][00749] Avg episode reward: 34.181, avg true_objective: 14.181
[2023-02-22 17:23:09,875][00749] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
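Once the push completes, the uploaded agent can be pulled back down for further evaluation or training. A sketch using Sample Factory's Hugging Face integration; the module path follows the SF 2.x documented CLI, but treat the exact flag names as an assumption and check --help:

import subprocess

# Download the repo pushed above into a local train_dir so enjoy/train
# can load its checkpoints (flag names assumed: -r repo, -d dest dir).
subprocess.run(
    [
        "python", "-m", "sample_factory.huggingface.load_from_hub",
        "-r", "marik0/rl_course_vizdoom_health_gathering_supreme",
        "-d", "./train_dir",
    ],
    check=True,
)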