[2025-01-03 20:13:15,252][122130] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json...
[2025-01-03 20:13:15,253][122130] Rollout worker 0 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 1 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 2 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 3 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 4 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 5 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 6 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 7 uses device cpu
[2025-01-03 20:13:15,310][122130] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:15,311][122130] InferenceWorker_p0-w0: min num requests: 2
[2025-01-03 20:13:15,334][122130] Starting all processes...
[2025-01-03 20:13:15,334][122130] Starting process learner_proc0
[2025-01-03 20:13:16,937][122130] Starting all processes...
[2025-01-03 20:13:16,945][122176] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:16,945][122176] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-01-03 20:13:16,949][122130] Starting process inference_proc0-0
[2025-01-03 20:13:16,949][122130] Starting process rollout_proc0
[2025-01-03 20:13:16,949][122130] Starting process rollout_proc1
[2025-01-03 20:13:16,949][122130] Starting process rollout_proc2
[2025-01-03 20:13:16,949][122130] Starting process rollout_proc3
[2025-01-03 20:13:16,950][122130] Starting process rollout_proc4
[2025-01-03 20:13:16,959][122176] Num visible devices: 1
[2025-01-03 20:13:16,950][122130] Starting process rollout_proc5
[2025-01-03 20:13:16,964][122176] Starting seed is not provided
[2025-01-03 20:13:16,965][122176] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:16,965][122176] Initializing actor-critic model on device cuda:0
[2025-01-03 20:13:16,950][122130] Starting process rollout_proc6
[2025-01-03 20:13:16,965][122176] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:13:16,952][122130] Starting process rollout_proc7
[2025-01-03 20:13:16,966][122176] RunningMeanStd input shape: (1,)
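
The two `RunningMeanStd input shape` lines above refer to the normalizers for image observations (3, 72, 128) and scalar returns (1,). Below is a minimal sketch of the running mean/std bookkeeping they perform, assuming the standard parallel-variance update; the real RunningMeanStdInPlace is a TorchScript module, but the math is the same.

    import numpy as np

    class RunningMeanStd:
        # Minimal running mean/std tracker; a sketch, not Sample Factory's
        # in-place TorchScript implementation.
        def __init__(self, shape, eps=1e-4):
            self.mean = np.zeros(shape, dtype=np.float64)
            self.var = np.ones(shape, dtype=np.float64)
            self.count = eps

        def update(self, batch):
            # Merge batch statistics into running statistics (parallel-variance formula).
            batch_mean, batch_var = batch.mean(axis=0), batch.var(axis=0)
            batch_count = batch.shape[0]
            delta = batch_mean - self.mean
            total = self.count + batch_count
            self.mean = self.mean + delta * batch_count / total
            m2 = (self.var * self.count + batch_var * batch_count
                  + delta**2 * self.count * batch_count / total)
            self.var = m2 / total
            self.count = total

        def normalize(self, x):
            return (x - self.mean) / np.sqrt(self.var + 1e-8)

    obs_rms = RunningMeanStd((3, 72, 128))  # matches the logged obs shape
    ret_rms = RunningMeanStd((1,))          # matches the logged returns shape
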
[2025-01-03 20:13:16,988][122176] ConvEncoder: input_channels=3
[2025-01-03 20:13:17,165][122176] Conv encoder output size: 512
[2025-01-03 20:13:17,166][122176] Policy head output size: 512
[2025-01-03 20:13:17,212][122176] Created Actor Critic model with architecture:
[2025-01-03 20:13:17,213][122176] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
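
For reference, the module tree printed above can be approximated in plain PyTorch. This is a minimal sketch: the conv filter specs ((32,8,4), (64,4,2), (128,3,2)) are an assumption based on Sample Factory's default convnet, chosen because they reproduce the logged "Conv encoder output size: 512" for a (3, 72, 128) observation; the normalizers and TorchScript wrapping are omitted.

    import torch
    from torch import nn

    class ActorCriticSketch(nn.Module):
        def __init__(self, num_actions=5):  # distribution_linear: out_features=5
            super().__init__()
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
                nn.Flatten(),
            )
            # (3, 72, 128) input -> 128 channels x 3 x 6 = 2304 flat features
            self.mlp_layers = nn.Sequential(nn.Linear(2304, 512), nn.ELU())
            self.core = nn.GRU(512, 512)            # ModelCoreRNN
            self.critic_linear = nn.Linear(512, 1)  # value head
            self.distribution_linear = nn.Linear(512, num_actions)  # action logits

        def forward(self, obs, rnn_state=None):
            x = self.mlp_layers(self.conv_head(obs))             # encoder: 512
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # one GRU step
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state

    logits, value, h = ActorCriticSketch()(torch.zeros(4, 3, 72, 128))
    print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
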
[2025-01-03 20:13:17,448][122176] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-01-03 20:13:19,073][122176] No checkpoints found
[2025-01-03 20:13:19,073][122176] Did not load from checkpoint, starting from scratch!
[2025-01-03 20:13:19,074][122176] Initialized policy 0 weights for model version 0
[2025-01-03 20:13:19,077][122176] LearnerWorker_p0 finished initialization!
[2025-01-03 20:13:19,078][122176] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:19,617][122217] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,657][122201] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,678][122218] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,699][122200] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,714][122216] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,738][122202] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:19,739][122202] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-01-03 20:13:19,751][122202] Num visible devices: 1
[2025-01-03 20:13:19,780][122219] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,832][122220] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,839][122202] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:13:19,840][122202] RunningMeanStd input shape: (1,)
[2025-01-03 20:13:19,848][122202] ConvEncoder: input_channels=3
[2025-01-03 20:13:19,862][122215] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,940][122202] Conv encoder output size: 512
[2025-01-03 20:13:19,940][122202] Policy head output size: 512
[2025-01-03 20:13:19,966][122130] Inference worker 0-0 is ready!
[2025-01-03 20:13:19,966][122130] All inference workers are ready! Signal rollout workers to start!
[2025-01-03 20:13:19,996][122217] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:19,996][122218] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:19,997][122220] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,010][122219] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122216] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122200] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122201] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122215] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,342][122200] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122218] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122220] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122219] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122215] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,563][122200] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,566][122220] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,627][122219] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,627][122215] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,676][122218] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,692][122216] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,867][122217] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,874][122200] Decorrelating experience for 64 frames...
[2025-01-03 20:13:20,914][122216] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,915][122219] Decorrelating experience for 64 frames...
[2025-01-03 20:13:20,975][122218] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,120][122220] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,151][122217] Decorrelating experience for 32 frames...
[2025-01-03 20:13:21,152][122201] Decorrelating experience for 0 frames...
[2025-01-03 20:13:21,203][122215] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,318][122216] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,341][122218] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,382][122220] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,391][122201] Decorrelating experience for 32 frames...
[2025-01-03 20:13:21,476][122215] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,500][122217] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,559][122130] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-03 20:13:21,589][122216] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,648][122219] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,765][122217] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,842][122201] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,883][122200] Decorrelating experience for 96 frames...
[2025-01-03 20:13:22,134][122201] Decorrelating experience for 96 frames...
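
The staggered frame counts (0, 32, 64, 96) in the `Decorrelating experience` lines show each rollout worker warming its environments up by a different amount before collection starts, so that episode boundaries are out of phase across workers. A rough sketch of the idea; all names here are hypothetical, and this is not Sample Factory's actual implementation.

    def decorrelate_experience(env, num_stages=4, frames_per_stage=32):
        # Step a Gymnasium-style env with random actions in staggered stages,
        # mirroring the "Decorrelating experience for N frames..." log lines.
        obs, info = env.reset()
        for stage in range(num_stages):
            print(f"Decorrelating experience for {stage * frames_per_stage} frames...")
            for _ in range(frames_per_stage):
                obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
                if terminated or truncated:
                    obs, info = env.reset()
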
[2025-01-03 20:13:22,632][122176] Signal inference workers to stop experience collection...
[2025-01-03 20:13:22,666][122202] InferenceWorker_p0-w0: stopping experience collection
[2025-01-03 20:13:25,122][122176] Signal inference workers to resume experience collection...
[2025-01-03 20:13:25,122][122202] InferenceWorker_p0-w0: resuming experience collection
[2025-01-03 20:13:26,559][122130] Fps is (10 sec: 5734.3, 60 sec: 5734.3, 300 sec: 5734.3). Total num frames: 28672. Throughput: 0: 1226.4. Samples: 6132. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-01-03 20:13:26,560][122130] Avg episode reward: [(0, '3.767')]
[2025-01-03 20:13:27,168][122202] Updated weights for policy 0, policy_version 10 (0.0083)
[2025-01-03 20:13:29,549][122202] Updated weights for policy 0, policy_version 20 (0.0013)
[2025-01-03 20:13:31,559][122130] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11468.8). Total num frames: 114688. Throughput: 0: 1908.8. Samples: 19088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-01-03 20:13:31,560][122130] Avg episode reward: [(0, '4.497')]
[2025-01-03 20:13:31,622][122176] Saving new best policy, reward=4.497!
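
The recurring `Fps is (...)` / `Avg episode reward` pairs are easy to scrape for plotting throughput and reward over time. A small sketch, with a hypothetical path to this log file; the patterns are written against the lines above.

    import re

    fps_re = re.compile(r"Fps is \(10 sec: ([\d.]+).*?Total num frames: (\d+)")
    reward_re = re.compile(r"Avg episode reward: \[\(0, '([\d.]+)'\)\]")

    stats = []
    with open("training.log") as f:  # hypothetical path to this log
        for line in f:
            if m := fps_re.search(line):
                stats.append({"fps": float(m.group(1)), "frames": int(m.group(2))})
            elif (m := reward_re.search(line)) and stats:
                stats[-1]["reward"] = float(m.group(1))
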
[2025-01-03 20:13:31,871][122202] Updated weights for policy 0, policy_version 30 (0.0012)
[2025-01-03 20:13:34,166][122202] Updated weights for policy 0, policy_version 40 (0.0012)
[2025-01-03 20:13:35,304][122130] Heartbeat connected on Batcher_0
[2025-01-03 20:13:35,307][122130] Heartbeat connected on LearnerWorker_p0
[2025-01-03 20:13:35,314][122130] Heartbeat connected on RolloutWorker_w0
[2025-01-03 20:13:35,316][122130] Heartbeat connected on InferenceWorker_p0-w0
[2025-01-03 20:13:35,317][122130] Heartbeat connected on RolloutWorker_w1
[2025-01-03 20:13:35,321][122130] Heartbeat connected on RolloutWorker_w2
[2025-01-03 20:13:35,324][122130] Heartbeat connected on RolloutWorker_w3
[2025-01-03 20:13:35,325][122130] Heartbeat connected on RolloutWorker_w4
[2025-01-03 20:13:35,327][122130] Heartbeat connected on RolloutWorker_w5
[2025-01-03 20:13:35,330][122130] Heartbeat connected on RolloutWorker_w6
[2025-01-03 20:13:35,333][122130] Heartbeat connected on RolloutWorker_w7
[2025-01-03 20:13:36,434][122202] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-01-03 20:13:36,559][122130] Fps is (10 sec: 17613.0, 60 sec: 13653.4, 300 sec: 13653.4). Total num frames: 204800. Throughput: 0: 3055.1. Samples: 45826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-01-03 20:13:36,559][122130] Avg episode reward: [(0, '4.481')]
[2025-01-03 20:13:38,693][122202] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-01-03 20:13:40,330][122130] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 122130], exiting...
[2025-01-03 20:13:40,332][122130] Runner profile tree view:
main_loop: 24.9987
[2025-01-03 20:13:40,332][122176] Stopping Batcher_0...
[2025-01-03 20:13:40,333][122176] Loop batcher_evt_loop terminating...
[2025-01-03 20:13:40,332][122130] Collected {0: 270336}, FPS: 10814.0
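
The final throughput figure is simply frames collected divided by main-loop wall time, which checks out against the profile above:

    frames = 270336        # from "Collected {0: 270336}"
    wall_time_s = 24.9987  # from "main_loop: 24.9987"
    print(f"FPS: {frames / wall_time_s:.1f}")  # FPS: 10814.0
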
[2025-01-03 20:13:40,335][122176] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth...
[2025-01-03 20:13:40,359][122218] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
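
Each frame of these tracebacks is one Gymnasium wrapper delegating to the next via `self.env.step(action)` until the call reaches the raw ViZDoom game, where the Ctrl-C surfaces as a `SignalException`. The wrapper pattern itself looks like this minimal sketch; the shaping bonus is hypothetical, purely for illustration.

    import gymnasium as gym

    class RewardShapingSketch(gym.Wrapper):
        # Same delegation pattern as gathering_reward_shaping.py above:
        # step the wrapped env, adjust the reward, pass everything else on.
        def step(self, action):
            observation, reward, terminated, truncated, info = self.env.step(action)
            shaped_reward = reward + 0.01  # hypothetical living bonus
            return observation, shaped_reward, terminated, truncated, info
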
[2025-01-03 20:13:40,361][122217] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(1, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2025-01-03 20:13:40,389][122218] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
[2025-01-03 20:13:40,386][122200] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2025-01-03 20:13:40,381][122201] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(0, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2025-01-03 20:13:40,391][122200] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
[2025-01-03 20:13:40,392][122201] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
[2025-01-03 20:13:40,389][122217] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
[2025-01-03 20:13:40,395][122176] Stopping LearnerWorker_p0...
[2025-01-03 20:13:40,368][122216] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2025-01-03 20:13:40,391][122219] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(1, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2025-01-03 20:13:40,373][122215] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2025-01-03 20:13:40,381][122220] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0)
Traceback (most recent call last):
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
slot_callable(*args)
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
new_obs, rewards, terminated, truncated, infos = e.step(actions)
^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
obs, rew, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step
observation, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step
return self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
obs, reward, terminated, truncated, info = self.env.step(action)
^^^^^^^^^^^^^^^^^^^^^
File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
reward = self.game.make_action(actions_flattened, self.skip_frames)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2025-01-03 20:13:40,396][122176] Loop learner_proc0_evt_loop terminating...
[2025-01-03 20:13:40,396][122219] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
[2025-01-03 20:13:40,401][122202] Weights refcount: 2 0
[2025-01-03 20:13:40,396][122216] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop
[2025-01-03 20:13:40,396][122215] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
[2025-01-03 20:13:40,397][122220] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
[2025-01-03 20:13:40,404][122202] Stopping InferenceWorker_p0-w0...
[2025-01-03 20:13:40,404][122202] Loop inference_proc0-0_evt_loop terminating...
[2025-01-03 20:14:02,830][123391] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json...
[2025-01-03 20:14:02,831][123391] Rollout worker 0 uses device cpu
[2025-01-03 20:14:02,831][123391] Rollout worker 1 uses device cpu
[2025-01-03 20:14:02,831][123391] Rollout worker 2 uses device cpu
[2025-01-03 20:14:02,831][123391] Rollout worker 3 uses device cpu
[2025-01-03 20:14:02,831][123391] Rollout worker 4 uses device cpu
[2025-01-03 20:14:02,832][123391] Rollout worker 5 uses device cpu
[2025-01-03 20:14:02,832][123391] Rollout worker 6 uses device cpu
[2025-01-03 20:14:02,832][123391] Rollout worker 7 uses device cpu
[2025-01-03 20:14:02,880][123391] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:14:02,881][123391] InferenceWorker_p0-w0: min num requests: 2
[2025-01-03 20:14:02,906][123391] Starting all processes...
[2025-01-03 20:14:02,907][123391] Starting process learner_proc0
[2025-01-03 20:14:04,543][123391] Starting all processes...
[2025-01-03 20:14:04,547][123391] Starting process inference_proc0-0
[2025-01-03 20:14:04,547][123391] Starting process rollout_proc0
[2025-01-03 20:14:04,547][123391] Starting process rollout_proc1
[2025-01-03 20:14:04,552][123451] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:14:04,547][123391] Starting process rollout_proc2
[2025-01-03 20:14:04,552][123451] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-01-03 20:14:04,547][123391] Starting process rollout_proc3
[2025-01-03 20:14:04,547][123391] Starting process rollout_proc4
[2025-01-03 20:14:04,547][123391] Starting process rollout_proc5
[2025-01-03 20:14:04,548][123391] Starting process rollout_proc6
[2025-01-03 20:14:04,548][123391] Starting process rollout_proc7
[2025-01-03 20:14:04,566][123451] Num visible devices: 1
[2025-01-03 20:14:04,574][123451] Starting seed is not provided
[2025-01-03 20:14:04,575][123451] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:14:04,575][123451] Initializing actor-critic model on device cuda:0
[2025-01-03 20:14:04,575][123451] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:14:04,576][123451] RunningMeanStd input shape: (1,)
[2025-01-03 20:14:04,593][123451] ConvEncoder: input_channels=3
[2025-01-03 20:14:04,756][123451] Conv encoder output size: 512
[2025-01-03 20:14:04,757][123451] Policy head output size: 512
[2025-01-03 20:14:04,780][123451] Created Actor Critic model with architecture:
[2025-01-03 20:14:04,781][123451] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-01-03 20:14:04,947][123451] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-01-03 20:14:06,623][123451] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth...
[2025-01-03 20:14:06,671][123451] Loading model from checkpoint
[2025-01-03 20:14:06,673][123451] Loaded experiment state at self.train_step=67, self.env_steps=274432
[2025-01-03 20:14:06,678][123451] Initialized policy 0 weights for model version 67
[2025-01-03 20:14:06,684][123451] LearnerWorker_p0 finished initialization!
[2025-01-03 20:14:06,686][123451] Using GPUs [0] for process 0 (actually maps to GPUs [0])
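
This second run resumes from the checkpoint saved when the first run was interrupted (`checkpoint_000000067_274432.pth`, i.e. train step 67 after 274432 env steps). A sketch of inspecting such a checkpoint offline; the key names are an assumption inferred from the `self.train_step=67, self.env_steps=274432` line above.

    import torch

    path = ("train_dir/default_experiment/checkpoint_p0/"
            "checkpoint_000000067_274432.pth")
    checkpoint = torch.load(path, map_location="cpu")
    # Assumed keys, inferred from the log line above:
    print(checkpoint.get("train_step"), checkpoint.get("env_steps"))  # 67 274432
    print(sorted(checkpoint.keys()))  # model/optimizer state dicts, etc.
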
[2025-01-03 20:14:07,189][123481] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,217][123483] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,241][123477] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,363][123479] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,365][123478] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:14:07,366][123478] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-01-03 20:14:07,381][123480] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,381][123478] Num visible devices: 1
[2025-01-03 20:14:07,393][123496] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,438][123495] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,496][123478] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:14:07,497][123478] RunningMeanStd input shape: (1,)
[2025-01-03 20:14:07,507][123478] ConvEncoder: input_channels=3
[2025-01-03 20:14:07,517][123482] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:14:07,599][123478] Conv encoder output size: 512
[2025-01-03 20:14:07,599][123478] Policy head output size: 512
[2025-01-03 20:14:07,627][123391] Inference worker 0-0 is ready!
[2025-01-03 20:14:07,627][123391] All inference workers are ready! Signal rollout workers to start!
[2025-01-03 20:14:07,658][123496] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,658][123477] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,660][123481] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,669][123479] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,674][123483] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,674][123480] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,675][123482] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,675][123495] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:14:07,964][123495] Decorrelating experience for 0 frames...
[2025-01-03 20:14:07,991][123482] Decorrelating experience for 0 frames...
[2025-01-03 20:14:07,992][123481] Decorrelating experience for 0 frames...
[2025-01-03 20:14:07,994][123479] Decorrelating experience for 0 frames...
[2025-01-03 20:14:07,994][123496] Decorrelating experience for 0 frames...
[2025-01-03 20:14:08,046][123477] Decorrelating experience for 0 frames...
[2025-01-03 20:14:08,201][123495] Decorrelating experience for 32 frames...
[2025-01-03 20:14:08,234][123482] Decorrelating experience for 32 frames...
[2025-01-03 20:14:08,234][123479] Decorrelating experience for 32 frames...
[2025-01-03 20:14:08,279][123480] Decorrelating experience for 0 frames...
[2025-01-03 20:14:08,326][123481] Decorrelating experience for 32 frames...
[2025-01-03 20:14:08,375][123477] Decorrelating experience for 32 frames...
[2025-01-03 20:14:08,507][123496] Decorrelating experience for 32 frames...
[2025-01-03 20:14:08,528][123480] Decorrelating experience for 32 frames...
[2025-01-03 20:14:08,633][123495] Decorrelating experience for 64 frames...
[2025-01-03 20:14:08,663][123481] Decorrelating experience for 64 frames...
[2025-01-03 20:14:08,677][123477] Decorrelating experience for 64 frames...
[2025-01-03 20:14:08,696][123479] Decorrelating experience for 64 frames...
[2025-01-03 20:14:08,792][123483] Decorrelating experience for 0 frames...
[2025-01-03 20:14:08,839][123482] Decorrelating experience for 64 frames...
[2025-01-03 20:14:08,850][123480] Decorrelating experience for 64 frames...
[2025-01-03 20:14:08,926][123495] Decorrelating experience for 96 frames...
[2025-01-03 20:14:08,966][123481] Decorrelating experience for 96 frames...
[2025-01-03 20:14:08,977][123477] Decorrelating experience for 96 frames...
[2025-01-03 20:14:09,075][123479] Decorrelating experience for 96 frames...
[2025-01-03 20:14:09,128][123480] Decorrelating experience for 96 frames...
[2025-01-03 20:14:09,138][123391] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 274432. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-03 20:14:09,188][123496] Decorrelating experience for 64 frames...
[2025-01-03 20:14:09,330][123482] Decorrelating experience for 96 frames...
[2025-01-03 20:14:09,466][123496] Decorrelating experience for 96 frames...
[2025-01-03 20:14:09,642][123483] Decorrelating experience for 32 frames...
[2025-01-03 20:14:09,950][123451] Signal inference workers to stop experience collection...
[2025-01-03 20:14:09,956][123478] InferenceWorker_p0-w0: stopping experience collection
[2025-01-03 20:14:10,026][123483] Decorrelating experience for 64 frames...
[2025-01-03 20:14:10,261][123483] Decorrelating experience for 96 frames...
[2025-01-03 20:14:12,350][123451] Signal inference workers to resume experience collection...
[2025-01-03 20:14:12,350][123478] InferenceWorker_p0-w0: resuming experience collection
[2025-01-03 20:14:14,137][123391] Fps is (10 sec: 6554.0, 60 sec: 6554.0, 300 sec: 6554.0). Total num frames: 307200. Throughput: 0: 1517.3. Samples: 7586. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-01-03 20:14:14,138][123391] Avg episode reward: [(0, '3.922')]
[2025-01-03 20:14:14,400][123478] Updated weights for policy 0, policy_version 77 (0.0073)
[2025-01-03 20:14:16,717][123478] Updated weights for policy 0, policy_version 87 (0.0012)
[2025-01-03 20:14:18,977][123478] Updated weights for policy 0, policy_version 97 (0.0011)
[2025-01-03 20:14:19,138][123391] Fps is (10 sec: 12288.2, 60 sec: 12288.2, 300 sec: 12288.2). Total num frames: 397312. Throughput: 0: 2098.6. Samples: 20986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:14:19,138][123391] Avg episode reward: [(0, '4.210')]
[2025-01-03 20:14:21,279][123478] Updated weights for policy 0, policy_version 107 (0.0012)
[2025-01-03 20:14:22,874][123391] Heartbeat connected on Batcher_0
[2025-01-03 20:14:22,877][123391] Heartbeat connected on LearnerWorker_p0
[2025-01-03 20:14:22,886][123391] Heartbeat connected on RolloutWorker_w0
[2025-01-03 20:14:22,887][123391] Heartbeat connected on InferenceWorker_p0-w0
[2025-01-03 20:14:22,889][123391] Heartbeat connected on RolloutWorker_w1
[2025-01-03 20:14:22,895][123391] Heartbeat connected on RolloutWorker_w3
[2025-01-03 20:14:22,895][123391] Heartbeat connected on RolloutWorker_w2
[2025-01-03 20:14:22,895][123391] Heartbeat connected on RolloutWorker_w4
[2025-01-03 20:14:22,899][123391] Heartbeat connected on RolloutWorker_w5
[2025-01-03 20:14:22,904][123391] Heartbeat connected on RolloutWorker_w6
[2025-01-03 20:14:22,907][123391] Heartbeat connected on RolloutWorker_w7
[2025-01-03 20:14:23,627][123478] Updated weights for policy 0, policy_version 117 (0.0012)
[2025-01-03 20:14:24,137][123391] Fps is (10 sec: 18022.4, 60 sec: 14199.7, 300 sec: 14199.7). Total num frames: 487424. Throughput: 0: 3190.3. Samples: 47854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:14:24,138][123391] Avg episode reward: [(0, '4.456')]
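
The `Policy #0 lag` triple reports how many policy versions old the weights that produced each sample are relative to the learner's newest version; small values mean near-on-policy data. A toy illustration with hypothetical per-sample versions:

    current_version = 117                   # learner's newest policy_version
    sample_versions = [115, 116, 117, 117]  # hypothetical versions behind a batch
    lags = [current_version - v for v in sample_versions]
    print(min(lags), sum(lags) / len(lags), max(lags))  # 0 0.75 2
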
[2025-01-03 20:14:26,023][123478] Updated weights for policy 0, policy_version 127 (0.0012)
[2025-01-03 20:14:28,363][123478] Updated weights for policy 0, policy_version 137 (0.0012)
[2025-01-03 20:14:29,137][123391] Fps is (10 sec: 17613.0, 60 sec: 14950.6, 300 sec: 14950.6). Total num frames: 573440. Throughput: 0: 3672.1. Samples: 73442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:14:29,138][123391] Avg episode reward: [(0, '4.283')]
[2025-01-03 20:14:30,815][123478] Updated weights for policy 0, policy_version 147 (0.0012)
[2025-01-03 20:14:33,155][123478] Updated weights for policy 0, policy_version 157 (0.0012)
[2025-01-03 20:14:34,137][123391] Fps is (10 sec: 17203.2, 60 sec: 15401.1, 300 sec: 15401.1). Total num frames: 659456. Throughput: 0: 3449.0. Samples: 86224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:14:34,138][123391] Avg episode reward: [(0, '4.495')]
[2025-01-03 20:14:35,511][123478] Updated weights for policy 0, policy_version 167 (0.0011)
[2025-01-03 20:14:37,942][123478] Updated weights for policy 0, policy_version 177 (0.0012)
[2025-01-03 20:14:39,137][123391] Fps is (10 sec: 16793.6, 60 sec: 15564.9, 300 sec: 15564.9). Total num frames: 741376. Throughput: 0: 3746.0. Samples: 112380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:14:39,138][123391] Avg episode reward: [(0, '4.625')]
[2025-01-03 20:14:39,156][123451] Saving new best policy, reward=4.625!
[2025-01-03 20:14:40,469][123478] Updated weights for policy 0, policy_version 187 (0.0014)
[2025-01-03 20:14:42,837][123478] Updated weights for policy 0, policy_version 197 (0.0012)
[2025-01-03 20:14:44,137][123391] Fps is (10 sec: 16793.7, 60 sec: 15799.0, 300 sec: 15799.0). Total num frames: 827392. Throughput: 0: 3925.6. Samples: 137394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:14:44,138][123391] Avg episode reward: [(0, '4.720')]
[2025-01-03 20:14:44,138][123451] Saving new best policy, reward=4.720!
[2025-01-03 20:14:45,375][123478] Updated weights for policy 0, policy_version 207 (0.0012)
[2025-01-03 20:14:47,712][123478] Updated weights for policy 0, policy_version 217 (0.0011)
[2025-01-03 20:14:49,137][123391] Fps is (10 sec: 17203.2, 60 sec: 15974.5, 300 sec: 15974.5). Total num frames: 913408. Throughput: 0: 3743.6. Samples: 149744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:14:49,138][123391] Avg episode reward: [(0, '5.010')]
[2025-01-03 20:14:49,142][123451] Saving new best policy, reward=5.010!
[2025-01-03 20:14:50,110][123478] Updated weights for policy 0, policy_version 227 (0.0012)
[2025-01-03 20:14:52,522][123478] Updated weights for policy 0, policy_version 237 (0.0011)
[2025-01-03 20:14:54,137][123391] Fps is (10 sec: 16793.5, 60 sec: 16020.0, 300 sec: 16020.0). Total num frames: 995328. Throughput: 0: 3898.7. Samples: 175440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:14:54,138][123391] Avg episode reward: [(0, '4.543')]
[2025-01-03 20:14:54,913][123478] Updated weights for policy 0, policy_version 247 (0.0012)
[2025-01-03 20:14:57,287][123478] Updated weights for policy 0, policy_version 257 (0.0012)
[2025-01-03 20:14:59,137][123391] Fps is (10 sec: 16793.8, 60 sec: 16138.4, 300 sec: 16138.4). Total num frames: 1081344. Throughput: 0: 4301.7. Samples: 201162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:14:59,138][123391] Avg episode reward: [(0, '4.875')]
[2025-01-03 20:14:59,691][123478] Updated weights for policy 0, policy_version 267 (0.0011)
[2025-01-03 20:15:02,017][123478] Updated weights for policy 0, policy_version 277 (0.0011)
[2025-01-03 20:15:04,137][123391] Fps is (10 sec: 17203.3, 60 sec: 16235.1, 300 sec: 16235.1). Total num frames: 1167360. Throughput: 0: 4294.1. Samples: 214220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:15:04,138][123391] Avg episode reward: [(0, '4.662')]
[2025-01-03 20:15:04,495][123478] Updated weights for policy 0, policy_version 287 (0.0011)
[2025-01-03 20:15:07,076][123478] Updated weights for policy 0, policy_version 297 (0.0012)
[2025-01-03 20:15:09,137][123391] Fps is (10 sec: 16384.0, 60 sec: 16179.3, 300 sec: 16179.3). Total num frames: 1245184. Throughput: 0: 4236.5. Samples: 238494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:15:09,138][123391] Avg episode reward: [(0, '4.642')]
[2025-01-03 20:15:09,749][123478] Updated weights for policy 0, policy_version 307 (0.0013)
[2025-01-03 20:15:12,292][123478] Updated weights for policy 0, policy_version 317 (0.0012)
[2025-01-03 20:15:14,137][123391] Fps is (10 sec: 15974.4, 60 sec: 16998.4, 300 sec: 16195.0). Total num frames: 1327104. Throughput: 0: 4197.5. Samples: 262328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:15:14,138][123391] Avg episode reward: [(0, '4.805')]
[2025-01-03 20:15:14,850][123478] Updated weights for policy 0, policy_version 327 (0.0013)
[2025-01-03 20:15:17,316][123478] Updated weights for policy 0, policy_version 337 (0.0013)
[2025-01-03 20:15:19,137][123391] Fps is (10 sec: 16383.8, 60 sec: 16861.9, 300 sec: 16208.5). Total num frames: 1409024. Throughput: 0: 4182.4. Samples: 274432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:15:19,138][123391] Avg episode reward: [(0, '4.847')]
[2025-01-03 20:15:19,645][123478] Updated weights for policy 0, policy_version 347 (0.0012)
[2025-01-03 20:15:21,945][123478] Updated weights for policy 0, policy_version 357 (0.0011)
[2025-01-03 20:15:24,137][123391] Fps is (10 sec: 17203.2, 60 sec: 16861.9, 300 sec: 16329.4). Total num frames: 1499136. Throughput: 0: 4191.3. Samples: 300988. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:15:24,138][123391] Avg episode reward: [(0, '4.795')]
[2025-01-03 20:15:24,234][123478] Updated weights for policy 0, policy_version 367 (0.0011)
[2025-01-03 20:15:26,560][123478] Updated weights for policy 0, policy_version 377 (0.0012)
[2025-01-03 20:15:28,869][123478] Updated weights for policy 0, policy_version 387 (0.0011)
[2025-01-03 20:15:29,137][123391] Fps is (10 sec: 18022.6, 60 sec: 16930.2, 300 sec: 16435.3). Total num frames: 1589248. Throughput: 0: 4224.0. Samples: 327472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:15:29,138][123391] Avg episode reward: [(0, '4.798')]
[2025-01-03 20:15:31,239][123478] Updated weights for policy 0, policy_version 397 (0.0012)
[2025-01-03 20:15:33,572][123478] Updated weights for policy 0, policy_version 407 (0.0011)
[2025-01-03 20:15:34,137][123391] Fps is (10 sec: 17612.8, 60 sec: 16930.1, 300 sec: 16480.4). Total num frames: 1675264. Throughput: 0: 4239.5. Samples: 340520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:15:34,138][123391] Avg episode reward: [(0, '4.693')]
[2025-01-03 20:15:35,909][123478] Updated weights for policy 0, policy_version 417 (0.0011)
[2025-01-03 20:15:38,226][123478] Updated weights for policy 0, policy_version 427 (0.0011)
[2025-01-03 20:15:39,137][123391] Fps is (10 sec: 17203.3, 60 sec: 16998.4, 300 sec: 16520.6). Total num frames: 1761280. Throughput: 0: 4256.2. Samples: 366970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:15:39,138][123391] Avg episode reward: [(0, '4.856')]
[2025-01-03 20:15:40,590][123478] Updated weights for policy 0, policy_version 437 (0.0011)
[2025-01-03 20:15:42,941][123478] Updated weights for policy 0, policy_version 447 (0.0011)
[2025-01-03 20:15:44,137][123391] Fps is (10 sec: 17203.1, 60 sec: 16998.4, 300 sec: 16556.5). Total num frames: 1847296. Throughput: 0: 4262.8. Samples: 392988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:15:44,138][123391] Avg episode reward: [(0, '4.676')]
[2025-01-03 20:15:45,299][123478] Updated weights for policy 0, policy_version 457 (0.0011)
[2025-01-03 20:15:48,061][123478] Updated weights for policy 0, policy_version 467 (0.0017)
[2025-01-03 20:15:49,138][123391] Fps is (10 sec: 15973.6, 60 sec: 16793.5, 300 sec: 16465.9). Total num frames: 1921024. Throughput: 0: 4265.5. Samples: 406168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:15:49,139][123391] Avg episode reward: [(0, '4.536')]
[2025-01-03 20:15:52,516][123478] Updated weights for policy 0, policy_version 477 (0.0027)
[2025-01-03 20:15:54,138][123391] Fps is (10 sec: 11878.0, 60 sec: 16179.1, 300 sec: 16110.9). Total num frames: 1966080. Throughput: 0: 4052.7. Samples: 420866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:15:54,138][123391] Avg episode reward: [(0, '4.844')]
[2025-01-03 20:15:56,650][123478] Updated weights for policy 0, policy_version 487 (0.0025)
[2025-01-03 20:15:59,138][123391] Fps is (10 sec: 9830.6, 60 sec: 15633.0, 300 sec: 15862.7). Total num frames: 2019328. Throughput: 0: 3855.9. Samples: 435846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:15:59,138][123391] Avg episode reward: [(0, '4.772')]
[2025-01-03 20:15:59,148][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth...
[2025-01-03 20:16:00,858][123478] Updated weights for policy 0, policy_version 497 (0.0025)
[2025-01-03 20:16:04,138][123391] Fps is (10 sec: 10239.9, 60 sec: 15018.6, 300 sec: 15600.4). Total num frames: 2068480. Throughput: 0: 3751.9. Samples: 443270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:04,139][123391] Avg episode reward: [(0, '4.749')]
[2025-01-03 20:16:04,827][123478] Updated weights for policy 0, policy_version 507 (0.0026)
[2025-01-03 20:16:08,674][123478] Updated weights for policy 0, policy_version 517 (0.0023)
[2025-01-03 20:16:09,138][123391] Fps is (10 sec: 10239.8, 60 sec: 14609.0, 300 sec: 15394.1). Total num frames: 2121728. Throughput: 0: 3511.4. Samples: 459002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:09,138][123391] Avg episode reward: [(0, '4.824')]
[2025-01-03 20:16:12,938][123478] Updated weights for policy 0, policy_version 527 (0.0024)
[2025-01-03 20:16:14,137][123391] Fps is (10 sec: 10240.4, 60 sec: 14062.9, 300 sec: 15171.6). Total num frames: 2170880. Throughput: 0: 3250.3. Samples: 473734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:14,138][123391] Avg episode reward: [(0, '4.571')]
[2025-01-03 20:16:16,715][123478] Updated weights for policy 0, policy_version 537 (0.0023)
[2025-01-03 20:16:19,138][123391] Fps is (10 sec: 10240.0, 60 sec: 13585.0, 300 sec: 14997.7). Total num frames: 2224128. Throughput: 0: 3143.6. Samples: 481984. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:19,138][123391] Avg episode reward: [(0, '4.625')]
[2025-01-03 20:16:20,795][123478] Updated weights for policy 0, policy_version 547 (0.0024)
[2025-01-03 20:16:24,138][123391] Fps is (10 sec: 10239.8, 60 sec: 12902.3, 300 sec: 14806.3). Total num frames: 2273280. Throughput: 0: 2891.5. Samples: 497090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:24,138][123391] Avg episode reward: [(0, '4.503')]
[2025-01-03 20:16:24,861][123478] Updated weights for policy 0, policy_version 557 (0.0024)
[2025-01-03 20:16:28,703][123478] Updated weights for policy 0, policy_version 567 (0.0024)
[2025-01-03 20:16:29,137][123391] Fps is (10 sec: 10240.2, 60 sec: 12288.0, 300 sec: 14657.8). Total num frames: 2326528. Throughput: 0: 2660.9. Samples: 512730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:29,138][123391] Avg episode reward: [(0, '4.680')]
[2025-01-03 20:16:32,488][123478] Updated weights for policy 0, policy_version 577 (0.0022)
[2025-01-03 20:16:34,138][123391] Fps is (10 sec: 10649.7, 60 sec: 11741.8, 300 sec: 14519.6). Total num frames: 2379776. Throughput: 0: 2550.3. Samples: 520930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:34,138][123391] Avg episode reward: [(0, '4.610')]
[2025-01-03 20:16:36,211][123478] Updated weights for policy 0, policy_version 587 (0.0024)
[2025-01-03 20:16:39,138][123391] Fps is (10 sec: 10649.5, 60 sec: 11195.7, 300 sec: 14390.6). Total num frames: 2433024. Throughput: 0: 2579.0. Samples: 536922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-01-03 20:16:39,138][123391] Avg episode reward: [(0, '4.749')]
[2025-01-03 20:16:40,321][123478] Updated weights for policy 0, policy_version 597 (0.0024)
[2025-01-03 20:16:44,138][123391] Fps is (10 sec: 10239.8, 60 sec: 10581.3, 300 sec: 14243.5). Total num frames: 2482176. Throughput: 0: 2590.7. Samples: 552430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:16:44,138][123391] Avg episode reward: [(0, '4.684')]
[2025-01-03 20:16:44,172][123478] Updated weights for policy 0, policy_version 607 (0.0023)
[2025-01-03 20:16:48,209][123478] Updated weights for policy 0, policy_version 617 (0.0025)
[2025-01-03 20:16:49,138][123391] Fps is (10 sec: 10239.8, 60 sec: 10240.0, 300 sec: 14131.2). Total num frames: 2535424. Throughput: 0: 2596.8. Samples: 560124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:16:49,139][123391] Avg episode reward: [(0, '4.515')]
[2025-01-03 20:16:52,159][123478] Updated weights for policy 0, policy_version 627 (0.0023)
[2025-01-03 20:16:54,138][123391] Fps is (10 sec: 10649.8, 60 sec: 10376.6, 300 sec: 14025.7). Total num frames: 2588672. Throughput: 0: 2587.8. Samples: 575452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:16:54,138][123391] Avg episode reward: [(0, '4.768')]
[2025-01-03 20:16:56,079][123478] Updated weights for policy 0, policy_version 637 (0.0024)
[2025-01-03 20:16:59,138][123391] Fps is (10 sec: 10240.0, 60 sec: 10308.2, 300 sec: 13902.3). Total num frames: 2637824. Throughput: 0: 2610.7. Samples: 591214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:16:59,138][123391] Avg episode reward: [(0, '4.684')]
[2025-01-03 20:17:00,099][123478] Updated weights for policy 0, policy_version 647 (0.0025)
[2025-01-03 20:17:04,122][123478] Updated weights for policy 0, policy_version 657 (0.0024)
[2025-01-03 20:17:04,138][123391] Fps is (10 sec: 10239.9, 60 sec: 10376.6, 300 sec: 13809.4). Total num frames: 2691072. Throughput: 0: 2594.9. Samples: 598754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:04,138][123391] Avg episode reward: [(0, '4.632')]
[2025-01-03 20:17:07,892][123478] Updated weights for policy 0, policy_version 667 (0.0023)
[2025-01-03 20:17:09,138][123391] Fps is (10 sec: 10240.0, 60 sec: 10308.3, 300 sec: 13698.8). Total num frames: 2740224. Throughput: 0: 2609.0. Samples: 614494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:09,138][123391] Avg episode reward: [(0, '4.702')]
[2025-01-03 20:17:12,102][123478] Updated weights for policy 0, policy_version 677 (0.0024)
[2025-01-03 20:17:14,137][123391] Fps is (10 sec: 11059.5, 60 sec: 10513.1, 300 sec: 13660.7). Total num frames: 2801664. Throughput: 0: 2618.0. Samples: 630540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:14,138][123391] Avg episode reward: [(0, '4.415')]
[2025-01-03 20:17:14,820][123478] Updated weights for policy 0, policy_version 687 (0.0014)
[2025-01-03 20:17:17,364][123478] Updated weights for policy 0, policy_version 697 (0.0013)
[2025-01-03 20:17:19,137][123391] Fps is (10 sec: 13926.8, 60 sec: 10922.7, 300 sec: 13710.8). Total num frames: 2879488. Throughput: 0: 2716.2. Samples: 643160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:17:19,138][123391] Avg episode reward: [(0, '4.649')]
[2025-01-03 20:17:20,477][123478] Updated weights for policy 0, policy_version 707 (0.0019)
[2025-01-03 20:17:24,138][123391] Fps is (10 sec: 13107.0, 60 sec: 10991.0, 300 sec: 13632.3). Total num frames: 2932736. Throughput: 0: 2785.0. Samples: 662248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:24,138][123391] Avg episode reward: [(0, '4.683')]
[2025-01-03 20:17:24,300][123478] Updated weights for policy 0, policy_version 717 (0.0024)
[2025-01-03 20:17:28,473][123478] Updated weights for policy 0, policy_version 727 (0.0024)
[2025-01-03 20:17:29,138][123391] Fps is (10 sec: 10239.8, 60 sec: 10922.6, 300 sec: 13537.3). Total num frames: 2981888. Throughput: 0: 2769.2. Samples: 677044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:29,138][123391] Avg episode reward: [(0, '4.800')]
[2025-01-03 20:17:31,355][123478] Updated weights for policy 0, policy_version 737 (0.0016)
[2025-01-03 20:17:33,599][123478] Updated weights for policy 0, policy_version 747 (0.0012)
[2025-01-03 20:17:34,138][123391] Fps is (10 sec: 13516.8, 60 sec: 11468.8, 300 sec: 13626.7). Total num frames: 3067904. Throughput: 0: 2845.3. Samples: 688162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:17:34,138][123391] Avg episode reward: [(0, '4.541')]
[2025-01-03 20:17:36,901][123478] Updated weights for policy 0, policy_version 757 (0.0021)
[2025-01-03 20:17:39,138][123391] Fps is (10 sec: 13926.3, 60 sec: 11468.8, 300 sec: 13555.8). Total num frames: 3121152. Throughput: 0: 2971.0. Samples: 709146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:39,138][123391] Avg episode reward: [(0, '4.735')]
[2025-01-03 20:17:40,906][123478] Updated weights for policy 0, policy_version 767 (0.0025)
[2025-01-03 20:17:43,752][123478] Updated weights for policy 0, policy_version 777 (0.0016)
[2025-01-03 20:17:44,137][123391] Fps is (10 sec: 11878.7, 60 sec: 11742.0, 300 sec: 13545.4). Total num frames: 3186688. Throughput: 0: 3026.3. Samples: 727394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:44,138][123391] Avg episode reward: [(0, '4.487')]
[2025-01-03 20:17:45,945][123478] Updated weights for policy 0, policy_version 787 (0.0012)
[2025-01-03 20:17:49,138][123391] Fps is (10 sec: 13926.4, 60 sec: 12083.2, 300 sec: 13572.7). Total num frames: 3260416. Throughput: 0: 3154.9. Samples: 740726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:17:49,138][123391] Avg episode reward: [(0, '4.586')]
[2025-01-03 20:17:49,351][123478] Updated weights for policy 0, policy_version 797 (0.0021)
[2025-01-03 20:17:53,110][123478] Updated weights for policy 0, policy_version 807 (0.0023)
[2025-01-03 20:17:54,138][123391] Fps is (10 sec: 12697.2, 60 sec: 12083.2, 300 sec: 13507.7). Total num frames: 3313664. Throughput: 0: 3174.2. Samples: 757334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:17:54,138][123391] Avg episode reward: [(0, '4.911')]
[2025-01-03 20:17:56,800][123478] Updated weights for policy 0, policy_version 817 (0.0023)
[2025-01-03 20:17:59,138][123391] Fps is (10 sec: 11059.2, 60 sec: 12219.7, 300 sec: 13463.4). Total num frames: 3371008. Throughput: 0: 3184.1. Samples: 773826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:17:59,138][123391] Avg episode reward: [(0, '4.832')]
[2025-01-03 20:17:59,146][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000823_3371008.pth...
[2025-01-03 20:17:59,216][123451] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth
[2025-01-03 20:18:00,633][123478] Updated weights for policy 0, policy_version 827 (0.0023)
[2025-01-03 20:18:04,138][123391] Fps is (10 sec: 11059.2, 60 sec: 12219.8, 300 sec: 13403.5). Total num frames: 3424256. Throughput: 0: 3083.8. Samples: 781930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:18:04,138][123391] Avg episode reward: [(0, '4.782')]
[2025-01-03 20:18:04,337][123478] Updated weights for policy 0, policy_version 837 (0.0024)
[2025-01-03 20:18:08,141][123478] Updated weights for policy 0, policy_version 847 (0.0023)
[2025-01-03 20:18:09,137][123391] Fps is (10 sec: 11469.2, 60 sec: 12424.6, 300 sec: 13380.3). Total num frames: 3485696. Throughput: 0: 3015.2. Samples: 797932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:18:09,138][123391] Avg episode reward: [(0, '4.528')]
[2025-01-03 20:18:10,675][123478] Updated weights for policy 0, policy_version 857 (0.0015)
[2025-01-03 20:18:14,137][123391] Fps is (10 sec: 12288.1, 60 sec: 12424.5, 300 sec: 13358.0). Total num frames: 3547136. Throughput: 0: 3142.4. Samples: 818452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-01-03 20:18:14,138][123391] Avg episode reward: [(0, '4.461')]
[2025-01-03 20:18:14,195][123478] Updated weights for policy 0, policy_version 867 (0.0022)
[2025-01-03 20:18:17,796][123478] Updated weights for policy 0, policy_version 877 (0.0023)
[2025-01-03 20:18:19,138][123391] Fps is (10 sec: 11878.1, 60 sec: 12083.2, 300 sec: 13320.2). Total num frames: 3604480. Throughput: 0: 3085.5. Samples: 827008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:18:19,138][123391] Avg episode reward: [(0, '4.579')]
[2025-01-03 20:18:21,491][123478] Updated weights for policy 0, policy_version 887 (0.0022)
[2025-01-03 20:18:24,138][123391] Fps is (10 sec: 11058.9, 60 sec: 12083.2, 300 sec: 13267.8). Total num frames: 3657728. Throughput: 0: 2987.6. Samples: 843590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:18:24,139][123391] Avg episode reward: [(0, '4.715')]
[2025-01-03 20:18:25,485][123478] Updated weights for policy 0, policy_version 897 (0.0024)
[2025-01-03 20:18:28,544][123478] Updated weights for policy 0, policy_version 907 (0.0016)
[2025-01-03 20:18:29,137][123391] Fps is (10 sec: 11878.7, 60 sec: 12356.3, 300 sec: 13264.8). Total num frames: 3723264. Throughput: 0: 2980.1. Samples: 861498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:18:29,138][123391] Avg episode reward: [(0, '4.847')]
[2025-01-03 20:18:30,846][123478] Updated weights for policy 0, policy_version 917 (0.0012)
[2025-01-03 20:18:33,146][123478] Updated weights for policy 0, policy_version 927 (0.0012)
[2025-01-03 20:18:34,137][123391] Fps is (10 sec: 15565.4, 60 sec: 12424.6, 300 sec: 13354.5). Total num frames: 3813376. Throughput: 0: 2986.6. Samples: 875122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-01-03 20:18:34,138][123391] Avg episode reward: [(0, '4.613')]
[2025-01-03 20:18:35,449][123478] Updated weights for policy 0, policy_version 937 (0.0012)
[2025-01-03 20:18:37,765][123478] Updated weights for policy 0, policy_version 947 (0.0012)
[2025-01-03 20:18:39,138][123391] Fps is (10 sec: 16793.1, 60 sec: 12834.1, 300 sec: 13395.4). Total num frames: 3891200. Throughput: 0: 3205.3. Samples: 901574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:18:39,138][123391] Avg episode reward: [(0, '4.682')]
[2025-01-03 20:18:41,974][123478] Updated weights for policy 0, policy_version 957 (0.0027)
[2025-01-03 20:18:44,138][123391] Fps is (10 sec: 12287.6, 60 sec: 12492.7, 300 sec: 13315.7). Total num frames: 3936256. Throughput: 0: 3158.8. Samples: 915972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:18:44,138][123391] Avg episode reward: [(0, '4.496')]
[2025-01-03 20:18:46,251][123478] Updated weights for policy 0, policy_version 967 (0.0027)
[2025-01-03 20:18:49,138][123391] Fps is (10 sec: 9830.5, 60 sec: 12151.5, 300 sec: 13268.1). Total num frames: 3989504. Throughput: 0: 3144.5. Samples: 923432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-01-03 20:18:49,138][123391] Avg episode reward: [(0, '4.729')]
[2025-01-03 20:18:50,000][123478] Updated weights for policy 0, policy_version 977 (0.0022)
[2025-01-03 20:18:50,382][123451] Stopping Batcher_0...
[2025-01-03 20:18:50,382][123391] Component Batcher_0 stopped!
[2025-01-03 20:18:50,383][123451] Loop batcher_evt_loop terminating...
[2025-01-03 20:18:50,384][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-03 20:18:50,418][123478] Weights refcount: 2 0
[2025-01-03 20:18:50,421][123478] Stopping InferenceWorker_p0-w0...
[2025-01-03 20:18:50,422][123478] Loop inference_proc0-0_evt_loop terminating...
[2025-01-03 20:18:50,424][123391] Component InferenceWorker_p0-w0 stopped!
[2025-01-03 20:18:50,467][123479] Stopping RolloutWorker_w1...
[2025-01-03 20:18:50,467][123391] Component RolloutWorker_w1 stopped!
[2025-01-03 20:18:50,467][123482] Stopping RolloutWorker_w5...
[2025-01-03 20:18:50,468][123391] Component RolloutWorker_w5 stopped!
[2025-01-03 20:18:50,468][123482] Loop rollout_proc5_evt_loop terminating...
[2025-01-03 20:18:50,468][123479] Loop rollout_proc1_evt_loop terminating...
[2025-01-03 20:18:50,469][123451] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth
[2025-01-03 20:18:50,470][123480] Stopping RolloutWorker_w2...
[2025-01-03 20:18:50,470][123391] Component RolloutWorker_w2 stopped!
[2025-01-03 20:18:50,471][123483] Stopping RolloutWorker_w4...
[2025-01-03 20:18:50,471][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-03 20:18:50,471][123480] Loop rollout_proc2_evt_loop terminating...
[2025-01-03 20:18:50,471][123495] Stopping RolloutWorker_w7...
[2025-01-03 20:18:50,472][123391] Component RolloutWorker_w4 stopped!
[2025-01-03 20:18:50,472][123483] Loop rollout_proc4_evt_loop terminating...
[2025-01-03 20:18:50,472][123495] Loop rollout_proc7_evt_loop terminating...
[2025-01-03 20:18:50,472][123391] Component RolloutWorker_w7 stopped!
[2025-01-03 20:18:50,473][123496] Stopping RolloutWorker_w6...
[2025-01-03 20:18:50,473][123391] Component RolloutWorker_w6 stopped!
[2025-01-03 20:18:50,474][123496] Loop rollout_proc6_evt_loop terminating...
[2025-01-03 20:18:50,476][123391] Component RolloutWorker_w0 stopped!
[2025-01-03 20:18:50,475][123477] Stopping RolloutWorker_w0...
[2025-01-03 20:18:50,477][123477] Loop rollout_proc0_evt_loop terminating...
[2025-01-03 20:18:50,478][123481] Stopping RolloutWorker_w3...
[2025-01-03 20:18:50,478][123391] Component RolloutWorker_w3 stopped!
[2025-01-03 20:18:50,478][123481] Loop rollout_proc3_evt_loop terminating...
[2025-01-03 20:18:50,560][123451] Stopping LearnerWorker_p0...
[2025-01-03 20:18:50,560][123391] Component LearnerWorker_p0 stopped!
[2025-01-03 20:18:50,560][123451] Loop learner_proc0_evt_loop terminating...
[2025-01-03 20:18:50,566][123391] Waiting for process learner_proc0 to stop...
[2025-01-03 20:18:51,932][123391] Waiting for process inference_proc0-0 to join...
[2025-01-03 20:18:51,932][123391] Waiting for process rollout_proc0 to join...
[2025-01-03 20:18:51,932][123391] Waiting for process rollout_proc1 to join...
[2025-01-03 20:18:51,933][123391] Waiting for process rollout_proc2 to join...
[2025-01-03 20:18:51,933][123391] Waiting for process rollout_proc3 to join...
[2025-01-03 20:18:51,933][123391] Waiting for process rollout_proc4 to join...
[2025-01-03 20:18:51,934][123391] Waiting for process rollout_proc5 to join...
[2025-01-03 20:18:51,934][123391] Waiting for process rollout_proc6 to join...
[2025-01-03 20:18:51,934][123391] Waiting for process rollout_proc7 to join...
[2025-01-03 20:18:51,935][123391] Batcher 0 profile tree view:
batching: 12.4530, releasing_batches: 0.0290
[2025-01-03 20:18:51,935][123391] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
wait_policy_total: 5.2467
update_model: 4.3349
weight_update: 0.0023
one_step: 0.0058
handle_policy_step: 257.5488
deserialize: 9.9140, stack: 1.5193, obs_to_device_normalize: 62.8038, forward: 117.5268, send_messages: 18.9620
prepare_outputs: 36.3900
to_cpu: 24.3378
[2025-01-03 20:18:51,935][123391] Learner 0 profile tree view:
misc: 0.0043, prepare_batch: 12.7592
train: 63.0072
epoch_init: 0.0058, minibatch_init: 0.0066, losses_postprocess: 0.3273, kl_divergence: 0.3661, after_optimizer: 1.2819
calculate_losses: 21.5594
losses_init: 0.0036, forward_head: 1.1328, bptt_initial: 15.3543, tail: 0.6870, advantages_returns: 0.1975, losses: 2.7943
bptt: 1.1874
bptt_forward_core: 1.1200
update: 39.0545
clip: 0.8884
[2025-01-03 20:18:51,936][123391] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1788, enqueue_policy_requests: 12.9224, env_step: 155.4093, overhead: 8.7577, complete_rollouts: 0.2908
save_policy_outputs: 14.6290
split_output_tensors: 4.7553
[2025-01-03 20:18:51,936][123391] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1781, enqueue_policy_requests: 13.0374, env_step: 155.4683, overhead: 8.8047, complete_rollouts: 0.2857
save_policy_outputs: 14.5489
split_output_tensors: 4.7563
[2025-01-03 20:18:51,936][123391] Loop Runner_EvtLoop terminating...
[2025-01-03 20:18:51,937][123391] Runner profile tree view:
main_loop: 289.0310
[2025-01-03 20:18:51,937][123391] Collected {0: 4005888}, FPS: 12910.2
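[Editor's note] Reading the profile trees above: times are cumulative wall-clock seconds per component over the whole run. Each rollout worker spent roughly 155 s of the 289 s main loop inside env_step, and the inference worker spent about 117.5 s of its 257.5 s handle_policy_step in the policy forward pass. A quick share calculation (numbers transcribed from the trees above):

    main_loop = 289.0310            # Runner profile above
    env_step_w0 = 155.4093          # RolloutWorker_w0 profile above
    forward = 117.5268              # InferenceWorker handle_policy_step breakdown
    handle_policy_step = 257.5488
    print(f"env stepping: {env_step_w0 / main_loop:.0%} of the run")          # ~54%
    print(f"policy forward: {forward / handle_policy_step:.0%} of inference")  # ~46%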
[2025-01-03 20:18:52,336][123391] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json
[2025-01-03 20:18:52,336][123391] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-03 20:18:52,336][123391] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-03 20:18:52,336][123391] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-03 20:18:52,337][123391] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-03 20:18:52,337][123391] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-03 20:18:52,337][123391] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-01-03 20:18:52,337][123391] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-03 20:18:52,337][123391] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-01-03 20:18:52,337][123391] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-01-03 20:18:52,337][123391] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-03 20:18:52,338][123391] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-03 20:18:52,338][123391] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-03 20:18:52,338][123391] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-01-03 20:18:52,338][123391] Using frameskip 1 and render_action_repeat=4 for evaluation
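[Editor's note] The override block above is what sample-factory's enjoy entry point logs when evaluation flags are layered on top of the saved training config. A minimal sketch of the call that would produce it, assuming the sample-factory 2.x sf_examples layout and the parse_vizdoom_cfg / registration helpers from the Hugging Face Deep RL course notebook (those helpers and the module paths are assumptions, not taken from this log):

    import functools

    from sample_factory.algo.utils.context import global_model_factory
    from sample_factory.cfg.arg_parsing import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy
    from sample_factory.envs.env_utils import register_env
    from sf_examples.vizdoom.doom.doom_model import make_vizdoom_encoder
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec

    def register_vizdoom_components():
        # Register every bundled Doom env (including doom_health_gathering_supreme)
        # plus the VizdoomEncoder seen in the architecture printout earlier.
        for env_spec in DOOM_ENVS:
            register_env(env_spec.name, functools.partial(make_doom_env_from_spec, env_spec))
        global_model_factory().register_encoder_factory(make_vizdoom_encoder)

    def parse_vizdoom_cfg(argv=None, evaluation=False):
        # Rebuild the training config, then apply the evaluation overrides above.
        parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
        add_doom_env_args(parser)
        doom_override_defaults(parser)
        return parse_full_cfg(parser, argv)

    register_vizdoom_components()
    cfg = parse_vizdoom_cfg(
        argv=["--env=doom_health_gathering_supreme", "--num_workers=1",
              "--save_video", "--no_render", "--max_num_episodes=10"],
        evaluation=True,
    )
    enjoy(cfg)  # loads the newest checkpoint from train_dir and writes replay.mp4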
[2025-01-03 20:18:52,372][123391] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:18:52,374][123391] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:18:52,375][123391] RunningMeanStd input shape: (1,)
[2025-01-03 20:18:52,389][123391] ConvEncoder: input_channels=3
[2025-01-03 20:18:52,527][123391] Conv encoder output size: 512
[2025-01-03 20:18:52,527][123391] Policy head output size: 512
[2025-01-03 20:18:52,687][123391] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-03 20:18:53,452][123391] Num frames 100...
[2025-01-03 20:18:53,571][123391] Num frames 200...
[2025-01-03 20:18:53,697][123391] Num frames 300...
[2025-01-03 20:18:53,852][123391] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2025-01-03 20:18:53,853][123391] Avg episode reward: 3.840, avg true_objective: 3.840
[2025-01-03 20:18:53,903][123391] Num frames 400...
[2025-01-03 20:18:54,017][123391] Num frames 500...
[2025-01-03 20:18:54,127][123391] Num frames 600...
[2025-01-03 20:18:54,240][123391] Num frames 700...
[2025-01-03 20:18:54,350][123391] Num frames 800...
[2025-01-03 20:18:54,478][123391] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
[2025-01-03 20:18:54,478][123391] Avg episode reward: 5.320, avg true_objective: 4.320
[2025-01-03 20:18:54,518][123391] Num frames 900...
[2025-01-03 20:18:54,620][123391] Num frames 1000...
[2025-01-03 20:18:54,725][123391] Num frames 1100...
[2025-01-03 20:18:54,823][123391] Num frames 1200...
[2025-01-03 20:18:54,924][123391] Num frames 1300...
[2025-01-03 20:18:55,022][123391] Num frames 1400...
[2025-01-03 20:18:55,084][123391] Avg episode rewards: #0: 6.027, true rewards: #0: 4.693
[2025-01-03 20:18:55,084][123391] Avg episode reward: 6.027, avg true_objective: 4.693
[2025-01-03 20:18:55,182][123391] Num frames 1500...
[2025-01-03 20:18:55,286][123391] Num frames 1600...
[2025-01-03 20:18:55,389][123391] Num frames 1700...
[2025-01-03 20:18:55,531][123391] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2025-01-03 20:18:55,531][123391] Avg episode reward: 5.480, avg true_objective: 4.480
[2025-01-03 20:18:55,545][123391] Num frames 1800...
[2025-01-03 20:18:55,649][123391] Num frames 1900...
[2025-01-03 20:18:55,748][123391] Num frames 2000...
[2025-01-03 20:18:55,843][123391] Num frames 2100...
[2025-01-03 20:18:55,967][123391] Avg episode rewards: #0: 5.152, true rewards: #0: 4.352
[2025-01-03 20:18:55,968][123391] Avg episode reward: 5.152, avg true_objective: 4.352
[2025-01-03 20:18:55,998][123391] Num frames 2200...
[2025-01-03 20:18:56,105][123391] Num frames 2300...
[2025-01-03 20:18:56,203][123391] Num frames 2400...
[2025-01-03 20:18:56,305][123391] Num frames 2500...
[2025-01-03 20:18:56,407][123391] Num frames 2600...
[2025-01-03 20:18:56,485][123391] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373
[2025-01-03 20:18:56,485][123391] Avg episode reward: 5.373, avg true_objective: 4.373
[2025-01-03 20:18:56,560][123391] Num frames 2700...
[2025-01-03 20:18:56,659][123391] Num frames 2800...
[2025-01-03 20:18:56,760][123391] Num frames 2900...
[2025-01-03 20:18:56,860][123391] Num frames 3000...
[2025-01-03 20:18:56,923][123391] Avg episode rewards: #0: 5.154, true rewards: #0: 4.297
[2025-01-03 20:18:56,923][123391] Avg episode reward: 5.154, avg true_objective: 4.297
[2025-01-03 20:18:57,018][123391] Num frames 3100...
[2025-01-03 20:18:57,114][123391] Num frames 3200...
[2025-01-03 20:18:57,211][123391] Num frames 3300...
[2025-01-03 20:18:57,359][123391] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240
[2025-01-03 20:18:57,359][123391] Avg episode reward: 4.990, avg true_objective: 4.240
[2025-01-03 20:18:57,370][123391] Num frames 3400...
[2025-01-03 20:18:57,473][123391] Num frames 3500...
[2025-01-03 20:18:57,571][123391] Num frames 3600...
[2025-01-03 20:18:57,669][123391] Num frames 3700...
[2025-01-03 20:18:57,771][123391] Num frames 3800...
[2025-01-03 20:18:57,864][123391] Avg episode rewards: #0: 5.044, true rewards: #0: 4.267
[2025-01-03 20:18:57,865][123391] Avg episode reward: 5.044, avg true_objective: 4.267
[2025-01-03 20:18:57,959][123391] Num frames 3900...
[2025-01-03 20:18:58,060][123391] Num frames 4000...
[2025-01-03 20:18:58,165][123391] Num frames 4100...
[2025-01-03 20:18:58,267][123391] Num frames 4200...
[2025-01-03 20:18:58,346][123391] Avg episode rewards: #0: 4.924, true rewards: #0: 4.224
[2025-01-03 20:18:58,346][123391] Avg episode reward: 4.924, avg true_objective: 4.224
[2025-01-03 20:19:05,243][123391] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4!
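[Editor's note] The "Avg episode rewards" lines in the evaluation block above are running means over the episodes finished so far. Reconstructing the per-episode true rewards from those running means (pure arithmetic on this log, no outside data) and re-averaging reproduces the printed sequence:

    episode_true_rewards = [3.84, 4.80, 5.44, 3.84, 3.84, 4.48, 3.84, 3.84, 4.48, 3.84]
    total = 0.0
    running_means = []
    for n, r in enumerate(episode_true_rewards, start=1):
        total += r
        running_means.append(round(total / n, 3))
    print(running_means)
    # [3.84, 4.32, 4.693, 4.48, 4.352, 4.373, 4.297, 4.24, 4.267, 4.224] -- matches the log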
[2025-01-03 20:24:09,504][124806] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json...
[2025-01-03 20:24:09,504][124806] Rollout worker 0 uses device cpu
[2025-01-03 20:24:09,505][124806] Rollout worker 1 uses device cpu
[2025-01-03 20:24:09,505][124806] Rollout worker 2 uses device cpu
[2025-01-03 20:24:09,505][124806] Rollout worker 3 uses device cpu
[2025-01-03 20:24:09,505][124806] Rollout worker 4 uses device cpu
[2025-01-03 20:24:09,505][124806] Rollout worker 5 uses device cpu
[2025-01-03 20:24:09,505][124806] Rollout worker 6 uses device cpu
[2025-01-03 20:24:09,506][124806] Rollout worker 7 uses device cpu
[2025-01-03 20:24:09,546][124806] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:24:09,547][124806] InferenceWorker_p0-w0: min num requests: 2
[2025-01-03 20:24:09,578][124806] Starting all processes...
[2025-01-03 20:24:09,578][124806] Starting process learner_proc0
[2025-01-03 20:24:10,976][124806] Starting all processes...
[2025-01-03 20:24:10,980][124806] Starting process inference_proc0-0
[2025-01-03 20:24:10,980][124806] Starting process rollout_proc0
[2025-01-03 20:24:10,984][124851] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:24:10,980][124806] Starting process rollout_proc1
[2025-01-03 20:24:10,984][124851] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-01-03 20:24:10,980][124806] Starting process rollout_proc2
[2025-01-03 20:24:10,984][124806] Starting process rollout_proc3
[2025-01-03 20:24:10,984][124806] Starting process rollout_proc4
[2025-01-03 20:24:11,000][124851] Num visible devices: 1
[2025-01-03 20:24:10,984][124806] Starting process rollout_proc5
[2025-01-03 20:24:10,984][124806] Starting process rollout_proc6
[2025-01-03 20:24:11,010][124851] Starting seed is not provided
[2025-01-03 20:24:11,010][124851] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:24:11,010][124851] Initializing actor-critic model on device cuda:0
[2025-01-03 20:24:11,011][124851] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:24:11,011][124851] RunningMeanStd input shape: (1,)
[2025-01-03 20:24:10,984][124806] Starting process rollout_proc7
[2025-01-03 20:24:11,022][124851] ConvEncoder: input_channels=3
[2025-01-03 20:24:11,129][124851] Conv encoder output size: 512
[2025-01-03 20:24:11,129][124851] Policy head output size: 512
[2025-01-03 20:24:11,152][124851] Created Actor Critic model with architecture:
[2025-01-03 20:24:11,153][124851] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
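[Editor's note] The printout above omits kernel sizes and strides. A minimal PyTorch sketch of the same conv head, assuming sample-factory's default convnet_simple filters (32@8 stride 4, 64@4 stride 2, 128@3 stride 2 -- an assumption, since they are not in this log), reproduces the logged encoder output size of 512 for the (3, 72, 128) observation:

    import torch
    from torch import nn

    conv_head = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
        nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
    )
    obs = torch.zeros(1, 3, 72, 128)    # matches "RunningMeanStd input shape: (3, 72, 128)"
    feats = conv_head(obs).flatten(1)   # -> (1, 2304) with the assumed filters
    mlp_layers = nn.Sequential(nn.Linear(feats.shape[1], 512), nn.ELU())
    print(mlp_layers(feats).shape)      # torch.Size([1, 512]) == "Conv encoder output size: 512"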
[2025-01-03 20:24:11,314][124851] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-01-03 20:24:12,887][124851] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-01-03 20:24:12,971][124851] Loading model from checkpoint
[2025-01-03 20:24:12,973][124851] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2025-01-03 20:24:12,974][124851] Initialized policy 0 weights for model version 978
[2025-01-03 20:24:12,976][124851] LearnerWorker_p0 finished initialization!
[2025-01-03 20:24:12,977][124851] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:24:13,260][124876] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,269][124894] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,306][124878] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,403][124886] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,487][124895] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,524][124889] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,548][124893] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,643][124875] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:24:13,643][124875] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-01-03 20:24:13,655][124875] Num visible devices: 1
[2025-01-03 20:24:13,672][124806] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-03 20:24:13,677][124877] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:24:13,737][124875] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:24:13,738][124875] RunningMeanStd input shape: (1,)
[2025-01-03 20:24:13,746][124875] ConvEncoder: input_channels=3
[2025-01-03 20:24:13,832][124875] Conv encoder output size: 512
[2025-01-03 20:24:13,832][124875] Policy head output size: 512
[2025-01-03 20:24:13,856][124806] Inference worker 0-0 is ready!
[2025-01-03 20:24:13,857][124806] All inference workers are ready! Signal rollout workers to start!
[2025-01-03 20:24:13,888][124886] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:13,889][124877] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:13,905][124876] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:13,905][124889] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:13,905][124895] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:13,905][124894] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:13,905][124878] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:13,905][124893] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:14,164][124886] Decorrelating experience for 0 frames...
[2025-01-03 20:24:14,168][124877] Decorrelating experience for 0 frames...
[2025-01-03 20:24:14,181][124889] Decorrelating experience for 0 frames...
[2025-01-03 20:24:14,181][124878] Decorrelating experience for 0 frames...
[2025-01-03 20:24:14,182][124893] Decorrelating experience for 0 frames...
[2025-01-03 20:24:14,281][124876] Decorrelating experience for 0 frames...
[2025-01-03 20:24:14,283][124895] Decorrelating experience for 0 frames...
[2025-01-03 20:24:14,402][124889] Decorrelating experience for 32 frames...
[2025-01-03 20:24:14,405][124893] Decorrelating experience for 32 frames...
[2025-01-03 20:24:14,408][124878] Decorrelating experience for 32 frames...
[2025-01-03 20:24:14,521][124876] Decorrelating experience for 32 frames...
[2025-01-03 20:24:14,637][124886] Decorrelating experience for 32 frames...
[2025-01-03 20:24:14,641][124877] Decorrelating experience for 32 frames...
[2025-01-03 20:24:14,654][124895] Decorrelating experience for 32 frames...
[2025-01-03 20:24:14,872][124878] Decorrelating experience for 64 frames...
[2025-01-03 20:24:14,883][124889] Decorrelating experience for 64 frames...
[2025-01-03 20:24:14,916][124877] Decorrelating experience for 64 frames...
[2025-01-03 20:24:14,919][124886] Decorrelating experience for 64 frames...
[2025-01-03 20:24:15,129][124894] Decorrelating experience for 0 frames...
[2025-01-03 20:24:15,143][124878] Decorrelating experience for 96 frames...
[2025-01-03 20:24:15,152][124889] Decorrelating experience for 96 frames...
[2025-01-03 20:24:15,160][124895] Decorrelating experience for 64 frames...
[2025-01-03 20:24:15,192][124886] Decorrelating experience for 96 frames...
[2025-01-03 20:24:15,367][124876] Decorrelating experience for 64 frames...
[2025-01-03 20:24:15,372][124894] Decorrelating experience for 32 frames...
[2025-01-03 20:24:15,430][124895] Decorrelating experience for 96 frames...
[2025-01-03 20:24:15,456][124877] Decorrelating experience for 96 frames...
[2025-01-03 20:24:15,634][124893] Decorrelating experience for 64 frames...
[2025-01-03 20:24:15,669][124876] Decorrelating experience for 96 frames...
[2025-01-03 20:24:15,720][124894] Decorrelating experience for 64 frames...
[2025-01-03 20:24:15,958][124893] Decorrelating experience for 96 frames...
[2025-01-03 20:24:16,049][124894] Decorrelating experience for 96 frames...
[2025-01-03 20:24:16,121][124851] Signal inference workers to stop experience collection...
[2025-01-03 20:24:16,147][124875] InferenceWorker_p0-w0: stopping experience collection
[2025-01-03 20:24:16,246][124806] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-03 20:24:16,247][124806] Avg episode reward: [(0, '2.533')]
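[Editor's note] This restarted run shuts down within seconds because the loaded checkpoint already satisfies the training budget: it resumes at env_steps=4,005,888, just past a 4M-step target (assuming the course's --train_for_env_steps=4000000, which is not recorded in this log), so the learner trains only two more policy versions (978 -> 980, 4,005,888 -> 4,014,080 frames) before the shutdown sequence below. A sketch of the check, not sample-factory's actual code:

    env_steps_loaded = 4_005_888     # "Loaded experiment state at ... env_steps=4005888"
    train_for_env_steps = 4_000_000  # assumed course budget; not printed in this log
    if env_steps_loaded >= train_for_env_steps:
        print("budget already met -> stop components after the in-flight batch")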
[2025-01-03 20:24:18,392][124851] Signal inference workers to resume experience collection...
[2025-01-03 20:24:18,393][124851] Stopping Batcher_0...
[2025-01-03 20:24:18,393][124851] Loop batcher_evt_loop terminating...
[2025-01-03 20:24:18,399][124806] Component Batcher_0 stopped!
[2025-01-03 20:24:18,411][124875] Weights refcount: 2 0
[2025-01-03 20:24:18,413][124875] Stopping InferenceWorker_p0-w0...
[2025-01-03 20:24:18,413][124806] Component InferenceWorker_p0-w0 stopped!
[2025-01-03 20:24:18,413][124875] Loop inference_proc0-0_evt_loop terminating...
[2025-01-03 20:24:18,446][124895] Stopping RolloutWorker_w7...
[2025-01-03 20:24:18,446][124806] Component RolloutWorker_w7 stopped!
[2025-01-03 20:24:18,447][124895] Loop rollout_proc7_evt_loop terminating...
[2025-01-03 20:24:18,448][124889] Stopping RolloutWorker_w4...
[2025-01-03 20:24:18,449][124889] Loop rollout_proc4_evt_loop terminating...
[2025-01-03 20:24:18,449][124806] Component RolloutWorker_w4 stopped!
[2025-01-03 20:24:18,450][124877] Stopping RolloutWorker_w1...
[2025-01-03 20:24:18,450][124806] Component RolloutWorker_w1 stopped!
[2025-01-03 20:24:18,450][124877] Loop rollout_proc1_evt_loop terminating...
[2025-01-03 20:24:18,451][124806] Component RolloutWorker_w6 stopped!
[2025-01-03 20:24:18,451][124893] Stopping RolloutWorker_w6...
[2025-01-03 20:24:18,452][124893] Loop rollout_proc6_evt_loop terminating...
[2025-01-03 20:24:18,452][124876] Stopping RolloutWorker_w0...
[2025-01-03 20:24:18,452][124806] Component RolloutWorker_w0 stopped!
[2025-01-03 20:24:18,453][124876] Loop rollout_proc0_evt_loop terminating...
[2025-01-03 20:24:18,453][124806] Component RolloutWorker_w5 stopped!
[2025-01-03 20:24:18,454][124878] Stopping RolloutWorker_w2...
[2025-01-03 20:24:18,454][124806] Component RolloutWorker_w2 stopped!
[2025-01-03 20:24:18,454][124878] Loop rollout_proc2_evt_loop terminating...
[2025-01-03 20:24:18,453][124894] Stopping RolloutWorker_w5...
[2025-01-03 20:24:18,457][124894] Loop rollout_proc5_evt_loop terminating...
[2025-01-03 20:24:18,458][124886] Stopping RolloutWorker_w3...
[2025-01-03 20:24:18,459][124806] Component RolloutWorker_w3 stopped!
[2025-01-03 20:24:18,459][124886] Loop rollout_proc3_evt_loop terminating...
[2025-01-03 20:24:18,841][124851] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-01-03 20:24:18,925][124851] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000823_3371008.pth
[2025-01-03 20:24:18,927][124851] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-01-03 20:24:18,992][124851] Stopping LearnerWorker_p0...
[2025-01-03 20:24:18,996][124851] Loop learner_proc0_evt_loop terminating...
[2025-01-03 20:24:18,993][124806] Component LearnerWorker_p0 stopped!
[2025-01-03 20:24:18,999][124806] Waiting for process learner_proc0 to stop...
[2025-01-03 20:24:19,664][124806] Waiting for process inference_proc0-0 to join...
[2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc0 to join...
[2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc1 to join...
[2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc2 to join...
[2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc3 to join...
[2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc4 to join...
[2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc5 to join...
[2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc6 to join...
[2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc7 to join...
[2025-01-03 20:24:19,665][124806] Batcher 0 profile tree view:
batching: 0.0192, releasing_batches: 0.0005
[2025-01-03 20:24:19,665][124806] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0062
wait_policy: 0.0001
wait_policy_total: 1.3388
one_step: 0.0029
handle_policy_step: 0.9018
deserialize: 0.0248, stack: 0.0034, obs_to_device_normalize: 0.1673, forward: 0.5922, send_messages: 0.0335
prepare_outputs: 0.0583
to_cpu: 0.0322
[2025-01-03 20:24:19,666][124806] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 0.7957
train: 2.1472
epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0058, after_optimizer: 0.0174
calculate_losses: 0.4651
losses_init: 0.0000, forward_head: 0.3805, bptt_initial: 0.0572, tail: 0.0086, advantages_returns: 0.0007, losses: 0.0139
bptt: 0.0037
bptt_forward_core: 0.0036
update: 1.6574
clip: 0.0233
[2025-01-03 20:24:19,666][124806] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.0243, env_step: 0.2195, overhead: 0.0131, complete_rollouts: 0.0005
save_policy_outputs: 0.0223
split_output_tensors: 0.0074
[2025-01-03 20:24:19,666][124806] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0007, enqueue_policy_requests: 0.0367, env_step: 0.3178, overhead: 0.0197, complete_rollouts: 0.0007
save_policy_outputs: 0.0324
split_output_tensors: 0.0107
[2025-01-03 20:24:19,666][124806] Loop Runner_EvtLoop terminating...
[2025-01-03 20:24:19,667][124806] Runner profile tree view:
main_loop: 10.0887
[2025-01-03 20:24:19,667][124806] Collected {0: 4014080}, FPS: 812.0
[2025-01-03 20:24:19,872][124806] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json
[2025-01-03 20:24:19,873][124806] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-03 20:24:19,873][124806] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-03 20:24:19,873][124806] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-03 20:24:19,874][124806] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-01-03 20:24:19,874][124806] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-01-03 20:24:19,896][124806] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:24:19,897][124806] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:24:19,897][124806] RunningMeanStd input shape: (1,)
[2025-01-03 20:24:19,906][124806] ConvEncoder: input_channels=3
[2025-01-03 20:24:19,993][124806] Conv encoder output size: 512
[2025-01-03 20:24:19,993][124806] Policy head output size: 512
[2025-01-03 20:24:20,107][124806] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-01-03 20:24:20,628][124806] Num frames 100...
[2025-01-03 20:24:20,721][124806] Num frames 200...
[2025-01-03 20:24:20,813][124806] Num frames 300...
[2025-01-03 20:24:20,887][124806] Avg episode rewards: #0: 4.200, true rewards: #0: 3.200
[2025-01-03 20:24:20,888][124806] Avg episode reward: 4.200, avg true_objective: 3.200
[2025-01-03 20:24:20,977][124806] Num frames 400...
[2025-01-03 20:24:21,078][124806] Num frames 500...
[2025-01-03 20:24:21,175][124806] Num frames 600...
[2025-01-03 20:24:21,270][124806] Num frames 700...
[2025-01-03 20:24:21,368][124806] Num frames 800...
[2025-01-03 20:24:21,484][124806] Avg episode rewards: #0: 6.320, true rewards: #0: 4.320
[2025-01-03 20:24:21,485][124806] Avg episode reward: 6.320, avg true_objective: 4.320
[2025-01-03 20:24:21,558][124806] Num frames 900...
[2025-01-03 20:24:21,654][124806] Num frames 1000...
[2025-01-03 20:24:21,746][124806] Num frames 1100...
[2025-01-03 20:24:21,839][124806] Num frames 1200...
[2025-01-03 20:24:21,932][124806] Num frames 1300...
[2025-01-03 20:24:21,997][124806] Avg episode rewards: #0: 6.040, true rewards: #0: 4.373
[2025-01-03 20:24:21,998][124806] Avg episode reward: 6.040, avg true_objective: 4.373
[2025-01-03 20:24:22,094][124806] Num frames 1400...
[2025-01-03 20:24:22,191][124806] Num frames 1500...
[2025-01-03 20:24:22,287][124806] Num frames 1600...
[2025-01-03 20:24:22,383][124806] Num frames 1700...
[2025-01-03 20:24:22,477][124806] Num frames 1800...
[2025-01-03 20:24:22,572][124806] Num frames 1900...
[2025-01-03 20:24:22,648][124806] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
[2025-01-03 20:24:22,648][124806] Avg episode reward: 6.800, avg true_objective: 4.800
[2025-01-03 20:24:22,735][124806] Num frames 2000...
[2025-01-03 20:24:22,831][124806] Num frames 2100...
[2025-01-03 20:24:22,931][124806] Num frames 2200...
[2025-01-03 20:24:23,031][124806] Num frames 2300...
[2025-01-03 20:24:23,152][124806] Avg episode rewards: #0: 6.536, true rewards: #0: 4.736
[2025-01-03 20:24:23,152][124806] Avg episode reward: 6.536, avg true_objective: 4.736
[2025-01-03 20:24:23,197][124806] Num frames 2400...
[2025-01-03 20:24:23,295][124806] Num frames 2500...
[2025-01-03 20:24:23,389][124806] Num frames 2600...
[2025-01-03 20:24:23,467][124806] Avg episode rewards: #0: 5.873, true rewards: #0: 4.373
[2025-01-03 20:24:23,468][124806] Avg episode reward: 5.873, avg true_objective: 4.373
[2025-01-03 20:24:23,580][124806] Num frames 2700...
[2025-01-03 20:24:23,676][124806] Num frames 2800...
[2025-01-03 20:24:23,775][124806] Num frames 2900...
[2025-01-03 20:24:23,872][124806] Num frames 3000...
[2025-01-03 20:24:23,935][124806] Avg episode rewards: #0: 5.583, true rewards: #0: 4.297
[2025-01-03 20:24:23,935][124806] Avg episode reward: 5.583, avg true_objective: 4.297
[2025-01-03 20:24:24,039][124806] Num frames 3100...
[2025-01-03 20:24:24,136][124806] Num frames 3200...
[2025-01-03 20:24:24,233][124806] Num frames 3300...
[2025-01-03 20:24:24,374][124806] Avg episode rewards: #0: 5.365, true rewards: #0: 4.240
[2025-01-03 20:24:24,374][124806] Avg episode reward: 5.365, avg true_objective: 4.240
[2025-01-03 20:24:24,389][124806] Num frames 3400...
[2025-01-03 20:24:24,492][124806] Num frames 3500...
[2025-01-03 20:24:24,592][124806] Num frames 3600...
[2025-01-03 20:24:24,687][124806] Num frames 3700...
[2025-01-03 20:24:24,780][124806] Num frames 3800...
[2025-01-03 20:24:24,871][124806] Avg episode rewards: #0: 5.378, true rewards: #0: 4.267
[2025-01-03 20:24:24,872][124806] Avg episode reward: 5.378, avg true_objective: 4.267
[2025-01-03 20:24:24,942][124806] Num frames 3900...
[2025-01-03 20:24:25,045][124806] Num frames 4000...
[2025-01-03 20:24:25,153][124806] Num frames 4100...
[2025-01-03 20:24:25,271][124806] Num frames 4200...
[2025-01-03 20:24:25,354][124806] Avg episode rewards: #0: 5.224, true rewards: #0: 4.224
[2025-01-03 20:24:25,354][124806] Avg episode reward: 5.224, avg true_objective: 4.224
[2025-01-03 20:24:31,929][124806] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4!
[2025-01-03 20:24:31,941][124806] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json
[2025-01-03 20:24:31,941][124806] Overriding arg 'num_workers' with value 1 passed from command line
[2025-01-03 20:24:31,941][124806] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-01-03 20:24:31,941][124806] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-01-03 20:24:31,942][124806] Using frameskip 1 and render_action_repeat=4 for evaluation
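[Editor's note] This final pass is the push-to-hub evaluation: the same enjoy entry point, now with upload flags and a frame cap. A sketch of the call matching the overrides above, reusing the parse_vizdoom_cfg helper from the earlier sketch (flag spellings follow sample-factory 2.x / the course notebook and are assumptions; the repository name is taken verbatim from this log):

    cfg = parse_vizdoom_cfg(
        argv=["--env=doom_health_gathering_supreme", "--num_workers=1",
              "--save_video", "--no_render", "--max_num_episodes=10",
              "--max_num_frames=100000", "--push_to_hub",
              "--hf_repository=spenning/rl_course_vizdoom_health_gathering_supreme"],
        evaluation=True,
    )
    enjoy(cfg)  # evaluates, saves replay.mp4, then uploads the experiment to the Hub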
[2025-01-03 20:24:31,959][124806] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:24:31,959][124806] RunningMeanStd input shape: (1,)
[2025-01-03 20:24:31,969][124806] ConvEncoder: input_channels=3
[2025-01-03 20:24:31,995][124806] Conv encoder output size: 512
[2025-01-03 20:24:31,995][124806] Policy head output size: 512
[2025-01-03 20:24:32,011][124806] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-01-03 20:24:32,364][124806] Num frames 100...
[2025-01-03 20:24:32,461][124806] Num frames 200...
[2025-01-03 20:24:32,554][124806] Num frames 300...
[2025-01-03 20:24:32,646][124806] Num frames 400...
[2025-01-03 20:24:32,717][124806] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
[2025-01-03 20:24:32,717][124806] Avg episode reward: 5.160, avg true_objective: 4.160
[2025-01-03 20:24:32,797][124806] Num frames 500...
[2025-01-03 20:24:32,891][124806] Num frames 600...
[2025-01-03 20:24:33,010][124806] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360
[2025-01-03 20:24:33,011][124806] Avg episode reward: 3.860, avg true_objective: 3.360
[2025-01-03 20:24:33,042][124806] Num frames 700...
[2025-01-03 20:24:33,139][124806] Num frames 800...
[2025-01-03 20:24:33,233][124806] Num frames 900...
[2025-01-03 20:24:33,327][124806] Num frames 1000...
[2025-01-03 20:24:33,423][124806] Num frames 1100...
[2025-01-03 20:24:33,504][124806] Avg episode rewards: #0: 4.400, true rewards: #0: 3.733
[2025-01-03 20:24:33,504][124806] Avg episode reward: 4.400, avg true_objective: 3.733
[2025-01-03 20:24:33,586][124806] Num frames 1200...
[2025-01-03 20:24:33,678][124806] Num frames 1300...
[2025-01-03 20:24:33,777][124806] Num frames 1400...
[2025-01-03 20:24:33,873][124806] Num frames 1500...
[2025-01-03 20:24:33,967][124806] Num frames 1600...
[2025-01-03 20:24:34,064][124806] Num frames 1700...
[2025-01-03 20:24:34,145][124806] Avg episode rewards: #0: 5.570, true rewards: #0: 4.320
[2025-01-03 20:24:34,145][124806] Avg episode reward: 5.570, avg true_objective: 4.320
[2025-01-03 20:24:34,245][124806] Num frames 1800...
[2025-01-03 20:24:34,342][124806] Num frames 1900...
[2025-01-03 20:24:34,438][124806] Num frames 2000...
[2025-01-03 20:24:34,536][124806] Num frames 2100...
[2025-01-03 20:24:34,602][124806] Avg episode rewards: #0: 5.224, true rewards: #0: 4.224
[2025-01-03 20:24:34,603][124806] Avg episode reward: 5.224, avg true_objective: 4.224
[2025-01-03 20:24:34,704][124806] Num frames 2200...
[2025-01-03 20:24:34,798][124806] Num frames 2300...
[2025-01-03 20:24:34,890][124806] Num frames 2400...
[2025-01-03 20:24:35,031][124806] Avg episode rewards: #0: 4.993, true rewards: #0: 4.160
[2025-01-03 20:24:35,032][124806] Avg episode reward: 4.993, avg true_objective: 4.160
[2025-01-03 20:24:35,038][124806] Num frames 2500...
[2025-01-03 20:24:35,142][124806] Num frames 2600...
[2025-01-03 20:24:35,238][124806] Num frames 2700...
[2025-01-03 20:24:35,335][124806] Num frames 2800...
[2025-01-03 20:24:35,432][124806] Num frames 2900...
[2025-01-03 20:24:35,526][124806] Num frames 3000...
[2025-01-03 20:24:35,619][124806] Avg episode rewards: #0: 5.629, true rewards: #0: 4.343
[2025-01-03 20:24:35,619][124806] Avg episode reward: 5.629, avg true_objective: 4.343
[2025-01-03 20:24:35,688][124806] Num frames 3100...
[2025-01-03 20:24:35,783][124806] Num frames 3200...
[2025-01-03 20:24:35,878][124806] Num frames 3300...
[2025-01-03 20:24:35,974][124806] Num frames 3400...
[2025-01-03 20:24:36,054][124806] Avg episode rewards: #0: 5.405, true rewards: #0: 4.280
[2025-01-03 20:24:36,055][124806] Avg episode reward: 5.405, avg true_objective: 4.280
[2025-01-03 20:24:36,142][124806] Num frames 3500...
[2025-01-03 20:24:36,244][124806] Num frames 3600...
[2025-01-03 20:24:36,343][124806] Num frames 3700...
[2025-01-03 20:24:36,440][124806] Num frames 3800...
[2025-01-03 20:24:36,502][124806] Avg episode rewards: #0: 5.231, true rewards: #0: 4.231
[2025-01-03 20:24:36,503][124806] Avg episode reward: 5.231, avg true_objective: 4.231
[2025-01-03 20:24:36,626][124806] Num frames 3900...
[2025-01-03 20:24:36,724][124806] Num frames 4000...
[2025-01-03 20:24:36,819][124806] Num frames 4100...
[2025-01-03 20:24:36,961][124806] Avg episode rewards: #0: 5.092, true rewards: #0: 4.192
[2025-01-03 20:24:36,961][124806] Avg episode reward: 5.092, avg true_objective: 4.192
[2025-01-03 20:24:43,480][124806] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4!