diff --git "a/sf_log.txt" "b/sf_log.txt"
--- "a/sf_log.txt"
+++ "b/sf_log.txt"
@@ -1,50 +1,52 @@
-[2023-02-22 19:44:15,206][06183] Saving configuration to /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
-[2023-02-22 19:44:16,254][06183] Rollout worker 0 uses device cpu
-[2023-02-22 19:44:16,257][06183] Rollout worker 1 uses device cpu
-[2023-02-22 19:44:16,261][06183] Rollout worker 2 uses device cpu
-[2023-02-22 19:44:16,263][06183] Rollout worker 3 uses device cpu
-[2023-02-22 19:44:16,266][06183] Rollout worker 4 uses device cpu
-[2023-02-22 19:44:16,269][06183] Rollout worker 5 uses device cpu
-[2023-02-22 19:44:16,273][06183] Rollout worker 6 uses device cpu
-[2023-02-22 19:44:16,276][06183] Rollout worker 7 uses device cpu
-[2023-02-22 19:44:16,339][06183] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-22 19:44:16,341][06183] InferenceWorker_p0-w0: min num requests: 2
-[2023-02-22 19:44:16,370][06183] Starting all processes...
-[2023-02-22 19:44:16,372][06183] Starting process learner_proc0
-[2023-02-22 19:44:16,762][06183] Starting all processes...
-[2023-02-22 19:44:16,775][06183] Starting process inference_proc0-0
-[2023-02-22 19:44:16,776][06183] Starting process rollout_proc0
-[2023-02-22 19:44:16,776][06183] Starting process rollout_proc1
-[2023-02-22 19:44:16,778][06183] Starting process rollout_proc2
-[2023-02-22 19:44:16,779][06183] Starting process rollout_proc3
-[2023-02-22 19:44:16,781][06183] Starting process rollout_proc4
-[2023-02-22 19:44:16,897][06183] Starting process rollout_proc5
-[2023-02-22 19:44:16,898][06183] Starting process rollout_proc6
-[2023-02-22 19:44:16,900][06183] Starting process rollout_proc7
-[2023-02-22 19:44:20,639][14984] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-22 19:44:20,640][14984] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2023-02-22 19:44:20,772][14984] Num visible devices: 1
-[2023-02-22 19:44:20,793][14984] Starting seed is not provided
-[2023-02-22 19:44:20,795][14984] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-22 19:44:20,796][14984] Initializing actor-critic model on device cuda:0
-[2023-02-22 19:44:20,799][14984] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-22 19:44:20,803][14984] RunningMeanStd input shape: (1,)
-[2023-02-22 19:44:20,823][14984] ConvEncoder: input_channels=3
-[2023-02-22 19:44:20,829][15003] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:20,854][15000] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-22 19:44:20,855][15000] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2023-02-22 19:44:20,898][15001] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:20,926][15000] Num visible devices: 1
-[2023-02-22 19:44:20,959][15005] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:20,960][15008] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:20,963][15004] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:21,060][15007] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:21,199][15006] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:21,913][15002] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
-[2023-02-22 19:44:23,056][14984] Conv encoder output size: 512
-[2023-02-22 19:44:23,057][14984] Policy head output size: 512
-[2023-02-22 19:44:23,084][14984] Created Actor Critic model with architecture:
-[2023-02-22 19:44:23,085][14984] ActorCriticSharedWeights(
+[2023-02-24 07:55:59,139][784615] Saving configuration to /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
+[2023-02-24 07:55:59,305][784615] Rollout worker 0 uses device cpu
+[2023-02-24 07:55:59,305][784615] Rollout worker 1 uses device cpu
+[2023-02-24 07:55:59,306][784615] Rollout worker 2 uses device cpu
+[2023-02-24 07:55:59,306][784615] Rollout worker 3 uses device cpu
+[2023-02-24 07:55:59,306][784615] Rollout worker 4 uses device cpu
+[2023-02-24 07:55:59,307][784615] Rollout worker 5 uses device cpu
+[2023-02-24 07:55:59,307][784615] Rollout worker 6 uses device cpu
+[2023-02-24 07:55:59,308][784615] Rollout worker 7 uses device cpu
+[2023-02-24 07:55:59,357][784615] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2023-02-24 07:55:59,358][784615] InferenceWorker_p0-w0: min num requests: 2
+[2023-02-24 07:55:59,378][784615] Starting all processes...
+[2023-02-24 07:55:59,378][784615] Starting process learner_proc0
+[2023-02-24 07:55:59,428][784615] Starting all processes...
+[2023-02-24 07:55:59,438][784615] Starting process inference_proc0-0
+[2023-02-24 07:55:59,439][784615] Starting process rollout_proc0
+[2023-02-24 07:55:59,439][784615] Starting process rollout_proc1
+[2023-02-24 07:55:59,439][784615] Starting process rollout_proc2
+[2023-02-24 07:55:59,439][784615] Starting process rollout_proc3
+[2023-02-24 07:55:59,439][784615] Starting process rollout_proc4
+[2023-02-24 07:55:59,439][784615] Starting process rollout_proc5
+[2023-02-24 07:55:59,440][784615] Starting process rollout_proc6
+[2023-02-24 07:55:59,441][784615] Starting process rollout_proc7
+[2023-02-24 07:56:00,842][794035] Worker 3 uses CPU cores [3]
+[2023-02-24 07:56:00,862][794036] Worker 5 uses CPU cores [5]
+[2023-02-24 07:56:00,978][794037] Worker 4 uses CPU cores [4]
+[2023-02-24 07:56:00,998][794038] Worker 2 uses CPU cores [2]
+[2023-02-24 07:56:00,999][794019] Low niceness requires sudo!
+[2023-02-24 07:56:00,999][794019] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2023-02-24 07:56:01,000][794019] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for learning process 0
+[2023-02-24 07:56:01,022][794032] Low niceness requires sudo!
+[2023-02-24 07:56:01,022][794032] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2023-02-24 07:56:01,022][794032] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [0]) for inference process 0
+[2023-02-24 07:56:01,025][794019] Num visible devices: 1
+[2023-02-24 07:56:01,040][794032] Num visible devices: 1
+[2023-02-24 07:56:01,050][794019] Starting seed is not provided
+[2023-02-24 07:56:01,050][794019] Using GPUs [0] for process 0 (actually maps to GPUs [1])
+[2023-02-24 07:56:01,050][794019] Initializing actor-critic model on device cuda:0
+[2023-02-24 07:56:01,051][794019] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-24 07:56:01,051][794019] RunningMeanStd input shape: (1,)
+[2023-02-24 07:56:01,058][794040] Worker 7 uses CPU cores [7]
+[2023-02-24 07:56:01,065][794019] ConvEncoder: input_channels=3
+[2023-02-24 07:56:01,154][794034] Worker 1 uses CPU cores [1]
+[2023-02-24 07:56:01,186][794039] Worker 6 uses CPU cores [6]
+[2023-02-24 07:56:01,194][794033] Worker 0 uses CPU cores [0]
+[2023-02-24 07:56:01,198][794019] Conv encoder output size: 512
+[2023-02-24 07:56:01,198][794019] Policy head output size: 512
+[2023-02-24 07:56:01,216][794019] Created Actor Critic model with architecture:
+[2023-02-24 07:56:01,216][794019] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -85,3994 +87,342 @@
       (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
     )
   )
-[2023-02-22 19:44:29,155][14984] Using optimizer
-[2023-02-22 19:44:29,181][14984] No checkpoints found
-[2023-02-22 19:44:29,183][14984] Did not load from checkpoint, starting from scratch!
-[2023-02-22 19:44:29,186][14984] Initialized policy 0 weights for model version 0
-[2023-02-22 19:44:29,202][14984] LearnerWorker_p0 finished initialization!
-[2023-02-22 19:44:29,203][14984] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2023-02-22 19:44:29,445][15000] RunningMeanStd input shape: (3, 72, 128)
-[2023-02-22 19:44:29,447][15000] RunningMeanStd input shape: (1,)
-[2023-02-22 19:44:29,460][15000] ConvEncoder: input_channels=3
-[2023-02-22 19:44:29,587][15000] Conv encoder output size: 512
-[2023-02-22 19:44:29,589][15000] Policy head output size: 512
-[2023-02-22 19:44:30,328][06183] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 19:44:33,566][06183] Inference worker 0-0 is ready!
-[2023-02-22 19:44:33,568][06183] All inference workers are ready! Signal rollout workers to start!
-[2023-02-22 19:44:33,613][15001] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:33,613][15008] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:33,614][15006] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:33,615][15002] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:33,616][15003] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:33,618][15007] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:33,620][15004] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:33,622][15005] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 19:44:34,106][15005] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
-[2023-02-22 19:44:34,108][15005] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
-Traceback (most recent call last):
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
-    self.game.init()
-vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
-
-During handling of the above exception, another exception occurred:
-
-Traceback (most recent call last):
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
-    slot_callable(*args)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
-    env_runner.init(self.timing)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
-    self._reset()
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
-    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/gym/core.py", line 323, in reset
-    return self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
-    obs, info = self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
-    obs, info = self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
-    return self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/gym/core.py", line 379, in reset
-    obs, info = self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
-    obs, info = self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/gym/core.py", line 323, in reset
-    return self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
-    return self.env.reset(**kwargs)
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
-    self._ensure_initialized()
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
-    self.initialize()
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
-    self._game_init()
-  File "/home/chqma/miniconda3/envs/deep-rl-class/lib/python3.9/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
-    raise EnvCriticalError()
-sample_factory.envs.env_utils.EnvCriticalError
-[2023-02-22 19:44:34,110][15005] Unhandled exception in evt loop rollout_proc4_evt_loop
-[2023-02-22 19:44:35,328][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 19:44:36,332][06183] Heartbeat connected on Batcher_0
-[2023-02-22 19:44:36,335][06183] Heartbeat connected on LearnerWorker_p0
-[2023-02-22 19:44:36,369][06183] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-22 19:44:37,192][15006] Decorrelating experience for 0 frames...
-[2023-02-22 19:44:37,192][15001] Decorrelating experience for 0 frames...
-[2023-02-22 19:44:37,192][15002] Decorrelating experience for 0 frames...
-[2023-02-22 19:44:37,192][15004] Decorrelating experience for 0 frames...
-[2023-02-22 19:44:37,193][15008] Decorrelating experience for 0 frames...
-[2023-02-22 19:44:37,921][15006] Decorrelating experience for 32 frames...
-[2023-02-22 19:44:37,923][15008] Decorrelating experience for 32 frames...
-[2023-02-22 19:44:37,928][15004] Decorrelating experience for 32 frames...
-[2023-02-22 19:44:37,928][15001] Decorrelating experience for 32 frames...
-[2023-02-22 19:44:37,929][15002] Decorrelating experience for 32 frames...
-[2023-02-22 19:44:37,934][15007] Decorrelating experience for 0 frames...
-[2023-02-22 19:44:38,003][15003] Decorrelating experience for 0 frames...
-[2023-02-22 19:44:38,768][15007] Decorrelating experience for 32 frames...
-[2023-02-22 19:44:38,809][15004] Decorrelating experience for 64 frames...
-[2023-02-22 19:44:38,814][15008] Decorrelating experience for 64 frames...
-[2023-02-22 19:44:38,851][15003] Decorrelating experience for 32 frames...
-[2023-02-22 19:44:38,882][15006] Decorrelating experience for 64 frames...
-[2023-02-22 19:44:39,585][15007] Decorrelating experience for 64 frames...
-[2023-02-22 19:44:39,604][15002] Decorrelating experience for 64 frames...
-[2023-02-22 19:44:39,646][15004] Decorrelating experience for 96 frames...
-[2023-02-22 19:44:39,647][15001] Decorrelating experience for 64 frames...
-[2023-02-22 19:44:39,675][15003] Decorrelating experience for 64 frames...
-[2023-02-22 19:44:39,714][06183] Heartbeat connected on RolloutWorker_w2
-[2023-02-22 19:44:40,328][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 19:44:40,419][15007] Decorrelating experience for 96 frames...
-[2023-02-22 19:44:40,421][15008] Decorrelating experience for 96 frames...
-[2023-02-22 19:44:40,464][15002] Decorrelating experience for 96 frames...
-[2023-02-22 19:44:40,490][15006] Decorrelating experience for 96 frames...
-[2023-02-22 19:44:40,516][06183] Heartbeat connected on RolloutWorker_w7
-[2023-02-22 19:44:40,536][06183] Heartbeat connected on RolloutWorker_w5
-[2023-02-22 19:44:40,559][06183] Heartbeat connected on RolloutWorker_w1
-[2023-02-22 19:44:40,599][06183] Heartbeat connected on RolloutWorker_w6
-[2023-02-22 19:44:41,183][15003] Decorrelating experience for 96 frames...
-[2023-02-22 19:44:41,253][15001] Decorrelating experience for 96 frames...
-[2023-02-22 19:44:41,272][06183] Heartbeat connected on RolloutWorker_w3
-[2023-02-22 19:44:41,345][06183] Heartbeat connected on RolloutWorker_w0
-[2023-02-22 19:44:45,328][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.1. Samples: 32. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 19:44:45,332][06183] Avg episode reward: [(0, '1.914')]
-[2023-02-22 19:44:45,876][14984] Signal inference workers to stop experience collection...
-[2023-02-22 19:44:45,884][15000] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-22 19:44:49,504][14984] Signal inference workers to resume experience collection...
-[2023-02-22 19:44:49,506][15000] InferenceWorker_p0-w0: resuming experience collection
-[2023-02-22 19:44:50,328][06183] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 147.1. Samples: 2942. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2023-02-22 19:44:50,334][06183] Avg episode reward: [(0, '3.108')]
-[2023-02-22 19:44:54,506][15000] Updated weights for policy 0, policy_version 10 (0.0364)
-[2023-02-22 19:44:55,328][06183] Fps is (10 sec: 4505.6, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 454.3. Samples: 11358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 19:44:55,330][06183] Avg episode reward: [(0, '4.406')]
-[2023-02-22 19:44:59,586][15000] Updated weights for policy 0, policy_version 20 (0.0015)
-[2023-02-22 19:45:00,329][06183] Fps is (10 sec: 8191.6, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 86016. Throughput: 0: 568.7. Samples: 17060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:45:00,332][06183] Avg episode reward: [(0, '4.412')]
-[2023-02-22 19:45:04,441][15000] Updated weights for policy 0, policy_version 30 (0.0012)
-[2023-02-22 19:45:05,328][06183] Fps is (10 sec: 8192.0, 60 sec: 3627.9, 300 sec: 3627.9). Total num frames: 126976. Throughput: 0: 849.7. Samples: 29740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 19:45:05,331][06183] Avg episode reward: [(0, '4.560')]
-[2023-02-22 19:45:05,400][14984] Saving new best policy, reward=4.560!
-[2023-02-22 19:45:09,187][15000] Updated weights for policy 0, policy_version 40 (0.0010)
-[2023-02-22 19:45:10,328][06183] Fps is (10 sec: 8601.9, 60 sec: 4300.8, 300 sec: 4300.8). Total num frames: 172032. Throughput: 0: 1068.5. Samples: 42740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 19:45:10,332][06183] Avg episode reward: [(0, '4.466')]
-[2023-02-22 19:45:14,107][15000] Updated weights for policy 0, policy_version 50 (0.0017)
-[2023-02-22 19:45:15,328][06183] Fps is (10 sec: 8601.7, 60 sec: 4733.2, 300 sec: 4733.2). Total num frames: 212992. Throughput: 0: 1085.6. Samples: 48854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0)
-[2023-02-22 19:45:15,330][06183] Avg episode reward: [(0, '4.413')]
-[2023-02-22 19:45:18,942][15000] Updated weights for policy 0, policy_version 60 (0.0011)
-[2023-02-22 19:45:20,328][06183] Fps is (10 sec: 8601.7, 60 sec: 5160.9, 300 sec: 5160.9). Total num frames: 258048. Throughput: 0: 1371.8. Samples: 61732. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 19:45:20,331][06183] Avg episode reward: [(0, '4.563')]
-[2023-02-22 19:45:20,341][14984] Saving new best policy, reward=4.563!
-[2023-02-22 19:45:23,692][15000] Updated weights for policy 0, policy_version 70 (0.0013)
-[2023-02-22 19:45:25,328][06183] Fps is (10 sec: 8601.5, 60 sec: 5436.5, 300 sec: 5436.5). Total num frames: 299008. Throughput: 0: 1648.9. Samples: 74202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 19:45:25,332][06183] Avg episode reward: [(0, '4.329')]
-[2023-02-22 19:45:28,664][15000] Updated weights for policy 0, policy_version 80 (0.0011)
-[2023-02-22 19:45:30,328][06183] Fps is (10 sec: 8192.2, 60 sec: 5666.1, 300 sec: 5666.1). Total num frames: 339968. Throughput: 0: 1788.4. Samples: 80512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 19:45:30,330][06183] Avg episode reward: [(0, '4.295')]
-[2023-02-22 19:45:33,515][15000] Updated weights for policy 0, policy_version 90 (0.0009)
-[2023-02-22 19:45:35,328][06183] Fps is (10 sec: 8192.1, 60 sec: 6348.8, 300 sec: 5860.4). Total num frames: 380928. Throughput: 0: 2007.0. Samples: 93258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:45:35,330][06183] Avg episode reward: [(0, '4.407')]
-[2023-02-22 19:45:38,572][15000] Updated weights for policy 0, policy_version 100 (0.0012)
-[2023-02-22 19:45:40,328][06183] Fps is (10 sec: 8191.9, 60 sec: 7031.5, 300 sec: 6027.0). Total num frames: 421888. Throughput: 0: 2093.5. Samples: 105566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 19:45:40,331][06183] Avg episode reward: [(0, '4.493')]
-[2023-02-22 19:45:43,650][15000] Updated weights for policy 0, policy_version 110 (0.0014)
-[2023-02-22 19:45:45,328][06183] Fps is (10 sec: 8192.0, 60 sec: 7714.1, 300 sec: 6171.3). Total num frames: 462848. Throughput: 0: 2099.2. Samples: 111522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 19:45:45,331][06183] Avg episode reward: [(0, '4.525')]
-[2023-02-22 19:45:48,756][15000] Updated weights for policy 0, policy_version 120 (0.0012)
-[2023-02-22 19:45:50,328][06183] Fps is (10 sec: 7782.4, 60 sec: 8260.3, 300 sec: 6246.4). Total num frames: 499712. Throughput: 0: 2078.4. Samples: 123266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 19:45:50,331][06183] Avg episode reward: [(0, '4.624')]
-[2023-02-22 19:45:50,477][14984] Saving new best policy, reward=4.624!
-[2023-02-22 19:45:54,234][15000] Updated weights for policy 0, policy_version 130 (0.0015)
-[2023-02-22 19:45:55,328][06183] Fps is (10 sec: 7782.3, 60 sec: 8260.3, 300 sec: 6360.8). Total num frames: 540672. Throughput: 0: 2043.6. Samples: 134702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2023-02-22 19:45:55,332][06183] Avg episode reward: [(0, '4.484')]
-[2023-02-22 19:45:59,467][15000] Updated weights for policy 0, policy_version 140 (0.0014)
-[2023-02-22 19:46:00,328][06183] Fps is (10 sec: 7782.3, 60 sec: 8192.1, 300 sec: 6417.1). Total num frames: 577536. Throughput: 0: 2037.4. Samples: 140538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 19:46:00,331][06183] Avg episode reward: [(0, '4.447')]
-[2023-02-22 19:46:04,827][15000] Updated weights for policy 0, policy_version 150 (0.0021)
-[2023-02-22 19:46:05,328][06183] Fps is (10 sec: 7372.9, 60 sec: 8123.7, 300 sec: 6467.4). Total num frames: 614400. Throughput: 0: 2006.8. Samples: 152036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:46:05,332][06183] Avg episode reward: [(0, '4.358')]
-[2023-02-22 19:46:10,160][15000] Updated weights for policy 0, policy_version 160 (0.0017)
-[2023-02-22 19:46:10,328][06183] Fps is (10 sec: 7782.4, 60 sec: 8055.5, 300 sec: 6553.6). Total num frames: 655360. Throughput: 0: 1985.4. Samples: 163544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2023-02-22 19:46:10,331][06183] Avg episode reward: [(0, '4.399')]
-[2023-02-22 19:46:10,354][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000160_655360.pth...
-[2023-02-22 19:46:15,328][06183] Fps is (10 sec: 7782.3, 60 sec: 7987.2, 300 sec: 6592.6). Total num frames: 692224. Throughput: 0: 1978.1. Samples: 169526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:46:15,331][06183] Avg episode reward: [(0, '4.615')]
-[2023-02-22 19:46:15,414][15000] Updated weights for policy 0, policy_version 170 (0.0019)
-[2023-02-22 19:46:20,328][06183] Fps is (10 sec: 7782.4, 60 sec: 7918.9, 300 sec: 6665.3). Total num frames: 733184. Throughput: 0: 1950.5. Samples: 181032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2023-02-22 19:46:20,331][06183] Avg episode reward: [(0, '4.568')]
-[2023-02-22 19:46:20,725][15000] Updated weights for policy 0, policy_version 180 (0.0012)
-[2023-02-22 19:46:25,328][06183] Fps is (10 sec: 7372.8, 60 sec: 7782.4, 300 sec: 6660.4). Total num frames: 765952. Throughput: 0: 1912.4. Samples: 191622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2023-02-22 19:46:25,333][06183] Avg episode reward: [(0, '4.507')]
-[2023-02-22 19:46:26,853][15000] Updated weights for policy 0, policy_version 190 (0.0017)
-[2023-02-22 19:46:30,328][06183] Fps is (10 sec: 6553.6, 60 sec: 7645.8, 300 sec: 6656.0). Total num frames: 798720. Throughput: 0: 1887.3. Samples: 196452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 19:46:30,332][06183] Avg episode reward: [(0, '4.383')]
-[2023-02-22 19:46:33,084][15000] Updated weights for policy 0, policy_version 200 (0.0023)
-[2023-02-22 19:46:35,328][06183] Fps is (10 sec: 6553.7, 60 sec: 7509.3, 300 sec: 6651.9). Total num frames: 831488. Throughput: 0: 1848.0. Samples: 206428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 19:46:35,332][06183] Avg episode reward: [(0, '4.311')]
-[2023-02-22 19:46:39,210][15000] Updated weights for policy 0, policy_version 210 (0.0020)
-[2023-02-22 19:46:40,328][06183] Fps is (10 sec: 6963.3, 60 sec: 7441.1, 300 sec: 6679.6). Total num frames: 868352. Throughput: 0: 1817.5. Samples: 216490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 19:46:40,332][06183] Avg episode reward: [(0, '4.257')]
-[2023-02-22 19:46:45,328][06183] Fps is (10 sec: 6553.5, 60 sec: 7236.2, 300 sec: 6644.6). Total num frames: 897024. Throughput: 0: 1795.2. Samples: 221324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:46:45,332][06183] Avg episode reward: [(0, '4.425')]
-[2023-02-22 19:46:45,856][15000] Updated weights for policy 0, policy_version 220 (0.0026)
-[2023-02-22 19:46:50,329][06183] Fps is (10 sec: 5734.2, 60 sec: 7099.7, 300 sec: 6612.1). Total num frames: 925696. Throughput: 0: 1726.7. Samples: 229738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:46:50,334][06183] Avg episode reward: [(0, '4.426')]
-[2023-02-22 19:46:53,358][15000] Updated weights for policy 0, policy_version 230 (0.0032)
-[2023-02-22 19:46:55,329][06183] Fps is (10 sec: 5734.1, 60 sec: 6894.9, 300 sec: 6581.8). Total num frames: 954368. Throughput: 0: 1659.2. Samples: 238210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:46:55,333][06183] Avg episode reward: [(0, '4.467')]
-[2023-02-22 19:46:59,818][15000] Updated weights for policy 0, policy_version 240 (0.0017)
-[2023-02-22 19:47:00,328][06183] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6553.6). Total num frames: 983040. Throughput: 0: 1633.1. Samples: 243016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 19:47:00,333][06183] Avg episode reward: [(0, '4.537')]
-[2023-02-22 19:47:05,328][06183] Fps is (10 sec: 5734.8, 60 sec: 6621.9, 300 sec: 6527.2). Total num frames: 1011712. Throughput: 0: 1578.8. Samples: 252080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 19:47:05,332][06183] Avg episode reward: [(0, '4.393')]
-[2023-02-22 19:47:06,684][15000] Updated weights for policy 0, policy_version 250 (0.0026)
-[2023-02-22 19:47:10,328][06183] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6528.0). Total num frames: 1044480. Throughput: 0: 1543.8. Samples: 261092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 19:47:10,332][06183] Avg episode reward: [(0, '4.500')]
-[2023-02-22 19:47:13,877][15000] Updated weights for policy 0, policy_version 260 (0.0034)
-[2023-02-22 19:47:15,329][06183] Fps is (10 sec: 5734.1, 60 sec: 6280.5, 300 sec: 6479.1). Total num frames: 1069056. Throughput: 0: 1528.6. Samples: 265238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:47:15,336][06183] Avg episode reward: [(0, '4.446')]
-[2023-02-22 19:47:20,329][06183] Fps is (10 sec: 4505.5, 60 sec: 5939.2, 300 sec: 6409.0). Total num frames: 1089536. Throughput: 0: 1465.9. Samples: 272394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 19:47:20,335][06183] Avg episode reward: [(0, '4.503')]
-[2023-02-22 19:47:22,727][15000] Updated weights for policy 0, policy_version 270 (0.0038)
-[2023-02-22 19:47:25,329][06183] Fps is (10 sec: 4505.7, 60 sec: 5802.6, 300 sec: 6366.3). Total num frames: 1114112. Throughput: 0: 1393.6. Samples: 279204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-02-22 19:47:25,343][06183] Avg episode reward: [(0, '4.563')]
-[2023-02-22 19:47:30,331][06183] Fps is (10 sec: 4095.2, 60 sec: 5529.4, 300 sec: 6280.5). Total num frames: 1130496. Throughput: 0: 1344.1. Samples: 281812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 19:47:30,337][06183] Avg episode reward: [(0, '4.578')]
-[2023-02-22 19:47:34,132][15000] Updated weights for policy 0, policy_version 280 (0.0075)
-[2023-02-22 19:47:35,329][06183] Fps is (10 sec: 3686.4, 60 sec: 5324.7, 300 sec: 6221.5). Total num frames: 1150976. Throughput: 0: 1267.7. Samples: 286786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 19:47:35,332][06183] Avg episode reward: [(0, '4.419')]
-[2023-02-22 19:47:40,329][06183] Fps is (10 sec: 4506.3, 60 sec: 5119.9, 300 sec: 6187.1). Total num frames: 1175552. Throughput: 0: 1232.6. Samples: 293676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 19:47:40,334][06183] Avg episode reward: [(0, '4.364')]
-[2023-02-22 19:47:43,155][15000] Updated weights for policy 0, policy_version 290 (0.0041)
-[2023-02-22 19:47:45,329][06183] Fps is (10 sec: 4505.7, 60 sec: 4983.4, 300 sec: 6133.5). Total num frames: 1196032. Throughput: 0: 1201.9. Samples: 297102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:47:45,333][06183] Avg episode reward: [(0, '4.416')]
-[2023-02-22 19:47:50,329][06183] Fps is (10 sec: 4096.1, 60 sec: 4846.9, 300 sec: 6082.5). Total num frames: 1216512. Throughput: 0: 1132.1. Samples: 303026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2023-02-22 19:47:50,341][06183] Avg episode reward: [(0, '4.404')]
-[2023-02-22 19:47:53,335][15000] Updated weights for policy 0, policy_version 300 (0.0062)
-[2023-02-22 19:47:55,329][06183] Fps is (10 sec: 4096.0, 60 sec: 4710.4, 300 sec: 6034.1). Total num frames: 1236992. Throughput: 0: 1073.8. Samples: 309412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 19:47:55,333][06183] Avg episode reward: [(0, '4.453')]
-[2023-02-22 19:48:00,329][06183] Fps is (10 sec: 4915.2, 60 sec: 4710.4, 300 sec: 6027.0). Total num frames: 1265664. Throughput: 0: 1071.2. Samples: 313444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:48:00,337][06183] Avg episode reward: [(0, '4.269')]
-[2023-02-22 19:48:01,035][15000] Updated weights for policy 0, policy_version 310 (0.0037)
-[2023-02-22 19:48:05,328][06183] Fps is (10 sec: 5324.8, 60 sec: 4642.1, 300 sec: 6001.1). Total num frames: 1290240. Throughput: 0: 1082.8. Samples: 321122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:48:05,334][06183] Avg episode reward: [(0, '4.324')]
-[2023-02-22 19:48:09,841][15000] Updated weights for policy 0, policy_version 320 (0.0035)
-[2023-02-22 19:48:10,328][06183] Fps is (10 sec: 4505.7, 60 sec: 4437.3, 300 sec: 5957.8). Total num frames: 1310720. Throughput: 0: 1086.6. Samples: 328102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 19:48:10,333][06183] Avg episode reward: [(0, '4.457')]
-[2023-02-22 19:48:10,378][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000320_1310720.pth...
-[2023-02-22 19:48:15,328][06183] Fps is (10 sec: 4505.6, 60 sec: 4437.4, 300 sec: 5934.6). Total num frames: 1335296. Throughput: 0: 1111.8. Samples: 331840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 19:48:15,333][06183] Avg episode reward: [(0, '4.421')]
-[2023-02-22 19:48:19,348][15000] Updated weights for policy 0, policy_version 330 (0.0053)
-[2023-02-22 19:48:20,329][06183] Fps is (10 sec: 4505.3, 60 sec: 4437.3, 300 sec: 5894.7). Total num frames: 1355776. Throughput: 0: 1137.7. Samples: 337982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 19:48:20,334][06183] Avg episode reward: [(0, '4.440')]
-[2023-02-22 19:48:25,329][06183] Fps is (10 sec: 4095.8, 60 sec: 4369.0, 300 sec: 5856.4). Total num frames: 1376256. Throughput: 0: 1117.7. Samples: 343972. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-02-22 19:48:25,343][06183] Avg episode reward: [(0, '4.406')]
-[2023-02-22 19:48:28,754][15000] Updated weights for policy 0, policy_version 340 (0.0045)
-[2023-02-22 19:48:30,329][06183] Fps is (10 sec: 4096.3, 60 sec: 4437.5, 300 sec: 5819.7). Total num frames: 1396736. Throughput: 0: 1115.6. Samples: 347306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 19:48:30,333][06183] Avg episode reward: [(0, '4.419')]
-[2023-02-22 19:48:35,329][06183] Fps is (10 sec: 4505.8, 60 sec: 4505.6, 300 sec: 5801.3). Total num frames: 1421312. Throughput: 0: 1149.8. Samples: 354766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2023-02-22 19:48:35,333][06183] Avg episode reward: [(0, '4.488')]
-[2023-02-22 19:48:37,340][15000] Updated weights for policy 0, policy_version 350 (0.0039)
-[2023-02-22 19:48:40,329][06183] Fps is (10 sec: 4505.5, 60 sec: 4437.3, 300 sec: 5767.2). Total num frames: 1441792. Throughput: 0: 1151.1. Samples: 361214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 19:48:40,343][06183] Avg episode reward: [(0, '4.412')]
-[2023-02-22 19:48:45,328][06183] Fps is (10 sec: 4096.1, 60 sec: 4437.4, 300 sec: 5734.4). Total num frames: 1462272. Throughput: 0: 1120.4. Samples: 363862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 19:48:45,334][06183] Avg episode reward: [(0, '4.358')]
-[2023-02-22 19:48:48,069][15000] Updated weights for policy 0, policy_version 360 (0.0038)
-[2023-02-22 19:48:50,328][06183] Fps is (10 sec: 4096.2, 60 sec: 4437.3, 300 sec: 5702.9). Total num frames: 1482752. Throughput: 0: 1088.8. Samples: 370116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 19:48:50,336][06183] Avg episode reward: [(0, '4.525')]
-[2023-02-22 19:48:55,329][06183] Fps is (10 sec: 4095.7, 60 sec: 4437.3, 300 sec: 5672.6). Total num frames: 1503232. Throughput: 0: 1070.1. Samples: 376256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 19:48:55,348][06183] Avg episode reward: [(0, '4.419')]
-[2023-02-22 19:48:58,116][15000] Updated weights for policy 0, policy_version 370 (0.0050)
-[2023-02-22 19:49:00,329][06183] Fps is (10 sec: 4095.8, 60 sec: 4300.8, 300 sec: 5643.4). Total num frames: 1523712. Throughput: 0: 1050.0. Samples: 379090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 19:49:00,340][06183] Avg episode reward: [(0, '4.281')]
-[2023-02-22 19:49:05,329][06183] Fps is (10 sec: 4096.1, 60 sec: 4232.5, 300 sec: 5615.2). Total num frames: 1544192. Throughput: 0: 1053.6. Samples: 385394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:49:05,339][06183] Avg episode reward: [(0, '4.351')]
-[2023-02-22 19:49:10,328][06183] Fps is (10 sec: 2867.4, 60 sec: 4027.7, 300 sec: 5544.2). Total num frames: 1552384. Throughput: 0: 1005.1. Samples: 389200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 19:49:10,334][06183] Avg episode reward: [(0, '4.375')]
-[2023-02-22 19:49:10,896][15000] Updated weights for policy 0, policy_version 380 (0.0070)
-[2023-02-22 19:49:15,329][06183] Fps is (10 sec: 2048.0, 60 sec: 3822.9, 300 sec: 5490.1). Total num frames: 1564672. Throughput: 0: 963.8. Samples: 390676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 19:49:15,339][06183] Avg episode reward: [(0, '4.489')]
-[2023-02-22 19:49:20,329][06183] Fps is (10 sec: 2867.1, 60 sec: 3754.7, 300 sec: 5451.9). Total num frames: 1581056. Throughput: 0: 895.7. Samples: 395074.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:49:20,347][06183] Avg episode reward: [(0, '4.518')] -[2023-02-22 19:49:24,344][15000] Updated weights for policy 0, policy_version 390 (0.0063) -[2023-02-22 19:49:25,329][06183] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 5415.0). Total num frames: 1597440. Throughput: 0: 864.5. Samples: 400116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:49:25,341][06183] Avg episode reward: [(0, '4.529')] -[2023-02-22 19:49:30,329][06183] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 5470.6). Total num frames: 1613824. Throughput: 0: 859.3. Samples: 402530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:49:30,337][06183] Avg episode reward: [(0, '4.408')] -[2023-02-22 19:49:35,329][06183] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 5526.1). Total num frames: 1630208. Throughput: 0: 823.4. Samples: 407170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:49:35,340][06183] Avg episode reward: [(0, '4.425')] -[2023-02-22 19:49:37,868][15000] Updated weights for policy 0, policy_version 400 (0.0061) -[2023-02-22 19:49:40,330][06183] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 5567.8). Total num frames: 1642496. Throughput: 0: 784.7. Samples: 411570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:49:40,341][06183] Avg episode reward: [(0, '4.559')] -[2023-02-22 19:49:45,329][06183] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 5609.4). Total num frames: 1658880. Throughput: 0: 768.7. Samples: 413680. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-02-22 19:49:45,359][06183] Avg episode reward: [(0, '4.448')] -[2023-02-22 19:49:50,330][06183] Fps is (10 sec: 2867.3, 60 sec: 3140.2, 300 sec: 5512.2). Total num frames: 1671168. Throughput: 0: 718.5. Samples: 417728. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:49:50,342][06183] Avg episode reward: [(0, '4.462')] -[2023-02-22 19:49:54,192][15000] Updated weights for policy 0, policy_version 410 (0.0085) -[2023-02-22 19:49:55,329][06183] Fps is (10 sec: 2048.1, 60 sec: 2935.5, 300 sec: 5401.2). Total num frames: 1679360. Throughput: 0: 695.7. Samples: 420506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:49:55,351][06183] Avg episode reward: [(0, '4.586')] -[2023-02-22 19:50:00,329][06183] Fps is (10 sec: 2048.0, 60 sec: 2798.9, 300 sec: 5304.0). Total num frames: 1691648. Throughput: 0: 698.0. Samples: 422088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:50:00,351][06183] Avg episode reward: [(0, '4.457')] -[2023-02-22 19:50:05,329][06183] Fps is (10 sec: 2457.5, 60 sec: 2662.4, 300 sec: 5192.9). Total num frames: 1703936. Throughput: 0: 678.6. Samples: 425612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:50:05,337][06183] Avg episode reward: [(0, '4.308')] -[2023-02-22 19:50:10,329][06183] Fps is (10 sec: 2048.1, 60 sec: 2662.4, 300 sec: 5081.8). Total num frames: 1712128. Throughput: 0: 645.4. Samples: 429158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:50:10,336][06183] Avg episode reward: [(0, '4.364')] -[2023-02-22 19:50:10,860][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000419_1716224.pth... -[2023-02-22 19:50:11,765][14984] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000160_655360.pth -[2023-02-22 19:50:12,172][15000] Updated weights for policy 0, policy_version 420 (0.0067) -[2023-02-22 19:50:15,329][06183] Fps is (10 sec: 2457.7, 60 sec: 2730.7, 300 sec: 4984.6). Total num frames: 1728512. Throughput: 0: 632.7. Samples: 431000. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:50:15,364][06183] Avg episode reward: [(0, '4.379')] -[2023-02-22 19:50:20,329][06183] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 4887.4). Total num frames: 1740800. Throughput: 0: 619.5. Samples: 435048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 19:50:20,341][06183] Avg episode reward: [(0, '4.567')] -[2023-02-22 19:50:25,329][06183] Fps is (10 sec: 2867.2, 60 sec: 2662.4, 300 sec: 4804.1). Total num frames: 1757184. Throughput: 0: 630.9. Samples: 439958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:50:25,343][06183] Avg episode reward: [(0, '4.502')] -[2023-02-22 19:50:25,673][15000] Updated weights for policy 0, policy_version 430 (0.0074) -[2023-02-22 19:50:30,329][06183] Fps is (10 sec: 3276.7, 60 sec: 2662.4, 300 sec: 4720.8). Total num frames: 1773568. Throughput: 0: 635.1. Samples: 442260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:50:30,395][06183] Avg episode reward: [(0, '4.370')] -[2023-02-22 19:50:35,329][06183] Fps is (10 sec: 2867.2, 60 sec: 2594.2, 300 sec: 4623.6). Total num frames: 1785856. Throughput: 0: 630.1. Samples: 446080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:50:35,342][06183] Avg episode reward: [(0, '4.403')] -[2023-02-22 19:50:40,329][06183] Fps is (10 sec: 2048.1, 60 sec: 2525.9, 300 sec: 4512.5). Total num frames: 1794048. Throughput: 0: 641.6. Samples: 449380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:50:40,377][06183] Avg episode reward: [(0, '4.452')] -[2023-02-22 19:50:42,687][15000] Updated weights for policy 0, policy_version 440 (0.0068) -[2023-02-22 19:50:45,329][06183] Fps is (10 sec: 2047.9, 60 sec: 2457.6, 300 sec: 4429.2). Total num frames: 1806336. Throughput: 0: 647.2. Samples: 451214. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:50:45,345][06183] Avg episode reward: [(0, '4.428')] -[2023-02-22 19:50:50,329][06183] Fps is (10 sec: 2457.5, 60 sec: 2457.6, 300 sec: 4332.0). Total num frames: 1818624. Throughput: 0: 649.9. Samples: 454858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:50:50,341][06183] Avg episode reward: [(0, '4.342')] -[2023-02-22 19:50:55,329][06183] Fps is (10 sec: 2457.7, 60 sec: 2525.9, 300 sec: 4248.7). Total num frames: 1830912. Throughput: 0: 657.8. Samples: 458758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:50:55,341][06183] Avg episode reward: [(0, '4.225')] -[2023-02-22 19:50:57,894][15000] Updated weights for policy 0, policy_version 450 (0.0094) -[2023-02-22 19:51:00,329][06183] Fps is (10 sec: 2867.2, 60 sec: 2594.1, 300 sec: 4179.3). Total num frames: 1847296. Throughput: 0: 668.9. Samples: 461102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:51:00,364][06183] Avg episode reward: [(0, '4.264')] -[2023-02-22 19:51:05,330][06183] Fps is (10 sec: 2867.1, 60 sec: 2594.1, 300 sec: 4082.1). Total num frames: 1859584. Throughput: 0: 658.2. Samples: 464666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:51:05,347][06183] Avg episode reward: [(0, '4.306')] -[2023-02-22 19:51:10,329][06183] Fps is (10 sec: 3276.7, 60 sec: 2798.9, 300 sec: 4026.6). Total num frames: 1880064. Throughput: 0: 672.2. Samples: 470206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:51:10,335][06183] Avg episode reward: [(0, '4.395')] -[2023-02-22 19:51:10,967][15000] Updated weights for policy 0, policy_version 460 (0.0063) -[2023-02-22 19:51:15,329][06183] Fps is (10 sec: 3686.5, 60 sec: 2798.9, 300 sec: 3943.3). Total num frames: 1896448. Throughput: 0: 689.0. Samples: 473264. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:51:15,336][06183] Avg episode reward: [(0, '4.431')] -[2023-02-22 19:51:20,329][06183] Fps is (10 sec: 3276.9, 60 sec: 2867.2, 300 sec: 3887.7). Total num frames: 1912832. Throughput: 0: 702.5. Samples: 477694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:51:20,337][06183] Avg episode reward: [(0, '4.327')] -[2023-02-22 19:51:22,882][15000] Updated weights for policy 0, policy_version 470 (0.0076) -[2023-02-22 19:51:25,329][06183] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 3832.2). Total num frames: 1929216. Throughput: 0: 745.9. Samples: 482946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:51:25,342][06183] Avg episode reward: [(0, '4.400')] -[2023-02-22 19:51:30,329][06183] Fps is (10 sec: 3276.7, 60 sec: 2867.2, 300 sec: 3776.6). Total num frames: 1945600. Throughput: 0: 757.2. Samples: 485290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:51:30,339][06183] Avg episode reward: [(0, '4.355')] -[2023-02-22 19:51:35,188][15000] Updated weights for policy 0, policy_version 480 (0.0058) -[2023-02-22 19:51:35,328][06183] Fps is (10 sec: 3686.6, 60 sec: 3003.8, 300 sec: 3721.1). Total num frames: 1966080. Throughput: 0: 794.8. Samples: 490622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:51:35,335][06183] Avg episode reward: [(0, '4.469')] -[2023-02-22 19:51:40,330][06183] Fps is (10 sec: 3686.2, 60 sec: 3140.2, 300 sec: 3679.4). Total num frames: 1982464. Throughput: 0: 831.6. Samples: 496182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:51:40,352][06183] Avg episode reward: [(0, '4.530')] -[2023-02-22 19:51:45,329][06183] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3651.7). Total num frames: 2002944. Throughput: 0: 837.7. Samples: 498800. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:51:45,337][06183] Avg episode reward: [(0, '4.628')] -[2023-02-22 19:51:45,347][14984] Saving new best policy, reward=4.628! -[2023-02-22 19:51:46,467][15000] Updated weights for policy 0, policy_version 490 (0.0040) -[2023-02-22 19:51:50,329][06183] Fps is (10 sec: 3686.7, 60 sec: 3345.1, 300 sec: 3610.0). Total num frames: 2019328. Throughput: 0: 884.9. Samples: 504488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 19:51:50,334][06183] Avg episode reward: [(0, '4.772')] -[2023-02-22 19:51:50,507][14984] Saving new best policy, reward=4.772! -[2023-02-22 19:51:55,330][06183] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3582.3). Total num frames: 2039808. Throughput: 0: 886.4. Samples: 510092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:51:55,337][06183] Avg episode reward: [(0, '4.618')] -[2023-02-22 19:51:57,362][15000] Updated weights for policy 0, policy_version 500 (0.0057) -[2023-02-22 19:52:00,330][06183] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3540.6). Total num frames: 2056192. Throughput: 0: 879.3. Samples: 512832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:52:00,348][06183] Avg episode reward: [(0, '4.444')] -[2023-02-22 19:52:05,330][06183] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 2072576. Throughput: 0: 892.3. Samples: 517850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:52:05,337][06183] Avg episode reward: [(0, '4.425')] -[2023-02-22 19:52:08,766][15000] Updated weights for policy 0, policy_version 510 (0.0052) -[2023-02-22 19:52:10,329][06183] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2093056. Throughput: 0: 905.5. Samples: 523694. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:52:10,336][06183] Avg episode reward: [(0, '4.350')] -[2023-02-22 19:52:10,415][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_2093056.pth... -[2023-02-22 19:52:11,271][14984] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000320_1310720.pth -[2023-02-22 19:52:15,329][06183] Fps is (10 sec: 3686.7, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 2109440. Throughput: 0: 914.8. Samples: 526456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:52:15,345][06183] Avg episode reward: [(0, '4.429')] -[2023-02-22 19:52:20,329][06183] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3429.5). Total num frames: 2125824. Throughput: 0: 907.8. Samples: 531472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:52:20,334][06183] Avg episode reward: [(0, '4.466')] -[2023-02-22 19:52:20,385][15000] Updated weights for policy 0, policy_version 520 (0.0075) -[2023-02-22 19:52:25,329][06183] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3443.4). Total num frames: 2146304. Throughput: 0: 905.6. Samples: 536934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:52:25,339][06183] Avg episode reward: [(0, '4.353')] -[2023-02-22 19:52:30,330][06183] Fps is (10 sec: 3686.0, 60 sec: 3618.1, 300 sec: 3429.5). Total num frames: 2162688. Throughput: 0: 904.3. Samples: 539496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:52:30,337][06183] Avg episode reward: [(0, '4.417')] -[2023-02-22 19:52:31,552][15000] Updated weights for policy 0, policy_version 530 (0.0052) -[2023-02-22 19:52:35,330][06183] Fps is (10 sec: 3686.1, 60 sec: 3618.0, 300 sec: 3415.6). Total num frames: 2183168. Throughput: 0: 902.2. Samples: 545088. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:52:35,340][06183] Avg episode reward: [(0, '4.525')] -[2023-02-22 19:52:40,329][06183] Fps is (10 sec: 3686.7, 60 sec: 3618.2, 300 sec: 3401.8). Total num frames: 2199552. Throughput: 0: 897.8. Samples: 550492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:52:40,338][06183] Avg episode reward: [(0, '4.405')] -[2023-02-22 19:52:43,172][15000] Updated weights for policy 0, policy_version 540 (0.0057) -[2023-02-22 19:52:45,329][06183] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3387.9). Total num frames: 2215936. Throughput: 0: 894.9. Samples: 553102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 19:52:45,341][06183] Avg episode reward: [(0, '4.299')] -[2023-02-22 19:52:50,334][06183] Fps is (10 sec: 3684.6, 60 sec: 3617.8, 300 sec: 3387.8). Total num frames: 2236416. Throughput: 0: 904.6. Samples: 558560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:52:50,417][06183] Avg episode reward: [(0, '4.354')] -[2023-02-22 19:52:55,198][15000] Updated weights for policy 0, policy_version 550 (0.0059) -[2023-02-22 19:52:55,329][06183] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3346.2). Total num frames: 2252800. Throughput: 0: 877.9. Samples: 563198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:52:55,334][06183] Avg episode reward: [(0, '4.544')] -[2023-02-22 19:53:00,330][06183] Fps is (10 sec: 3278.2, 60 sec: 3549.9, 300 sec: 3318.4). Total num frames: 2269184. Throughput: 0: 875.5. Samples: 565856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:53:00,338][06183] Avg episode reward: [(0, '4.607')] -[2023-02-22 19:53:05,329][06183] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3304.6). Total num frames: 2285568. Throughput: 0: 883.9. Samples: 571248. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:53:05,336][06183] Avg episode reward: [(0, '4.433')] -[2023-02-22 19:53:06,783][15000] Updated weights for policy 0, policy_version 560 (0.0075) -[2023-02-22 19:53:10,329][06183] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3290.7). Total num frames: 2306048. Throughput: 0: 882.0. Samples: 576624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:53:10,337][06183] Avg episode reward: [(0, '4.360')] -[2023-02-22 19:53:15,329][06183] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 2322432. Throughput: 0: 885.4. Samples: 579336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:53:15,338][06183] Avg episode reward: [(0, '4.393')] -[2023-02-22 19:53:17,881][15000] Updated weights for policy 0, policy_version 570 (0.0045) -[2023-02-22 19:53:20,329][06183] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 2342912. Throughput: 0: 883.7. Samples: 584852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:53:20,335][06183] Avg episode reward: [(0, '4.461')] -[2023-02-22 19:53:25,329][06183] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3262.9). Total num frames: 2359296. Throughput: 0: 881.6. Samples: 590162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:53:25,338][06183] Avg episode reward: [(0, '4.526')] -[2023-02-22 19:53:29,477][15000] Updated weights for policy 0, policy_version 580 (0.0044) -[2023-02-22 19:53:30,329][06183] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3235.1). Total num frames: 2375680. Throughput: 0: 883.1. Samples: 592842. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 19:53:30,340][06183] Avg episode reward: [(0, '4.525')] -[2023-02-22 19:53:35,329][06183] Fps is (10 sec: 3686.2, 60 sec: 3549.9, 300 sec: 3235.1). Total num frames: 2396160. Throughput: 0: 882.5. Samples: 598268. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:53:35,337][06183] Avg episode reward: [(0, '4.468')] -[2023-02-22 19:53:40,329][06183] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3221.3). Total num frames: 2412544. Throughput: 0: 903.2. Samples: 603844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 19:53:40,336][06183] Avg episode reward: [(0, '4.403')] -[2023-02-22 19:53:40,527][15000] Updated weights for policy 0, policy_version 590 (0.0061) -[2023-02-22 19:53:45,329][06183] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3221.3). Total num frames: 2433024. Throughput: 0: 906.1. Samples: 606630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-02-22 19:53:45,345][06183] Avg episode reward: [(0, '4.392')] -[2023-02-22 19:53:50,329][06183] Fps is (10 sec: 3686.5, 60 sec: 3550.2, 300 sec: 3207.4). Total num frames: 2449408. Throughput: 0: 912.9. Samples: 612328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 19:53:50,337][06183] Avg episode reward: [(0, '4.578')] -[2023-02-22 19:53:52,123][15000] Updated weights for policy 0, policy_version 600 (0.0062) -[2023-02-22 19:53:55,329][06183] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3193.5). Total num frames: 2465792. Throughput: 0: 889.9. Samples: 616670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:53:55,344][06183] Avg episode reward: [(0, '4.591')] -[2023-02-22 19:54:00,329][06183] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3179.6). Total num frames: 2482176. Throughput: 0: 880.1. Samples: 618942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:54:00,345][06183] Avg episode reward: [(0, '4.686')] -[2023-02-22 19:54:05,329][06183] Fps is (10 sec: 2457.5, 60 sec: 3413.3, 300 sec: 3179.6). Total num frames: 2490368. Throughput: 0: 846.7. Samples: 622956. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 19:54:05,342][06183] Avg episode reward: [(0, '4.477')] -[2023-02-22 19:54:07,315][15000] Updated weights for policy 0, policy_version 610 (0.0059) -[2023-02-22 19:54:10,328][06183] Fps is (10 sec: 3686.7, 60 sec: 3549.9, 300 sec: 3235.2). Total num frames: 2519040. Throughput: 0: 876.1. Samples: 629588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:54:10,330][06183] Avg episode reward: [(0, '4.442')] -[2023-02-22 19:54:10,342][06183] Components not started: RolloutWorker_w4, wait_time=600.1 seconds -[2023-02-22 19:54:10,553][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000616_2523136.pth... -[2023-02-22 19:54:10,749][14984] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000419_1716224.pth -[2023-02-22 19:54:12,412][15000] Updated weights for policy 0, policy_version 620 (0.0013) -[2023-02-22 19:54:15,328][06183] Fps is (10 sec: 7373.7, 60 sec: 4027.8, 300 sec: 3332.3). Total num frames: 2564096. Throughput: 0: 965.9. Samples: 636308. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 19:54:15,330][06183] Avg episode reward: [(0, '4.427')] -[2023-02-22 19:54:17,302][15000] Updated weights for policy 0, policy_version 630 (0.0011) -[2023-02-22 19:54:20,328][06183] Fps is (10 sec: 8601.6, 60 sec: 4369.1, 300 sec: 3415.7). Total num frames: 2605056. Throughput: 0: 1122.4. Samples: 648774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) -[2023-02-22 19:54:20,330][06183] Avg episode reward: [(0, '4.489')] -[2023-02-22 19:54:22,046][15000] Updated weights for policy 0, policy_version 640 (0.0014) -[2023-02-22 19:54:25,328][06183] Fps is (10 sec: 8191.8, 60 sec: 4778.7, 300 sec: 3499.0). Total num frames: 2646016. Throughput: 0: 1274.0. Samples: 661174. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:54:25,332][06183] Avg episode reward: [(0, '4.348')] -[2023-02-22 19:54:27,020][15000] Updated weights for policy 0, policy_version 650 (0.0015) -[2023-02-22 19:54:30,328][06183] Fps is (10 sec: 8192.0, 60 sec: 5188.3, 300 sec: 3582.3). Total num frames: 2686976. Throughput: 0: 1352.3. Samples: 667484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) -[2023-02-22 19:54:30,332][06183] Avg episode reward: [(0, '4.599')] -[2023-02-22 19:54:32,143][15000] Updated weights for policy 0, policy_version 660 (0.0013) -[2023-02-22 19:54:35,328][06183] Fps is (10 sec: 8192.2, 60 sec: 5529.7, 300 sec: 3679.5). Total num frames: 2727936. Throughput: 0: 1505.4. Samples: 680070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 19:54:35,331][06183] Avg episode reward: [(0, '4.600')] -[2023-02-22 19:54:36,880][15000] Updated weights for policy 0, policy_version 670 (0.0015) -[2023-02-22 19:54:40,328][06183] Fps is (10 sec: 8192.1, 60 sec: 5939.3, 300 sec: 3762.8). Total num frames: 2768896. Throughput: 0: 1682.3. Samples: 692374. Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) -[2023-02-22 19:54:40,330][06183] Avg episode reward: [(0, '4.375')] -[2023-02-22 19:54:41,917][15000] Updated weights for policy 0, policy_version 680 (0.0014) -[2023-02-22 19:54:45,328][06183] Fps is (10 sec: 8191.8, 60 sec: 6280.6, 300 sec: 3860.0). Total num frames: 2809856. Throughput: 0: 1771.9. Samples: 698674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:54:45,335][06183] Avg episode reward: [(0, '4.540')] -[2023-02-22 19:54:47,030][15000] Updated weights for policy 0, policy_version 690 (0.0017) -[2023-02-22 19:54:50,329][06183] Fps is (10 sec: 8191.7, 60 sec: 6690.2, 300 sec: 3971.0). Total num frames: 2850816. Throughput: 0: 1943.1. Samples: 710392. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 19:54:50,332][06183] Avg episode reward: [(0, '4.672')] -[2023-02-22 19:54:52,306][15000] Updated weights for policy 0, policy_version 700 (0.0014) -[2023-02-22 19:54:55,328][06183] Fps is (10 sec: 8192.0, 60 sec: 7099.8, 300 sec: 4068.2). Total num frames: 2891776. Throughput: 0: 2059.7. Samples: 722274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 19:54:55,331][06183] Avg episode reward: [(0, '4.302')] -[2023-02-22 19:54:57,507][15000] Updated weights for policy 0, policy_version 710 (0.0016) -[2023-02-22 19:55:00,329][06183] Fps is (10 sec: 7782.1, 60 sec: 7441.1, 300 sec: 4151.5). Total num frames: 2928640. Throughput: 0: 2040.9. Samples: 728152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) -[2023-02-22 19:55:00,332][06183] Avg episode reward: [(0, '4.465')] -[2023-02-22 19:55:02,833][15000] Updated weights for policy 0, policy_version 720 (0.0017) -[2023-02-22 19:55:05,328][06183] Fps is (10 sec: 7372.9, 60 sec: 7919.1, 300 sec: 4248.7). Total num frames: 2965504. Throughput: 0: 2017.5. Samples: 739562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:55:05,332][06183] Avg episode reward: [(0, '4.500')] -[2023-02-22 19:55:08,366][15000] Updated weights for policy 0, policy_version 730 (0.0022) -[2023-02-22 19:55:10,328][06183] Fps is (10 sec: 7373.3, 60 sec: 8055.5, 300 sec: 4318.2). Total num frames: 3002368. Throughput: 0: 1981.9. Samples: 750360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) -[2023-02-22 19:55:10,332][06183] Avg episode reward: [(0, '4.632')] -[2023-02-22 19:55:13,945][15000] Updated weights for policy 0, policy_version 740 (0.0016) -[2023-02-22 19:55:15,328][06183] Fps is (10 sec: 7372.8, 60 sec: 7918.9, 300 sec: 4401.5). Total num frames: 3039232. Throughput: 0: 1972.0. Samples: 756222. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:55:15,331][06183] Avg episode reward: [(0, '4.527')] -[2023-02-22 19:55:19,232][15000] Updated weights for policy 0, policy_version 750 (0.0015) -[2023-02-22 19:55:20,328][06183] Fps is (10 sec: 7782.3, 60 sec: 7918.9, 300 sec: 4484.8). Total num frames: 3080192. Throughput: 0: 1945.2. Samples: 767604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:55:20,332][06183] Avg episode reward: [(0, '4.350')] -[2023-02-22 19:55:25,240][15000] Updated weights for policy 0, policy_version 760 (0.0023) -[2023-02-22 19:55:25,329][06183] Fps is (10 sec: 7372.6, 60 sec: 7782.4, 300 sec: 4540.3). Total num frames: 3112960. Throughput: 0: 1905.4. Samples: 778120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 19:55:25,331][06183] Avg episode reward: [(0, '4.305')] -[2023-02-22 19:55:30,328][06183] Fps is (10 sec: 6553.6, 60 sec: 7645.8, 300 sec: 4609.7). Total num frames: 3145728. Throughput: 0: 1884.2. Samples: 783462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 19:55:30,332][06183] Avg episode reward: [(0, '4.478')] -[2023-02-22 19:55:30,929][15000] Updated weights for policy 0, policy_version 770 (0.0016) -[2023-02-22 19:55:35,328][06183] Fps is (10 sec: 6963.3, 60 sec: 7577.6, 300 sec: 4706.9). Total num frames: 3182592. Throughput: 0: 1848.1. Samples: 793558. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 19:55:35,331][06183] Avg episode reward: [(0, '4.278')] -[2023-02-22 19:55:36,915][15000] Updated weights for policy 0, policy_version 780 (0.0023) -[2023-02-22 19:55:40,328][06183] Fps is (10 sec: 7372.9, 60 sec: 7509.3, 300 sec: 4790.3). Total num frames: 3219456. Throughput: 0: 1828.2. Samples: 804542. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:55:40,331][06183] Avg episode reward: [(0, '4.435')]
[2023-02-22 19:55:42,396][15000] Updated weights for policy 0, policy_version 790 (0.0015)
[2023-02-22 19:55:45,328][06183] Fps is (10 sec: 7373.0, 60 sec: 7441.1, 300 sec: 4873.6). Total num frames: 3256320. Throughput: 0: 1819.0. Samples: 810008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2023-02-22 19:55:45,332][06183] Avg episode reward: [(0, '4.464')]
[2023-02-22 19:55:48,358][15000] Updated weights for policy 0, policy_version 800 (0.0015)
[2023-02-22 19:55:50,328][06183] Fps is (10 sec: 6963.1, 60 sec: 7304.6, 300 sec: 4943.0). Total num frames: 3289088. Throughput: 0: 1798.2. Samples: 820480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:55:50,333][06183] Avg episode reward: [(0, '4.343')]
[2023-02-22 19:55:54,485][15000] Updated weights for policy 0, policy_version 810 (0.0015)
[2023-02-22 19:55:55,328][06183] Fps is (10 sec: 6553.5, 60 sec: 7168.0, 300 sec: 4998.5). Total num frames: 3321856. Throughput: 0: 1776.1. Samples: 830284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:55:55,331][06183] Avg episode reward: [(0, '4.378')]
[2023-02-22 19:56:00,328][06183] Fps is (10 sec: 6553.6, 60 sec: 7099.8, 300 sec: 5067.9). Total num frames: 3354624. Throughput: 0: 1762.8. Samples: 835550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 19:56:00,333][06183] Avg episode reward: [(0, '4.323')]
[2023-02-22 19:56:00,418][15000] Updated weights for policy 0, policy_version 820 (0.0019)
[2023-02-22 19:56:05,329][06183] Fps is (10 sec: 6553.3, 60 sec: 7031.4, 300 sec: 5109.6). Total num frames: 3387392. Throughput: 0: 1734.9. Samples: 845674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:05,331][06183] Avg episode reward: [(0, '4.397')]
[2023-02-22 19:56:06,625][15000] Updated weights for policy 0, policy_version 830 (0.0022)
[2023-02-22 19:56:10,328][06183] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 5165.1). Total num frames: 3420160. Throughput: 0: 1722.4. Samples: 855626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:10,332][06183] Avg episode reward: [(0, '4.600')]
[2023-02-22 19:56:10,485][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000836_3424256.pth...
[2023-02-22 19:56:10,930][14984] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_2093056.pth
[2023-02-22 19:56:12,963][15000] Updated weights for policy 0, policy_version 840 (0.0019)
[2023-02-22 19:56:15,328][06183] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 5234.6). Total num frames: 3457024. Throughput: 0: 1707.0. Samples: 860276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 19:56:15,332][06183] Avg episode reward: [(0, '4.545')]
[2023-02-22 19:56:19,252][15000] Updated weights for policy 0, policy_version 850 (0.0022)
[2023-02-22 19:56:20,328][06183] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 5276.2). Total num frames: 3485696. Throughput: 0: 1700.8. Samples: 870094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:20,331][06183] Avg episode reward: [(0, '4.351')]
[2023-02-22 19:56:25,328][06183] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 5331.8). Total num frames: 3518464. Throughput: 0: 1677.1. Samples: 880014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:25,332][06183] Avg episode reward: [(0, '4.440')]
[2023-02-22 19:56:25,388][15000] Updated weights for policy 0, policy_version 860 (0.0021)
[2023-02-22 19:56:30,328][06183] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 5373.4). Total num frames: 3551232. Throughput: 0: 1667.3. Samples: 885036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:30,332][06183] Avg episode reward: [(0, '4.578')]
[2023-02-22 19:56:31,523][15000] Updated weights for policy 0, policy_version 870 (0.0021)
[2023-02-22 19:56:35,328][06183] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 5442.8). Total num frames: 3588096. Throughput: 0: 1658.0. Samples: 895088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:35,331][06183] Avg episode reward: [(0, '4.614')]
[2023-02-22 19:56:37,650][15000] Updated weights for policy 0, policy_version 880 (0.0024)
[2023-02-22 19:56:40,328][06183] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 5484.5). Total num frames: 3620864. Throughput: 0: 1661.6. Samples: 905056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:40,335][06183] Avg episode reward: [(0, '4.402')]
[2023-02-22 19:56:43,775][15000] Updated weights for policy 0, policy_version 890 (0.0017)
[2023-02-22 19:56:45,328][06183] Fps is (10 sec: 6553.6, 60 sec: 6621.8, 300 sec: 5540.0). Total num frames: 3653632. Throughput: 0: 1656.3. Samples: 910082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 19:56:45,333][06183] Avg episode reward: [(0, '4.616')]
[2023-02-22 19:56:50,328][06183] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 5581.7). Total num frames: 3686400. Throughput: 0: 1642.3. Samples: 919578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2023-02-22 19:56:50,332][15000] Updated weights for policy 0, policy_version 900 (0.0030)
[2023-02-22 19:56:50,332][06183] Avg episode reward: [(0, '4.342')]
[2023-02-22 19:56:55,329][06183] Fps is (10 sec: 6143.7, 60 sec: 6553.5, 300 sec: 5623.3). Total num frames: 3715072. Throughput: 0: 1636.5. Samples: 929270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:56:55,335][06183] Avg episode reward: [(0, '4.242')]
[2023-02-22 19:56:56,845][15000] Updated weights for policy 0, policy_version 910 (0.0034)
[2023-02-22 19:57:00,329][06183] Fps is (10 sec: 5733.8, 60 sec: 6485.2, 300 sec: 5665.0). Total num frames: 3743744. Throughput: 0: 1627.8. Samples: 933528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 19:57:00,333][06183] Avg episode reward: [(0, '4.389')]
[2023-02-22 19:57:04,308][15000] Updated weights for policy 0, policy_version 920 (0.0029)
[2023-02-22 19:57:05,329][06183] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 5692.7). Total num frames: 3772416. Throughput: 0: 1594.5. Samples: 941848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 19:57:05,339][06183] Avg episode reward: [(0, '4.555')]
[2023-02-22 19:57:10,329][06183] Fps is (10 sec: 5325.2, 60 sec: 6280.5, 300 sec: 5720.5). Total num frames: 3796992. Throughput: 0: 1544.1. Samples: 949498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 19:57:10,333][06183] Avg episode reward: [(0, '4.425')]
[2023-02-22 19:57:11,788][15000] Updated weights for policy 0, policy_version 930 (0.0033)
[2023-02-22 19:57:15,328][06183] Fps is (10 sec: 5734.5, 60 sec: 6212.2, 300 sec: 5776.1). Total num frames: 3829760. Throughput: 0: 1535.6. Samples: 954138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-02-22 19:57:15,334][06183] Avg episode reward: [(0, '4.353')]
[2023-02-22 19:57:19,282][15000] Updated weights for policy 0, policy_version 940 (0.0031)
[2023-02-22 19:57:20,329][06183] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 5790.0). Total num frames: 3854336. Throughput: 0: 1499.3. Samples: 962558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2023-02-22 19:57:20,333][06183] Avg episode reward: [(0, '4.441')]
[2023-02-22 19:57:25,329][06183] Fps is (10 sec: 5324.7, 60 sec: 6075.7, 300 sec: 5831.6). Total num frames: 3883008. Throughput: 0: 1461.5. Samples: 970824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-02-22 19:57:25,334][06183] Avg episode reward: [(0, '4.174')]
[2023-02-22 19:57:26,479][15000] Updated weights for policy 0, policy_version 950 (0.0030)
[2023-02-22 19:57:30,328][06183] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 5859.4). Total num frames: 3911680. Throughput: 0: 1446.0. Samples: 975152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 19:57:30,333][06183] Avg episode reward: [(0, '4.382')]
[2023-02-22 19:57:33,771][15000] Updated weights for policy 0, policy_version 960 (0.0027)
[2023-02-22 19:57:35,329][06183] Fps is (10 sec: 5734.1, 60 sec: 5870.9, 300 sec: 5901.0). Total num frames: 3940352. Throughput: 0: 1420.5. Samples: 983500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 19:57:35,333][06183] Avg episode reward: [(0, '4.299')]
[2023-02-22 19:57:40,328][06183] Fps is (10 sec: 5734.4, 60 sec: 5802.6, 300 sec: 5942.7). Total num frames: 3969024. Throughput: 0: 1400.8. Samples: 992306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-02-22 19:57:40,334][06183] Avg episode reward: [(0, '4.471')]
[2023-02-22 19:57:40,870][15000] Updated weights for policy 0, policy_version 970 (0.0023)
[2023-02-22 19:57:45,329][06183] Fps is (10 sec: 5734.8, 60 sec: 5734.4, 300 sec: 5970.6). Total num frames: 3997696. Throughput: 0: 1392.6. Samples: 996192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2023-02-22 19:57:45,334][06183] Avg episode reward: [(0, '4.346')]
[2023-02-22 19:57:46,777][14984] Stopping Batcher_0...
[2023-02-22 19:57:46,781][14984] Loop batcher_evt_loop terminating...
[2023-02-22 19:57:46,787][06183] Component Batcher_0 stopped!
[2023-02-22 19:57:46,792][06183] Component RolloutWorker_w4 process died already! Don't wait for it.
[2023-02-22 19:57:46,795][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 19:57:46,811][15003] Stopping RolloutWorker_w3...
[2023-02-22 19:57:46,814][15003] Loop rollout_proc3_evt_loop terminating...
[2023-02-22 19:57:46,812][06183] Component RolloutWorker_w3 stopped!
[2023-02-22 19:57:46,814][15002] Stopping RolloutWorker_w1...
[2023-02-22 19:57:46,814][15000] Weights refcount: 2 0
[2023-02-22 19:57:46,814][15004] Stopping RolloutWorker_w2...
[2023-02-22 19:57:46,815][15008] Stopping RolloutWorker_w5...
[2023-02-22 19:57:46,815][15007] Stopping RolloutWorker_w7...
[2023-02-22 19:57:46,815][15001] Stopping RolloutWorker_w0...
[2023-02-22 19:57:46,815][15006] Stopping RolloutWorker_w6...
[2023-02-22 19:57:46,816][15002] Loop rollout_proc1_evt_loop terminating...
[2023-02-22 19:57:46,817][15004] Loop rollout_proc2_evt_loop terminating...
[2023-02-22 19:57:46,818][15008] Loop rollout_proc5_evt_loop terminating...
[2023-02-22 19:57:46,819][15007] Loop rollout_proc7_evt_loop terminating...
[2023-02-22 19:57:46,819][15001] Loop rollout_proc0_evt_loop terminating...
[2023-02-22 19:57:46,816][06183] Component RolloutWorker_w1 stopped!
[2023-02-22 19:57:46,819][15006] Loop rollout_proc6_evt_loop terminating...
[2023-02-22 19:57:46,822][15000] Stopping InferenceWorker_p0-w0...
[2023-02-22 19:57:46,822][06183] Component RolloutWorker_w2 stopped!
[2023-02-22 19:57:46,831][15000] Loop inference_proc0-0_evt_loop terminating...
[2023-02-22 19:57:46,831][06183] Component RolloutWorker_w0 stopped!
[2023-02-22 19:57:46,839][06183] Component RolloutWorker_w6 stopped!
[2023-02-22 19:57:46,843][06183] Component RolloutWorker_w5 stopped!
[2023-02-22 19:57:46,851][06183] Component RolloutWorker_w7 stopped!
[2023-02-22 19:57:46,858][06183] Component InferenceWorker_p0-w0 stopped!
[2023-02-22 19:57:47,288][14984] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000616_2523136.pth
[2023-02-22 19:57:47,326][14984] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 19:57:47,774][14984] Stopping LearnerWorker_p0...
[2023-02-22 19:57:47,776][14984] Loop learner_proc0_evt_loop terminating...
[2023-02-22 19:57:47,774][06183] Component LearnerWorker_p0 stopped!
[2023-02-22 19:57:47,780][06183] Waiting for process learner_proc0 to stop...
[2023-02-22 19:57:51,038][06183] Waiting for process inference_proc0-0 to join...
[2023-02-22 19:57:51,041][06183] Waiting for process rollout_proc0 to join...
[2023-02-22 19:57:51,044][06183] Waiting for process rollout_proc1 to join...
[2023-02-22 19:57:51,048][06183] Waiting for process rollout_proc2 to join...
[2023-02-22 19:57:51,051][06183] Waiting for process rollout_proc3 to join...
[2023-02-22 19:57:51,054][06183] Waiting for process rollout_proc4 to join...
[2023-02-22 19:57:51,057][06183] Waiting for process rollout_proc5 to join...
[2023-02-22 19:57:51,061][06183] Waiting for process rollout_proc6 to join...
[2023-02-22 19:57:51,065][06183] Waiting for process rollout_proc7 to join...
[2023-02-22 19:57:51,070][06183] Batcher 0 profile tree view:
batching: 25.7881, releasing_batches: 0.0596
[2023-02-22 19:57:51,073][06183] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 12.6567
update_model: 11.5028
  weight_update: 0.0038
one_step: 0.0060
  handle_policy_step: 735.6968
    deserialize: 18.8574, stack: 3.9381, obs_to_device_normalize: 177.8106, forward: 278.2881, send_messages: 53.0078
    prepare_outputs: 178.7753
      to_cpu: 152.6849
[2023-02-22 19:57:51,076][06183] Learner 0 profile tree view:
misc: 0.0097, prepare_batch: 78.0888
train: 160.3988
  epoch_init: 0.0142, minibatch_init: 0.0160, losses_postprocess: 1.1488, kl_divergence: 1.2237, after_optimizer: 84.7016
  calculate_losses: 46.2952
    losses_init: 0.0071, forward_head: 2.9452, bptt_initial: 33.8132, tail: 1.5771, advantages_returns: 0.4771, losses: 3.8496
    bptt: 3.1942
      bptt_forward_core: 3.0460
  update: 26.0565
    clip: 3.4502
[2023-02-22 19:57:51,080][06183] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3167, enqueue_policy_requests: 18.2264, env_step: 361.8246, overhead: 28.6136, complete_rollouts: 0.7092
save_policy_outputs: 22.1311
  split_output_tensors: 10.4875
[2023-02-22 19:57:51,083][06183] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3322, enqueue_policy_requests: 18.1051, env_step: 360.2381, overhead: 28.7457, complete_rollouts: 0.6980
save_policy_outputs: 22.5731
  split_output_tensors: 10.6929
[2023-02-22 19:57:51,087][06183] Loop Runner_EvtLoop terminating...
[2023-02-22 19:57:51,090][06183] Runner profile tree view:
main_loop: 814.7198
[2023-02-22 19:57:51,093][06183] Collected {0: 4005888}, FPS: 4916.9
[2023-02-22 20:25:32,536][06183] Loading existing experiment configuration from /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-02-22 20:25:32,541][06183] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 20:25:32,544][06183] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 20:25:32,547][06183] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 20:25:32,550][06183] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 20:25:32,552][06183] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 20:25:32,553][06183] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 20:25:32,555][06183] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 20:25:32,556][06183] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-02-22 20:25:32,558][06183] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-02-22 20:25:32,560][06183] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 20:25:32,562][06183] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 20:25:32,564][06183] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 20:25:32,566][06183] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-22 20:25:32,568][06183] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 20:25:32,609][06183] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-02-22 20:25:32,617][06183] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 20:25:32,631][06183] RunningMeanStd input shape: (1,)
[2023-02-22 20:25:32,717][06183] ConvEncoder: input_channels=3
[2023-02-22 20:25:33,575][06183] Conv encoder output size: 512
[2023-02-22 20:25:33,578][06183] Policy head output size: 512
[2023-02-22 20:25:38,271][06183] Loading state from checkpoint /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 20:25:42,521][06183] Num frames 100...
[2023-02-22 20:25:42,700][06183] Num frames 200...
[2023-02-22 20:25:42,879][06183] Num frames 300...
[2023-02-22 20:25:43,085][06183] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-02-22 20:25:43,087][06183] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-02-22 20:25:43,117][06183] Num frames 400...
[2023-02-22 20:25:43,298][06183] Num frames 500...
[2023-02-22 20:25:43,474][06183] Num frames 600...
[2023-02-22 20:25:43,647][06183] Num frames 700...
[2023-02-22 20:25:43,836][06183] Num frames 800...
[2023-02-22 20:25:43,954][06183] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
[2023-02-22 20:25:43,959][06183] Avg episode reward: 4.660, avg true_objective: 4.160
[2023-02-22 20:25:44,104][06183] Num frames 900...
[2023-02-22 20:25:44,289][06183] Num frames 1000...
[2023-02-22 20:25:44,471][06183] Num frames 1100...
[2023-02-22 20:25:44,649][06183] Num frames 1200...
[2023-02-22 20:25:44,742][06183] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
[2023-02-22 20:25:44,745][06183] Avg episode reward: 4.387, avg true_objective: 4.053
[2023-02-22 20:25:44,898][06183] Num frames 1300...
[2023-02-22 20:25:45,081][06183] Num frames 1400...
[2023-02-22 20:25:45,262][06183] Avg episode rewards: #0: 3.930, true rewards: #0: 3.680
[2023-02-22 20:25:45,264][06183] Avg episode reward: 3.930, avg true_objective: 3.680
[2023-02-22 20:25:45,323][06183] Num frames 1500...
[2023-02-22 20:25:45,500][06183] Num frames 1600...
[2023-02-22 20:25:45,677][06183] Num frames 1700...
[2023-02-22 20:25:45,855][06183] Num frames 1800...
[2023-02-22 20:25:46,035][06183] Num frames 1900...
[2023-02-22 20:25:46,130][06183] Avg episode rewards: #0: 4.240, true rewards: #0: 3.840
[2023-02-22 20:25:46,132][06183] Avg episode reward: 4.240, avg true_objective: 3.840
[2023-02-22 20:25:46,276][06183] Num frames 2000...
[2023-02-22 20:25:46,450][06183] Num frames 2100...
[2023-02-22 20:25:46,622][06183] Num frames 2200...
[2023-02-22 20:25:46,816][06183] Num frames 2300...
[2023-02-22 20:25:46,881][06183] Avg episode rewards: #0: 4.173, true rewards: #0: 3.840
[2023-02-22 20:25:46,884][06183] Avg episode reward: 4.173, avg true_objective: 3.840
[2023-02-22 20:25:47,064][06183] Num frames 2400...
[2023-02-22 20:25:47,255][06183] Num frames 2500...
[2023-02-22 20:25:47,435][06183] Num frames 2600...
[2023-02-22 20:25:47,655][06183] Avg episode rewards: #0: 4.126, true rewards: #0: 3.840
[2023-02-22 20:25:47,658][06183] Avg episode reward: 4.126, avg true_objective: 3.840
[2023-02-22 20:25:47,686][06183] Num frames 2700...
[2023-02-22 20:25:47,876][06183] Num frames 2800...
[2023-02-22 20:25:48,054][06183] Num frames 2900...
[2023-02-22 20:25:48,238][06183] Num frames 3000...
[2023-02-22 20:25:48,411][06183] Num frames 3100...
[2023-02-22 20:25:48,479][06183] Avg episode rewards: #0: 4.255, true rewards: #0: 3.880
[2023-02-22 20:25:48,481][06183] Avg episode reward: 4.255, avg true_objective: 3.880
[2023-02-22 20:25:48,654][06183] Num frames 3200...
[2023-02-22 20:25:48,826][06183] Num frames 3300...
[2023-02-22 20:25:49,002][06183] Num frames 3400...
[2023-02-22 20:25:49,219][06183] Avg episode rewards: #0: 4.209, true rewards: #0: 3.876
[2023-02-22 20:25:49,221][06183] Avg episode reward: 4.209, avg true_objective: 3.876
[2023-02-22 20:25:49,271][06183] Num frames 3500...
[2023-02-22 20:25:49,455][06183] Num frames 3600...
[2023-02-22 20:25:49,636][06183] Num frames 3700...
[2023-02-22 20:25:49,820][06183] Num frames 3800...
[2023-02-22 20:25:49,897][06183] Avg episode rewards: #0: 4.208, true rewards: #0: 3.808
[2023-02-22 20:25:49,900][06183] Avg episode reward: 4.208, avg true_objective: 3.808
[2023-02-22 20:25:51,533][06183] Replay video saved to /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
[2023-02-22 20:28:07,610][06183] Loading existing experiment configuration from /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-02-22 20:28:07,617][06183] Overriding arg 'num_workers' with value 1 passed from command line
[2023-02-22 20:28:07,620][06183] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-02-22 20:28:07,623][06183] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-02-22 20:28:07,625][06183] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-02-22 20:28:07,627][06183] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-02-22 20:28:07,630][06183] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-02-22 20:28:07,632][06183] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-02-22 20:28:07,634][06183] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-02-22 20:28:07,636][06183] Adding new argument 'hf_repository'='chqmatteo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-02-22 20:28:07,638][06183] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-02-22 20:28:07,639][06183] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-02-22 20:28:07,640][06183] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-02-22 20:28:07,642][06183] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-02-22 20:28:07,643][06183] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-02-22 20:28:07,661][06183] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 20:28:07,665][06183] RunningMeanStd input shape: (1,)
[2023-02-22 20:28:07,680][06183] ConvEncoder: input_channels=3
[2023-02-22 20:28:07,712][06183] Conv encoder output size: 512
[2023-02-22 20:28:07,714][06183] Policy head output size: 512
[2023-02-22 20:28:07,762][06183] Loading state from checkpoint /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 20:28:09,216][06183] Num frames 100...
[2023-02-22 20:28:09,390][06183] Num frames 200...
[2023-02-22 20:28:09,564][06183] Num frames 300...
[2023-02-22 20:28:09,775][06183] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-02-22 20:28:09,777][06183] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-02-22 20:28:09,833][06183] Num frames 400...
[2023-02-22 20:28:10,015][06183] Num frames 500...
[2023-02-22 20:28:10,196][06183] Num frames 600...
[2023-02-22 20:28:10,374][06183] Num frames 700...
[2023-02-22 20:28:10,551][06183] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
[2023-02-22 20:28:10,555][06183] Avg episode reward: 3.840, avg true_objective: 3.840
[2023-02-22 20:28:10,614][06183] Num frames 800...
[2023-02-22 20:28:10,799][06183] Num frames 900...
[2023-02-22 20:28:10,978][06183] Num frames 1000...
[2023-02-22 20:28:11,080][06183] Avg episode rewards: #0: 3.413, true rewards: #0: 3.413
[2023-02-22 20:28:11,082][06183] Avg episode reward: 3.413, avg true_objective: 3.413
[2023-02-22 20:28:11,220][06183] Num frames 1100...
[2023-02-22 20:28:11,403][06183] Num frames 1200...
[2023-02-22 20:28:11,589][06183] Num frames 1300...
[2023-02-22 20:28:11,801][06183] Num frames 1400...
[2023-02-22 20:28:11,880][06183] Avg episode rewards: #0: 3.520, true rewards: #0: 3.520
[2023-02-22 20:28:11,883][06183] Avg episode reward: 3.520, avg true_objective: 3.520
[2023-02-22 20:28:12,065][06183] Num frames 1500...
[2023-02-22 20:28:12,242][06183] Num frames 1600...
[2023-02-22 20:28:12,420][06183] Num frames 1700...
[2023-02-22 20:28:12,635][06183] Avg episode rewards: #0: 3.584, true rewards: #0: 3.584
[2023-02-22 20:28:12,639][06183] Avg episode reward: 3.584, avg true_objective: 3.584
[2023-02-22 20:28:12,659][06183] Num frames 1800...
[2023-02-22 20:28:12,838][06183] Num frames 1900...
[2023-02-22 20:28:13,023][06183] Num frames 2000...
[2023-02-22 20:28:13,210][06183] Num frames 2100...
[2023-02-22 20:28:13,401][06183] Num frames 2200...
[2023-02-22 20:28:13,475][06183] Avg episode rewards: #0: 3.847, true rewards: #0: 3.680
[2023-02-22 20:28:13,477][06183] Avg episode reward: 3.847, avg true_objective: 3.680
[2023-02-22 20:28:13,644][06183] Num frames 2300...
[2023-02-22 20:28:13,814][06183] Num frames 2400...
[2023-02-22 20:28:13,990][06183] Num frames 2500...
[2023-02-22 20:28:14,187][06183] Num frames 2600...
[2023-02-22 20:28:14,364][06183] Num frames 2700...
[2023-02-22 20:28:14,572][06183] Avg episode rewards: #0: 4.549, true rewards: #0: 3.977
[2023-02-22 20:28:14,575][06183] Avg episode reward: 4.549, avg true_objective: 3.977
[2023-02-22 20:28:14,610][06183] Num frames 2800...
[2023-02-22 20:28:14,800][06183] Num frames 2900...
[2023-02-22 20:28:14,990][06183] Num frames 3000...
[2023-02-22 20:28:15,181][06183] Num frames 3100...
[2023-02-22 20:28:15,363][06183] Num frames 3200...
[2023-02-22 20:28:15,486][06183] Avg episode rewards: #0: 4.665, true rewards: #0: 4.040
[2023-02-22 20:28:15,488][06183] Avg episode reward: 4.665, avg true_objective: 4.040
[2023-02-22 20:28:15,635][06183] Num frames 3300...
[2023-02-22 20:28:15,828][06183] Num frames 3400...
[2023-02-22 20:28:16,009][06183] Num frames 3500...
[2023-02-22 20:28:16,196][06183] Num frames 3600...
[2023-02-22 20:28:16,399][06183] Avg episode rewards: #0: 4.756, true rewards: #0: 4.089
[2023-02-22 20:28:16,401][06183] Avg episode reward: 4.756, avg true_objective: 4.089
[2023-02-22 20:28:16,444][06183] Num frames 3700...
[2023-02-22 20:28:16,645][06183] Num frames 3800...
[2023-02-22 20:28:16,836][06183] Num frames 3900...
[2023-02-22 20:28:16,960][06183] Avg episode rewards: #0: 4.536, true rewards: #0: 3.936
[2023-02-22 20:28:16,964][06183] Avg episode reward: 4.536, avg true_objective: 3.936
[2023-02-22 20:28:18,569][06183] Replay video saved to /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
[2023-02-22 20:28:59,305][06183] The model has been pushed to https://huggingface.co/chqmatteo/rl_course_vizdoom_health_gathering_supreme
[2023-02-22 20:43:50,535][06183] Environment doom_basic already registered, overwriting...
[2023-02-22 20:43:50,537][06183] Environment doom_two_colors_easy already registered, overwriting...
[2023-02-22 20:43:50,540][06183] Environment doom_two_colors_hard already registered, overwriting...
[2023-02-22 20:43:50,542][06183] Environment doom_dm already registered, overwriting...
[2023-02-22 20:43:50,544][06183] Environment doom_dwango5 already registered, overwriting...
[2023-02-22 20:43:50,546][06183] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-02-22 20:43:50,547][06183] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-02-22 20:43:50,548][06183] Environment doom_my_way_home already registered, overwriting...
[2023-02-22 20:43:50,550][06183] Environment doom_deadly_corridor already registered, overwriting...
[2023-02-22 20:43:50,552][06183] Environment doom_defend_the_center already registered, overwriting...
[2023-02-22 20:43:50,556][06183] Environment doom_defend_the_line already registered, overwriting...
[2023-02-22 20:43:50,559][06183] Environment doom_health_gathering already registered, overwriting...
[2023-02-22 20:43:50,561][06183] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-02-22 20:43:50,563][06183] Environment doom_battle already registered, overwriting...
[2023-02-22 20:43:50,565][06183] Environment doom_battle2 already registered, overwriting...
[2023-02-22 20:43:50,568][06183] Environment doom_duel_bots already registered, overwriting...
[2023-02-22 20:43:50,570][06183] Environment doom_deathmatch_bots already registered, overwriting...
[2023-02-22 20:43:50,571][06183] Environment doom_duel already registered, overwriting...
[2023-02-22 20:43:50,573][06183] Environment doom_deathmatch_full already registered, overwriting...
[2023-02-22 20:43:50,575][06183] Environment doom_benchmark already registered, overwriting...
[2023-02-22 20:43:50,577][06183] register_encoder_factory: <function make_vizdoom_encoder at 0x7f0f69de0820>
[2023-02-22 20:43:50,991][06183] Loading existing experiment configuration from /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-02-22 20:43:50,995][06183] Overriding arg 'train_for_env_steps' with value 40000000 passed from command line
[2023-02-22 20:43:51,002][06183] Experiment dir /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists!
[2023-02-22 20:43:51,004][06183] Resuming existing experiment from /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment...
[2023-02-22 20:43:51,006][06183] Weights and Biases integration disabled
[2023-02-22 20:43:51,027][06183] Environment var CUDA_VISIBLE_DEVICES is 0

[2023-02-22 20:43:55,013][06183] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=40000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=372eb1042c1a2a82a2684e1795d47eaa26c046f7
git_repo_name=https://github.com/huggingface/deep-rl-class.git
[2023-02-22 20:43:55,027][06183] Saving configuration to /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
[2023-02-22 20:43:56,112][06183] Rollout worker 0 uses device cpu
[2023-02-22 20:43:56,114][06183] Rollout worker 1 uses device cpu
[2023-02-22 20:43:56,116][06183] Rollout worker 2 uses device cpu
[2023-02-22 20:43:56,117][06183] Rollout worker 3 uses device cpu
[2023-02-22 20:43:56,119][06183] Rollout worker 4 uses device cpu
[2023-02-22 20:43:56,123][06183] Rollout worker 5 uses device cpu
[2023-02-22 20:43:56,125][06183] Rollout worker 6 uses device cpu
[2023-02-22 20:43:56,128][06183] Rollout worker 7 uses device cpu
[2023-02-22 20:43:56,173][06183] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 20:43:56,177][06183] InferenceWorker_p0-w0: min num requests: 2
[2023-02-22 20:43:56,204][06183] Starting all processes...
[2023-02-22 20:43:56,206][06183] Starting process learner_proc0
[2023-02-22 20:43:56,594][06183] Starting all processes...
[2023-02-22 20:43:56,607][06183] Starting process inference_proc0-0
[2023-02-22 20:43:56,613][06183] Starting process rollout_proc0
[2023-02-22 20:43:56,614][06183] Starting process rollout_proc1
[2023-02-22 20:43:56,615][06183] Starting process rollout_proc2
[2023-02-22 20:43:56,616][06183] Starting process rollout_proc3
[2023-02-22 20:43:56,620][06183] Starting process rollout_proc4
[2023-02-22 20:43:56,624][06183] Starting process rollout_proc5
[2023-02-22 20:43:56,627][06183] Starting process rollout_proc6
[2023-02-22 20:43:56,651][06183] Starting process rollout_proc7
[2023-02-22 20:43:59,543][28120] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 20:43:59,544][28120] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-02-22 20:43:59,597][28120] Num visible devices: 1
[2023-02-22 20:43:59,628][28120] Starting seed is not provided
[2023-02-22 20:43:59,630][28120] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 20:43:59,631][28120] Initializing actor-critic model on device cuda:0
[2023-02-22 20:43:59,632][28120] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 20:43:59,634][28120] RunningMeanStd input shape: (1,)
[2023-02-22 20:43:59,649][28120] ConvEncoder: input_channels=3
[2023-02-22 20:43:59,686][28133] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 20:43:59,687][28133] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-02-22 20:43:59,730][28133] Num visible devices: 1
[2023-02-22 20:43:59,739][28134] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:43:59,815][28138] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:43:59,885][28120] Conv encoder output size: 512
[2023-02-22 20:43:59,886][28120] Policy head output size: 512
[2023-02-22 20:43:59,906][28120] Created Actor Critic model with architecture:
[2023-02-22 20:43:59,908][28120] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-02-22 20:43:59,925][28141] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:43:59,926][28142] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:43:59,992][28139] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:44:00,082][28140] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:44:00,316][28163] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:44:00,355][28146] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[2023-02-22 20:44:05,510][28120] Using optimizer <class 'torch.optim.adam.Adam'>
[2023-02-22 20:44:05,525][28120] Loading state from checkpoint /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-02-22 20:44:05,795][28120] Loading model from checkpoint
[2023-02-22 20:44:05,802][28120] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2023-02-22 20:44:05,803][28120] Initialized policy 0 weights for model version 978
[2023-02-22 20:44:05,815][28120] LearnerWorker_p0 finished initialization!
[2023-02-22 20:44:05,815][28120] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-02-22 20:44:05,994][28133] RunningMeanStd input shape: (3, 72, 128)
[2023-02-22 20:44:05,996][28133] RunningMeanStd input shape: (1,)
[2023-02-22 20:44:06,007][28133] ConvEncoder: input_channels=3
[2023-02-22 20:44:06,028][06183] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888.
Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-02-22 20:44:06,124][28133] Conv encoder output size: 512 -[2023-02-22 20:44:06,126][28133] Policy head output size: 512 -[2023-02-22 20:44:11,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-02-22 20:44:16,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-02-22 20:44:16,163][06183] Heartbeat connected on Batcher_0 -[2023-02-22 20:44:16,169][06183] Heartbeat connected on LearnerWorker_p0 -[2023-02-22 20:44:16,184][06183] Heartbeat connected on RolloutWorker_w0 -[2023-02-22 20:44:16,187][06183] Heartbeat connected on RolloutWorker_w1 -[2023-02-22 20:44:16,190][06183] Heartbeat connected on RolloutWorker_w2 -[2023-02-22 20:44:16,192][06183] Heartbeat connected on RolloutWorker_w3 -[2023-02-22 20:44:16,196][06183] Heartbeat connected on RolloutWorker_w4 -[2023-02-22 20:44:16,198][06183] Heartbeat connected on RolloutWorker_w5 -[2023-02-22 20:44:16,201][06183] Heartbeat connected on RolloutWorker_w6 -[2023-02-22 20:44:16,203][06183] Heartbeat connected on RolloutWorker_w7 -[2023-02-22 20:44:21,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-02-22 20:44:21,749][06183] Inference worker 0-0 is ready! -[2023-02-22 20:44:21,751][06183] All inference workers are ready! Signal rollout workers to start! 
-[2023-02-22 20:44:21,753][06183] Heartbeat connected on InferenceWorker_p0-w0
-[2023-02-22 20:44:22,406][28163] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:22,436][28139] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:22,443][28141] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:22,453][28146] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:22,459][28142] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:22,462][28138] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:22,471][28134] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:22,474][28140] Doom resolution: 160x120, resize resolution: (128, 72)
-[2023-02-22 20:44:23,303][28163] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:23,312][28146] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:23,312][28142] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:23,318][28134] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:23,319][28138] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:23,361][28139] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:24,085][28146] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:24,086][28138] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:24,091][28142] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:24,091][28134] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:25,498][28140] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:25,507][28163] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:26,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:44:26,133][28134] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:26,145][28138] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:26,230][28146] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:26,233][28142] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:26,620][28139] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:26,642][28141] Decorrelating experience for 0 frames...
-[2023-02-22 20:44:26,727][28163] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:27,162][28134] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:27,244][28140] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:27,406][28142] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:27,649][28146] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:27,651][28141] Decorrelating experience for 32 frames...
-[2023-02-22 20:44:27,652][28138] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:27,968][28139] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:28,651][28140] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:29,082][28139] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:29,162][28141] Decorrelating experience for 64 frames...
-[2023-02-22 20:44:29,668][28140] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:29,668][28163] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:30,538][28141] Decorrelating experience for 96 frames...
-[2023-02-22 20:44:31,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:44:36,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:44:41,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:44:46,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:44:49,977][28120] Signal inference workers to stop experience collection...
-[2023-02-22 20:44:49,994][28133] InferenceWorker_p0-w0: stopping experience collection
-[2023-02-22 20:44:51,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 25.1. Samples: 1130. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:44:51,032][06183] Avg episode reward: [(0, '3.034')]
-[2023-02-22 20:44:56,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 69.5. Samples: 3128. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:44:56,029][06183] Avg episode reward: [(0, '3.034')]
-[2023-02-22 20:45:01,027][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 69.5. Samples: 3128. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:45:01,031][06183] Avg episode reward: [(0, '3.034')]
-[2023-02-22 20:45:06,028][06183] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 69.5. Samples: 3128. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2023-02-22 20:45:06,030][06183] Avg episode reward: [(0, '3.034')]
-[2023-02-22 20:45:09,852][28120] Signal inference workers to resume experience collection...
-[2023-02-22 20:45:09,854][28133] InferenceWorker_p0-w0: resuming experience collection
-[2023-02-22 20:45:11,028][06183] Fps is (10 sec: 1228.7, 60 sec: 204.8, 300 sec: 189.0). Total num frames: 4018176. Throughput: 0: 69.5. Samples: 3128. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2023-02-22 20:45:11,034][06183] Avg episode reward: [(0, '3.218')]
-[2023-02-22 20:45:13,834][28133] Updated weights for policy 0, policy_version 988 (0.0024)
-[2023-02-22 20:45:16,027][06183] Fps is (10 sec: 6553.7, 60 sec: 1092.3, 300 sec: 936.2). Total num frames: 4071424. Throughput: 0: 357.1. Samples: 16068. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-02-22 20:45:16,031][06183] Avg episode reward: [(0, '4.441')]
-[2023-02-22 20:45:17,490][28133] Updated weights for policy 0, policy_version 998 (0.0018)
-[2023-02-22 20:45:21,027][06183] Fps is (10 sec: 10649.9, 60 sec: 1979.7, 300 sec: 1583.8). Total num frames: 4124672. Throughput: 0: 533.2. Samples: 23996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-02-22 20:45:21,030][06183] Avg episode reward: [(0, '4.342')]
-[2023-02-22 20:45:21,268][28133] Updated weights for policy 0, policy_version 1008 (0.0019)
-[2023-02-22 20:45:24,599][28133] Updated weights for policy 0, policy_version 1018 (0.0014)
-[2023-02-22 20:45:26,027][06183] Fps is (10 sec: 11059.2, 60 sec: 2935.5, 300 sec: 2201.6). Total num frames: 4182016. Throughput: 0: 928.8. Samples: 41796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 20:45:26,031][06183] Avg episode reward: [(0, '4.425')]
-[2023-02-22 20:45:28,724][28133] Updated weights for policy 0, policy_version 1028 (0.0016)
-[2023-02-22 20:45:31,028][06183] Fps is (10 sec: 11059.3, 60 sec: 3822.9, 300 sec: 2698.5). Total num frames: 4235264. Throughput: 0: 1285.3. Samples: 57838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:45:31,031][06183] Avg episode reward: [(0, '4.466')]
-[2023-02-22 20:45:32,164][28133] Updated weights for policy 0, policy_version 1038 (0.0023)
-[2023-02-22 20:45:35,870][28133] Updated weights for policy 0, policy_version 1048 (0.0014)
-[2023-02-22 20:45:36,028][06183] Fps is (10 sec: 11058.8, 60 sec: 4778.7, 300 sec: 3185.8). Total num frames: 4292608. Throughput: 0: 1455.0. Samples: 66606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:45:36,031][06183] Avg episode reward: [(0, '4.282')]
-[2023-02-22 20:45:39,972][28133] Updated weights for policy 0, policy_version 1058 (0.0017)
-[2023-02-22 20:45:41,027][06183] Fps is (10 sec: 10649.9, 60 sec: 5597.9, 300 sec: 3535.5). Total num frames: 4341760. Throughput: 0: 1752.8. Samples: 82002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:45:41,030][06183] Avg episode reward: [(0, '4.390')]
-[2023-02-22 20:45:44,022][28133] Updated weights for policy 0, policy_version 1068 (0.0016)
-[2023-02-22 20:45:46,028][06183] Fps is (10 sec: 9830.5, 60 sec: 6417.0, 300 sec: 3850.2). Total num frames: 4390912. Throughput: 0: 2088.4. Samples: 97106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:45:46,031][06183] Avg episode reward: [(0, '4.430')]
-[2023-02-22 20:45:48,091][28133] Updated weights for policy 0, policy_version 1078 (0.0021)
-[2023-02-22 20:45:51,028][06183] Fps is (10 sec: 10239.4, 60 sec: 7304.5, 300 sec: 4174.0). Total num frames: 4444160. Throughput: 0: 2254.3. Samples: 104574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 20:45:51,032][06183] Avg episode reward: [(0, '4.423')]
-[2023-02-22 20:45:51,057][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001085_4444160.pth...
-[2023-02-22 20:45:51,385][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000836_3424256.pth
-[2023-02-22 20:45:52,235][28133] Updated weights for policy 0, policy_version 1088 (0.0017)
-[2023-02-22 20:45:56,027][06183] Fps is (10 sec: 10240.5, 60 sec: 8123.8, 300 sec: 4431.1). Total num frames: 4493312. Throughput: 0: 2586.7. Samples: 119526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:45:56,031][06183] Avg episode reward: [(0, '4.444')]
-[2023-02-22 20:45:56,491][28133] Updated weights for policy 0, policy_version 1098 (0.0017)
-[2023-02-22 20:46:00,493][28133] Updated weights for policy 0, policy_version 1108 (0.0018)
-[2023-02-22 20:46:01,028][06183] Fps is (10 sec: 9830.5, 60 sec: 8942.9, 300 sec: 4665.9). Total num frames: 4542464. Throughput: 0: 2631.9. Samples: 134504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:46:01,031][06183] Avg episode reward: [(0, '4.723')]
-[2023-02-22 20:46:04,581][28133] Updated weights for policy 0, policy_version 1118 (0.0021)
-[2023-02-22 20:46:06,028][06183] Fps is (10 sec: 9829.6, 60 sec: 9762.1, 300 sec: 4881.0). Total num frames: 4591616. Throughput: 0: 2620.7. Samples: 141930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:46:06,035][06183] Avg episode reward: [(0, '4.405')]
-[2023-02-22 20:46:09,646][28133] Updated weights for policy 0, policy_version 1128 (0.0027)
-[2023-02-22 20:46:11,027][06183] Fps is (10 sec: 8601.9, 60 sec: 10171.8, 300 sec: 4980.7). Total num frames: 4628480. Throughput: 0: 2501.0. Samples: 154342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:46:11,032][06183] Avg episode reward: [(0, '4.351')]
-[2023-02-22 20:46:15,129][28133] Updated weights for policy 0, policy_version 1138 (0.0021)
-[2023-02-22 20:46:16,027][06183] Fps is (10 sec: 7373.0, 60 sec: 9898.6, 300 sec: 5072.7). Total num frames: 4665344. Throughput: 0: 2394.8. Samples: 165606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:46:16,033][06183] Avg episode reward: [(0, '4.509')]
-[2023-02-22 20:46:20,619][28133] Updated weights for policy 0, policy_version 1148 (0.0017)
-[2023-02-22 20:46:21,028][06183] Fps is (10 sec: 7372.4, 60 sec: 9625.6, 300 sec: 5157.9). Total num frames: 4702208. Throughput: 0: 2327.5. Samples: 171342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:46:21,032][06183] Avg episode reward: [(0, '4.588')]
-[2023-02-22 20:46:26,028][06183] Fps is (10 sec: 7372.9, 60 sec: 9284.2, 300 sec: 5237.0). Total num frames: 4739072. Throughput: 0: 2231.3. Samples: 182410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:46:26,033][06183] Avg episode reward: [(0, '4.631')]
-[2023-02-22 20:46:26,086][28133] Updated weights for policy 0, policy_version 1158 (0.0017)
-[2023-02-22 20:46:31,027][06183] Fps is (10 sec: 7373.3, 60 sec: 9011.2, 300 sec: 5310.7). Total num frames: 4775936. Throughput: 0: 2135.8. Samples: 193214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:46:31,031][06183] Avg episode reward: [(0, '4.799')]
-[2023-02-22 20:46:31,158][28120] Saving new best policy, reward=4.799!
-[2023-02-22 20:46:31,935][28133] Updated weights for policy 0, policy_version 1168 (0.0022)
-[2023-02-22 20:46:36,028][06183] Fps is (10 sec: 7372.7, 60 sec: 8669.9, 300 sec: 5379.4). Total num frames: 4812800. Throughput: 0: 2086.5. Samples: 198464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 20:46:36,032][06183] Avg episode reward: [(0, '4.667')]
-[2023-02-22 20:46:37,642][28133] Updated weights for policy 0, policy_version 1178 (0.0019)
-[2023-02-22 20:46:41,027][06183] Fps is (10 sec: 7372.7, 60 sec: 8465.0, 300 sec: 5443.7). Total num frames: 4849664. Throughput: 0: 1995.5. Samples: 209326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:46:41,033][06183] Avg episode reward: [(0, '4.444')]
-[2023-02-22 20:46:43,281][28133] Updated weights for policy 0, policy_version 1188 (0.0024)
-[2023-02-22 20:46:46,027][06183] Fps is (10 sec: 6963.5, 60 sec: 8192.0, 300 sec: 5478.4). Total num frames: 4882432. Throughput: 0: 1902.0. Samples: 220092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:46:46,031][06183] Avg episode reward: [(0, '4.354')]
-[2023-02-22 20:46:49,253][28133] Updated weights for policy 0, policy_version 1198 (0.0033)
-[2023-02-22 20:46:51,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7919.0, 300 sec: 5535.8). Total num frames: 4919296. Throughput: 0: 1847.3. Samples: 225056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 20:46:51,033][06183] Avg episode reward: [(0, '4.467')]
-[2023-02-22 20:46:54,980][28133] Updated weights for policy 0, policy_version 1208 (0.0025)
-[2023-02-22 20:46:56,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7645.8, 300 sec: 5565.7). Total num frames: 4952064. Throughput: 0: 1807.8. Samples: 235694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 20:46:56,031][06183] Avg episode reward: [(0, '4.469')]
-[2023-02-22 20:47:01,028][06183] Fps is (10 sec: 6553.2, 60 sec: 7372.8, 300 sec: 5594.0). Total num frames: 4984832. Throughput: 0: 1781.3. Samples: 245766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 20:47:01,033][06183] Avg episode reward: [(0, '4.516')]
-[2023-02-22 20:47:01,115][28133] Updated weights for policy 0, policy_version 1218 (0.0028)
-[2023-02-22 20:47:06,028][06183] Fps is (10 sec: 6962.8, 60 sec: 7168.0, 300 sec: 5643.4). Total num frames: 5021696. Throughput: 0: 1769.5. Samples: 250968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:47:06,033][06183] Avg episode reward: [(0, '4.360')]
-[2023-02-22 20:47:07,089][28133] Updated weights for policy 0, policy_version 1228 (0.0024)
-[2023-02-22 20:47:11,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7099.7, 300 sec: 5668.0). Total num frames: 5054464. Throughput: 0: 1751.3. Samples: 261220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:47:11,033][06183] Avg episode reward: [(0, '4.370')]
-[2023-02-22 20:47:13,175][28133] Updated weights for policy 0, policy_version 1238 (0.0031)
-[2023-02-22 20:47:16,027][06183] Fps is (10 sec: 6554.0, 60 sec: 7031.5, 300 sec: 5691.3). Total num frames: 5087232. Throughput: 0: 1734.4. Samples: 271264. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:47:16,032][06183] Avg episode reward: [(0, '4.380')]
-[2023-02-22 20:47:19,384][28133] Updated weights for policy 0, policy_version 1248 (0.0033)
-[2023-02-22 20:47:21,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 5713.4). Total num frames: 5120000. Throughput: 0: 1728.9. Samples: 276262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:47:21,033][06183] Avg episode reward: [(0, '4.481')]
-[2023-02-22 20:47:25,805][28133] Updated weights for policy 0, policy_version 1258 (0.0028)
-[2023-02-22 20:47:26,028][06183] Fps is (10 sec: 6553.2, 60 sec: 6894.9, 300 sec: 5734.4). Total num frames: 5152768. Throughput: 0: 1704.3. Samples: 286022. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:47:26,035][06183] Avg episode reward: [(0, '4.697')]
-[2023-02-22 20:47:31,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6826.6, 300 sec: 5754.4). Total num frames: 5185536. Throughput: 0: 1679.1. Samples: 295650. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:47:31,031][06183] Avg episode reward: [(0, '4.550')]
-[2023-02-22 20:47:32,197][28133] Updated weights for policy 0, policy_version 1268 (0.0029)
-[2023-02-22 20:47:36,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6758.3, 300 sec: 5773.4). Total num frames: 5218304. Throughput: 0: 1677.0. Samples: 300522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:47:36,035][06183] Avg episode reward: [(0, '4.409')]
-[2023-02-22 20:47:38,481][28133] Updated weights for policy 0, policy_version 1278 (0.0039)
-[2023-02-22 20:47:41,027][06183] Fps is (10 sec: 6144.1, 60 sec: 6621.9, 300 sec: 5772.5). Total num frames: 5246976. Throughput: 0: 1653.9. Samples: 310118. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:47:41,031][06183] Avg episode reward: [(0, '4.776')]
-[2023-02-22 20:47:45,093][28133] Updated weights for policy 0, policy_version 1288 (0.0027)
-[2023-02-22 20:47:46,027][06183] Fps is (10 sec: 6144.4, 60 sec: 6621.8, 300 sec: 5790.3). Total num frames: 5279744. Throughput: 0: 1634.5. Samples: 319318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:47:46,032][06183] Avg episode reward: [(0, '4.810')]
-[2023-02-22 20:47:46,038][28120] Saving new best policy, reward=4.810!
-[2023-02-22 20:47:51,029][06183] Fps is (10 sec: 5733.4, 60 sec: 6416.9, 300 sec: 5770.8). Total num frames: 5304320. Throughput: 0: 1602.9. Samples: 323102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 20:47:51,035][06183] Avg episode reward: [(0, '4.699')]
-[2023-02-22 20:47:51,087][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001295_5304320.pth...
-[2023-02-22 20:47:51,703][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
-[2023-02-22 20:47:52,829][28133] Updated weights for policy 0, policy_version 1298 (0.0043)
-[2023-02-22 20:47:56,027][06183] Fps is (10 sec: 5324.9, 60 sec: 6348.8, 300 sec: 5770.0). Total num frames: 5332992. Throughput: 0: 1543.8. Samples: 330690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 20:47:56,032][06183] Avg episode reward: [(0, '4.409')]
-[2023-02-22 20:48:00,116][28133] Updated weights for policy 0, policy_version 1308 (0.0043)
-[2023-02-22 20:48:01,028][06183] Fps is (10 sec: 5735.1, 60 sec: 6280.5, 300 sec: 5769.2). Total num frames: 5361664. Throughput: 0: 1511.7. Samples: 339292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:48:01,032][06183] Avg episode reward: [(0, '4.418')]
-[2023-02-22 20:48:06,027][06183] Fps is (10 sec: 5324.8, 60 sec: 6075.8, 300 sec: 5751.5). Total num frames: 5386240. Throughput: 0: 1491.6. Samples: 343384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:48:06,032][06183] Avg episode reward: [(0, '4.451')]
-[2023-02-22 20:48:07,727][28133] Updated weights for policy 0, policy_version 1318 (0.0047)
-[2023-02-22 20:48:11,028][06183] Fps is (10 sec: 5324.9, 60 sec: 6007.4, 300 sec: 5751.1). Total num frames: 5414912. Throughput: 0: 1453.5. Samples: 351430. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 20:48:11,033][06183] Avg episode reward: [(0, '4.321')]
-[2023-02-22 20:48:15,948][28133] Updated weights for policy 0, policy_version 1328 (0.0046)
-[2023-02-22 20:48:16,028][06183] Fps is (10 sec: 5324.6, 60 sec: 5870.9, 300 sec: 5734.4). Total num frames: 5439488. Throughput: 0: 1402.0. Samples: 358742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 20:48:16,033][06183] Avg episode reward: [(0, '4.386')]
-[2023-02-22 20:48:21,028][06183] Fps is (10 sec: 4505.5, 60 sec: 5666.1, 300 sec: 5702.3). Total num frames: 5459968. Throughput: 0: 1371.7. Samples: 362246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 20:48:21,037][06183] Avg episode reward: [(0, '4.416')]
-[2023-02-22 20:48:24,596][28133] Updated weights for policy 0, policy_version 1338 (0.0047)
-[2023-02-22 20:48:26,028][06183] Fps is (10 sec: 4505.6, 60 sec: 5529.6, 300 sec: 5687.1). Total num frames: 5484544. Throughput: 0: 1318.2. Samples: 369438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:48:26,035][06183] Avg episode reward: [(0, '4.365')]
-[2023-02-22 20:48:31,028][06183] Fps is (10 sec: 4505.6, 60 sec: 5324.8, 300 sec: 5657.1). Total num frames: 5505024. Throughput: 0: 1251.5. Samples: 375636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:48:31,041][06183] Avg episode reward: [(0, '4.416')]
-[2023-02-22 20:48:34,072][28133] Updated weights for policy 0, policy_version 1348 (0.0040)
-[2023-02-22 20:48:36,027][06183] Fps is (10 sec: 4505.7, 60 sec: 5188.3, 300 sec: 5643.4). Total num frames: 5529600. Throughput: 0: 1246.0. Samples: 379170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:48:36,032][06183] Avg episode reward: [(0, '4.301')]
-[2023-02-22 20:48:41,028][06183] Fps is (10 sec: 4505.7, 60 sec: 5051.7, 300 sec: 5615.2). Total num frames: 5550080. Throughput: 0: 1227.7. Samples: 385936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:48:41,033][06183] Avg episode reward: [(0, '4.353')]
-[2023-02-22 20:48:43,182][28133] Updated weights for policy 0, policy_version 1358 (0.0045)
-[2023-02-22 20:48:46,028][06183] Fps is (10 sec: 4505.5, 60 sec: 4915.2, 300 sec: 5602.7). Total num frames: 5574656. Throughput: 0: 1193.0. Samples: 392978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:48:46,034][06183] Avg episode reward: [(0, '4.265')]
-[2023-02-22 20:48:51,028][06183] Fps is (10 sec: 4915.2, 60 sec: 4915.3, 300 sec: 5590.7). Total num frames: 5599232. Throughput: 0: 1181.9. Samples: 396568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:48:51,034][06183] Avg episode reward: [(0, '4.302')]
-[2023-02-22 20:48:51,931][28133] Updated weights for policy 0, policy_version 1368 (0.0049)
-[2023-02-22 20:48:56,028][06183] Fps is (10 sec: 4505.5, 60 sec: 4778.6, 300 sec: 5564.9). Total num frames: 5619712. Throughput: 0: 1160.8. Samples: 403668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 20:48:56,059][06183] Avg episode reward: [(0, '4.295')]
-[2023-02-22 20:49:01,028][06183] Fps is (10 sec: 3686.4, 60 sec: 4573.9, 300 sec: 5526.1). Total num frames: 5636096. Throughput: 0: 1112.8. Samples: 408820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:49:01,037][06183] Avg episode reward: [(0, '4.232')]
-[2023-02-22 20:49:02,602][28133] Updated weights for policy 0, policy_version 1378 (0.0048)
-[2023-02-22 20:49:06,028][06183] Fps is (10 sec: 3686.3, 60 sec: 4505.5, 300 sec: 5595.5). Total num frames: 5656576. Throughput: 0: 1097.3. Samples: 411626. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:06,036][06183] Avg episode reward: [(0, '4.367')]
-[2023-02-22 20:49:11,028][06183] Fps is (10 sec: 4096.1, 60 sec: 4369.1, 300 sec: 5665.0). Total num frames: 5677056. Throughput: 0: 1069.7. Samples: 417576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:11,034][06183] Avg episode reward: [(0, '4.533')]
-[2023-02-22 20:49:12,885][28133] Updated weights for policy 0, policy_version 1388 (0.0046)
-[2023-02-22 20:49:16,028][06183] Fps is (10 sec: 3686.6, 60 sec: 4232.5, 300 sec: 5720.5). Total num frames: 5693440. Throughput: 0: 1064.2. Samples: 423524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:49:16,043][06183] Avg episode reward: [(0, '4.374')]
-[2023-02-22 20:49:21,028][06183] Fps is (10 sec: 3686.4, 60 sec: 4232.6, 300 sec: 5789.9). Total num frames: 5713920. Throughput: 0: 1049.0. Samples: 426374. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:49:21,033][06183] Avg episode reward: [(0, '4.245')]
-[2023-02-22 20:49:23,885][28133] Updated weights for policy 0, policy_version 1398 (0.0051)
-[2023-02-22 20:49:26,028][06183] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 5859.4). Total num frames: 5734400. Throughput: 0: 1020.8. Samples: 431874. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:26,035][06183] Avg episode reward: [(0, '4.368')]
-[2023-02-22 20:49:31,028][06183] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 5914.9). Total num frames: 5750784. Throughput: 0: 974.2. Samples: 436816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:31,037][06183] Avg episode reward: [(0, '4.482')]
-[2023-02-22 20:49:36,028][06183] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 5956.5). Total num frames: 5763072. Throughput: 0: 948.2. Samples: 439236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:36,038][28133] Updated weights for policy 0, policy_version 1408 (0.0067)
-[2023-02-22 20:49:36,035][06183] Avg episode reward: [(0, '4.432')]
-[2023-02-22 20:49:41,028][06183] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 6026.0). Total num frames: 5783552. Throughput: 0: 902.7. Samples: 444288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:41,034][06183] Avg episode reward: [(0, '4.383')]
-[2023-02-22 20:49:46,027][06183] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 6081.5). Total num frames: 5799936. Throughput: 0: 899.7. Samples: 449308. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:46,034][06183] Avg episode reward: [(0, '4.422')]
-[2023-02-22 20:49:48,298][28133] Updated weights for policy 0, policy_version 1418 (0.0063)
-[2023-02-22 20:49:51,028][06183] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 6137.0). Total num frames: 5816320. Throughput: 0: 893.3. Samples: 451824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:49:51,038][06183] Avg episode reward: [(0, '4.330')]
-[2023-02-22 20:49:51,100][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001420_5816320.pth...
-[2023-02-22 20:49:51,989][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001085_4444160.pth
-[2023-02-22 20:49:56,028][06183] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 6192.6). Total num frames: 5832704. Throughput: 0: 873.2. Samples: 456868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:49:56,034][06183] Avg episode reward: [(0, '4.278')]
-[2023-02-22 20:50:00,382][28133] Updated weights for policy 0, policy_version 1428 (0.0061)
-[2023-02-22 20:50:01,028][06183] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 6248.1). Total num frames: 5849088. Throughput: 0: 848.9. Samples: 461724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:50:01,138][06183] Avg episode reward: [(0, '4.296')]
-[2023-02-22 20:50:06,028][06183] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 6262.0). Total num frames: 5865472. Throughput: 0: 841.0. Samples: 464218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:50:06,035][06183] Avg episode reward: [(0, '4.297')]
-[2023-02-22 20:50:11,028][06183] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 6137.0). Total num frames: 5881856. Throughput: 0: 840.3. Samples: 469686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:50:11,038][06183] Avg episode reward: [(0, '4.361')]
-[2023-02-22 20:50:12,095][28133] Updated weights for policy 0, policy_version 1438 (0.0058)
-[2023-02-22 20:50:16,028][06183] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 6026.0). Total num frames: 5902336. Throughput: 0: 840.8. Samples: 474652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:50:16,034][06183] Avg episode reward: [(0, '4.399')]
-[2023-02-22 20:50:21,028][06183] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 5887.1). Total num frames: 5918720. Throughput: 0: 846.7. Samples: 477338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:50:21,041][06183] Avg episode reward: [(0, '4.279')]
-[2023-02-22 20:50:24,038][28133] Updated weights for policy 0, policy_version 1448 (0.0055)
-[2023-02-22 20:50:26,028][06183] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 5762.2). Total num frames: 5935104. Throughput: 0: 858.5. Samples: 482920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:50:26,042][06183] Avg episode reward: [(0, '4.258')]
-[2023-02-22 20:50:31,028][06183] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 5637.2). Total num frames: 5955584. Throughput: 0: 875.4. Samples: 488702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:50:31,035][06183] Avg episode reward: [(0, '4.457')]
-[2023-02-22 20:50:34,974][28133] Updated weights for policy 0, policy_version 1458 (0.0043)
-[2023-02-22 20:50:36,028][06183] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 5526.1). Total num frames: 5971968. Throughput: 0: 877.9. Samples: 491328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:50:36,039][06183] Avg episode reward: [(0, '4.502')]
-[2023-02-22 20:50:41,029][06183] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 5428.9). Total num frames: 5992448. Throughput: 0: 880.2. Samples: 496478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:50:41,034][06183] Avg episode reward: [(0, '4.377')]
-[2023-02-22 20:50:45,291][28133] Updated weights for policy 0, policy_version 1468 (0.0061)
-[2023-02-22 20:50:46,028][06183] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 5317.9). Total num frames: 6012928. Throughput: 0: 921.6. Samples: 503196. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:50:46,038][06183] Avg episode reward: [(0, '4.456')]
-[2023-02-22 20:50:51,028][06183] Fps is (10 sec: 4505.7, 60 sec: 3686.4, 300 sec: 5234.5). Total num frames: 6037504. Throughput: 0: 945.2. Samples: 506752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:50:51,034][06183] Avg episode reward: [(0, '4.448')]
-[2023-02-22 20:50:53,078][28133] Updated weights for policy 0, policy_version 1478 (0.0032)
-[2023-02-22 20:50:56,027][06183] Fps is (10 sec: 7782.8, 60 sec: 4300.8, 300 sec: 5248.4). Total num frames: 6090752. Throughput: 0: 1070.5. Samples: 517856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:50:56,029][06183] Avg episode reward: [(0, '4.294')]
-[2023-02-22 20:50:56,334][28133] Updated weights for policy 0, policy_version 1488 (0.0016)
-[2023-02-22 20:50:59,671][28133] Updated weights for policy 0, policy_version 1498 (0.0014)
-[2023-02-22 20:51:01,027][06183] Fps is (10 sec: 11060.3, 60 sec: 4983.5, 300 sec: 5276.2). Total num frames: 6148096. Throughput: 0: 1376.7. Samples: 536604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:51:01,030][06183] Avg episode reward: [(0, '4.357')]
-[2023-02-22 20:51:02,800][28133] Updated weights for policy 0, policy_version 1508 (0.0013)
-[2023-02-22 20:51:06,027][06183] Fps is (10 sec: 12287.9, 60 sec: 5802.7, 300 sec: 5373.4). Total num frames: 6213632. Throughput: 0: 1544.5. Samples: 546840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:51:06,029][06183] Avg episode reward: [(0, '4.348')]
-[2023-02-22 20:51:06,069][28133] Updated weights for policy 0, policy_version 1518 (0.0013)
-[2023-02-22 20:51:09,534][28133] Updated weights for policy 0, policy_version 1528 (0.0016)
-[2023-02-22 20:51:11,027][06183] Fps is (10 sec: 12697.7, 60 sec: 6553.7, 300 sec: 5456.7). Total num frames: 6275072. Throughput: 0: 1820.0. Samples: 564820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:51:11,029][06183] Avg episode reward: [(0, '4.480')]
-[2023-02-22 20:51:12,833][28133] Updated weights for policy 0, policy_version 1538 (0.0013)
-[2023-02-22 20:51:16,027][06183] Fps is (10 sec: 12288.0, 60 sec: 7236.4, 300 sec: 5540.0). Total num frames: 6336512. Throughput: 0: 2096.8. Samples: 583054. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:51:16,030][06183] Avg episode reward: [(0, '4.212')]
-[2023-02-22 20:51:16,306][28133] Updated weights for policy 0, policy_version 1548 (0.0019)
-[2023-02-22 20:51:19,611][28133] Updated weights for policy 0, policy_version 1558 (0.0015)
-[2023-02-22 20:51:21,027][06183] Fps is (10 sec: 12287.9, 60 sec: 7987.3, 300 sec: 5623.3). Total num frames: 6397952. Throughput: 0: 2241.6. Samples: 592198. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:51:21,030][06183] Avg episode reward: [(0, '4.329')]
-[2023-02-22 20:51:23,060][28133] Updated weights for policy 0, policy_version 1568 (0.0016)
-[2023-02-22 20:51:26,027][06183] Fps is (10 sec: 11878.4, 60 sec: 8670.0, 300 sec: 5692.7). Total num frames: 6455296. Throughput: 0: 2528.6. Samples: 610262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:51:26,030][06183] Avg episode reward: [(0, '4.357')]
-[2023-02-22 20:51:26,428][28133] Updated weights for policy 0, policy_version 1578 (0.0014)
-[2023-02-22 20:51:29,895][28133] Updated weights for policy 0, policy_version 1588 (0.0017)
-[2023-02-22 20:51:31,027][06183] Fps is (10 sec: 11878.3, 60 sec: 9352.7, 300 sec: 5776.1). Total num frames: 6516736. Throughput: 0: 2777.5. Samples: 628182. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:51:31,030][06183] Avg episode reward: [(0, '4.388')]
-[2023-02-22 20:51:33,395][28133] Updated weights for policy 0, policy_version 1598 (0.0016)
-[2023-02-22 20:51:36,027][06183] Fps is (10 sec: 11878.3, 60 sec: 10035.3, 300 sec: 5845.5). Total num frames: 6574080. Throughput: 0: 2892.4. Samples: 636908. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 20:51:36,030][06183] Avg episode reward: [(0, '4.511')] -[2023-02-22 20:51:36,956][28133] Updated weights for policy 0, policy_version 1608 (0.0016) -[2023-02-22 20:51:40,460][28133] Updated weights for policy 0, policy_version 1618 (0.0014) -[2023-02-22 20:51:41,027][06183] Fps is (10 sec: 11469.0, 60 sec: 10649.8, 300 sec: 5928.8). Total num frames: 6631424. Throughput: 0: 3032.7. Samples: 654326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:51:41,030][06183] Avg episode reward: [(0, '4.483')] -[2023-02-22 20:51:44,004][28133] Updated weights for policy 0, policy_version 1628 (0.0017) -[2023-02-22 20:51:46,027][06183] Fps is (10 sec: 11468.7, 60 sec: 11264.1, 300 sec: 5998.2). Total num frames: 6688768. Throughput: 0: 2998.1. Samples: 671520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 20:51:46,030][06183] Avg episode reward: [(0, '4.469')] -[2023-02-22 20:51:47,815][28133] Updated weights for policy 0, policy_version 1638 (0.0013) -[2023-02-22 20:51:51,027][06183] Fps is (10 sec: 11059.1, 60 sec: 11742.1, 300 sec: 6067.6). Total num frames: 6742016. Throughput: 0: 2944.9. Samples: 679360. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:51:51,030][06183] Avg episode reward: [(0, '4.517')] -[2023-02-22 20:51:51,057][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001646_6742016.pth... -[2023-02-22 20:51:51,373][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001295_5304320.pth -[2023-02-22 20:51:51,696][28133] Updated weights for policy 0, policy_version 1648 (0.0014) -[2023-02-22 20:51:55,864][28133] Updated weights for policy 0, policy_version 1658 (0.0019) -[2023-02-22 20:51:56,027][06183] Fps is (10 sec: 10240.3, 60 sec: 11673.6, 300 sec: 6123.2). Total num frames: 6791168. 
Throughput: 0: 2879.2. Samples: 694382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 20:51:56,030][06183] Avg episode reward: [(0, '4.373')] -[2023-02-22 20:52:00,015][28133] Updated weights for policy 0, policy_version 1668 (0.0019) -[2023-02-22 20:52:01,028][06183] Fps is (10 sec: 9830.0, 60 sec: 11537.0, 300 sec: 6164.8). Total num frames: 6840320. Throughput: 0: 2795.9. Samples: 708870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 20:52:01,033][06183] Avg episode reward: [(0, '4.347')] -[2023-02-22 20:52:04,407][28133] Updated weights for policy 0, policy_version 1678 (0.0020) -[2023-02-22 20:52:06,027][06183] Fps is (10 sec: 9420.5, 60 sec: 11195.7, 300 sec: 6206.5). Total num frames: 6885376. Throughput: 0: 2749.8. Samples: 715940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 20:52:06,032][06183] Avg episode reward: [(0, '4.403')] -[2023-02-22 20:52:08,710][28133] Updated weights for policy 0, policy_version 1688 (0.0018) -[2023-02-22 20:52:11,027][06183] Fps is (10 sec: 9421.3, 60 sec: 10990.9, 300 sec: 6262.0). Total num frames: 6934528. Throughput: 0: 2663.7. Samples: 730128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:11,031][06183] Avg episode reward: [(0, '4.486')] -[2023-02-22 20:52:13,180][28133] Updated weights for policy 0, policy_version 1698 (0.0018) -[2023-02-22 20:52:16,027][06183] Fps is (10 sec: 9420.8, 60 sec: 10717.8, 300 sec: 6303.7). Total num frames: 6979584. Throughput: 0: 2567.8. Samples: 743732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 20:52:16,031][06183] Avg episode reward: [(0, '4.473')] -[2023-02-22 20:52:17,742][28133] Updated weights for policy 0, policy_version 1708 (0.0018) -[2023-02-22 20:52:21,027][06183] Fps is (10 sec: 9011.2, 60 sec: 10444.8, 300 sec: 6345.3). Total num frames: 7024640. Throughput: 0: 2523.0. Samples: 750444. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 20:52:21,031][06183] Avg episode reward: [(0, '4.564')] -[2023-02-22 20:52:22,315][28133] Updated weights for policy 0, policy_version 1718 (0.0021) -[2023-02-22 20:52:26,027][06183] Fps is (10 sec: 9011.2, 60 sec: 10240.0, 300 sec: 6387.0). Total num frames: 7069696. Throughput: 0: 2431.6. Samples: 763750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:26,041][06183] Avg episode reward: [(0, '4.210')] -[2023-02-22 20:52:27,009][28133] Updated weights for policy 0, policy_version 1728 (0.0022) -[2023-02-22 20:52:31,027][06183] Fps is (10 sec: 8601.4, 60 sec: 9898.7, 300 sec: 6414.8). Total num frames: 7110656. Throughput: 0: 2330.4. Samples: 776388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:31,032][06183] Avg episode reward: [(0, '4.552')] -[2023-02-22 20:52:31,917][28133] Updated weights for policy 0, policy_version 1738 (0.0019) -[2023-02-22 20:52:36,027][06183] Fps is (10 sec: 8192.0, 60 sec: 9625.6, 300 sec: 6456.4). Total num frames: 7151616. Throughput: 0: 2292.7. Samples: 782532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:36,037][06183] Avg episode reward: [(0, '4.556')] -[2023-02-22 20:52:36,903][28133] Updated weights for policy 0, policy_version 1748 (0.0026) -[2023-02-22 20:52:41,027][06183] Fps is (10 sec: 8192.2, 60 sec: 9352.5, 300 sec: 6484.2). Total num frames: 7192576. Throughput: 0: 2233.2. Samples: 794878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:41,030][06183] Avg episode reward: [(0, '4.484')] -[2023-02-22 20:52:41,913][28133] Updated weights for policy 0, policy_version 1758 (0.0025) -[2023-02-22 20:52:46,027][06183] Fps is (10 sec: 8191.7, 60 sec: 9079.4, 300 sec: 6539.7). Total num frames: 7233536. Throughput: 0: 2183.4. Samples: 807122. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:46,031][06183] Avg episode reward: [(0, '4.472')] -[2023-02-22 20:52:46,980][28133] Updated weights for policy 0, policy_version 1768 (0.0025) -[2023-02-22 20:52:51,028][06183] Fps is (10 sec: 7781.8, 60 sec: 8806.3, 300 sec: 6567.5). Total num frames: 7270400. Throughput: 0: 2160.6. Samples: 813170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:51,032][06183] Avg episode reward: [(0, '4.240')] -[2023-02-22 20:52:52,087][28133] Updated weights for policy 0, policy_version 1778 (0.0024) -[2023-02-22 20:52:56,028][06183] Fps is (10 sec: 7782.0, 60 sec: 8669.7, 300 sec: 6609.1). Total num frames: 7311360. Throughput: 0: 2108.2. Samples: 824998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:52:56,035][06183] Avg episode reward: [(0, '4.409')] -[2023-02-22 20:52:57,400][28133] Updated weights for policy 0, policy_version 1788 (0.0026) -[2023-02-22 20:53:01,027][06183] Fps is (10 sec: 7783.0, 60 sec: 8465.1, 300 sec: 6650.8). Total num frames: 7348224. Throughput: 0: 2058.3. Samples: 836356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:53:01,031][06183] Avg episode reward: [(0, '4.495')] -[2023-02-22 20:53:02,762][28133] Updated weights for policy 0, policy_version 1798 (0.0039) -[2023-02-22 20:53:06,027][06183] Fps is (10 sec: 7783.0, 60 sec: 8396.8, 300 sec: 6692.5). Total num frames: 7389184. Throughput: 0: 2038.7. Samples: 842186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:53:06,032][06183] Avg episode reward: [(0, '4.733')] -[2023-02-22 20:53:07,978][28133] Updated weights for policy 0, policy_version 1808 (0.0021) -[2023-02-22 20:53:11,027][06183] Fps is (10 sec: 7782.5, 60 sec: 8192.0, 300 sec: 6734.1). Total num frames: 7426048. Throughput: 0: 2004.8. Samples: 853968. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 20:53:11,031][06183] Avg episode reward: [(0, '4.373')] -[2023-02-22 20:53:13,334][28133] Updated weights for policy 0, policy_version 1818 (0.0025) -[2023-02-22 20:53:16,027][06183] Fps is (10 sec: 7782.3, 60 sec: 8123.7, 300 sec: 6803.5). Total num frames: 7467008. Throughput: 0: 1981.6. Samples: 865562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:53:16,031][06183] Avg episode reward: [(0, '4.520')] -[2023-02-22 20:53:18,738][28133] Updated weights for policy 0, policy_version 1828 (0.0028) -[2023-02-22 20:53:21,028][06183] Fps is (10 sec: 7781.9, 60 sec: 7987.1, 300 sec: 6845.2). Total num frames: 7503872. Throughput: 0: 1970.6. Samples: 871212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:53:21,031][06183] Avg episode reward: [(0, '4.478')] -[2023-02-22 20:53:24,290][28133] Updated weights for policy 0, policy_version 1838 (0.0026) -[2023-02-22 20:53:26,027][06183] Fps is (10 sec: 7372.9, 60 sec: 7850.6, 300 sec: 6900.7). Total num frames: 7540736. Throughput: 0: 1944.1. Samples: 882364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:53:26,032][06183] Avg episode reward: [(0, '4.648')] -[2023-02-22 20:53:29,749][28133] Updated weights for policy 0, policy_version 1848 (0.0030) -[2023-02-22 20:53:31,027][06183] Fps is (10 sec: 7373.1, 60 sec: 7782.4, 300 sec: 6942.4). Total num frames: 7577600. Throughput: 0: 1922.7. Samples: 893642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:53:31,031][06183] Avg episode reward: [(0, '4.412')] -[2023-02-22 20:53:35,208][28133] Updated weights for policy 0, policy_version 1858 (0.0025) -[2023-02-22 20:53:36,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7714.1, 300 sec: 6997.9). Total num frames: 7614464. Throughput: 0: 1913.4. Samples: 899270. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:53:36,032][06183] Avg episode reward: [(0, '4.420')] -[2023-02-22 20:53:40,927][28133] Updated weights for policy 0, policy_version 1868 (0.0028) -[2023-02-22 20:53:41,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7645.8, 300 sec: 7039.6). Total num frames: 7651328. Throughput: 0: 1893.6. Samples: 910208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:53:41,031][06183] Avg episode reward: [(0, '4.333')] -[2023-02-22 20:53:46,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7509.4, 300 sec: 7067.3). Total num frames: 7684096. Throughput: 0: 1873.7. Samples: 920674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:53:46,031][06183] Avg episode reward: [(0, '4.287')] -[2023-02-22 20:53:46,752][28133] Updated weights for policy 0, policy_version 1878 (0.0028) -[2023-02-22 20:53:51,028][06183] Fps is (10 sec: 6962.7, 60 sec: 7509.3, 300 sec: 7122.9). Total num frames: 7720960. Throughput: 0: 1861.8. Samples: 925970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:53:51,033][06183] Avg episode reward: [(0, '4.269')] -[2023-02-22 20:53:51,069][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001885_7720960.pth... -[2023-02-22 20:53:51,552][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001420_5816320.pth -[2023-02-22 20:53:52,521][28133] Updated weights for policy 0, policy_version 1888 (0.0026) -[2023-02-22 20:53:56,028][06183] Fps is (10 sec: 6963.0, 60 sec: 7372.8, 300 sec: 7178.4). Total num frames: 7753728. Throughput: 0: 1832.1. Samples: 936412. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 20:53:56,032][06183] Avg episode reward: [(0, '4.580')] -[2023-02-22 20:53:58,409][28133] Updated weights for policy 0, policy_version 1898 (0.0030) -[2023-02-22 20:54:01,028][06183] Fps is (10 sec: 6963.5, 60 sec: 7372.7, 300 sec: 7234.0). Total num frames: 7790592. Throughput: 0: 1808.8. Samples: 946958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) -[2023-02-22 20:54:01,032][06183] Avg episode reward: [(0, '4.483')] -[2023-02-22 20:54:04,284][28133] Updated weights for policy 0, policy_version 1908 (0.0027) -[2023-02-22 20:54:06,027][06183] Fps is (10 sec: 6963.4, 60 sec: 7236.3, 300 sec: 7275.6). Total num frames: 7823360. Throughput: 0: 1798.8. Samples: 952158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 20:54:06,032][06183] Avg episode reward: [(0, '4.280')] -[2023-02-22 20:54:10,347][28133] Updated weights for policy 0, policy_version 1918 (0.0028) -[2023-02-22 20:54:11,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7236.3, 300 sec: 7345.0). Total num frames: 7860224. Throughput: 0: 1779.2. Samples: 962428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:54:11,032][06183] Avg episode reward: [(0, '4.229')] -[2023-02-22 20:54:16,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 7386.7). Total num frames: 7892992. Throughput: 0: 1749.3. Samples: 972360. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:54:16,032][06183] Avg episode reward: [(0, '4.351')] -[2023-02-22 20:54:16,626][28133] Updated weights for policy 0, policy_version 1928 (0.0037) -[2023-02-22 20:54:21,027][06183] Fps is (10 sec: 6553.5, 60 sec: 7031.5, 300 sec: 7428.4). Total num frames: 7925760. Throughput: 0: 1734.2. Samples: 977310. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:54:21,033][06183] Avg episode reward: [(0, '4.308')] -[2023-02-22 20:54:22,925][28133] Updated weights for policy 0, policy_version 1938 (0.0031) -[2023-02-22 20:54:26,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 7470.0). Total num frames: 7954432. Throughput: 0: 1704.6. Samples: 986916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 20:54:26,034][06183] Avg episode reward: [(0, '4.370')] -[2023-02-22 20:54:29,290][28133] Updated weights for policy 0, policy_version 1948 (0.0032) -[2023-02-22 20:54:31,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 7539.4). Total num frames: 7987200. Throughput: 0: 1684.9. Samples: 996494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 20:54:31,031][06183] Avg episode reward: [(0, '4.490')] -[2023-02-22 20:54:35,834][28133] Updated weights for policy 0, policy_version 1958 (0.0029) -[2023-02-22 20:54:36,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 7581.1). Total num frames: 8019968. Throughput: 0: 1672.9. Samples: 1001250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:54:36,032][06183] Avg episode reward: [(0, '4.471')] -[2023-02-22 20:54:41,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6621.8, 300 sec: 7622.7). Total num frames: 8048640. Throughput: 0: 1641.3. Samples: 1010270. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:54:41,034][06183] Avg episode reward: [(0, '4.335')] -[2023-02-22 20:54:42,784][28133] Updated weights for policy 0, policy_version 1968 (0.0046) -[2023-02-22 20:54:46,027][06183] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 7664.4). Total num frames: 8077312. Throughput: 0: 1607.5. Samples: 1019296. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 20:54:46,032][06183] Avg episode reward: [(0, '4.666')] -[2023-02-22 20:54:49,716][28133] Updated weights for policy 0, policy_version 1978 (0.0038) -[2023-02-22 20:54:51,027][06183] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 7706.0). Total num frames: 8105984. Throughput: 0: 1590.5. Samples: 1023730. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:54:51,040][06183] Avg episode reward: [(0, '4.523')] -[2023-02-22 20:54:56,028][06183] Fps is (10 sec: 5734.0, 60 sec: 6348.8, 300 sec: 7747.7). Total num frames: 8134656. Throughput: 0: 1548.4. Samples: 1032106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:54:56,034][06183] Avg episode reward: [(0, '4.215')] -[2023-02-22 20:54:57,455][28133] Updated weights for policy 0, policy_version 1988 (0.0039) -[2023-02-22 20:55:01,028][06183] Fps is (10 sec: 5324.5, 60 sec: 6144.0, 300 sec: 7775.4). Total num frames: 8159232. Throughput: 0: 1499.0. Samples: 1039814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 20:55:01,034][06183] Avg episode reward: [(0, '4.387')] -[2023-02-22 20:55:05,412][28133] Updated weights for policy 0, policy_version 1998 (0.0040) -[2023-02-22 20:55:06,028][06183] Fps is (10 sec: 4915.5, 60 sec: 6007.4, 300 sec: 7803.2). Total num frames: 8183808. Throughput: 0: 1467.1. Samples: 1043328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 20:55:06,033][06183] Avg episode reward: [(0, '4.403')] -[2023-02-22 20:55:11,028][06183] Fps is (10 sec: 5325.0, 60 sec: 5870.9, 300 sec: 7831.0). Total num frames: 8212480. Throughput: 0: 1434.0. Samples: 1051448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:11,035][06183] Avg episode reward: [(0, '4.465')] -[2023-02-22 20:55:13,221][28133] Updated weights for policy 0, policy_version 2008 (0.0049) -[2023-02-22 20:55:16,028][06183] Fps is (10 sec: 5324.7, 60 sec: 5734.4, 300 sec: 7858.8). Total num frames: 8237056. 
Throughput: 0: 1386.5. Samples: 1058888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:16,032][06183] Avg episode reward: [(0, '4.391')] -[2023-02-22 20:55:21,028][06183] Fps is (10 sec: 4915.3, 60 sec: 5597.9, 300 sec: 7886.5). Total num frames: 8261632. Throughput: 0: 1365.7. Samples: 1062706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:21,032][06183] Avg episode reward: [(0, '4.281')] -[2023-02-22 20:55:21,142][28133] Updated weights for policy 0, policy_version 2018 (0.0041) -[2023-02-22 20:55:26,028][06183] Fps is (10 sec: 5324.7, 60 sec: 5597.8, 300 sec: 7914.3). Total num frames: 8290304. Throughput: 0: 1338.4. Samples: 1070500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:26,033][06183] Avg episode reward: [(0, '4.676')] -[2023-02-22 20:55:29,190][28133] Updated weights for policy 0, policy_version 2028 (0.0035) -[2023-02-22 20:55:31,028][06183] Fps is (10 sec: 5324.6, 60 sec: 5461.3, 300 sec: 7942.1). Total num frames: 8314880. Throughput: 0: 1309.6. Samples: 1078230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:31,037][06183] Avg episode reward: [(0, '4.431')] -[2023-02-22 20:55:36,028][06183] Fps is (10 sec: 4915.3, 60 sec: 5324.8, 300 sec: 7956.0). Total num frames: 8339456. Throughput: 0: 1294.8. Samples: 1081996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:55:36,032][06183] Avg episode reward: [(0, '4.214')] -[2023-02-22 20:55:37,046][28133] Updated weights for policy 0, policy_version 2038 (0.0040) -[2023-02-22 20:55:41,028][06183] Fps is (10 sec: 5324.8, 60 sec: 5324.8, 300 sec: 7983.7). Total num frames: 8368128. Throughput: 0: 1286.3. Samples: 1089990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:41,033][06183] Avg episode reward: [(0, '4.331')] -[2023-02-22 20:55:44,576][28133] Updated weights for policy 0, policy_version 2048 (0.0039) -[2023-02-22 20:55:46,028][06183] Fps is (10 sec: 5324.6, 60 sec: 5256.5, 300 sec: 7983.7). 
Total num frames: 8392704. Throughput: 0: 1291.7. Samples: 1097938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:46,034][06183] Avg episode reward: [(0, '4.298')] -[2023-02-22 20:55:51,028][06183] Fps is (10 sec: 4915.1, 60 sec: 5188.2, 300 sec: 7886.5). Total num frames: 8417280. Throughput: 0: 1298.7. Samples: 1101768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:51,038][06183] Avg episode reward: [(0, '4.237')] -[2023-02-22 20:55:51,225][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002056_8421376.pth... -[2023-02-22 20:55:51,939][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001646_6742016.pth -[2023-02-22 20:55:52,911][28133] Updated weights for policy 0, policy_version 2058 (0.0048) -[2023-02-22 20:55:56,027][06183] Fps is (10 sec: 4915.4, 60 sec: 5120.1, 300 sec: 7775.5). Total num frames: 8441856. Throughput: 0: 1270.8. Samples: 1108632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:55:56,033][06183] Avg episode reward: [(0, '4.210')] -[2023-02-22 20:56:01,028][06183] Fps is (10 sec: 4915.4, 60 sec: 5120.0, 300 sec: 7636.6). Total num frames: 8466432. Throughput: 0: 1275.3. Samples: 1116278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 20:56:01,033][06183] Avg episode reward: [(0, '4.317')] -[2023-02-22 20:56:01,328][28133] Updated weights for policy 0, policy_version 2068 (0.0051) -[2023-02-22 20:56:06,028][06183] Fps is (10 sec: 4914.8, 60 sec: 5119.9, 300 sec: 7511.6). Total num frames: 8491008. Throughput: 0: 1271.2. Samples: 1119910. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:06,037][06183] Avg episode reward: [(0, '4.465')] -[2023-02-22 20:56:09,521][28133] Updated weights for policy 0, policy_version 2078 (0.0040) -[2023-02-22 20:56:11,028][06183] Fps is (10 sec: 4915.0, 60 sec: 5051.7, 300 sec: 7386.7). Total num frames: 8515584. Throughput: 0: 1266.8. Samples: 1127508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:11,036][06183] Avg episode reward: [(0, '4.401')] -[2023-02-22 20:56:16,027][06183] Fps is (10 sec: 5325.2, 60 sec: 5120.0, 300 sec: 7275.6). Total num frames: 8544256. Throughput: 0: 1268.7. Samples: 1135320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:16,033][06183] Avg episode reward: [(0, '4.350')] -[2023-02-22 20:56:17,335][28133] Updated weights for policy 0, policy_version 2088 (0.0034) -[2023-02-22 20:56:21,028][06183] Fps is (10 sec: 5325.0, 60 sec: 5120.0, 300 sec: 7164.5). Total num frames: 8568832. Throughput: 0: 1275.6. Samples: 1139398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:21,032][06183] Avg episode reward: [(0, '4.342')] -[2023-02-22 20:56:24,670][28133] Updated weights for policy 0, policy_version 2098 (0.0042) -[2023-02-22 20:56:26,028][06183] Fps is (10 sec: 5324.6, 60 sec: 5120.0, 300 sec: 7053.4). Total num frames: 8597504. Throughput: 0: 1282.9. Samples: 1147722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:26,032][06183] Avg episode reward: [(0, '4.437')] -[2023-02-22 20:56:31,028][06183] Fps is (10 sec: 5734.2, 60 sec: 5188.3, 300 sec: 6956.2). Total num frames: 8626176. Throughput: 0: 1284.6. Samples: 1155744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:31,034][06183] Avg episode reward: [(0, '4.395')] -[2023-02-22 20:56:32,496][28133] Updated weights for policy 0, policy_version 2108 (0.0036) -[2023-02-22 20:56:36,028][06183] Fps is (10 sec: 5324.9, 60 sec: 5188.3, 300 sec: 6845.2). Total num frames: 8650752. 
Throughput: 0: 1283.3. Samples: 1159518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:36,034][06183] Avg episode reward: [(0, '4.584')] -[2023-02-22 20:56:40,204][28133] Updated weights for policy 0, policy_version 2118 (0.0034) -[2023-02-22 20:56:41,030][06183] Fps is (10 sec: 5324.8, 60 sec: 5188.3, 300 sec: 6748.0). Total num frames: 8679424. Throughput: 0: 1311.1. Samples: 1167630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 20:56:41,035][06183] Avg episode reward: [(0, '4.552')] -[2023-02-22 20:56:46,027][06183] Fps is (10 sec: 5734.5, 60 sec: 5256.6, 300 sec: 6664.7). Total num frames: 8708096. Throughput: 0: 1328.4. Samples: 1176056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:46,032][06183] Avg episode reward: [(0, '4.266')] -[2023-02-22 20:56:47,325][28133] Updated weights for policy 0, policy_version 2128 (0.0046) -[2023-02-22 20:56:51,027][06183] Fps is (10 sec: 5734.7, 60 sec: 5324.9, 300 sec: 6595.2). Total num frames: 8736768. Throughput: 0: 1342.1. Samples: 1180304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:51,033][06183] Avg episode reward: [(0, '4.277')] -[2023-02-22 20:56:54,452][28133] Updated weights for policy 0, policy_version 2138 (0.0037) -[2023-02-22 20:56:56,028][06183] Fps is (10 sec: 5734.3, 60 sec: 5393.0, 300 sec: 6525.8). Total num frames: 8765440. Throughput: 0: 1365.9. Samples: 1188972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:56:56,033][06183] Avg episode reward: [(0, '4.527')] -[2023-02-22 20:57:01,029][06183] Fps is (10 sec: 5324.2, 60 sec: 5393.0, 300 sec: 6456.4). Total num frames: 8790016. Throughput: 0: 1371.8. Samples: 1197052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 20:57:01,035][06183] Avg episode reward: [(0, '4.630')] -[2023-02-22 20:57:02,107][28133] Updated weights for policy 0, policy_version 2148 (0.0039) -[2023-02-22 20:57:06,028][06183] Fps is (10 sec: 5324.6, 60 sec: 5461.3, 300 sec: 6387.0). 
Total num frames: 8818688. Throughput: 0: 1365.5. Samples: 1200844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:57:06,035][06183] Avg episode reward: [(0, '4.419')] -[2023-02-22 20:57:09,616][28133] Updated weights for policy 0, policy_version 2158 (0.0046) -[2023-02-22 20:57:11,028][06183] Fps is (10 sec: 5325.3, 60 sec: 5461.4, 300 sec: 6317.6). Total num frames: 8843264. Throughput: 0: 1366.6. Samples: 1209220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:57:11,033][06183] Avg episode reward: [(0, '4.654')] -[2023-02-22 20:57:16,028][06183] Fps is (10 sec: 5325.0, 60 sec: 5461.3, 300 sec: 6262.0). Total num frames: 8871936. Throughput: 0: 1367.6. Samples: 1217286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:57:16,033][06183] Avg episode reward: [(0, '4.615')] -[2023-02-22 20:57:17,218][28133] Updated weights for policy 0, policy_version 2168 (0.0035) -[2023-02-22 20:57:21,028][06183] Fps is (10 sec: 5324.5, 60 sec: 5461.3, 300 sec: 6192.6). Total num frames: 8896512. Throughput: 0: 1373.2. Samples: 1221314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:57:21,037][06183] Avg episode reward: [(0, '4.377')] -[2023-02-22 20:57:25,077][28133] Updated weights for policy 0, policy_version 2178 (0.0036) -[2023-02-22 20:57:26,027][06183] Fps is (10 sec: 5324.9, 60 sec: 5461.4, 300 sec: 6150.9). Total num frames: 8925184. Throughput: 0: 1365.1. Samples: 1229060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 20:57:26,033][06183] Avg episode reward: [(0, '4.321')] -[2023-02-22 20:57:31,028][06183] Fps is (10 sec: 5325.1, 60 sec: 5393.1, 300 sec: 6095.4). Total num frames: 8949760. Throughput: 0: 1351.7. Samples: 1236882. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:57:31,031][06183] Avg episode reward: [(0, '4.475')]
-[2023-02-22 20:57:32,914][28133] Updated weights for policy 0, policy_version 2188 (0.0032)
-[2023-02-22 20:57:36,028][06183] Fps is (10 sec: 4915.0, 60 sec: 5393.0, 300 sec: 6039.9). Total num frames: 8974336. Throughput: 0: 1347.8. Samples: 1240954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:57:36,035][06183] Avg episode reward: [(0, '4.409')]
-[2023-02-22 20:57:40,686][28133] Updated weights for policy 0, policy_version 2198 (0.0049)
-[2023-02-22 20:57:41,028][06183] Fps is (10 sec: 5324.8, 60 sec: 5393.1, 300 sec: 5998.2). Total num frames: 9003008. Throughput: 0: 1328.9. Samples: 1248774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:57:41,033][06183] Avg episode reward: [(0, '4.527')]
-[2023-02-22 20:57:46,028][06183] Fps is (10 sec: 5324.6, 60 sec: 5324.7, 300 sec: 5956.6). Total num frames: 9027584. Throughput: 0: 1322.3. Samples: 1256554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:57:46,035][06183] Avg episode reward: [(0, '4.375')]
-[2023-02-22 20:57:48,656][28133] Updated weights for policy 0, policy_version 2208 (0.0037)
-[2023-02-22 20:57:51,028][06183] Fps is (10 sec: 5324.6, 60 sec: 5324.8, 300 sec: 5914.9). Total num frames: 9056256. Throughput: 0: 1322.8. Samples: 1260370. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 20:57:51,035][06183] Avg episode reward: [(0, '4.436')]
-[2023-02-22 20:57:51,094][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002211_9056256.pth...
-[2023-02-22 20:57:51,742][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001885_7720960.pth
-[2023-02-22 20:57:56,028][06183] Fps is (10 sec: 5325.0, 60 sec: 5256.5, 300 sec: 5873.2). Total num frames: 9080832. Throughput: 0: 1305.0. Samples: 1267946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:57:56,035][06183] Avg episode reward: [(0, '4.226')]
-[2023-02-22 20:57:56,849][28133] Updated weights for policy 0, policy_version 2218 (0.0041)
-[2023-02-22 20:58:01,027][06183] Fps is (10 sec: 6144.3, 60 sec: 5461.4, 300 sec: 5859.4). Total num frames: 9117696. Throughput: 0: 1348.6. Samples: 1277974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 20:58:01,029][06183] Avg episode reward: [(0, '4.322')]
-[2023-02-22 20:58:01,655][28133] Updated weights for policy 0, policy_version 2228 (0.0023)
-[2023-02-22 20:58:04,619][28133] Updated weights for policy 0, policy_version 2238 (0.0013)
-[2023-02-22 20:58:06,027][06183] Fps is (10 sec: 10240.8, 60 sec: 6075.8, 300 sec: 5956.6). Total num frames: 9183232. Throughput: 0: 1482.9. Samples: 1288042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:58:06,029][06183] Avg episode reward: [(0, '4.319')]
-[2023-02-22 20:58:07,553][28133] Updated weights for policy 0, policy_version 2248 (0.0012)
-[2023-02-22 20:58:10,604][28133] Updated weights for policy 0, policy_version 2258 (0.0013)
-[2023-02-22 20:58:11,027][06183] Fps is (10 sec: 13517.2, 60 sec: 6826.7, 300 sec: 6053.8). Total num frames: 9252864. Throughput: 0: 1769.2. Samples: 1308672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:58:11,030][06183] Avg episode reward: [(0, '4.223')]
-[2023-02-22 20:58:13,702][28133] Updated weights for policy 0, policy_version 2268 (0.0012)
-[2023-02-22 20:58:16,027][06183] Fps is (10 sec: 13517.0, 60 sec: 7441.1, 300 sec: 6151.0). Total num frames: 9318400. Throughput: 0: 2040.8. Samples: 1328716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:58:16,029][06183] Avg episode reward: [(0, '4.344')]
-[2023-02-22 20:58:16,784][28133] Updated weights for policy 0, policy_version 2278 (0.0011)
-[2023-02-22 20:58:20,006][28133] Updated weights for policy 0, policy_version 2288 (0.0011)
-[2023-02-22 20:58:21,027][06183] Fps is (10 sec: 13107.2, 60 sec: 8123.9, 300 sec: 6248.1). Total num frames: 9383936. Throughput: 0: 2163.9. Samples: 1338326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 20:58:21,029][06183] Avg episode reward: [(0, '4.263')]
-[2023-02-22 20:58:23,155][28133] Updated weights for policy 0, policy_version 2298 (0.0013)
-[2023-02-22 20:58:26,027][06183] Fps is (10 sec: 12697.6, 60 sec: 8669.9, 300 sec: 6331.4). Total num frames: 9445376. Throughput: 0: 2415.8. Samples: 1357484. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:58:26,030][06183] Avg episode reward: [(0, '4.397')]
-[2023-02-22 20:58:26,410][28133] Updated weights for policy 0, policy_version 2308 (0.0010)
-[2023-02-22 20:58:29,597][28133] Updated weights for policy 0, policy_version 2318 (0.0015)
-[2023-02-22 20:58:31,027][06183] Fps is (10 sec: 12697.4, 60 sec: 9352.6, 300 sec: 6428.6). Total num frames: 9510912. Throughput: 0: 2660.8. Samples: 1376288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:58:31,030][06183] Avg episode reward: [(0, '4.575')]
-[2023-02-22 20:58:33,116][28133] Updated weights for policy 0, policy_version 2328 (0.0015)
-[2023-02-22 20:58:36,027][06183] Fps is (10 sec: 12288.0, 60 sec: 9898.8, 300 sec: 6498.1). Total num frames: 9568256. Throughput: 0: 2769.4. Samples: 1384990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:58:36,031][06183] Avg episode reward: [(0, '4.493')]
-[2023-02-22 20:58:36,657][28133] Updated weights for policy 0, policy_version 2338 (0.0014)
-[2023-02-22 20:58:40,714][28133] Updated weights for policy 0, policy_version 2348 (0.0019)
-[2023-02-22 20:58:41,027][06183] Fps is (10 sec: 10649.7, 60 sec: 10240.1, 300 sec: 6553.6). Total num frames: 9617408. Throughput: 0: 2954.5. Samples: 1400896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:58:41,030][06183] Avg episode reward: [(0, '4.363')]
-[2023-02-22 20:58:44,169][28133] Updated weights for policy 0, policy_version 2358 (0.0011)
-[2023-02-22 20:58:46,027][06183] Fps is (10 sec: 11059.0, 60 sec: 10854.6, 300 sec: 6636.9). Total num frames: 9678848. Throughput: 0: 3137.9. Samples: 1419178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:58:46,030][06183] Avg episode reward: [(0, '4.437')]
-[2023-02-22 20:58:47,473][28133] Updated weights for policy 0, policy_version 2368 (0.0014)
-[2023-02-22 20:58:51,027][06183] Fps is (10 sec: 11878.3, 60 sec: 11332.4, 300 sec: 6720.2). Total num frames: 9736192. Throughput: 0: 3102.2. Samples: 1427642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:58:51,030][06183] Avg episode reward: [(0, '4.347')]
-[2023-02-22 20:58:51,295][28133] Updated weights for policy 0, policy_version 2378 (0.0017)
-[2023-02-22 20:58:55,095][28133] Updated weights for policy 0, policy_version 2388 (0.0014)
-[2023-02-22 20:58:56,027][06183] Fps is (10 sec: 11059.4, 60 sec: 11810.3, 300 sec: 6775.8). Total num frames: 9789440. Throughput: 0: 3003.4. Samples: 1443824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:58:56,030][06183] Avg episode reward: [(0, '4.159')]
-[2023-02-22 20:58:59,007][28133] Updated weights for policy 0, policy_version 2398 (0.0013)
-[2023-02-22 20:59:01,027][06183] Fps is (10 sec: 10239.9, 60 sec: 12015.0, 300 sec: 6831.3). Total num frames: 9838592. Throughput: 0: 2905.4. Samples: 1459458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:59:01,030][06183] Avg episode reward: [(0, '4.526')]
-[2023-02-22 20:59:03,018][28133] Updated weights for policy 0, policy_version 2408 (0.0012)
-[2023-02-22 20:59:06,027][06183] Fps is (10 sec: 10239.9, 60 sec: 11810.1, 300 sec: 6886.8). Total num frames: 9891840. Throughput: 0: 2862.7. Samples: 1467148. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 20:59:06,030][06183] Avg episode reward: [(0, '4.387')]
-[2023-02-22 20:59:07,000][28133] Updated weights for policy 0, policy_version 2418 (0.0017)
-[2023-02-22 20:59:11,027][06183] Fps is (10 sec: 10240.1, 60 sec: 11468.8, 300 sec: 6942.4). Total num frames: 9940992. Throughput: 0: 2766.9. Samples: 1481996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:59:11,030][06183] Avg episode reward: [(0, '4.324')]
-[2023-02-22 20:59:11,293][28133] Updated weights for policy 0, policy_version 2428 (0.0015)
-[2023-02-22 20:59:15,498][28133] Updated weights for policy 0, policy_version 2438 (0.0014)
-[2023-02-22 20:59:16,027][06183] Fps is (10 sec: 9830.0, 60 sec: 11195.6, 300 sec: 6997.9). Total num frames: 9990144. Throughput: 0: 2670.2. Samples: 1496446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:59:16,030][06183] Avg episode reward: [(0, '4.388')]
-[2023-02-22 20:59:19,700][28133] Updated weights for policy 0, policy_version 2448 (0.0014)
-[2023-02-22 20:59:21,028][06183] Fps is (10 sec: 9420.3, 60 sec: 10854.3, 300 sec: 7053.4). Total num frames: 10035200. Throughput: 0: 2638.9. Samples: 1503740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:59:21,031][06183] Avg episode reward: [(0, '4.506')]
-[2023-02-22 20:59:23,957][28133] Updated weights for policy 0, policy_version 2458 (0.0015)
-[2023-02-22 20:59:26,027][06183] Fps is (10 sec: 9421.2, 60 sec: 10649.6, 300 sec: 7109.0). Total num frames: 10084352. Throughput: 0: 2605.4. Samples: 1518138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:59:26,031][06183] Avg episode reward: [(0, '4.376')]
-[2023-02-22 20:59:28,261][28133] Updated weights for policy 0, policy_version 2468 (0.0021)
-[2023-02-22 20:59:31,027][06183] Fps is (10 sec: 9830.7, 60 sec: 10376.5, 300 sec: 7164.5). Total num frames: 10133504. Throughput: 0: 2516.8. Samples: 1532432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:59:31,031][06183] Avg episode reward: [(0, '4.537')]
-[2023-02-22 20:59:32,646][28133] Updated weights for policy 0, policy_version 2478 (0.0012)
-[2023-02-22 20:59:36,028][06183] Fps is (10 sec: 9420.1, 60 sec: 10171.6, 300 sec: 7220.1). Total num frames: 10178560. Throughput: 0: 2485.0. Samples: 1539468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 20:59:36,032][06183] Avg episode reward: [(0, '4.432')]
-[2023-02-22 20:59:37,095][28133] Updated weights for policy 0, policy_version 2488 (0.0015)
-[2023-02-22 20:59:41,027][06183] Fps is (10 sec: 9011.3, 60 sec: 10103.5, 300 sec: 7275.6). Total num frames: 10223616. Throughput: 0: 2426.5. Samples: 1553018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:59:41,031][06183] Avg episode reward: [(0, '4.377')]
-[2023-02-22 20:59:41,614][28133] Updated weights for policy 0, policy_version 2498 (0.0018)
-[2023-02-22 20:59:46,027][06183] Fps is (10 sec: 9011.8, 60 sec: 9830.4, 300 sec: 7331.2). Total num frames: 10268672. Throughput: 0: 2376.0. Samples: 1566378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:59:46,031][06183] Avg episode reward: [(0, '4.423')]
-[2023-02-22 20:59:46,210][28133] Updated weights for policy 0, policy_version 2508 (0.0014)
-[2023-02-22 20:59:50,879][28133] Updated weights for policy 0, policy_version 2518 (0.0020)
-[2023-02-22 20:59:51,027][06183] Fps is (10 sec: 9011.1, 60 sec: 9625.6, 300 sec: 7386.7). Total num frames: 10313728. Throughput: 0: 2351.9. Samples: 1572982. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 20:59:51,033][06183] Avg episode reward: [(0, '4.438')]
-[2023-02-22 20:59:51,064][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002518_10313728.pth...
-[2023-02-22 20:59:51,496][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002056_8421376.pth
-[2023-02-22 20:59:55,696][28133] Updated weights for policy 0, policy_version 2528 (0.0015)
-[2023-02-22 20:59:56,027][06183] Fps is (10 sec: 8601.4, 60 sec: 9420.8, 300 sec: 7442.2). Total num frames: 10354688. Throughput: 0: 2305.6. Samples: 1585748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 20:59:56,032][06183] Avg episode reward: [(0, '4.642')]
-[2023-02-22 21:00:00,618][28133] Updated weights for policy 0, policy_version 2538 (0.0016)
-[2023-02-22 21:00:01,027][06183] Fps is (10 sec: 8192.2, 60 sec: 9284.3, 300 sec: 7497.8). Total num frames: 10395648. Throughput: 0: 2265.7. Samples: 1598400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:00:01,036][06183] Avg episode reward: [(0, '4.476')]
-[2023-02-22 21:00:05,450][28133] Updated weights for policy 0, policy_version 2548 (0.0017)
-[2023-02-22 21:00:06,028][06183] Fps is (10 sec: 8601.3, 60 sec: 9147.6, 300 sec: 7553.3). Total num frames: 10440704. Throughput: 0: 2244.0. Samples: 1604718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:00:06,033][06183] Avg episode reward: [(0, '4.231')]
-[2023-02-22 21:00:10,373][28133] Updated weights for policy 0, policy_version 2558 (0.0014)
-[2023-02-22 21:00:11,028][06183] Fps is (10 sec: 8600.9, 60 sec: 9011.1, 300 sec: 7608.8). Total num frames: 10481664. Throughput: 0: 2201.2. Samples: 1617196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:00:11,032][06183] Avg episode reward: [(0, '4.297')]
-[2023-02-22 21:00:15,429][28133] Updated weights for policy 0, policy_version 2568 (0.0025)
-[2023-02-22 21:00:16,027][06183] Fps is (10 sec: 8192.3, 60 sec: 8874.7, 300 sec: 7664.4). Total num frames: 10522624. Throughput: 0: 2156.4. Samples: 1629468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:00:16,031][06183] Avg episode reward: [(0, '4.396')]
-[2023-02-22 21:00:20,368][28133] Updated weights for policy 0, policy_version 2578 (0.0022)
-[2023-02-22 21:00:21,028][06183] Fps is (10 sec: 8192.2, 60 sec: 8806.4, 300 sec: 7706.0). Total num frames: 10563584. Throughput: 0: 2139.5. Samples: 1635746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:00:21,033][06183] Avg episode reward: [(0, '4.324')]
-[2023-02-22 21:00:25,605][28133] Updated weights for policy 0, policy_version 2588 (0.0022)
-[2023-02-22 21:00:26,027][06183] Fps is (10 sec: 7782.6, 60 sec: 8601.6, 300 sec: 7747.7). Total num frames: 10600448. Throughput: 0: 2098.7. Samples: 1647460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:00:26,031][06183] Avg episode reward: [(0, '4.341')]
-[2023-02-22 21:00:30,878][28133] Updated weights for policy 0, policy_version 2598 (0.0025)
-[2023-02-22 21:00:31,028][06183] Fps is (10 sec: 7782.2, 60 sec: 8465.0, 300 sec: 7803.2). Total num frames: 10641408. Throughput: 0: 2064.1. Samples: 1659262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:00:31,034][06183] Avg episode reward: [(0, '4.369')]
-[2023-02-22 21:00:36,027][06183] Fps is (10 sec: 7782.3, 60 sec: 8328.6, 300 sec: 7831.0). Total num frames: 10678272. Throughput: 0: 2041.0. Samples: 1664826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:00:36,032][06183] Avg episode reward: [(0, '4.395')]
-[2023-02-22 21:00:36,314][28133] Updated weights for policy 0, policy_version 2608 (0.0025)
-[2023-02-22 21:00:41,027][06183] Fps is (10 sec: 7373.3, 60 sec: 8192.0, 300 sec: 7872.7). Total num frames: 10715136. Throughput: 0: 2016.9. Samples: 1676510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:00:41,031][06183] Avg episode reward: [(0, '4.311')]
-[2023-02-22 21:00:41,603][28133] Updated weights for policy 0, policy_version 2618 (0.0017)
-[2023-02-22 21:00:46,027][06183] Fps is (10 sec: 7782.4, 60 sec: 8123.7, 300 sec: 7928.2). Total num frames: 10756096. Throughput: 0: 1996.0. Samples: 1688222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:00:46,031][06183] Avg episode reward: [(0, '4.490')]
-[2023-02-22 21:00:46,924][28133] Updated weights for policy 0, policy_version 2628 (0.0017)
-[2023-02-22 21:00:51,027][06183] Fps is (10 sec: 7782.3, 60 sec: 7987.2, 300 sec: 7969.8). Total num frames: 10792960. Throughput: 0: 1983.1. Samples: 1693956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:00:51,032][06183] Avg episode reward: [(0, '4.375')]
-[2023-02-22 21:00:52,325][28133] Updated weights for policy 0, policy_version 2638 (0.0020)
-[2023-02-22 21:00:56,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7918.9, 300 sec: 8011.5). Total num frames: 10829824. Throughput: 0: 1955.4. Samples: 1705186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:00:56,033][06183] Avg episode reward: [(0, '4.219')]
-[2023-02-22 21:00:57,971][28133] Updated weights for policy 0, policy_version 2648 (0.0028)
-[2023-02-22 21:01:01,027][06183] Fps is (10 sec: 7372.9, 60 sec: 7850.6, 300 sec: 8053.2). Total num frames: 10866688. Throughput: 0: 1924.3. Samples: 1716062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 21:01:01,031][06183] Avg episode reward: [(0, '4.252')]
-[2023-02-22 21:01:03,374][28133] Updated weights for policy 0, policy_version 2658 (0.0022)
-[2023-02-22 21:01:06,027][06183] Fps is (10 sec: 7373.0, 60 sec: 7714.2, 300 sec: 8094.8). Total num frames: 10903552. Throughput: 0: 1910.5. Samples: 1721716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:01:06,031][06183] Avg episode reward: [(0, '4.290')]
-[2023-02-22 21:01:09,101][28133] Updated weights for policy 0, policy_version 2668 (0.0025)
-[2023-02-22 21:01:11,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7645.9, 300 sec: 8122.6). Total num frames: 10940416. Throughput: 0: 1890.7. Samples: 1732540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:01:11,031][06183] Avg episode reward: [(0, '4.288')]
-[2023-02-22 21:01:14,821][28133] Updated weights for policy 0, policy_version 2678 (0.0030)
-[2023-02-22 21:01:16,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7577.6, 300 sec: 8164.2). Total num frames: 10977280. Throughput: 0: 1867.9. Samples: 1743316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:01:16,031][06183] Avg episode reward: [(0, '4.293')]
-[2023-02-22 21:01:20,580][28133] Updated weights for policy 0, policy_version 2688 (0.0023)
-[2023-02-22 21:01:21,028][06183] Fps is (10 sec: 6962.8, 60 sec: 7441.1, 300 sec: 8178.1). Total num frames: 11010048. Throughput: 0: 1864.2. Samples: 1748714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:01:21,033][06183] Avg episode reward: [(0, '4.331')]
-[2023-02-22 21:01:26,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7441.0, 300 sec: 8205.9). Total num frames: 11046912. Throughput: 0: 1841.3. Samples: 1759370. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:01:26,031][06183] Avg episode reward: [(0, '4.378')]
-[2023-02-22 21:01:26,430][28133] Updated weights for policy 0, policy_version 2698 (0.0021)
-[2023-02-22 21:01:31,027][06183] Fps is (10 sec: 6963.6, 60 sec: 7304.6, 300 sec: 8233.7). Total num frames: 11079680. Throughput: 0: 1811.4. Samples: 1769736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:01:31,030][06183] Avg episode reward: [(0, '4.337')]
-[2023-02-22 21:01:32,263][28133] Updated weights for policy 0, policy_version 2708 (0.0027)
-[2023-02-22 21:01:36,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7304.5, 300 sec: 8261.4). Total num frames: 11116544. Throughput: 0: 1799.5. Samples: 1774932. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:01:36,032][06183] Avg episode reward: [(0, '4.217')]
-[2023-02-22 21:01:38,235][28133] Updated weights for policy 0, policy_version 2718 (0.0027)
-[2023-02-22 21:01:41,028][06183] Fps is (10 sec: 6962.7, 60 sec: 7236.2, 300 sec: 8275.3). Total num frames: 11149312. Throughput: 0: 1780.2. Samples: 1785296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:01:41,034][06183] Avg episode reward: [(0, '4.358')]
-[2023-02-22 21:01:44,170][28133] Updated weights for policy 0, policy_version 2728 (0.0022)
-[2023-02-22 21:01:46,028][06183] Fps is (10 sec: 6963.0, 60 sec: 7168.0, 300 sec: 8303.1). Total num frames: 11186176. Throughput: 0: 1765.7. Samples: 1795520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:01:46,034][06183] Avg episode reward: [(0, '4.341')]
-[2023-02-22 21:01:50,327][28133] Updated weights for policy 0, policy_version 2738 (0.0024)
-[2023-02-22 21:01:51,028][06183] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 8317.0). Total num frames: 11218944. Throughput: 0: 1752.5. Samples: 1800578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:01:51,032][06183] Avg episode reward: [(0, '4.469')]
-[2023-02-22 21:01:51,070][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002739_11218944.pth...
-[2023-02-22 21:01:51,585][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002211_9056256.pth
-[2023-02-22 21:01:56,027][06183] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 8330.9). Total num frames: 11247616. Throughput: 0: 1726.2. Samples: 1810218. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:01:56,033][06183] Avg episode reward: [(0, '4.596')]
-[2023-02-22 21:01:56,747][28133] Updated weights for policy 0, policy_version 2748 (0.0034)
-[2023-02-22 21:02:01,028][06183] Fps is (10 sec: 6143.9, 60 sec: 6894.8, 300 sec: 8344.7). Total num frames: 11280384. Throughput: 0: 1704.2. Samples: 1820008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:02:01,034][06183] Avg episode reward: [(0, '4.453')]
-[2023-02-22 21:02:02,961][28133] Updated weights for policy 0, policy_version 2758 (0.0031)
-[2023-02-22 21:02:06,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6826.6, 300 sec: 8372.5). Total num frames: 11313152. Throughput: 0: 1690.5. Samples: 1824786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:06,033][06183] Avg episode reward: [(0, '4.482')]
-[2023-02-22 21:02:09,385][28133] Updated weights for policy 0, policy_version 2768 (0.0030)
-[2023-02-22 21:02:11,027][06183] Fps is (10 sec: 6554.1, 60 sec: 6758.4, 300 sec: 8386.4). Total num frames: 11345920. Throughput: 0: 1668.0. Samples: 1834432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:11,032][06183] Avg episode reward: [(0, '4.604')]
-[2023-02-22 21:02:15,878][28133] Updated weights for policy 0, policy_version 2778 (0.0030)
-[2023-02-22 21:02:16,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6690.1, 300 sec: 8414.2). Total num frames: 11378688. Throughput: 0: 1649.0. Samples: 1843940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:16,032][06183] Avg episode reward: [(0, '4.454')]
-[2023-02-22 21:02:21,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6621.9, 300 sec: 8414.1). Total num frames: 11407360. Throughput: 0: 1639.4. Samples: 1848708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:02:21,033][06183] Avg episode reward: [(0, '4.463')]
-[2023-02-22 21:02:22,292][28133] Updated weights for policy 0, policy_version 2788 (0.0031)
-[2023-02-22 21:02:26,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 11440128. Throughput: 0: 1617.1. Samples: 1858064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:26,032][06183] Avg episode reward: [(0, '4.282')]
-[2023-02-22 21:02:28,869][28133] Updated weights for policy 0, policy_version 2798 (0.0032)
-[2023-02-22 21:02:31,027][06183] Fps is (10 sec: 6553.9, 60 sec: 6553.6, 300 sec: 8469.7). Total num frames: 11472896. Throughput: 0: 1599.7. Samples: 1867504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:31,032][06183] Avg episode reward: [(0, '4.392')]
-[2023-02-22 21:02:35,376][28133] Updated weights for policy 0, policy_version 2808 (0.0040)
-[2023-02-22 21:02:36,027][06183] Fps is (10 sec: 6144.2, 60 sec: 6417.1, 300 sec: 8469.7). Total num frames: 11501568. Throughput: 0: 1590.4. Samples: 1872144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:36,045][06183] Avg episode reward: [(0, '4.480')]
-[2023-02-22 21:02:41,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6417.1, 300 sec: 8497.5). Total num frames: 11534336. Throughput: 0: 1586.0. Samples: 1881588. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:41,033][06183] Avg episode reward: [(0, '4.324')]
-[2023-02-22 21:02:42,130][28133] Updated weights for policy 0, policy_version 2818 (0.0040)
-[2023-02-22 21:02:46,027][06183] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 8497.5). Total num frames: 11563008. Throughput: 0: 1560.5. Samples: 1890230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:46,032][06183] Avg episode reward: [(0, '4.216')]
-[2023-02-22 21:02:49,375][28133] Updated weights for policy 0, policy_version 2828 (0.0041)
-[2023-02-22 21:02:51,028][06183] Fps is (10 sec: 5734.5, 60 sec: 6212.3, 300 sec: 8511.4). Total num frames: 11591680. Throughput: 0: 1544.9. Samples: 1894306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:51,032][06183] Avg episode reward: [(0, '4.299')]
-[2023-02-22 21:02:56,027][06183] Fps is (10 sec: 5734.3, 60 sec: 6212.3, 300 sec: 8483.6). Total num frames: 11620352. Throughput: 0: 1521.0. Samples: 1902876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:02:56,032][06183] Avg episode reward: [(0, '4.317')]
-[2023-02-22 21:02:56,622][28133] Updated weights for policy 0, policy_version 2838 (0.0042)
-[2023-02-22 21:03:01,028][06183] Fps is (10 sec: 5734.4, 60 sec: 6144.1, 300 sec: 8358.6). Total num frames: 11649024. Throughput: 0: 1503.3. Samples: 1911588. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:01,033][06183] Avg episode reward: [(0, '4.447')]
-[2023-02-22 21:03:03,864][28133] Updated weights for policy 0, policy_version 2848 (0.0037)
-[2023-02-22 21:03:06,027][06183] Fps is (10 sec: 5324.8, 60 sec: 6007.5, 300 sec: 8205.9). Total num frames: 11673600. Throughput: 0: 1486.7. Samples: 1915610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:06,037][06183] Avg episode reward: [(0, '4.552')]
-[2023-02-22 21:03:10,797][28133] Updated weights for policy 0, policy_version 2858 (0.0035)
-[2023-02-22 21:03:11,027][06183] Fps is (10 sec: 5734.5, 60 sec: 6007.5, 300 sec: 8094.8). Total num frames: 11706368. Throughput: 0: 1473.5. Samples: 1924372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:11,032][06183] Avg episode reward: [(0, '4.560')]
-[2023-02-22 21:03:16,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6007.4, 300 sec: 7983.7). Total num frames: 11739136. Throughput: 0: 1487.3. Samples: 1934434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:03:16,032][06183] Avg episode reward: [(0, '4.478')]
-[2023-02-22 21:03:16,776][28133] Updated weights for policy 0, policy_version 2868 (0.0032)
-[2023-02-22 21:03:21,028][06183] Fps is (10 sec: 6963.1, 60 sec: 6144.0, 300 sec: 7900.4). Total num frames: 11776000. Throughput: 0: 1500.5. Samples: 1939666. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:03:21,033][06183] Avg episode reward: [(0, '4.361')]
-[2023-02-22 21:03:22,702][28133] Updated weights for policy 0, policy_version 2878 (0.0026)
-[2023-02-22 21:03:26,028][06183] Fps is (10 sec: 6962.8, 60 sec: 6143.9, 300 sec: 7789.3). Total num frames: 11808768. Throughput: 0: 1517.9. Samples: 1949894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:26,033][06183] Avg episode reward: [(0, '4.235')]
-[2023-02-22 21:03:28,875][28133] Updated weights for policy 0, policy_version 2888 (0.0035)
-[2023-02-22 21:03:31,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 7706.0). Total num frames: 11841536. Throughput: 0: 1546.3. Samples: 1959816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:31,033][06183] Avg episode reward: [(0, '4.313')]
-[2023-02-22 21:03:35,104][28133] Updated weights for policy 0, policy_version 2898 (0.0032)
-[2023-02-22 21:03:36,028][06183] Fps is (10 sec: 6554.0, 60 sec: 6212.2, 300 sec: 7650.5). Total num frames: 11874304. Throughput: 0: 1564.1. Samples: 1964690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:36,033][06183] Avg episode reward: [(0, '4.353')]
-[2023-02-22 21:03:41,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6212.3, 300 sec: 7553.3). Total num frames: 11907072. Throughput: 0: 1589.5. Samples: 1974402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:41,031][06183] Avg episode reward: [(0, '4.383')]
-[2023-02-22 21:03:41,430][28133] Updated weights for policy 0, policy_version 2908 (0.0030)
-[2023-02-22 21:03:46,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 7470.0). Total num frames: 11939840. Throughput: 0: 1611.2. Samples: 1984092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:46,032][06183] Avg episode reward: [(0, '4.513')]
-[2023-02-22 21:03:47,824][28133] Updated weights for policy 0, policy_version 2918 (0.0031)
-[2023-02-22 21:03:51,028][06183] Fps is (10 sec: 6143.4, 60 sec: 6280.5, 300 sec: 7386.7). Total num frames: 11968512. Throughput: 0: 1631.5. Samples: 1989030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:51,035][06183] Avg episode reward: [(0, '4.328')]
-[2023-02-22 21:03:51,086][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002923_11972608.pth...
-[2023-02-22 21:03:51,631][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002518_10313728.pth
-[2023-02-22 21:03:54,363][28133] Updated weights for policy 0, policy_version 2928 (0.0031)
-[2023-02-22 21:03:56,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6348.8, 300 sec: 7331.1). Total num frames: 12001280. Throughput: 0: 1639.1. Samples: 1998134. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:03:56,033][06183] Avg episode reward: [(0, '4.243')]
-[2023-02-22 21:04:00,895][28133] Updated weights for policy 0, policy_version 2938 (0.0030)
-[2023-02-22 21:04:01,028][06183] Fps is (10 sec: 6553.9, 60 sec: 6417.1, 300 sec: 7261.7). Total num frames: 12034048. Throughput: 0: 1627.3. Samples: 2007662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:04:01,033][06183] Avg episode reward: [(0, '4.340')]
-[2023-02-22 21:04:06,028][06183] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7206.2). Total num frames: 12066816. Throughput: 0: 1620.1. Samples: 2012572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:04:06,033][06183] Avg episode reward: [(0, '4.421')]
-[2023-02-22 21:04:07,277][28133] Updated weights for policy 0, policy_version 2948 (0.0038)
-[2023-02-22 21:04:11,027][06183] Fps is (10 sec: 6144.2, 60 sec: 6485.3, 300 sec: 7136.8). Total num frames: 12095488. Throughput: 0: 1608.2. Samples: 2022262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:04:11,033][06183] Avg episode reward: [(0, '4.328')]
-[2023-02-22 21:04:13,604][28133] Updated weights for policy 0, policy_version 2958 (0.0027)
-[2023-02-22 21:04:16,028][06183] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 7095.1). Total num frames: 12128256. Throughput: 0: 1603.8. Samples: 2031988. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:04:16,033][06183] Avg episode reward: [(0, '4.324')]
-[2023-02-22 21:04:19,897][28133] Updated weights for policy 0, policy_version 2968 (0.0035)
-[2023-02-22 21:04:21,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6417.0, 300 sec: 7039.5). Total num frames: 12161024. Throughput: 0: 1602.7. Samples: 2036812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:04:21,034][06183] Avg episode reward: [(0, '4.247')]
-[2023-02-22 21:04:26,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6417.2, 300 sec: 6984.0). Total num frames: 12193792. Throughput: 0: 1597.4. Samples: 2046284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:04:26,031][06183] Avg episode reward: [(0, '4.295')]
-[2023-02-22 21:04:26,377][28133] Updated weights for policy 0, policy_version 2978 (0.0038)
-[2023-02-22 21:04:31,028][06183] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6928.5). Total num frames: 12222464. Throughput: 0: 1583.5. Samples: 2055350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:04:31,034][06183] Avg episode reward: [(0, '4.463')]
-[2023-02-22 21:04:33,247][28133] Updated weights for policy 0, policy_version 2988 (0.0037)
-[2023-02-22 21:04:36,028][06183] Fps is (10 sec: 6143.5, 60 sec: 6348.8, 300 sec: 6886.8). Total num frames: 12255232. Throughput: 0: 1573.1. Samples: 2059818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:04:36,033][06183] Avg episode reward: [(0, '4.453')]
-[2023-02-22 21:04:39,878][28133] Updated weights for policy 0, policy_version 2998 (0.0033)
-[2023-02-22 21:04:41,028][06183] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6831.3). Total num frames: 12283904. Throughput: 0: 1579.4. Samples: 2069206. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:04:41,037][06183] Avg episode reward: [(0, '4.521')]
-[2023-02-22 21:04:46,028][06183] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6789.6). Total num frames: 12316672. Throughput: 0: 1577.6. Samples: 2078656. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:04:46,035][06183] Avg episode reward: [(0, '4.642')]
-[2023-02-22 21:04:46,322][28133] Updated weights for policy 0, policy_version 3008 (0.0028)
-[2023-02-22 21:04:51,028][06183] Fps is (10 sec: 6553.7, 60 sec: 6348.9, 300 sec: 6761.9). Total num frames: 12349440. Throughput: 0: 1572.1. Samples: 2083318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:04:51,033][06183] Avg episode reward: [(0, '4.707')]
-[2023-02-22 21:04:52,973][28133] Updated weights for policy 0, policy_version 3018 (0.0039)
-[2023-02-22 21:04:56,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6280.6, 300 sec: 6720.2). Total num frames: 12378112. Throughput: 0: 1562.8. Samples: 2092590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:04:56,032][06183] Avg episode reward: [(0, '4.416')]
-[2023-02-22 21:04:59,709][28133] Updated weights for policy 0, policy_version 3028 (0.0031)
-[2023-02-22 21:05:01,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6280.6, 300 sec: 6678.6). Total num frames: 12410880. Throughput: 0: 1549.9. Samples: 2101732. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:05:01,036][06183] Avg episode reward: [(0, '4.430')]
-[2023-02-22 21:05:06,028][06183] Fps is (10 sec: 6143.6, 60 sec: 6212.2, 300 sec: 6636.9). Total num frames: 12439552. Throughput: 0: 1546.1. Samples: 2106388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:05:06,032][06183] Avg episode reward: [(0, '4.356')]
-[2023-02-22 21:05:06,223][28133] Updated weights for policy 0, policy_version 3038 (0.0036)
-[2023-02-22 21:05:11,028][06183] Fps is (10 sec: 5734.1, 60 sec: 6212.2, 300 sec: 6595.2). Total num frames: 12468224. Throughput: 0: 1535.4. Samples: 2115376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:05:11,040][06183] Avg episode reward: [(0, '4.441')]
-[2023-02-22 21:05:13,268][28133] Updated weights for policy 0, policy_version 3048 (0.0034)
-[2023-02-22 21:05:16,028][06183] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6553.6). Total num frames: 12496896. Throughput: 0: 1528.0. Samples: 2124112. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:05:16,032][06183] Avg episode reward: [(0, '4.629')]
-[2023-02-22 21:05:20,136][28133] Updated weights for policy 0, policy_version 3058 (0.0034)
-[2023-02-22 21:05:21,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6144.0, 300 sec: 6539.7). Total num frames: 12529664. Throughput: 0: 1529.0. Samples: 2128622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:05:21,032][06183] Avg episode reward: [(0, '4.555')]
-[2023-02-22 21:05:26,028][06183] Fps is (10 sec: 6144.3, 60 sec: 6075.7, 300 sec: 6498.1). Total num frames: 12558336. Throughput: 0: 1524.8. Samples: 2137820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:05:26,033][06183] Avg episode reward: [(0, '4.254')]
-[2023-02-22 21:05:26,984][28133] Updated weights for policy 0, policy_version 3068 (0.0032)
-[2023-02-22 21:05:31,027][06183] Fps is (10 sec: 6143.9, 60 sec: 6144.0, 300 sec: 6484.2). Total num frames: 12591104. Throughput: 0: 1516.5. Samples: 2146898. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:05:31,032][06183] Avg episode reward: [(0, '4.293')]
-[2023-02-22 21:05:33,577][28133] Updated weights for policy 0, policy_version 3078 (0.0033)
-[2023-02-22 21:05:36,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6075.8, 300 sec: 6456.4). Total num frames: 12619776. Throughput: 0: 1513.6. Samples: 2151430. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:05:36,037][06183] Avg episode reward: [(0, '4.483')]
-[2023-02-22 21:05:40,186][28133] Updated weights for policy 0, policy_version 3088 (0.0030)
-[2023-02-22 21:05:41,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6144.0, 300 sec: 6428.6). Total num frames: 12652544. Throughput: 0: 1510.8. Samples: 2160576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:05:41,033][06183] Avg episode reward: [(0, '4.458')]
-[2023-02-22 21:05:46,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6075.8, 300 sec: 6400.9). Total num frames: 12681216. Throughput: 0: 1499.5. Samples: 2169210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:05:46,034][06183] Avg episode reward: [(0, '4.278')]
-[2023-02-22 21:05:47,406][28133] Updated weights for policy 0, policy_version 3098 (0.0029)
-[2023-02-22 21:05:51,028][06183] Fps is (10 sec: 5734.2, 60 sec: 6007.4, 300 sec: 6373.1). Total num frames: 12709888. Throughput: 0: 1494.5. Samples: 2173640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:05:51,034][06183] Avg episode reward: [(0, '4.218')]
-[2023-02-22 21:05:51,085][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003103_12709888.pth...
-[2023-02-22 21:05:51,721][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002739_11218944.pth
-[2023-02-22 21:05:54,445][28133] Updated weights for policy 0, policy_version 3108 (0.0030)
-[2023-02-22 21:05:56,028][06183] Fps is (10 sec: 5734.4, 60 sec: 6007.5, 300 sec: 6345.3). Total num frames: 12738560. Throughput: 0: 1486.3. Samples: 2182258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:05:56,033][06183] Avg episode reward: [(0, '4.487')]
-[2023-02-22 21:06:01,028][06183] Fps is (10 sec: 5734.5, 60 sec: 5939.1, 300 sec: 6317.5). Total num frames: 12767232. Throughput: 0: 1496.1. Samples: 2191438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 21:06:01,032][06183] Avg episode reward: [(0, '4.515')]
-[2023-02-22 21:06:01,136][28133] Updated weights for policy 0, policy_version 3118 (0.0033)
-[2023-02-22 21:06:06,028][06183] Fps is (10 sec: 6144.0, 60 sec: 6007.5, 300 sec: 6303.7). Total num frames: 12800000. Throughput: 0: 1500.4. Samples: 2196142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:06:06,032][06183] Avg episode reward: [(0, '4.261')]
-[2023-02-22 21:06:07,850][28133] Updated weights for policy 0, policy_version 3128 (0.0028)
-[2023-02-22 21:06:11,028][06183] Fps is (10 sec: 6144.1, 60 sec: 6007.5, 300 sec: 6275.9). Total num frames: 12828672. Throughput: 0: 1490.4. Samples: 2204888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 21:06:11,033][06183] Avg episode reward: [(0, '4.574')]
-[2023-02-22 21:06:15,094][28133] Updated weights for policy 0, policy_version 3138 (0.0033)
-[2023-02-22 21:06:16,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6075.8, 300 sec: 6275.9). Total num frames: 12861440. Throughput: 0: 1489.9. Samples: 2213942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 21:06:16,029][06183] Avg episode reward: [(0, '4.323')]
-[2023-02-22 21:06:18,051][28133] Updated weights for policy 0, policy_version 3148 (0.0009)
-[2023-02-22 21:06:20,684][28133] Updated weights for policy 0, policy_version 3158 (0.0010)
-[2023-02-22 21:06:21,027][06183] Fps is (10 sec: 11059.9, 60 sec: 6826.7, 300 sec: 6414.8). Total num frames: 12939264. Throughput: 0: 1651.0. Samples: 2225724. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:06:21,029][06183] Avg episode reward: [(0, '4.342')]
-[2023-02-22 21:06:23,462][28133] Updated weights for policy 0, policy_version 3168 (0.0010)
-[2023-02-22 21:06:26,027][06183] Fps is (10 sec: 15155.1, 60 sec: 7577.7, 300 sec: 6553.6). Total num frames: 13012992. Throughput: 0: 1949.7. Samples: 2248312. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:06:26,029][06183] Avg episode reward: [(0, '4.437')] -[2023-02-22 21:06:26,201][28133] Updated weights for policy 0, policy_version 3178 (0.0013) -[2023-02-22 21:06:29,047][28133] Updated weights for policy 0, policy_version 3188 (0.0013) -[2023-02-22 21:06:31,027][06183] Fps is (10 sec: 14336.0, 60 sec: 8192.0, 300 sec: 6664.7). Total num frames: 13082624. Throughput: 0: 2240.1. Samples: 2270014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:06:31,029][06183] Avg episode reward: [(0, '4.220')] -[2023-02-22 21:06:32,000][28133] Updated weights for policy 0, policy_version 3198 (0.0016) -[2023-02-22 21:06:34,815][28133] Updated weights for policy 0, policy_version 3208 (0.0012) -[2023-02-22 21:06:36,027][06183] Fps is (10 sec: 14336.1, 60 sec: 8943.1, 300 sec: 6803.5). Total num frames: 13156352. Throughput: 0: 2374.1. Samples: 2280470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:06:36,030][06183] Avg episode reward: [(0, '4.508')] -[2023-02-22 21:06:37,756][28133] Updated weights for policy 0, policy_version 3218 (0.0013) -[2023-02-22 21:06:40,735][28133] Updated weights for policy 0, policy_version 3228 (0.0014) -[2023-02-22 21:06:41,027][06183] Fps is (10 sec: 13926.4, 60 sec: 9489.2, 300 sec: 6900.7). Total num frames: 13221888. Throughput: 0: 2650.7. Samples: 2301538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:06:41,030][06183] Avg episode reward: [(0, '4.289')] -[2023-02-22 21:06:43,735][28133] Updated weights for policy 0, policy_version 3238 (0.0011) -[2023-02-22 21:06:46,027][06183] Fps is (10 sec: 13516.5, 60 sec: 10171.8, 300 sec: 7025.7). Total num frames: 13291520. Throughput: 0: 2910.4. Samples: 2322404. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 21:06:46,030][06183] Avg episode reward: [(0, '4.381')] -[2023-02-22 21:06:46,629][28133] Updated weights for policy 0, policy_version 3248 (0.0012) -[2023-02-22 21:06:49,606][28133] Updated weights for policy 0, policy_version 3258 (0.0011) -[2023-02-22 21:06:51,027][06183] Fps is (10 sec: 13926.5, 60 sec: 10854.6, 300 sec: 7164.5). Total num frames: 13361152. Throughput: 0: 3037.5. Samples: 2332830. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:06:51,029][06183] Avg episode reward: [(0, '4.548')] -[2023-02-22 21:06:52,646][28133] Updated weights for policy 0, policy_version 3268 (0.0013) -[2023-02-22 21:06:55,675][28133] Updated weights for policy 0, policy_version 3278 (0.0013) -[2023-02-22 21:06:56,027][06183] Fps is (10 sec: 13926.6, 60 sec: 11537.1, 300 sec: 7289.5). Total num frames: 13430784. Throughput: 0: 3288.6. Samples: 2352872. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:06:56,030][06183] Avg episode reward: [(0, '4.286')] -[2023-02-22 21:06:58,779][28133] Updated weights for policy 0, policy_version 3288 (0.0012) -[2023-02-22 21:07:01,027][06183] Fps is (10 sec: 13106.7, 60 sec: 12083.3, 300 sec: 7386.7). Total num frames: 13492224. Throughput: 0: 3526.0. Samples: 2372612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:07:01,030][06183] Avg episode reward: [(0, '4.325')] -[2023-02-22 21:07:02,056][28133] Updated weights for policy 0, policy_version 3298 (0.0012) -[2023-02-22 21:07:05,574][28133] Updated weights for policy 0, policy_version 3308 (0.0014) -[2023-02-22 21:07:06,027][06183] Fps is (10 sec: 12288.0, 60 sec: 12561.2, 300 sec: 7483.9). Total num frames: 13553664. Throughput: 0: 3463.9. Samples: 2381598. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:07:06,029][06183] Avg episode reward: [(0, '4.260')] -[2023-02-22 21:07:09,074][28133] Updated weights for policy 0, policy_version 3318 (0.0014) -[2023-02-22 21:07:11,027][06183] Fps is (10 sec: 11878.8, 60 sec: 13039.1, 300 sec: 7567.2). Total num frames: 13611008. Throughput: 0: 3352.5. Samples: 2399174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:07:11,030][06183] Avg episode reward: [(0, '4.515')] -[2023-02-22 21:07:12,606][28133] Updated weights for policy 0, policy_version 3328 (0.0013) -[2023-02-22 21:07:16,027][06183] Fps is (10 sec: 11468.8, 60 sec: 13448.5, 300 sec: 7664.4). Total num frames: 13668352. Throughput: 0: 3243.4. Samples: 2415968. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:07:16,031][06183] Avg episode reward: [(0, '4.760')] -[2023-02-22 21:07:16,329][28133] Updated weights for policy 0, policy_version 3338 (0.0014) -[2023-02-22 21:07:20,182][28133] Updated weights for policy 0, policy_version 3348 (0.0018) -[2023-02-22 21:07:21,027][06183] Fps is (10 sec: 11059.2, 60 sec: 13038.9, 300 sec: 7733.8). Total num frames: 13721600. Throughput: 0: 3188.2. Samples: 2423940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:07:21,030][06183] Avg episode reward: [(0, '4.426')] -[2023-02-22 21:07:24,184][28133] Updated weights for policy 0, policy_version 3358 (0.0016) -[2023-02-22 21:07:26,028][06183] Fps is (10 sec: 10239.4, 60 sec: 12629.2, 300 sec: 7789.3). Total num frames: 13770752. Throughput: 0: 3064.6. Samples: 2439446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:07:26,032][06183] Avg episode reward: [(0, '4.331')] -[2023-02-22 21:07:28,150][28133] Updated weights for policy 0, policy_version 3368 (0.0015) -[2023-02-22 21:07:31,027][06183] Fps is (10 sec: 9830.0, 60 sec: 12287.9, 300 sec: 7858.8). Total num frames: 13819904. Throughput: 0: 2936.2. Samples: 2454532. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:07:31,031][06183] Avg episode reward: [(0, '4.562')] -[2023-02-22 21:07:32,376][28133] Updated weights for policy 0, policy_version 3378 (0.0016) -[2023-02-22 21:07:36,027][06183] Fps is (10 sec: 9831.0, 60 sec: 11878.4, 300 sec: 7914.3). Total num frames: 13869056. Throughput: 0: 2865.8. Samples: 2461792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:07:36,030][06183] Avg episode reward: [(0, '4.485')] -[2023-02-22 21:07:36,664][28133] Updated weights for policy 0, policy_version 3388 (0.0016) -[2023-02-22 21:07:40,956][28133] Updated weights for policy 0, policy_version 3398 (0.0013) -[2023-02-22 21:07:41,027][06183] Fps is (10 sec: 9830.6, 60 sec: 11605.3, 300 sec: 7983.7). Total num frames: 13918208. Throughput: 0: 2738.7. Samples: 2476114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:07:41,030][06183] Avg episode reward: [(0, '4.427')] -[2023-02-22 21:07:45,221][28133] Updated weights for policy 0, policy_version 3408 (0.0013) -[2023-02-22 21:07:46,027][06183] Fps is (10 sec: 9420.7, 60 sec: 11195.7, 300 sec: 8039.3). Total num frames: 13963264. Throughput: 0: 2620.5. Samples: 2490532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:07:46,031][06183] Avg episode reward: [(0, '4.397')] -[2023-02-22 21:07:49,614][28133] Updated weights for policy 0, policy_version 3418 (0.0013) -[2023-02-22 21:07:51,027][06183] Fps is (10 sec: 9421.0, 60 sec: 10854.4, 300 sec: 8108.7). Total num frames: 14012416. Throughput: 0: 2578.0. Samples: 2497606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:07:51,031][06183] Avg episode reward: [(0, '4.372')] -[2023-02-22 21:07:51,055][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003421_14012416.pth... 
-[2023-02-22 21:07:51,494][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002923_11972608.pth -[2023-02-22 21:07:53,969][28133] Updated weights for policy 0, policy_version 3428 (0.0024) -[2023-02-22 21:07:56,028][06183] Fps is (10 sec: 9420.4, 60 sec: 10444.7, 300 sec: 8164.2). Total num frames: 14057472. Throughput: 0: 2491.7. Samples: 2511302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:07:56,032][06183] Avg episode reward: [(0, '4.391')] -[2023-02-22 21:07:58,195][28133] Updated weights for policy 0, policy_version 3438 (0.0019) -[2023-02-22 21:08:01,027][06183] Fps is (10 sec: 9420.5, 60 sec: 10240.0, 300 sec: 8247.5). Total num frames: 14106624. Throughput: 0: 2438.6. Samples: 2525706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:08:01,031][06183] Avg episode reward: [(0, '4.436')] -[2023-02-22 21:08:02,617][28133] Updated weights for policy 0, policy_version 3448 (0.0013) -[2023-02-22 21:08:06,027][06183] Fps is (10 sec: 9421.3, 60 sec: 9966.9, 300 sec: 8289.2). Total num frames: 14151680. Throughput: 0: 2415.6. Samples: 2532642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 21:08:06,031][06183] Avg episode reward: [(0, '4.485')] -[2023-02-22 21:08:07,013][28133] Updated weights for policy 0, policy_version 3458 (0.0014) -[2023-02-22 21:08:11,027][06183] Fps is (10 sec: 9421.1, 60 sec: 9830.4, 300 sec: 8344.7). Total num frames: 14200832. Throughput: 0: 2378.6. Samples: 2546480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:08:11,030][06183] Avg episode reward: [(0, '4.556')] -[2023-02-22 21:08:11,463][28133] Updated weights for policy 0, policy_version 3468 (0.0018) -[2023-02-22 21:08:15,964][28133] Updated weights for policy 0, policy_version 3478 (0.0015) -[2023-02-22 21:08:16,027][06183] Fps is (10 sec: 9420.8, 60 sec: 9625.6, 300 sec: 8372.5). Total num frames: 14245888. Throughput: 0: 2350.2. 
Samples: 2560290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:08:16,031][06183] Avg episode reward: [(0, '4.376')] -[2023-02-22 21:08:20,714][28133] Updated weights for policy 0, policy_version 3488 (0.0019) -[2023-02-22 21:08:21,027][06183] Fps is (10 sec: 8601.5, 60 sec: 9420.8, 300 sec: 8400.3). Total num frames: 14286848. Throughput: 0: 2333.3. Samples: 2566790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:08:21,030][06183] Avg episode reward: [(0, '4.362')] -[2023-02-22 21:08:25,341][28133] Updated weights for policy 0, policy_version 3498 (0.0017) -[2023-02-22 21:08:26,027][06183] Fps is (10 sec: 8601.6, 60 sec: 9352.6, 300 sec: 8441.9). Total num frames: 14331904. Throughput: 0: 2310.4. Samples: 2580082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:08:26,030][06183] Avg episode reward: [(0, '4.263')] -[2023-02-22 21:08:29,938][28133] Updated weights for policy 0, policy_version 3508 (0.0018) -[2023-02-22 21:08:31,028][06183] Fps is (10 sec: 9011.1, 60 sec: 9284.3, 300 sec: 8483.6). Total num frames: 14376960. Throughput: 0: 2285.0. Samples: 2593358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:08:31,033][06183] Avg episode reward: [(0, '4.239')] -[2023-02-22 21:08:34,645][28133] Updated weights for policy 0, policy_version 3518 (0.0019) -[2023-02-22 21:08:36,027][06183] Fps is (10 sec: 8601.6, 60 sec: 9147.7, 300 sec: 8511.4). Total num frames: 14417920. Throughput: 0: 2272.2. Samples: 2599856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:08:36,031][06183] Avg episode reward: [(0, '4.567')] -[2023-02-22 21:08:39,409][28133] Updated weights for policy 0, policy_version 3528 (0.0019) -[2023-02-22 21:08:41,027][06183] Fps is (10 sec: 8601.7, 60 sec: 9079.5, 300 sec: 8553.0). Total num frames: 14462976. Throughput: 0: 2255.9. Samples: 2612818. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:08:41,030][06183] Avg episode reward: [(0, '4.306')] -[2023-02-22 21:08:44,143][28133] Updated weights for policy 0, policy_version 3538 (0.0019) -[2023-02-22 21:08:46,027][06183] Fps is (10 sec: 8601.4, 60 sec: 9011.2, 300 sec: 8594.7). Total num frames: 14503936. Throughput: 0: 2219.6. Samples: 2625590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:08:46,032][06183] Avg episode reward: [(0, '4.400')] -[2023-02-22 21:08:48,978][28133] Updated weights for policy 0, policy_version 3548 (0.0022) -[2023-02-22 21:08:51,027][06183] Fps is (10 sec: 8601.6, 60 sec: 8942.9, 300 sec: 8636.3). Total num frames: 14548992. Throughput: 0: 2206.3. Samples: 2631926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:08:51,033][06183] Avg episode reward: [(0, '4.282')] -[2023-02-22 21:08:53,941][28133] Updated weights for policy 0, policy_version 3558 (0.0020) -[2023-02-22 21:08:56,027][06183] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 8664.1). Total num frames: 14589952. Throughput: 0: 2176.2. Samples: 2644410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:08:56,031][06183] Avg episode reward: [(0, '4.571')] -[2023-02-22 21:08:58,888][28133] Updated weights for policy 0, policy_version 3568 (0.0015) -[2023-02-22 21:09:01,027][06183] Fps is (10 sec: 8191.8, 60 sec: 8738.1, 300 sec: 8691.9). Total num frames: 14630912. Throughput: 0: 2145.4. Samples: 2656834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:09:01,031][06183] Avg episode reward: [(0, '4.488')] -[2023-02-22 21:09:03,837][28133] Updated weights for policy 0, policy_version 3578 (0.0019) -[2023-02-22 21:09:06,027][06183] Fps is (10 sec: 8192.1, 60 sec: 8669.9, 300 sec: 8733.5). Total num frames: 14671872. Throughput: 0: 2140.0. Samples: 2663088. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:09:06,031][06183] Avg episode reward: [(0, '4.465')] -[2023-02-22 21:09:08,712][28133] Updated weights for policy 0, policy_version 3588 (0.0022) -[2023-02-22 21:09:11,027][06183] Fps is (10 sec: 8191.9, 60 sec: 8533.3, 300 sec: 8761.3). Total num frames: 14712832. Throughput: 0: 2120.4. Samples: 2675502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:09:11,031][06183] Avg episode reward: [(0, '4.500')] -[2023-02-22 21:09:13,718][28133] Updated weights for policy 0, policy_version 3598 (0.0023) -[2023-02-22 21:09:16,027][06183] Fps is (10 sec: 8192.0, 60 sec: 8465.1, 300 sec: 8789.1). Total num frames: 14753792. Throughput: 0: 2098.2. Samples: 2687778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:09:16,032][06183] Avg episode reward: [(0, '4.373')] -[2023-02-22 21:09:18,776][28133] Updated weights for policy 0, policy_version 3608 (0.0025) -[2023-02-22 21:09:21,027][06183] Fps is (10 sec: 8192.3, 60 sec: 8465.1, 300 sec: 8816.8). Total num frames: 14794752. Throughput: 0: 2089.2. Samples: 2693872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:09:21,030][06183] Avg episode reward: [(0, '4.351')] -[2023-02-22 21:09:23,859][28133] Updated weights for policy 0, policy_version 3618 (0.0027) -[2023-02-22 21:09:26,027][06183] Fps is (10 sec: 8192.0, 60 sec: 8396.8, 300 sec: 8858.5). Total num frames: 14835712. Throughput: 0: 2071.3. Samples: 2706028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:09:26,030][06183] Avg episode reward: [(0, '4.244')] -[2023-02-22 21:09:28,960][28133] Updated weights for policy 0, policy_version 3628 (0.0018) -[2023-02-22 21:09:31,027][06183] Fps is (10 sec: 7782.4, 60 sec: 8260.3, 300 sec: 8872.4). Total num frames: 14872576. Throughput: 0: 2053.7. Samples: 2718004. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:09:31,031][06183] Avg episode reward: [(0, '4.175')] -[2023-02-22 21:09:34,275][28133] Updated weights for policy 0, policy_version 3638 (0.0023) -[2023-02-22 21:09:36,027][06183] Fps is (10 sec: 7782.4, 60 sec: 8260.3, 300 sec: 8914.0). Total num frames: 14913536. Throughput: 0: 2038.6. Samples: 2723662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:09:36,032][06183] Avg episode reward: [(0, '4.505')] -[2023-02-22 21:09:39,583][28133] Updated weights for policy 0, policy_version 3648 (0.0022) -[2023-02-22 21:09:41,027][06183] Fps is (10 sec: 7782.4, 60 sec: 8123.7, 300 sec: 8927.9). Total num frames: 14950400. Throughput: 0: 2021.7. Samples: 2735388. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:09:41,031][06183] Avg episode reward: [(0, '4.587')] -[2023-02-22 21:09:44,919][28133] Updated weights for policy 0, policy_version 3658 (0.0019) -[2023-02-22 21:09:46,027][06183] Fps is (10 sec: 7782.3, 60 sec: 8123.7, 300 sec: 8955.7). Total num frames: 14991360. Throughput: 0: 2001.5. Samples: 2746900. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:09:46,031][06183] Avg episode reward: [(0, '4.374')] -[2023-02-22 21:09:50,189][28133] Updated weights for policy 0, policy_version 3668 (0.0020) -[2023-02-22 21:09:51,027][06183] Fps is (10 sec: 7782.3, 60 sec: 7987.2, 300 sec: 8983.4). Total num frames: 15028224. Throughput: 0: 1988.0. Samples: 2752548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:09:51,032][06183] Avg episode reward: [(0, '4.344')] -[2023-02-22 21:09:51,068][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003669_15028224.pth... 
-[2023-02-22 21:09:51,487][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003103_12709888.pth -[2023-02-22 21:09:55,625][28133] Updated weights for policy 0, policy_version 3678 (0.0036) -[2023-02-22 21:09:56,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7918.9, 300 sec: 8997.3). Total num frames: 15065088. Throughput: 0: 1960.7. Samples: 2763732. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 21:09:56,031][06183] Avg episode reward: [(0, '4.374')] -[2023-02-22 21:10:01,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7850.7, 300 sec: 9025.1). Total num frames: 15101952. Throughput: 0: 1934.2. Samples: 2774816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:10:01,032][06183] Avg episode reward: [(0, '4.403')] -[2023-02-22 21:10:01,105][28133] Updated weights for policy 0, policy_version 3688 (0.0026) -[2023-02-22 21:10:06,029][06183] Fps is (10 sec: 7371.8, 60 sec: 7782.2, 300 sec: 9052.8). Total num frames: 15138816. Throughput: 0: 1924.8. Samples: 2780490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 21:10:06,034][06183] Avg episode reward: [(0, '4.585')] -[2023-02-22 21:10:06,649][28133] Updated weights for policy 0, policy_version 3698 (0.0023) -[2023-02-22 21:10:11,028][06183] Fps is (10 sec: 7782.1, 60 sec: 7782.4, 300 sec: 9094.5). Total num frames: 15179776. Throughput: 0: 1901.0. Samples: 2791572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:10:11,033][06183] Avg episode reward: [(0, '4.207')] -[2023-02-22 21:10:12,188][28133] Updated weights for policy 0, policy_version 3708 (0.0021) -[2023-02-22 21:10:16,028][06183] Fps is (10 sec: 7373.4, 60 sec: 7645.8, 300 sec: 9094.5). Total num frames: 15212544. Throughput: 0: 1878.1. Samples: 2802518. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:10:16,035][06183] Avg episode reward: [(0, '4.424')] -[2023-02-22 21:10:17,739][28133] Updated weights for policy 0, policy_version 3718 (0.0021) -[2023-02-22 21:10:21,028][06183] Fps is (10 sec: 7372.9, 60 sec: 7645.8, 300 sec: 9136.2). Total num frames: 15253504. Throughput: 0: 1882.0. Samples: 2808352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:10:21,034][06183] Avg episode reward: [(0, '4.310')] -[2023-02-22 21:10:23,126][28133] Updated weights for policy 0, policy_version 3728 (0.0017) -[2023-02-22 21:10:26,027][06183] Fps is (10 sec: 7782.7, 60 sec: 7577.6, 300 sec: 9150.0). Total num frames: 15290368. Throughput: 0: 1873.1. Samples: 2819676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:10:26,031][06183] Avg episode reward: [(0, '4.679')] -[2023-02-22 21:10:28,631][28133] Updated weights for policy 0, policy_version 3738 (0.0020) -[2023-02-22 21:10:31,027][06183] Fps is (10 sec: 7372.9, 60 sec: 7577.6, 300 sec: 9177.8). Total num frames: 15327232. Throughput: 0: 1868.4. Samples: 2830978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:10:31,032][06183] Avg episode reward: [(0, '4.333')] -[2023-02-22 21:10:34,148][28133] Updated weights for policy 0, policy_version 3748 (0.0020) -[2023-02-22 21:10:36,027][06183] Fps is (10 sec: 7373.0, 60 sec: 7509.3, 300 sec: 9191.7). Total num frames: 15364096. Throughput: 0: 1866.0. Samples: 2836520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:10:36,031][06183] Avg episode reward: [(0, '4.440')] -[2023-02-22 21:10:39,598][28133] Updated weights for policy 0, policy_version 3758 (0.0024) -[2023-02-22 21:10:41,028][06183] Fps is (10 sec: 7372.6, 60 sec: 7509.3, 300 sec: 9219.5). Total num frames: 15400960. Throughput: 0: 1865.9. Samples: 2847700. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:10:41,031][06183] Avg episode reward: [(0, '4.237')] -[2023-02-22 21:10:45,246][28133] Updated weights for policy 0, policy_version 3768 (0.0019) -[2023-02-22 21:10:46,028][06183] Fps is (10 sec: 7372.6, 60 sec: 7441.0, 300 sec: 9247.3). Total num frames: 15437824. Throughput: 0: 1863.9. Samples: 2858690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:10:46,031][06183] Avg episode reward: [(0, '4.299')] -[2023-02-22 21:10:50,790][28133] Updated weights for policy 0, policy_version 3778 (0.0022) -[2023-02-22 21:10:51,027][06183] Fps is (10 sec: 7373.2, 60 sec: 7441.1, 300 sec: 9275.0). Total num frames: 15474688. Throughput: 0: 1861.8. Samples: 2864270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:10:51,031][06183] Avg episode reward: [(0, '4.124')] -[2023-02-22 21:10:56,027][06183] Fps is (10 sec: 7373.0, 60 sec: 7441.1, 300 sec: 9302.8). Total num frames: 15511552. Throughput: 0: 1854.5. Samples: 2875022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:10:56,031][06183] Avg episode reward: [(0, '4.230')] -[2023-02-22 21:10:56,572][28133] Updated weights for policy 0, policy_version 3788 (0.0029) -[2023-02-22 21:11:01,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7372.8, 300 sec: 9302.8). Total num frames: 15544320. Throughput: 0: 1853.2. Samples: 2885912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:11:01,031][06183] Avg episode reward: [(0, '4.333')] -[2023-02-22 21:11:02,219][28133] Updated weights for policy 0, policy_version 3798 (0.0028) -[2023-02-22 21:11:06,027][06183] Fps is (10 sec: 6963.0, 60 sec: 7372.9, 300 sec: 9330.6). Total num frames: 15581184. Throughput: 0: 1843.6. Samples: 2891314. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:11:06,032][06183] Avg episode reward: [(0, '4.532')] -[2023-02-22 21:11:07,853][28133] Updated weights for policy 0, policy_version 3808 (0.0023) -[2023-02-22 21:11:11,028][06183] Fps is (10 sec: 7372.3, 60 sec: 7304.5, 300 sec: 9344.4). Total num frames: 15618048. Throughput: 0: 1827.1. Samples: 2901896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:11:11,033][06183] Avg episode reward: [(0, '4.417')] -[2023-02-22 21:11:13,694][28133] Updated weights for policy 0, policy_version 3818 (0.0023) -[2023-02-22 21:11:16,027][06183] Fps is (10 sec: 6963.4, 60 sec: 7304.6, 300 sec: 9191.7). Total num frames: 15650816. Throughput: 0: 1812.9. Samples: 2912558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:11:16,031][06183] Avg episode reward: [(0, '4.408')] -[2023-02-22 21:11:19,605][28133] Updated weights for policy 0, policy_version 3828 (0.0026) -[2023-02-22 21:11:21,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7236.3, 300 sec: 9066.7). Total num frames: 15687680. Throughput: 0: 1804.6. Samples: 2917726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:11:21,033][06183] Avg episode reward: [(0, '4.600')] -[2023-02-22 21:11:25,444][28133] Updated weights for policy 0, policy_version 3838 (0.0025) -[2023-02-22 21:11:26,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7168.0, 300 sec: 8941.8). Total num frames: 15720448. Throughput: 0: 1792.2. Samples: 2928350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:11:26,032][06183] Avg episode reward: [(0, '4.477')] -[2023-02-22 21:11:31,028][06183] Fps is (10 sec: 6962.8, 60 sec: 7167.9, 300 sec: 8816.8). Total num frames: 15757312. Throughput: 0: 1779.9. Samples: 2938786. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:11:31,033][06183] Avg episode reward: [(0, '4.421')] -[2023-02-22 21:11:31,302][28133] Updated weights for policy 0, policy_version 3848 (0.0023) -[2023-02-22 21:11:36,028][06183] Fps is (10 sec: 6963.0, 60 sec: 7099.7, 300 sec: 8705.7). Total num frames: 15790080. Throughput: 0: 1773.8. Samples: 2944090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:11:36,033][06183] Avg episode reward: [(0, '4.307')] -[2023-02-22 21:11:37,300][28133] Updated weights for policy 0, policy_version 3858 (0.0026) -[2023-02-22 21:11:41,027][06183] Fps is (10 sec: 6963.7, 60 sec: 7099.8, 300 sec: 8594.7). Total num frames: 15826944. Throughput: 0: 1762.0. Samples: 2954310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:11:41,031][06183] Avg episode reward: [(0, '4.438')] -[2023-02-22 21:11:43,066][28133] Updated weights for policy 0, policy_version 3868 (0.0023) -[2023-02-22 21:11:46,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7031.5, 300 sec: 8469.7). Total num frames: 15859712. Throughput: 0: 1748.2. Samples: 2964582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:11:46,031][06183] Avg episode reward: [(0, '4.433')] -[2023-02-22 21:11:49,075][28133] Updated weights for policy 0, policy_version 3878 (0.0027) -[2023-02-22 21:11:51,028][06183] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 8358.6). Total num frames: 15896576. Throughput: 0: 1743.0. Samples: 2969748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:11:51,034][06183] Avg episode reward: [(0, '4.489')] -[2023-02-22 21:11:51,075][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003881_15896576.pth... 
-[2023-02-22 21:11:51,564][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003421_14012416.pth
-[2023-02-22 21:11:55,263][28133] Updated weights for policy 0, policy_version 3888 (0.0031)
-[2023-02-22 21:11:56,027][06183] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 8261.4). Total num frames: 15929344. Throughput: 0: 1731.1. Samples: 2979796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:11:56,031][06183] Avg episode reward: [(0, '4.401')]
-[2023-02-22 21:12:01,027][06183] Fps is (10 sec: 6553.9, 60 sec: 6963.2, 300 sec: 8164.2). Total num frames: 15962112. Throughput: 0: 1720.5. Samples: 2989980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:12:01,031][06183] Avg episode reward: [(0, '4.490')]
-[2023-02-22 21:12:01,266][28133] Updated weights for policy 0, policy_version 3898 (0.0031)
-[2023-02-22 21:12:06,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6895.0, 300 sec: 8080.9). Total num frames: 15994880. Throughput: 0: 1718.3. Samples: 2995050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:12:06,033][06183] Avg episode reward: [(0, '4.617')]
-[2023-02-22 21:12:07,343][28133] Updated weights for policy 0, policy_version 3908 (0.0030)
-[2023-02-22 21:12:11,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 7997.6). Total num frames: 16027648. Throughput: 0: 1706.5. Samples: 3005142. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:12:11,032][06183] Avg episode reward: [(0, '4.329')]
-[2023-02-22 21:12:13,538][28133] Updated weights for policy 0, policy_version 3918 (0.0036)
-[2023-02-22 21:12:16,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 7928.2). Total num frames: 16060416. Throughput: 0: 1693.7. Samples: 3015000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:12:16,031][06183] Avg episode reward: [(0, '4.337')]
-[2023-02-22 21:12:19,772][28133] Updated weights for policy 0, policy_version 3928 (0.0032)
-[2023-02-22 21:12:21,028][06183] Fps is (10 sec: 6962.9, 60 sec: 6826.6, 300 sec: 7886.5). Total num frames: 16097280. Throughput: 0: 1683.8. Samples: 3019860. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:12:21,033][06183] Avg episode reward: [(0, '4.436')]
-[2023-02-22 21:12:26,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6758.4, 300 sec: 7817.1). Total num frames: 16125952. Throughput: 0: 1674.5. Samples: 3029664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:12:26,033][06183] Avg episode reward: [(0, '4.646')]
-[2023-02-22 21:12:26,093][28133] Updated weights for policy 0, policy_version 3938 (0.0035)
-[2023-02-22 21:12:31,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6690.2, 300 sec: 7761.6). Total num frames: 16158720. Throughput: 0: 1667.1. Samples: 3039602. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:12:31,032][06183] Avg episode reward: [(0, '4.353')]
-[2023-02-22 21:12:32,239][28133] Updated weights for policy 0, policy_version 3948 (0.0022)
-[2023-02-22 21:12:36,028][06183] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 7719.9). Total num frames: 16195584. Throughput: 0: 1660.1. Samples: 3044452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:12:36,033][06183] Avg episode reward: [(0, '4.574')]
-[2023-02-22 21:12:38,554][28133] Updated weights for policy 0, policy_version 3958 (0.0028)
-[2023-02-22 21:12:41,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6621.8, 300 sec: 7664.4). Total num frames: 16224256. Throughput: 0: 1655.1. Samples: 3054276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:12:41,032][06183] Avg episode reward: [(0, '4.440')]
-[2023-02-22 21:12:44,913][28133] Updated weights for policy 0, policy_version 3968 (0.0031)
-[2023-02-22 21:12:46,028][06183] Fps is (10 sec: 6144.0, 60 sec: 6621.8, 300 sec: 7608.8). Total num frames: 16257024. Throughput: 0: 1643.4. Samples: 3063936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:12:46,032][06183] Avg episode reward: [(0, '4.388')]
-[2023-02-22 21:12:51,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7567.2). Total num frames: 16289792. Throughput: 0: 1637.2. Samples: 3068724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:12:51,032][06183] Avg episode reward: [(0, '4.340')]
-[2023-02-22 21:12:51,314][28133] Updated weights for policy 0, policy_version 3978 (0.0030)
-[2023-02-22 21:12:56,028][06183] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7511.6). Total num frames: 16322560. Throughput: 0: 1627.8. Samples: 3078394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:12:56,033][06183] Avg episode reward: [(0, '4.237')]
-[2023-02-22 21:12:57,485][28133] Updated weights for policy 0, policy_version 3988 (0.0029)
-[2023-02-22 21:13:01,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6553.5, 300 sec: 7470.0). Total num frames: 16355328. Throughput: 0: 1628.2. Samples: 3088270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:13:01,033][06183] Avg episode reward: [(0, '4.408')]
-[2023-02-22 21:13:03,841][28133] Updated weights for policy 0, policy_version 3998 (0.0024)
-[2023-02-22 21:13:06,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 7414.4). Total num frames: 16388096. Throughput: 0: 1628.3. Samples: 3093132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:13:06,033][06183] Avg episode reward: [(0, '4.339')]
-[2023-02-22 21:13:10,120][28133] Updated weights for policy 0, policy_version 4008 (0.0031)
-[2023-02-22 21:13:11,028][06183] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7372.8). Total num frames: 16420864. Throughput: 0: 1627.7. Samples: 3102912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:13:11,035][06183] Avg episode reward: [(0, '4.543')]
-[2023-02-22 21:13:16,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6553.5, 300 sec: 7345.0). Total num frames: 16453632. Throughput: 0: 1621.6. Samples: 3112574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:13:16,033][06183] Avg episode reward: [(0, '4.458')]
-[2023-02-22 21:13:16,584][28133] Updated weights for policy 0, policy_version 4018 (0.0032)
-[2023-02-22 21:13:21,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 7303.4). Total num frames: 16486400. Throughput: 0: 1617.9. Samples: 3117258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:13:21,033][06183] Avg episode reward: [(0, '4.378')]
-[2023-02-22 21:13:22,829][28133] Updated weights for policy 0, policy_version 4028 (0.0031)
-[2023-02-22 21:13:26,027][06183] Fps is (10 sec: 6144.4, 60 sec: 6485.4, 300 sec: 7247.8). Total num frames: 16515072. Throughput: 0: 1615.3. Samples: 3126964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:13:26,031][06183] Avg episode reward: [(0, '4.515')]
-[2023-02-22 21:13:29,125][28133] Updated weights for policy 0, policy_version 4038 (0.0033)
-[2023-02-22 21:13:31,028][06183] Fps is (10 sec: 6144.2, 60 sec: 6485.3, 300 sec: 7220.1). Total num frames: 16547840. Throughput: 0: 1616.6. Samples: 3136684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:13:31,032][06183] Avg episode reward: [(0, '4.612')]
-[2023-02-22 21:13:35,580][28133] Updated weights for policy 0, policy_version 4048 (0.0028)
-[2023-02-22 21:13:36,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 7178.4). Total num frames: 16580608. Throughput: 0: 1612.7. Samples: 3141296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:13:36,032][06183] Avg episode reward: [(0, '4.576')]
-[2023-02-22 21:13:41,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 7150.6). Total num frames: 16613376. Throughput: 0: 1611.6. Samples: 3150914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:13:41,032][06183] Avg episode reward: [(0, '4.451')]
-[2023-02-22 21:13:42,031][28133] Updated weights for policy 0, policy_version 4058 (0.0025)
-[2023-02-22 21:13:46,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6485.4, 300 sec: 7109.0). Total num frames: 16646144. Throughput: 0: 1605.9. Samples: 3160534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:13:46,033][06183] Avg episode reward: [(0, '4.304')]
-[2023-02-22 21:13:48,399][28133] Updated weights for policy 0, policy_version 4068 (0.0028)
-[2023-02-22 21:13:51,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 7081.2). Total num frames: 16678912. Throughput: 0: 1603.9. Samples: 3165308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:13:51,031][06183] Avg episode reward: [(0, '4.445')]
-[2023-02-22 21:13:51,063][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004072_16678912.pth...
-[2023-02-22 21:13:51,603][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003669_15028224.pth
-[2023-02-22 21:13:54,920][28133] Updated weights for policy 0, policy_version 4078 (0.0027)
-[2023-02-22 21:13:56,027][06183] Fps is (10 sec: 6143.9, 60 sec: 6417.1, 300 sec: 7039.6). Total num frames: 16707584. Throughput: 0: 1596.2. Samples: 3174742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:13:56,033][06183] Avg episode reward: [(0, '4.378')]
-[2023-02-22 21:14:01,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 7011.8). Total num frames: 16740352. Throughput: 0: 1591.9. Samples: 3184208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:14:01,032][06183] Avg episode reward: [(0, '4.362')]
-[2023-02-22 21:14:01,427][28133] Updated weights for policy 0, policy_version 4088 (0.0027)
-[2023-02-22 21:14:06,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6984.0). Total num frames: 16773120. Throughput: 0: 1594.2. Samples: 3188996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:14:06,033][06183] Avg episode reward: [(0, '4.356')]
-[2023-02-22 21:14:07,952][28133] Updated weights for policy 0, policy_version 4098 (0.0034)
-[2023-02-22 21:14:11,028][06183] Fps is (10 sec: 6143.7, 60 sec: 6348.8, 300 sec: 6942.4). Total num frames: 16801792. Throughput: 0: 1587.3. Samples: 3198394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:14:11,032][06183] Avg episode reward: [(0, '4.397')]
-[2023-02-22 21:14:14,406][28133] Updated weights for policy 0, policy_version 4108 (0.0027)
-[2023-02-22 21:14:16,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6348.9, 300 sec: 6914.6). Total num frames: 16834560. Throughput: 0: 1582.2. Samples: 3207884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-02-22 21:14:16,032][06183] Avg episode reward: [(0, '4.385')]
-[2023-02-22 21:14:20,978][28133] Updated weights for policy 0, policy_version 4118 (0.0029)
-[2023-02-22 21:14:21,028][06183] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6886.8). Total num frames: 16867328. Throughput: 0: 1583.8. Samples: 3212566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:14:21,032][06183] Avg episode reward: [(0, '4.322')]
-[2023-02-22 21:14:26,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6859.1). Total num frames: 16896000. Throughput: 0: 1578.7. Samples: 3221954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:14:26,032][06183] Avg episode reward: [(0, '4.467')]
-[2023-02-22 21:14:27,518][28133] Updated weights for policy 0, policy_version 4128 (0.0032)
-[2023-02-22 21:14:30,655][28133] Updated weights for policy 0, policy_version 4138 (0.0012)
-[2023-02-22 21:14:31,027][06183] Fps is (10 sec: 8602.2, 60 sec: 6758.5, 300 sec: 6914.6). Total num frames: 16953344. Throughput: 0: 1699.6. Samples: 3237014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:14:31,029][06183] Avg episode reward: [(0, '4.305')]
-[2023-02-22 21:14:33,287][28133] Updated weights for policy 0, policy_version 4148 (0.0011)
-[2023-02-22 21:14:36,027][06183] Fps is (10 sec: 13107.7, 60 sec: 7441.1, 300 sec: 7039.6). Total num frames: 17027072. Throughput: 0: 1845.1. Samples: 3248336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:14:36,029][06183] Avg episode reward: [(0, '4.343')]
-[2023-02-22 21:14:36,124][28133] Updated weights for policy 0, policy_version 4158 (0.0009)
-[2023-02-22 21:14:38,845][28133] Updated weights for policy 0, policy_version 4168 (0.0010)
-[2023-02-22 21:14:41,027][06183] Fps is (10 sec: 14336.2, 60 sec: 8055.5, 300 sec: 7136.8). Total num frames: 17096704. Throughput: 0: 2121.3. Samples: 3270198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:14:41,029][06183] Avg episode reward: [(0, '4.392')]
-[2023-02-22 21:14:41,870][28133] Updated weights for policy 0, policy_version 4178 (0.0014)
-[2023-02-22 21:14:44,783][28133] Updated weights for policy 0, policy_version 4188 (0.0010)
-[2023-02-22 21:14:46,027][06183] Fps is (10 sec: 14335.5, 60 sec: 8738.1, 300 sec: 7261.7). Total num frames: 17170432. Throughput: 0: 2392.8. Samples: 3291884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:14:46,033][06183] Avg episode reward: [(0, '4.359')]
-[2023-02-22 21:14:47,679][28133] Updated weights for policy 0, policy_version 4198 (0.0009)
-[2023-02-22 21:14:50,784][28133] Updated weights for policy 0, policy_version 4208 (0.0010)
-[2023-02-22 21:14:51,027][06183] Fps is (10 sec: 14335.7, 60 sec: 9352.6, 300 sec: 7372.8). Total num frames: 17240064. Throughput: 0: 2498.5. Samples: 3301428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:14:51,030][06183] Avg episode reward: [(0, '4.409')]
-[2023-02-22 21:14:53,702][28133] Updated weights for policy 0, policy_version 4218 (0.0009)
-[2023-02-22 21:14:56,027][06183] Fps is (10 sec: 13517.2, 60 sec: 9967.0, 300 sec: 7470.0). Total num frames: 17305600. Throughput: 0: 2743.1. Samples: 3321830. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:14:56,028][06183] Avg episode reward: [(0, '4.380')]
-[2023-02-22 21:14:56,554][28133] Updated weights for policy 0, policy_version 4228 (0.0010)
-[2023-02-22 21:14:59,555][28133] Updated weights for policy 0, policy_version 4238 (0.0014)
-[2023-02-22 21:15:01,027][06183] Fps is (10 sec: 13516.7, 60 sec: 10581.3, 300 sec: 7581.1). Total num frames: 17375232. Throughput: 0: 3011.0. Samples: 3343378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:15:01,030][06183] Avg episode reward: [(0, '4.373')]
-[2023-02-22 21:15:02,683][28133] Updated weights for policy 0, policy_version 4248 (0.0011)
-[2023-02-22 21:15:05,689][28133] Updated weights for policy 0, policy_version 4258 (0.0009)
-[2023-02-22 21:15:06,027][06183] Fps is (10 sec: 13926.4, 60 sec: 11195.8, 300 sec: 7678.3). Total num frames: 17444864. Throughput: 0: 3113.1. Samples: 3352652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:15:06,029][06183] Avg episode reward: [(0, '4.413')]
-[2023-02-22 21:15:08,661][28133] Updated weights for policy 0, policy_version 4268 (0.0010)
-[2023-02-22 21:15:11,027][06183] Fps is (10 sec: 13926.7, 60 sec: 11878.6, 300 sec: 7803.3). Total num frames: 17514496. Throughput: 0: 3371.6. Samples: 3373674. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:15:11,029][06183] Avg episode reward: [(0, '4.202')]
-[2023-02-22 21:15:11,527][28133] Updated weights for policy 0, policy_version 4278 (0.0011)
-[2023-02-22 21:15:14,585][28133] Updated weights for policy 0, policy_version 4288 (0.0012)
-[2023-02-22 21:15:16,027][06183] Fps is (10 sec: 13516.7, 60 sec: 12424.6, 300 sec: 7886.6). Total num frames: 17580032. Throughput: 0: 3484.4. Samples: 3393814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:15:16,029][06183] Avg episode reward: [(0, '4.266')]
-[2023-02-22 21:15:18,032][28133] Updated weights for policy 0, policy_version 4298 (0.0012)
-[2023-02-22 21:15:20,887][28133] Updated weights for policy 0, policy_version 4308 (0.0010)
-[2023-02-22 21:15:21,028][06183] Fps is (10 sec: 13106.4, 60 sec: 12970.7, 300 sec: 7983.7). Total num frames: 17645568. Throughput: 0: 3443.3. Samples: 3403288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:15:21,030][06183] Avg episode reward: [(0, '4.595')]
-[2023-02-22 21:15:24,121][28133] Updated weights for policy 0, policy_version 4318 (0.0012)
-[2023-02-22 21:15:26,027][06183] Fps is (10 sec: 13107.3, 60 sec: 13585.1, 300 sec: 8080.9). Total num frames: 17711104. Throughput: 0: 3400.2. Samples: 3423208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:15:26,029][06183] Avg episode reward: [(0, '4.497')]
-[2023-02-22 21:15:27,302][28133] Updated weights for policy 0, policy_version 4328 (0.0013)
-[2023-02-22 21:15:30,484][28133] Updated weights for policy 0, policy_version 4338 (0.0016)
-[2023-02-22 21:15:31,027][06183] Fps is (10 sec: 12698.3, 60 sec: 13653.3, 300 sec: 8164.2). Total num frames: 17772544. Throughput: 0: 3339.1. Samples: 3442142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:15:31,029][06183] Avg episode reward: [(0, '4.332')]
-[2023-02-22 21:15:33,881][28133] Updated weights for policy 0, policy_version 4348 (0.0011)
-[2023-02-22 21:15:36,027][06183] Fps is (10 sec: 12697.6, 60 sec: 13516.8, 300 sec: 8261.4). Total num frames: 17838080. Throughput: 0: 3331.2. Samples: 3451332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:15:36,029][06183] Avg episode reward: [(0, '4.456')]
-[2023-02-22 21:15:36,992][28133] Updated weights for policy 0, policy_version 4358 (0.0012)
-[2023-02-22 21:15:40,480][28133] Updated weights for policy 0, policy_version 4368 (0.0012)
-[2023-02-22 21:15:41,028][06183] Fps is (10 sec: 12287.8, 60 sec: 13311.9, 300 sec: 8330.9). Total num frames: 17895424. Throughput: 0: 3294.1. Samples: 3470064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:15:41,030][06183] Avg episode reward: [(0, '4.382')]
-[2023-02-22 21:15:43,899][28133] Updated weights for policy 0, policy_version 4378 (0.0012)
-[2023-02-22 21:15:46,027][06183] Fps is (10 sec: 11878.4, 60 sec: 13107.3, 300 sec: 8414.2). Total num frames: 17956864. Throughput: 0: 3214.5. Samples: 3488032. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:15:46,030][06183] Avg episode reward: [(0, '4.343')]
-[2023-02-22 21:15:47,356][28133] Updated weights for policy 0, policy_version 4388 (0.0013)
-[2023-02-22 21:15:50,858][28133] Updated weights for policy 0, policy_version 4398 (0.0013)
-[2023-02-22 21:15:51,027][06183] Fps is (10 sec: 11878.5, 60 sec: 12902.4, 300 sec: 8483.6). Total num frames: 18014208. Throughput: 0: 3205.9. Samples: 3496918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:15:51,029][06183] Avg episode reward: [(0, '4.282')]
-[2023-02-22 21:15:51,045][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004398_18014208.pth...
-[2023-02-22 21:15:51,314][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003881_15896576.pth
-[2023-02-22 21:15:54,379][28133] Updated weights for policy 0, policy_version 4408 (0.0015)
-[2023-02-22 21:15:56,027][06183] Fps is (10 sec: 11468.7, 60 sec: 12765.8, 300 sec: 8566.9). Total num frames: 18071552. Throughput: 0: 3118.0. Samples: 3513986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:15:56,030][06183] Avg episode reward: [(0, '4.511')]
-[2023-02-22 21:15:57,910][28133] Updated weights for policy 0, policy_version 4418 (0.0013)
-[2023-02-22 21:16:01,027][06183] Fps is (10 sec: 11468.7, 60 sec: 12561.1, 300 sec: 8636.3). Total num frames: 18128896. Throughput: 0: 3056.9. Samples: 3531376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:16:01,030][06183] Avg episode reward: [(0, '4.441')]
-[2023-02-22 21:16:01,472][28133] Updated weights for policy 0, policy_version 4428 (0.0012)
-[2023-02-22 21:16:05,081][28133] Updated weights for policy 0, policy_version 4438 (0.0016)
-[2023-02-22 21:16:06,027][06183] Fps is (10 sec: 11468.9, 60 sec: 12356.3, 300 sec: 8705.8). Total num frames: 18186240. Throughput: 0: 3035.7. Samples: 3539892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:16:06,029][06183] Avg episode reward: [(0, '4.295')]
-[2023-02-22 21:16:08,813][28133] Updated weights for policy 0, policy_version 4448 (0.0012)
-[2023-02-22 21:16:11,027][06183] Fps is (10 sec: 11059.4, 60 sec: 12083.2, 300 sec: 8775.2). Total num frames: 18239488. Throughput: 0: 2962.3. Samples: 3556512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:16:11,030][06183] Avg episode reward: [(0, '4.410')]
-[2023-02-22 21:16:12,633][28133] Updated weights for policy 0, policy_version 4458 (0.0019)
-[2023-02-22 21:16:16,027][06183] Fps is (10 sec: 11059.2, 60 sec: 11946.7, 300 sec: 8844.6). Total num frames: 18296832. Throughput: 0: 2902.4. Samples: 3572748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:16:16,030][06183] Avg episode reward: [(0, '4.438')]
-[2023-02-22 21:16:16,375][28133] Updated weights for policy 0, policy_version 4468 (0.0014)
-[2023-02-22 21:16:20,208][28133] Updated weights for policy 0, policy_version 4478 (0.0014)
-[2023-02-22 21:16:21,028][06183] Fps is (10 sec: 11058.2, 60 sec: 11741.8, 300 sec: 8914.0). Total num frames: 18350080. Throughput: 0: 2879.7. Samples: 3580920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:16:21,031][06183] Avg episode reward: [(0, '4.483')]
-[2023-02-22 21:16:24,108][28133] Updated weights for policy 0, policy_version 4488 (0.0012)
-[2023-02-22 21:16:26,027][06183] Fps is (10 sec: 10240.0, 60 sec: 11468.8, 300 sec: 8955.7). Total num frames: 18399232. Throughput: 0: 2817.5. Samples: 3596850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:16:26,030][06183] Avg episode reward: [(0, '4.400')]
-[2023-02-22 21:16:27,976][28133] Updated weights for policy 0, policy_version 4498 (0.0012)
-[2023-02-22 21:16:31,027][06183] Fps is (10 sec: 10240.9, 60 sec: 11332.3, 300 sec: 9025.1). Total num frames: 18452480. Throughput: 0: 2764.6. Samples: 3612440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:16:31,030][06183] Avg episode reward: [(0, '4.408')]
-[2023-02-22 21:16:31,968][28133] Updated weights for policy 0, policy_version 4508 (0.0012)
-[2023-02-22 21:16:35,908][28133] Updated weights for policy 0, policy_version 4518 (0.0014)
-[2023-02-22 21:16:36,027][06183] Fps is (10 sec: 10649.6, 60 sec: 11127.5, 300 sec: 9080.6). Total num frames: 18505728. Throughput: 0: 2740.0. Samples: 3620216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:16:36,030][06183] Avg episode reward: [(0, '4.438')]
-[2023-02-22 21:16:39,953][28133] Updated weights for policy 0, policy_version 4528 (0.0013)
-[2023-02-22 21:16:41,028][06183] Fps is (10 sec: 10239.0, 60 sec: 10990.8, 300 sec: 9136.1). Total num frames: 18554880. Throughput: 0: 2702.7. Samples: 3635610. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:16:41,033][06183] Avg episode reward: [(0, '4.583')]
-[2023-02-22 21:16:44,059][28133] Updated weights for policy 0, policy_version 4538 (0.0015)
-[2023-02-22 21:16:46,027][06183] Fps is (10 sec: 9830.3, 60 sec: 10786.1, 300 sec: 9177.8). Total num frames: 18604032. Throughput: 0: 2648.2. Samples: 3650546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:16:46,030][06183] Avg episode reward: [(0, '4.409')]
-[2023-02-22 21:16:48,186][28133] Updated weights for policy 0, policy_version 4548 (0.0013)
-[2023-02-22 21:16:51,027][06183] Fps is (10 sec: 9831.3, 60 sec: 10649.6, 300 sec: 9233.4). Total num frames: 18653184. Throughput: 0: 2624.3. Samples: 3657988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:16:51,030][06183] Avg episode reward: [(0, '4.553')]
-[2023-02-22 21:16:52,353][28133] Updated weights for policy 0, policy_version 4558 (0.0015)
-[2023-02-22 21:16:56,027][06183] Fps is (10 sec: 9830.3, 60 sec: 10513.0, 300 sec: 9288.9). Total num frames: 18702336. Throughput: 0: 2579.8. Samples: 3672602. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:16:56,031][06183] Avg episode reward: [(0, '4.357')]
-[2023-02-22 21:16:56,544][28133] Updated weights for policy 0, policy_version 4568 (0.0015)
-[2023-02-22 21:17:00,713][28133] Updated weights for policy 0, policy_version 4578 (0.0016)
-[2023-02-22 21:17:01,028][06183] Fps is (10 sec: 9829.8, 60 sec: 10376.4, 300 sec: 9344.4). Total num frames: 18751488. Throughput: 0: 2546.1. Samples: 3687322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:17:01,033][06183] Avg episode reward: [(0, '4.335')]
-[2023-02-22 21:17:05,049][28133] Updated weights for policy 0, policy_version 4588 (0.0014)
-[2023-02-22 21:17:06,027][06183] Fps is (10 sec: 9830.2, 60 sec: 10239.9, 300 sec: 9400.0). Total num frames: 18800640. Throughput: 0: 2521.4. Samples: 3694382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:17:06,031][06183] Avg episode reward: [(0, '4.448')]
-[2023-02-22 21:17:09,230][28133] Updated weights for policy 0, policy_version 4598 (0.0013)
-[2023-02-22 21:17:11,028][06183] Fps is (10 sec: 9830.3, 60 sec: 10171.6, 300 sec: 9455.5). Total num frames: 18849792. Throughput: 0: 2488.9. Samples: 3708852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:17:11,032][06183] Avg episode reward: [(0, '4.577')]
-[2023-02-22 21:17:13,518][28133] Updated weights for policy 0, policy_version 4608 (0.0012)
-[2023-02-22 21:17:16,027][06183] Fps is (10 sec: 9421.0, 60 sec: 9966.9, 300 sec: 9483.3). Total num frames: 18894848. Throughput: 0: 2460.3. Samples: 3723156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:17:16,030][06183] Avg episode reward: [(0, '4.362')]
-[2023-02-22 21:17:17,916][28133] Updated weights for policy 0, policy_version 4618 (0.0015)
-[2023-02-22 21:17:21,027][06183] Fps is (10 sec: 9421.5, 60 sec: 9898.8, 300 sec: 9552.7). Total num frames: 18944000. Throughput: 0: 2442.5. Samples: 3730128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:17:21,030][06183] Avg episode reward: [(0, '4.522')]
-[2023-02-22 21:17:22,298][28133] Updated weights for policy 0, policy_version 4628 (0.0019)
-[2023-02-22 21:17:26,027][06183] Fps is (10 sec: 9420.9, 60 sec: 9830.4, 300 sec: 9594.4). Total num frames: 18989056. Throughput: 0: 2410.1. Samples: 3744062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:17:26,031][06183] Avg episode reward: [(0, '4.624')]
-[2023-02-22 21:17:26,688][28133] Updated weights for policy 0, policy_version 4638 (0.0014)
-[2023-02-22 21:17:31,027][06183] Fps is (10 sec: 9011.1, 60 sec: 9693.9, 300 sec: 9622.1). Total num frames: 19034112. Throughput: 0: 2387.1. Samples: 3757964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:17:31,031][06183] Avg episode reward: [(0, '4.517')]
-[2023-02-22 21:17:31,081][28133] Updated weights for policy 0, policy_version 4648 (0.0016)
-[2023-02-22 21:17:35,567][28133] Updated weights for policy 0, policy_version 4658 (0.0016)
-[2023-02-22 21:17:36,027][06183] Fps is (10 sec: 9420.7, 60 sec: 9625.6, 300 sec: 9691.6). Total num frames: 19083264. Throughput: 0: 2374.0. Samples: 3764820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:17:36,031][06183] Avg episode reward: [(0, '4.539')]
-[2023-02-22 21:17:40,051][28133] Updated weights for policy 0, policy_version 4668 (0.0016)
-[2023-02-22 21:17:41,027][06183] Fps is (10 sec: 9420.9, 60 sec: 9557.5, 300 sec: 9733.2). Total num frames: 19128320. Throughput: 0: 2356.5. Samples: 3778642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:17:41,030][06183] Avg episode reward: [(0, '4.408')]
-[2023-02-22 21:17:44,627][28133] Updated weights for policy 0, policy_version 4678 (0.0014)
-[2023-02-22 21:17:46,027][06183] Fps is (10 sec: 8601.6, 60 sec: 9420.8, 300 sec: 9761.0). Total num frames: 19169280. Throughput: 0: 2326.8. Samples: 3792028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:17:46,033][06183] Avg episode reward: [(0, '4.416')]
-[2023-02-22 21:17:49,260][28133] Updated weights for policy 0, policy_version 4688 (0.0016)
-[2023-02-22 21:17:51,027][06183] Fps is (10 sec: 8601.5, 60 sec: 9352.5, 300 sec: 9802.6). Total num frames: 19214336. Throughput: 0: 2318.9. Samples: 3798732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:17:51,030][06183] Avg episode reward: [(0, '4.543')]
-[2023-02-22 21:17:51,111][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004692_19218432.pth...
-[2023-02-22 21:17:51,513][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004072_16678912.pth
-[2023-02-22 21:17:53,938][28133] Updated weights for policy 0, policy_version 4698 (0.0015)
-[2023-02-22 21:17:56,027][06183] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9844.3). Total num frames: 19259392. Throughput: 0: 2290.2. Samples: 3811910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:17:56,031][06183] Avg episode reward: [(0, '4.498')]
-[2023-02-22 21:17:58,594][28133] Updated weights for policy 0, policy_version 4708 (0.0022)
-[2023-02-22 21:18:01,027][06183] Fps is (10 sec: 9011.2, 60 sec: 9216.1, 300 sec: 9885.9). Total num frames: 19304448. Throughput: 0: 2261.3. Samples: 3824912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:01,031][06183] Avg episode reward: [(0, '4.409')]
-[2023-02-22 21:18:03,398][28133] Updated weights for policy 0, policy_version 4718 (0.0019)
-[2023-02-22 21:18:06,027][06183] Fps is (10 sec: 8601.4, 60 sec: 9079.5, 300 sec: 9913.7). Total num frames: 19345408. Throughput: 0: 2246.6. Samples: 3831228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:06,034][06183] Avg episode reward: [(0, '4.432')]
-[2023-02-22 21:18:08,286][28133] Updated weights for policy 0, policy_version 4728 (0.0024)
-[2023-02-22 21:18:11,027][06183] Fps is (10 sec: 8191.9, 60 sec: 8943.0, 300 sec: 9941.5). Total num frames: 19386368. Throughput: 0: 2218.4. Samples: 3843892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:11,033][06183] Avg episode reward: [(0, '4.387')]
-[2023-02-22 21:18:13,113][28133] Updated weights for policy 0, policy_version 4738 (0.0021)
-[2023-02-22 21:18:16,027][06183] Fps is (10 sec: 8192.2, 60 sec: 8874.7, 300 sec: 9969.3). Total num frames: 19427328. Throughput: 0: 2190.8. Samples: 3856548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:16,032][06183] Avg episode reward: [(0, '4.491')]
-[2023-02-22 21:18:17,952][28133] Updated weights for policy 0, policy_version 4748 (0.0018)
-[2023-02-22 21:18:21,028][06183] Fps is (10 sec: 8601.2, 60 sec: 8806.3, 300 sec: 10024.8). Total num frames: 19472384. Throughput: 0: 2181.2. Samples: 3862976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:18:21,031][06183] Avg episode reward: [(0, '4.570')]
-[2023-02-22 21:18:22,800][28133] Updated weights for policy 0, policy_version 4758 (0.0019)
-[2023-02-22 21:18:26,027][06183] Fps is (10 sec: 8601.7, 60 sec: 8738.1, 300 sec: 10052.6). Total num frames: 19513344. Throughput: 0: 2149.6. Samples: 3875372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:26,030][06183] Avg episode reward: [(0, '4.685')]
-[2023-02-22 21:18:27,818][28133] Updated weights for policy 0, policy_version 4768 (0.0018)
-[2023-02-22 21:18:31,027][06183] Fps is (10 sec: 8192.4, 60 sec: 8669.9, 300 sec: 10080.3). Total num frames: 19554304. Throughput: 0: 2127.7. Samples: 3887774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:31,030][06183] Avg episode reward: [(0, '4.298')]
-[2023-02-22 21:18:32,835][28133] Updated weights for policy 0, policy_version 4778 (0.0023)
-[2023-02-22 21:18:36,027][06183] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 10108.1). Total num frames: 19595264. Throughput: 0: 2114.7. Samples: 3893894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:36,031][06183] Avg episode reward: [(0, '4.345')]
-[2023-02-22 21:18:37,921][28133] Updated weights for policy 0, policy_version 4788 (0.0015)
-[2023-02-22 21:18:41,027][06183] Fps is (10 sec: 8191.8, 60 sec: 8465.0, 300 sec: 10135.9). Total num frames: 19636224. Throughput: 0: 2088.8. Samples: 3905904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:18:41,031][06183] Avg episode reward: [(0, '4.243')]
-[2023-02-22 21:18:43,070][28133] Updated weights for policy 0, policy_version 4798 (0.0018)
-[2023-02-22 21:18:46,028][06183] Fps is (10 sec: 7782.0, 60 sec: 8396.7, 300 sec: 10149.7). Total num frames: 19673088. Throughput: 0: 2067.4. Samples: 3917944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:18:46,031][06183] Avg episode reward: [(0, '4.308')]
-[2023-02-22 21:18:48,178][28133] Updated weights for policy 0, policy_version 4808 (0.0023)
-[2023-02-22 21:18:51,027][06183] Fps is (10 sec: 7782.5, 60 sec: 8328.5, 300 sec: 10191.4). Total num frames: 19714048. Throughput: 0: 2062.4. Samples: 3924036. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:18:51,032][06183] Avg episode reward: [(0, '4.499')]
-[2023-02-22 21:18:53,298][28133] Updated weights for policy 0, policy_version 4818 (0.0022)
-[2023-02-22 21:18:56,027][06183] Fps is (10 sec: 8192.1, 60 sec: 8260.2, 300 sec: 10219.2). Total num frames: 19755008. Throughput: 0: 2046.2. Samples: 3935972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:18:56,031][06183] Avg episode reward: [(0, '4.390')]
-[2023-02-22 21:18:58,678][28133] Updated weights for policy 0, policy_version 4828 (0.0019)
-[2023-02-22 21:19:01,027][06183] Fps is (10 sec: 7782.4, 60 sec: 8123.7, 300 sec: 10233.1). Total num frames: 19791872. Throughput: 0: 2019.4. Samples: 3947422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:19:01,030][06183] Avg episode reward: [(0, '4.361')]
-[2023-02-22 21:19:03,928][28133] Updated weights for policy 0, policy_version 4838 (0.0020)
-[2023-02-22 21:19:06,027][06183] Fps is (10 sec: 7782.6, 60 sec: 8123.8, 300 sec: 10274.7). Total num frames: 19832832. Throughput: 0: 2005.7. Samples: 3953232. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:19:06,031][06183] Avg episode reward: [(0, '4.290')]
-[2023-02-22 21:19:09,049][28133] Updated weights for policy 0, policy_version 4848 (0.0016)
-[2023-02-22 21:19:11,028][06183] Fps is (10 sec: 7782.0, 60 sec: 8055.4, 300 sec: 10288.6). Total num frames: 19869696. Throughput: 0: 1993.0. Samples: 3965060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:19:11,032][06183] Avg episode reward: [(0, '4.348')]
-[2023-02-22 21:19:14,390][28133] Updated weights for policy 0, policy_version 4858 (0.0019)
-[2023-02-22 21:19:16,037][06183] Fps is (10 sec: 7365.8, 60 sec: 7985.9, 300 sec: 10302.2). Total num frames: 19906560. Throughput: 0: 1972.1. Samples: 3976538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:19:16,040][06183] Avg episode reward: [(0, '4.312')]
-[2023-02-22 21:19:19,750][28133] Updated weights for policy 0, policy_version 4868 (0.0025)
-[2023-02-22 21:19:21,027][06183] Fps is (10 sec: 7782.8, 60 sec: 7919.0, 300 sec: 10344.1). Total num frames: 19947520. Throughput: 0: 1964.3. Samples: 3982286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:19:21,034][06183] Avg episode reward: [(0, '4.350')]
-[2023-02-22 21:19:25,198][28133] Updated weights for policy 0, policy_version 4878 (0.0023)
-[2023-02-22 21:19:26,027][06183] Fps is (10 sec: 7789.7, 60 sec: 7850.6, 300 sec: 10274.7). Total num frames: 19984384. Throughput: 0: 1948.4. Samples: 3993580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:19:26,030][06183] Avg episode reward: [(0, '4.304')]
-[2023-02-22 21:19:30,623][28133] Updated weights for policy 0, policy_version 4888 (0.0020)
-[2023-02-22 21:19:31,028][06183] Fps is (10 sec: 7372.1, 60 sec: 7782.3, 300 sec: 10149.7). Total num frames: 20021248. Throughput: 0: 1932.2. Samples: 4004892. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:19:31,032][06183] Avg episode reward: [(0, '4.432')]
-[2023-02-22 21:19:36,028][06183] Fps is (10 sec: 7372.6, 60 sec: 7714.1, 300 sec: 10038.6). Total num frames: 20058112. Throughput: 0: 1919.8. Samples: 4010428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:19:36,032][06183] Avg episode reward: [(0, '4.389')]
-[2023-02-22 21:19:36,106][28133] Updated weights for policy 0, policy_version 4898 (0.0021)
-[2023-02-22 21:19:41,027][06183] Fps is (10 sec: 7783.1, 60 sec: 7714.1, 300 sec: 9927.6). Total num frames: 20099072. Throughput: 0: 1906.9. Samples: 4021784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:19:41,032][06183] Avg episode reward: [(0, '4.436')]
-[2023-02-22 21:19:41,535][28133] Updated weights for policy 0, policy_version 4908 (0.0024)
-[2023-02-22 21:19:46,028][06183] Fps is (10 sec: 7782.1, 60 sec: 7714.1, 300 sec: 9816.5). Total num frames: 20135936. Throughput: 0: 1904.2. Samples: 4033112. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:19:46,033][06183] Avg episode reward: [(0, '4.384')]
-[2023-02-22 21:19:46,900][28133] Updated weights for policy 0, policy_version 4918 (0.0024)
-[2023-02-22 21:19:51,028][06183] Fps is (10 sec: 7372.3, 60 sec: 7645.8, 300 sec: 9719.3). Total num frames: 20172800. Throughput: 0: 1895.1. Samples: 4038514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:19:51,033][06183] Avg episode reward: [(0, '4.578')]
-[2023-02-22 21:19:51,061][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004925_20172800.pth...
-[2023-02-22 21:19:51,537][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004398_18014208.pth
-[2023-02-22 21:19:52,540][28133] Updated weights for policy 0, policy_version 4928 (0.0021)
-[2023-02-22 21:19:56,027][06183] Fps is (10 sec: 7373.4, 60 sec: 7577.6, 300 sec: 9608.2). Total num frames: 20209664. Throughput: 0: 1874.8. Samples: 4049426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:19:56,030][06183] Avg episode reward: [(0, '4.375')]
-[2023-02-22 21:19:58,168][28133] Updated weights for policy 0, policy_version 4938 (0.0021)
-[2023-02-22 21:20:01,027][06183] Fps is (10 sec: 7373.3, 60 sec: 7577.6, 300 sec: 9497.2). Total num frames: 20246528. Throughput: 0: 1863.2. Samples: 4060362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:20:01,031][06183] Avg episode reward: [(0, '4.595')]
-[2023-02-22 21:20:03,810][28133] Updated weights for policy 0, policy_version 4948 (0.0021)
-[2023-02-22 21:20:06,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7441.1, 300 sec: 9372.2). Total num frames: 20279296. Throughput: 0: 1854.8. Samples: 4065752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-02-22 21:20:06,031][06183] Avg episode reward: [(0, '4.515')]
-[2023-02-22 21:20:09,505][28133] Updated weights for policy 0, policy_version 4958 (0.0022)
-[2023-02-22 21:20:11,027][06183] Fps is (10 sec: 6963.0, 60 sec: 7441.1, 300 sec: 9275.0). Total num frames: 20316160. Throughput: 0: 1842.0. Samples: 4076470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:20:11,033][06183] Avg episode reward: [(0, '4.567')]
-[2023-02-22 21:20:15,345][28133] Updated weights for policy 0, policy_version 4968 (0.0023)
-[2023-02-22 21:20:16,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7442.2, 300 sec: 9177.8). Total num frames: 20353024. Throughput: 0: 1826.4. Samples: 4087078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:20:16,032][06183] Avg episode reward: [(0, '4.499')]
-[2023-02-22 21:20:21,027][06183] Fps is (10 sec: 6963.3, 60 sec: 7304.5, 300 sec: 9066.7). Total num frames: 20385792. Throughput: 0: 1822.0. Samples: 4092416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:20:21,032][06183] Avg episode reward: [(0, '4.334')]
-[2023-02-22 21:20:21,047][28133] Updated weights for policy 0, policy_version 4978 (0.0030)
-[2023-02-22 21:20:26,027][06183] Fps is (10 sec: 6963.3, 60 sec: 7304.5, 300 sec: 8983.4). Total num frames: 20422656. Throughput: 0: 1806.3. Samples: 4103068.
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:20:26,031][06183] Avg episode reward: [(0, '4.585')] -[2023-02-22 21:20:26,916][28133] Updated weights for policy 0, policy_version 4988 (0.0029) -[2023-02-22 21:20:31,028][06183] Fps is (10 sec: 6963.0, 60 sec: 7236.3, 300 sec: 8872.3). Total num frames: 20455424. Throughput: 0: 1787.3. Samples: 4113540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:20:31,032][06183] Avg episode reward: [(0, '4.254')] -[2023-02-22 21:20:32,726][28133] Updated weights for policy 0, policy_version 4998 (0.0028) -[2023-02-22 21:20:36,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7236.3, 300 sec: 8802.9). Total num frames: 20492288. Throughput: 0: 1786.8. Samples: 4118920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:20:36,031][06183] Avg episode reward: [(0, '4.300')] -[2023-02-22 21:20:38,578][28133] Updated weights for policy 0, policy_version 5008 (0.0026) -[2023-02-22 21:20:41,027][06183] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 8719.6). Total num frames: 20529152. Throughput: 0: 1775.2. Samples: 4129312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 21:20:41,032][06183] Avg episode reward: [(0, '4.339')] -[2023-02-22 21:20:44,538][28133] Updated weights for policy 0, policy_version 5018 (0.0027) -[2023-02-22 21:20:46,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7099.8, 300 sec: 8636.3). Total num frames: 20561920. Throughput: 0: 1761.3. Samples: 4139620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:20:46,031][06183] Avg episode reward: [(0, '4.496')] -[2023-02-22 21:20:50,500][28133] Updated weights for policy 0, policy_version 5028 (0.0021) -[2023-02-22 21:20:51,028][06183] Fps is (10 sec: 6553.4, 60 sec: 7031.5, 300 sec: 8553.0). Total num frames: 20594688. Throughput: 0: 1756.4. Samples: 4144792. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:20:51,032][06183] Avg episode reward: [(0, '4.415')] -[2023-02-22 21:20:56,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 8483.6). Total num frames: 20631552. Throughput: 0: 1749.0. Samples: 4155174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:20:56,031][06183] Avg episode reward: [(0, '4.293')] -[2023-02-22 21:20:56,519][28133] Updated weights for policy 0, policy_version 5038 (0.0028) -[2023-02-22 21:21:01,027][06183] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 8400.3). Total num frames: 20664320. Throughput: 0: 1742.9. Samples: 4165506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:21:01,031][06183] Avg episode reward: [(0, '4.229')] -[2023-02-22 21:21:02,565][28133] Updated weights for policy 0, policy_version 5048 (0.0026) -[2023-02-22 21:21:06,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6963.1, 300 sec: 8330.8). Total num frames: 20697088. Throughput: 0: 1734.8. Samples: 4170482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:21:06,033][06183] Avg episode reward: [(0, '4.381')] -[2023-02-22 21:21:08,658][28133] Updated weights for policy 0, policy_version 5058 (0.0029) -[2023-02-22 21:21:11,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 8247.5). Total num frames: 20729856. Throughput: 0: 1722.0. Samples: 4180558. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:21:11,032][06183] Avg episode reward: [(0, '4.505')] -[2023-02-22 21:21:14,811][28133] Updated weights for policy 0, policy_version 5068 (0.0031) -[2023-02-22 21:21:16,027][06183] Fps is (10 sec: 6553.9, 60 sec: 6826.7, 300 sec: 8178.1). Total num frames: 20762624. Throughput: 0: 1709.3. Samples: 4190460. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:21:16,033][06183] Avg episode reward: [(0, '4.550')] -[2023-02-22 21:21:20,855][28133] Updated weights for policy 0, policy_version 5078 (0.0022) -[2023-02-22 21:21:21,028][06183] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 8136.4). Total num frames: 20799488. Throughput: 0: 1699.2. Samples: 4195386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:21:21,031][06183] Avg episode reward: [(0, '4.511')] -[2023-02-22 21:21:26,027][06183] Fps is (10 sec: 6963.1, 60 sec: 6826.6, 300 sec: 8067.0). Total num frames: 20832256. Throughput: 0: 1687.6. Samples: 4205256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:21:26,031][06183] Avg episode reward: [(0, '4.394')] -[2023-02-22 21:21:27,262][28133] Updated weights for policy 0, policy_version 5088 (0.0032) -[2023-02-22 21:21:31,028][06183] Fps is (10 sec: 6143.7, 60 sec: 6758.4, 300 sec: 7983.7). Total num frames: 20860928. Throughput: 0: 1670.6. Samples: 4214796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:21:31,034][06183] Avg episode reward: [(0, '4.310')] -[2023-02-22 21:21:33,823][28133] Updated weights for policy 0, policy_version 5098 (0.0031) -[2023-02-22 21:21:36,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 7928.2). Total num frames: 20893696. Throughput: 0: 1663.6. Samples: 4219654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:21:36,036][06183] Avg episode reward: [(0, '4.436')] -[2023-02-22 21:21:40,292][28133] Updated weights for policy 0, policy_version 5108 (0.0031) -[2023-02-22 21:21:41,027][06183] Fps is (10 sec: 6554.0, 60 sec: 6621.9, 300 sec: 7872.6). Total num frames: 20926464. Throughput: 0: 1641.7. Samples: 4229052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:21:41,031][06183] Avg episode reward: [(0, '4.433')] -[2023-02-22 21:21:46,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 7817.1). Total num frames: 20959232. 
Throughput: 0: 1635.6. Samples: 4239106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:21:46,032][06183] Avg episode reward: [(0, '4.459')] -[2023-02-22 21:21:46,419][28133] Updated weights for policy 0, policy_version 5118 (0.0032) -[2023-02-22 21:21:51,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 7761.6). Total num frames: 20992000. Throughput: 0: 1634.8. Samples: 4244046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:21:51,032][06183] Avg episode reward: [(0, '4.445')] -[2023-02-22 21:21:51,068][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005125_20992000.pth... -[2023-02-22 21:21:51,586][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004692_19218432.pth -[2023-02-22 21:21:52,714][28133] Updated weights for policy 0, policy_version 5128 (0.0024) -[2023-02-22 21:21:56,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7706.0). Total num frames: 21024768. Throughput: 0: 1623.0. Samples: 4253594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:21:56,031][06183] Avg episode reward: [(0, '4.535')] -[2023-02-22 21:21:59,104][28133] Updated weights for policy 0, policy_version 5138 (0.0032) -[2023-02-22 21:22:01,028][06183] Fps is (10 sec: 6553.2, 60 sec: 6553.5, 300 sec: 7650.5). Total num frames: 21057536. Throughput: 0: 1618.4. Samples: 4263290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:22:01,033][06183] Avg episode reward: [(0, '4.429')] -[2023-02-22 21:22:05,349][28133] Updated weights for policy 0, policy_version 5148 (0.0024) -[2023-02-22 21:22:06,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7595.0). Total num frames: 21090304. Throughput: 0: 1615.4. Samples: 4268078. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 21:22:06,033][06183] Avg episode reward: [(0, '4.221')] -[2023-02-22 21:22:11,028][06183] Fps is (10 sec: 6144.2, 60 sec: 6485.3, 300 sec: 7539.4). Total num frames: 21118976. Throughput: 0: 1617.0. Samples: 4278020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:22:11,033][06183] Avg episode reward: [(0, '4.387')] -[2023-02-22 21:22:11,692][28133] Updated weights for policy 0, policy_version 5158 (0.0032) -[2023-02-22 21:22:16,029][06183] Fps is (10 sec: 6143.1, 60 sec: 6485.2, 300 sec: 7483.8). Total num frames: 21151744. Throughput: 0: 1620.4. Samples: 4287714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:22:16,035][06183] Avg episode reward: [(0, '4.432')] -[2023-02-22 21:22:18,045][28133] Updated weights for policy 0, policy_version 5168 (0.0025) -[2023-02-22 21:22:21,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 7442.2). Total num frames: 21184512. Throughput: 0: 1614.8. Samples: 4292322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:22:21,031][06183] Avg episode reward: [(0, '4.442')] -[2023-02-22 21:22:24,610][28133] Updated weights for policy 0, policy_version 5178 (0.0031) -[2023-02-22 21:22:26,028][06183] Fps is (10 sec: 6554.2, 60 sec: 6417.0, 300 sec: 7400.6). Total num frames: 21217280. Throughput: 0: 1617.2. Samples: 4301828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:22:26,034][06183] Avg episode reward: [(0, '4.420')] -[2023-02-22 21:22:30,875][28133] Updated weights for policy 0, policy_version 5188 (0.0029) -[2023-02-22 21:22:31,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6485.4, 300 sec: 7345.0). Total num frames: 21250048. Throughput: 0: 1611.8. Samples: 4311636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:22:31,032][06183] Avg episode reward: [(0, '4.505')] -[2023-02-22 21:22:36,027][06183] Fps is (10 sec: 6554.1, 60 sec: 6485.3, 300 sec: 7303.4). Total num frames: 21282816. 
Throughput: 0: 1609.3. Samples: 4316466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:22:36,032][06183] Avg episode reward: [(0, '4.346')] -[2023-02-22 21:22:37,208][28133] Updated weights for policy 0, policy_version 5198 (0.0033) -[2023-02-22 21:22:41,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 7275.6). Total num frames: 21315584. Throughput: 0: 1613.4. Samples: 4326196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:22:41,032][06183] Avg episode reward: [(0, '4.473')] -[2023-02-22 21:22:43,806][28133] Updated weights for policy 0, policy_version 5208 (0.0026) -[2023-02-22 21:22:46,027][06183] Fps is (10 sec: 6143.9, 60 sec: 6417.1, 300 sec: 7220.1). Total num frames: 21344256. Throughput: 0: 1603.4. Samples: 4335442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:22:46,032][06183] Avg episode reward: [(0, '4.357')] -[2023-02-22 21:22:50,506][28133] Updated weights for policy 0, policy_version 5218 (0.0031) -[2023-02-22 21:22:51,028][06183] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 7164.5). Total num frames: 21372928. Throughput: 0: 1594.9. Samples: 4339850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:22:51,036][06183] Avg episode reward: [(0, '4.366')] -[2023-02-22 21:22:56,028][06183] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 7122.9). Total num frames: 21405696. Throughput: 0: 1589.7. Samples: 4349558. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:22:56,034][06183] Avg episode reward: [(0, '4.637')] -[2023-02-22 21:22:56,737][28133] Updated weights for policy 0, policy_version 5228 (0.0031) -[2023-02-22 21:23:01,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6348.8, 300 sec: 7095.1). Total num frames: 21438464. Throughput: 0: 1590.2. Samples: 4359272. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) -[2023-02-22 21:23:01,032][06183] Avg episode reward: [(0, '4.464')] -[2023-02-22 21:23:03,099][28133] Updated weights for policy 0, policy_version 5238 (0.0033) -[2023-02-22 21:23:06,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6348.8, 300 sec: 7067.3). Total num frames: 21471232. Throughput: 0: 1593.7. Samples: 4364040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:23:06,033][06183] Avg episode reward: [(0, '4.428')] -[2023-02-22 21:23:09,674][28133] Updated weights for policy 0, policy_version 5248 (0.0027) -[2023-02-22 21:23:11,028][06183] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 7039.6). Total num frames: 21504000. Throughput: 0: 1591.0. Samples: 4373424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:23:11,032][06183] Avg episode reward: [(0, '4.402')] -[2023-02-22 21:23:16,027][06183] Fps is (10 sec: 6144.2, 60 sec: 6349.0, 300 sec: 6984.0). Total num frames: 21532672. Throughput: 0: 1581.4. Samples: 4382798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:23:16,032][06183] Avg episode reward: [(0, '4.480')] -[2023-02-22 21:23:16,226][28133] Updated weights for policy 0, policy_version 5258 (0.0030) -[2023-02-22 21:23:21,027][06183] Fps is (10 sec: 6554.1, 60 sec: 6417.1, 300 sec: 6970.1). Total num frames: 21569536. Throughput: 0: 1579.4. Samples: 4387540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:23:21,029][06183] Avg episode reward: [(0, '4.449')] -[2023-02-22 21:23:21,426][28133] Updated weights for policy 0, policy_version 5268 (0.0022) -[2023-02-22 21:23:23,992][28133] Updated weights for policy 0, policy_version 5278 (0.0009) -[2023-02-22 21:23:26,027][06183] Fps is (10 sec: 11469.2, 60 sec: 7168.1, 300 sec: 7095.1). Total num frames: 21647360. Throughput: 0: 1783.3. Samples: 4406444. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:23:26,028][06183] Avg episode reward: [(0, '4.424')] -[2023-02-22 21:23:26,743][28133] Updated weights for policy 0, policy_version 5288 (0.0011) -[2023-02-22 21:23:29,748][28133] Updated weights for policy 0, policy_version 5298 (0.0012) -[2023-02-22 21:23:31,027][06183] Fps is (10 sec: 14745.6, 60 sec: 7782.4, 300 sec: 7192.3). Total num frames: 21716992. Throughput: 0: 2055.2. Samples: 4427924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:23:31,029][06183] Avg episode reward: [(0, '4.422')] -[2023-02-22 21:23:32,452][28133] Updated weights for policy 0, policy_version 5308 (0.0010) -[2023-02-22 21:23:35,569][28133] Updated weights for policy 0, policy_version 5318 (0.0010) -[2023-02-22 21:23:36,027][06183] Fps is (10 sec: 13926.2, 60 sec: 8396.8, 300 sec: 7289.5). Total num frames: 21786624. Throughput: 0: 2214.6. Samples: 4439506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:23:36,028][06183] Avg episode reward: [(0, '4.423')] -[2023-02-22 21:23:38,149][28133] Updated weights for policy 0, policy_version 5328 (0.0009) -[2023-02-22 21:23:41,027][06183] Fps is (10 sec: 14336.1, 60 sec: 9079.6, 300 sec: 7414.5). Total num frames: 21860352. Throughput: 0: 2475.4. Samples: 4460952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:23:41,029][06183] Avg episode reward: [(0, '4.260')] -[2023-02-22 21:23:41,185][28133] Updated weights for policy 0, policy_version 5338 (0.0013) -[2023-02-22 21:23:43,913][28133] Updated weights for policy 0, policy_version 5348 (0.0011) -[2023-02-22 21:23:46,027][06183] Fps is (10 sec: 14745.6, 60 sec: 9830.4, 300 sec: 7525.5). Total num frames: 21934080. Throughput: 0: 2736.5. Samples: 4482412. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:23:46,029][06183] Avg episode reward: [(0, '4.581')] -[2023-02-22 21:23:46,814][28133] Updated weights for policy 0, policy_version 5358 (0.0010) -[2023-02-22 21:23:50,187][28133] Updated weights for policy 0, policy_version 5368 (0.0015) -[2023-02-22 21:23:51,027][06183] Fps is (10 sec: 13516.7, 60 sec: 10376.6, 300 sec: 7595.0). Total num frames: 21995520. Throughput: 0: 2841.5. Samples: 4491904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:23:51,030][06183] Avg episode reward: [(0, '4.389')] -[2023-02-22 21:23:51,043][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005370_21995520.pth... -[2023-02-22 21:23:51,238][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004925_20172800.pth -[2023-02-22 21:23:53,184][28133] Updated weights for policy 0, policy_version 5378 (0.0011) -[2023-02-22 21:23:56,027][06183] Fps is (10 sec: 13106.7, 60 sec: 10990.9, 300 sec: 7706.0). Total num frames: 22065152. Throughput: 0: 3067.8. Samples: 4511474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 21:23:56,032][06183] Avg episode reward: [(0, '4.311')] -[2023-02-22 21:23:56,185][28133] Updated weights for policy 0, policy_version 5388 (0.0010) -[2023-02-22 21:23:59,273][28133] Updated weights for policy 0, policy_version 5398 (0.0010) -[2023-02-22 21:24:01,027][06183] Fps is (10 sec: 13516.7, 60 sec: 11537.2, 300 sec: 7789.3). Total num frames: 22130688. Throughput: 0: 3305.8. Samples: 4531560. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 21:24:01,030][06183] Avg episode reward: [(0, '4.646')] -[2023-02-22 21:24:02,505][28133] Updated weights for policy 0, policy_version 5408 (0.0012) -[2023-02-22 21:24:05,819][28133] Updated weights for policy 0, policy_version 5418 (0.0012) -[2023-02-22 21:24:06,027][06183] Fps is (10 sec: 12698.0, 60 sec: 12015.0, 300 sec: 7872.7). Total num frames: 22192128. Throughput: 0: 3404.7. Samples: 4540750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:24:06,030][06183] Avg episode reward: [(0, '4.714')] -[2023-02-22 21:24:09,436][28133] Updated weights for policy 0, policy_version 5428 (0.0017) -[2023-02-22 21:24:11,027][06183] Fps is (10 sec: 11878.5, 60 sec: 12424.7, 300 sec: 7942.3). Total num frames: 22249472. Throughput: 0: 3380.5. Samples: 4558568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:24:11,030][06183] Avg episode reward: [(0, '4.325')] -[2023-02-22 21:24:13,023][28133] Updated weights for policy 0, policy_version 5438 (0.0012) -[2023-02-22 21:24:16,027][06183] Fps is (10 sec: 11468.8, 60 sec: 12902.5, 300 sec: 7997.6). Total num frames: 22306816. Throughput: 0: 3282.3. Samples: 4575626. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:24:16,030][06183] Avg episode reward: [(0, '4.307')] -[2023-02-22 21:24:16,667][28133] Updated weights for policy 0, policy_version 5448 (0.0013) -[2023-02-22 21:24:20,285][28133] Updated weights for policy 0, policy_version 5458 (0.0011) -[2023-02-22 21:24:21,027][06183] Fps is (10 sec: 11468.8, 60 sec: 13243.7, 300 sec: 8067.0). Total num frames: 22364160. Throughput: 0: 3214.9. Samples: 4584178. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:24:21,031][06183] Avg episode reward: [(0, '4.326')] -[2023-02-22 21:24:24,021][28133] Updated weights for policy 0, policy_version 5468 (0.0012) -[2023-02-22 21:24:26,027][06183] Fps is (10 sec: 11059.2, 60 sec: 12834.1, 300 sec: 8122.6). Total num frames: 22417408. 
Throughput: 0: 3101.4. Samples: 4600514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:24:26,029][06183] Avg episode reward: [(0, '4.518')] -[2023-02-22 21:24:27,927][28133] Updated weights for policy 0, policy_version 5478 (0.0014) -[2023-02-22 21:24:31,027][06183] Fps is (10 sec: 10649.5, 60 sec: 12561.1, 300 sec: 8178.1). Total num frames: 22470656. Throughput: 0: 2974.6. Samples: 4616268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:24:31,030][06183] Avg episode reward: [(0, '4.309')] -[2023-02-22 21:24:31,787][28133] Updated weights for policy 0, policy_version 5488 (0.0011) -[2023-02-22 21:24:35,643][28133] Updated weights for policy 0, policy_version 5498 (0.0012) -[2023-02-22 21:24:36,028][06183] Fps is (10 sec: 10238.8, 60 sec: 12219.5, 300 sec: 8205.9). Total num frames: 22519808. Throughput: 0: 2939.4. Samples: 4624182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:24:36,032][06183] Avg episode reward: [(0, '4.430')] -[2023-02-22 21:24:39,549][28133] Updated weights for policy 0, policy_version 5508 (0.0017) -[2023-02-22 21:24:41,027][06183] Fps is (10 sec: 10240.0, 60 sec: 11878.4, 300 sec: 8261.4). Total num frames: 22573056. Throughput: 0: 2855.1. Samples: 4639952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:24:41,030][06183] Avg episode reward: [(0, '4.351')] -[2023-02-22 21:24:43,474][28133] Updated weights for policy 0, policy_version 5518 (0.0013) -[2023-02-22 21:24:46,027][06183] Fps is (10 sec: 10650.7, 60 sec: 11537.0, 300 sec: 8317.0). Total num frames: 22626304. Throughput: 0: 2751.2. Samples: 4655362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:24:46,030][06183] Avg episode reward: [(0, '4.288')] -[2023-02-22 21:24:47,541][28133] Updated weights for policy 0, policy_version 5528 (0.0014) -[2023-02-22 21:24:51,029][06183] Fps is (10 sec: 10238.4, 60 sec: 11332.0, 300 sec: 8358.6). Total num frames: 22675456. Throughput: 0: 2714.4. Samples: 4662902. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:24:51,037][06183] Avg episode reward: [(0, '4.583')] -[2023-02-22 21:24:51,700][28133] Updated weights for policy 0, policy_version 5538 (0.0016) -[2023-02-22 21:24:55,812][28133] Updated weights for policy 0, policy_version 5548 (0.0014) -[2023-02-22 21:24:56,027][06183] Fps is (10 sec: 9830.5, 60 sec: 10991.0, 300 sec: 8400.3). Total num frames: 22724608. Throughput: 0: 2649.2. Samples: 4677784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:24:56,031][06183] Avg episode reward: [(0, '4.321')] -[2023-02-22 21:25:00,000][28133] Updated weights for policy 0, policy_version 5558 (0.0017) -[2023-02-22 21:25:01,027][06183] Fps is (10 sec: 9831.9, 60 sec: 10717.9, 300 sec: 8455.8). Total num frames: 22773760. Throughput: 0: 2598.7. Samples: 4692568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:25:01,030][06183] Avg episode reward: [(0, '4.301')] -[2023-02-22 21:25:04,246][28133] Updated weights for policy 0, policy_version 5568 (0.0012) -[2023-02-22 21:25:06,027][06183] Fps is (10 sec: 9830.2, 60 sec: 10513.0, 300 sec: 8497.5). Total num frames: 22822912. Throughput: 0: 2570.6. Samples: 4699854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:25:06,033][06183] Avg episode reward: [(0, '4.363')] -[2023-02-22 21:25:08,501][28133] Updated weights for policy 0, policy_version 5578 (0.0015) -[2023-02-22 21:25:11,027][06183] Fps is (10 sec: 9420.9, 60 sec: 10308.3, 300 sec: 8525.2). Total num frames: 22867968. Throughput: 0: 2527.1. Samples: 4714232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:25:11,031][06183] Avg episode reward: [(0, '4.302')] -[2023-02-22 21:25:12,770][28133] Updated weights for policy 0, policy_version 5588 (0.0017) -[2023-02-22 21:25:16,027][06183] Fps is (10 sec: 9421.0, 60 sec: 10171.7, 300 sec: 8580.8). Total num frames: 22917120. Throughput: 0: 2495.1. Samples: 4728548. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:25:16,031][06183] Avg episode reward: [(0, '4.218')] -[2023-02-22 21:25:17,089][28133] Updated weights for policy 0, policy_version 5598 (0.0018) -[2023-02-22 21:25:21,027][06183] Fps is (10 sec: 9830.4, 60 sec: 10035.2, 300 sec: 8622.4). Total num frames: 22966272. Throughput: 0: 2476.8. Samples: 4735634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:25:21,031][06183] Avg episode reward: [(0, '4.302')] -[2023-02-22 21:25:21,408][28133] Updated weights for policy 0, policy_version 5608 (0.0014) -[2023-02-22 21:25:25,724][28133] Updated weights for policy 0, policy_version 5618 (0.0016) -[2023-02-22 21:25:26,028][06183] Fps is (10 sec: 9420.4, 60 sec: 9898.6, 300 sec: 8664.1). Total num frames: 23011328. Throughput: 0: 2444.3. Samples: 4749946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:25:26,031][06183] Avg episode reward: [(0, '4.318')] -[2023-02-22 21:25:30,108][28133] Updated weights for policy 0, policy_version 5628 (0.0013) -[2023-02-22 21:25:31,027][06183] Fps is (10 sec: 9420.6, 60 sec: 9830.4, 300 sec: 8705.7). Total num frames: 23060480. Throughput: 0: 2414.3. Samples: 4764006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:25:31,031][06183] Avg episode reward: [(0, '4.372')] -[2023-02-22 21:25:34,479][28133] Updated weights for policy 0, policy_version 5638 (0.0016) -[2023-02-22 21:25:36,027][06183] Fps is (10 sec: 9421.2, 60 sec: 9762.3, 300 sec: 8733.5). Total num frames: 23105536. Throughput: 0: 2402.6. Samples: 4771016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:25:36,030][06183] Avg episode reward: [(0, '4.485')] -[2023-02-22 21:25:38,927][28133] Updated weights for policy 0, policy_version 5648 (0.0018) -[2023-02-22 21:25:41,027][06183] Fps is (10 sec: 9011.4, 60 sec: 9625.6, 300 sec: 8775.2). Total num frames: 23150592. Throughput: 0: 2379.7. Samples: 4784872. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:25:41,031][06183] Avg episode reward: [(0, '4.451')] -[2023-02-22 21:25:43,376][28133] Updated weights for policy 0, policy_version 5658 (0.0019) -[2023-02-22 21:25:46,027][06183] Fps is (10 sec: 9420.7, 60 sec: 9557.3, 300 sec: 8830.7). Total num frames: 23199744. Throughput: 0: 2358.6. Samples: 4798706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:25:46,030][06183] Avg episode reward: [(0, '4.298')] -[2023-02-22 21:25:47,824][28133] Updated weights for policy 0, policy_version 5668 (0.0016) -[2023-02-22 21:25:51,027][06183] Fps is (10 sec: 9420.7, 60 sec: 9489.3, 300 sec: 8858.5). Total num frames: 23244800. Throughput: 0: 2346.9. Samples: 4805464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:25:51,030][06183] Avg episode reward: [(0, '4.354')] -[2023-02-22 21:25:51,056][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005675_23244800.pth... -[2023-02-22 21:25:51,457][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005125_20992000.pth -[2023-02-22 21:25:52,279][28133] Updated weights for policy 0, policy_version 5678 (0.0017) -[2023-02-22 21:25:56,027][06183] Fps is (10 sec: 9011.2, 60 sec: 9420.8, 300 sec: 8900.1). Total num frames: 23289856. Throughput: 0: 2329.6. Samples: 4819064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:25:56,031][06183] Avg episode reward: [(0, '4.652')] -[2023-02-22 21:25:56,846][28133] Updated weights for policy 0, policy_version 5688 (0.0013) -[2023-02-22 21:26:01,027][06183] Fps is (10 sec: 9011.3, 60 sec: 9352.5, 300 sec: 8941.8). Total num frames: 23334912. Throughput: 0: 2315.0. Samples: 4832724. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:26:01,031][06183] Avg episode reward: [(0, '4.490')]
-[2023-02-22 21:26:01,316][28133] Updated weights for policy 0, policy_version 5698 (0.0019)
-[2023-02-22 21:26:05,761][28133] Updated weights for policy 0, policy_version 5708 (0.0021)
-[2023-02-22 21:26:06,027][06183] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 8983.4). Total num frames: 23379968. Throughput: 0: 2309.2. Samples: 4839550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:26:06,030][06183] Avg episode reward: [(0, '4.509')]
-[2023-02-22 21:26:10,342][28133] Updated weights for policy 0, policy_version 5718 (0.0014)
-[2023-02-22 21:26:11,027][06183] Fps is (10 sec: 9011.0, 60 sec: 9284.2, 300 sec: 9025.1). Total num frames: 23425024. Throughput: 0: 2292.7. Samples: 4853116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:26:11,031][06183] Avg episode reward: [(0, '4.624')]
-[2023-02-22 21:26:14,907][28133] Updated weights for policy 0, policy_version 5728 (0.0014)
-[2023-02-22 21:26:16,027][06183] Fps is (10 sec: 9011.0, 60 sec: 9216.0, 300 sec: 9052.9). Total num frames: 23470080. Throughput: 0: 2276.6. Samples: 4866452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:26:16,031][06183] Avg episode reward: [(0, '4.574')]
-[2023-02-22 21:26:19,574][28133] Updated weights for policy 0, policy_version 5738 (0.0015)
-[2023-02-22 21:26:21,036][06183] Fps is (10 sec: 9003.1, 60 sec: 9146.3, 300 sec: 9094.2). Total num frames: 23515136. Throughput: 0: 2267.6. Samples: 4873078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:26:21,044][06183] Avg episode reward: [(0, '4.386')]
-[2023-02-22 21:26:24,270][28133] Updated weights for policy 0, policy_version 5748 (0.0018)
-[2023-02-22 21:26:26,027][06183] Fps is (10 sec: 8601.8, 60 sec: 9079.5, 300 sec: 9136.2). Total num frames: 23556096. Throughput: 0: 2251.6. Samples: 4886192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:26:26,030][06183] Avg episode reward: [(0, '4.432')]
-[2023-02-22 21:26:28,991][28133] Updated weights for policy 0, policy_version 5758 (0.0014)
-[2023-02-22 21:26:31,027][06183] Fps is (10 sec: 8609.3, 60 sec: 9011.2, 300 sec: 9177.8). Total num frames: 23601152. Throughput: 0: 2233.9. Samples: 4899230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:26:31,031][06183] Avg episode reward: [(0, '4.435')]
-[2023-02-22 21:26:33,766][28133] Updated weights for policy 0, policy_version 5768 (0.0018)
-[2023-02-22 21:26:36,027][06183] Fps is (10 sec: 8601.5, 60 sec: 8942.9, 300 sec: 9205.6). Total num frames: 23642112. Throughput: 0: 2223.2. Samples: 4905508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:26:36,031][06183] Avg episode reward: [(0, '4.340')]
-[2023-02-22 21:26:38,555][28133] Updated weights for policy 0, policy_version 5778 (0.0017)
-[2023-02-22 21:26:41,027][06183] Fps is (10 sec: 8601.8, 60 sec: 8942.9, 300 sec: 9247.2). Total num frames: 23687168. Throughput: 0: 2206.1. Samples: 4918338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:26:41,031][06183] Avg episode reward: [(0, '4.326')]
-[2023-02-22 21:26:43,439][28133] Updated weights for policy 0, policy_version 5788 (0.0018)
-[2023-02-22 21:26:46,027][06183] Fps is (10 sec: 8601.7, 60 sec: 8806.4, 300 sec: 9275.0). Total num frames: 23728128. Throughput: 0: 2182.0. Samples: 4930912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2023-02-22 21:26:46,031][06183] Avg episode reward: [(0, '4.177')]
-[2023-02-22 21:26:48,299][28133] Updated weights for policy 0, policy_version 5798 (0.0020)
-[2023-02-22 21:26:51,027][06183] Fps is (10 sec: 8191.8, 60 sec: 8738.1, 300 sec: 9302.8). Total num frames: 23769088. Throughput: 0: 2168.9. Samples: 4937150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:26:51,030][06183] Avg episode reward: [(0, '4.458')]
-[2023-02-22 21:26:53,323][28133] Updated weights for policy 0, policy_version 5808 (0.0023)
-[2023-02-22 21:26:56,027][06183] Fps is (10 sec: 8191.8, 60 sec: 8669.8, 300 sec: 9330.6). Total num frames: 23810048. Throughput: 0: 2144.2. Samples: 4949606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:26:56,032][06183] Avg episode reward: [(0, '4.319')]
-[2023-02-22 21:26:58,347][28133] Updated weights for policy 0, policy_version 5818 (0.0020)
-[2023-02-22 21:27:01,035][06183] Fps is (10 sec: 8185.7, 60 sec: 8600.5, 300 sec: 9358.1). Total num frames: 23851008. Throughput: 0: 2117.9. Samples: 4961772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:27:01,041][06183] Avg episode reward: [(0, '4.567')]
-[2023-02-22 21:27:03,487][28133] Updated weights for policy 0, policy_version 5828 (0.0027)
-[2023-02-22 21:27:06,027][06183] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 9400.0). Total num frames: 23891968. Throughput: 0: 2107.9. Samples: 4967916. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:27:06,032][06183] Avg episode reward: [(0, '4.384')]
-[2023-02-22 21:27:08,493][28133] Updated weights for policy 0, policy_version 5838 (0.0015)
-[2023-02-22 21:27:11,027][06183] Fps is (10 sec: 7788.4, 60 sec: 8396.8, 300 sec: 9413.9). Total num frames: 23928832. Throughput: 0: 2085.6. Samples: 4980046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:27:11,031][06183] Avg episode reward: [(0, '4.616')]
-[2023-02-22 21:27:13,626][28133] Updated weights for policy 0, policy_version 5848 (0.0023)
-[2023-02-22 21:27:16,028][06183] Fps is (10 sec: 7782.2, 60 sec: 8328.5, 300 sec: 9441.6). Total num frames: 23969792. Throughput: 0: 2060.7. Samples: 4991962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:27:16,032][06183] Avg episode reward: [(0, '4.399')]
-[2023-02-22 21:27:18,881][28133] Updated weights for policy 0, policy_version 5858 (0.0026)
-[2023-02-22 21:27:21,027][06183] Fps is (10 sec: 8192.1, 60 sec: 8261.5, 300 sec: 9469.4). Total num frames: 24010752. Throughput: 0: 2051.4. Samples: 4997822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:27:21,031][06183] Avg episode reward: [(0, '4.482')]
-[2023-02-22 21:27:24,268][28133] Updated weights for policy 0, policy_version 5868 (0.0027)
-[2023-02-22 21:27:26,028][06183] Fps is (10 sec: 7782.3, 60 sec: 8191.9, 300 sec: 9483.3). Total num frames: 24047616. Throughput: 0: 2019.1. Samples: 5009198. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:27:26,033][06183] Avg episode reward: [(0, '4.532')]
-[2023-02-22 21:27:29,538][28133] Updated weights for policy 0, policy_version 5878 (0.0016)
-[2023-02-22 21:27:31,027][06183] Fps is (10 sec: 7372.7, 60 sec: 8055.5, 300 sec: 9497.2). Total num frames: 24084480. Throughput: 0: 1998.9. Samples: 5020864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:27:31,032][06183] Avg episode reward: [(0, '4.247')]
-[2023-02-22 21:27:34,769][28133] Updated weights for policy 0, policy_version 5888 (0.0027)
-[2023-02-22 21:27:36,027][06183] Fps is (10 sec: 7782.7, 60 sec: 8055.5, 300 sec: 9524.9). Total num frames: 24125440. Throughput: 0: 1988.8. Samples: 5026646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:27:36,031][06183] Avg episode reward: [(0, '4.356')]
-[2023-02-22 21:27:40,118][28133] Updated weights for policy 0, policy_version 5898 (0.0020)
-[2023-02-22 21:27:41,027][06183] Fps is (10 sec: 7782.5, 60 sec: 7918.9, 300 sec: 9552.7). Total num frames: 24162304. Throughput: 0: 1970.6. Samples: 5038282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:27:41,031][06183] Avg episode reward: [(0, '4.250')]
-[2023-02-22 21:27:45,495][28133] Updated weights for policy 0, policy_version 5908 (0.0022)
-[2023-02-22 21:27:46,027][06183] Fps is (10 sec: 7782.4, 60 sec: 7918.9, 300 sec: 9594.4). Total num frames: 24203264. Throughput: 0: 1956.2. Samples: 5049786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:27:46,031][06183] Avg episode reward: [(0, '4.318')]
-[2023-02-22 21:27:50,861][28133] Updated weights for policy 0, policy_version 5918 (0.0018)
-[2023-02-22 21:27:51,027][06183] Fps is (10 sec: 7782.2, 60 sec: 7850.7, 300 sec: 9608.2). Total num frames: 24240128. Throughput: 0: 1945.6. Samples: 5055466. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:27:51,033][06183] Avg episode reward: [(0, '4.357')]
-[2023-02-22 21:27:51,067][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005918_24240128.pth...
-[2023-02-22 21:27:51,492][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005370_21995520.pth
-[2023-02-22 21:27:56,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7782.4, 300 sec: 9622.1). Total num frames: 24276992. Throughput: 0: 1923.4. Samples: 5066600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
-[2023-02-22 21:27:56,031][06183] Avg episode reward: [(0, '4.553')]
-[2023-02-22 21:27:56,373][28133] Updated weights for policy 0, policy_version 5928 (0.0023)
-[2023-02-22 21:28:01,027][06183] Fps is (10 sec: 7372.9, 60 sec: 7715.1, 300 sec: 9636.0). Total num frames: 24313856. Throughput: 0: 1909.7. Samples: 5077896. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:28:01,030][06183] Avg episode reward: [(0, '4.271')]
-[2023-02-22 21:28:01,891][28133] Updated weights for policy 0, policy_version 5938 (0.0022)
-[2023-02-22 21:28:06,028][06183] Fps is (10 sec: 7372.5, 60 sec: 7645.8, 300 sec: 9649.9). Total num frames: 24350720. Throughput: 0: 1902.7. Samples: 5083446. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:28:06,033][06183] Avg episode reward: [(0, '4.322')]
-[2023-02-22 21:28:07,447][28133] Updated weights for policy 0, policy_version 5948 (0.0021)
-[2023-02-22 21:28:11,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7645.9, 300 sec: 9677.7). Total num frames: 24387584. Throughput: 0: 1896.3. Samples: 5094530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:28:11,031][06183] Avg episode reward: [(0, '4.244')]
-[2023-02-22 21:28:12,906][28133] Updated weights for policy 0, policy_version 5958 (0.0021)
-[2023-02-22 21:28:16,027][06183] Fps is (10 sec: 7373.1, 60 sec: 7577.6, 300 sec: 9677.7). Total num frames: 24424448. Throughput: 0: 1882.4. Samples: 5105574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:28:16,033][06183] Avg episode reward: [(0, '4.166')]
-[2023-02-22 21:28:18,477][28133] Updated weights for policy 0, policy_version 5968 (0.0020)
-[2023-02-22 21:28:21,028][06183] Fps is (10 sec: 7372.7, 60 sec: 7509.3, 300 sec: 9538.8). Total num frames: 24461312. Throughput: 0: 1875.5. Samples: 5111046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:28:21,032][06183] Avg episode reward: [(0, '4.432')]
-[2023-02-22 21:28:24,100][28133] Updated weights for policy 0, policy_version 5978 (0.0020)
-[2023-02-22 21:28:26,028][06183] Fps is (10 sec: 7372.4, 60 sec: 7509.3, 300 sec: 9427.7). Total num frames: 24498176. Throughput: 0: 1857.8. Samples: 5121884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:28:26,031][06183] Avg episode reward: [(0, '4.539')]
-[2023-02-22 21:28:29,759][28133] Updated weights for policy 0, policy_version 5988 (0.0021)
-[2023-02-22 21:28:31,028][06183] Fps is (10 sec: 7372.4, 60 sec: 7509.2, 300 sec: 9316.6). Total num frames: 24535040. Throughput: 0: 1844.6. Samples: 5132794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:28:31,032][06183] Avg episode reward: [(0, '4.495')]
-[2023-02-22 21:28:35,581][28133] Updated weights for policy 0, policy_version 5998 (0.0020)
-[2023-02-22 21:28:36,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7372.8, 300 sec: 9177.8). Total num frames: 24567808. Throughput: 0: 1836.4. Samples: 5138102. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:28:36,032][06183] Avg episode reward: [(0, '4.380')]
-[2023-02-22 21:28:41,027][06183] Fps is (10 sec: 6963.6, 60 sec: 7372.8, 300 sec: 9052.8). Total num frames: 24604672. Throughput: 0: 1826.6. Samples: 5148798. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:28:41,032][06183] Avg episode reward: [(0, '4.530')]
-[2023-02-22 21:28:41,370][28133] Updated weights for policy 0, policy_version 6008 (0.0028)
-[2023-02-22 21:28:46,028][06183] Fps is (10 sec: 7372.5, 60 sec: 7304.5, 300 sec: 8969.5). Total num frames: 24641536. Throughput: 0: 1813.3. Samples: 5159496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:28:46,034][06183] Avg episode reward: [(0, '4.256')]
-[2023-02-22 21:28:47,011][28133] Updated weights for policy 0, policy_version 6018 (0.0021)
-[2023-02-22 21:28:51,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7236.3, 300 sec: 8844.6). Total num frames: 24674304. Throughput: 0: 1804.5. Samples: 5164650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:28:51,033][06183] Avg episode reward: [(0, '4.330')]
-[2023-02-22 21:28:52,947][28133] Updated weights for policy 0, policy_version 6028 (0.0027)
-[2023-02-22 21:28:56,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7236.3, 300 sec: 8747.4). Total num frames: 24711168. Throughput: 0: 1789.8. Samples: 5175070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:28:56,031][06183] Avg episode reward: [(0, '4.290')]
-[2023-02-22 21:28:58,828][28133] Updated weights for policy 0, policy_version 6038 (0.0026)
-[2023-02-22 21:29:01,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7168.0, 300 sec: 8650.2). Total num frames: 24743936. Throughput: 0: 1777.6. Samples: 5185568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:29:01,031][06183] Avg episode reward: [(0, '4.453')]
-[2023-02-22 21:29:04,817][28133] Updated weights for policy 0, policy_version 6048 (0.0021)
-[2023-02-22 21:29:06,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7168.1, 300 sec: 8580.8). Total num frames: 24780800. Throughput: 0: 1766.5. Samples: 5190540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:29:06,032][06183] Avg episode reward: [(0, '4.358')]
-[2023-02-22 21:29:10,819][28133] Updated weights for policy 0, policy_version 6058 (0.0023)
-[2023-02-22 21:29:11,027][06183] Fps is (10 sec: 6963.3, 60 sec: 7099.7, 300 sec: 8497.5). Total num frames: 24813568. Throughput: 0: 1757.0. Samples: 5200946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:29:11,032][06183] Avg episode reward: [(0, '4.193')]
-[2023-02-22 21:29:16,027][06183] Fps is (10 sec: 6553.6, 60 sec: 7031.5, 300 sec: 8414.1). Total num frames: 24846336. Throughput: 0: 1745.4. Samples: 5211338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:29:16,031][06183] Avg episode reward: [(0, '4.617')]
-[2023-02-22 21:29:16,802][28133] Updated weights for policy 0, policy_version 6068 (0.0021)
-[2023-02-22 21:29:21,028][06183] Fps is (10 sec: 6962.8, 60 sec: 7031.4, 300 sec: 8358.6). Total num frames: 24883200. Throughput: 0: 1741.2. Samples: 5216456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:29:21,033][06183] Avg episode reward: [(0, '4.593')]
-[2023-02-22 21:29:22,815][28133] Updated weights for policy 0, policy_version 6078 (0.0026)
-[2023-02-22 21:29:26,027][06183] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 8289.2). Total num frames: 24915968. Throughput: 0: 1728.5. Samples: 5226582. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:29:26,033][06183] Avg episode reward: [(0, '4.717')]
-[2023-02-22 21:29:28,811][28133] Updated weights for policy 0, policy_version 6088 (0.0031)
-[2023-02-22 21:29:31,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6895.0, 300 sec: 8233.7). Total num frames: 24948736. Throughput: 0: 1715.2. Samples: 5236678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:29:31,036][06183] Avg episode reward: [(0, '4.423')]
-[2023-02-22 21:29:34,933][28133] Updated weights for policy 0, policy_version 6098 (0.0030)
-[2023-02-22 21:29:36,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 8164.2). Total num frames: 24981504. Throughput: 0: 1711.1. Samples: 5241648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:29:36,034][06183] Avg episode reward: [(0, '4.611')]
-[2023-02-22 21:29:40,993][28133] Updated weights for policy 0, policy_version 6108 (0.0031)
-[2023-02-22 21:29:41,027][06183] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 8108.7). Total num frames: 25018368. Throughput: 0: 1705.0. Samples: 5251796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:29:41,031][06183] Avg episode reward: [(0, '4.312')]
-[2023-02-22 21:29:46,027][06183] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 8053.2). Total num frames: 25051136. Throughput: 0: 1698.0. Samples: 5261978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:29:46,031][06183] Avg episode reward: [(0, '4.351')]
-[2023-02-22 21:29:47,093][28133] Updated weights for policy 0, policy_version 6118 (0.0025)
-[2023-02-22 21:29:51,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 7997.6). Total num frames: 25083904. Throughput: 0: 1697.2. Samples: 5266914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:29:51,035][06183] Avg episode reward: [(0, '4.375')]
-[2023-02-22 21:29:51,071][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006124_25083904.pth...
-[2023-02-22 21:29:51,573][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005675_23244800.pth
-[2023-02-22 21:29:53,407][28133] Updated weights for policy 0, policy_version 6128 (0.0031)
-[2023-02-22 21:29:56,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6758.4, 300 sec: 7942.1). Total num frames: 25116672. Throughput: 0: 1679.0. Samples: 5276500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:29:56,033][06183] Avg episode reward: [(0, '4.491')]
-[2023-02-22 21:29:59,577][28133] Updated weights for policy 0, policy_version 6138 (0.0038)
-[2023-02-22 21:30:01,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6758.3, 300 sec: 7886.5). Total num frames: 25149440. Throughput: 0: 1665.0. Samples: 5286266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:30:01,033][06183] Avg episode reward: [(0, '4.519')]
-[2023-02-22 21:30:05,953][28133] Updated weights for policy 0, policy_version 6148 (0.0035)
-[2023-02-22 21:30:06,027][06183] Fps is (10 sec: 6553.9, 60 sec: 6690.1, 300 sec: 7844.9). Total num frames: 25182208. Throughput: 0: 1655.5. Samples: 5290952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:30:06,031][06183] Avg episode reward: [(0, '4.354')]
-[2023-02-22 21:30:11,027][06183] Fps is (10 sec: 6553.9, 60 sec: 6690.1, 300 sec: 7789.3). Total num frames: 25214976. Throughput: 0: 1654.9. Samples: 5301052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:30:11,032][06183] Avg episode reward: [(0, '4.243')]
-[2023-02-22 21:30:12,105][28133] Updated weights for policy 0, policy_version 6158 (0.0025)
-[2023-02-22 21:30:16,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 7733.8). Total num frames: 25247744. Throughput: 0: 1648.9. Samples: 5310880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:30:16,031][06183] Avg episode reward: [(0, '4.559')]
-[2023-02-22 21:30:18,581][28133] Updated weights for policy 0, policy_version 6168 (0.0030)
-[2023-02-22 21:30:21,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6553.7, 300 sec: 7678.3). Total num frames: 25276416. Throughput: 0: 1642.0. Samples: 5315536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:30:21,031][06183] Avg episode reward: [(0, '4.453')]
-[2023-02-22 21:30:24,989][28133] Updated weights for policy 0, policy_version 6178 (0.0027)
-[2023-02-22 21:30:26,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 7622.7). Total num frames: 25309184. Throughput: 0: 1628.6. Samples: 5325082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:30:26,032][06183] Avg episode reward: [(0, '4.645')]
-[2023-02-22 21:30:31,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7581.1). Total num frames: 25341952. Throughput: 0: 1623.0. Samples: 5335012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:30:31,033][06183] Avg episode reward: [(0, '4.355')]
-[2023-02-22 21:30:31,196][28133] Updated weights for policy 0, policy_version 6188 (0.0028)
-[2023-02-22 21:30:36,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 7539.4). Total num frames: 25374720. Throughput: 0: 1620.3. Samples: 5339830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:30:36,033][06183] Avg episode reward: [(0, '4.415')]
-[2023-02-22 21:30:37,593][28133] Updated weights for policy 0, policy_version 6198 (0.0036)
-[2023-02-22 21:30:41,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 7483.9). Total num frames: 25407488. Throughput: 0: 1618.6. Samples: 5349336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:30:41,032][06183] Avg episode reward: [(0, '4.415')]
-[2023-02-22 21:30:44,095][28133] Updated weights for policy 0, policy_version 6208 (0.0029)
-[2023-02-22 21:30:46,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 7442.2). Total num frames: 25440256. Throughput: 0: 1619.2. Samples: 5359128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:30:46,032][06183] Avg episode reward: [(0, '4.278')]
-[2023-02-22 21:30:50,349][28133] Updated weights for policy 0, policy_version 6218 (0.0038)
-[2023-02-22 21:30:51,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6417.0, 300 sec: 7386.7). Total num frames: 25468928. Throughput: 0: 1620.9. Samples: 5363892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:30:51,033][06183] Avg episode reward: [(0, '4.421')]
-[2023-02-22 21:30:56,027][06183] Fps is (10 sec: 6144.1, 60 sec: 6417.1, 300 sec: 7345.0). Total num frames: 25501696. Throughput: 0: 1612.4. Samples: 5373612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:30:56,032][06183] Avg episode reward: [(0, '4.504')]
-[2023-02-22 21:30:56,720][28133] Updated weights for policy 0, policy_version 6228 (0.0024)
-[2023-02-22 21:31:01,028][06183] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 7303.4). Total num frames: 25534464. Throughput: 0: 1607.9. Samples: 5383234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:31:01,033][06183] Avg episode reward: [(0, '4.527')]
-[2023-02-22 21:31:03,281][28133] Updated weights for policy 0, policy_version 6238 (0.0037)
-[2023-02-22 21:31:06,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 7261.7). Total num frames: 25567232. Throughput: 0: 1607.8. Samples: 5387886. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:31:06,033][06183] Avg episode reward: [(0, '4.412')]
-[2023-02-22 21:31:09,640][28133] Updated weights for policy 0, policy_version 6248 (0.0026)
-[2023-02-22 21:31:11,028][06183] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 7220.1). Total num frames: 25600000. Throughput: 0: 1610.4. Samples: 5397552. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:31:11,034][06183] Avg episode reward: [(0, '4.314')]
-[2023-02-22 21:31:16,027][06183] Fps is (10 sec: 6144.1, 60 sec: 6348.8, 300 sec: 7164.7). Total num frames: 25628672. Throughput: 0: 1599.5. Samples: 5406988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:31:16,034][06183] Avg episode reward: [(0, '4.354')]
-[2023-02-22 21:31:16,072][28133] Updated weights for policy 0, policy_version 6258 (0.0028)
-[2023-02-22 21:31:21,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6417.1, 300 sec: 7136.8). Total num frames: 25661440. Throughput: 0: 1594.0. Samples: 5411558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:31:21,032][06183] Avg episode reward: [(0, '4.598')]
-[2023-02-22 21:31:22,500][28133] Updated weights for policy 0, policy_version 6268 (0.0029)
-[2023-02-22 21:31:26,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 7095.1). Total num frames: 25694208. Throughput: 0: 1592.4. Samples: 5420994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:31:26,032][06183] Avg episode reward: [(0, '4.485')]
-[2023-02-22 21:31:29,185][28133] Updated weights for policy 0, policy_version 6278 (0.0031)
-[2023-02-22 21:31:31,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 7053.4). Total num frames: 25722880. Throughput: 0: 1584.5. Samples: 5430432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:31:31,031][06183] Avg episode reward: [(0, '4.572')]
-[2023-02-22 21:31:35,691][28133] Updated weights for policy 0, policy_version 6288 (0.0029)
-[2023-02-22 21:31:36,027][06183] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 7011.8). Total num frames: 25755648. Throughput: 0: 1585.6. Samples: 5435242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:31:36,032][06183] Avg episode reward: [(0, '4.559')]
-[2023-02-22 21:31:40,020][28133] Updated weights for policy 0, policy_version 6298 (0.0016)
-[2023-02-22 21:31:41,027][06183] Fps is (10 sec: 9011.5, 60 sec: 6758.4, 300 sec: 7067.3). Total num frames: 25812992. Throughput: 0: 1640.2. Samples: 5447422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:31:41,029][06183] Avg episode reward: [(0, '4.421')]
-[2023-02-22 21:31:42,547][28133] Updated weights for policy 0, policy_version 6308 (0.0011)
-[2023-02-22 21:31:45,071][28133] Updated weights for policy 0, policy_version 6318 (0.0010)
-[2023-02-22 21:31:46,027][06183] Fps is (10 sec: 13517.3, 60 sec: 7509.4, 300 sec: 7192.3). Total num frames: 25890816. Throughput: 0: 1967.2. Samples: 5471756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:31:46,029][06183] Avg episode reward: [(0, '4.388')]
-[2023-02-22 21:31:47,603][28133] Updated weights for policy 0, policy_version 6328 (0.0009)
-[2023-02-22 21:31:50,132][28133] Updated weights for policy 0, policy_version 6338 (0.0008)
-[2023-02-22 21:31:51,027][06183] Fps is (10 sec: 15974.2, 60 sec: 8396.9, 300 sec: 7331.2). Total num frames: 25972736. Throughput: 0: 2134.3. Samples: 5483928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:31:51,030][06183] Avg episode reward: [(0, '4.412')]
-[2023-02-22 21:31:51,045][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006341_25972736.pth...
-[2023-02-22 21:31:51,215][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000005918_24240128.pth
-[2023-02-22 21:31:52,677][28133] Updated weights for policy 0, policy_version 6348 (0.0009)
-[2023-02-22 21:31:55,657][28133] Updated weights for policy 0, policy_version 6358 (0.0011)
-[2023-02-22 21:31:56,027][06183] Fps is (10 sec: 15564.6, 60 sec: 9079.5, 300 sec: 7442.4). Total num frames: 26046464. Throughput: 0: 2432.9. Samples: 5507030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:31:56,029][06183] Avg episode reward: [(0, '4.466')]
-[2023-02-22 21:31:58,649][28133] Updated weights for policy 0, policy_version 6368 (0.0012)
-[2023-02-22 21:32:01,027][06183] Fps is (10 sec: 14336.2, 60 sec: 9694.0, 300 sec: 7539.4). Total num frames: 26116096. Throughput: 0: 2685.5. Samples: 5527836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:32:01,028][06183] Avg episode reward: [(0, '4.548')]
-[2023-02-22 21:32:01,397][28133] Updated weights for policy 0, policy_version 6378 (0.0009)
-[2023-02-22 21:32:04,384][28133] Updated weights for policy 0, policy_version 6388 (0.0010)
-[2023-02-22 21:32:06,027][06183] Fps is (10 sec: 13926.6, 60 sec: 10308.4, 300 sec: 7650.5). Total num frames: 26185728. Throughput: 0: 2815.4. Samples: 5538252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:32:06,028][06183] Avg episode reward: [(0, '4.402')]
-[2023-02-22 21:32:07,231][28133] Updated weights for policy 0, policy_version 6398 (0.0010)
-[2023-02-22 21:32:10,252][28133] Updated weights for policy 0, policy_version 6408 (0.0014)
-[2023-02-22 21:32:11,027][06183] Fps is (10 sec: 13926.3, 60 sec: 10922.8, 300 sec: 7747.7). Total num frames: 26255360. Throughput: 0: 3083.2. Samples: 5559738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:32:11,028][06183] Avg episode reward: [(0, '4.353')]
-[2023-02-22 21:32:13,177][28133] Updated weights for policy 0, policy_version 6418 (0.0010)
-[2023-02-22 21:32:15,918][28133] Updated weights for policy 0, policy_version 6428 (0.0012)
-[2023-02-22 21:32:16,027][06183] Fps is (10 sec: 14336.1, 60 sec: 11673.7, 300 sec: 7858.8). Total num frames: 26329088. Throughput: 0: 3351.3. Samples: 5581240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:32:16,029][06183] Avg episode reward: [(0, '4.596')]
-[2023-02-22 21:32:18,955][28133] Updated weights for policy 0, policy_version 6438 (0.0010)
-[2023-02-22 21:32:21,027][06183] Fps is (10 sec: 13925.9, 60 sec: 12219.7, 300 sec: 7956.0). Total num frames: 26394624. Throughput: 0: 3467.9. Samples: 5591298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:32:21,030][06183] Avg episode reward: [(0, '4.321')]
-[2023-02-22 21:32:22,027][28133] Updated weights for policy 0, policy_version 6448 (0.0010)
-[2023-02-22 21:32:25,137][28133] Updated weights for policy 0, policy_version 6458 (0.0010)
-[2023-02-22 21:32:26,027][06183] Fps is (10 sec: 13516.6, 60 sec: 12834.2, 300 sec: 8067.0). Total num frames: 26464256. Throughput: 0: 3643.0. Samples: 5611356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:32:26,030][06183] Avg episode reward: [(0, '4.425')]
-[2023-02-22 21:32:28,071][28133] Updated weights for policy 0, policy_version 6468 (0.0012)
-[2023-02-22 21:32:31,027][06183] Fps is (10 sec: 13516.6, 60 sec: 13448.5, 300 sec: 8150.3). Total num frames: 26529792. Throughput: 0: 3552.6. Samples: 5631624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:32:31,030][06183] Avg episode reward: [(0, '4.311')]
-[2023-02-22 21:32:31,238][28133] Updated weights for policy 0, policy_version 6478 (0.0013)
-[2023-02-22 21:32:34,358][28133] Updated weights for policy 0, policy_version 6488 (0.0012)
-[2023-02-22 21:32:36,027][06183] Fps is (10 sec: 13106.8, 60 sec: 13994.7, 300 sec: 8247.5). Total num frames: 26595328. Throughput: 0: 3498.5. Samples: 5641362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:32:36,031][06183] Avg episode reward: [(0, '4.236')]
-[2023-02-22 21:32:37,539][28133] Updated weights for policy 0, policy_version 6498 (0.0013)
-[2023-02-22 21:32:40,685][28133] Updated weights for policy 0, policy_version 6508 (0.0012)
-[2023-02-22 21:32:41,027][06183] Fps is (10 sec: 12698.2, 60 sec: 14062.9, 300 sec: 8317.0). Total num frames: 26656768. Throughput: 0: 3407.7. Samples: 5660374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:32:41,031][06183] Avg episode reward: [(0, '4.448')]
-[2023-02-22 21:32:43,759][28133] Updated weights for policy 0, policy_version 6518 (0.0011)
-[2023-02-22 21:32:46,027][06183] Fps is (10 sec: 12698.2, 60 sec: 13858.2, 300 sec: 8414.2). Total num frames: 26722304. Throughput: 0: 3383.8. Samples: 5680106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:32:46,029][06183] Avg episode reward: [(0, '4.410')]
-[2023-02-22 21:32:47,112][28133] Updated weights for policy 0, policy_version 6528 (0.0017)
-[2023-02-22 21:32:50,290][28133] Updated weights for policy 0, policy_version 6538 (0.0012)
-[2023-02-22 21:32:51,027][06183] Fps is (10 sec: 13106.8, 60 sec: 13585.0, 300 sec: 8511.3). Total num frames: 26787840. Throughput: 0: 3352.5. Samples: 5689114. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:32:51,030][06183] Avg episode reward: [(0, '4.514')]
-[2023-02-22 21:32:53,750][28133] Updated weights for policy 0, policy_version 6548 (0.0011)
-[2023-02-22 21:32:56,027][06183] Fps is (10 sec: 12288.0, 60 sec: 13312.1, 300 sec: 8580.8). Total num frames: 26845184. Throughput: 0: 3284.4. Samples: 5707538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:32:56,031][06183] Avg episode reward: [(0, '4.337')]
-[2023-02-22 21:32:57,249][28133] Updated weights for policy 0, policy_version 6558 (0.0015)
-[2023-02-22 21:33:00,717][28133] Updated weights for policy 0, policy_version 6568 (0.0015)
-[2023-02-22 21:33:01,027][06183] Fps is (10 sec: 11469.2, 60 sec: 13107.2, 300 sec: 8650.2). Total num frames: 26902528. Throughput: 0: 3198.1. Samples: 5725154. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:33:01,030][06183] Avg episode reward: [(0, '4.364')]
-[2023-02-22 21:33:04,151][28133] Updated weights for policy 0, policy_version 6578 (0.0013)
-[2023-02-22 21:33:06,028][06183] Fps is (10 sec: 11877.7, 60 sec: 12970.6, 300 sec: 8733.5). Total num frames: 26963968. Throughput: 0: 3171.2. Samples: 5734002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:33:06,030][06183] Avg episode reward: [(0, '4.298')]
-[2023-02-22 21:33:07,717][28133] Updated weights for policy 0, policy_version 6588 (0.0012)
-[2023-02-22 21:33:11,027][06183] Fps is (10 sec: 11468.7, 60 sec: 12697.6, 300 sec: 8789.0). Total num frames: 27017216. Throughput: 0: 3108.4. Samples: 5751236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:33:11,031][06183] Avg episode reward: [(0, '4.382')]
-[2023-02-22 21:33:11,405][28133] Updated weights for policy 0, policy_version 6598 (0.0015)
-[2023-02-22 21:33:14,985][28133] Updated weights for policy 0, policy_version 6608 (0.0012)
-[2023-02-22 21:33:16,028][06183] Fps is (10 sec: 11058.9, 60 sec: 12424.4, 300 sec: 8858.5). Total num frames: 27074560. Throughput: 0: 3030.1. Samples: 5767978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:33:16,031][06183] Avg episode reward: [(0, '4.412')]
-[2023-02-22 21:33:18,741][28133] Updated weights for policy 0, policy_version 6618 (0.0012)
-[2023-02-22 21:33:21,027][06183] Fps is (10 sec: 11468.4, 60 sec: 12288.0, 300 sec: 8927.9). Total num frames: 27131904. Throughput: 0: 2995.3. Samples: 5776152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:33:21,031][06183] Avg episode reward: [(0, '4.541')]
-[2023-02-22 21:33:22,469][28133] Updated weights for policy 0, policy_version 6628 (0.0012)
-[2023-02-22 21:33:26,027][06183] Fps is (10 sec: 11060.1, 60 sec: 12015.0, 300 sec: 8983.5). Total num frames: 27185152. Throughput: 0: 2935.2. Samples: 5792458. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:33:26,029][06183] Avg episode reward: [(0, '4.315')]
-[2023-02-22 21:33:26,236][28133] Updated weights for policy 0, policy_version 6638 (0.0015)
-[2023-02-22 21:33:30,083][28133] Updated weights for policy 0, policy_version 6648 (0.0013)
-[2023-02-22 21:33:31,027][06183] Fps is (10 sec: 10649.6, 60 sec: 11810.1, 300 sec: 9052.9). Total num frames: 27238400. Throughput: 0: 2854.1. Samples: 5808544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:33:31,030][06183] Avg episode reward: [(0, '4.439')]
-[2023-02-22 21:33:34,035][28133] Updated weights for policy 0, policy_version 6658 (0.0014)
-[2023-02-22 21:33:36,027][06183] Fps is (10 sec: 10649.4, 60 sec: 11605.4, 300 sec: 9108.4). Total num frames: 27291648. Throughput: 0: 2826.1. Samples: 5816290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:33:36,029][06183] Avg episode reward: [(0, '4.669')]
-[2023-02-22 21:33:37,862][28133] Updated weights for policy 0, policy_version 6668 (0.0014)
-[2023-02-22 21:33:41,027][06183] Fps is (10 sec: 10649.9, 60 sec: 11468.8, 300 sec: 9163.9). Total num frames: 27344896. Throughput: 0: 2767.9. Samples: 5832094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:33:41,030][06183] Avg episode reward: [(0, '4.495')]
-[2023-02-22 21:33:41,840][28133] Updated weights for policy 0, policy_version 6678 (0.0013)
-[2023-02-22 21:33:45,887][28133] Updated weights for policy 0, policy_version 6688 (0.0015)
-[2023-02-22 21:33:46,027][06183] Fps is (10 sec: 10240.1, 60 sec: 11195.7, 300 sec: 9219.5). Total num frames: 27394048. Throughput: 0: 2718.7. Samples: 5847496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:33:46,031][06183] Avg episode reward: [(0, '4.510')]
-[2023-02-22 21:33:50,006][28133] Updated weights for policy 0, policy_version 6698 (0.0012)
-[2023-02-22 21:33:51,027][06183] Fps is (10 sec: 9830.3, 60 sec: 10922.7, 300 sec: 9261.1). Total num frames: 27443200. Throughput: 0: 2689.5. Samples: 5855028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:33:51,031][06183] Avg episode reward: [(0, '4.396')]
-[2023-02-22 21:33:51,065][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006700_27443200.pth...
-[2023-02-22 21:33:51,526][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006124_25083904.pth
-[2023-02-22 21:33:54,130][28133] Updated weights for policy 0, policy_version 6708 (0.0018)
-[2023-02-22 21:33:56,027][06183] Fps is (10 sec: 9830.4, 60 sec: 10786.1, 300 sec: 9316.7). Total num frames: 27492352. Throughput: 0: 2627.1. Samples: 5869454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:33:56,030][06183] Avg episode reward: [(0, '4.442')]
-[2023-02-22 21:33:58,301][28133] Updated weights for policy 0, policy_version 6718 (0.0012)
-[2023-02-22 21:34:01,027][06183] Fps is (10 sec: 9830.5, 60 sec: 10649.6, 300 sec: 9358.3). Total num frames: 27541504. Throughput: 0: 2581.6. Samples: 5884150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 21:34:01,029][06183] Avg episode reward: [(0, '4.581')]
-[2023-02-22 21:34:02,219][28133] Updated weights for policy 0, policy_version 6728 (0.0015)
-[2023-02-22 21:34:05,662][28133] Updated weights for policy 0, policy_version 6738 (0.0014)
-[2023-02-22 21:34:06,027][06183] Fps is (10 sec: 11059.1, 60 sec: 10649.7, 300 sec: 9455.5). Total num frames: 27602944. Throughput: 0: 2596.4. Samples: 5892988. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2023-02-22 21:34:06,031][06183] Avg episode reward: [(0, '4.533')]
-[2023-02-22 21:34:09,717][28133] Updated weights for policy 0, policy_version 6748 (0.0014)
-[2023-02-22 21:34:11,027][06183] Fps is (10 sec: 11059.3, 60 sec: 10581.3, 300 sec: 9511.1). Total num frames: 27652096. Throughput: 0: 2594.3. Samples: 5909202. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 21:34:11,030][06183] Avg episode reward: [(0, '4.391')]
-[2023-02-22 21:34:13,931][28133] Updated weights for policy 0, policy_version 6758 (0.0019)
-[2023-02-22 21:34:16,027][06183] Fps is (10 sec: 9420.9, 60 sec: 10376.7, 300 sec: 9538.8). Total num frames: 27697152. Throughput: 0: 2556.8. Samples: 5923600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 21:34:16,032][06183] Avg episode reward: [(0, '4.296')]
-[2023-02-22 21:34:18,194][28133] Updated weights for policy 0, policy_version 6768 (0.0013)
-[2023-02-22 21:34:21,027][06183] Fps is (10 sec: 9421.0, 60 sec: 10240.1, 300 sec: 9594.4). Total num frames: 27746304. Throughput: 0: 2544.8. Samples: 5930804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 21:34:21,030][06183] Avg episode reward: [(0, '4.305')]
-[2023-02-22 21:34:22,530][28133] Updated weights for policy 0, policy_version 6778 (0.0016)
-[2023-02-22 21:34:26,027][06183] Fps is (10 sec: 9830.3, 60 sec: 10171.7, 300 sec: 9649.9). Total num frames: 27795456. Throughput: 0: 2510.5. Samples: 5945066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:34:26,030][06183] Avg episode reward: [(0, '4.336')]
-[2023-02-22 21:34:26,840][28133] Updated weights for policy 0, policy_version 6788 (0.0013)
-[2023-02-22 21:34:31,027][06183] Fps is (10 sec: 9420.5, 60 sec: 10035.2, 300 sec: 9691.6). Total num frames: 27840512. Throughput: 0: 2479.9. Samples: 5959092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:34:31,031][06183] Avg episode reward: [(0, '4.350')]
-[2023-02-22 21:34:31,368][28133] Updated weights for policy 0, policy_version 6798 (0.0015)
-[2023-02-22 21:34:35,782][28133] Updated weights for policy 0, policy_version 6808 (0.0019)
-[2023-02-22 21:34:36,027][06183] Fps is (10 sec: 9011.2, 60 sec: 9898.7, 300 sec: 9719.3). Total num frames: 27885568. Throughput: 0: 2464.8. Samples: 5965944. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:34:36,031][06183] Avg episode reward: [(0, '4.417')] -[2023-02-22 21:34:40,176][28133] Updated weights for policy 0, policy_version 6818 (0.0014) -[2023-02-22 21:34:41,027][06183] Fps is (10 sec: 9011.3, 60 sec: 9762.1, 300 sec: 9761.0). Total num frames: 27930624. Throughput: 0: 2453.8. Samples: 5979876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:34:41,032][06183] Avg episode reward: [(0, '4.354')] -[2023-02-22 21:34:44,689][28133] Updated weights for policy 0, policy_version 6828 (0.0013) -[2023-02-22 21:34:46,027][06183] Fps is (10 sec: 9011.2, 60 sec: 9693.8, 300 sec: 9802.6). Total num frames: 27975680. Throughput: 0: 2429.3. Samples: 5993470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:34:46,030][06183] Avg episode reward: [(0, '4.262')] -[2023-02-22 21:34:49,184][28133] Updated weights for policy 0, policy_version 6838 (0.0018) -[2023-02-22 21:34:51,027][06183] Fps is (10 sec: 9421.0, 60 sec: 9693.9, 300 sec: 9858.2). Total num frames: 28024832. Throughput: 0: 2385.5. Samples: 6000336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:34:51,031][06183] Avg episode reward: [(0, '4.406')] -[2023-02-22 21:34:53,792][28133] Updated weights for policy 0, policy_version 6848 (0.0015) -[2023-02-22 21:34:56,027][06183] Fps is (10 sec: 9011.1, 60 sec: 9557.3, 300 sec: 9886.0). Total num frames: 28065792. Throughput: 0: 2323.6. Samples: 6013766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:34:56,031][06183] Avg episode reward: [(0, '4.230')] -[2023-02-22 21:34:58,341][28133] Updated weights for policy 0, policy_version 6858 (0.0012) -[2023-02-22 21:35:01,027][06183] Fps is (10 sec: 8601.5, 60 sec: 9489.1, 300 sec: 9927.6). Total num frames: 28110848. Throughput: 0: 2300.7. Samples: 6027132. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:01,031][06183] Avg episode reward: [(0, '4.358')] -[2023-02-22 21:35:03,041][28133] Updated weights for policy 0, policy_version 6868 (0.0016) -[2023-02-22 21:35:06,027][06183] Fps is (10 sec: 9011.2, 60 sec: 9216.0, 300 sec: 9969.3). Total num frames: 28155904. Throughput: 0: 2287.1. Samples: 6033722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:06,031][06183] Avg episode reward: [(0, '4.378')] -[2023-02-22 21:35:07,668][28133] Updated weights for policy 0, policy_version 6878 (0.0014) -[2023-02-22 21:35:11,027][06183] Fps is (10 sec: 9011.3, 60 sec: 9147.7, 300 sec: 10010.9). Total num frames: 28200960. Throughput: 0: 2266.1. Samples: 6047040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:11,031][06183] Avg episode reward: [(0, '4.129')] -[2023-02-22 21:35:12,307][28133] Updated weights for policy 0, policy_version 6888 (0.0020) -[2023-02-22 21:35:16,027][06183] Fps is (10 sec: 9011.3, 60 sec: 9147.7, 300 sec: 10066.4). Total num frames: 28246016. Throughput: 0: 2250.5. Samples: 6060366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:16,033][06183] Avg episode reward: [(0, '4.669')] -[2023-02-22 21:35:16,909][28133] Updated weights for policy 0, policy_version 6898 (0.0014) -[2023-02-22 21:35:21,027][06183] Fps is (10 sec: 8601.6, 60 sec: 9011.2, 300 sec: 10094.2). Total num frames: 28286976. Throughput: 0: 2242.1. Samples: 6066840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:35:21,032][06183] Avg episode reward: [(0, '4.327')] -[2023-02-22 21:35:21,657][28133] Updated weights for policy 0, policy_version 6908 (0.0018) -[2023-02-22 21:35:26,027][06183] Fps is (10 sec: 8601.6, 60 sec: 8942.9, 300 sec: 10135.9). Total num frames: 28332032. Throughput: 0: 2220.8. Samples: 6079812. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:26,031][06183] Avg episode reward: [(0, '4.389')] -[2023-02-22 21:35:26,478][28133] Updated weights for policy 0, policy_version 6918 (0.0019) -[2023-02-22 21:35:31,027][06183] Fps is (10 sec: 8601.6, 60 sec: 8874.7, 300 sec: 10163.6). Total num frames: 28372992. Throughput: 0: 2202.4. Samples: 6092580. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:35:31,033][06183] Avg episode reward: [(0, '4.428')] -[2023-02-22 21:35:31,295][28133] Updated weights for policy 0, policy_version 6928 (0.0019) -[2023-02-22 21:35:36,027][06183] Fps is (10 sec: 8192.1, 60 sec: 8806.4, 300 sec: 10191.4). Total num frames: 28413952. Throughput: 0: 2190.0. Samples: 6098888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:35:36,032][06183] Avg episode reward: [(0, '4.425')] -[2023-02-22 21:35:36,172][28133] Updated weights for policy 0, policy_version 6938 (0.0018) -[2023-02-22 21:35:40,957][28133] Updated weights for policy 0, policy_version 6948 (0.0017) -[2023-02-22 21:35:41,027][06183] Fps is (10 sec: 8601.7, 60 sec: 8806.4, 300 sec: 10233.1). Total num frames: 28459008. Throughput: 0: 2172.6. Samples: 6111532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:41,031][06183] Avg episode reward: [(0, '4.374')] -[2023-02-22 21:35:45,922][28133] Updated weights for policy 0, policy_version 6958 (0.0016) -[2023-02-22 21:35:46,027][06183] Fps is (10 sec: 8601.5, 60 sec: 8738.1, 300 sec: 10274.7). Total num frames: 28499968. Throughput: 0: 2152.0. Samples: 6123974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 21:35:46,031][06183] Avg episode reward: [(0, '4.596')] -[2023-02-22 21:35:50,904][28133] Updated weights for policy 0, policy_version 6968 (0.0017) -[2023-02-22 21:35:51,028][06183] Fps is (10 sec: 8191.2, 60 sec: 8601.5, 300 sec: 10302.5). Total num frames: 28540928. Throughput: 0: 2142.8. Samples: 6130148. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:51,031][06183] Avg episode reward: [(0, '4.581')] -[2023-02-22 21:35:51,066][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006968_28540928.pth... -[2023-02-22 21:35:51,554][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006341_25972736.pth -[2023-02-22 21:35:56,028][06183] Fps is (10 sec: 7782.0, 60 sec: 8533.3, 300 sec: 10316.4). Total num frames: 28577792. Throughput: 0: 2109.7. Samples: 6141976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:35:56,033][06183] Avg episode reward: [(0, '4.267')] -[2023-02-22 21:35:56,140][28133] Updated weights for policy 0, policy_version 6978 (0.0016) -[2023-02-22 21:36:01,027][06183] Fps is (10 sec: 7783.0, 60 sec: 8465.1, 300 sec: 10344.1). Total num frames: 28618752. Throughput: 0: 2085.6. Samples: 6154218. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:01,031][06183] Avg episode reward: [(0, '4.489')] -[2023-02-22 21:36:01,140][28133] Updated weights for policy 0, policy_version 6988 (0.0021) -[2023-02-22 21:36:06,028][06183] Fps is (10 sec: 8192.0, 60 sec: 8396.7, 300 sec: 10371.9). Total num frames: 28659712. Throughput: 0: 2076.1. Samples: 6160264. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:06,031][06183] Avg episode reward: [(0, '4.368')] -[2023-02-22 21:36:06,240][28133] Updated weights for policy 0, policy_version 6998 (0.0023) -[2023-02-22 21:36:11,028][06183] Fps is (10 sec: 8191.6, 60 sec: 8328.5, 300 sec: 10413.5). Total num frames: 28700672. Throughput: 0: 2050.8. Samples: 6172100. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:11,031][06183] Avg episode reward: [(0, '4.366')] -[2023-02-22 21:36:11,504][28133] Updated weights for policy 0, policy_version 7008 (0.0018) -[2023-02-22 21:36:16,027][06183] Fps is (10 sec: 7782.9, 60 sec: 8192.0, 300 sec: 10427.4). Total num frames: 28737536. Throughput: 0: 2031.6. Samples: 6184002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:16,031][06183] Avg episode reward: [(0, '4.301')] -[2023-02-22 21:36:16,699][28133] Updated weights for policy 0, policy_version 7018 (0.0018) -[2023-02-22 21:36:21,027][06183] Fps is (10 sec: 7782.6, 60 sec: 8192.0, 300 sec: 10455.2). Total num frames: 28778496. Throughput: 0: 2021.4. Samples: 6189852. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:21,032][06183] Avg episode reward: [(0, '4.458')] -[2023-02-22 21:36:21,868][28133] Updated weights for policy 0, policy_version 7028 (0.0018) -[2023-02-22 21:36:26,028][06183] Fps is (10 sec: 7781.9, 60 sec: 8055.4, 300 sec: 10483.0). Total num frames: 28815360. Throughput: 0: 1998.1. Samples: 6201450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:26,037][06183] Avg episode reward: [(0, '4.520')] -[2023-02-22 21:36:27,243][28133] Updated weights for policy 0, policy_version 7038 (0.0025) -[2023-02-22 21:36:31,028][06183] Fps is (10 sec: 7782.2, 60 sec: 8055.4, 300 sec: 10510.7). Total num frames: 28856320. Throughput: 0: 1981.0. Samples: 6213118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:31,031][06183] Avg episode reward: [(0, '4.539')] -[2023-02-22 21:36:32,516][28133] Updated weights for policy 0, policy_version 7048 (0.0021) -[2023-02-22 21:36:36,027][06183] Fps is (10 sec: 7782.8, 60 sec: 7987.2, 300 sec: 10441.3). Total num frames: 28893184. Throughput: 0: 1971.7. Samples: 6218874. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:36,033][06183] Avg episode reward: [(0, '4.317')] -[2023-02-22 21:36:37,889][28133] Updated weights for policy 0, policy_version 7058 (0.0024) -[2023-02-22 21:36:41,027][06183] Fps is (10 sec: 7373.2, 60 sec: 7850.7, 300 sec: 10302.5). Total num frames: 28930048. Throughput: 0: 1963.1. Samples: 6230316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:41,031][06183] Avg episode reward: [(0, '4.356')] -[2023-02-22 21:36:43,293][28133] Updated weights for policy 0, policy_version 7068 (0.0024) -[2023-02-22 21:36:46,028][06183] Fps is (10 sec: 7372.5, 60 sec: 7782.3, 300 sec: 10149.7). Total num frames: 28966912. Throughput: 0: 1942.3. Samples: 6241622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:46,042][06183] Avg episode reward: [(0, '4.276')] -[2023-02-22 21:36:48,707][28133] Updated weights for policy 0, policy_version 7078 (0.0023) -[2023-02-22 21:36:51,027][06183] Fps is (10 sec: 7782.3, 60 sec: 7782.5, 300 sec: 10038.7). Total num frames: 29007872. Throughput: 0: 1934.1. Samples: 6247300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:51,031][06183] Avg episode reward: [(0, '4.388')] -[2023-02-22 21:36:54,204][28133] Updated weights for policy 0, policy_version 7088 (0.0021) -[2023-02-22 21:36:56,028][06183] Fps is (10 sec: 7782.3, 60 sec: 7782.4, 300 sec: 9927.6). Total num frames: 29044736. Throughput: 0: 1920.3. Samples: 6258514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:36:56,033][06183] Avg episode reward: [(0, '4.574')] -[2023-02-22 21:36:59,748][28133] Updated weights for policy 0, policy_version 7098 (0.0024) -[2023-02-22 21:37:01,028][06183] Fps is (10 sec: 7372.6, 60 sec: 7714.1, 300 sec: 9816.5). Total num frames: 29081600. Throughput: 0: 1903.0. Samples: 6269636. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:37:01,034][06183] Avg episode reward: [(0, '4.465')] -[2023-02-22 21:37:05,416][28133] Updated weights for policy 0, policy_version 7108 (0.0026) -[2023-02-22 21:37:06,027][06183] Fps is (10 sec: 7373.1, 60 sec: 7645.9, 300 sec: 9705.4). Total num frames: 29118464. Throughput: 0: 1893.0. Samples: 6275036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:37:06,031][06183] Avg episode reward: [(0, '4.460')] -[2023-02-22 21:37:11,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7509.4, 300 sec: 9566.6). Total num frames: 29151232. Throughput: 0: 1872.1. Samples: 6285692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:11,031][06183] Avg episode reward: [(0, '4.210')] -[2023-02-22 21:37:11,124][28133] Updated weights for policy 0, policy_version 7118 (0.0022) -[2023-02-22 21:37:16,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7509.3, 300 sec: 9469.4). Total num frames: 29188096. Throughput: 0: 1853.8. Samples: 6296540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:16,031][06183] Avg episode reward: [(0, '4.421')] -[2023-02-22 21:37:16,782][28133] Updated weights for policy 0, policy_version 7128 (0.0022) -[2023-02-22 21:37:21,028][06183] Fps is (10 sec: 7372.4, 60 sec: 7441.0, 300 sec: 9358.3). Total num frames: 29224960. Throughput: 0: 1845.0. Samples: 6301898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:21,034][06183] Avg episode reward: [(0, '4.427')] -[2023-02-22 21:37:22,561][28133] Updated weights for policy 0, policy_version 7138 (0.0035) -[2023-02-22 21:37:26,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7372.8, 300 sec: 9247.2). Total num frames: 29257728. Throughput: 0: 1827.9. Samples: 6312572. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:26,033][06183] Avg episode reward: [(0, '4.645')] -[2023-02-22 21:37:28,286][28133] Updated weights for policy 0, policy_version 7148 (0.0029) -[2023-02-22 21:37:31,027][06183] Fps is (10 sec: 6963.6, 60 sec: 7304.6, 300 sec: 9150.1). Total num frames: 29294592. Throughput: 0: 1812.3. Samples: 6323176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:37:31,032][06183] Avg episode reward: [(0, '4.346')] -[2023-02-22 21:37:34,007][28133] Updated weights for policy 0, policy_version 7158 (0.0021) -[2023-02-22 21:37:36,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7304.5, 300 sec: 9066.7). Total num frames: 29331456. Throughput: 0: 1805.6. Samples: 6328554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:36,033][06183] Avg episode reward: [(0, '4.316')] -[2023-02-22 21:37:39,851][28133] Updated weights for policy 0, policy_version 7168 (0.0028) -[2023-02-22 21:37:41,027][06183] Fps is (10 sec: 7372.9, 60 sec: 7304.5, 300 sec: 8969.5). Total num frames: 29368320. Throughput: 0: 1789.7. Samples: 6339050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:41,031][06183] Avg episode reward: [(0, '4.406')] -[2023-02-22 21:37:45,742][28133] Updated weights for policy 0, policy_version 7178 (0.0026) -[2023-02-22 21:37:46,027][06183] Fps is (10 sec: 6963.3, 60 sec: 7236.3, 300 sec: 8858.5). Total num frames: 29401088. Throughput: 0: 1775.7. Samples: 6349540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:46,032][06183] Avg episode reward: [(0, '4.381')] -[2023-02-22 21:37:51,027][06183] Fps is (10 sec: 6553.5, 60 sec: 7099.7, 300 sec: 8775.2). Total num frames: 29433856. Throughput: 0: 1768.5. Samples: 6354618. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:51,032][06183] Avg episode reward: [(0, '4.304')] -[2023-02-22 21:37:51,072][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007186_29433856.pth... -[2023-02-22 21:37:51,588][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006700_27443200.pth -[2023-02-22 21:37:51,941][28133] Updated weights for policy 0, policy_version 7188 (0.0031) -[2023-02-22 21:37:56,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7099.8, 300 sec: 8705.7). Total num frames: 29470720. Throughput: 0: 1759.4. Samples: 6364864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:37:56,031][06183] Avg episode reward: [(0, '4.536')] -[2023-02-22 21:37:57,766][28133] Updated weights for policy 0, policy_version 7198 (0.0024) -[2023-02-22 21:38:01,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 8608.6). Total num frames: 29503488. Throughput: 0: 1747.2. Samples: 6375166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:01,034][06183] Avg episode reward: [(0, '4.372')] -[2023-02-22 21:38:03,758][28133] Updated weights for policy 0, policy_version 7208 (0.0031) -[2023-02-22 21:38:06,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6963.2, 300 sec: 8539.1). Total num frames: 29536256. Throughput: 0: 1742.0. Samples: 6380290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:06,034][06183] Avg episode reward: [(0, '4.465')] -[2023-02-22 21:38:09,780][28133] Updated weights for policy 0, policy_version 7218 (0.0021) -[2023-02-22 21:38:11,028][06183] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 8469.7). Total num frames: 29573120. Throughput: 0: 1731.2. Samples: 6390478. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:11,033][06183] Avg episode reward: [(0, '4.379')] -[2023-02-22 21:38:15,755][28133] Updated weights for policy 0, policy_version 7228 (0.0026) -[2023-02-22 21:38:16,027][06183] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 8386.4). Total num frames: 29605888. Throughput: 0: 1724.0. Samples: 6400754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:38:16,032][06183] Avg episode reward: [(0, '4.434')] -[2023-02-22 21:38:21,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6895.0, 300 sec: 8317.0). Total num frames: 29638656. Throughput: 0: 1718.8. Samples: 6405902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:21,044][06183] Avg episode reward: [(0, '4.504')] -[2023-02-22 21:38:21,654][28133] Updated weights for policy 0, policy_version 7238 (0.0030) -[2023-02-22 21:38:26,028][06183] Fps is (10 sec: 6962.9, 60 sec: 6963.2, 300 sec: 8261.4). Total num frames: 29675520. Throughput: 0: 1710.9. Samples: 6416042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:38:26,033][06183] Avg episode reward: [(0, '4.472')] -[2023-02-22 21:38:27,787][28133] Updated weights for policy 0, policy_version 7248 (0.0024) -[2023-02-22 21:38:31,027][06183] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 8192.0). Total num frames: 29708288. Throughput: 0: 1703.5. Samples: 6426198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:38:31,032][06183] Avg episode reward: [(0, '4.556')] -[2023-02-22 21:38:33,914][28133] Updated weights for policy 0, policy_version 7258 (0.0026) -[2023-02-22 21:38:36,028][06183] Fps is (10 sec: 6553.7, 60 sec: 6826.6, 300 sec: 8122.6). Total num frames: 29741056. Throughput: 0: 1701.4. Samples: 6431182. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:36,032][06183] Avg episode reward: [(0, '4.494')] -[2023-02-22 21:38:40,217][28133] Updated weights for policy 0, policy_version 7268 (0.0030) -[2023-02-22 21:38:41,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 8067.0). Total num frames: 29773824. Throughput: 0: 1691.5. Samples: 6440982. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:41,031][06183] Avg episode reward: [(0, '4.419')] -[2023-02-22 21:38:46,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6758.3, 300 sec: 8011.5). Total num frames: 29806592. Throughput: 0: 1686.2. Samples: 6451048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:46,034][06183] Avg episode reward: [(0, '4.410')] -[2023-02-22 21:38:46,267][28133] Updated weights for policy 0, policy_version 7278 (0.0033) -[2023-02-22 21:38:51,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 7956.0). Total num frames: 29839360. Throughput: 0: 1679.7. Samples: 6455874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:38:51,032][06183] Avg episode reward: [(0, '4.169')] -[2023-02-22 21:38:52,564][28133] Updated weights for policy 0, policy_version 7288 (0.0024) -[2023-02-22 21:38:56,027][06183] Fps is (10 sec: 6554.0, 60 sec: 6690.1, 300 sec: 7900.4). Total num frames: 29872128. Throughput: 0: 1675.1. Samples: 6465856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:38:56,031][06183] Avg episode reward: [(0, '4.315')] -[2023-02-22 21:38:58,819][28133] Updated weights for policy 0, policy_version 7298 (0.0029) -[2023-02-22 21:39:01,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 7803.2). Total num frames: 29904896. Throughput: 0: 1663.7. Samples: 6475622. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:01,033][06183] Avg episode reward: [(0, '4.442')] -[2023-02-22 21:39:04,939][28133] Updated weights for policy 0, policy_version 7308 (0.0027) -[2023-02-22 21:39:06,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6690.2, 300 sec: 7747.7). Total num frames: 29937664. Throughput: 0: 1661.8. Samples: 6480682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:06,032][06183] Avg episode reward: [(0, '4.352')] -[2023-02-22 21:39:11,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 7706.0). Total num frames: 29970432. Throughput: 0: 1661.3. Samples: 6490800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:11,035][06183] Avg episode reward: [(0, '4.177')] -[2023-02-22 21:39:11,127][28133] Updated weights for policy 0, policy_version 7318 (0.0035) -[2023-02-22 21:39:16,027][06183] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 7664.4). Total num frames: 30007296. Throughput: 0: 1657.8. Samples: 6500798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:16,031][06183] Avg episode reward: [(0, '4.547')] -[2023-02-22 21:39:17,342][28133] Updated weights for policy 0, policy_version 7328 (0.0026) -[2023-02-22 21:39:21,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 7595.0). Total num frames: 30035968. Throughput: 0: 1653.1. Samples: 6505572. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:21,032][06183] Avg episode reward: [(0, '4.483')] -[2023-02-22 21:39:23,655][28133] Updated weights for policy 0, policy_version 7338 (0.0037) -[2023-02-22 21:39:26,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 30068736. Throughput: 0: 1648.7. Samples: 6515172. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:26,032][06183] Avg episode reward: [(0, '4.408')] -[2023-02-22 21:39:29,944][28133] Updated weights for policy 0, policy_version 7348 (0.0026) -[2023-02-22 21:39:31,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7511.6). Total num frames: 30101504. Throughput: 0: 1641.4. Samples: 6524910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:31,032][06183] Avg episode reward: [(0, '4.621')] -[2023-02-22 21:39:36,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7470.0). Total num frames: 30134272. Throughput: 0: 1644.0. Samples: 6529854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:39:36,031][06183] Avg episode reward: [(0, '4.611')] -[2023-02-22 21:39:36,295][28133] Updated weights for policy 0, policy_version 7358 (0.0031) -[2023-02-22 21:39:41,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7428.3). Total num frames: 30167040. Throughput: 0: 1640.8. Samples: 6539694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:41,031][06183] Avg episode reward: [(0, '4.477')] -[2023-02-22 21:39:42,460][28133] Updated weights for policy 0, policy_version 7368 (0.0037) -[2023-02-22 21:39:46,027][06183] Fps is (10 sec: 6143.9, 60 sec: 6485.4, 300 sec: 7358.9). Total num frames: 30195712. Throughput: 0: 1608.6. Samples: 6548010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:46,036][06183] Avg episode reward: [(0, '4.404')] -[2023-02-22 21:39:50,075][28133] Updated weights for policy 0, policy_version 7378 (0.0031) -[2023-02-22 21:39:51,028][06183] Fps is (10 sec: 5734.1, 60 sec: 6417.0, 300 sec: 7317.2). Total num frames: 30224384. Throughput: 0: 1591.6. Samples: 6552306. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:39:51,033][06183] Avg episode reward: [(0, '4.643')] -[2023-02-22 21:39:51,067][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007379_30224384.pth... -[2023-02-22 21:39:51,582][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006968_28540928.pth -[2023-02-22 21:39:56,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 7275.6). Total num frames: 30257152. Throughput: 0: 1567.4. Samples: 6561332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:39:56,032][06183] Avg episode reward: [(0, '4.351')] -[2023-02-22 21:39:56,714][28133] Updated weights for policy 0, policy_version 7388 (0.0033) -[2023-02-22 21:40:01,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6348.8, 300 sec: 7220.1). Total num frames: 30285824. Throughput: 0: 1561.9. Samples: 6571082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:40:01,035][06183] Avg episode reward: [(0, '4.446')] -[2023-02-22 21:40:02,971][28133] Updated weights for policy 0, policy_version 7398 (0.0037) -[2023-02-22 21:40:06,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6348.8, 300 sec: 7178.4). Total num frames: 30318592. Throughput: 0: 1559.7. Samples: 6575760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:40:06,033][06183] Avg episode reward: [(0, '4.505')] -[2023-02-22 21:40:09,534][28133] Updated weights for policy 0, policy_version 7408 (0.0032) -[2023-02-22 21:40:11,028][06183] Fps is (10 sec: 6553.3, 60 sec: 6348.8, 300 sec: 7136.7). Total num frames: 30351360. Throughput: 0: 1559.9. Samples: 6585370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:40:11,032][06183] Avg episode reward: [(0, '4.543')] -[2023-02-22 21:40:16,027][06183] Fps is (10 sec: 6144.3, 60 sec: 6212.3, 300 sec: 7095.1). Total num frames: 30380032. 
Throughput: 0: 1552.8. Samples: 6594784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:40:16,043][06183] Avg episode reward: [(0, '4.458')]
-[2023-02-22 21:40:16,104][28133] Updated weights for policy 0, policy_version 7418 (0.0038)
-[2023-02-22 21:40:21,028][06183] Fps is (10 sec: 6144.2, 60 sec: 6280.5, 300 sec: 7053.4). Total num frames: 30412800. Throughput: 0: 1549.5. Samples: 6599580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:40:21,033][06183] Avg episode reward: [(0, '4.483')]
-[2023-02-22 21:40:22,858][28133] Updated weights for policy 0, policy_version 7428 (0.0035)
-[2023-02-22 21:40:26,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6212.2, 300 sec: 7011.8). Total num frames: 30441472. Throughput: 0: 1529.1. Samples: 6608506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:40:26,032][06183] Avg episode reward: [(0, '4.426')]
-[2023-02-22 21:40:29,576][28133] Updated weights for policy 0, policy_version 7438 (0.0027)
-[2023-02-22 21:40:31,027][06183] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6984.0). Total num frames: 30474240. Throughput: 0: 1550.0. Samples: 6617762. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:40:31,032][06183] Avg episode reward: [(0, '4.584')]
-[2023-02-22 21:40:35,984][28133] Updated weights for policy 0, policy_version 7448 (0.0033)
-[2023-02-22 21:40:36,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6212.3, 300 sec: 6942.4). Total num frames: 30507008. Throughput: 0: 1558.3. Samples: 6622428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:40:36,033][06183] Avg episode reward: [(0, '4.537')]
-[2023-02-22 21:40:38,902][28133] Updated weights for policy 0, policy_version 7458 (0.0010)
-[2023-02-22 21:40:41,027][06183] Fps is (10 sec: 10649.7, 60 sec: 6894.9, 300 sec: 7053.4). Total num frames: 30580736. Throughput: 0: 1746.7. Samples: 6639932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:40:41,030][06183] Avg episode reward: [(0, '4.608')]
-[2023-02-22 21:40:41,665][28133] Updated weights for policy 0, policy_version 7468 (0.0010)
-[2023-02-22 21:40:44,369][28133] Updated weights for policy 0, policy_version 7478 (0.0012)
-[2023-02-22 21:40:46,027][06183] Fps is (10 sec: 14746.1, 60 sec: 7645.9, 300 sec: 7164.6). Total num frames: 30654464. Throughput: 0: 2034.3. Samples: 6662624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:40:46,030][06183] Avg episode reward: [(0, '4.461')]
-[2023-02-22 21:40:47,236][28133] Updated weights for policy 0, policy_version 7488 (0.0013)
-[2023-02-22 21:40:50,029][28133] Updated weights for policy 0, policy_version 7498 (0.0012)
-[2023-02-22 21:40:51,027][06183] Fps is (10 sec: 14745.7, 60 sec: 8396.9, 300 sec: 7289.5). Total num frames: 30728192. Throughput: 0: 2135.8. Samples: 6671870. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:40:51,030][06183] Avg episode reward: [(0, '4.435')]
-[2023-02-22 21:40:52,715][28133] Updated weights for policy 0, policy_version 7508 (0.0012)
-[2023-02-22 21:40:56,019][28133] Updated weights for policy 0, policy_version 7518 (0.0011)
-[2023-02-22 21:40:56,027][06183] Fps is (10 sec: 13926.4, 60 sec: 8943.0, 300 sec: 7372.8). Total num frames: 30793728. Throughput: 0: 2405.0. Samples: 6693594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:40:56,029][06183] Avg episode reward: [(0, '4.439')]
-[2023-02-22 21:40:58,692][28133] Updated weights for policy 0, policy_version 7528 (0.0009)
-[2023-02-22 21:41:01,027][06183] Fps is (10 sec: 13926.4, 60 sec: 9693.9, 300 sec: 7483.9). Total num frames: 30867456. Throughput: 0: 2678.3. Samples: 6715306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:41:01,029][06183] Avg episode reward: [(0, '4.460')]
-[2023-02-22 21:41:01,677][28133] Updated weights for policy 0, policy_version 7538 (0.0010)
-[2023-02-22 21:41:04,692][28133] Updated weights for policy 0, policy_version 7548 (0.0012)
-[2023-02-22 21:41:06,027][06183] Fps is (10 sec: 14335.9, 60 sec: 10308.4, 300 sec: 7581.1). Total num frames: 30937088. Throughput: 0: 2773.8. Samples: 6724400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:41:06,029][06183] Avg episode reward: [(0, '4.429')]
-[2023-02-22 21:41:07,364][28133] Updated weights for policy 0, policy_version 7558 (0.0010)
-[2023-02-22 21:41:10,722][28133] Updated weights for policy 0, policy_version 7568 (0.0013)
-[2023-02-22 21:41:11,027][06183] Fps is (10 sec: 13107.0, 60 sec: 10786.2, 300 sec: 7664.4). Total num frames: 30998528. Throughput: 0: 3050.6. Samples: 6745782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:41:11,030][06183] Avg episode reward: [(0, '4.336')]
-[2023-02-22 21:41:13,779][28133] Updated weights for policy 0, policy_version 7578 (0.0014)
-[2023-02-22 21:41:16,027][06183] Fps is (10 sec: 12697.5, 60 sec: 11400.6, 300 sec: 7747.7). Total num frames: 31064064. Throughput: 0: 3272.3. Samples: 6765016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:41:16,029][06183] Avg episode reward: [(0, '4.613')]
-[2023-02-22 21:41:17,171][28133] Updated weights for policy 0, policy_version 7588 (0.0012)
-[2023-02-22 21:41:20,063][28133] Updated weights for policy 0, policy_version 7598 (0.0010)
-[2023-02-22 21:41:21,027][06183] Fps is (10 sec: 13107.5, 60 sec: 11946.8, 300 sec: 7844.9). Total num frames: 31129600. Throughput: 0: 3383.9. Samples: 6774702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:41:21,030][06183] Avg episode reward: [(0, '4.496')]
-[2023-02-22 21:41:23,246][28133] Updated weights for policy 0, policy_version 7608 (0.0012)
-[2023-02-22 21:41:26,027][06183] Fps is (10 sec: 13107.5, 60 sec: 12561.2, 300 sec: 7928.2). Total num frames: 31195136. Throughput: 0: 3448.0. Samples: 6795090. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:41:26,029][06183] Avg episode reward: [(0, '4.539')]
-[2023-02-22 21:41:26,335][28133] Updated weights for policy 0, policy_version 7618 (0.0011)
-[2023-02-22 21:41:29,614][28133] Updated weights for policy 0, policy_version 7628 (0.0011)
-[2023-02-22 21:41:31,029][06183] Fps is (10 sec: 13104.3, 60 sec: 13106.8, 300 sec: 8025.3). Total num frames: 31260672. Throughput: 0: 3361.4. Samples: 6813896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:41:31,032][06183] Avg episode reward: [(0, '4.460')]
-[2023-02-22 21:41:32,814][28133] Updated weights for policy 0, policy_version 7638 (0.0010)
-[2023-02-22 21:41:35,714][28133] Updated weights for policy 0, policy_version 7648 (0.0010)
-[2023-02-22 21:41:36,027][06183] Fps is (10 sec: 13107.2, 60 sec: 13653.4, 300 sec: 8122.6). Total num frames: 31326208. Throughput: 0: 3392.1. Samples: 6824512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:41:36,029][06183] Avg episode reward: [(0, '4.469')]
-[2023-02-22 21:41:39,137][28133] Updated weights for policy 0, policy_version 7658 (0.0013)
-[2023-02-22 21:41:41,027][06183] Fps is (10 sec: 13110.1, 60 sec: 13516.8, 300 sec: 8219.8). Total num frames: 31391744. Throughput: 0: 3340.4. Samples: 6843914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
-[2023-02-22 21:41:41,030][06183] Avg episode reward: [(0, '4.457')]
-[2023-02-22 21:41:42,173][28133] Updated weights for policy 0, policy_version 7668 (0.0012)
-[2023-02-22 21:41:45,596][28133] Updated weights for policy 0, policy_version 7678 (0.0013)
-[2023-02-22 21:41:46,028][06183] Fps is (10 sec: 12696.9, 60 sec: 13311.9, 300 sec: 8289.2). Total num frames: 31453184. Throughput: 0: 3266.2. Samples: 6862284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:41:46,030][06183] Avg episode reward: [(0, '4.502')]
-[2023-02-22 21:41:48,973][28133] Updated weights for policy 0, policy_version 7688 (0.0011)
-[2023-02-22 21:41:51,027][06183] Fps is (10 sec: 11878.5, 60 sec: 13039.0, 300 sec: 8358.6). Total num frames: 31510528. Throughput: 0: 3263.8. Samples: 6871270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:41:51,029][06183] Avg episode reward: [(0, '4.508')]
-[2023-02-22 21:41:51,077][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007694_31514624.pth...
-[2023-02-22 21:41:51,333][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007186_29433856.pth
-[2023-02-22 21:41:52,475][28133] Updated weights for policy 0, policy_version 7698 (0.0010)
-[2023-02-22 21:41:56,027][06183] Fps is (10 sec: 11469.3, 60 sec: 12902.4, 300 sec: 8428.1). Total num frames: 31567872. Throughput: 0: 3175.0. Samples: 6888654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:41:56,029][06183] Avg episode reward: [(0, '4.801')]
-[2023-02-22 21:41:56,074][28133] Updated weights for policy 0, policy_version 7708 (0.0014)
-[2023-02-22 21:41:59,648][28133] Updated weights for policy 0, policy_version 7718 (0.0012)
-[2023-02-22 21:42:01,028][06183] Fps is (10 sec: 11468.2, 60 sec: 12629.3, 300 sec: 8497.5). Total num frames: 31625216. Throughput: 0: 3129.9. Samples: 6905862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:01,030][06183] Avg episode reward: [(0, '4.236')]
-[2023-02-22 21:42:03,275][28133] Updated weights for policy 0, policy_version 7728 (0.0012)
-[2023-02-22 21:42:06,027][06183] Fps is (10 sec: 11468.7, 60 sec: 12424.5, 300 sec: 8580.8). Total num frames: 31682560. Throughput: 0: 3101.7. Samples: 6914280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:06,031][06183] Avg episode reward: [(0, '4.331')]
-[2023-02-22 21:42:06,832][28133] Updated weights for policy 0, policy_version 7738 (0.0011)
-[2023-02-22 21:42:10,513][28133] Updated weights for policy 0, policy_version 7748 (0.0013)
-[2023-02-22 21:42:11,027][06183] Fps is (10 sec: 11469.3, 60 sec: 12356.3, 300 sec: 8650.2). Total num frames: 31739904. Throughput: 0: 3022.3. Samples: 6931092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:11,030][06183] Avg episode reward: [(0, '4.558')]
-[2023-02-22 21:42:14,287][28133] Updated weights for policy 0, policy_version 7758 (0.0016)
-[2023-02-22 21:42:16,027][06183] Fps is (10 sec: 11059.3, 60 sec: 12151.5, 300 sec: 8705.8). Total num frames: 31793152. Throughput: 0: 2969.3. Samples: 6947508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:16,031][06183] Avg episode reward: [(0, '4.215')]
-[2023-02-22 21:42:18,148][28133] Updated weights for policy 0, policy_version 7768 (0.0016)
-[2023-02-22 21:42:21,027][06183] Fps is (10 sec: 10649.6, 60 sec: 11946.7, 300 sec: 8775.2). Total num frames: 31846400. Throughput: 0: 2909.6. Samples: 6955444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:21,031][06183] Avg episode reward: [(0, '4.403')]
-[2023-02-22 21:42:21,918][28133] Updated weights for policy 0, policy_version 7778 (0.0014)
-[2023-02-22 21:42:25,871][28133] Updated weights for policy 0, policy_version 7788 (0.0013)
-[2023-02-22 21:42:26,027][06183] Fps is (10 sec: 10649.5, 60 sec: 11741.8, 300 sec: 8830.7). Total num frames: 31899648. Throughput: 0: 2831.3. Samples: 6971324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:26,030][06183] Avg episode reward: [(0, '4.784')]
-[2023-02-22 21:42:29,860][28133] Updated weights for policy 0, policy_version 7798 (0.0017)
-[2023-02-22 21:42:31,027][06183] Fps is (10 sec: 10240.1, 60 sec: 11469.2, 300 sec: 8872.4). Total num frames: 31948800. Throughput: 0: 2766.9. Samples: 6986794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:42:31,029][06183] Avg episode reward: [(0, '4.262')]
-[2023-02-22 21:42:33,855][28133] Updated weights for policy 0, policy_version 7808 (0.0016)
-[2023-02-22 21:42:36,027][06183] Fps is (10 sec: 10239.9, 60 sec: 11264.0, 300 sec: 8927.9). Total num frames: 32002048. Throughput: 0: 2737.6. Samples: 6994462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:36,030][06183] Avg episode reward: [(0, '4.494')]
-[2023-02-22 21:42:37,943][28133] Updated weights for policy 0, policy_version 7818 (0.0019)
-[2023-02-22 21:42:41,027][06183] Fps is (10 sec: 10239.9, 60 sec: 10990.9, 300 sec: 8983.4). Total num frames: 32051200. Throughput: 0: 2682.9. Samples: 7009384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:41,030][06183] Avg episode reward: [(0, '4.360')]
-[2023-02-22 21:42:42,167][28133] Updated weights for policy 0, policy_version 7828 (0.0020)
-[2023-02-22 21:42:46,027][06183] Fps is (10 sec: 9830.5, 60 sec: 10786.2, 300 sec: 9039.0). Total num frames: 32100352. Throughput: 0: 2628.8. Samples: 7024156. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:42:46,030][06183] Avg episode reward: [(0, '4.368')]
-[2023-02-22 21:42:46,329][28133] Updated weights for policy 0, policy_version 7838 (0.0017)
-[2023-02-22 21:42:50,547][28133] Updated weights for policy 0, policy_version 7848 (0.0013)
-[2023-02-22 21:42:51,027][06183] Fps is (10 sec: 9830.3, 60 sec: 10649.6, 300 sec: 9080.6). Total num frames: 32149504. Throughput: 0: 2603.3. Samples: 7031428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:42:51,030][06183] Avg episode reward: [(0, '4.433')]
-[2023-02-22 21:42:54,765][28133] Updated weights for policy 0, policy_version 7858 (0.0016)
-[2023-02-22 21:42:56,027][06183] Fps is (10 sec: 9420.6, 60 sec: 10444.8, 300 sec: 9122.3). Total num frames: 32194560. Throughput: 0: 2549.5. Samples: 7045822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:42:56,031][06183] Avg episode reward: [(0, '4.476')]
-[2023-02-22 21:42:59,108][28133] Updated weights for policy 0, policy_version 7868 (0.0014)
-[2023-02-22 21:43:01,027][06183] Fps is (10 sec: 9420.8, 60 sec: 10308.3, 300 sec: 9177.8). Total num frames: 32243712. Throughput: 0: 2499.7. Samples: 7059996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:43:01,036][06183] Avg episode reward: [(0, '4.579')]
-[2023-02-22 21:43:03,521][28133] Updated weights for policy 0, policy_version 7878 (0.0025)
-[2023-02-22 21:43:06,027][06183] Fps is (10 sec: 9420.9, 60 sec: 10103.5, 300 sec: 9205.6). Total num frames: 32288768. Throughput: 0: 2479.2. Samples: 7067006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:43:06,030][06183] Avg episode reward: [(0, '4.364')]
-[2023-02-22 21:43:07,940][28133] Updated weights for policy 0, policy_version 7888 (0.0015)
-[2023-02-22 21:43:11,028][06183] Fps is (10 sec: 9010.6, 60 sec: 9898.5, 300 sec: 9247.2). Total num frames: 32333824. Throughput: 0: 2433.0. Samples: 7080810. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:43:11,033][06183] Avg episode reward: [(0, '4.296')]
-[2023-02-22 21:43:12,371][28133] Updated weights for policy 0, policy_version 7898 (0.0023)
-[2023-02-22 21:43:16,027][06183] Fps is (10 sec: 9420.9, 60 sec: 9830.4, 300 sec: 9302.8). Total num frames: 32382976. Throughput: 0: 2394.5. Samples: 7094548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:43:16,030][06183] Avg episode reward: [(0, '4.234')]
-[2023-02-22 21:43:16,819][28133] Updated weights for policy 0, policy_version 7908 (0.0015)
-[2023-02-22 21:43:20,788][28133] Updated weights for policy 0, policy_version 7918 (0.0014)
-[2023-02-22 21:43:21,028][06183] Fps is (10 sec: 9830.5, 60 sec: 9762.0, 300 sec: 9344.4). Total num frames: 32432128. Throughput: 0: 2393.3. Samples: 7102160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:43:21,035][06183] Avg episode reward: [(0, '4.298')]
-[2023-02-22 21:43:25,323][28133] Updated weights for policy 0, policy_version 7928 (0.0016)
-[2023-02-22 21:43:26,027][06183] Fps is (10 sec: 9420.7, 60 sec: 9625.6, 300 sec: 9386.1). Total num frames: 32477184. Throughput: 0: 2375.5. Samples: 7116280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:43:26,031][06183] Avg episode reward: [(0, '4.361')]
-[2023-02-22 21:43:29,871][28133] Updated weights for policy 0, policy_version 7938 (0.0021)
-[2023-02-22 21:43:31,027][06183] Fps is (10 sec: 9011.7, 60 sec: 9557.3, 300 sec: 9427.8). Total num frames: 32522240. Throughput: 0: 2346.7. Samples: 7129758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:43:31,030][06183] Avg episode reward: [(0, '4.522')]
-[2023-02-22 21:43:34,523][28133] Updated weights for policy 0, policy_version 7948 (0.0016)
-[2023-02-22 21:43:36,028][06183] Fps is (10 sec: 9010.9, 60 sec: 9420.7, 300 sec: 9469.4). Total num frames: 32567296. Throughput: 0: 2332.0. Samples: 7136368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:43:36,034][06183] Avg episode reward: [(0, '4.368')]
-[2023-02-22 21:43:39,262][28133] Updated weights for policy 0, policy_version 7958 (0.0013)
-[2023-02-22 21:43:41,027][06183] Fps is (10 sec: 8601.6, 60 sec: 9284.3, 300 sec: 9497.2). Total num frames: 32608256. Throughput: 0: 2303.8. Samples: 7149492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:43:41,031][06183] Avg episode reward: [(0, '4.305')]
-[2023-02-22 21:43:44,007][28133] Updated weights for policy 0, policy_version 7968 (0.0019)
-[2023-02-22 21:43:46,027][06183] Fps is (10 sec: 8602.1, 60 sec: 9216.0, 300 sec: 9538.8). Total num frames: 32653312. Throughput: 0: 2276.8. Samples: 7162454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:43:46,030][06183] Avg episode reward: [(0, '4.455')]
-[2023-02-22 21:43:48,821][28133] Updated weights for policy 0, policy_version 7978 (0.0018)
-[2023-02-22 21:43:51,027][06183] Fps is (10 sec: 8601.5, 60 sec: 9079.5, 300 sec: 9566.6). Total num frames: 32694272. Throughput: 0: 2261.6. Samples: 7168778. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:43:51,031][06183] Avg episode reward: [(0, '4.393')]
-[2023-02-22 21:43:51,062][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007982_32694272.pth...
-[2023-02-22 21:43:51,461][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007379_30224384.pth
-[2023-02-22 21:43:53,655][28133] Updated weights for policy 0, policy_version 7988 (0.0017)
-[2023-02-22 21:43:56,028][06183] Fps is (10 sec: 8191.6, 60 sec: 9011.2, 300 sec: 9594.3). Total num frames: 32735232. Throughput: 0: 2225.9. Samples: 7180974. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:43:56,031][06183] Avg episode reward: [(0, '4.475')]
-[2023-02-22 21:43:58,638][28133] Updated weights for policy 0, policy_version 7998 (0.0027)
-[2023-02-22 21:44:01,027][06183] Fps is (10 sec: 8191.8, 60 sec: 8874.6, 300 sec: 9622.1). Total num frames: 32776192. Throughput: 0: 2199.9. Samples: 7193542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:44:01,032][06183] Avg episode reward: [(0, '4.328')]
-[2023-02-22 21:44:03,606][28133] Updated weights for policy 0, policy_version 8008 (0.0020)
-[2023-02-22 21:44:06,027][06183] Fps is (10 sec: 8192.4, 60 sec: 8806.4, 300 sec: 9649.9). Total num frames: 32817152. Throughput: 0: 2169.1. Samples: 7199768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:44:06,030][06183] Avg episode reward: [(0, '4.427')]
-[2023-02-22 21:44:08,586][28133] Updated weights for policy 0, policy_version 8018 (0.0022)
-[2023-02-22 21:44:11,027][06183] Fps is (10 sec: 8192.2, 60 sec: 8738.2, 300 sec: 9663.8). Total num frames: 32858112. Throughput: 0: 2129.5. Samples: 7212106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:44:11,031][06183] Avg episode reward: [(0, '4.393')]
-[2023-02-22 21:44:13,622][28133] Updated weights for policy 0, policy_version 8028 (0.0021)
-[2023-02-22 21:44:16,028][06183] Fps is (10 sec: 8191.4, 60 sec: 8601.5, 300 sec: 9705.4). Total num frames: 32899072. Throughput: 0: 2097.6. Samples: 7224150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:44:16,034][06183] Avg episode reward: [(0, '4.182')]
-[2023-02-22 21:44:18,684][28133] Updated weights for policy 0, policy_version 8038 (0.0018)
-[2023-02-22 21:44:21,027][06183] Fps is (10 sec: 8191.8, 60 sec: 8465.1, 300 sec: 9733.2). Total num frames: 32940032. Throughput: 0: 2087.0. Samples: 7230284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:44:21,032][06183] Avg episode reward: [(0, '4.396')]
-[2023-02-22 21:44:23,742][28133] Updated weights for policy 0, policy_version 8048 (0.0017)
-[2023-02-22 21:44:26,028][06183] Fps is (10 sec: 8192.0, 60 sec: 8396.7, 300 sec: 9761.0). Total num frames: 32980992. Throughput: 0: 2065.3. Samples: 7242430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:44:26,034][06183] Avg episode reward: [(0, '4.221')]
-[2023-02-22 21:44:28,849][28133] Updated weights for policy 0, policy_version 8058 (0.0019)
-[2023-02-22 21:44:31,028][06183] Fps is (10 sec: 8191.5, 60 sec: 8328.4, 300 sec: 9788.7). Total num frames: 33021952. Throughput: 0: 2044.6. Samples: 7254462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:44:31,032][06183] Avg episode reward: [(0, '4.389')]
-[2023-02-22 21:44:34,065][28133] Updated weights for policy 0, policy_version 8068 (0.0022)
-[2023-02-22 21:44:36,027][06183] Fps is (10 sec: 7782.9, 60 sec: 8192.1, 300 sec: 9802.6). Total num frames: 33058816. Throughput: 0: 2035.7. Samples: 7260386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:44:36,032][06183] Avg episode reward: [(0, '4.353')]
-[2023-02-22 21:44:39,163][28133] Updated weights for policy 0, policy_version 8078 (0.0020)
-[2023-02-22 21:44:41,027][06183] Fps is (10 sec: 7783.0, 60 sec: 8192.0, 300 sec: 9844.3). Total num frames: 33099776. Throughput: 0: 2032.8. Samples: 7272448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:44:41,031][06183] Avg episode reward: [(0, '4.170')]
-[2023-02-22 21:44:44,296][28133] Updated weights for policy 0, policy_version 8088 (0.0021)
-[2023-02-22 21:44:46,027][06183] Fps is (10 sec: 8192.1, 60 sec: 8123.7, 300 sec: 9886.0). Total num frames: 33140736. Throughput: 0: 2015.7. Samples: 7284250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 21:44:46,032][06183] Avg episode reward: [(0, '4.280')]
-[2023-02-22 21:44:49,649][28133] Updated weights for policy 0, policy_version 8098 (0.0022)
-[2023-02-22 21:44:51,027][06183] Fps is (10 sec: 7782.3, 60 sec: 8055.5, 300 sec: 9899.8). Total num frames: 33177600. Throughput: 0: 2003.9. Samples: 7289944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2023-02-22 21:44:51,031][06183] Avg episode reward: [(0, '4.280')]
-[2023-02-22 21:44:54,842][28133] Updated weights for policy 0, policy_version 8108 (0.0016)
-[2023-02-22 21:44:56,027][06183] Fps is (10 sec: 7782.3, 60 sec: 8055.5, 300 sec: 9941.5). Total num frames: 33218560. Throughput: 0: 1991.9. Samples: 7301744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:44:56,030][06183] Avg episode reward: [(0, '4.366')]
-[2023-02-22 21:45:00,252][28133] Updated weights for policy 0, policy_version 8118 (0.0024)
-[2023-02-22 21:45:01,028][06183] Fps is (10 sec: 7781.9, 60 sec: 7987.1, 300 sec: 9955.4). Total num frames: 33255424. Throughput: 0: 1979.5. Samples: 7313228. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:45:01,033][06183] Avg episode reward: [(0, '4.561')]
-[2023-02-22 21:45:05,694][28133] Updated weights for policy 0, policy_version 8128 (0.0021)
-[2023-02-22 21:45:06,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7918.9, 300 sec: 9969.3). Total num frames: 33292288. Throughput: 0: 1964.5. Samples: 7318686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:45:06,031][06183] Avg episode reward: [(0, '4.348')]
-[2023-02-22 21:45:10,994][28133] Updated weights for policy 0, policy_version 8138 (0.0022)
-[2023-02-22 21:45:11,027][06183] Fps is (10 sec: 7782.9, 60 sec: 7918.9, 300 sec: 10010.9). Total num frames: 33333248. Throughput: 0: 1952.4. Samples: 7330286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:45:11,032][06183] Avg episode reward: [(0, '4.372')]
-[2023-02-22 21:45:16,028][06183] Fps is (10 sec: 7782.2, 60 sec: 7850.7, 300 sec: 10024.8). Total num frames: 33370112. Throughput: 0: 1934.9. Samples: 7341530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:45:16,031][06183] Avg episode reward: [(0, '4.298')]
-[2023-02-22 21:45:16,413][28133] Updated weights for policy 0, policy_version 8148 (0.0028)
-[2023-02-22 21:45:21,028][06183] Fps is (10 sec: 7372.4, 60 sec: 7782.3, 300 sec: 10052.5). Total num frames: 33406976. Throughput: 0: 1926.9. Samples: 7347100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:45:21,033][06183] Avg episode reward: [(0, '4.317')]
-[2023-02-22 21:45:21,949][28133] Updated weights for policy 0, policy_version 8158 (0.0027)
-[2023-02-22 21:45:26,027][06183] Fps is (10 sec: 7373.0, 60 sec: 7714.2, 300 sec: 10066.4). Total num frames: 33443840. Throughput: 0: 1902.9. Samples: 7358078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:45:26,032][06183] Avg episode reward: [(0, '4.479')]
-[2023-02-22 21:45:27,566][28133] Updated weights for policy 0, policy_version 8168 (0.0020)
-[2023-02-22 21:45:31,028][06183] Fps is (10 sec: 7372.9, 60 sec: 7645.9, 300 sec: 10080.3). Total num frames: 33480704. Throughput: 0: 1885.0. Samples: 7369078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:45:31,032][06183] Avg episode reward: [(0, '4.361')]
-[2023-02-22 21:45:33,187][28133] Updated weights for policy 0, policy_version 8178 (0.0028)
-[2023-02-22 21:45:36,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7577.6, 300 sec: 9941.5). Total num frames: 33513472. Throughput: 0: 1879.0. Samples: 7374500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:45:36,033][06183] Avg episode reward: [(0, '4.529')]
-[2023-02-22 21:45:38,934][28133] Updated weights for policy 0, policy_version 8188 (0.0025)
-[2023-02-22 21:45:41,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7509.3, 300 sec: 9816.5). Total num frames: 33550336. Throughput: 0: 1858.6. Samples: 7385380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:45:41,031][06183] Avg episode reward: [(0, '4.507')]
-[2023-02-22 21:45:44,566][28133] Updated weights for policy 0, policy_version 8198 (0.0025)
-[2023-02-22 21:45:46,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7441.0, 300 sec: 9691.5). Total num frames: 33587200. Throughput: 0: 1842.8. Samples: 7396152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:45:46,032][06183] Avg episode reward: [(0, '4.359')]
-[2023-02-22 21:45:50,459][28133] Updated weights for policy 0, policy_version 8208 (0.0026)
-[2023-02-22 21:45:51,028][06183] Fps is (10 sec: 6962.5, 60 sec: 7372.7, 300 sec: 9580.4). Total num frames: 33619968. Throughput: 0: 1840.5. Samples: 7401510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:45:51,034][06183] Avg episode reward: [(0, '4.549')]
-[2023-02-22 21:45:51,073][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008209_33624064.pth...
-[2023-02-22 21:45:51,558][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007694_31514624.pth
-[2023-02-22 21:45:56,027][06183] Fps is (10 sec: 6963.4, 60 sec: 7304.6, 300 sec: 9455.5). Total num frames: 33656832. Throughput: 0: 1812.0. Samples: 7411824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:45:56,031][06183] Avg episode reward: [(0, '4.381')]
-[2023-02-22 21:45:56,234][28133] Updated weights for policy 0, policy_version 8218 (0.0020)
-[2023-02-22 21:46:01,027][06183] Fps is (10 sec: 7373.5, 60 sec: 7304.6, 300 sec: 9344.4). Total num frames: 33693696. Throughput: 0: 1799.3. Samples: 7422498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:46:01,033][06183] Avg episode reward: [(0, '4.371')]
-[2023-02-22 21:46:02,082][28133] Updated weights for policy 0, policy_version 8228 (0.0023)
-[2023-02-22 21:46:06,028][06183] Fps is (10 sec: 6962.9, 60 sec: 7236.2, 300 sec: 9247.2). Total num frames: 33726464. Throughput: 0: 1790.4. Samples: 7427666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:46:06,033][06183] Avg episode reward: [(0, '4.463')]
-[2023-02-22 21:46:07,963][28133] Updated weights for policy 0, policy_version 8238 (0.0025)
-[2023-02-22 21:46:11,027][06183] Fps is (10 sec: 6553.6, 60 sec: 7099.7, 300 sec: 9136.2). Total num frames: 33759232. Throughput: 0: 1774.7. Samples: 7437940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:46:11,033][06183] Avg episode reward: [(0, '4.231')]
-[2023-02-22 21:46:14,068][28133] Updated weights for policy 0, policy_version 8248 (0.0032)
-[2023-02-22 21:46:16,027][06183] Fps is (10 sec: 6963.4, 60 sec: 7099.8, 300 sec: 9039.0). Total num frames: 33796096. Throughput: 0: 1756.9. Samples: 7448136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:46:16,031][06183] Avg episode reward: [(0, '4.292')]
-[2023-02-22 21:46:19,940][28133] Updated weights for policy 0, policy_version 8258 (0.0032)
-[2023-02-22 21:46:21,027][06183] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 8927.9). Total num frames: 33828864. Throughput: 0: 1751.1. Samples: 7453300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:46:21,031][06183] Avg episode reward: [(0, '4.286')]
-[2023-02-22 21:46:25,754][28133] Updated weights for policy 0, policy_version 8268 (0.0026)
-[2023-02-22 21:46:26,027][06183] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 8830.8). Total num frames: 33865728. Throughput: 0: 1742.8. Samples: 7463806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:46:26,032][06183] Avg episode reward: [(0, '4.425')]
-[2023-02-22 21:46:31,028][06183] Fps is (10 sec: 7372.1, 60 sec: 7031.4, 300 sec: 8733.5). Total num frames: 33902592. Throughput: 0: 1737.8. Samples: 7474356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2023-02-22 21:46:31,034][06183] Avg episode reward: [(0, '4.483')]
-[2023-02-22 21:46:31,685][28133] Updated weights for policy 0, policy_version 8278 (0.0021)
-[2023-02-22 21:46:36,027][06183] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 8622.4). Total num frames: 33935360. Throughput: 0: 1726.7. Samples: 7479208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2023-02-22 21:46:36,032][06183] Avg episode reward: [(0, '4.527')]
-[2023-02-22 21:46:37,617][28133] Updated weights for policy 0, policy_version 8288 (0.0022)
-[2023-02-22 21:46:41,028][06183] Fps is (10 sec: 6553.8, 60 sec: 6963.1, 300 sec: 8525.2). Total num frames: 33968128. Throughput: 0: 1725.6. Samples: 7489476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 21:46:41,034][06183] Avg episode reward: [(0, '4.495')]
-[2023-02-22 21:46:43,700][28133] Updated weights for policy 0, policy_version 8298 (0.0028)
-[2023-02-22 21:46:46,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 8441.9). Total num frames: 34000896. Throughput: 0: 1715.4. Samples: 7499692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2023-02-22 21:46:46,033][06183] Avg episode reward: [(0, '4.451')]
-[2023-02-22 21:46:49,739][28133] Updated weights for policy 0, policy_version 8308 (0.0024)
-[2023-02-22 21:46:51,028][06183] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 8372.5). Total num frames: 34037760. Throughput: 0: 1715.8. Samples: 7504878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:46:51,037][06183] Avg episode reward: [(0, '4.617')]
-[2023-02-22 21:46:55,966][28133] Updated weights for policy 0, policy_version 8318 (0.0031)
-[2023-02-22 21:46:56,028][06183] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 8289.2). Total num frames: 34070528. Throughput: 0: 1708.1. Samples: 7514804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:46:56,032][06183] Avg episode reward: [(0, '4.220')]
-[2023-02-22 21:47:01,028][06183] Fps is (10 sec: 6553.7, 60 sec: 6826.6, 300 sec: 8205.9). Total num frames: 34103296. Throughput: 0: 1705.0. Samples: 7524862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:47:01,033][06183] Avg episode reward: [(0, '4.376')]
-[2023-02-22 21:47:02,117][28133] Updated weights for policy 0, policy_version 8328 (0.0026)
-[2023-02-22 21:47:06,027][06183] Fps is (10 sec: 6554.0, 60 sec: 6826.7, 300 sec: 8122.6). Total num frames: 34136064. Throughput: 0: 1699.5. Samples: 7529776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:47:06,031][06183] Avg episode reward: [(0, '4.421')]
-[2023-02-22 21:47:08,284][28133] Updated weights for policy 0, policy_version 8338 (0.0028)
-[2023-02-22 21:47:11,027][06183] Fps is (10 sec: 6553.9, 60 sec: 6826.7, 300 sec: 8053.1). Total num frames: 34168832. Throughput: 0: 1686.5. Samples: 7539698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:47:11,032][06183] Avg episode reward: [(0, '4.384')]
-[2023-02-22 21:47:14,565][28133] Updated weights for policy 0, policy_version 8348 (0.0029)
-[2023-02-22 21:47:16,029][06183] Fps is (10 sec: 6552.8, 60 sec: 6758.3, 300 sec: 7983.7). Total num frames: 34201600. Throughput: 0: 1672.6. Samples: 7549624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:47:16,033][06183] Avg episode reward: [(0, '4.608')]
-[2023-02-22 21:47:20,898][28133] Updated weights for policy 0, policy_version 8358 (0.0031)
-[2023-02-22 21:47:21,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 7914.3). Total num frames: 34234368. Throughput: 0: 1669.9. Samples: 7554356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:47:21,032][06183] Avg episode reward: [(0, '4.424')]
-[2023-02-22 21:47:26,028][06183] Fps is (10 sec: 6144.6, 60 sec: 6621.9, 300 sec: 7844.9). Total num frames: 34263040. Throughput: 0: 1656.1. Samples: 7564000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:47:26,034][06183] Avg episode reward: [(0, '4.320')]
-[2023-02-22 21:47:27,287][28133] Updated weights for policy 0, policy_version 8368 (0.0025)
-[2023-02-22 21:47:31,055][06183] Fps is (10 sec: 6535.3, 60 sec: 6618.9, 300 sec: 7788.6). Total num frames: 34299904. Throughput: 0: 1651.3. Samples: 7574048. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:47:31,065][06183] Avg episode reward: [(0, '4.518')]
-[2023-02-22 21:47:33,431][28133] Updated weights for policy 0, policy_version 8378 (0.0023)
-[2023-02-22 21:47:36,027][06183] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 7733.8). Total num frames: 34332672. Throughput: 0: 1646.4. Samples: 7578966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:47:36,031][06183] Avg episode reward: [(0, '4.252')]
-[2023-02-22 21:47:39,619][28133] Updated weights for policy 0, policy_version 8388 (0.0031)
-[2023-02-22 21:47:41,027][06183] Fps is (10 sec: 6572.1, 60 sec: 6621.9, 300 sec: 7678.3). Total num frames: 34365440. Throughput: 0: 1646.0. Samples: 7588872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:47:41,032][06183] Avg episode reward: [(0, '4.186')]
-[2023-02-22 21:47:45,925][28133] Updated weights for policy 0, policy_version 8398 (0.0037)
-[2023-02-22 21:47:46,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 7622.7). Total num frames: 34398208. Throughput: 0: 1640.6. Samples: 7598686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:47:46,032][06183] Avg episode reward: [(0, '4.329')]
-[2023-02-22 21:47:51,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6485.4, 300 sec: 7567.2). Total num frames: 34426880. Throughput: 0: 1632.9. Samples: 7603256. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:47:51,032][06183] Avg episode reward: [(0, '4.353')]
-[2023-02-22 21:47:51,178][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008406_34430976.pth...
-[2023-02-22 21:47:51,672][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000007982_32694272.pth
-[2023-02-22 21:47:52,427][28133] Updated weights for policy 0, policy_version 8408 (0.0031)
-[2023-02-22 21:47:56,028][06183] Fps is (10 sec: 6143.7, 60 sec: 6485.3, 300 sec: 7511.6). Total num frames: 34459648. Throughput: 0: 1627.2. Samples: 7612924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:47:56,038][06183] Avg episode reward: [(0, '4.279')]
-[2023-02-22 21:47:58,757][28133] Updated weights for policy 0, policy_version 8418 (0.0028)
-[2023-02-22 21:48:01,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6485.4, 300 sec: 7470.0). Total num frames: 34492416. Throughput: 0: 1618.9. Samples: 7622472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:48:01,032][06183] Avg episode reward: [(0, '4.386')]
-[2023-02-22 21:48:05,102][28133] Updated weights for policy 0, policy_version 8428 (0.0040)
-[2023-02-22 21:48:06,028][06183] Fps is (10 sec: 6553.8, 60 sec: 6485.3, 300 sec: 7428.3). Total num frames: 34525184. Throughput: 0: 1620.7. Samples: 7627288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2023-02-22 21:48:06,035][06183] Avg episode reward: [(0, '4.734')]
-[2023-02-22 21:48:11,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 7372.8). Total num frames: 34557952. Throughput: 0: 1624.6. Samples: 7637108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:48:11,033][06183] Avg episode reward: [(0, '4.717')]
-[2023-02-22 21:48:11,368][28133] Updated weights for policy 0, policy_version 8438 (0.0024)
-[2023-02-22 21:48:16,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6485.5, 300 sec: 7317.3). Total num frames: 34590720. Throughput: 0: 1620.0. Samples: 7646904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:48:16,033][06183] Avg episode reward: [(0, '4.417')]
-[2023-02-22 21:48:17,773][28133] Updated weights for policy 0, policy_version 8448 (0.0023)
-[2023-02-22 21:48:21,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 7261.7). Total num frames: 34619392. Throughput: 0: 1616.1. Samples: 7651692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2023-02-22 21:48:21,032][06183] Avg episode reward: [(0, '4.164')]
-[2023-02-22 21:48:24,439][28133] Updated weights for policy 0, policy_version 8458 (0.0030)
-[2023-02-22 21:48:26,028][06183] Fps is (10 sec: 6143.6, 60 sec: 6485.3, 300 sec: 7220.0). Total num frames: 34652160. Throughput: 0: 1601.2. Samples: 7660928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
-[2023-02-22 21:48:26,033][06183] Avg episode reward: [(0, '4.355')]
-[2023-02-22 21:48:30,775][28133] Updated weights for policy 0, policy_version 8468 (0.0035)
-[2023-02-22 21:48:31,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6420.1, 300 sec: 7178.4). Total num frames: 34684928. Throughput: 0: 1599.3. Samples: 7670656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:48:31,032][06183] Avg episode reward: [(0, '4.443')]
-[2023-02-22 21:48:36,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 7150.6). Total num frames: 34717696. Throughput: 0: 1602.2. Samples: 7675354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:48:36,035][06183] Avg episode reward: [(0, '4.418')]
-[2023-02-22 21:48:37,292][28133] Updated weights for policy 0, policy_version 8478 (0.0037)
-[2023-02-22 21:48:41,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 7095.1). Total num frames: 34746368. Throughput: 0: 1599.3. Samples: 7684894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
-[2023-02-22 21:48:41,032][06183] Avg episode reward: [(0, '4.318')]
-[2023-02-22 21:48:43,658][28133] Updated weights for policy 0, policy_version 8488 (0.0028)
-[2023-02-22 21:48:46,027][06183] Fps is (10 sec: 6144.5, 60 sec: 6348.8, 300 sec: 7067.3). Total num frames: 34779136. Throughput: 0: 1598.0. Samples: 7694382.
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:48:46,033][06183] Avg episode reward: [(0, '4.260')] -[2023-02-22 21:48:50,064][28133] Updated weights for policy 0, policy_version 8498 (0.0031) -[2023-02-22 21:48:51,030][06183] Fps is (10 sec: 6552.5, 60 sec: 6416.9, 300 sec: 7039.5). Total num frames: 34811904. Throughput: 0: 1596.8. Samples: 7699146. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:48:51,034][06183] Avg episode reward: [(0, '4.169')] -[2023-02-22 21:48:56,028][06183] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 6997.9). Total num frames: 34840576. Throughput: 0: 1585.3. Samples: 7708446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:48:56,034][06183] Avg episode reward: [(0, '4.449')] -[2023-02-22 21:48:57,270][28133] Updated weights for policy 0, policy_version 8508 (0.0039) -[2023-02-22 21:49:01,028][06183] Fps is (10 sec: 5735.1, 60 sec: 6280.5, 300 sec: 6956.2). Total num frames: 34869248. Throughput: 0: 1548.3. Samples: 7716580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:49:01,032][06183] Avg episode reward: [(0, '4.387')] -[2023-02-22 21:49:04,094][28133] Updated weights for policy 0, policy_version 8518 (0.0026) -[2023-02-22 21:49:06,040][06183] Fps is (10 sec: 6136.2, 60 sec: 6279.2, 300 sec: 6928.2). Total num frames: 34902016. Throughput: 0: 1544.4. Samples: 7721208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:49:06,052][06183] Avg episode reward: [(0, '4.286')] -[2023-02-22 21:49:10,648][28133] Updated weights for policy 0, policy_version 8528 (0.0039) -[2023-02-22 21:49:11,028][06183] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6886.8). Total num frames: 34930688. Throughput: 0: 1547.6. Samples: 7730568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:49:11,034][06183] Avg episode reward: [(0, '4.269')] -[2023-02-22 21:49:16,028][06183] Fps is (10 sec: 6151.3, 60 sec: 6212.2, 300 sec: 6859.0). Total num frames: 34963456. 
Throughput: 0: 1540.6. Samples: 7739984. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:49:16,037][06183] Avg episode reward: [(0, '4.350')] -[2023-02-22 21:49:17,179][28133] Updated weights for policy 0, policy_version 8538 (0.0029) -[2023-02-22 21:49:21,027][06183] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 6817.4). Total num frames: 34992128. Throughput: 0: 1540.4. Samples: 7744672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:49:21,032][06183] Avg episode reward: [(0, '4.517')] -[2023-02-22 21:49:22,226][28133] Updated weights for policy 0, policy_version 8548 (0.0026) -[2023-02-22 21:49:24,809][28133] Updated weights for policy 0, policy_version 8558 (0.0010) -[2023-02-22 21:49:26,027][06183] Fps is (10 sec: 10651.1, 60 sec: 6963.3, 300 sec: 6942.4). Total num frames: 35069952. Throughput: 0: 1729.1. Samples: 7762702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:49:26,029][06183] Avg episode reward: [(0, '4.489')] -[2023-02-22 21:49:27,616][28133] Updated weights for policy 0, policy_version 8568 (0.0010) -[2023-02-22 21:49:30,220][28133] Updated weights for policy 0, policy_version 8578 (0.0009) -[2023-02-22 21:49:31,027][06183] Fps is (10 sec: 15155.5, 60 sec: 7645.9, 300 sec: 7067.3). Total num frames: 35143680. Throughput: 0: 2015.2. Samples: 7785064. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:49:31,029][06183] Avg episode reward: [(0, '4.317')] -[2023-02-22 21:49:33,252][28133] Updated weights for policy 0, policy_version 8588 (0.0010) -[2023-02-22 21:49:35,849][28133] Updated weights for policy 0, policy_version 8598 (0.0009) -[2023-02-22 21:49:36,027][06183] Fps is (10 sec: 14745.3, 60 sec: 8328.7, 300 sec: 7178.4). Total num frames: 35217408. Throughput: 0: 2135.4. Samples: 7795234. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:49:36,031][06183] Avg episode reward: [(0, '4.341')] -[2023-02-22 21:49:38,874][28133] Updated weights for policy 0, policy_version 8608 (0.0009) -[2023-02-22 21:49:41,027][06183] Fps is (10 sec: 14336.0, 60 sec: 9011.3, 300 sec: 7275.6). Total num frames: 35287040. Throughput: 0: 2408.6. Samples: 7816832. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:49:41,029][06183] Avg episode reward: [(0, '4.306')] -[2023-02-22 21:49:41,858][28133] Updated weights for policy 0, policy_version 8618 (0.0010) -[2023-02-22 21:49:44,799][28133] Updated weights for policy 0, policy_version 8628 (0.0011) -[2023-02-22 21:49:46,027][06183] Fps is (10 sec: 13926.3, 60 sec: 9625.6, 300 sec: 7386.7). Total num frames: 35356672. Throughput: 0: 2709.2. Samples: 7838492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) -[2023-02-22 21:49:46,030][06183] Avg episode reward: [(0, '4.395')] -[2023-02-22 21:49:47,724][28133] Updated weights for policy 0, policy_version 8638 (0.0012) -[2023-02-22 21:49:50,424][28133] Updated weights for policy 0, policy_version 8648 (0.0010) -[2023-02-22 21:49:51,027][06183] Fps is (10 sec: 13926.3, 60 sec: 10240.3, 300 sec: 7483.9). Total num frames: 35426304. Throughput: 0: 2836.3. Samples: 7848804. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:49:51,030][06183] Avg episode reward: [(0, '4.370')] -[2023-02-22 21:49:51,124][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008650_35430400.pth... -[2023-02-22 21:49:51,348][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008209_33624064.pth -[2023-02-22 21:49:53,475][28133] Updated weights for policy 0, policy_version 8658 (0.0011) -[2023-02-22 21:49:56,027][06183] Fps is (10 sec: 13926.7, 60 sec: 10922.8, 300 sec: 7595.0). Total num frames: 35495936. 
Throughput: 0: 3103.5. Samples: 7870226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:49:56,029][06183] Avg episode reward: [(0, '4.438')] -[2023-02-22 21:49:56,313][28133] Updated weights for policy 0, policy_version 8668 (0.0013) -[2023-02-22 21:49:59,328][28133] Updated weights for policy 0, policy_version 8678 (0.0011) -[2023-02-22 21:50:01,027][06183] Fps is (10 sec: 13926.4, 60 sec: 11605.5, 300 sec: 7706.0). Total num frames: 35565568. Throughput: 0: 3340.7. Samples: 7890312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:50:01,030][06183] Avg episode reward: [(0, '4.588')] -[2023-02-22 21:50:02,529][28133] Updated weights for policy 0, policy_version 8688 (0.0013) -[2023-02-22 21:50:05,678][28133] Updated weights for policy 0, policy_version 8698 (0.0010) -[2023-02-22 21:50:06,027][06183] Fps is (10 sec: 13516.8, 60 sec: 12154.1, 300 sec: 7789.3). Total num frames: 35631104. Throughput: 0: 3454.0. Samples: 7900100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:50:06,029][06183] Avg episode reward: [(0, '4.265')] -[2023-02-22 21:50:08,870][28133] Updated weights for policy 0, policy_version 8708 (0.0011) -[2023-02-22 21:50:11,027][06183] Fps is (10 sec: 12697.7, 60 sec: 12697.7, 300 sec: 7872.7). Total num frames: 35692544. Throughput: 0: 3473.1. Samples: 7918994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:50:11,030][06183] Avg episode reward: [(0, '4.320')] -[2023-02-22 21:50:12,384][28133] Updated weights for policy 0, policy_version 8718 (0.0012) -[2023-02-22 21:50:15,493][28133] Updated weights for policy 0, policy_version 8728 (0.0012) -[2023-02-22 21:50:16,027][06183] Fps is (10 sec: 12287.6, 60 sec: 13175.7, 300 sec: 7956.0). Total num frames: 35753984. Throughput: 0: 3390.0. Samples: 7937614. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:50:16,030][06183] Avg episode reward: [(0, '4.256')] -[2023-02-22 21:50:18,940][28133] Updated weights for policy 0, policy_version 8738 (0.0011) -[2023-02-22 21:50:21,027][06183] Fps is (10 sec: 12288.0, 60 sec: 13721.6, 300 sec: 8039.3). Total num frames: 35815424. Throughput: 0: 3361.3. Samples: 7946490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:50:21,030][06183] Avg episode reward: [(0, '4.452')] -[2023-02-22 21:50:22,293][28133] Updated weights for policy 0, policy_version 8748 (0.0013) -[2023-02-22 21:50:25,738][28133] Updated weights for policy 0, policy_version 8758 (0.0012) -[2023-02-22 21:50:26,027][06183] Fps is (10 sec: 11878.7, 60 sec: 13380.2, 300 sec: 8108.7). Total num frames: 35872768. Throughput: 0: 3281.6. Samples: 7964506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:50:26,035][06183] Avg episode reward: [(0, '4.442')] -[2023-02-22 21:50:29,226][28133] Updated weights for policy 0, policy_version 8768 (0.0017) -[2023-02-22 21:50:31,027][06183] Fps is (10 sec: 11468.8, 60 sec: 13107.2, 300 sec: 8192.0). Total num frames: 35930112. Throughput: 0: 3193.7. Samples: 7982208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:50:31,030][06183] Avg episode reward: [(0, '4.080')] -[2023-02-22 21:50:32,791][28133] Updated weights for policy 0, policy_version 8778 (0.0016) -[2023-02-22 21:50:36,027][06183] Fps is (10 sec: 11468.9, 60 sec: 12834.2, 300 sec: 8261.4). Total num frames: 35987456. Throughput: 0: 3151.8. Samples: 7990634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:50:36,030][06183] Avg episode reward: [(0, '4.418')] -[2023-02-22 21:50:36,488][28133] Updated weights for policy 0, policy_version 8788 (0.0013) -[2023-02-22 21:50:40,195][28133] Updated weights for policy 0, policy_version 8798 (0.0012) -[2023-02-22 21:50:41,027][06183] Fps is (10 sec: 11468.8, 60 sec: 12629.3, 300 sec: 8330.9). Total num frames: 36044800. 
Throughput: 0: 3046.4. Samples: 8007314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:50:41,030][06183] Avg episode reward: [(0, '4.427')] -[2023-02-22 21:50:44,033][28133] Updated weights for policy 0, policy_version 8808 (0.0013) -[2023-02-22 21:50:46,027][06183] Fps is (10 sec: 11059.1, 60 sec: 12356.3, 300 sec: 8400.3). Total num frames: 36098048. Throughput: 0: 2961.8. Samples: 8023592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:50:46,030][06183] Avg episode reward: [(0, '4.408')] -[2023-02-22 21:50:47,746][28133] Updated weights for policy 0, policy_version 8818 (0.0015) -[2023-02-22 21:50:51,027][06183] Fps is (10 sec: 10649.6, 60 sec: 12083.2, 300 sec: 8455.8). Total num frames: 36151296. Throughput: 0: 2923.9. Samples: 8031676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:50:51,030][06183] Avg episode reward: [(0, '4.246')] -[2023-02-22 21:50:51,658][28133] Updated weights for policy 0, policy_version 8828 (0.0012) -[2023-02-22 21:50:55,521][28133] Updated weights for policy 0, policy_version 8838 (0.0011) -[2023-02-22 21:50:56,027][06183] Fps is (10 sec: 10649.4, 60 sec: 11810.1, 300 sec: 8511.4). Total num frames: 36204544. Throughput: 0: 2857.1. Samples: 8047562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:50:56,031][06183] Avg episode reward: [(0, '4.508')] -[2023-02-22 21:50:59,431][28133] Updated weights for policy 0, policy_version 8848 (0.0014) -[2023-02-22 21:51:01,027][06183] Fps is (10 sec: 10240.1, 60 sec: 11468.8, 300 sec: 8566.9). Total num frames: 36253696. Throughput: 0: 2785.7. Samples: 8062970. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:51:01,032][06183] Avg episode reward: [(0, '4.499')] -[2023-02-22 21:51:03,526][28133] Updated weights for policy 0, policy_version 8858 (0.0016) -[2023-02-22 21:51:06,027][06183] Fps is (10 sec: 9830.6, 60 sec: 11195.7, 300 sec: 8622.4). Total num frames: 36302848. Throughput: 0: 2754.4. Samples: 8070436. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:51:06,031][06183] Avg episode reward: [(0, '4.593')] -[2023-02-22 21:51:07,510][28133] Updated weights for policy 0, policy_version 8868 (0.0021) -[2023-02-22 21:51:11,027][06183] Fps is (10 sec: 10240.0, 60 sec: 11059.2, 300 sec: 8678.0). Total num frames: 36356096. Throughput: 0: 2702.7. Samples: 8086126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:11,030][06183] Avg episode reward: [(0, '4.244')] -[2023-02-22 21:51:11,583][28133] Updated weights for policy 0, policy_version 8878 (0.0012) -[2023-02-22 21:51:15,839][28133] Updated weights for policy 0, policy_version 8888 (0.0016) -[2023-02-22 21:51:16,027][06183] Fps is (10 sec: 10239.9, 60 sec: 10854.4, 300 sec: 8733.5). Total num frames: 36405248. Throughput: 0: 2628.6. Samples: 8100494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:16,030][06183] Avg episode reward: [(0, '4.305')] -[2023-02-22 21:51:20,156][28133] Updated weights for policy 0, policy_version 8898 (0.0018) -[2023-02-22 21:51:21,027][06183] Fps is (10 sec: 9830.4, 60 sec: 10649.6, 300 sec: 8775.2). Total num frames: 36454400. Throughput: 0: 2598.8. Samples: 8107582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:21,030][06183] Avg episode reward: [(0, '4.363')] -[2023-02-22 21:51:24,506][28133] Updated weights for policy 0, policy_version 8908 (0.0013) -[2023-02-22 21:51:26,027][06183] Fps is (10 sec: 9420.9, 60 sec: 10444.8, 300 sec: 8803.0). Total num frames: 36499456. Throughput: 0: 2542.6. Samples: 8121730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:26,031][06183] Avg episode reward: [(0, '4.694')] -[2023-02-22 21:51:28,952][28133] Updated weights for policy 0, policy_version 8918 (0.0020) -[2023-02-22 21:51:31,027][06183] Fps is (10 sec: 9011.3, 60 sec: 10240.0, 300 sec: 8844.6). Total num frames: 36544512. Throughput: 0: 2486.0. Samples: 8135462. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:31,030][06183] Avg episode reward: [(0, '4.492')] -[2023-02-22 21:51:33,466][28133] Updated weights for policy 0, policy_version 8928 (0.0019) -[2023-02-22 21:51:36,027][06183] Fps is (10 sec: 9011.2, 60 sec: 10035.2, 300 sec: 8886.3). Total num frames: 36589568. Throughput: 0: 2460.8. Samples: 8142412. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:51:36,031][06183] Avg episode reward: [(0, '4.483')] -[2023-02-22 21:51:38,000][28133] Updated weights for policy 0, policy_version 8938 (0.0018) -[2023-02-22 21:51:41,028][06183] Fps is (10 sec: 9010.4, 60 sec: 9830.3, 300 sec: 8927.9). Total num frames: 36634624. Throughput: 0: 2403.5. Samples: 8155720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:41,034][06183] Avg episode reward: [(0, '4.332')] -[2023-02-22 21:51:42,655][28133] Updated weights for policy 0, policy_version 8948 (0.0017) -[2023-02-22 21:51:46,027][06183] Fps is (10 sec: 9011.1, 60 sec: 9693.9, 300 sec: 8955.7). Total num frames: 36679680. Throughput: 0: 2354.6. Samples: 8168926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:46,031][06183] Avg episode reward: [(0, '4.393')] -[2023-02-22 21:51:47,295][28133] Updated weights for policy 0, policy_version 8958 (0.0016) -[2023-02-22 21:51:51,027][06183] Fps is (10 sec: 9011.9, 60 sec: 9557.3, 300 sec: 8997.3). Total num frames: 36724736. Throughput: 0: 2338.5. Samples: 8175670. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:51:51,031][06183] Avg episode reward: [(0, '4.379')] -[2023-02-22 21:51:51,060][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008966_36724736.pth... 
-[2023-02-22 21:51:51,503][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008406_34430976.pth -[2023-02-22 21:51:51,979][28133] Updated weights for policy 0, policy_version 8968 (0.0020) -[2023-02-22 21:51:56,028][06183] Fps is (10 sec: 8600.8, 60 sec: 9352.4, 300 sec: 9025.1). Total num frames: 36765696. Throughput: 0: 2279.4. Samples: 8188702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:51:56,032][06183] Avg episode reward: [(0, '4.357')] -[2023-02-22 21:51:56,628][28133] Updated weights for policy 0, policy_version 8978 (0.0017) -[2023-02-22 21:52:01,027][06183] Fps is (10 sec: 8601.6, 60 sec: 9284.2, 300 sec: 9066.7). Total num frames: 36810752. Throughput: 0: 2254.8. Samples: 8201962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:52:01,031][06183] Avg episode reward: [(0, '4.515')] -[2023-02-22 21:52:01,375][28133] Updated weights for policy 0, policy_version 8988 (0.0020) -[2023-02-22 21:52:06,028][06183] Fps is (10 sec: 8601.9, 60 sec: 9147.6, 300 sec: 9094.5). Total num frames: 36851712. Throughput: 0: 2237.9. Samples: 8208288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:52:06,034][06183] Avg episode reward: [(0, '4.463')] -[2023-02-22 21:52:06,115][28133] Updated weights for policy 0, policy_version 8998 (0.0015) -[2023-02-22 21:52:10,891][28133] Updated weights for policy 0, policy_version 9008 (0.0020) -[2023-02-22 21:52:11,028][06183] Fps is (10 sec: 8600.6, 60 sec: 9011.0, 300 sec: 9136.2). Total num frames: 36896768. Throughput: 0: 2210.0. Samples: 8221182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:52:11,032][06183] Avg episode reward: [(0, '4.395')] -[2023-02-22 21:52:15,724][28133] Updated weights for policy 0, policy_version 9018 (0.0022) -[2023-02-22 21:52:16,027][06183] Fps is (10 sec: 8602.1, 60 sec: 8874.7, 300 sec: 9163.9). Total num frames: 36937728. Throughput: 0: 2186.6. 
Samples: 8233858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:52:16,031][06183] Avg episode reward: [(0, '4.317')] -[2023-02-22 21:52:20,505][28133] Updated weights for policy 0, policy_version 9028 (0.0017) -[2023-02-22 21:52:21,027][06183] Fps is (10 sec: 8602.5, 60 sec: 8806.4, 300 sec: 9219.5). Total num frames: 36982784. Throughput: 0: 2172.9. Samples: 8240194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:52:21,030][06183] Avg episode reward: [(0, '4.374')] -[2023-02-22 21:52:25,546][28133] Updated weights for policy 0, policy_version 9038 (0.0015) -[2023-02-22 21:52:26,027][06183] Fps is (10 sec: 8191.9, 60 sec: 8669.8, 300 sec: 9220.4). Total num frames: 37019648. Throughput: 0: 2155.8. Samples: 8252730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:52:26,032][06183] Avg episode reward: [(0, '4.376')] -[2023-02-22 21:52:30,443][28133] Updated weights for policy 0, policy_version 9048 (0.0019) -[2023-02-22 21:52:31,028][06183] Fps is (10 sec: 8191.6, 60 sec: 8669.8, 300 sec: 9261.1). Total num frames: 37064704. Throughput: 0: 2137.4. Samples: 8265112. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:52:31,032][06183] Avg episode reward: [(0, '4.332')] -[2023-02-22 21:52:35,476][28133] Updated weights for policy 0, policy_version 9058 (0.0020) -[2023-02-22 21:52:36,028][06183] Fps is (10 sec: 8601.2, 60 sec: 8601.5, 300 sec: 9288.9). Total num frames: 37105664. Throughput: 0: 2121.5. Samples: 8271138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:52:36,031][06183] Avg episode reward: [(0, '4.441')] -[2023-02-22 21:52:40,507][28133] Updated weights for policy 0, policy_version 9068 (0.0016) -[2023-02-22 21:52:41,027][06183] Fps is (10 sec: 8192.4, 60 sec: 8533.4, 300 sec: 9316.7). Total num frames: 37146624. Throughput: 0: 2103.5. Samples: 8283356. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:52:41,033][06183] Avg episode reward: [(0, '4.294')] -[2023-02-22 21:52:45,541][28133] Updated weights for policy 0, policy_version 9078 (0.0017) -[2023-02-22 21:52:46,028][06183] Fps is (10 sec: 7782.6, 60 sec: 8396.8, 300 sec: 9344.4). Total num frames: 37183488. Throughput: 0: 2079.3. Samples: 8295530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:52:46,033][06183] Avg episode reward: [(0, '4.425')] -[2023-02-22 21:52:50,672][28133] Updated weights for policy 0, policy_version 9088 (0.0022) -[2023-02-22 21:52:51,027][06183] Fps is (10 sec: 7782.5, 60 sec: 8328.5, 300 sec: 9372.2). Total num frames: 37224448. Throughput: 0: 2074.7. Samples: 8301648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:52:51,030][06183] Avg episode reward: [(0, '4.417')] -[2023-02-22 21:52:55,802][28133] Updated weights for policy 0, policy_version 9098 (0.0018) -[2023-02-22 21:52:56,027][06183] Fps is (10 sec: 8192.3, 60 sec: 8328.7, 300 sec: 9400.0). Total num frames: 37265408. Throughput: 0: 2054.1. Samples: 8313614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:52:56,032][06183] Avg episode reward: [(0, '4.570')] -[2023-02-22 21:53:00,936][28133] Updated weights for policy 0, policy_version 9108 (0.0016) -[2023-02-22 21:53:01,027][06183] Fps is (10 sec: 8191.9, 60 sec: 8260.2, 300 sec: 9427.7). Total num frames: 37306368. Throughput: 0: 2038.3. Samples: 8325584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:53:01,031][06183] Avg episode reward: [(0, '4.612')] -[2023-02-22 21:53:06,022][28133] Updated weights for policy 0, policy_version 9118 (0.0022) -[2023-02-22 21:53:06,027][06183] Fps is (10 sec: 8192.0, 60 sec: 8260.3, 300 sec: 9455.5). Total num frames: 37347328. Throughput: 0: 2029.0. Samples: 8331500. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:53:06,030][06183] Avg episode reward: [(0, '4.461')] -[2023-02-22 21:53:11,027][06183] Fps is (10 sec: 7782.5, 60 sec: 8123.9, 300 sec: 9469.4). Total num frames: 37384192. Throughput: 0: 2013.6. Samples: 8343344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:53:11,033][06183] Avg episode reward: [(0, '4.325')] -[2023-02-22 21:53:11,245][28133] Updated weights for policy 0, policy_version 9128 (0.0025) -[2023-02-22 21:53:16,027][06183] Fps is (10 sec: 7782.3, 60 sec: 8123.7, 300 sec: 9511.1). Total num frames: 37425152. Throughput: 0: 1998.8. Samples: 8355058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:53:16,030][06183] Avg episode reward: [(0, '4.382')] -[2023-02-22 21:53:16,541][28133] Updated weights for policy 0, policy_version 9138 (0.0017) -[2023-02-22 21:53:21,027][06183] Fps is (10 sec: 7782.2, 60 sec: 7987.2, 300 sec: 9525.0). Total num frames: 37462016. Throughput: 0: 1990.3. Samples: 8360700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:53:21,031][06183] Avg episode reward: [(0, '4.558')] -[2023-02-22 21:53:21,941][28133] Updated weights for policy 0, policy_version 9148 (0.0023) -[2023-02-22 21:53:26,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7987.2, 300 sec: 9538.8). Total num frames: 37498880. Throughput: 0: 1971.4. Samples: 8372068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:53:26,031][06183] Avg episode reward: [(0, '4.429')] -[2023-02-22 21:53:27,300][28133] Updated weights for policy 0, policy_version 9158 (0.0024) -[2023-02-22 21:53:31,027][06183] Fps is (10 sec: 7372.7, 60 sec: 7850.7, 300 sec: 9552.7). Total num frames: 37535744. Throughput: 0: 1951.7. Samples: 8383358. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 21:53:31,032][06183] Avg episode reward: [(0, '4.413')] -[2023-02-22 21:53:32,830][28133] Updated weights for policy 0, policy_version 9168 (0.0020) -[2023-02-22 21:53:36,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7782.5, 300 sec: 9580.5). Total num frames: 37572608. Throughput: 0: 1940.1. Samples: 8388954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:53:36,032][06183] Avg episode reward: [(0, '4.412')] -[2023-02-22 21:53:38,318][28133] Updated weights for policy 0, policy_version 9178 (0.0021) -[2023-02-22 21:53:41,027][06183] Fps is (10 sec: 7373.0, 60 sec: 7714.1, 300 sec: 9594.4). Total num frames: 37609472. Throughput: 0: 1924.7. Samples: 8400224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:53:41,032][06183] Avg episode reward: [(0, '4.687')] -[2023-02-22 21:53:43,838][28133] Updated weights for policy 0, policy_version 9188 (0.0025) -[2023-02-22 21:53:46,027][06183] Fps is (10 sec: 7372.8, 60 sec: 7714.2, 300 sec: 9608.3). Total num frames: 37646336. Throughput: 0: 1905.2. Samples: 8411318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:53:46,031][06183] Avg episode reward: [(0, '4.552')] -[2023-02-22 21:53:49,436][28133] Updated weights for policy 0, policy_version 9198 (0.0024) -[2023-02-22 21:53:51,028][06183] Fps is (10 sec: 7782.0, 60 sec: 7714.1, 300 sec: 9649.9). Total num frames: 37687296. Throughput: 0: 1897.0. Samples: 8416864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:53:51,041][06183] Avg episode reward: [(0, '4.400')] -[2023-02-22 21:53:51,104][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009201_37687296.pth... 
-[2023-02-22 21:53:51,622][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008650_35430400.pth -[2023-02-22 21:53:55,049][28133] Updated weights for policy 0, policy_version 9208 (0.0023) -[2023-02-22 21:53:56,028][06183] Fps is (10 sec: 7372.4, 60 sec: 7577.5, 300 sec: 9663.8). Total num frames: 37720064. Throughput: 0: 1874.6. Samples: 8427700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:53:56,033][06183] Avg episode reward: [(0, '4.554')] -[2023-02-22 21:54:00,649][28133] Updated weights for policy 0, policy_version 9218 (0.0024) -[2023-02-22 21:54:01,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7509.3, 300 sec: 9678.1). Total num frames: 37756928. Throughput: 0: 1859.0. Samples: 8438712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:54:01,031][06183] Avg episode reward: [(0, '4.391')] -[2023-02-22 21:54:06,028][06183] Fps is (10 sec: 7373.0, 60 sec: 7441.0, 300 sec: 9705.4). Total num frames: 37793792. Throughput: 0: 1850.4. Samples: 8443970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:54:06,032][06183] Avg episode reward: [(0, '4.366')] -[2023-02-22 21:54:06,371][28133] Updated weights for policy 0, policy_version 9228 (0.0024) -[2023-02-22 21:54:11,028][06183] Fps is (10 sec: 7372.4, 60 sec: 7441.0, 300 sec: 9719.3). Total num frames: 37830656. Throughput: 0: 1836.6. Samples: 8454718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:54:11,033][06183] Avg episode reward: [(0, '4.368')] -[2023-02-22 21:54:12,117][28133] Updated weights for policy 0, policy_version 9238 (0.0027) -[2023-02-22 21:54:16,027][06183] Fps is (10 sec: 6963.5, 60 sec: 7304.5, 300 sec: 9733.2). Total num frames: 37863424. Throughput: 0: 1819.4. Samples: 8465230. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:54:16,032][06183] Avg episode reward: [(0, '4.476')] -[2023-02-22 21:54:17,969][28133] Updated weights for policy 0, policy_version 9248 (0.0026) -[2023-02-22 21:54:21,027][06183] Fps is (10 sec: 6963.7, 60 sec: 7304.6, 300 sec: 9594.4). Total num frames: 37900288. Throughput: 0: 1815.3. Samples: 8470642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:54:21,031][06183] Avg episode reward: [(0, '4.454')] -[2023-02-22 21:54:23,923][28133] Updated weights for policy 0, policy_version 9258 (0.0032) -[2023-02-22 21:54:26,027][06183] Fps is (10 sec: 6963.0, 60 sec: 7236.2, 300 sec: 9455.5). Total num frames: 37933056. Throughput: 0: 1793.6. Samples: 8480936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:54:26,031][06183] Avg episode reward: [(0, '4.452')] -[2023-02-22 21:54:29,828][28133] Updated weights for policy 0, policy_version 9268 (0.0024) -[2023-02-22 21:54:31,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7236.3, 300 sec: 9330.5). Total num frames: 37969920. Throughput: 0: 1778.4. Samples: 8491346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:54:31,031][06183] Avg episode reward: [(0, '4.647')] -[2023-02-22 21:54:35,766][28133] Updated weights for policy 0, policy_version 9278 (0.0029) -[2023-02-22 21:54:36,027][06183] Fps is (10 sec: 6963.4, 60 sec: 7168.0, 300 sec: 9205.6). Total num frames: 38002688. Throughput: 0: 1772.4. Samples: 8496622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:54:36,033][06183] Avg episode reward: [(0, '4.386')] -[2023-02-22 21:54:41,030][06183] Fps is (10 sec: 6551.6, 60 sec: 7099.4, 300 sec: 9080.5). Total num frames: 38035456. Throughput: 0: 1758.3. Samples: 8506828. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:54:41,038][06183] Avg episode reward: [(0, '4.384')] -[2023-02-22 21:54:41,653][28133] Updated weights for policy 0, policy_version 9288 (0.0027) -[2023-02-22 21:54:46,027][06183] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 8969.5). Total num frames: 38072320. Throughput: 0: 1744.3. Samples: 8517204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:54:46,032][06183] Avg episode reward: [(0, '4.297')] -[2023-02-22 21:54:47,583][28133] Updated weights for policy 0, policy_version 9298 (0.0025) -[2023-02-22 21:54:51,027][06183] Fps is (10 sec: 6965.3, 60 sec: 6963.2, 300 sec: 8844.6). Total num frames: 38105088. Throughput: 0: 1742.7. Samples: 8522390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:54:51,034][06183] Avg episode reward: [(0, '4.401')] -[2023-02-22 21:54:53,676][28133] Updated weights for policy 0, policy_version 9308 (0.0031) -[2023-02-22 21:54:56,028][06183] Fps is (10 sec: 6962.8, 60 sec: 7031.5, 300 sec: 8733.5). Total num frames: 38141952. Throughput: 0: 1729.6. Samples: 8532550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:54:56,034][06183] Avg episode reward: [(0, '4.335')] -[2023-02-22 21:54:59,602][28133] Updated weights for policy 0, policy_version 9318 (0.0020) -[2023-02-22 21:55:01,027][06183] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 8622.4). Total num frames: 38174720. Throughput: 0: 1721.4. Samples: 8542692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:55:01,038][06183] Avg episode reward: [(0, '4.527')] -[2023-02-22 21:55:05,729][28133] Updated weights for policy 0, policy_version 9328 (0.0037) -[2023-02-22 21:55:06,028][06183] Fps is (10 sec: 6553.7, 60 sec: 6894.9, 300 sec: 8525.2). Total num frames: 38207488. Throughput: 0: 1708.4. Samples: 8547520. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:55:06,034][06183] Avg episode reward: [(0, '4.401')] -[2023-02-22 21:55:11,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6826.7, 300 sec: 8428.0). Total num frames: 38240256. Throughput: 0: 1707.7. Samples: 8557782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 21:55:11,033][06183] Avg episode reward: [(0, '4.418')] -[2023-02-22 21:55:11,822][28133] Updated weights for policy 0, policy_version 9338 (0.0025) -[2023-02-22 21:55:16,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6826.6, 300 sec: 8330.8). Total num frames: 38273024. Throughput: 0: 1700.6. Samples: 8567874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:55:16,033][06183] Avg episode reward: [(0, '4.548')] -[2023-02-22 21:55:18,015][28133] Updated weights for policy 0, policy_version 9348 (0.0028) -[2023-02-22 21:55:21,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6758.3, 300 sec: 8247.5). Total num frames: 38305792. Throughput: 0: 1690.8. Samples: 8572708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:55:21,033][06183] Avg episode reward: [(0, '4.438')] -[2023-02-22 21:55:24,257][28133] Updated weights for policy 0, policy_version 9358 (0.0039) -[2023-02-22 21:55:26,027][06183] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 8164.2). Total num frames: 38338560. Throughput: 0: 1684.9. Samples: 8582644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:55:26,033][06183] Avg episode reward: [(0, '4.221')] -[2023-02-22 21:55:30,520][28133] Updated weights for policy 0, policy_version 9368 (0.0031) -[2023-02-22 21:55:31,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6690.1, 300 sec: 8080.9). Total num frames: 38371328. Throughput: 0: 1670.5. Samples: 8592376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:55:31,033][06183] Avg episode reward: [(0, '4.341')] -[2023-02-22 21:55:36,028][06183] Fps is (10 sec: 6553.4, 60 sec: 6690.1, 300 sec: 7997.6). Total num frames: 38404096. 
Throughput: 0: 1663.8. Samples: 8597262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:55:36,034][06183] Avg episode reward: [(0, '4.325')] -[2023-02-22 21:55:36,954][28133] Updated weights for policy 0, policy_version 9378 (0.0030) -[2023-02-22 21:55:41,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6690.4, 300 sec: 7928.2). Total num frames: 38436864. Throughput: 0: 1646.0. Samples: 8606618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:55:41,031][06183] Avg episode reward: [(0, '4.431')] -[2023-02-22 21:55:43,623][28133] Updated weights for policy 0, policy_version 9388 (0.0030) -[2023-02-22 21:55:46,028][06183] Fps is (10 sec: 6143.9, 60 sec: 6553.5, 300 sec: 7844.9). Total num frames: 38465536. Throughput: 0: 1624.1. Samples: 8615778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:55:46,035][06183] Avg episode reward: [(0, '4.454')] -[2023-02-22 21:55:50,051][28133] Updated weights for policy 0, policy_version 9398 (0.0033) -[2023-02-22 21:55:51,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 38498304. Throughput: 0: 1625.7. Samples: 8620674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 21:55:51,032][06183] Avg episode reward: [(0, '4.386')] -[2023-02-22 21:55:51,061][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009399_38498304.pth... -[2023-02-22 21:55:51,779][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008966_36724736.pth -[2023-02-22 21:55:56,028][06183] Fps is (10 sec: 5734.5, 60 sec: 6348.8, 300 sec: 7692.1). Total num frames: 38522880. Throughput: 0: 1577.2. Samples: 8628758. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:55:56,033][06183] Avg episode reward: [(0, '4.335')] -[2023-02-22 21:55:57,916][28133] Updated weights for policy 0, policy_version 9408 (0.0045) -[2023-02-22 21:56:01,028][06183] Fps is (10 sec: 5324.8, 60 sec: 6280.5, 300 sec: 7622.7). Total num frames: 38551552. Throughput: 0: 1532.1. Samples: 8636820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:56:01,033][06183] Avg episode reward: [(0, '4.471')] -[2023-02-22 21:56:05,306][28133] Updated weights for policy 0, policy_version 9418 (0.0027) -[2023-02-22 21:56:06,027][06183] Fps is (10 sec: 5734.6, 60 sec: 6212.3, 300 sec: 7539.4). Total num frames: 38580224. Throughput: 0: 1514.5. Samples: 8640860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:56:06,031][06183] Avg episode reward: [(0, '4.568')] -[2023-02-22 21:56:11,028][06183] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 7470.0). Total num frames: 38608896. Throughput: 0: 1505.0. Samples: 8650368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:11,033][06183] Avg episode reward: [(0, '4.422')] -[2023-02-22 21:56:11,869][28133] Updated weights for policy 0, policy_version 9428 (0.0027) -[2023-02-22 21:56:16,028][06183] Fps is (10 sec: 6143.8, 60 sec: 6144.0, 300 sec: 7414.4). Total num frames: 38641664. Throughput: 0: 1497.4. Samples: 8659758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:16,034][06183] Avg episode reward: [(0, '4.215')] -[2023-02-22 21:56:18,288][28133] Updated weights for policy 0, policy_version 9438 (0.0035) -[2023-02-22 21:56:21,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 7372.8). Total num frames: 38674432. Throughput: 0: 1495.8. Samples: 8664572. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:56:21,037][06183] Avg episode reward: [(0, '4.476')] -[2023-02-22 21:56:24,633][28133] Updated weights for policy 0, policy_version 9448 (0.0026) -[2023-02-22 21:56:26,027][06183] Fps is (10 sec: 6553.8, 60 sec: 6144.0, 300 sec: 7331.1). Total num frames: 38707200. Throughput: 0: 1503.1. Samples: 8674258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:26,034][06183] Avg episode reward: [(0, '4.427')] -[2023-02-22 21:56:30,947][28133] Updated weights for policy 0, policy_version 9458 (0.0028) -[2023-02-22 21:56:31,028][06183] Fps is (10 sec: 6553.5, 60 sec: 6144.0, 300 sec: 7289.5). Total num frames: 38739968. Throughput: 0: 1513.2. Samples: 8683870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:31,034][06183] Avg episode reward: [(0, '4.304')] -[2023-02-22 21:56:36,030][06183] Fps is (10 sec: 6552.2, 60 sec: 6143.8, 300 sec: 7247.8). Total num frames: 38772736. Throughput: 0: 1512.8. Samples: 8688752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:36,038][06183] Avg episode reward: [(0, '4.449')] -[2023-02-22 21:56:37,205][28133] Updated weights for policy 0, policy_version 9468 (0.0026) -[2023-02-22 21:56:41,028][06183] Fps is (10 sec: 6144.2, 60 sec: 6075.7, 300 sec: 7192.3). Total num frames: 38801408. Throughput: 0: 1548.4. Samples: 8698438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:41,033][06183] Avg episode reward: [(0, '4.192')] -[2023-02-22 21:56:43,794][28133] Updated weights for policy 0, policy_version 9478 (0.0028) -[2023-02-22 21:56:46,027][06183] Fps is (10 sec: 6145.4, 60 sec: 6144.1, 300 sec: 7150.6). Total num frames: 38834176. Throughput: 0: 1578.9. Samples: 8707870. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:46,032][06183] Avg episode reward: [(0, '4.384')] -[2023-02-22 21:56:50,204][28133] Updated weights for policy 0, policy_version 9488 (0.0025) -[2023-02-22 21:56:51,027][06183] Fps is (10 sec: 6554.0, 60 sec: 6144.0, 300 sec: 7122.9). Total num frames: 38866944. Throughput: 0: 1595.9. Samples: 8712674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:51,031][06183] Avg episode reward: [(0, '4.477')] -[2023-02-22 21:56:56,027][06183] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 7081.2). Total num frames: 38899712. Throughput: 0: 1598.3. Samples: 8722292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:56:56,033][06183] Avg episode reward: [(0, '4.381')] -[2023-02-22 21:56:56,598][28133] Updated weights for policy 0, policy_version 9498 (0.0029) -[2023-02-22 21:57:01,028][06183] Fps is (10 sec: 6143.7, 60 sec: 6280.5, 300 sec: 7039.6). Total num frames: 38928384. Throughput: 0: 1600.9. Samples: 8731798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:57:01,033][06183] Avg episode reward: [(0, '4.598')] -[2023-02-22 21:57:03,009][28133] Updated weights for policy 0, policy_version 9508 (0.0031) -[2023-02-22 21:57:06,028][06183] Fps is (10 sec: 6143.7, 60 sec: 6348.7, 300 sec: 6997.9). Total num frames: 38961152. Throughput: 0: 1598.9. Samples: 8736522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:57:06,033][06183] Avg episode reward: [(0, '4.410')] -[2023-02-22 21:57:09,580][28133] Updated weights for policy 0, policy_version 9518 (0.0033) -[2023-02-22 21:57:11,027][06183] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6970.1). Total num frames: 38993920. Throughput: 0: 1594.0. Samples: 8745988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:57:11,032][06183] Avg episode reward: [(0, '4.326')] -[2023-02-22 21:57:16,028][06183] Fps is (10 sec: 6144.2, 60 sec: 6348.8, 300 sec: 6914.6). Total num frames: 39022592. 
Throughput: 0: 1590.3. Samples: 8755432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:57:16,032][06183] Avg episode reward: [(0, '4.346')] -[2023-02-22 21:57:16,144][28133] Updated weights for policy 0, policy_version 9528 (0.0030) -[2023-02-22 21:57:21,027][06183] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6900.7). Total num frames: 39055360. Throughput: 0: 1583.7. Samples: 8760016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:57:21,032][06183] Avg episode reward: [(0, '4.170')] -[2023-02-22 21:57:22,747][28133] Updated weights for policy 0, policy_version 9538 (0.0031) -[2023-02-22 21:57:26,028][06183] Fps is (10 sec: 6143.7, 60 sec: 6280.5, 300 sec: 6845.2). Total num frames: 39084032. Throughput: 0: 1568.8. Samples: 8769036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:57:26,033][06183] Avg episode reward: [(0, '4.227')] -[2023-02-22 21:57:29,494][28133] Updated weights for policy 0, policy_version 9548 (0.0041) -[2023-02-22 21:57:31,028][06183] Fps is (10 sec: 6143.7, 60 sec: 6280.5, 300 sec: 6817.4). Total num frames: 39116800. Throughput: 0: 1569.6. Samples: 8778502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2023-02-22 21:57:31,034][06183] Avg episode reward: [(0, '4.584')] -[2023-02-22 21:57:34,187][28133] Updated weights for policy 0, policy_version 9558 (0.0017) -[2023-02-22 21:57:36,027][06183] Fps is (10 sec: 9421.6, 60 sec: 6758.7, 300 sec: 6886.8). Total num frames: 39178240. Throughput: 0: 1605.6. Samples: 8784926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:57:36,029][06183] Avg episode reward: [(0, '4.572')] -[2023-02-22 21:57:36,703][28133] Updated weights for policy 0, policy_version 9568 (0.0009) -[2023-02-22 21:57:39,510][28133] Updated weights for policy 0, policy_version 9578 (0.0013) -[2023-02-22 21:57:41,027][06183] Fps is (10 sec: 13927.4, 60 sec: 7577.7, 300 sec: 7025.7). Total num frames: 39256064. Throughput: 0: 1911.7. Samples: 8808318. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:57:41,029][06183] Avg episode reward: [(0, '4.504')] -[2023-02-22 21:57:41,977][28133] Updated weights for policy 0, policy_version 9588 (0.0010) -[2023-02-22 21:57:44,936][28133] Updated weights for policy 0, policy_version 9598 (0.0009) -[2023-02-22 21:57:46,027][06183] Fps is (10 sec: 14745.8, 60 sec: 8192.0, 300 sec: 7122.9). Total num frames: 39325696. Throughput: 0: 2194.3. Samples: 8830538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:57:46,029][06183] Avg episode reward: [(0, '4.337')] -[2023-02-22 21:57:47,568][28133] Updated weights for policy 0, policy_version 9608 (0.0010) -[2023-02-22 21:57:50,501][28133] Updated weights for policy 0, policy_version 9618 (0.0010) -[2023-02-22 21:57:51,027][06183] Fps is (10 sec: 14336.1, 60 sec: 8874.7, 300 sec: 7234.0). Total num frames: 39399424. Throughput: 0: 2345.5. Samples: 8842066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:57:51,030][06183] Avg episode reward: [(0, '4.410')] -[2023-02-22 21:57:51,049][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009619_39399424.pth... -[2023-02-22 21:57:51,311][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009201_37687296.pth -[2023-02-22 21:57:53,524][28133] Updated weights for policy 0, policy_version 9628 (0.0011) -[2023-02-22 21:57:56,027][06183] Fps is (10 sec: 14745.6, 60 sec: 9557.4, 300 sec: 7345.0). Total num frames: 39473152. Throughput: 0: 2598.8. Samples: 8862932. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) -[2023-02-22 21:57:56,029][06183] Avg episode reward: [(0, '4.281')] -[2023-02-22 21:57:56,276][28133] Updated weights for policy 0, policy_version 9638 (0.0010) -[2023-02-22 21:57:59,569][28133] Updated weights for policy 0, policy_version 9648 (0.0013) -[2023-02-22 21:58:01,027][06183] Fps is (10 sec: 13926.3, 60 sec: 10171.8, 300 sec: 7428.3). Total num frames: 39538688. Throughput: 0: 2843.5. Samples: 8883388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:58:01,029][06183] Avg episode reward: [(0, '4.239')] -[2023-02-22 21:58:02,116][28133] Updated weights for policy 0, policy_version 9658 (0.0009) -[2023-02-22 21:58:05,253][28133] Updated weights for policy 0, policy_version 9668 (0.0012) -[2023-02-22 21:58:06,027][06183] Fps is (10 sec: 13516.8, 60 sec: 10786.3, 300 sec: 7539.4). Total num frames: 39608320. Throughput: 0: 2987.5. Samples: 8894454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 21:58:06,029][06183] Avg episode reward: [(0, '4.626')] -[2023-02-22 21:58:08,211][28133] Updated weights for policy 0, policy_version 9678 (0.0011) -[2023-02-22 21:58:11,027][06183] Fps is (10 sec: 13926.5, 60 sec: 11400.6, 300 sec: 7636.6). Total num frames: 39677952. Throughput: 0: 3255.0. Samples: 8915508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 21:58:11,030][06183] Avg episode reward: [(0, '4.388')] -[2023-02-22 21:58:11,206][28133] Updated weights for policy 0, policy_version 9688 (0.0010) -[2023-02-22 21:58:14,658][28133] Updated weights for policy 0, policy_version 9698 (0.0012) -[2023-02-22 21:58:16,027][06183] Fps is (10 sec: 13516.7, 60 sec: 12015.0, 300 sec: 7733.8). Total num frames: 39743488. Throughput: 0: 3460.2. Samples: 8934210. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:58:16,029][06183] Avg episode reward: [(0, '4.423')] -[2023-02-22 21:58:17,502][28133] Updated weights for policy 0, policy_version 9708 (0.0012) -[2023-02-22 21:58:20,884][28133] Updated weights for policy 0, policy_version 9718 (0.0015) -[2023-02-22 21:58:21,027][06183] Fps is (10 sec: 12697.7, 60 sec: 12492.9, 300 sec: 7817.1). Total num frames: 39804928. Throughput: 0: 3537.6. Samples: 8944116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:58:21,029][06183] Avg episode reward: [(0, '4.401')] -[2023-02-22 21:58:23,712][28133] Updated weights for policy 0, policy_version 9728 (0.0011) -[2023-02-22 21:58:26,028][06183] Fps is (10 sec: 12696.9, 60 sec: 13107.3, 300 sec: 7914.3). Total num frames: 39870464. Throughput: 0: 3458.1. Samples: 8963936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:58:26,031][06183] Avg episode reward: [(0, '4.336')] -[2023-02-22 21:58:27,156][28133] Updated weights for policy 0, policy_version 9738 (0.0014) -[2023-02-22 21:58:30,590][28133] Updated weights for policy 0, policy_version 9748 (0.0011) -[2023-02-22 21:58:31,027][06183] Fps is (10 sec: 12697.5, 60 sec: 13585.2, 300 sec: 7997.6). Total num frames: 39931904. Throughput: 0: 3362.8. Samples: 8981864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:58:31,029][06183] Avg episode reward: [(0, '4.247')] -[2023-02-22 21:58:34,265][28133] Updated weights for policy 0, policy_version 9758 (0.0012) -[2023-02-22 21:58:36,027][06183] Fps is (10 sec: 11469.1, 60 sec: 13448.5, 300 sec: 8053.1). Total num frames: 39985152. Throughput: 0: 3291.8. Samples: 8990200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) -[2023-02-22 21:58:36,031][06183] Avg episode reward: [(0, '4.598')] -[2023-02-22 21:58:37,525][28120] Stopping Batcher_0... -[2023-02-22 21:58:37,532][28120] Loop batcher_evt_loop terminating... -[2023-02-22 21:58:37,526][06183] Component Batcher_0 stopped! 
-[2023-02-22 21:58:37,537][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... -[2023-02-22 21:58:37,598][06183] Component RolloutWorker_w1 stopped! -[2023-02-22 21:58:37,599][28142] Stopping RolloutWorker_w6... -[2023-02-22 21:58:37,598][28138] Stopping RolloutWorker_w1... -[2023-02-22 21:58:37,602][28142] Loop rollout_proc6_evt_loop terminating... -[2023-02-22 21:58:37,601][28146] Stopping RolloutWorker_w7... -[2023-02-22 21:58:37,601][28134] Stopping RolloutWorker_w0... -[2023-02-22 21:58:37,604][28138] Loop rollout_proc1_evt_loop terminating... -[2023-02-22 21:58:37,602][28141] Stopping RolloutWorker_w4... -[2023-02-22 21:58:37,602][06183] Component RolloutWorker_w5 stopped! -[2023-02-22 21:58:37,599][28163] Stopping RolloutWorker_w5... -[2023-02-22 21:58:37,605][28146] Loop rollout_proc7_evt_loop terminating... -[2023-02-22 21:58:37,605][28134] Loop rollout_proc0_evt_loop terminating... -[2023-02-22 21:58:37,603][28139] Stopping RolloutWorker_w2... -[2023-02-22 21:58:37,602][28140] Stopping RolloutWorker_w3... -[2023-02-22 21:58:37,607][28163] Loop rollout_proc5_evt_loop terminating... -[2023-02-22 21:58:37,607][28141] Loop rollout_proc4_evt_loop terminating... -[2023-02-22 21:58:37,608][28140] Loop rollout_proc3_evt_loop terminating... -[2023-02-22 21:58:37,608][28139] Loop rollout_proc2_evt_loop terminating... -[2023-02-22 21:58:37,607][06183] Component RolloutWorker_w6 stopped! -[2023-02-22 21:58:37,611][06183] Component RolloutWorker_w7 stopped! -[2023-02-22 21:58:37,614][06183] Component RolloutWorker_w3 stopped! -[2023-02-22 21:58:37,617][06183] Component RolloutWorker_w0 stopped! -[2023-02-22 21:58:37,619][06183] Component RolloutWorker_w2 stopped! -[2023-02-22 21:58:37,622][06183] Component RolloutWorker_w4 stopped! 
-[2023-02-22 21:58:37,826][28133] Weights refcount: 2 0 -[2023-02-22 21:58:37,851][28120] Removing /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009399_38498304.pth -[2023-02-22 21:58:37,864][28133] Stopping InferenceWorker_p0-w0... -[2023-02-22 21:58:37,866][28133] Loop inference_proc0-0_evt_loop terminating... -[2023-02-22 21:58:37,865][06183] Component InferenceWorker_p0-w0 stopped! -[2023-02-22 21:58:37,893][28120] Saving /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... -[2023-02-22 21:58:38,238][28120] Stopping LearnerWorker_p0... -[2023-02-22 21:58:38,238][06183] Component LearnerWorker_p0 stopped! -[2023-02-22 21:58:38,239][28120] Loop learner_proc0_evt_loop terminating... -[2023-02-22 21:58:38,241][06183] Waiting for process learner_proc0 to stop... -[2023-02-22 21:58:41,555][06183] Waiting for process inference_proc0-0 to join... -[2023-02-22 21:58:43,994][06183] Waiting for process rollout_proc0 to join... -[2023-02-22 21:58:44,000][06183] Waiting for process rollout_proc1 to join... -[2023-02-22 21:58:44,004][06183] Waiting for process rollout_proc2 to join... -[2023-02-22 21:58:44,012][06183] Waiting for process rollout_proc3 to join... -[2023-02-22 21:58:44,019][06183] Waiting for process rollout_proc4 to join... -[2023-02-22 21:58:44,030][06183] Waiting for process rollout_proc5 to join... -[2023-02-22 21:58:44,041][06183] Waiting for process rollout_proc6 to join... -[2023-02-22 21:58:44,047][06183] Waiting for process rollout_proc7 to join... 
-[2023-02-22 21:58:44,055][06183] Batcher 0 profile tree view: -batching: 185.3369, releasing_batches: 0.6314 -[2023-02-22 21:58:44,060][06183] InferenceWorker_p0-w0 profile tree view: -wait_policy: 0.0000 - wait_policy_total: 42.0737 -update_model: 65.3219 - weight_update: 0.0016 -one_step: 0.0044 - handle_policy_step: 4167.8213 - deserialize: 119.1477, stack: 21.5758, obs_to_device_normalize: 1076.4030, forward: 1745.5604, send_messages: 319.2566 - prepare_outputs: 743.0808 - to_cpu: 577.7572 -[2023-02-22 21:58:44,066][06183] Learner 0 profile tree view: -misc: 0.0848, prepare_batch: 216.1311 -train: 781.9580 - epoch_init: 0.0762, minibatch_init: 0.0901, losses_postprocess: 8.2760, kl_divergence: 8.7140, after_optimizer: 21.2589 - calculate_losses: 229.2631 - losses_init: 0.0416, forward_head: 17.0756, bptt_initial: 148.0398, tail: 10.1427, advantages_returns: 2.9037, losses: 26.7282 - bptt: 21.4703 - bptt_forward_core: 20.5585 - update: 508.6566 - clip: 22.7779 -[2023-02-22 21:58:44,073][06183] RolloutWorker_w0 profile tree view: -wait_for_trajectories: 2.0629, enqueue_policy_requests: 117.4843, env_step: 2588.7147, overhead: 186.7985, complete_rollouts: 4.4749 -save_policy_outputs: 149.9693 - split_output_tensors: 72.7159 -[2023-02-22 21:58:44,077][06183] RolloutWorker_w7 profile tree view: -wait_for_trajectories: 2.0370, enqueue_policy_requests: 117.3589, env_step: 2582.3587, overhead: 186.3442, complete_rollouts: 4.4831 -save_policy_outputs: 147.1610 - split_output_tensors: 70.9204 -[2023-02-22 21:58:44,083][06183] Loop Runner_EvtLoop terminating... 
-[2023-02-22 21:58:44,087][06183] Runner profile tree view: -main_loop: 4487.8837 -[2023-02-22 21:58:44,093][06183] Collected {0: 40005632}, FPS: 8021.5 -[2023-02-23 09:20:32,896][06183] Loading existing experiment configuration from /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2023-02-23 09:20:32,901][06183] Overriding arg 'num_workers' with value 1 passed from command line -[2023-02-23 09:20:32,902][06183] Adding new argument 'no_render'=True that is not in the saved config file! -[2023-02-23 09:20:32,904][06183] Adding new argument 'save_video'=True that is not in the saved config file! -[2023-02-23 09:20:32,905][06183] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2023-02-23 09:20:32,907][06183] Adding new argument 'video_name'=None that is not in the saved config file! -[2023-02-23 09:20:32,908][06183] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2023-02-23 09:20:32,910][06183] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2023-02-23 09:20:32,911][06183] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2023-02-23 09:20:32,913][06183] Adding new argument 'hf_repository'='chqmatteo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2023-02-23 09:20:32,914][06183] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2023-02-23 09:20:32,916][06183] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2023-02-23 09:20:32,918][06183] Adding new argument 'train_script'=None that is not in the saved config file! -[2023-02-23 09:20:32,920][06183] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
-[2023-02-23 09:20:32,921][06183] Using frameskip 1 and render_action_repeat=4 for evaluation -[2023-02-23 09:20:33,006][06183] RunningMeanStd input shape: (3, 72, 128) -[2023-02-23 09:20:33,013][06183] RunningMeanStd input shape: (1,) -[2023-02-23 09:20:33,089][06183] ConvEncoder: input_channels=3 -[2023-02-23 09:20:33,200][06183] Conv encoder output size: 512 -[2023-02-23 09:20:33,204][06183] Policy head output size: 512 -[2023-02-23 09:20:33,714][06183] Loading state from checkpoint /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... -[2023-02-23 09:20:36,038][06183] Num frames 100... -[2023-02-23 09:20:36,223][06183] Num frames 200... -[2023-02-23 09:20:36,507][06183] Num frames 300... -[2023-02-23 09:20:36,729][06183] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2023-02-23 09:20:36,731][06183] Avg episode reward: 3.840, avg true_objective: 3.840 -[2023-02-23 09:20:36,766][06183] Num frames 400... -[2023-02-23 09:20:36,957][06183] Num frames 500... -[2023-02-23 09:20:37,146][06183] Num frames 600... -[2023-02-23 09:20:37,340][06183] Num frames 700... -[2023-02-23 09:20:37,521][06183] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2023-02-23 09:20:37,526][06183] Avg episode reward: 3.840, avg true_objective: 3.840 -[2023-02-23 09:20:37,589][06183] Num frames 800... -[2023-02-23 09:20:37,776][06183] Num frames 900... -[2023-02-23 09:20:38,004][06183] Num frames 1000... -[2023-02-23 09:20:38,289][06183] Num frames 1100... -[2023-02-23 09:20:38,472][06183] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2023-02-23 09:20:38,476][06183] Avg episode reward: 3.840, avg true_objective: 3.840 -[2023-02-23 09:20:38,574][06183] Num frames 1200... -[2023-02-23 09:20:38,844][06183] Num frames 1300... -[2023-02-23 09:20:39,105][06183] Num frames 1400... -[2023-02-23 09:20:39,347][06183] Num frames 1500... 
-[2023-02-23 09:20:39,530][06183] Avg episode rewards: #0: 3.920, true rewards: #0: 3.920 -[2023-02-23 09:20:39,533][06183] Avg episode reward: 3.920, avg true_objective: 3.920 -[2023-02-23 09:20:39,598][06183] Num frames 1600... -[2023-02-23 09:20:39,790][06183] Num frames 1700... -[2023-02-23 09:20:39,971][06183] Num frames 1800... -[2023-02-23 09:20:40,185][06183] Num frames 1900... -[2023-02-23 09:20:40,338][06183] Avg episode rewards: #0: 3.904, true rewards: #0: 3.904 -[2023-02-23 09:20:40,343][06183] Avg episode reward: 3.904, avg true_objective: 3.904 -[2023-02-23 09:20:40,451][06183] Num frames 2000... -[2023-02-23 09:20:40,645][06183] Num frames 2100... -[2023-02-23 09:20:40,838][06183] Num frames 2200... -[2023-02-23 09:20:41,049][06183] Num frames 2300... -[2023-02-23 09:20:41,176][06183] Avg episode rewards: #0: 3.893, true rewards: #0: 3.893 -[2023-02-23 09:20:41,179][06183] Avg episode reward: 3.893, avg true_objective: 3.893 -[2023-02-23 09:20:41,333][06183] Num frames 2400... -[2023-02-23 09:20:41,515][06183] Num frames 2500... -[2023-02-23 09:20:41,711][06183] Num frames 2600... -[2023-02-23 09:20:41,890][06183] Num frames 2700... -[2023-02-23 09:20:41,988][06183] Avg episode rewards: #0: 3.886, true rewards: #0: 3.886 -[2023-02-23 09:20:41,991][06183] Avg episode reward: 3.886, avg true_objective: 3.886 -[2023-02-23 09:20:42,177][06183] Num frames 2800... -[2023-02-23 09:20:42,381][06183] Num frames 2900... -[2023-02-23 09:20:42,567][06183] Num frames 3000... -[2023-02-23 09:20:42,794][06183] Num frames 3100... -[2023-02-23 09:20:42,975][06183] Avg episode rewards: #0: 4.085, true rewards: #0: 3.960 -[2023-02-23 09:20:42,979][06183] Avg episode reward: 4.085, avg true_objective: 3.960 -[2023-02-23 09:20:43,049][06183] Num frames 3200... -[2023-02-23 09:20:43,223][06183] Num frames 3300... -[2023-02-23 09:20:43,394][06183] Num frames 3400... -[2023-02-23 09:20:43,573][06183] Num frames 3500... 
-[2023-02-23 09:20:43,741][06183] Avg episode rewards: #0: 4.058, true rewards: #0: 3.947 -[2023-02-23 09:20:43,744][06183] Avg episode reward: 4.058, avg true_objective: 3.947 -[2023-02-23 09:20:43,839][06183] Num frames 3600... -[2023-02-23 09:20:44,030][06183] Num frames 3700... -[2023-02-23 09:20:44,210][06183] Num frames 3800... -[2023-02-23 09:20:44,392][06183] Num frames 3900... -[2023-02-23 09:20:44,515][06183] Avg episode rewards: #0: 4.036, true rewards: #0: 3.936 -[2023-02-23 09:20:44,517][06183] Avg episode reward: 4.036, avg true_objective: 3.936 -[2023-02-23 09:20:46,216][06183] Replay video saved to /mnt/c/Users/chqma/projects/ai/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2023-02-24 07:56:04,122][794019] Using optimizer +[2023-02-24 07:56:04,122][794019] No checkpoints found +[2023-02-24 07:56:04,122][794019] Did not load from checkpoint, starting from scratch! +[2023-02-24 07:56:04,122][794019] Initialized policy 0 weights for model version 0 +[2023-02-24 07:56:04,124][794019] LearnerWorker_p0 finished initialization! +[2023-02-24 07:56:04,124][794019] Using GPUs [0] for process 0 (actually maps to GPUs [1]) +[2023-02-24 07:56:05,229][794032] RunningMeanStd input shape: (3, 72, 128) +[2023-02-24 07:56:05,229][794032] RunningMeanStd input shape: (1,) +[2023-02-24 07:56:05,237][794032] ConvEncoder: input_channels=3 +[2023-02-24 07:56:05,307][794032] Conv encoder output size: 512 +[2023-02-24 07:56:05,307][794032] Policy head output size: 512 +[2023-02-24 07:56:06,350][784615] Inference worker 0-0 is ready! +[2023-02-24 07:56:06,350][784615] All inference workers are ready! Signal rollout workers to start! 
+[2023-02-24 07:56:06,367][794036] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,368][794039] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,368][794034] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,368][794035] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,369][794038] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,374][794040] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,390][794037] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,393][794033] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 07:56:06,669][794035] Decorrelating experience for 0 frames... +[2023-02-24 07:56:06,672][794039] Decorrelating experience for 0 frames... +[2023-02-24 07:56:06,678][794036] Decorrelating experience for 0 frames... +[2023-02-24 07:56:06,681][794037] Decorrelating experience for 0 frames... +[2023-02-24 07:56:06,683][794040] Decorrelating experience for 0 frames... +[2023-02-24 07:56:06,929][794035] Decorrelating experience for 32 frames... +[2023-02-24 07:56:06,946][794040] Decorrelating experience for 32 frames... +[2023-02-24 07:56:06,958][794036] Decorrelating experience for 32 frames... +[2023-02-24 07:56:06,977][794037] Decorrelating experience for 32 frames... +[2023-02-24 07:56:06,986][794038] Decorrelating experience for 0 frames... +[2023-02-24 07:56:06,997][794039] Decorrelating experience for 32 frames... +[2023-02-24 07:56:07,033][794033] Decorrelating experience for 0 frames... +[2023-02-24 07:56:07,227][794038] Decorrelating experience for 32 frames... +[2023-02-24 07:56:07,268][794034] Decorrelating experience for 0 frames... +[2023-02-24 07:56:07,269][794036] Decorrelating experience for 64 frames... +[2023-02-24 07:56:07,278][794037] Decorrelating experience for 64 frames... 
+[2023-02-24 07:56:07,293][794033] Decorrelating experience for 32 frames...
+[2023-02-24 07:56:07,502][794038] Decorrelating experience for 64 frames...
+[2023-02-24 07:56:07,545][794039] Decorrelating experience for 64 frames...
+[2023-02-24 07:56:07,584][794036] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:07,591][794033] Decorrelating experience for 64 frames...
+[2023-02-24 07:56:07,594][794040] Decorrelating experience for 64 frames...
+[2023-02-24 07:56:07,600][794037] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:07,628][794034] Decorrelating experience for 32 frames...
+[2023-02-24 07:56:07,822][794033] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:07,846][794039] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:07,890][794038] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:07,895][794040] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:08,065][794035] Decorrelating experience for 64 frames...
+[2023-02-24 07:56:08,153][784615] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-24 07:56:08,319][794034] Decorrelating experience for 64 frames...
+[2023-02-24 07:56:08,572][794035] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:08,577][794034] Decorrelating experience for 96 frames...
+[2023-02-24 07:56:10,130][794019] Signal inference workers to stop experience collection...
+[2023-02-24 07:56:10,137][794032] InferenceWorker_p0-w0: stopping experience collection
+[2023-02-24 07:56:12,636][794019] Signal inference workers to resume experience collection...
+[2023-02-24 07:56:12,637][794032] InferenceWorker_p0-w0: resuming experience collection
+[2023-02-24 07:56:13,153][784615] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4096. Throughput: 0: 494.8. Samples: 2474. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-02-24 07:56:13,154][784615] Avg episode reward: [(0, '2.494')]
+[2023-02-24 07:56:15,144][794032] Updated weights for policy 0, policy_version 10 (0.0249)
+[2023-02-24 07:56:17,664][794032] Updated weights for policy 0, policy_version 20 (0.0006)
+[2023-02-24 07:56:18,153][784615] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8601.6). Total num frames: 86016. Throughput: 0: 1166.6. Samples: 11666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 07:56:18,154][784615] Avg episode reward: [(0, '4.437')]
+[2023-02-24 07:56:19,351][784615] Heartbeat connected on Batcher_0
+[2023-02-24 07:56:19,354][784615] Heartbeat connected on LearnerWorker_p0
+[2023-02-24 07:56:19,362][784615] Heartbeat connected on InferenceWorker_p0-w0
+[2023-02-24 07:56:19,364][784615] Heartbeat connected on RolloutWorker_w0
+[2023-02-24 07:56:19,365][784615] Heartbeat connected on RolloutWorker_w1
+[2023-02-24 07:56:19,368][784615] Heartbeat connected on RolloutWorker_w3
+[2023-02-24 07:56:19,373][784615] Heartbeat connected on RolloutWorker_w4
+[2023-02-24 07:56:19,374][784615] Heartbeat connected on RolloutWorker_w5
+[2023-02-24 07:56:19,375][784615] Heartbeat connected on RolloutWorker_w2
+[2023-02-24 07:56:19,377][784615] Heartbeat connected on RolloutWorker_w7
+[2023-02-24 07:56:19,378][784615] Heartbeat connected on RolloutWorker_w6
+[2023-02-24 07:56:20,182][794032] Updated weights for policy 0, policy_version 30 (0.0006)
+[2023-02-24 07:56:22,465][794032] Updated weights for policy 0, policy_version 40 (0.0007)
+[2023-02-24 07:56:23,153][784615] Fps is (10 sec: 16793.7, 60 sec: 11468.8, 300 sec: 11468.8). Total num frames: 172032. Throughput: 0: 2462.0. Samples: 36930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2023-02-24 07:56:23,154][784615] Avg episode reward: [(0, '4.501')]
+[2023-02-24 07:56:23,155][794019] Saving new best policy, reward=4.501!
+[2023-02-24 07:56:25,053][794032] Updated weights for policy 0, policy_version 50 (0.0007)
+[2023-02-24 07:56:27,490][794032] Updated weights for policy 0, policy_version 60 (0.0007)
+[2023-02-24 07:56:28,153][784615] Fps is (10 sec: 16793.6, 60 sec: 12697.6, 300 sec: 12697.6). Total num frames: 253952. Throughput: 0: 3086.0. Samples: 61720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2023-02-24 07:56:28,154][784615] Avg episode reward: [(0, '4.402')]
+[2023-02-24 07:56:30,062][794032] Updated weights for policy 0, policy_version 70 (0.0007)
+[2023-02-24 07:56:32,640][794032] Updated weights for policy 0, policy_version 80 (0.0007)
+[2023-02-24 07:56:33,153][784615] Fps is (10 sec: 15974.4, 60 sec: 13271.1, 300 sec: 13271.1). Total num frames: 331776. Throughput: 0: 2951.7. Samples: 73792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2023-02-24 07:56:33,154][784615] Avg episode reward: [(0, '4.289')]
+[2023-02-24 07:56:35,246][794032] Updated weights for policy 0, policy_version 90 (0.0007)
+[2023-02-24 07:56:37,855][794032] Updated weights for policy 0, policy_version 100 (0.0008)
+[2023-02-24 07:56:38,153][784615] Fps is (10 sec: 15974.5, 60 sec: 13789.9, 300 sec: 13789.9). Total num frames: 413696. Throughput: 0: 3249.9. Samples: 97496. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2023-02-24 07:56:38,154][784615] Avg episode reward: [(0, '4.457')]
+[2023-02-24 07:56:40,454][794032] Updated weights for policy 0, policy_version 110 (0.0007)
+[2023-02-24 07:56:43,061][794032] Updated weights for policy 0, policy_version 120 (0.0006)
+[2023-02-24 07:56:43,153][784615] Fps is (10 sec: 15973.9, 60 sec: 14043.3, 300 sec: 14043.3). Total num frames: 491520. Throughput: 0: 3461.9. Samples: 121166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2023-02-24 07:56:43,154][784615] Avg episode reward: [(0, '4.401')]
+[2023-02-24 07:56:45,699][794032] Updated weights for policy 0, policy_version 130 (0.0007)
+[2023-02-24 07:56:48,153][784615] Fps is (10 sec: 15564.7, 60 sec: 14233.6, 300 sec: 14233.6). Total num frames: 569344. Throughput: 0: 3324.0. Samples: 132960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2023-02-24 07:56:48,154][784615] Avg episode reward: [(0, '4.667')]
+[2023-02-24 07:56:48,157][794019] Saving new best policy, reward=4.667!
+[2023-02-24 07:56:48,334][794032] Updated weights for policy 0, policy_version 140 (0.0007)
+[2023-02-24 07:56:50,891][794032] Updated weights for policy 0, policy_version 150 (0.0006)
+[2023-02-24 07:56:53,153][784615] Fps is (10 sec: 15565.3, 60 sec: 14381.5, 300 sec: 14381.5). Total num frames: 647168. Throughput: 0: 3474.6. Samples: 156358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2023-02-24 07:56:53,154][784615] Avg episode reward: [(0, '4.622')]
+[2023-02-24 07:56:53,523][794032] Updated weights for policy 0, policy_version 160 (0.0008)
+[2023-02-24 07:56:56,158][794032] Updated weights for policy 0, policy_version 170 (0.0007)
+[2023-02-24 07:56:58,153][784615] Fps is (10 sec: 15565.0, 60 sec: 14499.9, 300 sec: 14499.9). Total num frames: 724992. Throughput: 0: 3940.0. Samples: 179774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2023-02-24 07:56:58,153][784615] Avg episode reward: [(0, '4.603')]
+[2023-02-24 07:56:58,738][794032] Updated weights for policy 0, policy_version 180 (0.0008)
+[2023-02-24 07:57:01,298][794032] Updated weights for policy 0, policy_version 190 (0.0007)
+[2023-02-24 07:57:03,153][784615] Fps is (10 sec: 15974.3, 60 sec: 14671.1, 300 sec: 14671.1). Total num frames: 806912. Throughput: 0: 3998.0. Samples: 191576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-24 07:57:03,154][784615] Avg episode reward: [(0, '4.850')]
+[2023-02-24 07:57:03,155][794019] Saving new best policy, reward=4.850!
+[2023-02-24 07:57:03,931][794032] Updated weights for policy 0, policy_version 200 (0.0007)
+[2023-02-24 07:57:06,517][794032] Updated weights for policy 0, policy_version 210 (0.0007)
+[2023-02-24 07:57:08,153][784615] Fps is (10 sec: 15974.2, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 884736. Throughput: 0: 3958.6. Samples: 215068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 07:57:08,154][784615] Avg episode reward: [(0, '5.203')]
+[2023-02-24 07:57:08,156][794019] Saving new best policy, reward=5.203!
+[2023-02-24 07:57:09,207][794032] Updated weights for policy 0, policy_version 220 (0.0006)
+[2023-02-24 07:57:11,788][794032] Updated weights for policy 0, policy_version 230 (0.0006)
+[2023-02-24 07:57:13,153][784615] Fps is (10 sec: 15564.9, 60 sec: 15974.4, 300 sec: 14808.6). Total num frames: 962560. Throughput: 0: 3932.8. Samples: 238698. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2023-02-24 07:57:13,154][784615] Avg episode reward: [(0, '4.831')]
+[2023-02-24 07:57:14,400][794032] Updated weights for policy 0, policy_version 240 (0.0006)
+[2023-02-24 07:57:17,007][794032] Updated weights for policy 0, policy_version 250 (0.0008)
+[2023-02-24 07:57:18,153][784615] Fps is (10 sec: 15564.8, 60 sec: 15906.1, 300 sec: 14862.6). Total num frames: 1040384. Throughput: 0: 3928.0. Samples: 250550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
+[2023-02-24 07:57:18,154][784615] Avg episode reward: [(0, '6.010')]
+[2023-02-24 07:57:18,157][794019] Saving new best policy, reward=6.010!
+[2023-02-24 07:57:19,591][794032] Updated weights for policy 0, policy_version 260 (0.0006)
+[2023-02-24 07:57:21,628][784615] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 784615], exiting...
+[2023-02-24 07:57:21,629][794019] Stopping Batcher_0...
+[2023-02-24 07:57:21,629][794019] Loop batcher_evt_loop terminating...
+[2023-02-24 07:57:21,628][784615] Runner profile tree view:
+main_loop: 82.2510
+[2023-02-24 07:57:21,630][784615] Collected {0: 1093632}, FPS: 13296.3
+[2023-02-24 07:57:21,638][794039] Stopping RolloutWorker_w6...
+[2023-02-24 07:57:21,638][794035] Stopping RolloutWorker_w3...
+[2023-02-24 07:57:21,638][794035] Loop rollout_proc3_evt_loop terminating...
+[2023-02-24 07:57:21,638][794034] Stopping RolloutWorker_w1...
+[2023-02-24 07:57:21,639][794039] Loop rollout_proc6_evt_loop terminating...
+[2023-02-24 07:57:21,639][794034] Loop rollout_proc1_evt_loop terminating...
+[2023-02-24 07:57:21,645][794033] Stopping RolloutWorker_w0...
+[2023-02-24 07:57:21,646][794033] Loop rollout_proc0_evt_loop terminating...
+[2023-02-24 07:57:21,646][794037] Stopping RolloutWorker_w4...
+[2023-02-24 07:57:21,647][794037] Loop rollout_proc4_evt_loop terminating...
+[2023-02-24 07:57:21,647][794038] Stopping RolloutWorker_w2...
+[2023-02-24 07:57:21,648][794038] Loop rollout_proc2_evt_loop terminating...
+[2023-02-24 07:57:21,651][794036] Stopping RolloutWorker_w5...
+[2023-02-24 07:57:21,651][794036] Loop rollout_proc5_evt_loop terminating...
+[2023-02-24 07:57:21,652][794040] Stopping RolloutWorker_w7...
+[2023-02-24 07:57:21,653][794040] Loop rollout_proc7_evt_loop terminating...
+[2023-02-24 07:57:21,667][794019] Saving /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
+[2023-02-24 07:57:21,688][794032] Weights refcount: 2 0
+[2023-02-24 07:57:21,695][794032] Stopping InferenceWorker_p0-w0...
+[2023-02-24 07:57:21,696][794032] Loop inference_proc0-0_evt_loop terminating...
+[2023-02-24 07:57:21,825][794019] Stopping LearnerWorker_p0...
+[2023-02-24 07:57:21,826][794019] Loop learner_proc0_evt_loop terminating...
+[2023-02-24 07:57:37,644][784615] Loading existing experiment configuration from /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2023-02-24 07:57:37,644][784615] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-24 07:57:37,645][784615] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-24 07:57:37,645][784615] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-24 07:57:37,645][784615] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-24 07:57:37,646][784615] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-24 07:57:37,646][784615] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-02-24 07:57:37,646][784615] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-24 07:57:37,647][784615] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-02-24 07:57:37,647][784615] Adding new argument 'hf_repository'='chqmatteo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-02-24 07:57:37,647][784615] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-24 07:57:37,648][784615] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-24 07:57:37,648][784615] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-24 07:57:37,648][784615] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-24 07:57:37,649][784615] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-24 07:57:37,655][784615] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 07:57:37,656][784615] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-24 07:57:37,657][784615] RunningMeanStd input shape: (1,)
+[2023-02-24 07:57:37,665][784615] ConvEncoder: input_channels=3
+[2023-02-24 07:57:37,755][784615] Conv encoder output size: 512
+[2023-02-24 07:57:37,755][784615] Policy head output size: 512
+[2023-02-24 07:57:40,369][784615] Loading state from checkpoint /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
+[2023-02-24 07:57:42,531][784615] Num frames 100...
+[2023-02-24 07:57:42,595][784615] Num frames 200...
+[2023-02-24 07:57:42,663][784615] Num frames 300...
+[2023-02-24 07:57:42,732][784615] Num frames 400...
+[2023-02-24 07:57:42,832][784615] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800
+[2023-02-24 07:57:42,833][784615] Avg episode reward: 6.800, avg true_objective: 4.800
+[2023-02-24 07:57:42,848][784615] Num frames 500...
+[2023-02-24 07:57:42,911][784615] Num frames 600...
+[2023-02-24 07:57:42,969][784615] Num frames 700...
+[2023-02-24 07:57:43,043][784615] Num frames 800...
+[2023-02-24 07:57:43,113][784615] Avg episode rewards: #0: 5.630, true rewards: #0: 4.130
+[2023-02-24 07:57:43,113][784615] Avg episode reward: 5.630, avg true_objective: 4.130
+[2023-02-24 07:57:43,175][784615] Num frames 900...
+[2023-02-24 07:57:43,244][784615] Num frames 1000...
+[2023-02-24 07:57:43,309][784615] Num frames 1100...
+[2023-02-24 07:57:43,371][784615] Num frames 1200...
+[2023-02-24 07:57:43,470][784615] Avg episode rewards: #0: 5.580, true rewards: #0: 4.247
+[2023-02-24 07:57:43,471][784615] Avg episode reward: 5.580, avg true_objective: 4.247
+[2023-02-24 07:57:43,487][784615] Num frames 1300...
+[2023-02-24 07:57:43,564][784615] Num frames 1400...
+[2023-02-24 07:57:43,628][784615] Num frames 1500...
+[2023-02-24 07:57:43,689][784615] Num frames 1600...
+[2023-02-24 07:57:43,781][784615] Avg episode rewards: #0: 5.145, true rewards: #0: 4.145
+[2023-02-24 07:57:43,781][784615] Avg episode reward: 5.145, avg true_objective: 4.145
+[2023-02-24 07:57:43,817][784615] Num frames 1700...
+[2023-02-24 07:57:43,887][784615] Num frames 1800...
+[2023-02-24 07:57:43,956][784615] Num frames 1900...
+[2023-02-24 07:57:44,024][784615] Num frames 2000...
+[2023-02-24 07:57:44,089][784615] Num frames 2100...
+[2023-02-24 07:57:44,147][784615] Avg episode rewards: #0: 5.612, true rewards: #0: 4.212
+[2023-02-24 07:57:44,148][784615] Avg episode reward: 5.612, avg true_objective: 4.212
+[2023-02-24 07:57:44,208][784615] Num frames 2200...
+[2023-02-24 07:57:44,288][784615] Num frames 2300...
+[2023-02-24 07:57:44,350][784615] Num frames 2400...
+[2023-02-24 07:57:44,463][784615] Avg episode rewards: #0: 5.317, true rewards: #0: 4.150
+[2023-02-24 07:57:44,463][784615] Avg episode reward: 5.317, avg true_objective: 4.150
+[2023-02-24 07:57:44,471][784615] Num frames 2500...
+[2023-02-24 07:57:44,540][784615] Num frames 2600...
+[2023-02-24 07:57:44,607][784615] Num frames 2700...
+[2023-02-24 07:57:44,671][784615] Num frames 2800...
+[2023-02-24 07:57:44,738][784615] Num frames 2900...
+[2023-02-24 07:57:44,810][784615] Num frames 3000...
+[2023-02-24 07:57:44,865][784615] Avg episode rewards: #0: 5.717, true rewards: #0: 4.289
+[2023-02-24 07:57:44,866][784615] Avg episode reward: 5.717, avg true_objective: 4.289
+[2023-02-24 07:57:44,925][784615] Num frames 3100...
+[2023-02-24 07:57:44,988][784615] Num frames 3200...
+[2023-02-24 07:57:45,057][784615] Num frames 3300...
+[2023-02-24 07:57:45,126][784615] Num frames 3400...
+[2023-02-24 07:57:45,200][784615] Num frames 3500...
+[2023-02-24 07:57:45,287][784615] Avg episode rewards: #0: 5.933, true rewards: #0: 4.432
+[2023-02-24 07:57:45,287][784615] Avg episode reward: 5.933, avg true_objective: 4.432
+[2023-02-24 07:57:45,332][784615] Num frames 3600...
+[2023-02-24 07:57:45,410][784615] Num frames 3700...
+[2023-02-24 07:57:45,468][784615] Num frames 3800...
+[2023-02-24 07:57:45,570][784615] Num frames 3900...
+[2023-02-24 07:57:45,634][784615] Num frames 4000...
+[2023-02-24 07:57:45,698][784615] Num frames 4100...
+[2023-02-24 07:57:45,762][784615] Num frames 4200...
+[2023-02-24 07:57:45,873][784615] Avg episode rewards: #0: 6.869, true rewards: #0: 4.758
+[2023-02-24 07:57:45,874][784615] Avg episode reward: 6.869, avg true_objective: 4.758
+[2023-02-24 07:57:45,886][784615] Num frames 4300...
+[2023-02-24 07:57:45,952][784615] Num frames 4400...
+[2023-02-24 07:57:46,014][784615] Num frames 4500...
+[2023-02-24 07:57:46,087][784615] Num frames 4600...
+[2023-02-24 07:57:46,149][784615] Num frames 4700...
+[2023-02-24 07:57:46,224][784615] Avg episode rewards: #0: 6.730, true rewards: #0: 4.730
+[2023-02-24 07:57:46,225][784615] Avg episode reward: 6.730, avg true_objective: 4.730
+[2023-02-24 07:57:48,353][784615] Replay video saved to /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
+[2023-02-24 07:58:39,896][784615] Loading existing experiment configuration from /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2023-02-24 07:58:39,896][784615] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-24 07:58:39,897][784615] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-24 07:58:39,897][784615] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-24 07:58:39,898][784615] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-24 07:58:39,898][784615] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-24 07:58:39,899][784615] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-02-24 07:58:39,899][784615] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-24 07:58:39,900][784615] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-02-24 07:58:39,900][784615] Adding new argument 'hf_repository'='chqmatteo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-02-24 07:58:39,900][784615] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-24 07:58:39,901][784615] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-24 07:58:39,901][784615] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-24 07:58:39,902][784615] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-24 07:58:39,902][784615] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-24 07:58:39,911][784615] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-24 07:58:39,912][784615] RunningMeanStd input shape: (1,)
+[2023-02-24 07:58:39,919][784615] ConvEncoder: input_channels=3
+[2023-02-24 07:58:39,943][784615] Conv encoder output size: 512
+[2023-02-24 07:58:39,944][784615] Policy head output size: 512
+[2023-02-24 07:58:39,980][784615] Loading state from checkpoint /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
+[2023-02-24 07:58:40,400][784615] Num frames 100...
+[2023-02-24 07:58:40,470][784615] Num frames 200...
+[2023-02-24 07:58:40,530][784615] Num frames 300...
+[2023-02-24 07:58:40,596][784615] Num frames 400...
+[2023-02-24 07:58:40,684][784615] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+[2023-02-24 07:58:40,685][784615] Avg episode reward: 5.480, avg true_objective: 4.480
+[2023-02-24 07:58:40,719][784615] Num frames 500...
+[2023-02-24 07:58:40,787][784615] Num frames 600...
+[2023-02-24 07:58:40,850][784615] Num frames 700...
+[2023-02-24 07:58:40,918][784615] Num frames 800...
+[2023-02-24 07:58:41,005][784615] Num frames 900...
+[2023-02-24 07:58:41,078][784615] Avg episode rewards: #0: 6.640, true rewards: #0: 4.640
+[2023-02-24 07:58:41,079][784615] Avg episode reward: 6.640, avg true_objective: 4.640
+[2023-02-24 07:58:41,129][784615] Num frames 1000...
+[2023-02-24 07:58:41,202][784615] Num frames 1100...
+[2023-02-24 07:58:41,276][784615] Num frames 1200...
+[2023-02-24 07:58:41,360][784615] Num frames 1300...
+[2023-02-24 07:58:41,429][784615] Num frames 1400...
+[2023-02-24 07:58:41,493][784615] Num frames 1500...
+[2023-02-24 07:58:41,554][784615] Num frames 1600...
+[2023-02-24 07:58:41,667][784615] Avg episode rewards: #0: 8.653, true rewards: #0: 5.653
+[2023-02-24 07:58:41,668][784615] Avg episode reward: 8.653, avg true_objective: 5.653
+[2023-02-24 07:58:41,673][784615] Num frames 1700...
+[2023-02-24 07:58:41,734][784615] Num frames 1800...
+[2023-02-24 07:58:41,791][784615] Num frames 1900...
+[2023-02-24 07:58:41,848][784615] Num frames 2000...
+[2023-02-24 07:58:41,905][784615] Num frames 2100...
+[2023-02-24 07:58:41,961][784615] Num frames 2200...
+[2023-02-24 07:58:42,055][784615] Avg episode rewards: #0: 8.680, true rewards: #0: 5.680
+[2023-02-24 07:58:42,056][784615] Avg episode reward: 8.680, avg true_objective: 5.680
+[2023-02-24 07:58:42,077][784615] Num frames 2300...
+[2023-02-24 07:58:42,141][784615] Num frames 2400...
+[2023-02-24 07:58:42,203][784615] Num frames 2500...
+[2023-02-24 07:58:42,260][784615] Num frames 2600...
+[2023-02-24 07:58:42,346][784615] Avg episode rewards: #0: 7.712, true rewards: #0: 5.312
+[2023-02-24 07:58:42,348][784615] Avg episode reward: 7.712, avg true_objective: 5.312
+[2023-02-24 07:58:42,385][784615] Num frames 2700...
+[2023-02-24 07:58:42,449][784615] Num frames 2800...
+[2023-02-24 07:58:42,506][784615] Num frames 2900...
+[2023-02-24 07:58:42,563][784615] Num frames 3000...
+[2023-02-24 07:58:42,621][784615] Num frames 3100...
+[2023-02-24 07:58:42,688][784615] Num frames 3200...
+[2023-02-24 07:58:42,788][784615] Avg episode rewards: #0: 7.940, true rewards: #0: 5.440
+[2023-02-24 07:58:42,788][784615] Avg episode reward: 7.940, avg true_objective: 5.440
+[2023-02-24 07:58:42,817][784615] Num frames 3300...
+[2023-02-24 07:58:42,890][784615] Num frames 3400...
+[2023-02-24 07:58:42,964][784615] Num frames 3500...
+[2023-02-24 07:58:43,040][784615] Num frames 3600...
+[2023-02-24 07:58:43,128][784615] Avg episode rewards: #0: 7.354, true rewards: #0: 5.211
+[2023-02-24 07:58:43,130][784615] Avg episode reward: 7.354, avg true_objective: 5.211
+[2023-02-24 07:58:43,174][784615] Num frames 3700...
+[2023-02-24 07:58:43,248][784615] Num frames 3800...
+[2023-02-24 07:58:43,316][784615] Num frames 3900...
+[2023-02-24 07:58:43,383][784615] Num frames 4000...
+[2023-02-24 07:58:43,475][784615] Avg episode rewards: #0: 7.205, true rewards: #0: 5.080
+[2023-02-24 07:58:43,476][784615] Avg episode reward: 7.205, avg true_objective: 5.080
+[2023-02-24 07:58:43,498][784615] Num frames 4100...
+[2023-02-24 07:58:43,556][784615] Num frames 4200...
+[2023-02-24 07:58:43,614][784615] Num frames 4300...
+[2023-02-24 07:58:43,670][784615] Num frames 4400...
+[2023-02-24 07:58:43,727][784615] Num frames 4500...
+[2023-02-24 07:58:43,790][784615] Num frames 4600...
+[2023-02-24 07:58:43,891][784615] Avg episode rewards: #0: 7.413, true rewards: #0: 5.191
+[2023-02-24 07:58:43,891][784615] Avg episode reward: 7.413, avg true_objective: 5.191
+[2023-02-24 07:58:43,912][784615] Num frames 4700...
+[2023-02-24 07:58:43,982][784615] Num frames 4800...
+[2023-02-24 07:58:44,049][784615] Num frames 4900...
+[2023-02-24 07:58:44,117][784615] Num frames 5000...
+[2023-02-24 07:58:44,184][784615] Num frames 5100...
+[2023-02-24 07:58:44,250][784615] Avg episode rewards: #0: 7.220, true rewards: #0: 5.120
+[2023-02-24 07:58:44,250][784615] Avg episode reward: 7.220, avg true_objective: 5.120
+[2023-02-24 07:58:46,584][784615] Replay video saved to /mnt/chqma/data-ssd-01/dataset/oss/RWKV-LM/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!