diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,50 @@ -[2023-02-22 15:55:46,126][11727] Saving configuration to /content/train_dir/default_experiment/config.json... -[2023-02-22 15:55:46,128][11727] Rollout worker 0 uses device cpu -[2023-02-22 15:55:46,129][11727] Rollout worker 1 uses device cpu -[2023-02-22 15:55:46,130][11727] Rollout worker 2 uses device cpu -[2023-02-22 15:55:46,132][11727] Rollout worker 3 uses device cpu -[2023-02-22 15:55:46,133][11727] Rollout worker 4 uses device cpu -[2023-02-22 15:55:46,136][11727] Rollout worker 5 uses device cpu -[2023-02-22 15:55:46,137][11727] Rollout worker 6 uses device cpu -[2023-02-22 15:55:46,139][11727] Rollout worker 7 uses device cpu -[2023-02-22 15:55:46,236][11727] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-22 15:55:46,238][11727] InferenceWorker_p0-w0: min num requests: 2 -[2023-02-22 15:55:46,268][11727] Starting all processes... -[2023-02-22 15:55:46,270][11727] Starting process learner_proc0 -[2023-02-22 15:55:46,326][11727] Starting all processes... -[2023-02-22 15:55:46,338][11727] Starting process inference_proc0-0 -[2023-02-22 15:55:46,338][11727] Starting process rollout_proc0 -[2023-02-22 15:55:46,339][11727] Starting process rollout_proc1 -[2023-02-22 15:55:46,340][11727] Starting process rollout_proc2 -[2023-02-22 15:55:46,342][11727] Starting process rollout_proc3 -[2023-02-22 15:55:46,342][11727] Starting process rollout_proc4 -[2023-02-22 15:55:46,357][11727] Starting process rollout_proc5 -[2023-02-22 15:55:46,358][11727] Starting process rollout_proc6 -[2023-02-22 15:55:46,358][11727] Starting process rollout_proc7 -[2023-02-22 15:55:48,129][11948] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-22 15:55:48,129][11948] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2023-02-22 15:55:48,448][11949] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:48,552][11934] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-22 15:55:48,553][11934] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2023-02-22 15:55:48,778][11974] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:48,778][11953] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:48,807][11950] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:48,860][11951] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:48,864][11973] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:48,895][11975] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:48,947][11970] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] -[2023-02-22 15:55:49,003][11948] Num visible devices: 1 -[2023-02-22 15:55:49,003][11934] Num visible devices: 1 -[2023-02-22 15:55:49,028][11934] Starting seed is not provided -[2023-02-22 15:55:49,028][11934] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-22 15:55:49,028][11934] Initializing actor-critic model on device cuda:0 -[2023-02-22 15:55:49,028][11934] RunningMeanStd input shape: (3, 72, 128) -[2023-02-22 15:55:49,030][11934] RunningMeanStd input shape: (1,) -[2023-02-22 15:55:49,044][11934] ConvEncoder: input_channels=3 -[2023-02-22 15:55:49,304][11934] Conv encoder output size: 512 -[2023-02-22 15:55:49,304][11934] Policy 
head output size: 512 -[2023-02-22 15:55:49,345][11934] Created Actor Critic model with architecture: -[2023-02-22 15:55:49,345][11934] ActorCriticSharedWeights( +[2023-02-23 10:01:14,889][07928] Saving configuration to /content/train_dir/default_experiment/config.json... +[2023-02-23 10:01:14,892][07928] Rollout worker 0 uses device cpu +[2023-02-23 10:01:14,894][07928] Rollout worker 1 uses device cpu +[2023-02-23 10:01:14,896][07928] Rollout worker 2 uses device cpu +[2023-02-23 10:01:14,897][07928] Rollout worker 3 uses device cpu +[2023-02-23 10:01:14,899][07928] Rollout worker 4 uses device cpu +[2023-02-23 10:01:14,900][07928] Rollout worker 5 uses device cpu +[2023-02-23 10:01:14,902][07928] Rollout worker 6 uses device cpu +[2023-02-23 10:01:14,904][07928] Rollout worker 7 uses device cpu +[2023-02-23 10:01:15,008][07928] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-23 10:01:15,010][07928] InferenceWorker_p0-w0: min num requests: 2 +[2023-02-23 10:01:15,041][07928] Starting all processes... +[2023-02-23 10:01:15,043][07928] Starting process learner_proc0 +[2023-02-23 10:01:15,098][07928] Starting all processes... +[2023-02-23 10:01:15,106][07928] Starting process inference_proc0-0 +[2023-02-23 10:01:15,107][07928] Starting process rollout_proc0 +[2023-02-23 10:01:15,108][07928] Starting process rollout_proc1 +[2023-02-23 10:01:15,110][07928] Starting process rollout_proc2 +[2023-02-23 10:01:15,111][07928] Starting process rollout_proc3 +[2023-02-23 10:01:15,114][07928] Starting process rollout_proc4 +[2023-02-23 10:01:15,121][07928] Starting process rollout_proc5 +[2023-02-23 10:01:15,124][07928] Starting process rollout_proc6 +[2023-02-23 10:01:15,125][07928] Starting process rollout_proc7 +[2023-02-23 10:01:17,133][12605] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,157][12588] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,338][12608] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,425][12572] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-23 10:01:17,426][12572] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2023-02-23 10:01:17,504][12606] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,518][12586] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-23 10:01:17,518][12586] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2023-02-23 10:01:17,522][12587] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,525][12572] Num visible devices: 1 +[2023-02-23 10:01:17,531][12586] Num visible devices: 1 +[2023-02-23 10:01:17,561][12572] Starting seed is not provided +[2023-02-23 10:01:17,562][12572] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-23 10:01:17,562][12572] Initializing actor-critic model on device cuda:0 +[2023-02-23 10:01:17,562][12572] RunningMeanStd input shape: (3, 72, 128) +[2023-02-23 10:01:17,564][12572] RunningMeanStd input shape: (1,) +[2023-02-23 10:01:17,569][12589] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,579][12572] ConvEncoder: input_channels=3 +[2023-02-23 10:01:17,594][12607] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,596][12590] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] +[2023-02-23 10:01:17,842][12572] Conv 
encoder output size: 512 +[2023-02-23 10:01:17,842][12572] Policy head output size: 512 +[2023-02-23 10:01:17,891][12572] Created Actor Critic model with architecture: +[2023-02-23 10:01:17,891][12572] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( @@ -85,30 +85,33 @@ (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) -[2023-02-22 15:55:56,154][11934] Using optimizer -[2023-02-22 15:55:56,155][11934] No checkpoints found -[2023-02-22 15:55:56,156][11934] Did not load from checkpoint, starting from scratch! -[2023-02-22 15:55:56,156][11934] Initialized policy 0 weights for model version 0 -[2023-02-22 15:55:56,158][11934] LearnerWorker_p0 finished initialization! -[2023-02-22 15:55:56,159][11934] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2023-02-22 15:55:56,267][11948] RunningMeanStd input shape: (3, 72, 128) -[2023-02-22 15:55:56,268][11948] RunningMeanStd input shape: (1,) -[2023-02-22 15:55:56,284][11948] ConvEncoder: input_channels=3 -[2023-02-22 15:55:56,395][11948] Conv encoder output size: 512 -[2023-02-22 15:55:56,396][11948] Policy head output size: 512 -[2023-02-22 15:55:57,406][11727] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-02-22 15:55:59,174][11727] Inference worker 0-0 is ready! -[2023-02-22 15:55:59,176][11727] All inference workers are ready! Signal rollout workers to start! -[2023-02-22 15:55:59,195][11974] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,195][11973] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,201][11950] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,202][11970] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,202][11951] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,202][11975] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,202][11953] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,202][11949] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 15:55:59,257][11974] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process... -[2023-02-22 15:55:59,258][11974] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() +[2023-02-23 10:01:24,796][12572] Using optimizer +[2023-02-23 10:01:24,797][12572] No checkpoints found +[2023-02-23 10:01:24,798][12572] Did not load from checkpoint, starting from scratch! +[2023-02-23 10:01:24,798][12572] Initialized policy 0 weights for model version 0 +[2023-02-23 10:01:24,801][12572] LearnerWorker_p0 finished initialization! +[2023-02-23 10:01:24,801][12572] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-23 10:01:24,910][12586] RunningMeanStd input shape: (3, 72, 128) +[2023-02-23 10:01:24,911][12586] RunningMeanStd input shape: (1,) +[2023-02-23 10:01:24,926][12586] ConvEncoder: input_channels=3 +[2023-02-23 10:01:25,036][12586] Conv encoder output size: 512 +[2023-02-23 10:01:25,036][12586] Policy head output size: 512 +[2023-02-23 10:01:25,316][07928] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). 
Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-23 10:01:27,796][07928] Inference worker 0-0 is ready! +[2023-02-23 10:01:27,798][07928] All inference workers are ready! Signal rollout workers to start! +[2023-02-23 10:01:27,818][12587] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,818][12590] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,823][12589] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,825][12608] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,825][12607] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,825][12588] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,825][12606] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,825][12605] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:01:27,886][12606] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process... +[2023-02-23 10:01:27,886][12605] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process... +[2023-02-23 10:01:27,887][12587] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process... +[2023-02-23 10:01:27,887][12589] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process... +[2023-02-23 10:01:27,887][12606] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=() Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init self.game.init() @@ -150,585 +153,1841 @@ Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init raise EnvCriticalError() sample_factory.envs.env_utils.EnvCriticalError -[2023-02-22 15:55:59,260][11974] Unhandled exception in evt loop rollout_proc6_evt_loop -[2023-02-22 15:55:59,528][11973] Decorrelating experience for 0 frames... -[2023-02-22 15:55:59,528][11953] Decorrelating experience for 0 frames... -[2023-02-22 15:55:59,528][11951] Decorrelating experience for 0 frames... -[2023-02-22 15:55:59,528][11949] Decorrelating experience for 0 frames... -[2023-02-22 15:55:59,599][11950] Decorrelating experience for 0 frames... -[2023-02-22 15:55:59,600][11970] Decorrelating experience for 0 frames... -[2023-02-22 15:55:59,776][11951] Decorrelating experience for 32 frames... -[2023-02-22 15:55:59,801][11953] Decorrelating experience for 32 frames... -[2023-02-22 15:55:59,821][11975] Decorrelating experience for 0 frames... -[2023-02-22 15:55:59,852][11970] Decorrelating experience for 32 frames... -[2023-02-22 15:55:59,871][11950] Decorrelating experience for 32 frames... -[2023-02-22 15:55:59,887][11949] Decorrelating experience for 32 frames... -[2023-02-22 15:56:00,034][11973] Decorrelating experience for 32 frames... -[2023-02-22 15:56:00,121][11951] Decorrelating experience for 64 frames... -[2023-02-22 15:56:00,134][11975] Decorrelating experience for 32 frames... 
-[2023-02-22 15:56:00,157][11970] Decorrelating experience for 64 frames... -[2023-02-22 15:56:00,175][11950] Decorrelating experience for 64 frames... -[2023-02-22 15:56:00,349][11973] Decorrelating experience for 64 frames... -[2023-02-22 15:56:00,423][11953] Decorrelating experience for 64 frames... -[2023-02-22 15:56:00,427][11949] Decorrelating experience for 64 frames... -[2023-02-22 15:56:00,428][11951] Decorrelating experience for 96 frames... -[2023-02-22 15:56:00,460][11970] Decorrelating experience for 96 frames... -[2023-02-22 15:56:00,695][11950] Decorrelating experience for 96 frames... -[2023-02-22 15:56:00,714][11975] Decorrelating experience for 64 frames... -[2023-02-22 15:56:00,726][11949] Decorrelating experience for 96 frames... -[2023-02-22 15:56:00,736][11953] Decorrelating experience for 96 frames... -[2023-02-22 15:56:00,775][11973] Decorrelating experience for 96 frames... -[2023-02-22 15:56:00,999][11975] Decorrelating experience for 96 frames... -[2023-02-22 15:56:02,406][11727] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-02-22 15:56:05,026][11934] Signal inference workers to stop experience collection... -[2023-02-22 15:56:05,032][11948] InferenceWorker_p0-w0: stopping experience collection -[2023-02-22 15:56:06,229][11727] Heartbeat connected on Batcher_0 -[2023-02-22 15:56:06,237][11727] Heartbeat connected on InferenceWorker_p0-w0 -[2023-02-22 15:56:06,244][11727] Heartbeat connected on RolloutWorker_w0 -[2023-02-22 15:56:06,247][11727] Heartbeat connected on RolloutWorker_w1 -[2023-02-22 15:56:06,251][11727] Heartbeat connected on RolloutWorker_w2 -[2023-02-22 15:56:06,254][11727] Heartbeat connected on RolloutWorker_w3 -[2023-02-22 15:56:06,258][11727] Heartbeat connected on RolloutWorker_w4 -[2023-02-22 15:56:06,261][11727] Heartbeat connected on RolloutWorker_w5 -[2023-02-22 15:56:06,268][11727] Heartbeat connected on RolloutWorker_w7 -[2023-02-22 15:56:07,406][11727] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 310.2. Samples: 3102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2023-02-22 15:56:07,408][11727] Avg episode reward: [(0, '2.718')] -[2023-02-22 15:56:07,941][11934] Signal inference workers to resume experience collection... -[2023-02-22 15:56:07,941][11948] InferenceWorker_p0-w0: resuming experience collection -[2023-02-22 15:56:08,828][11727] Heartbeat connected on LearnerWorker_p0 -[2023-02-22 15:56:10,521][11948] Updated weights for policy 0, policy_version 10 (0.0011) -[2023-02-22 15:56:12,406][11727] Fps is (10 sec: 6963.0, 60 sec: 4642.1, 300 sec: 4642.1). Total num frames: 69632. Throughput: 0: 901.7. Samples: 13526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-02-22 15:56:12,408][11727] Avg episode reward: [(0, '4.505')] -[2023-02-22 15:56:12,915][11948] Updated weights for policy 0, policy_version 20 (0.0011) -[2023-02-22 15:56:15,191][11948] Updated weights for policy 0, policy_version 30 (0.0011) -[2023-02-22 15:56:17,406][11727] Fps is (10 sec: 15564.7, 60 sec: 7782.4, 300 sec: 7782.4). Total num frames: 155648. Throughput: 0: 1980.7. Samples: 39614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:56:17,408][11727] Avg episode reward: [(0, '4.543')] -[2023-02-22 15:56:17,424][11934] Saving new best policy, reward=4.543! 
-[2023-02-22 15:56:17,653][11948] Updated weights for policy 0, policy_version 40 (0.0011) -[2023-02-22 15:56:20,028][11948] Updated weights for policy 0, policy_version 50 (0.0011) -[2023-02-22 15:56:22,406][11727] Fps is (10 sec: 17203.5, 60 sec: 9666.5, 300 sec: 9666.5). Total num frames: 241664. Throughput: 0: 2092.2. Samples: 52304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:56:22,409][11727] Avg episode reward: [(0, '4.353')] -[2023-02-22 15:56:22,468][11948] Updated weights for policy 0, policy_version 60 (0.0011) -[2023-02-22 15:56:24,692][11948] Updated weights for policy 0, policy_version 70 (0.0011) -[2023-02-22 15:56:26,917][11948] Updated weights for policy 0, policy_version 80 (0.0011) -[2023-02-22 15:56:27,406][11727] Fps is (10 sec: 17612.8, 60 sec: 11059.2, 300 sec: 11059.2). Total num frames: 331776. Throughput: 0: 2634.7. Samples: 79042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:56:27,408][11727] Avg episode reward: [(0, '4.552')] -[2023-02-22 15:56:27,412][11934] Saving new best policy, reward=4.552! -[2023-02-22 15:56:29,208][11948] Updated weights for policy 0, policy_version 90 (0.0011) -[2023-02-22 15:56:31,459][11948] Updated weights for policy 0, policy_version 100 (0.0010) -[2023-02-22 15:56:32,406][11727] Fps is (10 sec: 18432.0, 60 sec: 12171.0, 300 sec: 12171.0). Total num frames: 425984. Throughput: 0: 3030.2. Samples: 106058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 15:56:32,408][11727] Avg episode reward: [(0, '4.755')] -[2023-02-22 15:56:32,417][11934] Saving new best policy, reward=4.755! -[2023-02-22 15:56:33,894][11948] Updated weights for policy 0, policy_version 110 (0.0011) -[2023-02-22 15:56:36,358][11948] Updated weights for policy 0, policy_version 120 (0.0012) -[2023-02-22 15:56:37,406][11727] Fps is (10 sec: 17612.8, 60 sec: 12697.6, 300 sec: 12697.6). Total num frames: 507904. Throughput: 0: 2967.0. Samples: 118682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 15:56:37,408][11727] Avg episode reward: [(0, '4.517')] -[2023-02-22 15:56:38,646][11948] Updated weights for policy 0, policy_version 130 (0.0011) -[2023-02-22 15:56:40,961][11948] Updated weights for policy 0, policy_version 140 (0.0011) -[2023-02-22 15:56:42,406][11727] Fps is (10 sec: 17203.3, 60 sec: 13289.2, 300 sec: 13289.2). Total num frames: 598016. Throughput: 0: 3222.5. Samples: 145014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:56:42,408][11727] Avg episode reward: [(0, '4.649')] -[2023-02-22 15:56:43,202][11948] Updated weights for policy 0, policy_version 150 (0.0011) -[2023-02-22 15:56:45,462][11948] Updated weights for policy 0, policy_version 160 (0.0011) -[2023-02-22 15:56:47,406][11727] Fps is (10 sec: 18022.4, 60 sec: 13762.6, 300 sec: 13762.6). Total num frames: 688128. Throughput: 0: 3827.0. Samples: 172214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:56:47,409][11727] Avg episode reward: [(0, '4.915')] -[2023-02-22 15:56:47,413][11934] Saving new best policy, reward=4.915! -[2023-02-22 15:56:47,753][11948] Updated weights for policy 0, policy_version 170 (0.0011) -[2023-02-22 15:56:50,203][11948] Updated weights for policy 0, policy_version 180 (0.0011) -[2023-02-22 15:56:52,406][11727] Fps is (10 sec: 17612.7, 60 sec: 14075.3, 300 sec: 14075.3). Total num frames: 774144. Throughput: 0: 4038.2. Samples: 184820. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:56:52,409][11727] Avg episode reward: [(0, '5.185')] -[2023-02-22 15:56:52,417][11934] Saving new best policy, reward=5.185! -[2023-02-22 15:56:52,622][11948] Updated weights for policy 0, policy_version 190 (0.0011) -[2023-02-22 15:56:54,883][11948] Updated weights for policy 0, policy_version 200 (0.0011) -[2023-02-22 15:56:57,222][11948] Updated weights for policy 0, policy_version 210 (0.0011) -[2023-02-22 15:56:57,406][11727] Fps is (10 sec: 17203.3, 60 sec: 14336.0, 300 sec: 14336.0). Total num frames: 860160. Throughput: 0: 4388.8. Samples: 211022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 15:56:57,409][11727] Avg episode reward: [(0, '5.563')] -[2023-02-22 15:56:57,425][11934] Saving new best policy, reward=5.563! -[2023-02-22 15:56:59,425][11948] Updated weights for policy 0, policy_version 220 (0.0010) -[2023-02-22 15:57:01,745][11948] Updated weights for policy 0, policy_version 230 (0.0011) -[2023-02-22 15:57:02,406][11727] Fps is (10 sec: 18022.5, 60 sec: 15906.1, 300 sec: 14682.6). Total num frames: 954368. Throughput: 0: 4411.3. Samples: 238122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 15:57:02,409][11727] Avg episode reward: [(0, '6.131')] -[2023-02-22 15:57:02,415][11934] Saving new best policy, reward=6.131! -[2023-02-22 15:57:04,074][11948] Updated weights for policy 0, policy_version 240 (0.0011) -[2023-02-22 15:57:06,481][11948] Updated weights for policy 0, policy_version 250 (0.0011) -[2023-02-22 15:57:07,406][11727] Fps is (10 sec: 17612.8, 60 sec: 17271.5, 300 sec: 14804.1). Total num frames: 1036288. Throughput: 0: 4415.2. Samples: 250988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:57:07,409][11727] Avg episode reward: [(0, '6.780')] -[2023-02-22 15:57:07,427][11934] Saving new best policy, reward=6.780! -[2023-02-22 15:57:08,893][11948] Updated weights for policy 0, policy_version 260 (0.0011) -[2023-02-22 15:57:11,132][11948] Updated weights for policy 0, policy_version 270 (0.0010) -[2023-02-22 15:57:12,406][11727] Fps is (10 sec: 17203.3, 60 sec: 17612.9, 300 sec: 15018.7). Total num frames: 1126400. Throughput: 0: 4406.4. Samples: 277330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:57:12,408][11727] Avg episode reward: [(0, '7.336')] -[2023-02-22 15:57:12,417][11934] Saving new best policy, reward=7.336! -[2023-02-22 15:57:13,391][11948] Updated weights for policy 0, policy_version 280 (0.0012) -[2023-02-22 15:57:15,635][11948] Updated weights for policy 0, policy_version 290 (0.0010) -[2023-02-22 15:57:17,406][11727] Fps is (10 sec: 18022.3, 60 sec: 17681.1, 300 sec: 15206.4). Total num frames: 1216512. Throughput: 0: 4410.0. Samples: 304508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 15:57:17,408][11727] Avg episode reward: [(0, '7.986')] -[2023-02-22 15:57:17,419][11934] Saving new best policy, reward=7.986! -[2023-02-22 15:57:17,940][11948] Updated weights for policy 0, policy_version 300 (0.0011) -[2023-02-22 15:57:20,251][11948] Updated weights for policy 0, policy_version 310 (0.0011) -[2023-02-22 15:57:22,406][11727] Fps is (10 sec: 17612.6, 60 sec: 17681.1, 300 sec: 15323.9). Total num frames: 1302528. Throughput: 0: 4421.6. Samples: 317652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2023-02-22 15:57:22,408][11727] Avg episode reward: [(0, '9.339')] -[2023-02-22 15:57:22,419][11934] Saving new best policy, reward=9.339! 
-[2023-02-22 15:57:22,705][11948] Updated weights for policy 0, policy_version 320 (0.0012) -[2023-02-22 15:57:25,079][11948] Updated weights for policy 0, policy_version 330 (0.0011) -[2023-02-22 15:57:27,406][11727] Fps is (10 sec: 17203.4, 60 sec: 17612.8, 300 sec: 15428.3). Total num frames: 1388544. Throughput: 0: 4407.8. Samples: 343364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 15:57:27,408][11727] Avg episode reward: [(0, '9.500')] -[2023-02-22 15:57:27,420][11934] Saving new best policy, reward=9.500! -[2023-02-22 15:57:27,422][11948] Updated weights for policy 0, policy_version 340 (0.0011) -[2023-02-22 15:57:29,748][11948] Updated weights for policy 0, policy_version 350 (0.0011) -[2023-02-22 15:57:32,048][11948] Updated weights for policy 0, policy_version 360 (0.0010) -[2023-02-22 15:57:32,406][11727] Fps is (10 sec: 17612.9, 60 sec: 17544.5, 300 sec: 15564.8). Total num frames: 1478656. Throughput: 0: 4389.4. Samples: 369738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:57:32,408][11727] Avg episode reward: [(0, '10.590')] -[2023-02-22 15:57:32,416][11934] Saving new best policy, reward=10.590! -[2023-02-22 15:57:34,420][11948] Updated weights for policy 0, policy_version 370 (0.0011) -[2023-02-22 15:57:36,777][11948] Updated weights for policy 0, policy_version 380 (0.0011) -[2023-02-22 15:57:37,406][11727] Fps is (10 sec: 17612.8, 60 sec: 17612.8, 300 sec: 15646.7). Total num frames: 1564672. Throughput: 0: 4400.5. Samples: 382842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 15:57:37,408][11727] Avg episode reward: [(0, '11.263')] -[2023-02-22 15:57:37,412][11934] Saving new best policy, reward=11.263! -[2023-02-22 15:57:39,255][11948] Updated weights for policy 0, policy_version 390 (0.0012) -[2023-02-22 15:57:41,601][11948] Updated weights for policy 0, policy_version 400 (0.0011) -[2023-02-22 15:57:42,406][11727] Fps is (10 sec: 17203.1, 60 sec: 17544.5, 300 sec: 15720.8). Total num frames: 1650688. Throughput: 0: 4381.9. Samples: 408208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:57:42,408][11727] Avg episode reward: [(0, '11.024')] -[2023-02-22 15:57:42,417][11934] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth... -[2023-02-22 15:57:43,882][11948] Updated weights for policy 0, policy_version 410 (0.0018) -[2023-02-22 15:57:46,152][11948] Updated weights for policy 0, policy_version 420 (0.0011) -[2023-02-22 15:57:47,406][11727] Fps is (10 sec: 17612.9, 60 sec: 17544.6, 300 sec: 15825.5). Total num frames: 1740800. Throughput: 0: 4378.0. Samples: 435132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:57:47,408][11727] Avg episode reward: [(0, '14.537')] -[2023-02-22 15:57:47,410][11934] Saving new best policy, reward=14.537! -[2023-02-22 15:57:48,472][11948] Updated weights for policy 0, policy_version 430 (0.0011) -[2023-02-22 15:57:50,720][11948] Updated weights for policy 0, policy_version 440 (0.0011) -[2023-02-22 15:57:52,406][11727] Fps is (10 sec: 17612.9, 60 sec: 17544.6, 300 sec: 15885.4). Total num frames: 1826816. Throughput: 0: 4394.8. Samples: 448752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:57:52,409][11727] Avg episode reward: [(0, '15.374')] -[2023-02-22 15:57:52,415][11934] Saving new best policy, reward=15.374! 
-[2023-02-22 15:57:53,133][11948] Updated weights for policy 0, policy_version 450 (0.0011) -[2023-02-22 15:57:55,519][11948] Updated weights for policy 0, policy_version 460 (0.0011) -[2023-02-22 15:57:57,406][11727] Fps is (10 sec: 17612.7, 60 sec: 17612.8, 300 sec: 15974.4). Total num frames: 1916928. Throughput: 0: 4381.9. Samples: 474514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:57:57,409][11727] Avg episode reward: [(0, '17.365')] -[2023-02-22 15:57:57,411][11934] Saving new best policy, reward=17.365! -[2023-02-22 15:57:57,797][11948] Updated weights for policy 0, policy_version 470 (0.0011) -[2023-02-22 15:58:00,064][11948] Updated weights for policy 0, policy_version 480 (0.0011) -[2023-02-22 15:58:02,345][11948] Updated weights for policy 0, policy_version 490 (0.0010) -[2023-02-22 15:58:02,406][11727] Fps is (10 sec: 18022.6, 60 sec: 17544.6, 300 sec: 16056.3). Total num frames: 2007040. Throughput: 0: 4381.5. Samples: 501676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:58:02,409][11727] Avg episode reward: [(0, '15.039')] -[2023-02-22 15:58:04,589][11948] Updated weights for policy 0, policy_version 500 (0.0011) -[2023-02-22 15:58:06,878][11948] Updated weights for policy 0, policy_version 510 (0.0011) -[2023-02-22 15:58:07,406][11727] Fps is (10 sec: 17612.8, 60 sec: 17612.8, 300 sec: 16100.4). Total num frames: 2093056. Throughput: 0: 4393.0. Samples: 515336. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 15:58:07,409][11727] Avg episode reward: [(0, '16.264')] -[2023-02-22 15:58:09,322][11948] Updated weights for policy 0, policy_version 520 (0.0011) -[2023-02-22 15:58:11,720][11948] Updated weights for policy 0, policy_version 530 (0.0011) -[2023-02-22 15:58:12,406][11727] Fps is (10 sec: 17203.1, 60 sec: 17544.5, 300 sec: 16141.3). Total num frames: 2179072. Throughput: 0: 4393.1. Samples: 541052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:58:12,409][11727] Avg episode reward: [(0, '16.818')] -[2023-02-22 15:58:14,040][11948] Updated weights for policy 0, policy_version 540 (0.0011) -[2023-02-22 15:58:16,254][11948] Updated weights for policy 0, policy_version 550 (0.0011) -[2023-02-22 15:58:17,406][11727] Fps is (10 sec: 17612.7, 60 sec: 17544.5, 300 sec: 16208.5). Total num frames: 2269184. Throughput: 0: 4403.9. Samples: 567914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:58:17,409][11727] Avg episode reward: [(0, '17.111')] -[2023-02-22 15:58:18,537][11948] Updated weights for policy 0, policy_version 560 (0.0011) -[2023-02-22 15:58:20,818][11948] Updated weights for policy 0, policy_version 570 (0.0011) -[2023-02-22 15:58:22,406][11727] Fps is (10 sec: 18432.0, 60 sec: 17681.1, 300 sec: 16299.3). Total num frames: 2363392. Throughput: 0: 4414.4. Samples: 581488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:58:22,408][11727] Avg episode reward: [(0, '18.845')] -[2023-02-22 15:58:22,417][11934] Saving new best policy, reward=18.845! -[2023-02-22 15:58:23,129][11948] Updated weights for policy 0, policy_version 580 (0.0011) -[2023-02-22 15:58:25,602][11948] Updated weights for policy 0, policy_version 590 (0.0011) -[2023-02-22 15:58:27,406][11727] Fps is (10 sec: 17613.0, 60 sec: 17612.8, 300 sec: 16302.1). Total num frames: 2445312. Throughput: 0: 4420.1. Samples: 607112. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:58:27,409][11727] Avg episode reward: [(0, '17.650')] -[2023-02-22 15:58:28,042][11948] Updated weights for policy 0, policy_version 600 (0.0012) -[2023-02-22 15:58:30,381][11948] Updated weights for policy 0, policy_version 610 (0.0011) -[2023-02-22 15:58:32,406][11727] Fps is (10 sec: 16793.4, 60 sec: 17544.5, 300 sec: 16331.1). Total num frames: 2531328. Throughput: 0: 4401.9. Samples: 633220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:58:32,409][11727] Avg episode reward: [(0, '18.158')] -[2023-02-22 15:58:32,688][11948] Updated weights for policy 0, policy_version 620 (0.0011) -[2023-02-22 15:58:35,144][11948] Updated weights for policy 0, policy_version 630 (0.0011) -[2023-02-22 15:58:37,406][11727] Fps is (10 sec: 17203.0, 60 sec: 17544.5, 300 sec: 16358.4). Total num frames: 2617344. Throughput: 0: 4381.1. Samples: 645902. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 15:58:37,409][11727] Avg episode reward: [(0, '17.190')] -[2023-02-22 15:58:37,540][11948] Updated weights for policy 0, policy_version 640 (0.0011) -[2023-02-22 15:58:40,020][11948] Updated weights for policy 0, policy_version 650 (0.0012) -[2023-02-22 15:58:42,406][11727] Fps is (10 sec: 16793.8, 60 sec: 17476.3, 300 sec: 16359.2). Total num frames: 2699264. Throughput: 0: 4369.1. Samples: 671124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:58:42,410][11727] Avg episode reward: [(0, '20.379')] -[2023-02-22 15:58:42,443][11934] Saving new best policy, reward=20.379! -[2023-02-22 15:58:42,447][11948] Updated weights for policy 0, policy_version 660 (0.0011) -[2023-02-22 15:58:44,798][11948] Updated weights for policy 0, policy_version 670 (0.0012) -[2023-02-22 15:58:47,071][11948] Updated weights for policy 0, policy_version 680 (0.0010) -[2023-02-22 15:58:47,406][11727] Fps is (10 sec: 17203.2, 60 sec: 17476.2, 300 sec: 16408.1). Total num frames: 2789376. Throughput: 0: 4347.0. Samples: 697292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 15:58:47,409][11727] Avg episode reward: [(0, '20.186')] -[2023-02-22 15:58:49,407][11948] Updated weights for policy 0, policy_version 690 (0.0010) -[2023-02-22 15:58:51,689][11948] Updated weights for policy 0, policy_version 700 (0.0011) -[2023-02-22 15:58:52,406][11727] Fps is (10 sec: 18022.2, 60 sec: 17544.5, 300 sec: 16454.2). Total num frames: 2879488. Throughput: 0: 4340.9. Samples: 710678. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:58:52,408][11727] Avg episode reward: [(0, '22.052')] -[2023-02-22 15:58:52,417][11934] Saving new best policy, reward=22.052! -[2023-02-22 15:58:54,011][11948] Updated weights for policy 0, policy_version 710 (0.0011) -[2023-02-22 15:58:56,392][11948] Updated weights for policy 0, policy_version 720 (0.0011) -[2023-02-22 15:58:57,406][11727] Fps is (10 sec: 17613.0, 60 sec: 17476.3, 300 sec: 16475.0). Total num frames: 2965504. Throughput: 0: 4353.2. Samples: 736946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:58:57,409][11727] Avg episode reward: [(0, '21.568')] -[2023-02-22 15:58:58,776][11948] Updated weights for policy 0, policy_version 730 (0.0011) -[2023-02-22 15:59:01,128][11948] Updated weights for policy 0, policy_version 740 (0.0010) -[2023-02-22 15:59:02,406][11727] Fps is (10 sec: 17203.3, 60 sec: 17408.0, 300 sec: 16494.7). Total num frames: 3051520. Throughput: 0: 4339.9. Samples: 763210. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:59:02,409][11727] Avg episode reward: [(0, '20.665')] -[2023-02-22 15:59:03,339][11948] Updated weights for policy 0, policy_version 750 (0.0011) -[2023-02-22 15:59:05,685][11948] Updated weights for policy 0, policy_version 760 (0.0011) -[2023-02-22 15:59:07,406][11727] Fps is (10 sec: 17612.6, 60 sec: 17476.3, 300 sec: 16534.9). Total num frames: 3141632. Throughput: 0: 4337.4. Samples: 776670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:59:07,409][11727] Avg episode reward: [(0, '21.001')] -[2023-02-22 15:59:07,943][11948] Updated weights for policy 0, policy_version 770 (0.0010) -[2023-02-22 15:59:10,233][11948] Updated weights for policy 0, policy_version 780 (0.0011) -[2023-02-22 15:59:12,406][11727] Fps is (10 sec: 17612.8, 60 sec: 17476.3, 300 sec: 16552.0). Total num frames: 3227648. Throughput: 0: 4356.0. Samples: 803132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2023-02-22 15:59:12,408][11727] Avg episode reward: [(0, '21.359')] -[2023-02-22 15:59:12,713][11948] Updated weights for policy 0, policy_version 790 (0.0012) -[2023-02-22 15:59:15,093][11948] Updated weights for policy 0, policy_version 800 (0.0011) -[2023-02-22 15:59:17,407][11948] Updated weights for policy 0, policy_version 810 (0.0011) -[2023-02-22 15:59:17,406][11727] Fps is (10 sec: 17613.2, 60 sec: 17476.3, 300 sec: 16588.8). Total num frames: 3317760. Throughput: 0: 4351.6. Samples: 829042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:59:17,409][11727] Avg episode reward: [(0, '19.476')] -[2023-02-22 15:59:19,655][11948] Updated weights for policy 0, policy_version 820 (0.0011) -[2023-02-22 15:59:21,941][11948] Updated weights for policy 0, policy_version 830 (0.0010) -[2023-02-22 15:59:22,409][11727] Fps is (10 sec: 18016.5, 60 sec: 17407.0, 300 sec: 16623.5). Total num frames: 3407872. Throughput: 0: 4370.1. Samples: 842572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:59:22,411][11727] Avg episode reward: [(0, '19.382')] -[2023-02-22 15:59:24,222][11948] Updated weights for policy 0, policy_version 840 (0.0011) -[2023-02-22 15:59:26,601][11948] Updated weights for policy 0, policy_version 850 (0.0011) -[2023-02-22 15:59:27,406][11727] Fps is (10 sec: 17612.3, 60 sec: 17476.2, 300 sec: 16637.6). Total num frames: 3493888. Throughput: 0: 4403.1. Samples: 869264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2023-02-22 15:59:27,409][11727] Avg episode reward: [(0, '19.677')] -[2023-02-22 15:59:29,011][11948] Updated weights for policy 0, policy_version 860 (0.0012) -[2023-02-22 15:59:31,383][11948] Updated weights for policy 0, policy_version 870 (0.0011) -[2023-02-22 15:59:32,406][11727] Fps is (10 sec: 17208.8, 60 sec: 17476.3, 300 sec: 16650.7). Total num frames: 3579904. Throughput: 0: 4391.0. Samples: 894886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:59:32,409][11727] Avg episode reward: [(0, '20.009')] -[2023-02-22 15:59:33,716][11948] Updated weights for policy 0, policy_version 880 (0.0011) -[2023-02-22 15:59:35,972][11948] Updated weights for policy 0, policy_version 890 (0.0011) -[2023-02-22 15:59:37,406][11727] Fps is (10 sec: 17612.9, 60 sec: 17544.5, 300 sec: 16681.9). Total num frames: 3670016. Throughput: 0: 4391.6. Samples: 908300. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2023-02-22 15:59:37,408][11727] Avg episode reward: [(0, '19.299')] -[2023-02-22 15:59:38,317][11948] Updated weights for policy 0, policy_version 900 (0.0011) -[2023-02-22 15:59:40,527][11948] Updated weights for policy 0, policy_version 910 (0.0011) -[2023-02-22 15:59:42,406][11727] Fps is (10 sec: 18022.4, 60 sec: 17681.1, 300 sec: 16711.7). Total num frames: 3760128. Throughput: 0: 4407.5. Samples: 935284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2023-02-22 15:59:42,409][11727] Avg episode reward: [(0, '21.332')] -[2023-02-22 15:59:42,417][11934] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000918_3760128.pth... -[2023-02-22 15:59:42,893][11948] Updated weights for policy 0, policy_version 920 (0.0011) -[2023-02-22 15:59:45,384][11948] Updated weights for policy 0, policy_version 930 (0.0012) -[2023-02-22 15:59:47,406][11727] Fps is (10 sec: 17203.2, 60 sec: 17544.5, 300 sec: 16704.6). Total num frames: 3842048. Throughput: 0: 4387.0. Samples: 960626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2023-02-22 15:59:47,409][11727] Avg episode reward: [(0, '19.742')] -[2023-02-22 15:59:47,743][11948] Updated weights for policy 0, policy_version 940 (0.0011) -[2023-02-22 15:59:50,051][11948] Updated weights for policy 0, policy_version 950 (0.0011) -[2023-02-22 15:59:52,331][11948] Updated weights for policy 0, policy_version 960 (0.0011) -[2023-02-22 15:59:52,406][11727] Fps is (10 sec: 17203.2, 60 sec: 17544.5, 300 sec: 16732.6). Total num frames: 3932160. Throughput: 0: 4384.1. Samples: 973954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2023-02-22 15:59:52,408][11727] Avg episode reward: [(0, '23.771')] -[2023-02-22 15:59:52,417][11934] Saving new best policy, reward=23.771! -[2023-02-22 15:59:54,622][11948] Updated weights for policy 0, policy_version 970 (0.0011) -[2023-02-22 15:59:56,458][11934] Stopping Batcher_0... -[2023-02-22 15:59:56,459][11934] Loop batcher_evt_loop terminating... -[2023-02-22 15:59:56,459][11934] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2023-02-22 15:59:56,459][11727] Component Batcher_0 stopped! -[2023-02-22 15:59:56,462][11727] Component RolloutWorker_w6 process died already! Don't wait for it. -[2023-02-22 15:59:56,473][11948] Weights refcount: 2 0 -[2023-02-22 15:59:56,473][11951] Stopping RolloutWorker_w2... -[2023-02-22 15:59:56,473][11950] Stopping RolloutWorker_w1... -[2023-02-22 15:59:56,474][11950] Loop rollout_proc1_evt_loop terminating... -[2023-02-22 15:59:56,474][11951] Loop rollout_proc2_evt_loop terminating... -[2023-02-22 15:59:56,475][11948] Stopping InferenceWorker_p0-w0... -[2023-02-22 15:59:56,475][11948] Loop inference_proc0-0_evt_loop terminating... -[2023-02-22 15:59:56,473][11727] Component RolloutWorker_w1 stopped! -[2023-02-22 15:59:56,476][11970] Stopping RolloutWorker_w5... -[2023-02-22 15:59:56,476][11953] Stopping RolloutWorker_w3... -[2023-02-22 15:59:56,476][11970] Loop rollout_proc5_evt_loop terminating... -[2023-02-22 15:59:56,476][11953] Loop rollout_proc3_evt_loop terminating... -[2023-02-22 15:59:56,476][11975] Stopping RolloutWorker_w4... -[2023-02-22 15:59:56,477][11975] Loop rollout_proc4_evt_loop terminating... -[2023-02-22 15:59:56,478][11949] Stopping RolloutWorker_w0... -[2023-02-22 15:59:56,479][11949] Loop rollout_proc0_evt_loop terminating... -[2023-02-22 15:59:56,477][11727] Component RolloutWorker_w2 stopped! 
-[2023-02-22 15:59:56,480][11727] Component InferenceWorker_p0-w0 stopped! -[2023-02-22 15:59:56,481][11727] Component RolloutWorker_w5 stopped! -[2023-02-22 15:59:56,482][11727] Component RolloutWorker_w3 stopped! -[2023-02-22 15:59:56,484][11727] Component RolloutWorker_w4 stopped! -[2023-02-22 15:59:56,484][11973] Stopping RolloutWorker_w7... -[2023-02-22 15:59:56,485][11727] Component RolloutWorker_w0 stopped! -[2023-02-22 15:59:56,486][11973] Loop rollout_proc7_evt_loop terminating... -[2023-02-22 15:59:56,487][11727] Component RolloutWorker_w7 stopped! -[2023-02-22 15:59:56,533][11934] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth -[2023-02-22 15:59:56,542][11934] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2023-02-22 15:59:56,658][11934] Stopping LearnerWorker_p0... -[2023-02-22 15:59:56,659][11934] Loop learner_proc0_evt_loop terminating... -[2023-02-22 15:59:56,658][11727] Component LearnerWorker_p0 stopped! -[2023-02-22 15:59:56,660][11727] Waiting for process learner_proc0 to stop... -[2023-02-22 15:59:58,235][11727] Waiting for process inference_proc0-0 to join... -[2023-02-22 15:59:58,237][11727] Waiting for process rollout_proc0 to join... -[2023-02-22 15:59:58,239][11727] Waiting for process rollout_proc1 to join... -[2023-02-22 15:59:58,241][11727] Waiting for process rollout_proc2 to join... -[2023-02-22 15:59:58,243][11727] Waiting for process rollout_proc3 to join... -[2023-02-22 15:59:58,245][11727] Waiting for process rollout_proc4 to join... -[2023-02-22 15:59:58,246][11727] Waiting for process rollout_proc5 to join... -[2023-02-22 15:59:58,248][11727] Waiting for process rollout_proc6 to join... -[2023-02-22 15:59:58,249][11727] Waiting for process rollout_proc7 to join... -[2023-02-22 15:59:58,252][11727] Batcher 0 profile tree view: -batching: 15.6119, releasing_batches: 0.0487 -[2023-02-22 15:59:58,253][11727] InferenceWorker_p0-w0 profile tree view: -wait_policy: 0.0001 - wait_policy_total: 4.2324 -update_model: 3.4167 - weight_update: 0.0011 -one_step: 0.0029 - handle_policy_step: 214.4727 - deserialize: 8.7479, stack: 1.4146, obs_to_device_normalize: 50.9557, forward: 97.6511, send_messages: 15.7666 - prepare_outputs: 30.0742 - to_cpu: 18.3660 -[2023-02-22 15:59:58,254][11727] Learner 0 profile tree view: -misc: 0.0057, prepare_batch: 10.1041 -train: 19.8425 - epoch_init: 0.0057, minibatch_init: 0.0062, losses_postprocess: 0.3212, kl_divergence: 0.4617, after_optimizer: 1.0259 - calculate_losses: 7.7728 - losses_init: 0.0032, forward_head: 1.1080, bptt_initial: 3.2464, tail: 0.6356, advantages_returns: 0.1694, losses: 1.0630 - bptt: 1.3701 - bptt_forward_core: 1.3165 - update: 9.9027 - clip: 1.1259 -[2023-02-22 15:59:58,257][11727] RolloutWorker_w0 profile tree view: -wait_for_trajectories: 0.1715, enqueue_policy_requests: 8.7683, env_step: 144.6926, overhead: 11.5534, complete_rollouts: 0.2874 -save_policy_outputs: 9.9906 - split_output_tensors: 4.7988 -[2023-02-22 15:59:58,258][11727] RolloutWorker_w7 profile tree view: -wait_for_trajectories: 0.1719, enqueue_policy_requests: 8.7025, env_step: 145.0611, overhead: 11.7300, complete_rollouts: 0.2973 -save_policy_outputs: 9.8243 - split_output_tensors: 4.7671 -[2023-02-22 15:59:58,260][11727] Loop Runner_EvtLoop terminating... 
-[2023-02-22 15:59:58,263][11727] Runner profile tree view: -main_loop: 251.9949 -[2023-02-22 15:59:58,264][11727] Collected {0: 4005888}, FPS: 15896.7 -[2023-02-22 16:11:15,029][11727] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2023-02-22 16:11:15,031][11727] Overriding arg 'num_workers' with value 1 passed from command line -[2023-02-22 16:11:15,033][11727] Adding new argument 'no_render'=True that is not in the saved config file! -[2023-02-22 16:11:15,034][11727] Adding new argument 'save_video'=True that is not in the saved config file! -[2023-02-22 16:11:15,036][11727] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2023-02-22 16:11:15,037][11727] Adding new argument 'video_name'=None that is not in the saved config file! -[2023-02-22 16:11:15,038][11727] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! -[2023-02-22 16:11:15,040][11727] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2023-02-22 16:11:15,041][11727] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2023-02-22 16:11:15,043][11727] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2023-02-22 16:11:15,044][11727] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2023-02-22 16:11:15,045][11727] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2023-02-22 16:11:15,047][11727] Adding new argument 'train_script'=None that is not in the saved config file! -[2023-02-22 16:11:15,048][11727] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2023-02-22 16:11:15,049][11727] Using frameskip 1 and render_action_repeat=4 for evaluation -[2023-02-22 16:11:15,067][11727] Doom resolution: 160x120, resize resolution: (128, 72) -[2023-02-22 16:11:15,070][11727] RunningMeanStd input shape: (3, 72, 128) -[2023-02-22 16:11:15,073][11727] RunningMeanStd input shape: (1,) -[2023-02-22 16:11:15,093][11727] ConvEncoder: input_channels=3 -[2023-02-22 16:11:15,967][11727] Conv encoder output size: 512 -[2023-02-22 16:11:15,970][11727] Policy head output size: 512 -[2023-02-22 16:11:18,855][11727] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2023-02-22 16:11:20,688][11727] Num frames 100... -[2023-02-22 16:11:20,811][11727] Num frames 200... -[2023-02-22 16:11:20,926][11727] Num frames 300... -[2023-02-22 16:11:21,038][11727] Num frames 400... -[2023-02-22 16:11:21,156][11727] Num frames 500... -[2023-02-22 16:11:21,269][11727] Num frames 600... -[2023-02-22 16:11:21,382][11727] Num frames 700... -[2023-02-22 16:11:21,496][11727] Num frames 800... -[2023-02-22 16:11:21,608][11727] Num frames 900... -[2023-02-22 16:11:21,720][11727] Num frames 1000... -[2023-02-22 16:11:21,833][11727] Num frames 1100... -[2023-02-22 16:11:21,947][11727] Avg episode rewards: #0: 24.520, true rewards: #0: 11.520 -[2023-02-22 16:11:21,948][11727] Avg episode reward: 24.520, avg true_objective: 11.520 -[2023-02-22 16:11:22,005][11727] Num frames 1200... -[2023-02-22 16:11:22,118][11727] Num frames 1300... -[2023-02-22 16:11:22,232][11727] Num frames 1400... -[2023-02-22 16:11:22,346][11727] Num frames 1500... -[2023-02-22 16:11:22,457][11727] Num frames 1600... -[2023-02-22 16:11:22,568][11727] Num frames 1700... -[2023-02-22 16:11:22,681][11727] Num frames 1800... 
-[2023-02-22 16:11:22,794][11727] Num frames 1900... -[2023-02-22 16:11:22,904][11727] Num frames 2000... -[2023-02-22 16:11:23,020][11727] Num frames 2100... -[2023-02-22 16:11:23,138][11727] Num frames 2200... -[2023-02-22 16:11:23,252][11727] Num frames 2300... -[2023-02-22 16:11:23,365][11727] Num frames 2400... -[2023-02-22 16:11:23,484][11727] Num frames 2500... -[2023-02-22 16:11:23,601][11727] Num frames 2600... -[2023-02-22 16:11:23,718][11727] Num frames 2700... -[2023-02-22 16:11:23,803][11727] Avg episode rewards: #0: 34.130, true rewards: #0: 13.630 -[2023-02-22 16:11:23,805][11727] Avg episode reward: 34.130, avg true_objective: 13.630 -[2023-02-22 16:11:23,891][11727] Num frames 2800... -[2023-02-22 16:11:24,003][11727] Num frames 2900... -[2023-02-22 16:11:24,121][11727] Num frames 3000... -[2023-02-22 16:11:24,234][11727] Num frames 3100... -[2023-02-22 16:11:24,312][11727] Avg episode rewards: #0: 25.063, true rewards: #0: 10.397 -[2023-02-22 16:11:24,313][11727] Avg episode reward: 25.063, avg true_objective: 10.397 -[2023-02-22 16:11:24,406][11727] Num frames 3200... -[2023-02-22 16:11:24,518][11727] Num frames 3300... -[2023-02-22 16:11:24,680][11727] Avg episode rewards: #0: 19.732, true rewards: #0: 8.482 -[2023-02-22 16:11:24,682][11727] Avg episode reward: 19.732, avg true_objective: 8.482 -[2023-02-22 16:11:24,692][11727] Num frames 3400... -[2023-02-22 16:11:24,818][11727] Num frames 3500... -[2023-02-22 16:11:24,939][11727] Num frames 3600... -[2023-02-22 16:11:25,055][11727] Num frames 3700... -[2023-02-22 16:11:25,171][11727] Num frames 3800... -[2023-02-22 16:11:25,285][11727] Num frames 3900... -[2023-02-22 16:11:25,401][11727] Num frames 4000... -[2023-02-22 16:11:25,517][11727] Num frames 4100... -[2023-02-22 16:11:25,635][11727] Num frames 4200... -[2023-02-22 16:11:25,749][11727] Num frames 4300... -[2023-02-22 16:11:25,864][11727] Num frames 4400... -[2023-02-22 16:11:25,984][11727] Num frames 4500... -[2023-02-22 16:11:26,099][11727] Num frames 4600... -[2023-02-22 16:11:26,238][11727] Num frames 4700... -[2023-02-22 16:11:26,362][11727] Num frames 4800... -[2023-02-22 16:11:26,482][11727] Num frames 4900... -[2023-02-22 16:11:26,602][11727] Num frames 5000... -[2023-02-22 16:11:26,720][11727] Num frames 5100... -[2023-02-22 16:11:26,822][11727] Avg episode rewards: #0: 24.078, true rewards: #0: 10.278 -[2023-02-22 16:11:26,824][11727] Avg episode reward: 24.078, avg true_objective: 10.278 -[2023-02-22 16:11:26,904][11727] Num frames 5200... -[2023-02-22 16:11:27,021][11727] Num frames 5300... -[2023-02-22 16:11:27,140][11727] Num frames 5400... -[2023-02-22 16:11:27,256][11727] Num frames 5500... -[2023-02-22 16:11:27,369][11727] Num frames 5600... -[2023-02-22 16:11:27,479][11727] Num frames 5700... -[2023-02-22 16:11:27,587][11727] Avg episode rewards: #0: 22.078, true rewards: #0: 9.578 -[2023-02-22 16:11:27,589][11727] Avg episode reward: 22.078, avg true_objective: 9.578 -[2023-02-22 16:11:27,652][11727] Num frames 5800... -[2023-02-22 16:11:27,767][11727] Num frames 5900... -[2023-02-22 16:11:27,883][11727] Num frames 6000... -[2023-02-22 16:11:27,996][11727] Num frames 6100... -[2023-02-22 16:11:28,109][11727] Num frames 6200... -[2023-02-22 16:11:28,225][11727] Num frames 6300... -[2023-02-22 16:11:28,338][11727] Num frames 6400... 
-[2023-02-22 16:11:28,451][11727] Avg episode rewards: #0: 21.073, true rewards: #0: 9.216 -[2023-02-22 16:11:28,453][11727] Avg episode reward: 21.073, avg true_objective: 9.216 -[2023-02-22 16:11:28,511][11727] Num frames 6500... -[2023-02-22 16:11:28,625][11727] Num frames 6600... -[2023-02-22 16:11:28,743][11727] Num frames 6700... -[2023-02-22 16:11:28,855][11727] Num frames 6800... -[2023-02-22 16:11:28,951][11727] Avg episode rewards: #0: 19.294, true rewards: #0: 8.544 -[2023-02-22 16:11:28,953][11727] Avg episode reward: 19.294, avg true_objective: 8.544 -[2023-02-22 16:11:29,031][11727] Num frames 6900... -[2023-02-22 16:11:29,146][11727] Num frames 7000... -[2023-02-22 16:11:29,286][11727] Num frames 7100... -[2023-02-22 16:11:29,396][11727] Num frames 7200... -[2023-02-22 16:11:29,505][11727] Num frames 7300... -[2023-02-22 16:11:29,619][11727] Num frames 7400... -[2023-02-22 16:11:29,732][11727] Num frames 7500... -[2023-02-22 16:11:29,849][11727] Num frames 7600... -[2023-02-22 16:11:29,963][11727] Num frames 7700... -[2023-02-22 16:11:30,076][11727] Num frames 7800... -[2023-02-22 16:11:30,143][11727] Avg episode rewards: #0: 19.677, true rewards: #0: 8.677 -[2023-02-22 16:11:30,145][11727] Avg episode reward: 19.677, avg true_objective: 8.677 -[2023-02-22 16:11:30,249][11727] Num frames 7900... -[2023-02-22 16:11:30,364][11727] Num frames 8000... -[2023-02-22 16:11:30,478][11727] Num frames 8100... -[2023-02-22 16:11:30,594][11727] Num frames 8200... -[2023-02-22 16:11:30,707][11727] Num frames 8300... -[2023-02-22 16:11:30,822][11727] Num frames 8400... -[2023-02-22 16:11:30,937][11727] Num frames 8500... -[2023-02-22 16:11:31,049][11727] Num frames 8600... -[2023-02-22 16:11:31,162][11727] Num frames 8700... -[2023-02-22 16:11:31,277][11727] Num frames 8800... -[2023-02-22 16:11:31,371][11727] Avg episode rewards: #0: 19.533, true rewards: #0: 8.833 -[2023-02-22 16:11:31,373][11727] Avg episode reward: 19.533, avg true_objective: 8.833 -[2023-02-22 16:11:52,331][11727] Replay video saved to /content/train_dir/default_experiment/replay.mp4! -[2023-02-22 16:12:41,257][11727] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2023-02-22 16:12:41,259][11727] Overriding arg 'num_workers' with value 1 passed from command line -[2023-02-22 16:12:41,260][11727] Adding new argument 'no_render'=True that is not in the saved config file! -[2023-02-22 16:12:41,261][11727] Adding new argument 'save_video'=True that is not in the saved config file! -[2023-02-22 16:12:41,264][11727] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2023-02-22 16:12:41,265][11727] Adding new argument 'video_name'=None that is not in the saved config file! -[2023-02-22 16:12:41,266][11727] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2023-02-22 16:12:41,268][11727] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2023-02-22 16:12:41,270][11727] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2023-02-22 16:12:41,271][11727] Adding new argument 'hf_repository'='Unterwexi/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2023-02-22 16:12:41,272][11727] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2023-02-22 16:12:41,274][11727] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
-[2023-02-22 16:12:41,276][11727] Adding new argument 'train_script'=None that is not in the saved config file! -[2023-02-22 16:12:41,277][11727] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2023-02-22 16:12:41,279][11727] Using frameskip 1 and render_action_repeat=4 for evaluation -[2023-02-22 16:12:41,297][11727] RunningMeanStd input shape: (3, 72, 128) -[2023-02-22 16:12:41,300][11727] RunningMeanStd input shape: (1,) -[2023-02-22 16:12:41,315][11727] ConvEncoder: input_channels=3 -[2023-02-22 16:12:41,358][11727] Conv encoder output size: 512 -[2023-02-22 16:12:41,359][11727] Policy head output size: 512 -[2023-02-22 16:12:41,382][11727] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2023-02-22 16:12:41,857][11727] Num frames 100... -[2023-02-22 16:12:41,979][11727] Num frames 200... -[2023-02-22 16:12:42,097][11727] Num frames 300... -[2023-02-22 16:12:42,233][11727] Avg episode rewards: #0: 7.670, true rewards: #0: 3.670 -[2023-02-22 16:12:42,235][11727] Avg episode reward: 7.670, avg true_objective: 3.670 -[2023-02-22 16:12:42,276][11727] Num frames 400... -[2023-02-22 16:12:42,396][11727] Num frames 500... -[2023-02-22 16:12:42,528][11727] Num frames 600... -[2023-02-22 16:12:42,653][11727] Num frames 700... -[2023-02-22 16:12:42,780][11727] Num frames 800... -[2023-02-22 16:12:42,907][11727] Num frames 900... -[2023-02-22 16:12:43,034][11727] Num frames 1000... -[2023-02-22 16:12:43,181][11727] Avg episode rewards: #0: 12.870, true rewards: #0: 5.370 -[2023-02-22 16:12:43,183][11727] Avg episode reward: 12.870, avg true_objective: 5.370 -[2023-02-22 16:12:43,215][11727] Num frames 1100... -[2023-02-22 16:12:43,331][11727] Num frames 1200... -[2023-02-22 16:12:43,441][11727] Num frames 1300... -[2023-02-22 16:12:43,554][11727] Num frames 1400... -[2023-02-22 16:12:43,685][11727] Avg episode rewards: #0: 10.860, true rewards: #0: 4.860 -[2023-02-22 16:12:43,687][11727] Avg episode reward: 10.860, avg true_objective: 4.860 -[2023-02-22 16:12:43,745][11727] Num frames 1500... -[2023-02-22 16:12:43,870][11727] Num frames 1600... -[2023-02-22 16:12:43,992][11727] Num frames 1700... -[2023-02-22 16:12:44,116][11727] Num frames 1800... -[2023-02-22 16:12:44,184][11727] Avg episode rewards: #0: 9.775, true rewards: #0: 4.525 -[2023-02-22 16:12:44,187][11727] Avg episode reward: 9.775, avg true_objective: 4.525 -[2023-02-22 16:12:44,288][11727] Num frames 1900... -[2023-02-22 16:12:44,400][11727] Num frames 2000... -[2023-02-22 16:12:44,510][11727] Num frames 2100... -[2023-02-22 16:12:44,622][11727] Num frames 2200... -[2023-02-22 16:12:44,709][11727] Avg episode rewards: #0: 9.052, true rewards: #0: 4.452 -[2023-02-22 16:12:44,711][11727] Avg episode reward: 9.052, avg true_objective: 4.452 -[2023-02-22 16:12:44,795][11727] Num frames 2300... -[2023-02-22 16:12:44,906][11727] Num frames 2400... -[2023-02-22 16:12:45,039][11727] Num frames 2500... -[2023-02-22 16:12:45,152][11727] Num frames 2600... -[2023-02-22 16:12:45,268][11727] Num frames 2700... -[2023-02-22 16:12:45,381][11727] Num frames 2800... -[2023-02-22 16:12:45,492][11727] Num frames 2900... -[2023-02-22 16:12:45,602][11727] Num frames 3000... -[2023-02-22 16:12:45,715][11727] Num frames 3100... -[2023-02-22 16:12:45,826][11727] Num frames 3200... -[2023-02-22 16:12:45,936][11727] Num frames 3300... -[2023-02-22 16:12:46,050][11727] Num frames 3400... -[2023-02-22 16:12:46,166][11727] Num frames 3500... 
-[2023-02-22 16:12:46,282][11727] Num frames 3600...
-[2023-02-22 16:12:46,392][11727] Num frames 3700...
-[2023-02-22 16:12:46,507][11727] Num frames 3800...
-[2023-02-22 16:12:46,618][11727] Num frames 3900...
-[2023-02-22 16:12:46,730][11727] Num frames 4000...
-[2023-02-22 16:12:46,845][11727] Num frames 4100...
-[2023-02-22 16:12:46,960][11727] Num frames 4200...
-[2023-02-22 16:12:47,087][11727] Num frames 4300...
-[2023-02-22 16:12:47,174][11727] Avg episode rewards: #0: 17.376, true rewards: #0: 7.210
-[2023-02-22 16:12:47,176][11727] Avg episode reward: 17.376, avg true_objective: 7.210
-[2023-02-22 16:12:47,260][11727] Num frames 4400...
-[2023-02-22 16:12:47,373][11727] Num frames 4500...
-[2023-02-22 16:12:47,485][11727] Num frames 4600...
-[2023-02-22 16:12:47,594][11727] Num frames 4700...
-[2023-02-22 16:12:47,706][11727] Num frames 4800...
-[2023-02-22 16:12:47,817][11727] Num frames 4900...
-[2023-02-22 16:12:47,929][11727] Num frames 5000...
-[2023-02-22 16:12:48,050][11727] Num frames 5100...
-[2023-02-22 16:12:48,165][11727] Num frames 5200...
-[2023-02-22 16:12:48,248][11727] Avg episode rewards: #0: 17.603, true rewards: #0: 7.460
-[2023-02-22 16:12:48,250][11727] Avg episode reward: 17.603, avg true_objective: 7.460
-[2023-02-22 16:12:48,337][11727] Num frames 5300...
-[2023-02-22 16:12:48,446][11727] Num frames 5400...
-[2023-02-22 16:12:48,557][11727] Num frames 5500...
-[2023-02-22 16:12:48,668][11727] Num frames 5600...
-[2023-02-22 16:12:48,779][11727] Num frames 5700...
-[2023-02-22 16:12:48,891][11727] Num frames 5800...
-[2023-02-22 16:12:49,005][11727] Num frames 5900...
-[2023-02-22 16:12:49,116][11727] Num frames 6000...
-[2023-02-22 16:12:49,225][11727] Num frames 6100...
-[2023-02-22 16:12:49,303][11727] Avg episode rewards: #0: 17.522, true rewards: #0: 7.647
-[2023-02-22 16:12:49,306][11727] Avg episode reward: 17.522, avg true_objective: 7.647
-[2023-02-22 16:12:49,397][11727] Num frames 6200...
-[2023-02-22 16:12:49,509][11727] Num frames 6300...
-[2023-02-22 16:12:49,625][11727] Num frames 6400...
-[2023-02-22 16:12:49,736][11727] Num frames 6500...
-[2023-02-22 16:12:49,849][11727] Num frames 6600...
-[2023-02-22 16:12:49,938][11727] Avg episode rewards: #0: 16.478, true rewards: #0: 7.367
-[2023-02-22 16:12:49,940][11727] Avg episode reward: 16.478, avg true_objective: 7.367
-[2023-02-22 16:12:50,021][11727] Num frames 6700...
-[2023-02-22 16:12:50,134][11727] Num frames 6800...
-[2023-02-22 16:12:50,248][11727] Num frames 6900...
-[2023-02-22 16:12:50,360][11727] Num frames 7000...
-[2023-02-22 16:12:50,470][11727] Num frames 7100...
-[2023-02-22 16:12:50,579][11727] Num frames 7200...
-[2023-02-22 16:12:50,690][11727] Num frames 7300...
-[2023-02-22 16:12:50,803][11727] Num frames 7400...
-[2023-02-22 16:12:50,929][11727] Avg episode rewards: #0: 16.461, true rewards: #0: 7.461
-[2023-02-22 16:12:50,931][11727] Avg episode reward: 16.461, avg true_objective: 7.461
-[2023-02-22 16:13:08,594][11727] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-02-23 10:01:27,887][12605] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
+    self.game.init()
+vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
+    slot_callable(*args)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
+    env_runner.init(self.timing)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
+    self._reset()
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
+    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 379, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
+    self._ensure_initialized()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
+    self.initialize()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
+    self._game_init()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
+    raise EnvCriticalError()
+sample_factory.envs.env_utils.EnvCriticalError
+[2023-02-23 10:01:27,889][12606] Unhandled exception in evt loop rollout_proc7_evt_loop
+[2023-02-23 10:01:27,887][12587] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
+    self.game.init()
+vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
+    slot_callable(*args)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
+    env_runner.init(self.timing)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
+    self._reset()
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
+    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 379, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
+    self._ensure_initialized()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
+    self.initialize()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
+    self._game_init()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
+    raise EnvCriticalError()
+sample_factory.envs.env_utils.EnvCriticalError
+[2023-02-23 10:01:27,889][12605] Unhandled exception in evt loop rollout_proc4_evt_loop
+[2023-02-23 10:01:27,889][12587] Unhandled exception in evt loop rollout_proc1_evt_loop
+[2023-02-23 10:01:27,888][12589] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
+    self.game.init()
+vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.8/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
+    slot_callable(*args)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
+    env_runner.init(self.timing)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
+    self._reset()
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
+    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 379, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sample_factory/envs/env_wrappers.py", line 84, in reset
+    obs, info = self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/gym/core.py", line 323, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
+    return self.env.reset(**kwargs)
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
+    self._ensure_initialized()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
+    self.initialize()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
+    self._game_init()
+  File "/usr/local/lib/python3.8/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
+    raise EnvCriticalError()
+sample_factory.envs.env_utils.EnvCriticalError
+[2023-02-23 10:01:27,889][12589] Unhandled exception in evt loop rollout_proc2_evt_loop
+[2023-02-23 10:01:28,189][12588] Decorrelating experience for 0 frames...
+[2023-02-23 10:01:28,216][12607] Decorrelating experience for 0 frames...
+[2023-02-23 10:01:28,228][12590] Decorrelating experience for 0 frames...
+[2023-02-23 10:01:28,429][12588] Decorrelating experience for 32 frames...
+[2023-02-23 10:01:28,469][12590] Decorrelating experience for 32 frames...
+[2023-02-23 10:01:28,495][12608] Decorrelating experience for 0 frames...
+[2023-02-23 10:01:28,722][12588] Decorrelating experience for 64 frames...
+[2023-02-23 10:01:28,764][12590] Decorrelating experience for 64 frames...
+[2023-02-23 10:01:28,765][12607] Decorrelating experience for 32 frames...
+[2023-02-23 10:01:28,989][12608] Decorrelating experience for 32 frames...
+[2023-02-23 10:01:29,010][12588] Decorrelating experience for 96 frames...
+[2023-02-23 10:01:29,249][12590] Decorrelating experience for 96 frames...
+[2023-02-23 10:01:29,276][12608] Decorrelating experience for 64 frames...
+[2023-02-23 10:01:29,524][12607] Decorrelating experience for 64 frames...
+[2023-02-23 10:01:29,554][12608] Decorrelating experience for 96 frames...
+[2023-02-23 10:01:29,799][12607] Decorrelating experience for 96 frames...
+[2023-02-23 10:01:30,316][07928] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-23 10:01:33,445][12572] Signal inference workers to stop experience collection...
+[2023-02-23 10:01:33,450][12586] InferenceWorker_p0-w0: stopping experience collection
+[2023-02-23 10:01:35,000][07928] Heartbeat connected on Batcher_0
+[2023-02-23 10:01:35,008][07928] Heartbeat connected on InferenceWorker_p0-w0
+[2023-02-23 10:01:35,016][07928] Heartbeat connected on RolloutWorker_w0
+[2023-02-23 10:01:35,027][07928] Heartbeat connected on RolloutWorker_w3
+[2023-02-23 10:01:35,033][07928] Heartbeat connected on RolloutWorker_w5
+[2023-02-23 10:01:35,037][07928] Heartbeat connected on RolloutWorker_w6
+[2023-02-23 10:01:35,316][07928] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 227.4. Samples: 2274. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-23 10:01:35,318][07928] Avg episode reward: [(0, '3.162')]
+[2023-02-23 10:01:36,532][12572] Signal inference workers to resume experience collection...
+[2023-02-23 10:01:36,533][12586] InferenceWorker_p0-w0: resuming experience collection
+[2023-02-23 10:01:37,513][07928] Heartbeat connected on LearnerWorker_p0
+[2023-02-23 10:01:39,988][12586] Updated weights for policy 0, policy_version 10 (0.0371)
+[2023-02-23 10:01:40,316][07928] Fps is (10 sec: 4096.0, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 40960. Throughput: 0: 653.2. Samples: 9798. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2023-02-23 10:01:40,319][07928] Avg episode reward: [(0, '4.392')]
+[2023-02-23 10:01:43,461][12586] Updated weights for policy 0, policy_version 20 (0.0009)
+[2023-02-23 10:01:45,316][07928] Fps is (10 sec: 10240.0, 60 sec: 5120.0, 300 sec: 5120.0). Total num frames: 102400. Throughput: 0: 939.1. Samples: 18782. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:01:45,319][07928] Avg episode reward: [(0, '4.444')]
+[2023-02-23 10:01:45,321][12572] Saving new best policy, reward=4.444!
+[2023-02-23 10:01:46,885][12586] Updated weights for policy 0, policy_version 30 (0.0010)
+[2023-02-23 10:01:50,316][07928] Fps is (10 sec: 11878.5, 60 sec: 6389.8, 300 sec: 6389.8). Total num frames: 159744. Throughput: 0: 1467.9. Samples: 36698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:01:50,318][07928] Avg episode reward: [(0, '4.416')]
+[2023-02-23 10:01:50,418][12586] Updated weights for policy 0, policy_version 40 (0.0010)
+[2023-02-23 10:01:53,940][12586] Updated weights for policy 0, policy_version 50 (0.0010)
+[2023-02-23 10:01:55,316][07928] Fps is (10 sec: 11878.5, 60 sec: 7372.8, 300 sec: 7372.8). Total num frames: 221184. Throughput: 0: 1805.7. Samples: 54170. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:01:55,318][07928] Avg episode reward: [(0, '4.526')]
+[2023-02-23 10:01:55,320][12572] Saving new best policy, reward=4.526!
+[2023-02-23 10:01:57,287][12586] Updated weights for policy 0, policy_version 60 (0.0010)
+[2023-02-23 10:02:00,316][07928] Fps is (10 sec: 11878.4, 60 sec: 7957.9, 300 sec: 7957.9). Total num frames: 278528. Throughput: 0: 1806.2. Samples: 63218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:02:00,319][07928] Avg episode reward: [(0, '4.694')]
+[2023-02-23 10:02:00,326][12572] Saving new best policy, reward=4.694!
+[2023-02-23 10:02:00,757][12586] Updated weights for policy 0, policy_version 70 (0.0009)
+[2023-02-23 10:02:04,246][12586] Updated weights for policy 0, policy_version 80 (0.0009)
+[2023-02-23 10:02:05,316][07928] Fps is (10 sec: 11878.3, 60 sec: 8499.2, 300 sec: 8499.2). Total num frames: 339968. Throughput: 0: 2020.4. Samples: 80818. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2023-02-23 10:02:05,319][07928] Avg episode reward: [(0, '4.675')]
+[2023-02-23 10:02:07,736][12586] Updated weights for policy 0, policy_version 90 (0.0010)
+[2023-02-23 10:02:10,316][07928] Fps is (10 sec: 11878.3, 60 sec: 8829.1, 300 sec: 8829.1). Total num frames: 397312. Throughput: 0: 2191.0. Samples: 98596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:02:10,318][07928] Avg episode reward: [(0, '4.678')]
+[2023-02-23 10:02:11,161][12586] Updated weights for policy 0, policy_version 100 (0.0010)
+[2023-02-23 10:02:14,622][12586] Updated weights for policy 0, policy_version 110 (0.0009)
+[2023-02-23 10:02:15,316][07928] Fps is (10 sec: 11878.5, 60 sec: 9175.0, 300 sec: 9175.0). Total num frames: 458752. Throughput: 0: 2391.4. Samples: 107612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:02:15,319][07928] Avg episode reward: [(0, '4.539')]
+[2023-02-23 10:02:18,086][12586] Updated weights for policy 0, policy_version 120 (0.0009)
+[2023-02-23 10:02:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 9383.6, 300 sec: 9383.6). Total num frames: 516096. Throughput: 0: 2732.6. Samples: 125242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:02:20,319][07928] Avg episode reward: [(0, '4.457')]
+[2023-02-23 10:02:21,619][12586] Updated weights for policy 0, policy_version 130 (0.0011)
+[2023-02-23 10:02:25,015][12586] Updated weights for policy 0, policy_version 140 (0.0010)
+[2023-02-23 10:02:25,316][07928] Fps is (10 sec: 11468.8, 60 sec: 9557.3, 300 sec: 9557.3). Total num frames: 573440. Throughput: 0: 2961.1. Samples: 143046. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:02:25,318][07928] Avg episode reward: [(0, '4.964')]
+[2023-02-23 10:02:25,320][12572] Saving new best policy, reward=4.964!
+[2023-02-23 10:02:28,403][12586] Updated weights for policy 0, policy_version 150 (0.0009)
+[2023-02-23 10:02:30,316][07928] Fps is (10 sec: 11878.4, 60 sec: 10581.3, 300 sec: 9767.4). Total num frames: 634880. Throughput: 0: 2960.2. Samples: 151990. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:02:30,318][07928] Avg episode reward: [(0, '5.044')]
+[2023-02-23 10:02:30,326][12572] Saving new best policy, reward=5.044!
+[2023-02-23 10:02:31,868][12586] Updated weights for policy 0, policy_version 160 (0.0010)
+[2023-02-23 10:02:35,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11537.1, 300 sec: 9888.9). Total num frames: 692224. Throughput: 0: 2955.5. Samples: 169696. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:02:35,319][07928] Avg episode reward: [(0, '5.287')]
+[2023-02-23 10:02:35,322][12572] Saving new best policy, reward=5.287!
+[2023-02-23 10:02:35,422][12586] Updated weights for policy 0, policy_version 170 (0.0010)
+[2023-02-23 10:02:38,725][12586] Updated weights for policy 0, policy_version 180 (0.0010)
+[2023-02-23 10:02:40,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11878.4, 300 sec: 10048.8). Total num frames: 753664. Throughput: 0: 2970.3. Samples: 187836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:02:40,318][07928] Avg episode reward: [(0, '4.893')]
+[2023-02-23 10:02:42,075][12586] Updated weights for policy 0, policy_version 190 (0.0010)
+[2023-02-23 10:02:45,316][07928] Fps is (10 sec: 12288.0, 60 sec: 11878.4, 300 sec: 10188.8). Total num frames: 815104. Throughput: 0: 2970.4. Samples: 196886. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:02:45,319][07928] Avg episode reward: [(0, '5.249')]
+[2023-02-23 10:02:45,510][12586] Updated weights for policy 0, policy_version 200 (0.0011)
+[2023-02-23 10:02:49,010][12586] Updated weights for policy 0, policy_version 210 (0.0011)
+[2023-02-23 10:02:50,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11878.4, 300 sec: 10264.1). Total num frames: 872448. Throughput: 0: 2972.1. Samples: 214562. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:02:50,318][07928] Avg episode reward: [(0, '5.614')]
+[2023-02-23 10:02:50,333][12572] Saving new best policy, reward=5.614!
+[2023-02-23 10:02:52,397][12586] Updated weights for policy 0, policy_version 220 (0.0009)
+[2023-02-23 10:02:55,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11878.4, 300 sec: 10376.5). Total num frames: 933888. Throughput: 0: 2977.8. Samples: 232598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:02:55,319][07928] Avg episode reward: [(0, '5.575')]
+[2023-02-23 10:02:55,886][12586] Updated weights for policy 0, policy_version 230 (0.0009)
+[2023-02-23 10:02:59,279][12586] Updated weights for policy 0, policy_version 240 (0.0010)
+[2023-02-23 10:03:00,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11878.4, 300 sec: 10434.0). Total num frames: 991232. Throughput: 0: 2978.0. Samples: 241624. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:00,319][07928] Avg episode reward: [(0, '6.060')]
+[2023-02-23 10:03:00,327][12572] Saving new best policy, reward=6.060!
+[2023-02-23 10:03:02,895][12586] Updated weights for policy 0, policy_version 250 (0.0011)
+[2023-02-23 10:03:05,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11878.4, 300 sec: 10526.7). Total num frames: 1052672. Throughput: 0: 2974.2. Samples: 259082. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:05,318][07928] Avg episode reward: [(0, '6.367')]
+[2023-02-23 10:03:05,321][12572] Saving new best policy, reward=6.367!
+[2023-02-23 10:03:06,334][12586] Updated weights for policy 0, policy_version 260 (0.0009)
+[2023-02-23 10:03:09,754][12586] Updated weights for policy 0, policy_version 270 (0.0009)
+[2023-02-23 10:03:10,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11878.4, 300 sec: 10571.6). Total num frames: 1110016. Throughput: 0: 2976.6. Samples: 276992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:03:10,319][07928] Avg episode reward: [(0, '5.979')]
+[2023-02-23 10:03:10,327][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000271_1110016.pth...
+[2023-02-23 10:03:13,173][12586] Updated weights for policy 0, policy_version 280 (0.0010)
+[2023-02-23 10:03:15,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11810.1, 300 sec: 10612.4). Total num frames: 1167360. Throughput: 0: 2976.8. Samples: 285944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:15,320][07928] Avg episode reward: [(0, '6.954')]
+[2023-02-23 10:03:15,343][12572] Saving new best policy, reward=6.954!
+[2023-02-23 10:03:16,759][12586] Updated weights for policy 0, policy_version 290 (0.0010)
+[2023-02-23 10:03:20,195][12586] Updated weights for policy 0, policy_version 300 (0.0010)
+[2023-02-23 10:03:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11878.4, 300 sec: 10685.2). Total num frames: 1228800. Throughput: 0: 2969.2. Samples: 303310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:20,319][07928] Avg episode reward: [(0, '7.500')]
+[2023-02-23 10:03:20,326][12572] Saving new best policy, reward=7.500!
+[2023-02-23 10:03:23,581][12586] Updated weights for policy 0, policy_version 310 (0.0009)
+[2023-02-23 10:03:25,316][07928] Fps is (10 sec: 12288.1, 60 sec: 11946.7, 300 sec: 10752.0). Total num frames: 1290240. Throughput: 0: 2969.2. Samples: 321448. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:25,318][07928] Avg episode reward: [(0, '8.413')]
+[2023-02-23 10:03:25,321][12572] Saving new best policy, reward=8.413!
+[2023-02-23 10:03:26,976][12586] Updated weights for policy 0, policy_version 320 (0.0009)
+[2023-02-23 10:03:30,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11878.4, 300 sec: 10780.7). Total num frames: 1347584. Throughput: 0: 2966.9. Samples: 330398. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:30,318][07928] Avg episode reward: [(0, '9.060')]
+[2023-02-23 10:03:30,327][12572] Saving new best policy, reward=9.060!
+[2023-02-23 10:03:30,505][12586] Updated weights for policy 0, policy_version 330 (0.0011)
+[2023-02-23 10:03:33,984][12586] Updated weights for policy 0, policy_version 340 (0.0010)
+[2023-02-23 10:03:35,316][07928] Fps is (10 sec: 11468.6, 60 sec: 11878.4, 300 sec: 10807.1). Total num frames: 1404928. Throughput: 0: 2963.5. Samples: 347922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:35,319][07928] Avg episode reward: [(0, '10.180')]
+[2023-02-23 10:03:35,321][12572] Saving new best policy, reward=10.180!
+[2023-02-23 10:03:37,401][12586] Updated weights for policy 0, policy_version 350 (0.0011)
+[2023-02-23 10:03:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11878.4, 300 sec: 10862.0). Total num frames: 1466368. Throughput: 0: 2955.1. Samples: 365576. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:40,319][07928] Avg episode reward: [(0, '8.840')]
+[2023-02-23 10:03:40,974][12586] Updated weights for policy 0, policy_version 360 (0.0009)
+[2023-02-23 10:03:44,668][12586] Updated weights for policy 0, policy_version 370 (0.0010)
+[2023-02-23 10:03:45,316][07928] Fps is (10 sec: 11469.1, 60 sec: 11741.9, 300 sec: 10854.4). Total num frames: 1519616. Throughput: 0: 2942.1. Samples: 374016. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:45,318][07928] Avg episode reward: [(0, '7.377')]
+[2023-02-23 10:03:48,226][12586] Updated weights for policy 0, policy_version 380 (0.0010)
+[2023-02-23 10:03:50,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11741.9, 300 sec: 10875.6). Total num frames: 1576960. Throughput: 0: 2934.8. Samples: 391148. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:50,319][07928] Avg episode reward: [(0, '8.717')]
+[2023-02-23 10:03:51,725][12586] Updated weights for policy 0, policy_version 390 (0.0010)
+[2023-02-23 10:03:55,161][12586] Updated weights for policy 0, policy_version 400 (0.0009)
+[2023-02-23 10:03:55,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 10922.7). Total num frames: 1638400. Throughput: 0: 2931.5. Samples: 408908. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:03:55,319][07928] Avg episode reward: [(0, '10.533')]
+[2023-02-23 10:03:55,322][12572] Saving new best policy, reward=10.533!
+[2023-02-23 10:03:58,690][12586] Updated weights for policy 0, policy_version 410 (0.0011)
+[2023-02-23 10:04:00,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 10940.3). Total num frames: 1695744. Throughput: 0: 2926.7. Samples: 417644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:04:00,319][07928] Avg episode reward: [(0, '12.205')]
+[2023-02-23 10:04:00,327][12572] Saving new best policy, reward=12.205!
+[2023-02-23 10:04:02,165][12586] Updated weights for policy 0, policy_version 420 (0.0010)
+[2023-02-23 10:04:05,316][07928] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 10982.4). Total num frames: 1757184. Throughput: 0: 2934.4. Samples: 435358. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:04:05,319][07928] Avg episode reward: [(0, '15.359')]
+[2023-02-23 10:04:05,321][12572] Saving new best policy, reward=15.359!
+[2023-02-23 10:04:05,589][12586] Updated weights for policy 0, policy_version 430 (0.0009)
+[2023-02-23 10:04:08,980][12586] Updated weights for policy 0, policy_version 440 (0.0010)
+[2023-02-23 10:04:10,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 10997.1). Total num frames: 1814528. Throughput: 0: 2931.1. Samples: 453346. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:04:10,318][07928] Avg episode reward: [(0, '15.015')]
+[2023-02-23 10:04:12,555][12586] Updated weights for policy 0, policy_version 450 (0.0011)
+[2023-02-23 10:04:15,316][07928] Fps is (10 sec: 11469.0, 60 sec: 11741.9, 300 sec: 11011.0). Total num frames: 1871872. Throughput: 0: 2919.7. Samples: 461786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:04:15,318][07928] Avg episode reward: [(0, '14.508')]
+[2023-02-23 10:04:16,132][12586] Updated weights for policy 0, policy_version 460 (0.0010)
+[2023-02-23 10:04:19,579][12586] Updated weights for policy 0, policy_version 470 (0.0009)
+[2023-02-23 10:04:20,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11047.5). Total num frames: 1933312. Throughput: 0: 2918.5. Samples: 479256. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:04:20,318][07928] Avg episode reward: [(0, '14.599')]
+[2023-02-23 10:04:23,050][12586] Updated weights for policy 0, policy_version 480 (0.0010)
+[2023-02-23 10:04:25,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11059.2). Total num frames: 1990656. Throughput: 0: 2919.7. Samples: 496962. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:04:25,318][07928] Avg episode reward: [(0, '15.783')]
+[2023-02-23 10:04:25,320][12572] Saving new best policy, reward=15.783!
+[2023-02-23 10:04:26,564][12586] Updated weights for policy 0, policy_version 490 (0.0009)
+[2023-02-23 10:04:30,113][12586] Updated weights for policy 0, policy_version 500 (0.0010)
+[2023-02-23 10:04:30,316][07928] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11070.3). Total num frames: 2048000. Throughput: 0: 2923.8. Samples: 505588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:04:30,319][07928] Avg episode reward: [(0, '18.673')]
+[2023-02-23 10:04:30,327][12572] Saving new best policy, reward=18.673!
+[2023-02-23 10:04:33,503][12586] Updated weights for policy 0, policy_version 510 (0.0009)
+[2023-02-23 10:04:35,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11102.3). Total num frames: 2109440. Throughput: 0: 2939.1. Samples: 523406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:04:35,318][07928] Avg episode reward: [(0, '19.466')]
+[2023-02-23 10:04:35,320][12572] Saving new best policy, reward=19.466!
+[2023-02-23 10:04:36,939][12586] Updated weights for policy 0, policy_version 520 (0.0010)
+[2023-02-23 10:04:40,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11111.7). Total num frames: 2166784. Throughput: 0: 2940.5. Samples: 541230. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:04:40,319][07928] Avg episode reward: [(0, '19.047')]
+[2023-02-23 10:04:40,422][12586] Updated weights for policy 0, policy_version 530 (0.0010)
+[2023-02-23 10:04:43,939][12586] Updated weights for policy 0, policy_version 540 (0.0010)
+[2023-02-23 10:04:45,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11120.6). Total num frames: 2224128. Throughput: 0: 2937.0. Samples: 549808. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:04:45,318][07928] Avg episode reward: [(0, '19.725')]
+[2023-02-23 10:04:45,320][12572] Saving new best policy, reward=19.725!
+[2023-02-23 10:04:47,371][12586] Updated weights for policy 0, policy_version 550 (0.0010)
+[2023-02-23 10:04:50,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11149.1). Total num frames: 2285568. Throughput: 0: 2942.5. Samples: 567772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:04:50,318][07928] Avg episode reward: [(0, '19.666')]
+[2023-02-23 10:04:50,817][12586] Updated weights for policy 0, policy_version 560 (0.0010)
+[2023-02-23 10:04:54,339][12586] Updated weights for policy 0, policy_version 570 (0.0010)
+[2023-02-23 10:04:55,316][07928] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11156.7). Total num frames: 2342912. Throughput: 0: 2931.6. Samples: 585266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:04:55,319][07928] Avg episode reward: [(0, '19.913')]
+[2023-02-23 10:04:55,321][12572] Saving new best policy, reward=19.913!
+[2023-02-23 10:04:57,864][12586] Updated weights for policy 0, policy_version 580 (0.0010)
+[2023-02-23 10:05:00,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11183.0). Total num frames: 2404352. Throughput: 0: 2938.0. Samples: 593998. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:05:00,319][07928] Avg episode reward: [(0, '20.096')]
+[2023-02-23 10:05:00,327][12572] Saving new best policy, reward=20.096!
+[2023-02-23 10:05:01,312][12586] Updated weights for policy 0, policy_version 590 (0.0010)
+[2023-02-23 10:05:04,776][12586] Updated weights for policy 0, policy_version 600 (0.0010)
+[2023-02-23 10:05:05,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11189.5). Total num frames: 2461696. Throughput: 0: 2945.6. Samples: 611806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:05:05,318][07928] Avg episode reward: [(0, '20.172')]
+[2023-02-23 10:05:05,322][12572] Saving new best policy, reward=20.172!
+[2023-02-23 10:05:08,337][12586] Updated weights for policy 0, policy_version 610 (0.0011)
+[2023-02-23 10:05:10,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11195.7). Total num frames: 2519040. Throughput: 0: 2938.3. Samples: 629184. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:05:10,319][07928] Avg episode reward: [(0, '19.201')]
+[2023-02-23 10:05:10,328][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000615_2519040.pth...
+[2023-02-23 10:05:11,846][12586] Updated weights for policy 0, policy_version 620 (0.0010)
+[2023-02-23 10:05:15,248][12586] Updated weights for policy 0, policy_version 630 (0.0009)
+[2023-02-23 10:05:15,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11219.5). Total num frames: 2580480. Throughput: 0: 2945.6. Samples: 638140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:05:15,319][07928] Avg episode reward: [(0, '17.916')]
+[2023-02-23 10:05:18,598][12586] Updated weights for policy 0, policy_version 640 (0.0010)
+[2023-02-23 10:05:20,316][07928] Fps is (10 sec: 12288.0, 60 sec: 11810.1, 300 sec: 11242.2). Total num frames: 2641920. Throughput: 0: 2953.6. Samples: 656318. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:05:20,318][07928] Avg episode reward: [(0, '20.862')]
+[2023-02-23 10:05:20,328][12572] Saving new best policy, reward=20.862!
+[2023-02-23 10:05:22,066][12586] Updated weights for policy 0, policy_version 650 (0.0010)
+[2023-02-23 10:05:25,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11246.9). Total num frames: 2699264. Throughput: 0: 2943.2. Samples: 673672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:05:25,319][07928] Avg episode reward: [(0, '20.226')]
+[2023-02-23 10:05:25,643][12586] Updated weights for policy 0, policy_version 660 (0.0010)
+[2023-02-23 10:05:29,060][12586] Updated weights for policy 0, policy_version 670 (0.0009)
+[2023-02-23 10:05:30,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11810.1, 300 sec: 11251.5). Total num frames: 2756608. Throughput: 0: 2949.3. Samples: 682528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:05:30,318][07928] Avg episode reward: [(0, '20.335')]
+[2023-02-23 10:05:32,571][12586] Updated weights for policy 0, policy_version 680 (0.0010)
+[2023-02-23 10:05:35,316][07928] Fps is (10 sec: 11468.6, 60 sec: 11741.8, 300 sec: 11255.8). Total num frames: 2813952. Throughput: 0: 2944.2. Samples: 700260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:05:35,319][07928] Avg episode reward: [(0, '20.945')]
+[2023-02-23 10:05:35,322][12572] Saving new best policy, reward=20.945!
+[2023-02-23 10:05:36,104][12586] Updated weights for policy 0, policy_version 690 (0.0009)
+[2023-02-23 10:05:39,672][12586] Updated weights for policy 0, policy_version 700 (0.0011)
+[2023-02-23 10:05:40,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11260.0). Total num frames: 2871296. Throughput: 0: 2938.4. Samples: 717494. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:05:40,319][07928] Avg episode reward: [(0, '21.987')]
+[2023-02-23 10:05:40,346][12572] Saving new best policy, reward=21.987!
+[2023-02-23 10:05:43,136][12586] Updated weights for policy 0, policy_version 710 (0.0010)
+[2023-02-23 10:05:45,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11810.1, 300 sec: 11279.8). Total num frames: 2932736. Throughput: 0: 2942.5. Samples: 726412. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:05:45,318][07928] Avg episode reward: [(0, '24.042')]
+[2023-02-23 10:05:45,320][12572] Saving new best policy, reward=24.042!
+[2023-02-23 10:05:46,578][12586] Updated weights for policy 0, policy_version 720 (0.0010)
+[2023-02-23 10:05:50,007][12586] Updated weights for policy 0, policy_version 730 (0.0009)
+[2023-02-23 10:05:50,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11283.3). Total num frames: 2990080. Throughput: 0: 2944.2. Samples: 744296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:05:50,319][07928] Avg episode reward: [(0, '19.464')]
+[2023-02-23 10:05:53,557][12586] Updated weights for policy 0, policy_version 740 (0.0010)
+[2023-02-23 10:05:55,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11301.9). Total num frames: 3051520. Throughput: 0: 2943.6. Samples: 761646. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:05:55,319][07928] Avg episode reward: [(0, '19.617')]
+[2023-02-23 10:05:57,015][12586] Updated weights for policy 0, policy_version 750 (0.0009)
+[2023-02-23 10:06:00,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11305.0). Total num frames: 3108864. Throughput: 0: 2942.1. Samples: 770534. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:06:00,319][07928] Avg episode reward: [(0, '22.151')]
+[2023-02-23 10:06:00,508][12586] Updated weights for policy 0, policy_version 760 (0.0010)
+[2023-02-23 10:06:03,961][12586] Updated weights for policy 0, policy_version 770 (0.0011)
+[2023-02-23 10:06:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11307.9). Total num frames: 3166208. Throughput: 0: 2930.9. Samples: 788208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:06:05,318][07928] Avg episode reward: [(0, '20.739')]
+[2023-02-23 10:06:07,594][12586] Updated weights for policy 0, policy_version 780 (0.0010)
+[2023-02-23 10:06:10,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11310.7). Total num frames: 3223552. Throughput: 0: 2927.8. Samples: 805422. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:06:10,319][07928] Avg episode reward: [(0, '21.233')]
+[2023-02-23 10:06:11,106][12586] Updated weights for policy 0, policy_version 790 (0.0019)
+[2023-02-23 10:06:14,540][12586] Updated weights for policy 0, policy_version 800 (0.0010)
+[2023-02-23 10:06:15,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11327.6). Total num frames: 3284992. Throughput: 0: 2929.5. Samples: 814354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:06:15,319][07928] Avg episode reward: [(0, '22.765')]
+[2023-02-23 10:06:17,943][12586] Updated weights for policy 0, policy_version 810 (0.0010)
+[2023-02-23 10:06:20,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11330.0). Total num frames: 3342336. Throughput: 0: 2931.4. Samples: 832174. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:06:20,318][07928] Avg episode reward: [(0, '23.397')]
+[2023-02-23 10:06:21,504][12586] Updated weights for policy 0, policy_version 820 (0.0010)
+[2023-02-23 10:06:24,949][12586] Updated weights for policy 0, policy_version 830 (0.0009)
+[2023-02-23 10:06:25,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11538.2). Total num frames: 3403776. Throughput: 0: 2941.4. Samples: 849858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:06:25,319][07928] Avg episode reward: [(0, '25.012')]
+[2023-02-23 10:06:25,321][12572] Saving new best policy, reward=25.012!
+[2023-02-23 10:06:28,397][12586] Updated weights for policy 0, policy_version 840 (0.0010)
+[2023-02-23 10:06:30,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 3461120. Throughput: 0: 2940.3. Samples: 858728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:06:30,318][07928] Avg episode reward: [(0, '27.276')]
+[2023-02-23 10:06:30,327][12572] Saving new best policy, reward=27.276!
+[2023-02-23 10:06:31,843][12586] Updated weights for policy 0, policy_version 850 (0.0009)
+[2023-02-23 10:06:35,317][07928] Fps is (10 sec: 11468.3, 60 sec: 11741.8, 300 sec: 11788.1). Total num frames: 3518464. Throughput: 0: 2933.6. Samples: 876310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:06:35,318][07928] Avg episode reward: [(0, '27.989')]
+[2023-02-23 10:06:35,320][12572] Saving new best policy, reward=27.989!
+[2023-02-23 10:06:35,429][12586] Updated weights for policy 0, policy_version 860 (0.0011)
+[2023-02-23 10:06:38,844][12586] Updated weights for policy 0, policy_version 870 (0.0009)
+[2023-02-23 10:06:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11788.1). Total num frames: 3579904. Throughput: 0: 2940.6. Samples: 893972. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:06:40,319][07928] Avg episode reward: [(0, '24.301')]
+[2023-02-23 10:06:42,297][12586] Updated weights for policy 0, policy_version 880 (0.0010)
+[2023-02-23 10:06:45,316][07928] Fps is (10 sec: 11878.9, 60 sec: 11741.9, 300 sec: 11788.1). Total num frames: 3637248. Throughput: 0: 2940.5. Samples: 902858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:06:45,319][07928] Avg episode reward: [(0, '24.364')]
+[2023-02-23 10:06:45,870][12586] Updated weights for policy 0, policy_version 890 (0.0010)
+[2023-02-23 10:06:49,432][12586] Updated weights for policy 0, policy_version 900 (0.0009)
+[2023-02-23 10:06:50,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 3694592. Throughput: 0: 2933.5. Samples: 920216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:06:50,318][07928] Avg episode reward: [(0, '23.977')]
+[2023-02-23 10:06:52,845][12586] Updated weights for policy 0, policy_version 910 (0.0009)
+[2023-02-23 10:06:55,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11788.1). Total num frames: 3756032. Throughput: 0: 2949.1. Samples: 938132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:06:55,319][07928] Avg episode reward: [(0, '23.055')]
+[2023-02-23 10:06:56,270][12586] Updated weights for policy 0, policy_version 920 (0.0009)
+[2023-02-23 10:06:59,663][12586] Updated weights for policy 0, policy_version 930 (0.0009)
+[2023-02-23 10:07:00,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 3813376. Throughput: 0: 2949.1. Samples: 947062. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:07:00,320][07928] Avg episode reward: [(0, '22.458')]
+[2023-02-23 10:07:03,263][12586] Updated weights for policy 0, policy_version 940 (0.0011)
+[2023-02-23 10:07:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 3870720. Throughput: 0: 2939.6. Samples: 964458. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:07:05,318][07928] Avg episode reward: [(0, '22.598')]
+[2023-02-23 10:07:06,732][12586] Updated weights for policy 0, policy_version 950 (0.0011)
+[2023-02-23 10:07:10,158][12586] Updated weights for policy 0, policy_version 960 (0.0010)
+[2023-02-23 10:07:10,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.2, 300 sec: 11774.3). Total num frames: 3932160. Throughput: 0: 2944.3. Samples: 982350. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:07:10,319][07928] Avg episode reward: [(0, '25.482')]
+[2023-02-23 10:07:10,328][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000960_3932160.pth...
+[2023-02-23 10:07:10,390][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000271_1110016.pth
+[2023-02-23 10:07:13,589][12586] Updated weights for policy 0, policy_version 970 (0.0010)
+[2023-02-23 10:07:15,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 3989504. Throughput: 0: 2942.8. Samples: 991152. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:07:15,318][07928] Avg episode reward: [(0, '26.571')]
+[2023-02-23 10:07:17,197][12586] Updated weights for policy 0, policy_version 980 (0.0010)
+[2023-02-23 10:07:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11788.1). Total num frames: 4050944. Throughput: 0: 2940.2. Samples: 1008618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:07:20,318][07928] Avg episode reward: [(0, '26.161')]
+[2023-02-23 10:07:20,657][12586] Updated weights for policy 0, policy_version 990 (0.0009)
+[2023-02-23 10:07:24,028][12586] Updated weights for policy 0, policy_version 1000 (0.0009)
+[2023-02-23 10:07:25,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.8, 300 sec: 11774.3). Total num frames: 4108288. Throughput: 0: 2944.1. Samples: 1026456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:07:25,319][07928] Avg episode reward: [(0, '25.728')]
+[2023-02-23 10:07:27,516][12586] Updated weights for policy 0, policy_version 1010 (0.0008)
+[2023-02-23 10:07:30,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11774.3). Total num frames: 4165632. Throughput: 0: 2943.6. Samples: 1035322. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:07:30,319][07928] Avg episode reward: [(0, '26.442')]
+[2023-02-23 10:07:31,081][12586] Updated weights for policy 0, policy_version 1020 (0.0011)
+[2023-02-23 10:07:34,584][12586] Updated weights for policy 0, policy_version 1030 (0.0010)
+[2023-02-23 10:07:35,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11774.3). Total num frames: 4227072. Throughput: 0: 2944.8. Samples: 1052734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:07:35,318][07928] Avg episode reward: [(0, '25.407')]
+[2023-02-23 10:07:38,056][12586] Updated weights for policy 0, policy_version 1040 (0.0011)
+[2023-02-23 10:07:40,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 4284416. Throughput: 0: 2941.1. Samples: 1070482. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:07:40,319][07928] Avg episode reward: [(0, '26.590')]
+[2023-02-23 10:07:41,526][12586] Updated weights for policy 0, policy_version 1050 (0.0010)
+[2023-02-23 10:07:45,079][12586] Updated weights for policy 0, policy_version 1060 (0.0010)
+[2023-02-23 10:07:45,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 4341760. Throughput: 0: 2939.6. Samples: 1079344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:07:45,318][07928] Avg episode reward: [(0, '27.716')]
+[2023-02-23 10:07:48,524][12586] Updated weights for policy 0, policy_version 1070 (0.0010)
+[2023-02-23 10:07:50,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 4403200. Throughput: 0: 2943.1. Samples: 1096896. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:07:50,319][07928] Avg episode reward: [(0, '26.096')]
+[2023-02-23 10:07:51,934][12586] Updated weights for policy 0, policy_version 1080 (0.0009)
+[2023-02-23 10:07:55,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 4460544. Throughput: 0: 2944.0. Samples: 1114832. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:07:55,319][07928] Avg episode reward: [(0, '26.028')]
+[2023-02-23 10:07:55,356][12586] Updated weights for policy 0, policy_version 1090 (0.0009)
+[2023-02-23 10:07:58,938][12586] Updated weights for policy 0, policy_version 1100 (0.0010)
+[2023-02-23 10:08:00,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 4517888. Throughput: 0: 2941.7. Samples: 1123528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:08:00,319][07928] Avg episode reward: [(0, '28.459')]
+[2023-02-23 10:08:00,327][12572] Saving new best policy, reward=28.459!
+[2023-02-23 10:08:02,490][12586] Updated weights for policy 0, policy_version 1110 (0.0011)
+[2023-02-23 10:08:05,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11760.4). Total num frames: 4579328. Throughput: 0: 2940.1. Samples: 1140924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:08:05,318][07928] Avg episode reward: [(0, '29.227')]
+[2023-02-23 10:08:05,321][12572] Saving new best policy, reward=29.227!
+[2023-02-23 10:08:05,893][12586] Updated weights for policy 0, policy_version 1120 (0.0010)
+[2023-02-23 10:08:09,322][12586] Updated weights for policy 0, policy_version 1130 (0.0009)
+[2023-02-23 10:08:10,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 4636672. Throughput: 0: 2938.9. Samples: 1158708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:08:10,319][07928] Avg episode reward: [(0, '25.131')]
+[2023-02-23 10:08:12,873][12586] Updated weights for policy 0, policy_version 1140 (0.0010)
+[2023-02-23 10:08:15,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 4694016. Throughput: 0: 2933.7. Samples: 1167338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:08:15,319][07928] Avg episode reward: [(0, '24.979')]
+[2023-02-23 10:08:16,372][12586] Updated weights for policy 0, policy_version 1150 (0.0010)
+[2023-02-23 10:08:19,813][12586] Updated weights for policy 0, policy_version 1160 (0.0009)
+[2023-02-23 10:08:20,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 4755456. Throughput: 0: 2942.8. Samples: 1185158. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:08:20,318][07928] Avg episode reward: [(0, '24.569')]
+[2023-02-23 10:08:23,211][12586] Updated weights for policy 0, policy_version 1170 (0.0010)
+[2023-02-23 10:08:25,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 4812800. Throughput: 0: 2947.1. Samples: 1203102. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:08:25,319][07928] Avg episode reward: [(0, '23.322')]
+[2023-02-23 10:08:26,811][12586] Updated weights for policy 0, policy_version 1180 (0.0009)
+[2023-02-23 10:08:30,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 4870144. Throughput: 0: 2936.0. Samples: 1211462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:08:30,319][07928] Avg episode reward: [(0, '23.754')]
+[2023-02-23 10:08:30,340][12586] Updated weights for policy 0, policy_version 1190 (0.0009)
+[2023-02-23 10:08:33,807][12586] Updated weights for policy 0, policy_version 1200 (0.0010)
+[2023-02-23 10:08:35,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 4931584. Throughput: 0: 2939.4. Samples: 1229168. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:08:35,318][07928] Avg episode reward: [(0, '27.204')]
+[2023-02-23 10:08:37,270][12586] Updated weights for policy 0, policy_version 1210 (0.0010)
+[2023-02-23 10:08:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 4988928. Throughput: 0: 2929.0. Samples: 1246636. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:08:40,318][07928] Avg episode reward: [(0, '28.629')]
+[2023-02-23 10:08:40,861][12586] Updated weights for policy 0, policy_version 1220 (0.0010)
+[2023-02-23 10:08:44,373][12586] Updated weights for policy 0, policy_version 1230 (0.0010)
+[2023-02-23 10:08:45,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11760.4). Total num frames: 5046272. Throughput: 0: 2922.8. Samples: 1255052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:08:45,319][07928] Avg episode reward: [(0, '25.272')]
+[2023-02-23 10:08:47,903][12586] Updated weights for policy 0, policy_version 1240 (0.0010)
+[2023-02-23 10:08:50,316][07928] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11746.5). Total num frames: 5103616. Throughput: 0: 2922.3. Samples: 1272430. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:08:50,320][07928] Avg episode reward: [(0, '24.644')]
+[2023-02-23 10:08:51,569][12586] Updated weights for policy 0, policy_version 1250 (0.0010)
+[2023-02-23 10:08:55,316][07928] Fps is (10 sec: 11059.1, 60 sec: 11605.3, 300 sec: 11732.6). Total num frames: 5156864. Throughput: 0: 2894.8. Samples: 1288974. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:08:55,319][07928] Avg episode reward: [(0, '25.320')]
+[2023-02-23 10:08:55,345][12586] Updated weights for policy 0, policy_version 1260 (0.0011)
+[2023-02-23 10:08:59,052][12586] Updated weights for policy 0, policy_version 1270 (0.0010)
+[2023-02-23 10:09:00,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 5214208. Throughput: 0: 2885.8. Samples: 1297198. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:00,319][07928] Avg episode reward: [(0, '23.456')]
+[2023-02-23 10:09:02,657][12586] Updated weights for policy 0, policy_version 1280 (0.0009)
+[2023-02-23 10:09:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11718.7). Total num frames: 5271552. Throughput: 0: 2873.4. Samples: 1314460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:05,319][07928] Avg episode reward: [(0, '23.322')]
+[2023-02-23 10:09:06,136][12586] Updated weights for policy 0, policy_version 1290 (0.0009)
+[2023-02-23 10:09:09,778][12586] Updated weights for policy 0, policy_version 1300 (0.0010)
+[2023-02-23 10:09:10,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11718.7). Total num frames: 5328896. Throughput: 0: 2854.5. Samples: 1331554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:10,319][07928] Avg episode reward: [(0, '26.863')]
+[2023-02-23 10:09:10,329][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001301_5328896.pth...
+[2023-02-23 10:09:10,393][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000615_2519040.pth
+[2023-02-23 10:09:13,315][12586] Updated weights for policy 0, policy_version 1310 (0.0010)
+[2023-02-23 10:09:15,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11704.8). Total num frames: 5386240. Throughput: 0: 2862.4. Samples: 1340272. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:15,319][07928] Avg episode reward: [(0, '28.845')]
+[2023-02-23 10:09:16,808][12586] Updated weights for policy 0, policy_version 1320 (0.0010)
+[2023-02-23 10:09:20,247][12586] Updated weights for policy 0, policy_version 1330 (0.0010)
+[2023-02-23 10:09:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11537.1, 300 sec: 11718.7). Total num frames: 5447680. Throughput: 0: 2861.2. Samples: 1357920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:20,319][07928] Avg episode reward: [(0, '28.959')]
+[2023-02-23 10:09:23,772][12586] Updated weights for policy 0, policy_version 1340 (0.0010)
+[2023-02-23 10:09:25,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11537.0, 300 sec: 11718.7). Total num frames: 5505024. Throughput: 0: 2862.4. Samples: 1375442. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:25,319][07928] Avg episode reward: [(0, '28.191')]
+[2023-02-23 10:09:27,230][12586] Updated weights for policy 0, policy_version 1350 (0.0010)
+[2023-02-23 10:09:30,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11704.8). Total num frames: 5562368. Throughput: 0: 2873.4. Samples: 1384356. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:30,318][07928] Avg episode reward: [(0, '25.934')]
+[2023-02-23 10:09:30,727][12586] Updated weights for policy 0, policy_version 1360 (0.0010)
+[2023-02-23 10:09:34,145][12586] Updated weights for policy 0, policy_version 1370 (0.0010)
+[2023-02-23 10:09:35,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11537.1, 300 sec: 11718.7). Total num frames: 5623808. Throughput: 0: 2882.5. Samples: 1402144. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:35,318][07928] Avg episode reward: [(0, '25.730')]
+[2023-02-23 10:09:37,796][12586] Updated weights for policy 0, policy_version 1380 (0.0010)
+[2023-02-23 10:09:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11537.1, 300 sec: 11718.7). Total num frames: 5681152. Throughput: 0: 2892.9. Samples: 1419156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:40,319][07928] Avg episode reward: [(0, '26.536')]
+[2023-02-23 10:09:41,352][12586] Updated weights for policy 0, policy_version 1390 (0.0010)
+[2023-02-23 10:09:44,816][12586] Updated weights for policy 0, policy_version 1400 (0.0009)
+[2023-02-23 10:09:45,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11537.1, 300 sec: 11704.8). Total num frames: 5738496. Throughput: 0: 2907.1. Samples: 1428016. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:45,318][07928] Avg episode reward: [(0, '26.147')]
+[2023-02-23 10:09:48,218][12586] Updated weights for policy 0, policy_version 1410 (0.0010)
+[2023-02-23 10:09:50,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11537.1, 300 sec: 11704.8). Total num frames: 5795840. Throughput: 0: 2923.6. Samples: 1446022. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:50,318][07928] Avg episode reward: [(0, '26.277')]
+[2023-02-23 10:09:51,759][12586] Updated weights for policy 0, policy_version 1420 (0.0009)
+[2023-02-23 10:09:55,179][12586] Updated weights for policy 0, policy_version 1430 (0.0009)
+[2023-02-23 10:09:55,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 5857280. Throughput: 0: 2932.6. Samples: 1463522. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:09:55,319][07928] Avg episode reward: [(0, '27.436')]
+[2023-02-23 10:09:58,617][12586] Updated weights for policy 0, policy_version 1440 (0.0010)
+[2023-02-23 10:10:00,316][07928] Fps is (10 sec: 12288.1, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 5918720. Throughput: 0: 2936.9. Samples: 1472432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:10:00,318][07928] Avg episode reward: [(0, '27.312')]
+[2023-02-23 10:10:02,027][12586] Updated weights for policy 0, policy_version 1450 (0.0010)
+[2023-02-23 10:10:05,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 5976064. Throughput: 0: 2940.4. Samples: 1490238. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:10:05,319][07928] Avg episode reward: [(0, '25.902')]
+[2023-02-23 10:10:05,605][12586] Updated weights for policy 0, policy_version 1460 (0.0010)
+[2023-02-23 10:10:09,152][12586] Updated weights for policy 0, policy_version 1470 (0.0010)
+[2023-02-23 10:10:10,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6033408. Throughput: 0: 2937.2. Samples: 1507614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:10:10,319][07928] Avg episode reward: [(0, '24.792')]
+[2023-02-23 10:10:12,588][12586] Updated weights for policy 0, policy_version 1480 (0.0011)
+[2023-02-23 10:10:15,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 6090752. Throughput: 0: 2934.4. Samples: 1516406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-23 10:10:15,319][07928] Avg episode reward: [(0, '25.819')]
+[2023-02-23 10:10:16,129][12586] Updated weights for policy 0, policy_version 1490 (0.0010)
+[2023-02-23 10:10:19,690][12586] Updated weights for policy 0, policy_version 1500 (0.0011)
+[2023-02-23 10:10:20,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 6148096. Throughput: 0: 2927.5. Samples: 1533882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:10:20,319][07928] Avg episode reward: [(0, '27.924')]
+[2023-02-23 10:10:23,215][12586] Updated weights for policy 0, policy_version 1510 (0.0011)
+[2023-02-23 10:10:25,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6209536. Throughput: 0: 2935.0. Samples: 1551232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:10:25,319][07928] Avg episode reward: [(0, '28.339')]
+[2023-02-23 10:10:26,632][12586] Updated weights for policy 0, policy_version 1520 (0.0010)
+[2023-02-23 10:10:30,073][12586] Updated weights for policy 0, policy_version 1530 (0.0010)
+[2023-02-23 10:10:30,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 6266880. Throughput: 0: 2937.6. Samples: 1560208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:10:30,319][07928] Avg episode reward: [(0, '25.792')]
+[2023-02-23 10:10:33,577][12586] Updated weights for policy 0, policy_version 1540 (0.0010)
+[2023-02-23 10:10:35,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 6324224. Throughput: 0: 2927.6. Samples: 1577762. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:10:35,319][07928] Avg episode reward: [(0, '24.709')]
+[2023-02-23 10:10:37,101][12586] Updated weights for policy 0, policy_version 1550 (0.0011)
+[2023-02-23 10:10:40,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6385664. Throughput: 0: 2936.7. Samples: 1595672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:10:40,319][07928] Avg episode reward: [(0, '26.531')]
+[2023-02-23 10:10:40,519][12586] Updated weights for policy 0, policy_version 1560 (0.0010)
+[2023-02-23 10:10:43,949][12586] Updated weights for policy 0, policy_version 1570 (0.0009)
+[2023-02-23 10:10:45,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6443008. Throughput: 0: 2936.7. Samples: 1604584. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-23 10:10:45,319][07928] Avg episode reward: [(0, '29.442')]
+[2023-02-23 10:10:45,326][12572] Saving new best policy, reward=29.442!
+[2023-02-23 10:10:47,485][12586] Updated weights for policy 0, policy_version 1580 (0.0010)
+[2023-02-23 10:10:50,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 6500352. Throughput: 0: 2930.5. Samples: 1622112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:10:50,318][07928] Avg episode reward: [(0, '31.268')]
+[2023-02-23 10:10:50,338][12572] Saving new best policy, reward=31.268!
+[2023-02-23 10:10:51,035][12586] Updated weights for policy 0, policy_version 1590 (0.0010)
+[2023-02-23 10:10:54,440][12586] Updated weights for policy 0, policy_version 1600 (0.0010)
+[2023-02-23 10:10:55,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6561792. Throughput: 0: 2934.8. Samples: 1639682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:10:55,319][07928] Avg episode reward: [(0, '27.090')]
+[2023-02-23 10:10:57,928][12586] Updated weights for policy 0, policy_version 1610 (0.0010)
+[2023-02-23 10:11:00,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 6619136. Throughput: 0: 2934.9. Samples: 1648476. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:11:00,318][07928] Avg episode reward: [(0, '26.959')]
+[2023-02-23 10:11:01,480][12586] Updated weights for policy 0, policy_version 1620 (0.0010)
+[2023-02-23 10:11:05,006][12586] Updated weights for policy 0, policy_version 1630 (0.0010)
+[2023-02-23 10:11:05,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 6676480. Throughput: 0: 2931.0. Samples: 1665776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-23 10:11:05,319][07928] Avg episode reward: [(0, '25.257')]
+[2023-02-23 10:11:08,466][12586] Updated weights for policy 0, policy_version 1640 (0.0010)
+[2023-02-23 10:11:10,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6737920. Throughput: 0: 2939.2. Samples: 1683498.
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:11:10,318][07928] Avg episode reward: [(0, '24.753')] +[2023-02-23 10:11:10,327][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth... +[2023-02-23 10:11:10,328][07928] Components not started: RolloutWorker_w1, RolloutWorker_w2, RolloutWorker_w4, RolloutWorker_w7, wait_time=600.0 seconds +[2023-02-23 10:11:10,381][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000960_3932160.pth +[2023-02-23 10:11:11,945][12586] Updated weights for policy 0, policy_version 1650 (0.0010) +[2023-02-23 10:11:15,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6795264. Throughput: 0: 2937.5. Samples: 1692394. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:11:15,318][07928] Avg episode reward: [(0, '27.484')] +[2023-02-23 10:11:15,508][12586] Updated weights for policy 0, policy_version 1660 (0.0010) +[2023-02-23 10:11:19,063][12586] Updated weights for policy 0, policy_version 1670 (0.0010) +[2023-02-23 10:11:20,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 6852608. Throughput: 0: 2930.5. Samples: 1709634. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:11:20,319][07928] Avg episode reward: [(0, '26.860')] +[2023-02-23 10:11:22,489][12586] Updated weights for policy 0, policy_version 1680 (0.0010) +[2023-02-23 10:11:25,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 6914048. Throughput: 0: 2928.6. Samples: 1727458. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:11:25,319][07928] Avg episode reward: [(0, '23.902')] +[2023-02-23 10:11:25,952][12586] Updated weights for policy 0, policy_version 1690 (0.0009) +[2023-02-23 10:11:29,441][12586] Updated weights for policy 0, policy_version 1700 (0.0009) +[2023-02-23 10:11:30,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11704.9). Total num frames: 6971392. Throughput: 0: 2926.5. Samples: 1736278. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2023-02-23 10:11:30,319][07928] Avg episode reward: [(0, '24.960')] +[2023-02-23 10:11:33,071][12586] Updated weights for policy 0, policy_version 1710 (0.0010) +[2023-02-23 10:11:35,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 7028736. Throughput: 0: 2917.7. Samples: 1753406. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:11:35,319][07928] Avg episode reward: [(0, '26.163')] +[2023-02-23 10:11:36,579][12586] Updated weights for policy 0, policy_version 1720 (0.0010) +[2023-02-23 10:11:39,974][12586] Updated weights for policy 0, policy_version 1730 (0.0010) +[2023-02-23 10:11:40,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 7086080. Throughput: 0: 2923.9. Samples: 1771256. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:11:40,319][07928] Avg episode reward: [(0, '24.602')] +[2023-02-23 10:11:43,523][12586] Updated weights for policy 0, policy_version 1740 (0.0009) +[2023-02-23 10:11:45,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 7143424. Throughput: 0: 2925.6. Samples: 1780126. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:11:45,319][07928] Avg episode reward: [(0, '26.056')] +[2023-02-23 10:11:47,083][12586] Updated weights for policy 0, policy_version 1750 (0.0010) +[2023-02-23 10:11:50,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 7204864. Throughput: 0: 2930.2. Samples: 1797636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:11:50,318][07928] Avg episode reward: [(0, '27.538')] +[2023-02-23 10:11:50,431][12586] Updated weights for policy 0, policy_version 1760 (0.0009) +[2023-02-23 10:11:53,869][12586] Updated weights for policy 0, policy_version 1770 (0.0009) +[2023-02-23 10:11:55,316][07928] Fps is (10 sec: 12288.1, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 7266304. Throughput: 0: 2937.5. Samples: 1815684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:11:55,319][07928] Avg episode reward: [(0, '25.179')] +[2023-02-23 10:11:57,418][12586] Updated weights for policy 0, policy_version 1780 (0.0010) +[2023-02-23 10:12:00,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 7319552. Throughput: 0: 2929.3. Samples: 1824212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:12:00,320][07928] Avg episode reward: [(0, '27.563')] +[2023-02-23 10:12:00,999][12586] Updated weights for policy 0, policy_version 1790 (0.0010) +[2023-02-23 10:12:04,429][12586] Updated weights for policy 0, policy_version 1800 (0.0010) +[2023-02-23 10:12:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 7380992. Throughput: 0: 2935.5. Samples: 1841732. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:12:05,318][07928] Avg episode reward: [(0, '28.453')] +[2023-02-23 10:12:07,861][12586] Updated weights for policy 0, policy_version 1810 (0.0010) +[2023-02-23 10:12:10,316][07928] Fps is (10 sec: 12288.0, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 7442432. Throughput: 0: 2934.9. Samples: 1859528. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:12:10,319][07928] Avg episode reward: [(0, '26.893')] +[2023-02-23 10:12:11,322][12586] Updated weights for policy 0, policy_version 1820 (0.0010) +[2023-02-23 10:12:14,942][12586] Updated weights for policy 0, policy_version 1830 (0.0010) +[2023-02-23 10:12:15,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 7499776. Throughput: 0: 2929.3. Samples: 1868096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:12:15,318][07928] Avg episode reward: [(0, '27.101')] +[2023-02-23 10:12:18,383][12586] Updated weights for policy 0, policy_version 1840 (0.0011) +[2023-02-23 10:12:20,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 7557120. Throughput: 0: 2938.9. Samples: 1885658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:12:20,319][07928] Avg episode reward: [(0, '28.484')] +[2023-02-23 10:12:21,861][12586] Updated weights for policy 0, policy_version 1850 (0.0010) +[2023-02-23 10:12:25,315][12586] Updated weights for policy 0, policy_version 1860 (0.0010) +[2023-02-23 10:12:25,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 7618560. Throughput: 0: 2936.4. Samples: 1903396. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:12:25,318][07928] Avg episode reward: [(0, '26.328')] +[2023-02-23 10:12:28,920][12586] Updated weights for policy 0, policy_version 1870 (0.0011) +[2023-02-23 10:12:30,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11677.1). Total num frames: 7671808. Throughput: 0: 2929.5. Samples: 1911952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:12:30,318][07928] Avg episode reward: [(0, '26.245')] +[2023-02-23 10:12:32,439][12586] Updated weights for policy 0, policy_version 1880 (0.0009) +[2023-02-23 10:12:35,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 7733248. Throughput: 0: 2930.7. Samples: 1929516. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:12:35,319][07928] Avg episode reward: [(0, '29.782')] +[2023-02-23 10:12:35,859][12586] Updated weights for policy 0, policy_version 1890 (0.0009) +[2023-02-23 10:12:39,307][12586] Updated weights for policy 0, policy_version 1900 (0.0010) +[2023-02-23 10:12:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 7790592. Throughput: 0: 2923.9. Samples: 1947260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:12:40,319][07928] Avg episode reward: [(0, '29.671')] +[2023-02-23 10:12:42,901][12586] Updated weights for policy 0, policy_version 1910 (0.0010) +[2023-02-23 10:12:45,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.2, 300 sec: 11691.0). Total num frames: 7852032. Throughput: 0: 2924.4. Samples: 1955810. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:12:45,319][07928] Avg episode reward: [(0, '27.291')] +[2023-02-23 10:12:46,375][12586] Updated weights for policy 0, policy_version 1920 (0.0010) +[2023-02-23 10:12:49,834][12586] Updated weights for policy 0, policy_version 1930 (0.0010) +[2023-02-23 10:12:50,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11691.0). Total num frames: 7909376. Throughput: 0: 2930.5. Samples: 1973606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:12:50,319][07928] Avg episode reward: [(0, '24.834')] +[2023-02-23 10:12:53,328][12586] Updated weights for policy 0, policy_version 1940 (0.0009) +[2023-02-23 10:12:55,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 7966720. Throughput: 0: 2923.7. Samples: 1991094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:12:55,319][07928] Avg episode reward: [(0, '24.086')] +[2023-02-23 10:12:56,941][12586] Updated weights for policy 0, policy_version 1950 (0.0011) +[2023-02-23 10:13:00,316][07928] Fps is (10 sec: 11469.0, 60 sec: 11741.9, 300 sec: 11677.1). Total num frames: 8024064. Throughput: 0: 2927.7. Samples: 1999842. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:13:00,318][07928] Avg episode reward: [(0, '26.631')] +[2023-02-23 10:13:00,382][12586] Updated weights for policy 0, policy_version 1960 (0.0010) +[2023-02-23 10:13:03,753][12586] Updated weights for policy 0, policy_version 1970 (0.0009) +[2023-02-23 10:13:05,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 8085504. Throughput: 0: 2932.4. Samples: 2017616. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:13:05,319][07928] Avg episode reward: [(0, '28.920')] +[2023-02-23 10:13:07,174][12586] Updated weights for policy 0, policy_version 1980 (0.0010) +[2023-02-23 10:13:10,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 8142848. Throughput: 0: 2931.6. Samples: 2035316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:13:10,318][07928] Avg episode reward: [(0, '29.284')] +[2023-02-23 10:13:10,327][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001988_8142848.pth... +[2023-02-23 10:13:10,399][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001301_5328896.pth +[2023-02-23 10:13:10,768][12586] Updated weights for policy 0, policy_version 1990 (0.0010) +[2023-02-23 10:13:14,210][12586] Updated weights for policy 0, policy_version 2000 (0.0011) +[2023-02-23 10:13:15,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 8204288. Throughput: 0: 2937.5. Samples: 2044140. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:13:15,319][07928] Avg episode reward: [(0, '27.621')] +[2023-02-23 10:13:17,632][12586] Updated weights for policy 0, policy_version 2010 (0.0010) +[2023-02-23 10:13:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 8261632. Throughput: 0: 2943.2. Samples: 2061962. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:13:20,319][07928] Avg episode reward: [(0, '24.027')] +[2023-02-23 10:13:21,102][12586] Updated weights for policy 0, policy_version 2020 (0.0011) +[2023-02-23 10:13:24,707][12586] Updated weights for policy 0, policy_version 2030 (0.0010) +[2023-02-23 10:13:25,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11691.0). Total num frames: 8318976. Throughput: 0: 2933.3. Samples: 2079260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:13:25,318][07928] Avg episode reward: [(0, '23.015')] +[2023-02-23 10:13:28,129][12586] Updated weights for policy 0, policy_version 2040 (0.0009) +[2023-02-23 10:13:30,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11691.0). Total num frames: 8380416. Throughput: 0: 2941.6. Samples: 2088184. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:13:30,319][07928] Avg episode reward: [(0, '24.891')] +[2023-02-23 10:13:31,599][12586] Updated weights for policy 0, policy_version 2050 (0.0009) +[2023-02-23 10:13:35,089][12586] Updated weights for policy 0, policy_version 2060 (0.0009) +[2023-02-23 10:13:35,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 8437760. Throughput: 0: 2941.2. Samples: 2105960. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:13:35,318][07928] Avg episode reward: [(0, '26.526')] +[2023-02-23 10:13:38,669][12586] Updated weights for policy 0, policy_version 2070 (0.0010) +[2023-02-23 10:13:40,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11691.0). Total num frames: 8495104. Throughput: 0: 2937.3. Samples: 2123274. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:13:40,319][07928] Avg episode reward: [(0, '27.356')] +[2023-02-23 10:13:42,151][12586] Updated weights for policy 0, policy_version 2080 (0.0009) +[2023-02-23 10:13:45,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 8556544. Throughput: 0: 2941.7. Samples: 2132220. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:13:45,318][07928] Avg episode reward: [(0, '28.665')] +[2023-02-23 10:13:45,569][12586] Updated weights for policy 0, policy_version 2090 (0.0010) +[2023-02-23 10:13:49,117][12586] Updated weights for policy 0, policy_version 2100 (0.0009) +[2023-02-23 10:13:50,316][07928] Fps is (10 sec: 11878.2, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 8613888. Throughput: 0: 2938.4. Samples: 2149846. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:13:50,318][07928] Avg episode reward: [(0, '28.402')] +[2023-02-23 10:13:52,726][12586] Updated weights for policy 0, policy_version 2110 (0.0010) +[2023-02-23 10:13:55,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 8671232. Throughput: 0: 2927.8. Samples: 2167068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:13:55,319][07928] Avg episode reward: [(0, '26.862')] +[2023-02-23 10:13:56,179][12586] Updated weights for policy 0, policy_version 2120 (0.0010) +[2023-02-23 10:13:59,755][12586] Updated weights for policy 0, policy_version 2130 (0.0010) +[2023-02-23 10:14:00,316][07928] Fps is (10 sec: 11469.0, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 8728576. Throughput: 0: 2926.8. Samples: 2175848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:14:00,318][07928] Avg episode reward: [(0, '28.137')] +[2023-02-23 10:14:03,353][12586] Updated weights for policy 0, policy_version 2140 (0.0010) +[2023-02-23 10:14:05,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 8785920. Throughput: 0: 2908.4. Samples: 2192840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:14:05,318][07928] Avg episode reward: [(0, '29.132')] +[2023-02-23 10:14:07,083][12586] Updated weights for policy 0, policy_version 2150 (0.0010) +[2023-02-23 10:14:10,316][07928] Fps is (10 sec: 11059.1, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 8839168. Throughput: 0: 2895.5. Samples: 2209560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:14:10,320][07928] Avg episode reward: [(0, '27.056')] +[2023-02-23 10:14:10,691][12586] Updated weights for policy 0, policy_version 2160 (0.0009) +[2023-02-23 10:14:14,225][12586] Updated weights for policy 0, policy_version 2170 (0.0010) +[2023-02-23 10:14:15,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 8900608. Throughput: 0: 2889.0. Samples: 2218190. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:14:15,319][07928] Avg episode reward: [(0, '27.865')] +[2023-02-23 10:14:17,689][12586] Updated weights for policy 0, policy_version 2180 (0.0010) +[2023-02-23 10:14:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 8957952. Throughput: 0: 2885.7. Samples: 2235818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:14:20,318][07928] Avg episode reward: [(0, '25.145')] +[2023-02-23 10:14:21,281][12586] Updated weights for policy 0, policy_version 2190 (0.0010) +[2023-02-23 10:14:24,739][12586] Updated weights for policy 0, policy_version 2200 (0.0009) +[2023-02-23 10:14:25,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 9015296. Throughput: 0: 2890.0. Samples: 2253326. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:14:25,319][07928] Avg episode reward: [(0, '26.101')] +[2023-02-23 10:14:28,179][12586] Updated weights for policy 0, policy_version 2210 (0.0010) +[2023-02-23 10:14:30,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 9076736. Throughput: 0: 2889.1. Samples: 2262230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:14:30,318][07928] Avg episode reward: [(0, '29.454')] +[2023-02-23 10:14:31,656][12586] Updated weights for policy 0, policy_version 2220 (0.0010) +[2023-02-23 10:14:35,231][12586] Updated weights for policy 0, policy_version 2230 (0.0010) +[2023-02-23 10:14:35,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 9134080. Throughput: 0: 2887.2. Samples: 2279768. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:14:35,319][07928] Avg episode reward: [(0, '28.350')] +[2023-02-23 10:14:38,671][12586] Updated weights for policy 0, policy_version 2240 (0.0010) +[2023-02-23 10:14:40,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 9191424. Throughput: 0: 2898.8. Samples: 2297512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:14:40,319][07928] Avg episode reward: [(0, '29.783')] +[2023-02-23 10:14:42,130][12586] Updated weights for policy 0, policy_version 2250 (0.0010) +[2023-02-23 10:14:45,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 9252864. Throughput: 0: 2901.2. Samples: 2306404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:14:45,319][07928] Avg episode reward: [(0, '28.516')] +[2023-02-23 10:14:45,582][12586] Updated weights for policy 0, policy_version 2260 (0.0009) +[2023-02-23 10:14:49,171][12586] Updated weights for policy 0, policy_version 2270 (0.0010) +[2023-02-23 10:14:50,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11605.4, 300 sec: 11704.8). Total num frames: 9310208. Throughput: 0: 2910.6. Samples: 2323816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:14:50,318][07928] Avg episode reward: [(0, '27.668')] +[2023-02-23 10:14:52,623][12586] Updated weights for policy 0, policy_version 2280 (0.0010) +[2023-02-23 10:14:55,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11691.0). Total num frames: 9367552. Throughput: 0: 2932.6. Samples: 2341526. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:14:55,319][07928] Avg episode reward: [(0, '26.350')] +[2023-02-23 10:14:56,097][12586] Updated weights for policy 0, policy_version 2290 (0.0009) +[2023-02-23 10:14:59,501][12586] Updated weights for policy 0, policy_version 2300 (0.0010) +[2023-02-23 10:15:00,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 9428992. Throughput: 0: 2938.3. Samples: 2350412. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:00,319][07928] Avg episode reward: [(0, '24.962')] +[2023-02-23 10:15:03,093][12586] Updated weights for policy 0, policy_version 2310 (0.0010) +[2023-02-23 10:15:05,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 9486336. Throughput: 0: 2931.1. Samples: 2367716. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:05,318][07928] Avg episode reward: [(0, '26.976')] +[2023-02-23 10:15:06,575][12586] Updated weights for policy 0, policy_version 2320 (0.0010) +[2023-02-23 10:15:10,009][12586] Updated weights for policy 0, policy_version 2330 (0.0009) +[2023-02-23 10:15:10,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 9543680. Throughput: 0: 2939.4. Samples: 2385598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:10,319][07928] Avg episode reward: [(0, '29.735')] +[2023-02-23 10:15:10,346][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002331_9547776.pth... +[2023-02-23 10:15:10,406][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001645_6737920.pth +[2023-02-23 10:15:13,436][12586] Updated weights for policy 0, policy_version 2340 (0.0010) +[2023-02-23 10:15:15,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 9605120. Throughput: 0: 2938.9. Samples: 2394482. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:15:15,318][07928] Avg episode reward: [(0, '26.928')] +[2023-02-23 10:15:17,073][12586] Updated weights for policy 0, policy_version 2350 (0.0010) +[2023-02-23 10:15:20,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 9662464. Throughput: 0: 2933.5. Samples: 2411776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:20,318][07928] Avg episode reward: [(0, '26.675')] +[2023-02-23 10:15:20,512][12586] Updated weights for policy 0, policy_version 2360 (0.0010) +[2023-02-23 10:15:23,849][12586] Updated weights for policy 0, policy_version 2370 (0.0009) +[2023-02-23 10:15:25,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 9723904. Throughput: 0: 2942.5. Samples: 2429924. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:15:25,319][07928] Avg episode reward: [(0, '26.843')] +[2023-02-23 10:15:27,342][12586] Updated weights for policy 0, policy_version 2380 (0.0009) +[2023-02-23 10:15:30,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 9781248. Throughput: 0: 2943.6. Samples: 2438866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:15:30,319][07928] Avg episode reward: [(0, '29.238')] +[2023-02-23 10:15:30,858][12586] Updated weights for policy 0, policy_version 2390 (0.0010) +[2023-02-23 10:15:34,373][12586] Updated weights for policy 0, policy_version 2400 (0.0009) +[2023-02-23 10:15:35,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 9838592. Throughput: 0: 2941.5. Samples: 2456186. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:35,319][07928] Avg episode reward: [(0, '27.777')] +[2023-02-23 10:15:37,827][12586] Updated weights for policy 0, policy_version 2410 (0.0010) +[2023-02-23 10:15:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 9900032. Throughput: 0: 2943.9. Samples: 2474002. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:40,319][07928] Avg episode reward: [(0, '25.937')] +[2023-02-23 10:15:41,291][12586] Updated weights for policy 0, policy_version 2420 (0.0010) +[2023-02-23 10:15:44,834][12586] Updated weights for policy 0, policy_version 2430 (0.0010) +[2023-02-23 10:15:45,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). 
Total num frames: 9957376. Throughput: 0: 2943.8. Samples: 2482882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:45,318][07928] Avg episode reward: [(0, '27.318')] +[2023-02-23 10:15:48,325][12586] Updated weights for policy 0, policy_version 2440 (0.0009) +[2023-02-23 10:15:50,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 10014720. Throughput: 0: 2944.5. Samples: 2500220. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:50,319][07928] Avg episode reward: [(0, '26.420')] +[2023-02-23 10:15:51,782][12586] Updated weights for policy 0, policy_version 2450 (0.0010) +[2023-02-23 10:15:55,191][12586] Updated weights for policy 0, policy_version 2460 (0.0009) +[2023-02-23 10:15:55,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 10076160. Throughput: 0: 2945.8. Samples: 2518160. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:15:55,318][07928] Avg episode reward: [(0, '25.901')] +[2023-02-23 10:15:58,758][12586] Updated weights for policy 0, policy_version 2470 (0.0011) +[2023-02-23 10:16:00,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 10133504. Throughput: 0: 2942.1. Samples: 2526878. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:16:00,319][07928] Avg episode reward: [(0, '28.329')] +[2023-02-23 10:16:02,262][12586] Updated weights for policy 0, policy_version 2480 (0.0010) +[2023-02-23 10:16:05,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 10190848. Throughput: 0: 2945.6. Samples: 2544326. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:16:05,318][07928] Avg episode reward: [(0, '26.690')] +[2023-02-23 10:16:05,699][12586] Updated weights for policy 0, policy_version 2490 (0.0010) +[2023-02-23 10:16:09,133][12586] Updated weights for policy 0, policy_version 2500 (0.0010) +[2023-02-23 10:16:10,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 10252288. Throughput: 0: 2942.0. Samples: 2562316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:16:10,318][07928] Avg episode reward: [(0, '25.406')] +[2023-02-23 10:16:12,659][12586] Updated weights for policy 0, policy_version 2510 (0.0010) +[2023-02-23 10:16:15,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 10309632. Throughput: 0: 2935.8. Samples: 2570976. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:16:15,318][07928] Avg episode reward: [(0, '27.418')] +[2023-02-23 10:16:16,186][12586] Updated weights for policy 0, policy_version 2520 (0.0009) +[2023-02-23 10:16:19,676][12586] Updated weights for policy 0, policy_version 2530 (0.0010) +[2023-02-23 10:16:20,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 10366976. Throughput: 0: 2941.4. Samples: 2588550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:16:20,319][07928] Avg episode reward: [(0, '27.411')] +[2023-02-23 10:16:23,095][12586] Updated weights for policy 0, policy_version 2540 (0.0010) +[2023-02-23 10:16:25,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 10428416. Throughput: 0: 2942.4. Samples: 2606410. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:16:25,319][07928] Avg episode reward: [(0, '27.097')] +[2023-02-23 10:16:26,602][12586] Updated weights for policy 0, policy_version 2550 (0.0010) +[2023-02-23 10:16:30,085][12586] Updated weights for policy 0, policy_version 2560 (0.0010) +[2023-02-23 10:16:30,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 10485760. Throughput: 0: 2935.7. Samples: 2614990. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:16:30,318][07928] Avg episode reward: [(0, '28.808')] +[2023-02-23 10:16:33,532][12586] Updated weights for policy 0, policy_version 2570 (0.0010) +[2023-02-23 10:16:35,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11732.6). Total num frames: 10547200. Throughput: 0: 2945.2. Samples: 2632756. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:16:35,319][07928] Avg episode reward: [(0, '29.456')] +[2023-02-23 10:16:36,945][12586] Updated weights for policy 0, policy_version 2580 (0.0010) +[2023-02-23 10:16:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 10604544. Throughput: 0: 2940.4. Samples: 2650480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:16:40,318][07928] Avg episode reward: [(0, '30.134')] +[2023-02-23 10:16:40,500][12586] Updated weights for policy 0, policy_version 2590 (0.0010) +[2023-02-23 10:16:44,041][12586] Updated weights for policy 0, policy_version 2600 (0.0009) +[2023-02-23 10:16:45,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 10661888. Throughput: 0: 2937.2. Samples: 2659052. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:16:45,318][07928] Avg episode reward: [(0, '27.245')] +[2023-02-23 10:16:47,486][12586] Updated weights for policy 0, policy_version 2610 (0.0010) +[2023-02-23 10:16:50,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11810.2, 300 sec: 11718.7). Total num frames: 10723328. Throughput: 0: 2946.4. Samples: 2676912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:16:50,318][07928] Avg episode reward: [(0, '25.823')] +[2023-02-23 10:16:50,900][12586] Updated weights for policy 0, policy_version 2620 (0.0009) +[2023-02-23 10:16:54,450][12586] Updated weights for policy 0, policy_version 2630 (0.0011) +[2023-02-23 10:16:55,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 10780672. Throughput: 0: 2937.6. Samples: 2694508. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:16:55,318][07928] Avg episode reward: [(0, '25.668')] +[2023-02-23 10:16:57,962][12586] Updated weights for policy 0, policy_version 2640 (0.0010) +[2023-02-23 10:17:00,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 10838016. Throughput: 0: 2938.5. Samples: 2703208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:17:00,319][07928] Avg episode reward: [(0, '26.557')] +[2023-02-23 10:17:01,414][12586] Updated weights for policy 0, policy_version 2650 (0.0009) +[2023-02-23 10:17:04,866][12586] Updated weights for policy 0, policy_version 2660 (0.0009) +[2023-02-23 10:17:05,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 10899456. Throughput: 0: 2943.5. Samples: 2721006. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:17:05,319][07928] Avg episode reward: [(0, '28.197')] +[2023-02-23 10:17:08,353][12586] Updated weights for policy 0, policy_version 2670 (0.0010) +[2023-02-23 10:17:10,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 10956800. Throughput: 0: 2936.9. Samples: 2738572. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:17:10,319][07928] Avg episode reward: [(0, '29.509')] +[2023-02-23 10:17:10,328][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002675_10956800.pth... +[2023-02-23 10:17:10,387][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001988_8142848.pth +[2023-02-23 10:17:11,908][12586] Updated weights for policy 0, policy_version 2680 (0.0010) +[2023-02-23 10:17:15,318][07928] Fps is (10 sec: 11467.3, 60 sec: 11741.6, 300 sec: 11718.7). Total num frames: 11014144. Throughput: 0: 2940.3. Samples: 2747306. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:17:15,320][07928] Avg episode reward: [(0, '29.441')] +[2023-02-23 10:17:15,346][12586] Updated weights for policy 0, policy_version 2690 (0.0010) +[2023-02-23 10:17:18,850][12586] Updated weights for policy 0, policy_version 2700 (0.0010) +[2023-02-23 10:17:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 11718.7). Total num frames: 11075584. Throughput: 0: 2940.9. Samples: 2765098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:17:20,319][07928] Avg episode reward: [(0, '27.494')] +[2023-02-23 10:17:22,394][12586] Updated weights for policy 0, policy_version 2710 (0.0010) +[2023-02-23 10:17:25,316][07928] Fps is (10 sec: 11880.1, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 11132928. Throughput: 0: 2931.4. Samples: 2782392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:17:25,319][07928] Avg episode reward: [(0, '27.068')] +[2023-02-23 10:17:25,911][12586] Updated weights for policy 0, policy_version 2720 (0.0009) +[2023-02-23 10:17:29,314][12586] Updated weights for policy 0, policy_version 2730 (0.0009) +[2023-02-23 10:17:30,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 11190272. Throughput: 0: 2938.5. Samples: 2791284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:17:30,319][07928] Avg episode reward: [(0, '27.498')] +[2023-02-23 10:17:32,788][12586] Updated weights for policy 0, policy_version 2740 (0.0010) +[2023-02-23 10:17:35,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 11251712. Throughput: 0: 2938.2. Samples: 2809132. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:17:35,319][07928] Avg episode reward: [(0, '25.918')] +[2023-02-23 10:17:36,243][12586] Updated weights for policy 0, policy_version 2750 (0.0009) +[2023-02-23 10:17:39,819][12586] Updated weights for policy 0, policy_version 2760 (0.0010) +[2023-02-23 10:17:40,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 11309056. Throughput: 0: 2933.4. Samples: 2826512. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:17:40,319][07928] Avg episode reward: [(0, '25.305')] +[2023-02-23 10:17:43,352][12586] Updated weights for policy 0, policy_version 2770 (0.0010) +[2023-02-23 10:17:45,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 11366400. Throughput: 0: 2935.6. Samples: 2835308. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:17:45,319][07928] Avg episode reward: [(0, '28.672')] +[2023-02-23 10:17:46,792][12586] Updated weights for policy 0, policy_version 2780 (0.0009) +[2023-02-23 10:17:50,272][12586] Updated weights for policy 0, policy_version 2790 (0.0010) +[2023-02-23 10:17:50,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 11427840. Throughput: 0: 2935.8. Samples: 2853118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:17:50,318][07928] Avg episode reward: [(0, '29.112')] +[2023-02-23 10:17:53,874][12586] Updated weights for policy 0, policy_version 2800 (0.0010) +[2023-02-23 10:17:55,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 11485184. Throughput: 0: 2927.6. Samples: 2870316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:17:55,318][07928] Avg episode reward: [(0, '25.457')] +[2023-02-23 10:17:57,305][12586] Updated weights for policy 0, policy_version 2810 (0.0009) +[2023-02-23 10:18:00,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 11542528. Throughput: 0: 2932.8. Samples: 2879276. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:00,318][07928] Avg episode reward: [(0, '26.369')] +[2023-02-23 10:18:00,745][12586] Updated weights for policy 0, policy_version 2820 (0.0010) +[2023-02-23 10:18:04,273][12586] Updated weights for policy 0, policy_version 2830 (0.0010) +[2023-02-23 10:18:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 11599872. Throughput: 0: 2931.8. Samples: 2897028. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:05,318][07928] Avg episode reward: [(0, '28.658')] +[2023-02-23 10:18:07,877][12586] Updated weights for policy 0, policy_version 2840 (0.0010) +[2023-02-23 10:18:10,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 11661312. Throughput: 0: 2934.5. Samples: 2914444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:18:10,318][07928] Avg episode reward: [(0, '33.540')] +[2023-02-23 10:18:10,326][12572] Saving new best policy, reward=33.540! +[2023-02-23 10:18:11,262][12586] Updated weights for policy 0, policy_version 2850 (0.0009) +[2023-02-23 10:18:14,655][12586] Updated weights for policy 0, policy_version 2860 (0.0009) +[2023-02-23 10:18:15,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11742.2, 300 sec: 11718.7). Total num frames: 11718656. Throughput: 0: 2934.7. Samples: 2923344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:15,318][07928] Avg episode reward: [(0, '32.987')] +[2023-02-23 10:18:18,122][12586] Updated weights for policy 0, policy_version 2870 (0.0010) +[2023-02-23 10:18:20,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 11780096. Throughput: 0: 2931.8. Samples: 2941064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:18:20,318][07928] Avg episode reward: [(0, '29.489')] +[2023-02-23 10:18:21,734][12586] Updated weights for policy 0, policy_version 2880 (0.0011) +[2023-02-23 10:18:25,203][12586] Updated weights for policy 0, policy_version 2890 (0.0009) +[2023-02-23 10:18:25,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 11837440. Throughput: 0: 2931.3. Samples: 2958422. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:25,319][07928] Avg episode reward: [(0, '29.296')] +[2023-02-23 10:18:28,639][12586] Updated weights for policy 0, policy_version 2900 (0.0010) +[2023-02-23 10:18:30,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 11894784. Throughput: 0: 2935.3. Samples: 2967396. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:30,318][07928] Avg episode reward: [(0, '26.779')] +[2023-02-23 10:18:32,175][12586] Updated weights for policy 0, policy_version 2910 (0.0010) +[2023-02-23 10:18:35,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 11952128. Throughput: 0: 2925.3. Samples: 2984756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:35,318][07928] Avg episode reward: [(0, '27.933')] +[2023-02-23 10:18:35,790][12586] Updated weights for policy 0, policy_version 2920 (0.0010) +[2023-02-23 10:18:39,193][12586] Updated weights for policy 0, policy_version 2930 (0.0010) +[2023-02-23 10:18:40,322][07928] Fps is (10 sec: 11871.6, 60 sec: 11740.8, 300 sec: 11718.5). Total num frames: 12013568. Throughput: 0: 2934.0. Samples: 3002364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:18:40,325][07928] Avg episode reward: [(0, '29.674')] +[2023-02-23 10:18:42,671][12586] Updated weights for policy 0, policy_version 2940 (0.0009) +[2023-02-23 10:18:45,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 12070912. Throughput: 0: 2932.7. Samples: 3011248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:45,319][07928] Avg episode reward: [(0, '30.354')] +[2023-02-23 10:18:46,173][12586] Updated weights for policy 0, policy_version 2950 (0.0010) +[2023-02-23 10:18:49,731][12586] Updated weights for policy 0, policy_version 2960 (0.0009) +[2023-02-23 10:18:50,316][07928] Fps is (10 sec: 11475.4, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 12128256. Throughput: 0: 2925.4. Samples: 3028670. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:18:50,319][07928] Avg episode reward: [(0, '29.000')] +[2023-02-23 10:18:53,188][12586] Updated weights for policy 0, policy_version 2970 (0.0010) +[2023-02-23 10:18:55,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 12189696. Throughput: 0: 2930.9. Samples: 3046336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:18:55,319][07928] Avg episode reward: [(0, '27.217')] +[2023-02-23 10:18:56,654][12586] Updated weights for policy 0, policy_version 2980 (0.0009) +[2023-02-23 10:19:00,115][12586] Updated weights for policy 0, policy_version 2990 (0.0009) +[2023-02-23 10:19:00,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11732.6). Total num frames: 12247040. Throughput: 0: 2931.0. Samples: 3055240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:19:00,319][07928] Avg episode reward: [(0, '29.113')] +[2023-02-23 10:19:03,730][12586] Updated weights for policy 0, policy_version 3000 (0.0011) +[2023-02-23 10:19:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11746.5). Total num frames: 12304384. Throughput: 0: 2923.7. Samples: 3072630. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:19:05,319][07928] Avg episode reward: [(0, '30.398')] +[2023-02-23 10:19:07,191][12586] Updated weights for policy 0, policy_version 3010 (0.0010) +[2023-02-23 10:19:10,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11746.5). Total num frames: 12365824. Throughput: 0: 2928.5. Samples: 3090206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:19:10,318][07928] Avg episode reward: [(0, '27.805')] +[2023-02-23 10:19:10,327][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003019_12365824.pth... +[2023-02-23 10:19:10,390][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002331_9547776.pth +[2023-02-23 10:19:10,659][12586] Updated weights for policy 0, policy_version 3020 (0.0011) +[2023-02-23 10:19:14,281][12586] Updated weights for policy 0, policy_version 3030 (0.0009) +[2023-02-23 10:19:15,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 12419072. Throughput: 0: 2919.4. Samples: 3098768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:19:15,319][07928] Avg episode reward: [(0, '27.242')] +[2023-02-23 10:19:18,021][12586] Updated weights for policy 0, policy_version 3040 (0.0010) +[2023-02-23 10:19:20,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11605.3, 300 sec: 11732.6). Total num frames: 12476416. Throughput: 0: 2902.3. Samples: 3115360. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:19:20,319][07928] Avg episode reward: [(0, '26.474')] +[2023-02-23 10:19:21,630][12586] Updated weights for policy 0, policy_version 3050 (0.0009) +[2023-02-23 10:19:25,148][12586] Updated weights for policy 0, policy_version 3060 (0.0010) +[2023-02-23 10:19:25,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 12533760. Throughput: 0: 2897.6. Samples: 3132740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:19:25,319][07928] Avg episode reward: [(0, '26.344')] +[2023-02-23 10:19:28,572][12586] Updated weights for policy 0, policy_version 3070 (0.0010) +[2023-02-23 10:19:30,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 12591104. Throughput: 0: 2898.4. Samples: 3141676. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:19:30,320][07928] Avg episode reward: [(0, '29.466')] +[2023-02-23 10:19:32,156][12586] Updated weights for policy 0, policy_version 3080 (0.0011) +[2023-02-23 10:19:35,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 12652544. Throughput: 0: 2896.0. Samples: 3158992. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:19:35,318][07928] Avg episode reward: [(0, '31.520')] +[2023-02-23 10:19:35,669][12586] Updated weights for policy 0, policy_version 3090 (0.0009) +[2023-02-23 10:19:39,082][12586] Updated weights for policy 0, policy_version 3100 (0.0010) +[2023-02-23 10:19:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11606.4, 300 sec: 11718.7). Total num frames: 12709888. Throughput: 0: 2899.7. Samples: 3176824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:19:40,318][07928] Avg episode reward: [(0, '31.629')] +[2023-02-23 10:19:42,508][12586] Updated weights for policy 0, policy_version 3110 (0.0010) +[2023-02-23 10:19:45,316][07928] Fps is (10 sec: 11468.6, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 12767232. Throughput: 0: 2899.8. Samples: 3185732. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:19:45,319][07928] Avg episode reward: [(0, '30.374')] +[2023-02-23 10:19:46,123][12586] Updated weights for policy 0, policy_version 3120 (0.0010) +[2023-02-23 10:19:49,602][12586] Updated weights for policy 0, policy_version 3130 (0.0010) +[2023-02-23 10:19:50,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 12828672. Throughput: 0: 2897.8. Samples: 3203032. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:19:50,318][07928] Avg episode reward: [(0, '30.721')] +[2023-02-23 10:19:53,062][12586] Updated weights for policy 0, policy_version 3140 (0.0011) +[2023-02-23 10:19:55,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 12886016. Throughput: 0: 2905.0. Samples: 3220932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:19:55,319][07928] Avg episode reward: [(0, '30.237')] +[2023-02-23 10:19:56,488][12586] Updated weights for policy 0, policy_version 3150 (0.0010) +[2023-02-23 10:20:00,095][12586] Updated weights for policy 0, policy_version 3160 (0.0011) +[2023-02-23 10:20:00,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 12943360. Throughput: 0: 2909.6. Samples: 3229702. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:20:00,319][07928] Avg episode reward: [(0, '29.583')] +[2023-02-23 10:20:03,522][12586] Updated weights for policy 0, policy_version 3170 (0.0010) +[2023-02-23 10:20:05,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11673.6, 300 sec: 11732.6). Total num frames: 13004800. Throughput: 0: 2929.9. Samples: 3247206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:20:05,319][07928] Avg episode reward: [(0, '30.188')] +[2023-02-23 10:20:07,005][12586] Updated weights for policy 0, policy_version 3180 (0.0011) +[2023-02-23 10:20:10,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11605.3, 300 sec: 11718.7). Total num frames: 13062144. Throughput: 0: 2938.4. Samples: 3264966. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:20:10,319][07928] Avg episode reward: [(0, '28.855')] +[2023-02-23 10:20:10,458][12586] Updated weights for policy 0, policy_version 3190 (0.0009) +[2023-02-23 10:20:14,020][12586] Updated weights for policy 0, policy_version 3200 (0.0010) +[2023-02-23 10:20:15,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 13119488. Throughput: 0: 2932.7. Samples: 3273646. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:20:15,318][07928] Avg episode reward: [(0, '26.008')] +[2023-02-23 10:20:17,473][12586] Updated weights for policy 0, policy_version 3210 (0.0009) +[2023-02-23 10:20:20,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13180928. Throughput: 0: 2938.3. Samples: 3291214. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:20:20,318][07928] Avg episode reward: [(0, '27.272')] +[2023-02-23 10:20:21,001][12586] Updated weights for policy 0, policy_version 3220 (0.0010) +[2023-02-23 10:20:24,468][12586] Updated weights for policy 0, policy_version 3230 (0.0010) +[2023-02-23 10:20:25,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13238272. Throughput: 0: 2933.3. Samples: 3308822. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:20:25,319][07928] Avg episode reward: [(0, '30.163')] +[2023-02-23 10:20:28,011][12586] Updated weights for policy 0, policy_version 3240 (0.0011) +[2023-02-23 10:20:30,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13295616. Throughput: 0: 2927.3. Samples: 3317462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:20:30,318][07928] Avg episode reward: [(0, '28.624')] +[2023-02-23 10:20:31,464][12586] Updated weights for policy 0, policy_version 3250 (0.0010) +[2023-02-23 10:20:34,920][12586] Updated weights for policy 0, policy_version 3260 (0.0009) +[2023-02-23 10:20:35,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13357056. Throughput: 0: 2937.9. Samples: 3335238. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:20:35,319][07928] Avg episode reward: [(0, '29.683')] +[2023-02-23 10:20:38,389][12586] Updated weights for policy 0, policy_version 3270 (0.0010) +[2023-02-23 10:20:40,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13414400. Throughput: 0: 2933.4. Samples: 3352936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:20:40,319][07928] Avg episode reward: [(0, '30.999')] +[2023-02-23 10:20:41,940][12586] Updated weights for policy 0, policy_version 3280 (0.0011) +[2023-02-23 10:20:45,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13471744. Throughput: 0: 2929.8. Samples: 3361544. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:20:45,318][07928] Avg episode reward: [(0, '31.855')] +[2023-02-23 10:20:45,422][12586] Updated weights for policy 0, policy_version 3290 (0.0010) +[2023-02-23 10:20:48,891][12586] Updated weights for policy 0, policy_version 3300 (0.0010) +[2023-02-23 10:20:50,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13533184. Throughput: 0: 2934.5. Samples: 3379260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:20:50,320][07928] Avg episode reward: [(0, '33.253')] +[2023-02-23 10:20:52,396][12586] Updated weights for policy 0, policy_version 3310 (0.0010) +[2023-02-23 10:20:55,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13590528. Throughput: 0: 2931.4. Samples: 3396880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:20:55,319][07928] Avg episode reward: [(0, '32.637')] +[2023-02-23 10:20:55,906][12586] Updated weights for policy 0, policy_version 3320 (0.0010) +[2023-02-23 10:20:59,384][12586] Updated weights for policy 0, policy_version 3330 (0.0011) +[2023-02-23 10:21:00,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13647872. Throughput: 0: 2929.4. Samples: 3405470. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:00,318][07928] Avg episode reward: [(0, '34.519')] +[2023-02-23 10:21:00,327][12572] Saving new best policy, reward=34.519! +[2023-02-23 10:21:02,875][12586] Updated weights for policy 0, policy_version 3340 (0.0010) +[2023-02-23 10:21:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 13705216. Throughput: 0: 2931.0. Samples: 3423108. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:05,319][07928] Avg episode reward: [(0, '33.124')] +[2023-02-23 10:21:06,387][12586] Updated weights for policy 0, policy_version 3350 (0.0010) +[2023-02-23 10:21:09,890][12586] Updated weights for policy 0, policy_version 3360 (0.0011) +[2023-02-23 10:21:10,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13766656. Throughput: 0: 2929.3. Samples: 3440642. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:10,318][07928] Avg episode reward: [(0, '30.978')] +[2023-02-23 10:21:10,328][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003361_13766656.pth... +[2023-02-23 10:21:10,328][07928] Components not started: RolloutWorker_w1, RolloutWorker_w2, RolloutWorker_w4, RolloutWorker_w7, wait_time=1200.0 seconds +[2023-02-23 10:21:10,393][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002675_10956800.pth +[2023-02-23 10:21:13,442][12586] Updated weights for policy 0, policy_version 3370 (0.0010) +[2023-02-23 10:21:15,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 13824000. Throughput: 0: 2930.4. Samples: 3449330. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:15,319][07928] Avg episode reward: [(0, '31.329')] +[2023-02-23 10:21:16,809][12586] Updated weights for policy 0, policy_version 3380 (0.0010) +[2023-02-23 10:21:20,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 13881344. Throughput: 0: 2932.0. Samples: 3467176. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2023-02-23 10:21:20,319][07928] Avg episode reward: [(0, '31.064')] +[2023-02-23 10:21:20,334][12586] Updated weights for policy 0, policy_version 3390 (0.0010) +[2023-02-23 10:21:23,893][12586] Updated weights for policy 0, policy_version 3400 (0.0009) +[2023-02-23 10:21:25,316][07928] Fps is (10 sec: 11468.6, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 13938688. Throughput: 0: 2922.2. Samples: 3484436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:21:25,319][07928] Avg episode reward: [(0, '31.357')] +[2023-02-23 10:21:27,475][12586] Updated weights for policy 0, policy_version 3410 (0.0010) +[2023-02-23 10:21:30,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 14000128. Throughput: 0: 2926.4. Samples: 3493234. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:30,319][07928] Avg episode reward: [(0, '32.724')] +[2023-02-23 10:21:30,893][12586] Updated weights for policy 0, policy_version 3420 (0.0010) +[2023-02-23 10:21:34,278][12586] Updated weights for policy 0, policy_version 3430 (0.0010) +[2023-02-23 10:21:35,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 14057472. Throughput: 0: 2932.5. Samples: 3511224. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:35,319][07928] Avg episode reward: [(0, '29.417')] +[2023-02-23 10:21:37,855][12586] Updated weights for policy 0, policy_version 3440 (0.0009) +[2023-02-23 10:21:40,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 14114816. Throughput: 0: 2925.1. Samples: 3528510. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:40,319][07928] Avg episode reward: [(0, '30.837')] +[2023-02-23 10:21:41,364][12586] Updated weights for policy 0, policy_version 3450 (0.0009) +[2023-02-23 10:21:44,801][12586] Updated weights for policy 0, policy_version 3460 (0.0010) +[2023-02-23 10:21:45,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14176256. Throughput: 0: 2931.7. Samples: 3537396. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:21:45,319][07928] Avg episode reward: [(0, '34.276')] +[2023-02-23 10:21:48,227][12586] Updated weights for policy 0, policy_version 3470 (0.0009) +[2023-02-23 10:21:50,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 14233600. Throughput: 0: 2935.6. Samples: 3555210. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:21:50,318][07928] Avg episode reward: [(0, '35.306')] +[2023-02-23 10:21:50,341][12572] Saving new best policy, reward=35.306! +[2023-02-23 10:21:51,764][12586] Updated weights for policy 0, policy_version 3480 (0.0010) +[2023-02-23 10:21:55,271][12586] Updated weights for policy 0, policy_version 3490 (0.0010) +[2023-02-23 10:21:55,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 14295040. Throughput: 0: 2932.2. Samples: 3572592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:21:55,318][07928] Avg episode reward: [(0, '32.189')] +[2023-02-23 10:21:58,694][12586] Updated weights for policy 0, policy_version 3500 (0.0009) +[2023-02-23 10:22:00,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14352384. Throughput: 0: 2938.8. Samples: 3581574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:22:00,319][07928] Avg episode reward: [(0, '28.716')] +[2023-02-23 10:22:02,141][12586] Updated weights for policy 0, policy_version 3510 (0.0009) +[2023-02-23 10:22:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14409728. Throughput: 0: 2937.6. Samples: 3599368. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:22:05,319][07928] Avg episode reward: [(0, '28.258')] +[2023-02-23 10:22:05,692][12586] Updated weights for policy 0, policy_version 3520 (0.0010) +[2023-02-23 10:22:09,260][12586] Updated weights for policy 0, policy_version 3530 (0.0010) +[2023-02-23 10:22:10,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.8). Total num frames: 14471168. Throughput: 0: 2937.4. Samples: 3616620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:22:10,318][07928] Avg episode reward: [(0, '29.778')] +[2023-02-23 10:22:12,666][12586] Updated weights for policy 0, policy_version 3540 (0.0010) +[2023-02-23 10:22:15,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 14528512. Throughput: 0: 2940.4. Samples: 3625554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:22:15,319][07928] Avg episode reward: [(0, '29.483')] +[2023-02-23 10:22:16,113][12586] Updated weights for policy 0, policy_version 3550 (0.0010) +[2023-02-23 10:22:19,687][12586] Updated weights for policy 0, policy_version 3560 (0.0009) +[2023-02-23 10:22:20,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14585856. Throughput: 0: 2934.9. Samples: 3643296. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:22:20,319][07928] Avg episode reward: [(0, '30.962')] +[2023-02-23 10:22:23,228][12586] Updated weights for policy 0, policy_version 3570 (0.0010) +[2023-02-23 10:22:25,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14643200. Throughput: 0: 2936.8. Samples: 3660668. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:22:25,319][07928] Avg episode reward: [(0, '31.122')] +[2023-02-23 10:22:26,679][12586] Updated weights for policy 0, policy_version 3580 (0.0010) +[2023-02-23 10:22:30,176][12586] Updated weights for policy 0, policy_version 3590 (0.0010) +[2023-02-23 10:22:30,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14704640. Throughput: 0: 2934.1. Samples: 3669430. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:22:30,319][07928] Avg episode reward: [(0, '30.864')] +[2023-02-23 10:22:33,622][12586] Updated weights for policy 0, policy_version 3600 (0.0010) +[2023-02-23 10:22:35,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14761984. Throughput: 0: 2930.5. Samples: 3687082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:22:35,319][07928] Avg episode reward: [(0, '30.638')] +[2023-02-23 10:22:37,259][12586] Updated weights for policy 0, policy_version 3610 (0.0010) +[2023-02-23 10:22:40,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14819328. Throughput: 0: 2932.3. Samples: 3704548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:22:40,319][07928] Avg episode reward: [(0, '30.407')] +[2023-02-23 10:22:40,742][12586] Updated weights for policy 0, policy_version 3620 (0.0010) +[2023-02-23 10:22:44,149][12586] Updated weights for policy 0, policy_version 3630 (0.0009) +[2023-02-23 10:22:45,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 14880768. Throughput: 0: 2930.6. Samples: 3713450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:22:45,318][07928] Avg episode reward: [(0, '31.235')] +[2023-02-23 10:22:47,630][12586] Updated weights for policy 0, policy_version 3640 (0.0010) +[2023-02-23 10:22:50,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 14938112. Throughput: 0: 2923.0. Samples: 3730902. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:22:50,318][07928] Avg episode reward: [(0, '30.184')] +[2023-02-23 10:22:51,189][12586] Updated weights for policy 0, policy_version 3650 (0.0009) +[2023-02-23 10:22:54,633][12586] Updated weights for policy 0, policy_version 3660 (0.0010) +[2023-02-23 10:22:55,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 14999552. Throughput: 0: 2933.1. Samples: 3748610. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:22:55,319][07928] Avg episode reward: [(0, '28.302')] +[2023-02-23 10:22:58,063][12586] Updated weights for policy 0, policy_version 3670 (0.0009) +[2023-02-23 10:23:00,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 11718.7). Total num frames: 15056896. Throughput: 0: 2935.8. Samples: 3757664. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:23:00,319][07928] Avg episode reward: [(0, '28.515')] +[2023-02-23 10:23:01,514][12586] Updated weights for policy 0, policy_version 3680 (0.0010) +[2023-02-23 10:23:05,150][12586] Updated weights for policy 0, policy_version 3690 (0.0010) +[2023-02-23 10:23:05,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.8, 300 sec: 11704.8). Total num frames: 15114240. Throughput: 0: 2927.9. Samples: 3775050. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:23:05,319][07928] Avg episode reward: [(0, '29.399')] +[2023-02-23 10:23:08,618][12586] Updated weights for policy 0, policy_version 3700 (0.0010) +[2023-02-23 10:23:10,316][07928] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 15175680. Throughput: 0: 2935.1. Samples: 3792746. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:23:10,318][07928] Avg episode reward: [(0, '29.447')] +[2023-02-23 10:23:10,326][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003705_15175680.pth... +[2023-02-23 10:23:10,384][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003019_12365824.pth +[2023-02-23 10:23:12,040][12586] Updated weights for policy 0, policy_version 3710 (0.0009) +[2023-02-23 10:23:15,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 15233024. Throughput: 0: 2938.5. Samples: 3801662. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:23:15,319][07928] Avg episode reward: [(0, '27.997')] +[2023-02-23 10:23:15,558][12586] Updated weights for policy 0, policy_version 3720 (0.0010) +[2023-02-23 10:23:19,170][12586] Updated weights for policy 0, policy_version 3730 (0.0011) +[2023-02-23 10:23:20,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 15290368. Throughput: 0: 2929.0. Samples: 3818888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:23:20,319][07928] Avg episode reward: [(0, '28.780')] +[2023-02-23 10:23:22,617][12586] Updated weights for policy 0, policy_version 3740 (0.0009) +[2023-02-23 10:23:25,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 15347712. Throughput: 0: 2938.2. Samples: 3836766. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:23:25,319][07928] Avg episode reward: [(0, '30.954')] +[2023-02-23 10:23:26,014][12586] Updated weights for policy 0, policy_version 3750 (0.0010) +[2023-02-23 10:23:29,490][12586] Updated weights for policy 0, policy_version 3760 (0.0010) +[2023-02-23 10:23:30,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 15409152. Throughput: 0: 2937.1. Samples: 3845620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:23:30,320][07928] Avg episode reward: [(0, '30.788')] +[2023-02-23 10:23:33,106][12586] Updated weights for policy 0, policy_version 3770 (0.0010) +[2023-02-23 10:23:35,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11705.1). Total num frames: 15466496. Throughput: 0: 2930.9. Samples: 3862790. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:23:35,319][07928] Avg episode reward: [(0, '29.478')] +[2023-02-23 10:23:36,547][12586] Updated weights for policy 0, policy_version 3780 (0.0009) +[2023-02-23 10:23:40,020][12586] Updated weights for policy 0, policy_version 3790 (0.0009) +[2023-02-23 10:23:40,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11741.9, 300 sec: 11704.8). 
Total num frames: 15523840. Throughput: 0: 2937.1. Samples: 3880778. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:23:40,318][07928] Avg episode reward: [(0, '28.195')] +[2023-02-23 10:23:43,517][12586] Updated weights for policy 0, policy_version 3800 (0.0009) +[2023-02-23 10:23:45,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11718.7). Total num frames: 15585280. Throughput: 0: 2931.7. Samples: 3889590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:23:45,319][07928] Avg episode reward: [(0, '31.478')] +[2023-02-23 10:23:47,062][12586] Updated weights for policy 0, policy_version 3810 (0.0010) +[2023-02-23 10:23:50,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 15642624. Throughput: 0: 2928.2. Samples: 3906820. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:23:50,319][07928] Avg episode reward: [(0, '31.893')] +[2023-02-23 10:23:50,524][12586] Updated weights for policy 0, policy_version 3820 (0.0010) +[2023-02-23 10:23:53,960][12586] Updated weights for policy 0, policy_version 3830 (0.0009) +[2023-02-23 10:23:55,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11673.6, 300 sec: 11704.8). Total num frames: 15699968. Throughput: 0: 2932.4. Samples: 3924704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:23:55,319][07928] Avg episode reward: [(0, '31.387')] +[2023-02-23 10:23:57,447][12586] Updated weights for policy 0, policy_version 3840 (0.0010) +[2023-02-23 10:24:00,317][07928] Fps is (10 sec: 11877.9, 60 sec: 11741.8, 300 sec: 11718.7). Total num frames: 15761408. Throughput: 0: 2930.0. Samples: 3933514. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:24:00,321][07928] Avg episode reward: [(0, '32.249')] +[2023-02-23 10:24:01,036][12586] Updated weights for policy 0, policy_version 3850 (0.0011) +[2023-02-23 10:24:04,463][12586] Updated weights for policy 0, policy_version 3860 (0.0009) +[2023-02-23 10:24:05,316][07928] Fps is (10 sec: 11878.3, 60 sec: 11741.9, 300 sec: 11704.8). Total num frames: 15818752. Throughput: 0: 2932.8. Samples: 3950866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:24:05,319][07928] Avg episode reward: [(0, '32.389')] +[2023-02-23 10:24:07,955][12586] Updated weights for policy 0, policy_version 3870 (0.0011) +[2023-02-23 10:24:10,316][07928] Fps is (10 sec: 11469.2, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 15876096. Throughput: 0: 2929.9. Samples: 3968612. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:24:10,319][07928] Avg episode reward: [(0, '31.641')] +[2023-02-23 10:24:11,441][12586] Updated weights for policy 0, policy_version 3880 (0.0009) +[2023-02-23 10:24:15,024][12586] Updated weights for policy 0, policy_version 3890 (0.0010) +[2023-02-23 10:24:15,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 15933440. Throughput: 0: 2927.1. Samples: 3977338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:24:15,319][07928] Avg episode reward: [(0, '31.428')] +[2023-02-23 10:24:18,503][12586] Updated weights for policy 0, policy_version 3900 (0.0010) +[2023-02-23 10:24:20,316][07928] Fps is (10 sec: 11878.2, 60 sec: 11741.8, 300 sec: 11732.6). Total num frames: 15994880. Throughput: 0: 2932.2. Samples: 3994740. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:24:20,318][07928] Avg episode reward: [(0, '30.172')] +[2023-02-23 10:24:22,113][12586] Updated weights for policy 0, policy_version 3910 (0.0010) +[2023-02-23 10:24:25,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11673.6, 300 sec: 11718.7). Total num frames: 16048128. Throughput: 0: 2908.8. Samples: 4011674. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:24:25,319][07928] Avg episode reward: [(0, '30.851')] +[2023-02-23 10:24:25,766][12586] Updated weights for policy 0, policy_version 3920 (0.0009) +[2023-02-23 10:24:29,573][12586] Updated weights for policy 0, policy_version 3930 (0.0010) +[2023-02-23 10:24:30,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 16105472. Throughput: 0: 2894.3. Samples: 4019832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:24:30,319][07928] Avg episode reward: [(0, '30.528')] +[2023-02-23 10:24:33,165][12586] Updated weights for policy 0, policy_version 3940 (0.0009) +[2023-02-23 10:24:35,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 16162816. Throughput: 0: 2884.4. Samples: 4036620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:24:35,318][07928] Avg episode reward: [(0, '28.011')] +[2023-02-23 10:24:36,709][12586] Updated weights for policy 0, policy_version 3950 (0.0009) +[2023-02-23 10:24:40,151][12586] Updated weights for policy 0, policy_version 3960 (0.0009) +[2023-02-23 10:24:40,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11605.3, 300 sec: 11704.8). Total num frames: 16220160. Throughput: 0: 2881.4. Samples: 4054366. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:24:40,318][07928] Avg episode reward: [(0, '29.105')] +[2023-02-23 10:24:43,746][12586] Updated weights for policy 0, policy_version 3970 (0.0010) +[2023-02-23 10:24:45,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11691.0). Total num frames: 16277504. Throughput: 0: 2874.9. Samples: 4062882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:24:45,319][07928] Avg episode reward: [(0, '31.402')] +[2023-02-23 10:24:47,318][12586] Updated weights for policy 0, policy_version 3980 (0.0009) +[2023-02-23 10:24:50,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11537.0, 300 sec: 11691.0). Total num frames: 16334848. Throughput: 0: 2872.6. Samples: 4080132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:24:50,319][07928] Avg episode reward: [(0, '31.379')] +[2023-02-23 10:24:50,821][12586] Updated weights for policy 0, policy_version 3990 (0.0010) +[2023-02-23 10:24:54,337][12586] Updated weights for policy 0, policy_version 4000 (0.0010) +[2023-02-23 10:24:55,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11537.1, 300 sec: 11691.0). Total num frames: 16392192. Throughput: 0: 2868.0. Samples: 4097672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:24:55,319][07928] Avg episode reward: [(0, '31.139')] +[2023-02-23 10:24:57,912][12586] Updated weights for policy 0, policy_version 4010 (0.0010) +[2023-02-23 10:25:00,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11468.9, 300 sec: 11677.1). Total num frames: 16449536. Throughput: 0: 2863.9. Samples: 4106212. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:25:00,318][07928] Avg episode reward: [(0, '29.004')] +[2023-02-23 10:25:01,369][12586] Updated weights for policy 0, policy_version 4020 (0.0009) +[2023-02-23 10:25:04,857][12586] Updated weights for policy 0, policy_version 4030 (0.0009) +[2023-02-23 10:25:05,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11537.1, 300 sec: 11691.0). Total num frames: 16510976. Throughput: 0: 2869.7. Samples: 4123878. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:25:05,318][07928] Avg episode reward: [(0, '30.233')] +[2023-02-23 10:25:08,369][12586] Updated weights for policy 0, policy_version 4040 (0.0010) +[2023-02-23 10:25:10,316][07928] Fps is (10 sec: 11878.4, 60 sec: 11537.0, 300 sec: 11691.0). Total num frames: 16568320. Throughput: 0: 2882.0. Samples: 4141364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:25:10,319][07928] Avg episode reward: [(0, '32.020')] +[2023-02-23 10:25:10,328][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004045_16568320.pth... +[2023-02-23 10:25:10,390][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003361_13766656.pth +[2023-02-23 10:25:11,965][12586] Updated weights for policy 0, policy_version 4050 (0.0011) +[2023-02-23 10:25:15,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11537.1, 300 sec: 11677.1). Total num frames: 16625664. Throughput: 0: 2892.8. Samples: 4150010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:25:15,318][07928] Avg episode reward: [(0, '30.166')] +[2023-02-23 10:25:15,408][12586] Updated weights for policy 0, policy_version 4060 (0.0010) +[2023-02-23 10:25:18,919][12586] Updated weights for policy 0, policy_version 4070 (0.0010) +[2023-02-23 10:25:20,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11677.1). Total num frames: 16683008. Throughput: 0: 2910.7. Samples: 4167602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:25:20,318][07928] Avg episode reward: [(0, '28.119')] +[2023-02-23 10:25:22,629][12586] Updated weights for policy 0, policy_version 4080 (0.0010) +[2023-02-23 10:25:25,316][07928] Fps is (10 sec: 11468.9, 60 sec: 11537.1, 300 sec: 11677.1). Total num frames: 16740352. Throughput: 0: 2885.1. Samples: 4184196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:25:25,319][07928] Avg episode reward: [(0, '30.226')] +[2023-02-23 10:25:26,387][12586] Updated weights for policy 0, policy_version 4090 (0.0010) +[2023-02-23 10:25:30,051][12586] Updated weights for policy 0, policy_version 4100 (0.0010) +[2023-02-23 10:25:30,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11468.8, 300 sec: 11649.3). Total num frames: 16793600. Throughput: 0: 2882.6. Samples: 4192600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:25:30,318][07928] Avg episode reward: [(0, '29.269')] +[2023-02-23 10:25:33,785][12586] Updated weights for policy 0, policy_version 4110 (0.0010) +[2023-02-23 10:25:35,316][07928] Fps is (10 sec: 11059.1, 60 sec: 11468.8, 300 sec: 11649.3). Total num frames: 16850944. Throughput: 0: 2866.2. Samples: 4209112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:25:35,319][07928] Avg episode reward: [(0, '27.494')] +[2023-02-23 10:25:37,472][12586] Updated weights for policy 0, policy_version 4120 (0.0010) +[2023-02-23 10:25:40,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11400.5, 300 sec: 11635.4). Total num frames: 16904192. Throughput: 0: 2838.4. Samples: 4225402. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:25:40,318][07928] Avg episode reward: [(0, '29.727')] +[2023-02-23 10:25:41,231][12586] Updated weights for policy 0, policy_version 4130 (0.0010) +[2023-02-23 10:25:44,898][12586] Updated weights for policy 0, policy_version 4140 (0.0009) +[2023-02-23 10:25:45,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11400.5, 300 sec: 11621.5). Total num frames: 16961536. Throughput: 0: 2835.3. Samples: 4233800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:25:45,319][07928] Avg episode reward: [(0, '30.776')] +[2023-02-23 10:25:48,596][12586] Updated weights for policy 0, policy_version 4150 (0.0009) +[2023-02-23 10:25:50,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11332.3, 300 sec: 11607.6). Total num frames: 17014784. Throughput: 0: 2810.2. Samples: 4250336. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:25:50,319][07928] Avg episode reward: [(0, '31.474')] +[2023-02-23 10:25:52,337][12586] Updated weights for policy 0, policy_version 4160 (0.0011) +[2023-02-23 10:25:55,316][07928] Fps is (10 sec: 10649.5, 60 sec: 11264.0, 300 sec: 11593.8). Total num frames: 17068032. Throughput: 0: 2784.4. Samples: 4266662. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2023-02-23 10:25:55,318][07928] Avg episode reward: [(0, '29.910')] +[2023-02-23 10:25:56,121][12586] Updated weights for policy 0, policy_version 4170 (0.0010) +[2023-02-23 10:25:59,763][12586] Updated weights for policy 0, policy_version 4180 (0.0010) +[2023-02-23 10:26:00,316][07928] Fps is (10 sec: 11059.0, 60 sec: 11264.0, 300 sec: 11593.8). Total num frames: 17125376. Throughput: 0: 2779.9. Samples: 4275108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:26:00,319][07928] Avg episode reward: [(0, '28.301')] +[2023-02-23 10:26:03,505][12586] Updated weights for policy 0, policy_version 4190 (0.0010) +[2023-02-23 10:26:05,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11127.5, 300 sec: 11566.0). Total num frames: 17178624. Throughput: 0: 2756.8. Samples: 4291656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:26:05,318][07928] Avg episode reward: [(0, '28.822')] +[2023-02-23 10:26:07,273][12586] Updated weights for policy 0, policy_version 4200 (0.0010) +[2023-02-23 10:26:10,316][07928] Fps is (10 sec: 11059.4, 60 sec: 11127.5, 300 sec: 11566.0). Total num frames: 17235968. Throughput: 0: 2752.4. Samples: 4308056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:26:10,319][07928] Avg episode reward: [(0, '29.432')] +[2023-02-23 10:26:10,980][12586] Updated weights for policy 0, policy_version 4210 (0.0010) +[2023-02-23 10:26:14,689][12586] Updated weights for policy 0, policy_version 4220 (0.0009) +[2023-02-23 10:26:15,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11059.2, 300 sec: 11552.1). Total num frames: 17289216. Throughput: 0: 2750.4. Samples: 4316370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:26:15,318][07928] Avg episode reward: [(0, '33.160')] +[2023-02-23 10:26:18,409][12586] Updated weights for policy 0, policy_version 4230 (0.0010) +[2023-02-23 10:26:20,316][07928] Fps is (10 sec: 10649.5, 60 sec: 10990.9, 300 sec: 11538.2). Total num frames: 17342464. Throughput: 0: 2749.6. Samples: 4332842. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:26:20,319][07928] Avg episode reward: [(0, '33.464')] +[2023-02-23 10:26:22,227][12586] Updated weights for policy 0, policy_version 4240 (0.0010) +[2023-02-23 10:26:25,316][07928] Fps is (10 sec: 11059.1, 60 sec: 10990.9, 300 sec: 11524.3). Total num frames: 17399808. Throughput: 0: 2750.9. Samples: 4349194. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:26:25,319][07928] Avg episode reward: [(0, '32.479')] +[2023-02-23 10:26:25,891][12586] Updated weights for policy 0, policy_version 4250 (0.0010) +[2023-02-23 10:26:29,557][12586] Updated weights for policy 0, policy_version 4260 (0.0010) +[2023-02-23 10:26:30,316][07928] Fps is (10 sec: 11468.8, 60 sec: 11059.2, 300 sec: 11524.3). Total num frames: 17457152. Throughput: 0: 2750.0. Samples: 4357550. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:26:30,319][07928] Avg episode reward: [(0, '31.179')] +[2023-02-23 10:26:33,291][12586] Updated weights for policy 0, policy_version 4270 (0.0010) +[2023-02-23 10:26:35,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11510.5). Total num frames: 17510400. Throughput: 0: 2750.4. Samples: 4374102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:26:35,319][07928] Avg episode reward: [(0, '29.562')] +[2023-02-23 10:26:37,133][12586] Updated weights for policy 0, policy_version 4280 (0.0010) +[2023-02-23 10:26:40,316][07928] Fps is (10 sec: 10649.7, 60 sec: 10990.9, 300 sec: 11482.7). Total num frames: 17563648. Throughput: 0: 2754.4. Samples: 4390610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:26:40,319][07928] Avg episode reward: [(0, '31.225')] +[2023-02-23 10:26:40,773][12586] Updated weights for policy 0, policy_version 4290 (0.0009) +[2023-02-23 10:26:44,424][12586] Updated weights for policy 0, policy_version 4300 (0.0009) +[2023-02-23 10:26:45,316][07928] Fps is (10 sec: 11059.3, 60 sec: 10990.9, 300 sec: 11482.7). Total num frames: 17620992. Throughput: 0: 2754.3. Samples: 4399052. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:26:45,319][07928] Avg episode reward: [(0, '31.434')] +[2023-02-23 10:26:48,249][12586] Updated weights for policy 0, policy_version 4310 (0.0011) +[2023-02-23 10:26:50,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11454.9). Total num frames: 17674240. Throughput: 0: 2748.0. Samples: 4415314. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:26:50,319][07928] Avg episode reward: [(0, '32.764')] +[2023-02-23 10:26:51,973][12586] Updated weights for policy 0, policy_version 4320 (0.0010) +[2023-02-23 10:26:55,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11059.2, 300 sec: 11454.9). Total num frames: 17731584. Throughput: 0: 2751.9. Samples: 4431892. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:26:55,318][07928] Avg episode reward: [(0, '31.208')] +[2023-02-23 10:26:55,663][12586] Updated weights for policy 0, policy_version 4330 (0.0010) +[2023-02-23 10:26:59,335][12586] Updated weights for policy 0, policy_version 4340 (0.0010) +[2023-02-23 10:27:00,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10991.0, 300 sec: 11441.0). Total num frames: 17784832. Throughput: 0: 2755.4. Samples: 4440364. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:27:00,319][07928] Avg episode reward: [(0, '29.610')] +[2023-02-23 10:27:03,102][12586] Updated weights for policy 0, policy_version 4350 (0.0011) +[2023-02-23 10:27:05,316][07928] Fps is (10 sec: 10649.6, 60 sec: 10990.9, 300 sec: 11413.3). Total num frames: 17838080. Throughput: 0: 2749.3. Samples: 4456560. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:27:05,319][07928] Avg episode reward: [(0, '30.766')] +[2023-02-23 10:27:06,849][12586] Updated weights for policy 0, policy_version 4360 (0.0010) +[2023-02-23 10:27:10,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11413.3). Total num frames: 17895424. Throughput: 0: 2756.9. Samples: 4473256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:27:10,318][07928] Avg episode reward: [(0, '30.851')] +[2023-02-23 10:27:10,327][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004369_17895424.pth... +[2023-02-23 10:27:10,383][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003705_15175680.pth +[2023-02-23 10:27:10,530][12586] Updated weights for policy 0, policy_version 4370 (0.0010) +[2023-02-23 10:27:14,226][12586] Updated weights for policy 0, policy_version 4380 (0.0009) +[2023-02-23 10:27:15,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11399.4). Total num frames: 17948672. Throughput: 0: 2755.4. Samples: 4481542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:27:15,319][07928] Avg episode reward: [(0, '31.678')] +[2023-02-23 10:27:18,077][12586] Updated weights for policy 0, policy_version 4390 (0.0010) +[2023-02-23 10:27:20,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11059.2, 300 sec: 11399.4). Total num frames: 18006016. Throughput: 0: 2747.7. Samples: 4497748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:27:20,319][07928] Avg episode reward: [(0, '29.802')] +[2023-02-23 10:27:21,789][12586] Updated weights for policy 0, policy_version 4400 (0.0010) +[2023-02-23 10:27:25,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11371.6). Total num frames: 18059264. Throughput: 0: 2751.9. Samples: 4514444. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:27:25,319][07928] Avg episode reward: [(0, '30.004')] +[2023-02-23 10:27:25,352][12586] Updated weights for policy 0, policy_version 4410 (0.0010) +[2023-02-23 10:27:28,901][12586] Updated weights for policy 0, policy_version 4420 (0.0010) +[2023-02-23 10:27:30,316][07928] Fps is (10 sec: 11059.1, 60 sec: 10990.9, 300 sec: 11371.6). Total num frames: 18116608. Throughput: 0: 2759.5. Samples: 4523230. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:27:30,319][07928] Avg episode reward: [(0, '30.341')] +[2023-02-23 10:27:32,607][12586] Updated weights for policy 0, policy_version 4430 (0.0010) +[2023-02-23 10:27:35,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11059.2, 300 sec: 11371.6). Total num frames: 18173952. Throughput: 0: 2769.0. Samples: 4539918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:27:35,319][07928] Avg episode reward: [(0, '34.548')] +[2023-02-23 10:27:36,077][12586] Updated weights for policy 0, policy_version 4440 (0.0008) +[2023-02-23 10:27:39,536][12586] Updated weights for policy 0, policy_version 4450 (0.0010) +[2023-02-23 10:27:40,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11195.7, 300 sec: 11371.6). Total num frames: 18235392. Throughput: 0: 2798.5. Samples: 4557826. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:27:40,319][07928] Avg episode reward: [(0, '34.705')] +[2023-02-23 10:27:43,050][12586] Updated weights for policy 0, policy_version 4460 (0.0009) +[2023-02-23 10:27:45,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11195.7, 300 sec: 11371.6). Total num frames: 18292736. Throughput: 0: 2805.0. Samples: 4566590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:27:45,319][07928] Avg episode reward: [(0, '33.618')] +[2023-02-23 10:27:46,728][12586] Updated weights for policy 0, policy_version 4470 (0.0010) +[2023-02-23 10:27:50,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 11343.8). Total num frames: 18345984. Throughput: 0: 2821.5. Samples: 4583526. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:27:50,319][07928] Avg episode reward: [(0, '31.906')] +[2023-02-23 10:27:50,338][12586] Updated weights for policy 0, policy_version 4480 (0.0009) +[2023-02-23 10:27:53,837][12586] Updated weights for policy 0, policy_version 4490 (0.0009) +[2023-02-23 10:27:55,331][07928] Fps is (10 sec: 11452.2, 60 sec: 11261.3, 300 sec: 11357.2). Total num frames: 18407424. Throughput: 0: 2836.6. Samples: 4600942. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:27:55,333][07928] Avg episode reward: [(0, '30.245')] +[2023-02-23 10:27:57,354][12586] Updated weights for policy 0, policy_version 4500 (0.0010) +[2023-02-23 10:28:00,316][07928] Fps is (10 sec: 11468.7, 60 sec: 11264.0, 300 sec: 11343.8). Total num frames: 18460672. Throughput: 0: 2844.9. Samples: 4609564. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:28:00,319][07928] Avg episode reward: [(0, '28.957')] +[2023-02-23 10:28:01,061][12586] Updated weights for policy 0, policy_version 4510 (0.0010) +[2023-02-23 10:28:04,519][12586] Updated weights for policy 0, policy_version 4520 (0.0011) +[2023-02-23 10:28:05,316][07928] Fps is (10 sec: 11485.5, 60 sec: 11400.5, 300 sec: 11343.8). Total num frames: 18522112. Throughput: 0: 2866.1. Samples: 4626724. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:28:05,319][07928] Avg episode reward: [(0, '28.097')] +[2023-02-23 10:28:08,030][12586] Updated weights for policy 0, policy_version 4530 (0.0010) +[2023-02-23 10:28:10,316][07928] Fps is (10 sec: 11878.5, 60 sec: 11400.5, 300 sec: 11343.8). Total num frames: 18579456. Throughput: 0: 2886.3. Samples: 4644328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:28:10,319][07928] Avg episode reward: [(0, '29.610')] +[2023-02-23 10:28:11,617][12586] Updated weights for policy 0, policy_version 4540 (0.0010) +[2023-02-23 10:28:15,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11400.5, 300 sec: 11330.0). Total num frames: 18632704. Throughput: 0: 2872.1. Samples: 4652476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:28:15,319][07928] Avg episode reward: [(0, '32.062')] +[2023-02-23 10:28:15,488][12586] Updated weights for policy 0, policy_version 4550 (0.0011) +[2023-02-23 10:28:19,182][12586] Updated weights for policy 0, policy_version 4560 (0.0011) +[2023-02-23 10:28:20,316][07928] Fps is (10 sec: 11059.0, 60 sec: 11400.5, 300 sec: 11329.9). Total num frames: 18690048. Throughput: 0: 2860.3. Samples: 4668632. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:28:20,319][07928] Avg episode reward: [(0, '32.626')] +[2023-02-23 10:28:22,880][12586] Updated weights for policy 0, policy_version 4570 (0.0009) +[2023-02-23 10:28:25,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11400.5, 300 sec: 11302.2). Total num frames: 18743296. Throughput: 0: 2835.1. Samples: 4685406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:28:25,319][07928] Avg episode reward: [(0, '31.837')] +[2023-02-23 10:28:26,573][12586] Updated weights for policy 0, policy_version 4580 (0.0010) +[2023-02-23 10:28:30,316][07928] Fps is (10 sec: 10649.7, 60 sec: 11332.3, 300 sec: 11288.3). Total num frames: 18796544. Throughput: 0: 2820.9. Samples: 4693530. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:28:30,319][07928] Avg episode reward: [(0, '30.619')] +[2023-02-23 10:28:30,354][12586] Updated weights for policy 0, policy_version 4590 (0.0010) +[2023-02-23 10:28:34,027][12586] Updated weights for policy 0, policy_version 4600 (0.0010) +[2023-02-23 10:28:35,316][07928] Fps is (10 sec: 11059.0, 60 sec: 11332.3, 300 sec: 11288.3). Total num frames: 18853888. Throughput: 0: 2811.5. Samples: 4710046. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:28:35,318][07928] Avg episode reward: [(0, '30.610')] +[2023-02-23 10:28:37,735][12586] Updated weights for policy 0, policy_version 4610 (0.0011) +[2023-02-23 10:28:40,316][07928] Fps is (10 sec: 11059.2, 60 sec: 11195.7, 300 sec: 11260.5). Total num frames: 18907136. Throughput: 0: 2794.3. Samples: 4726646. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:28:40,319][07928] Avg episode reward: [(0, '32.121')] +[2023-02-23 10:28:41,476][12586] Updated weights for policy 0, policy_version 4620 (0.0010) +[2023-02-23 10:28:45,279][12586] Updated weights for policy 0, policy_version 4630 (0.0010) +[2023-02-23 10:28:45,316][07928] Fps is (10 sec: 11059.4, 60 sec: 11195.7, 300 sec: 11260.5). Total num frames: 18964480. Throughput: 0: 2780.0. Samples: 4734664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:28:45,319][07928] Avg episode reward: [(0, '30.664')] +[2023-02-23 10:28:48,930][12586] Updated weights for policy 0, policy_version 4640 (0.0009) +[2023-02-23 10:28:50,316][07928] Fps is (10 sec: 11059.1, 60 sec: 11195.7, 300 sec: 11246.6). Total num frames: 19017728. Throughput: 0: 2768.9. Samples: 4751326. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:28:50,318][07928] Avg episode reward: [(0, '30.023')] +[2023-02-23 10:28:52,567][12586] Updated weights for policy 0, policy_version 4650 (0.0009) +[2023-02-23 10:28:55,316][07928] Fps is (10 sec: 11059.1, 60 sec: 11130.1, 300 sec: 11232.8). Total num frames: 19075072. Throughput: 0: 2748.3. Samples: 4768004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:28:55,319][07928] Avg episode reward: [(0, '32.518')] +[2023-02-23 10:28:56,387][12586] Updated weights for policy 0, policy_version 4660 (0.0010) +[2023-02-23 10:29:00,147][12586] Updated weights for policy 0, policy_version 4670 (0.0010) +[2023-02-23 10:29:00,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11127.5, 300 sec: 11218.9). Total num frames: 19128320. Throughput: 0: 2744.5. Samples: 4775978. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:29:00,318][07928] Avg episode reward: [(0, '31.227')] +[2023-02-23 10:29:03,794][12586] Updated weights for policy 0, policy_version 4680 (0.0010) +[2023-02-23 10:29:05,316][07928] Fps is (10 sec: 11059.3, 60 sec: 11059.2, 300 sec: 11218.9). Total num frames: 19185664. Throughput: 0: 2754.9. Samples: 4792602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:29:05,318][07928] Avg episode reward: [(0, '29.746')] +[2023-02-23 10:29:07,432][12586] Updated weights for policy 0, policy_version 4690 (0.0010) +[2023-02-23 10:29:10,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10990.9, 300 sec: 11205.0). Total num frames: 19238912. Throughput: 0: 2751.9. Samples: 4809242. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:29:10,318][07928] Avg episode reward: [(0, '30.221')] +[2023-02-23 10:29:10,327][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004697_19238912.pth... +[2023-02-23 10:29:10,387][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004045_16568320.pth +[2023-02-23 10:29:11,228][12586] Updated weights for policy 0, policy_version 4700 (0.0010) +[2023-02-23 10:29:14,937][12586] Updated weights for policy 0, policy_version 4710 (0.0010) +[2023-02-23 10:29:15,316][07928] Fps is (10 sec: 10649.6, 60 sec: 10990.9, 300 sec: 11177.2). Total num frames: 19292160. Throughput: 0: 2751.6. Samples: 4817352. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:29:15,319][07928] Avg episode reward: [(0, '28.669')] +[2023-02-23 10:29:18,698][12586] Updated weights for policy 0, policy_version 4720 (0.0010) +[2023-02-23 10:29:20,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10991.0, 300 sec: 11191.1). Total num frames: 19349504. Throughput: 0: 2752.0. Samples: 4833886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:29:20,318][07928] Avg episode reward: [(0, '28.059')] +[2023-02-23 10:29:22,328][12586] Updated weights for policy 0, policy_version 4730 (0.0010) +[2023-02-23 10:29:25,316][07928] Fps is (10 sec: 11059.1, 60 sec: 10990.9, 300 sec: 11177.2). Total num frames: 19402752. Throughput: 0: 2749.9. Samples: 4850392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:29:25,320][07928] Avg episode reward: [(0, '29.552')] +[2023-02-23 10:29:26,141][12586] Updated weights for policy 0, policy_version 4740 (0.0010) +[2023-02-23 10:29:29,941][12586] Updated weights for policy 0, policy_version 4750 (0.0010) +[2023-02-23 10:29:30,316][07928] Fps is (10 sec: 11059.1, 60 sec: 11059.2, 300 sec: 11177.2). Total num frames: 19460096. Throughput: 0: 2753.4. Samples: 4858568. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:29:30,319][07928] Avg episode reward: [(0, '32.846')] +[2023-02-23 10:29:33,714][12586] Updated weights for policy 0, policy_version 4760 (0.0010) +[2023-02-23 10:29:35,316][07928] Fps is (10 sec: 11059.3, 60 sec: 10991.0, 300 sec: 11163.3). Total num frames: 19513344. Throughput: 0: 2743.4. Samples: 4874778. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:29:35,318][07928] Avg episode reward: [(0, '31.482')] +[2023-02-23 10:29:37,563][12586] Updated weights for policy 0, policy_version 4770 (0.0010) +[2023-02-23 10:29:40,316][07928] Fps is (10 sec: 10239.9, 60 sec: 10922.7, 300 sec: 11135.6). Total num frames: 19562496. Throughput: 0: 2721.5. Samples: 4890470. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:29:40,318][07928] Avg episode reward: [(0, '30.771')] +[2023-02-23 10:29:41,497][12586] Updated weights for policy 0, policy_version 4780 (0.0010) +[2023-02-23 10:29:45,211][12586] Updated weights for policy 0, policy_version 4790 (0.0010) +[2023-02-23 10:29:45,316][07928] Fps is (10 sec: 10649.6, 60 sec: 10922.7, 300 sec: 11135.6). Total num frames: 19619840. Throughput: 0: 2725.0. Samples: 4898604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:29:45,318][07928] Avg episode reward: [(0, '32.326')] +[2023-02-23 10:29:48,917][12586] Updated weights for policy 0, policy_version 4800 (0.0010) +[2023-02-23 10:29:50,316][07928] Fps is (10 sec: 11059.4, 60 sec: 10922.7, 300 sec: 11121.7). Total num frames: 19673088. Throughput: 0: 2723.1. Samples: 4915140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:29:50,318][07928] Avg episode reward: [(0, '34.064')] +[2023-02-23 10:29:52,655][12586] Updated weights for policy 0, policy_version 4810 (0.0011) +[2023-02-23 10:29:55,316][07928] Fps is (10 sec: 11059.2, 60 sec: 10922.7, 300 sec: 11121.7). Total num frames: 19730432. Throughput: 0: 2716.2. Samples: 4931472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:29:55,318][07928] Avg episode reward: [(0, '32.391')] +[2023-02-23 10:29:56,413][12586] Updated weights for policy 0, policy_version 4820 (0.0010) +[2023-02-23 10:30:00,144][12586] Updated weights for policy 0, policy_version 4830 (0.0009) +[2023-02-23 10:30:00,316][07928] Fps is (10 sec: 11059.1, 60 sec: 10922.7, 300 sec: 11093.9). Total num frames: 19783680. Throughput: 0: 2721.5. Samples: 4939818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-23 10:30:00,318][07928] Avg episode reward: [(0, '32.075')] +[2023-02-23 10:30:03,782][12586] Updated weights for policy 0, policy_version 4840 (0.0011) +[2023-02-23 10:30:05,316][07928] Fps is (10 sec: 11059.1, 60 sec: 10922.6, 300 sec: 11093.9). Total num frames: 19841024. Throughput: 0: 2725.5. Samples: 4956534. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:30:05,318][07928] Avg episode reward: [(0, '32.543')] +[2023-02-23 10:30:07,567][12586] Updated weights for policy 0, policy_version 4850 (0.0010) +[2023-02-23 10:30:10,316][07928] Fps is (10 sec: 11059.3, 60 sec: 10922.7, 300 sec: 11080.0). Total num frames: 19894272. Throughput: 0: 2718.7. Samples: 4972732. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-23 10:30:10,319][07928] Avg episode reward: [(0, '32.644')] +[2023-02-23 10:30:11,317][12586] Updated weights for policy 0, policy_version 4860 (0.0010) +[2023-02-23 10:30:15,019][12586] Updated weights for policy 0, policy_version 4870 (0.0010) +[2023-02-23 10:30:15,316][07928] Fps is (10 sec: 10649.7, 60 sec: 10922.7, 300 sec: 11066.1). Total num frames: 19947520. Throughput: 0: 2721.8. Samples: 4981048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-23 10:30:15,319][07928] Avg episode reward: [(0, '33.707')] +[2023-02-23 10:30:18,749][12586] Updated weights for policy 0, policy_version 4880 (0.0010) +[2023-02-23 10:30:20,242][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2023-02-23 10:30:20,243][07928] Component Batcher_0 stopped! +[2023-02-23 10:30:20,246][07928] Component RolloutWorker_w1 process died already! Don't wait for it. +[2023-02-23 10:30:20,243][12572] Stopping Batcher_0... +[2023-02-23 10:30:20,249][12572] Loop batcher_evt_loop terminating... 
+[2023-02-23 10:30:20,249][07928] Component RolloutWorker_w2 process died already! Don't wait for it. +[2023-02-23 10:30:20,251][07928] Component RolloutWorker_w4 process died already! Don't wait for it. +[2023-02-23 10:30:20,253][07928] Component RolloutWorker_w7 process died already! Don't wait for it. +[2023-02-23 10:30:20,255][12607] Stopping RolloutWorker_w6... +[2023-02-23 10:30:20,255][12607] Loop rollout_proc6_evt_loop terminating... +[2023-02-23 10:30:20,256][07928] Component RolloutWorker_w6 stopped! +[2023-02-23 10:30:20,257][12588] Stopping RolloutWorker_w0... +[2023-02-23 10:30:20,258][12588] Loop rollout_proc0_evt_loop terminating... +[2023-02-23 10:30:20,258][07928] Component RolloutWorker_w0 stopped! +[2023-02-23 10:30:20,259][12586] Weights refcount: 2 0 +[2023-02-23 10:30:20,261][12586] Stopping InferenceWorker_p0-w0... +[2023-02-23 10:30:20,262][12586] Loop inference_proc0-0_evt_loop terminating... +[2023-02-23 10:30:20,261][07928] Component InferenceWorker_p0-w0 stopped! +[2023-02-23 10:30:20,265][12608] Stopping RolloutWorker_w5... +[2023-02-23 10:30:20,265][12608] Loop rollout_proc5_evt_loop terminating... +[2023-02-23 10:30:20,265][07928] Component RolloutWorker_w5 stopped! +[2023-02-23 10:30:20,272][12590] Stopping RolloutWorker_w3... +[2023-02-23 10:30:20,273][12590] Loop rollout_proc3_evt_loop terminating... +[2023-02-23 10:30:20,272][07928] Component RolloutWorker_w3 stopped! +[2023-02-23 10:30:20,307][12572] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004369_17895424.pth +[2023-02-23 10:30:20,313][12572] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2023-02-23 10:30:20,392][12572] Stopping LearnerWorker_p0... +[2023-02-23 10:30:20,393][12572] Loop learner_proc0_evt_loop terminating... +[2023-02-23 10:30:20,393][07928] Component LearnerWorker_p0 stopped! +[2023-02-23 10:30:20,396][07928] Waiting for process learner_proc0 to stop... +[2023-02-23 10:30:21,886][07928] Waiting for process inference_proc0-0 to join... +[2023-02-23 10:30:21,889][07928] Waiting for process rollout_proc0 to join... +[2023-02-23 10:30:21,891][07928] Waiting for process rollout_proc1 to join... +[2023-02-23 10:30:21,893][07928] Waiting for process rollout_proc2 to join... +[2023-02-23 10:30:21,895][07928] Waiting for process rollout_proc3 to join... +[2023-02-23 10:30:21,897][07928] Waiting for process rollout_proc4 to join... +[2023-02-23 10:30:21,898][07928] Waiting for process rollout_proc5 to join... +[2023-02-23 10:30:21,901][07928] Waiting for process rollout_proc6 to join... +[2023-02-23 10:30:21,902][07928] Waiting for process rollout_proc7 to join... 
+[2023-02-23 10:30:21,904][07928] Batcher 0 profile tree view: +batching: 74.3165, releasing_batches: 0.1040 +[2023-02-23 10:30:21,906][07928] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 25.5259 +update_model: 23.4186 + weight_update: 0.0009 +one_step: 0.0024 + handle_policy_step: 1588.9123 + deserialize: 50.7068, stack: 9.6286, obs_to_device_normalize: 364.3744, forward: 754.5875, send_messages: 95.9341 + prepare_outputs: 236.1393 + to_cpu: 145.2426 +[2023-02-23 10:30:21,907][07928] Learner 0 profile tree view: +misc: 0.0290, prepare_batch: 34.3051 +train: 89.8894 + epoch_init: 0.0258, minibatch_init: 0.0256, losses_postprocess: 2.3762, kl_divergence: 2.8221, after_optimizer: 6.4989 + calculate_losses: 35.8647 + losses_init: 0.0151, forward_head: 5.0676, bptt_initial: 15.4692, tail: 2.8205, advantages_returns: 0.7707, losses: 4.8844 + bptt: 6.0336 + bptt_forward_core: 5.7997 + update: 40.6677 + clip: 5.0121 +[2023-02-23 10:30:21,909][07928] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 1.4081, enqueue_policy_requests: 68.0882, env_step: 1057.1057, overhead: 94.3103, complete_rollouts: 2.2493 +save_policy_outputs: 76.6386 + split_output_tensors: 37.6363 +[2023-02-23 10:30:21,911][07928] Loop Runner_EvtLoop terminating... +[2023-02-23 10:30:21,914][07928] Runner profile tree view: +main_loop: 1746.8731 +[2023-02-23 10:30:21,915][07928] Collected {0: 20004864}, FPS: 11451.8 +[2023-02-23 10:34:04,080][07928] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-23 10:34:04,082][07928] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-23 10:34:04,084][07928] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-23 10:34:04,085][07928] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-23 10:34:04,087][07928] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-23 10:34:04,089][07928] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-23 10:34:04,090][07928] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2023-02-23 10:34:04,092][07928] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-23 10:34:04,093][07928] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2023-02-23 10:34:04,095][07928] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2023-02-23 10:34:04,096][07928] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-23 10:34:04,098][07928] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-23 10:34:04,099][07928] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-23 10:34:04,101][07928] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2023-02-23 10:34:04,102][07928] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-23 10:34:04,120][07928] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-23 10:34:04,123][07928] RunningMeanStd input shape: (3, 72, 128) +[2023-02-23 10:34:04,127][07928] RunningMeanStd input shape: (1,) +[2023-02-23 10:34:04,146][07928] ConvEncoder: input_channels=3 +[2023-02-23 10:34:04,977][07928] Conv encoder output size: 512 +[2023-02-23 10:34:04,981][07928] Policy head output size: 512 +[2023-02-23 10:34:07,930][07928] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2023-02-23 10:34:09,832][07928] Num frames 100... +[2023-02-23 10:34:09,956][07928] Num frames 200... +[2023-02-23 10:34:10,082][07928] Num frames 300... +[2023-02-23 10:34:10,211][07928] Num frames 400... +[2023-02-23 10:34:10,336][07928] Num frames 500... +[2023-02-23 10:34:10,479][07928] Avg episode rewards: #0: 11.690, true rewards: #0: 5.690 +[2023-02-23 10:34:10,481][07928] Avg episode reward: 11.690, avg true_objective: 5.690 +[2023-02-23 10:34:10,520][07928] Num frames 600... +[2023-02-23 10:34:10,647][07928] Num frames 700... +[2023-02-23 10:34:10,773][07928] Num frames 800... +[2023-02-23 10:34:10,902][07928] Num frames 900... +[2023-02-23 10:34:11,078][07928] Avg episode rewards: #0: 9.985, true rewards: #0: 4.985 +[2023-02-23 10:34:11,080][07928] Avg episode reward: 9.985, avg true_objective: 4.985 +[2023-02-23 10:34:11,085][07928] Num frames 1000... +[2023-02-23 10:34:11,202][07928] Num frames 1100... +[2023-02-23 10:34:11,315][07928] Num frames 1200... +[2023-02-23 10:34:11,434][07928] Num frames 1300... +[2023-02-23 10:34:11,547][07928] Num frames 1400... +[2023-02-23 10:34:11,661][07928] Num frames 1500... +[2023-02-23 10:34:11,778][07928] Num frames 1600... +[2023-02-23 10:34:11,897][07928] Num frames 1700... +[2023-02-23 10:34:12,020][07928] Num frames 1800... +[2023-02-23 10:34:12,143][07928] Num frames 1900... +[2023-02-23 10:34:12,290][07928] Num frames 2000... +[2023-02-23 10:34:12,415][07928] Num frames 2100... +[2023-02-23 10:34:12,539][07928] Num frames 2200... +[2023-02-23 10:34:12,664][07928] Num frames 2300... +[2023-02-23 10:34:12,793][07928] Num frames 2400... +[2023-02-23 10:34:12,917][07928] Num frames 2500... +[2023-02-23 10:34:13,041][07928] Num frames 2600... +[2023-02-23 10:34:13,166][07928] Num frames 2700... +[2023-02-23 10:34:13,293][07928] Num frames 2800... +[2023-02-23 10:34:13,415][07928] Num frames 2900... +[2023-02-23 10:34:13,557][07928] Avg episode rewards: #0: 23.906, true rewards: #0: 9.907 +[2023-02-23 10:34:13,560][07928] Avg episode reward: 23.906, avg true_objective: 9.907 +[2023-02-23 10:34:13,595][07928] Num frames 3000... +[2023-02-23 10:34:13,712][07928] Num frames 3100... +[2023-02-23 10:34:13,826][07928] Num frames 3200... +[2023-02-23 10:34:13,940][07928] Num frames 3300... +[2023-02-23 10:34:14,053][07928] Num frames 3400... +[2023-02-23 10:34:14,171][07928] Num frames 3500... +[2023-02-23 10:34:14,290][07928] Num frames 3600... +[2023-02-23 10:34:14,410][07928] Num frames 3700... +[2023-02-23 10:34:14,534][07928] Num frames 3800... +[2023-02-23 10:34:14,657][07928] Num frames 3900... +[2023-02-23 10:34:14,778][07928] Num frames 4000... +[2023-02-23 10:34:14,900][07928] Num frames 4100... +[2023-02-23 10:34:15,020][07928] Num frames 4200... +[2023-02-23 10:34:15,142][07928] Num frames 4300... +[2023-02-23 10:34:15,287][07928] Num frames 4400... 
+[2023-02-23 10:34:15,411][07928] Num frames 4500... +[2023-02-23 10:34:15,534][07928] Num frames 4600... +[2023-02-23 10:34:15,662][07928] Num frames 4700... +[2023-02-23 10:34:15,787][07928] Num frames 4800... +[2023-02-23 10:34:15,910][07928] Num frames 4900... +[2023-02-23 10:34:16,034][07928] Num frames 5000... +[2023-02-23 10:34:16,174][07928] Avg episode rewards: #0: 32.679, true rewards: #0: 12.680 +[2023-02-23 10:34:16,176][07928] Avg episode reward: 32.679, avg true_objective: 12.680 +[2023-02-23 10:34:16,210][07928] Num frames 5100... +[2023-02-23 10:34:16,332][07928] Num frames 5200... +[2023-02-23 10:34:16,447][07928] Num frames 5300... +[2023-02-23 10:34:16,561][07928] Num frames 5400... +[2023-02-23 10:34:16,678][07928] Num frames 5500... +[2023-02-23 10:34:16,793][07928] Num frames 5600... +[2023-02-23 10:34:16,908][07928] Num frames 5700... +[2023-02-23 10:34:17,027][07928] Num frames 5800... +[2023-02-23 10:34:17,146][07928] Num frames 5900... +[2023-02-23 10:34:17,266][07928] Num frames 6000... +[2023-02-23 10:34:17,389][07928] Num frames 6100... +[2023-02-23 10:34:17,512][07928] Num frames 6200... +[2023-02-23 10:34:17,632][07928] Num frames 6300... +[2023-02-23 10:34:17,799][07928] Avg episode rewards: #0: 32.966, true rewards: #0: 12.766 +[2023-02-23 10:34:17,801][07928] Avg episode reward: 32.966, avg true_objective: 12.766 +[2023-02-23 10:34:17,825][07928] Num frames 6400... +[2023-02-23 10:34:17,944][07928] Num frames 6500... +[2023-02-23 10:34:18,065][07928] Num frames 6600... +[2023-02-23 10:34:18,181][07928] Num frames 6700... +[2023-02-23 10:34:18,305][07928] Num frames 6800... +[2023-02-23 10:34:18,428][07928] Num frames 6900... +[2023-02-23 10:34:18,553][07928] Num frames 7000... +[2023-02-23 10:34:18,668][07928] Num frames 7100... +[2023-02-23 10:34:18,783][07928] Num frames 7200... +[2023-02-23 10:34:18,899][07928] Num frames 7300... +[2023-02-23 10:34:19,018][07928] Num frames 7400... +[2023-02-23 10:34:19,136][07928] Num frames 7500... +[2023-02-23 10:34:19,255][07928] Num frames 7600... +[2023-02-23 10:34:19,381][07928] Num frames 7700... +[2023-02-23 10:34:19,511][07928] Num frames 7800... +[2023-02-23 10:34:19,641][07928] Num frames 7900... +[2023-02-23 10:34:19,769][07928] Num frames 8000... +[2023-02-23 10:34:19,901][07928] Avg episode rewards: #0: 35.266, true rewards: #0: 13.433 +[2023-02-23 10:34:19,903][07928] Avg episode reward: 35.266, avg true_objective: 13.433 +[2023-02-23 10:34:19,958][07928] Num frames 8100... +[2023-02-23 10:34:20,085][07928] Num frames 8200... +[2023-02-23 10:34:20,212][07928] Num frames 8300... +[2023-02-23 10:34:20,330][07928] Num frames 8400... +[2023-02-23 10:34:20,451][07928] Num frames 8500... +[2023-02-23 10:34:20,578][07928] Num frames 8600... +[2023-02-23 10:34:20,699][07928] Num frames 8700... +[2023-02-23 10:34:20,821][07928] Num frames 8800... +[2023-02-23 10:34:20,941][07928] Num frames 8900... +[2023-02-23 10:34:21,055][07928] Num frames 9000... +[2023-02-23 10:34:21,172][07928] Num frames 9100... +[2023-02-23 10:34:21,291][07928] Num frames 9200... +[2023-02-23 10:34:21,409][07928] Num frames 9300... +[2023-02-23 10:34:21,523][07928] Num frames 9400... +[2023-02-23 10:34:21,638][07928] Num frames 9500... +[2023-02-23 10:34:21,760][07928] Num frames 9600... +[2023-02-23 10:34:21,884][07928] Num frames 9700... +[2023-02-23 10:34:22,005][07928] Num frames 9800... +[2023-02-23 10:34:22,131][07928] Num frames 9900... +[2023-02-23 10:34:22,255][07928] Num frames 10000... 
+[2023-02-23 10:34:22,375][07928] Num frames 10100...
+[2023-02-23 10:34:22,506][07928] Avg episode rewards: #0: 38.942, true rewards: #0: 14.514
+[2023-02-23 10:34:22,508][07928] Avg episode reward: 38.942, avg true_objective: 14.514
+[2023-02-23 10:34:22,558][07928] Num frames 10200...
+[2023-02-23 10:34:22,677][07928] Num frames 10300...
+[2023-02-23 10:34:22,797][07928] Num frames 10400...
+[2023-02-23 10:34:22,915][07928] Num frames 10500...
+[2023-02-23 10:34:23,037][07928] Num frames 10600...
+[2023-02-23 10:34:23,162][07928] Num frames 10700...
+[2023-02-23 10:34:23,228][07928] Avg episode rewards: #0: 35.759, true rewards: #0: 13.385
+[2023-02-23 10:34:23,230][07928] Avg episode reward: 35.759, avg true_objective: 13.385
+[2023-02-23 10:34:23,344][07928] Num frames 10800...
+[2023-02-23 10:34:23,465][07928] Num frames 10900...
+[2023-02-23 10:34:23,582][07928] Num frames 11000...
+[2023-02-23 10:34:23,696][07928] Num frames 11100...
+[2023-02-23 10:34:23,807][07928] Num frames 11200...
+[2023-02-23 10:34:23,921][07928] Num frames 11300...
+[2023-02-23 10:34:24,033][07928] Num frames 11400...
+[2023-02-23 10:34:24,152][07928] Num frames 11500...
+[2023-02-23 10:34:24,271][07928] Num frames 11600...
+[2023-02-23 10:34:24,390][07928] Num frames 11700...
+[2023-02-23 10:34:24,524][07928] Num frames 11800...
+[2023-02-23 10:34:24,648][07928] Num frames 11900...
+[2023-02-23 10:34:24,775][07928] Num frames 12000...
+[2023-02-23 10:34:24,897][07928] Num frames 12100...
+[2023-02-23 10:34:25,025][07928] Num frames 12200...
+[2023-02-23 10:34:25,151][07928] Num frames 12300...
+[2023-02-23 10:34:25,276][07928] Num frames 12400...
+[2023-02-23 10:34:25,403][07928] Num frames 12500...
+[2023-02-23 10:34:25,533][07928] Num frames 12600...
+[2023-02-23 10:34:25,661][07928] Num frames 12700...
+[2023-02-23 10:34:25,782][07928] Num frames 12800...
+[2023-02-23 10:34:25,848][07928] Avg episode rewards: #0: 38.675, true rewards: #0: 14.231
+[2023-02-23 10:34:25,850][07928] Avg episode reward: 38.675, avg true_objective: 14.231
+[2023-02-23 10:34:25,963][07928] Num frames 12900...
+[2023-02-23 10:34:26,075][07928] Num frames 13000...
+[2023-02-23 10:34:26,190][07928] Num frames 13100...
+[2023-02-23 10:34:26,302][07928] Num frames 13200...
+[2023-02-23 10:34:26,416][07928] Num frames 13300...
+[2023-02-23 10:34:26,532][07928] Num frames 13400...
+[2023-02-23 10:34:26,650][07928] Num frames 13500...
+[2023-02-23 10:34:26,767][07928] Num frames 13600...
+[2023-02-23 10:34:26,905][07928] Avg episode rewards: #0: 36.571, true rewards: #0: 13.672
+[2023-02-23 10:34:26,908][07928] Avg episode reward: 36.571, avg true_objective: 13.672
+[2023-02-23 10:34:59,650][07928] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-02-23 10:40:24,704][07928] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-23 10:40:24,706][07928] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-23 10:40:24,707][07928] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-23 10:40:24,710][07928] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-23 10:40:24,711][07928] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-23 10:40:24,712][07928] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-23 10:40:24,714][07928] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-02-23 10:40:24,716][07928] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-23 10:40:24,718][07928] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-02-23 10:40:24,719][07928] Adding new argument 'hf_repository'='Unterwexi/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-02-23 10:40:24,722][07928] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-23 10:40:24,723][07928] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-23 10:40:24,725][07928] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-23 10:40:24,726][07928] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-23 10:40:24,727][07928] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-23 10:40:24,745][07928] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-23 10:40:24,748][07928] RunningMeanStd input shape: (1,)
+[2023-02-23 10:40:24,763][07928] ConvEncoder: input_channels=3
+[2023-02-23 10:40:24,804][07928] Conv encoder output size: 512
+[2023-02-23 10:40:24,806][07928] Policy head output size: 512
+[2023-02-23 10:40:24,831][07928] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
+[2023-02-23 10:40:25,305][07928] Num frames 100...
+[2023-02-23 10:40:25,415][07928] Num frames 200...
+[2023-02-23 10:40:25,523][07928] Num frames 300...
+[2023-02-23 10:40:25,633][07928] Num frames 400...
+[2023-02-23 10:40:25,746][07928] Num frames 500...
+[2023-02-23 10:40:25,858][07928] Num frames 600...
+[2023-02-23 10:40:25,970][07928] Num frames 700...
+[2023-02-23 10:40:26,079][07928] Num frames 800...
+[2023-02-23 10:40:26,189][07928] Num frames 900...
+[2023-02-23 10:40:26,297][07928] Num frames 1000...
+[2023-02-23 10:40:26,406][07928] Num frames 1100...
+[2023-02-23 10:40:26,515][07928] Num frames 1200...
+[2023-02-23 10:40:26,628][07928] Num frames 1300...
+[2023-02-23 10:40:26,739][07928] Num frames 1400...
+[2023-02-23 10:40:26,808][07928] Avg episode rewards: #0: 37.120, true rewards: #0: 14.120
+[2023-02-23 10:40:26,810][07928] Avg episode reward: 37.120, avg true_objective: 14.120
+[2023-02-23 10:40:26,908][07928] Num frames 1500...
+[2023-02-23 10:40:27,032][07928] Num frames 1600...
+[2023-02-23 10:40:27,145][07928] Num frames 1700...
+[2023-02-23 10:40:27,258][07928] Num frames 1800...
+[2023-02-23 10:40:27,368][07928] Num frames 1900...
+[2023-02-23 10:40:27,481][07928] Num frames 2000...
+[2023-02-23 10:40:27,591][07928] Num frames 2100...
+[2023-02-23 10:40:27,703][07928] Num frames 2200...
+[2023-02-23 10:40:27,812][07928] Num frames 2300...
+[2023-02-23 10:40:27,925][07928] Num frames 2400...
+[2023-02-23 10:40:28,041][07928] Num frames 2500...
+[2023-02-23 10:40:28,153][07928] Num frames 2600...
+[2023-02-23 10:40:28,302][07928] Avg episode rewards: #0: 35.435, true rewards: #0: 13.435
+[2023-02-23 10:40:28,304][07928] Avg episode reward: 35.435, avg true_objective: 13.435
+[2023-02-23 10:40:28,320][07928] Num frames 2700...
+[2023-02-23 10:40:28,429][07928] Num frames 2800...
+[2023-02-23 10:40:28,543][07928] Num frames 2900...
+[2023-02-23 10:40:28,655][07928] Num frames 3000...
+[2023-02-23 10:40:28,764][07928] Num frames 3100...
+[2023-02-23 10:40:28,874][07928] Num frames 3200...
+[2023-02-23 10:40:28,998][07928] Num frames 3300...
+[2023-02-23 10:40:29,109][07928] Num frames 3400...
+[2023-02-23 10:40:29,217][07928] Num frames 3500...
+[2023-02-23 10:40:29,327][07928] Num frames 3600...
+[2023-02-23 10:40:29,437][07928] Num frames 3700...
+[2023-02-23 10:40:29,552][07928] Num frames 3800...
+[2023-02-23 10:40:29,663][07928] Num frames 3900...
+[2023-02-23 10:40:29,784][07928] Avg episode rewards: #0: 35.203, true rewards: #0: 13.203
+[2023-02-23 10:40:29,786][07928] Avg episode reward: 35.203, avg true_objective: 13.203
+[2023-02-23 10:40:29,831][07928] Num frames 4000...
+[2023-02-23 10:40:29,940][07928] Num frames 4100...
+[2023-02-23 10:40:30,055][07928] Num frames 4200...
+[2023-02-23 10:40:30,167][07928] Num frames 4300...
+[2023-02-23 10:40:30,280][07928] Num frames 4400...
+[2023-02-23 10:40:30,396][07928] Num frames 4500...
+[2023-02-23 10:40:30,507][07928] Num frames 4600...
+[2023-02-23 10:40:30,619][07928] Num frames 4700...
+[2023-02-23 10:40:30,730][07928] Num frames 4800...
+[2023-02-23 10:40:30,841][07928] Num frames 4900...
+[2023-02-23 10:40:30,950][07928] Num frames 5000...
+[2023-02-23 10:40:31,061][07928] Num frames 5100...
+[2023-02-23 10:40:31,169][07928] Num frames 5200...
+[2023-02-23 10:40:31,278][07928] Num frames 5300...
+[2023-02-23 10:40:31,388][07928] Num frames 5400...
+[2023-02-23 10:40:31,503][07928] Num frames 5500...
+[2023-02-23 10:40:31,616][07928] Num frames 5600...
+[2023-02-23 10:40:31,729][07928] Num frames 5700...
+[2023-02-23 10:40:31,841][07928] Num frames 5800...
+[2023-02-23 10:40:31,954][07928] Num frames 5900...
+[2023-02-23 10:40:32,085][07928] Num frames 6000...
+[2023-02-23 10:40:32,208][07928] Avg episode rewards: #0: 41.152, true rewards: #0: 15.152
+[2023-02-23 10:40:32,210][07928] Avg episode reward: 41.152, avg true_objective: 15.152
+[2023-02-23 10:40:32,256][07928] Num frames 6100...
+[2023-02-23 10:40:32,367][07928] Num frames 6200...
+[2023-02-23 10:40:32,479][07928] Num frames 6300...
+[2023-02-23 10:40:32,591][07928] Num frames 6400...
+[2023-02-23 10:40:32,703][07928] Num frames 6500...
+[2023-02-23 10:40:32,817][07928] Num frames 6600...
+[2023-02-23 10:40:32,933][07928] Num frames 6700...
+[2023-02-23 10:40:33,057][07928] Num frames 6800...
+[2023-02-23 10:40:33,189][07928] Num frames 6900...
+[2023-02-23 10:40:33,298][07928] Num frames 7000...
+[2023-02-23 10:40:33,410][07928] Num frames 7100...
+[2023-02-23 10:40:33,535][07928] Num frames 7200...
+[2023-02-23 10:40:33,650][07928] Num frames 7300...
+[2023-02-23 10:40:33,762][07928] Num frames 7400...
+[2023-02-23 10:40:33,876][07928] Num frames 7500...
+[2023-02-23 10:40:33,988][07928] Num frames 7600...
+[2023-02-23 10:40:34,157][07928] Avg episode rewards: #0: 41.786, true rewards: #0: 15.386
+[2023-02-23 10:40:34,159][07928] Avg episode reward: 41.786, avg true_objective: 15.386
+[2023-02-23 10:40:34,169][07928] Num frames 7700...
+[2023-02-23 10:40:34,283][07928] Num frames 7800...
+[2023-02-23 10:40:34,397][07928] Num frames 7900...
+[2023-02-23 10:40:34,513][07928] Num frames 8000...
+[2023-02-23 10:40:34,635][07928] Num frames 8100...
+[2023-02-23 10:40:34,752][07928] Num frames 8200...
+[2023-02-23 10:40:34,867][07928] Num frames 8300...
+[2023-02-23 10:40:34,984][07928] Num frames 8400...
+[2023-02-23 10:40:35,101][07928] Num frames 8500...
+[2023-02-23 10:40:35,217][07928] Num frames 8600...
+[2023-02-23 10:40:35,329][07928] Num frames 8700...
+[2023-02-23 10:40:35,448][07928] Num frames 8800...
+[2023-02-23 10:40:35,561][07928] Num frames 8900...
+[2023-02-23 10:40:35,674][07928] Num frames 9000...
+[2023-02-23 10:40:35,789][07928] Num frames 9100...
+[2023-02-23 10:40:35,900][07928] Num frames 9200...
+[2023-02-23 10:40:36,009][07928] Num frames 9300...
+[2023-02-23 10:40:36,118][07928] Num frames 9400...
+[2023-02-23 10:40:36,233][07928] Num frames 9500...
+[2023-02-23 10:40:36,345][07928] Num frames 9600...
+[2023-02-23 10:40:36,457][07928] Num frames 9700...
+[2023-02-23 10:40:36,616][07928] Avg episode rewards: #0: 44.988, true rewards: #0: 16.322
+[2023-02-23 10:40:36,617][07928] Avg episode reward: 44.988, avg true_objective: 16.322
+[2023-02-23 10:40:36,626][07928] Num frames 9800...
+[2023-02-23 10:40:36,737][07928] Num frames 9900...
+[2023-02-23 10:40:36,842][07928] Num frames 10000...
+[2023-02-23 10:40:36,949][07928] Num frames 10100...
+[2023-02-23 10:40:37,057][07928] Num frames 10200...
+[2023-02-23 10:40:37,166][07928] Num frames 10300...
+[2023-02-23 10:40:37,274][07928] Num frames 10400...
+[2023-02-23 10:40:37,380][07928] Num frames 10500...
+[2023-02-23 10:40:37,490][07928] Num frames 10600...
+[2023-02-23 10:40:37,602][07928] Num frames 10700...
+[2023-02-23 10:40:37,710][07928] Num frames 10800...
+[2023-02-23 10:40:37,818][07928] Num frames 10900...
+[2023-02-23 10:40:37,928][07928] Num frames 11000...
+[2023-02-23 10:40:38,040][07928] Num frames 11100...
+[2023-02-23 10:40:38,152][07928] Num frames 11200...
+[2023-02-23 10:40:38,263][07928] Num frames 11300...
+[2023-02-23 10:40:38,359][07928] Avg episode rewards: #0: 44.479, true rewards: #0: 16.194
+[2023-02-23 10:40:38,361][07928] Avg episode reward: 44.479, avg true_objective: 16.194
+[2023-02-23 10:40:38,434][07928] Num frames 11400...
+[2023-02-23 10:40:38,547][07928] Num frames 11500...
+[2023-02-23 10:40:38,660][07928] Num frames 11600...
+[2023-02-23 10:40:38,771][07928] Num frames 11700...
+[2023-02-23 10:40:38,882][07928] Num frames 11800...
+[2023-02-23 10:40:38,988][07928] Avg episode rewards: #0: 40.185, true rewards: #0: 14.810
+[2023-02-23 10:40:38,990][07928] Avg episode reward: 40.185, avg true_objective: 14.810
+[2023-02-23 10:40:39,049][07928] Num frames 11900...
+[2023-02-23 10:40:39,159][07928] Num frames 12000...
+[2023-02-23 10:40:39,272][07928] Num frames 12100...
+[2023-02-23 10:40:39,383][07928] Num frames 12200...
+[2023-02-23 10:40:39,491][07928] Num frames 12300...
+[2023-02-23 10:40:39,600][07928] Num frames 12400...
+[2023-02-23 10:40:39,713][07928] Num frames 12500...
+[2023-02-23 10:40:39,846][07928] Avg episode rewards: #0: 37.632, true rewards: #0: 13.966
+[2023-02-23 10:40:39,848][07928] Avg episode reward: 37.632, avg true_objective: 13.966
+[2023-02-23 10:40:39,884][07928] Num frames 12600...
+[2023-02-23 10:40:39,994][07928] Num frames 12700...
+[2023-02-23 10:40:40,102][07928] Num frames 12800...
+[2023-02-23 10:40:40,211][07928] Num frames 12900...
+[2023-02-23 10:40:40,324][07928] Num frames 13000...
+[2023-02-23 10:40:40,435][07928] Num frames 13100...
+[2023-02-23 10:40:40,545][07928] Num frames 13200...
+[2023-02-23 10:40:40,659][07928] Num frames 13300...
+[2023-02-23 10:40:40,771][07928] Num frames 13400...
+[2023-02-23 10:40:40,883][07928] Num frames 13500...
+[2023-02-23 10:40:40,997][07928] Num frames 13600...
+[2023-02-23 10:40:41,110][07928] Num frames 13700...
+[2023-02-23 10:40:41,221][07928] Num frames 13800...
+[2023-02-23 10:40:41,351][07928] Num frames 13900...
+[2023-02-23 10:40:41,463][07928] Num frames 14000...
+[2023-02-23 10:40:41,576][07928] Num frames 14100...
+[2023-02-23 10:40:41,693][07928] Num frames 14200...
+[2023-02-23 10:40:41,808][07928] Num frames 14300...
+[2023-02-23 10:40:41,919][07928] Num frames 14400...
+[2023-02-23 10:40:42,031][07928] Num frames 14500...
+[2023-02-23 10:40:42,142][07928] Num frames 14600...
+[2023-02-23 10:40:42,279][07928] Avg episode rewards: #0: 38.968, true rewards: #0: 14.669
+[2023-02-23 10:40:42,281][07928] Avg episode reward: 38.968, avg true_objective: 14.669
+[2023-02-23 10:41:16,530][07928] Replay video saved to /content/train_dir/default_experiment/replay.mp4!