[2024-10-03 20:35:48,586][00259] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-03 20:35:48,589][00259] Rollout worker 0 uses device cpu
[2024-10-03 20:35:48,591][00259] Rollout worker 1 uses device cpu
[2024-10-03 20:35:48,593][00259] Rollout worker 2 uses device cpu
[2024-10-03 20:35:48,594][00259] Rollout worker 3 uses device cpu
[2024-10-03 20:35:48,595][00259] Rollout worker 4 uses device cpu
[2024-10-03 20:35:48,596][00259] Rollout worker 5 uses device cpu
[2024-10-03 20:35:48,597][00259] Rollout worker 6 uses device cpu
[2024-10-03 20:35:48,598][00259] Rollout worker 7 uses device cpu
[2024-10-03 20:35:48,750][00259] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 20:35:48,752][00259] InferenceWorker_p0-w0: min num requests: 2
[2024-10-03 20:35:48,783][00259] Starting all processes...
[2024-10-03 20:35:48,785][00259] Starting process learner_proc0
[2024-10-03 20:35:49,461][00259] Starting all processes...
[2024-10-03 20:35:49,473][00259] Starting process inference_proc0-0
[2024-10-03 20:35:49,483][00259] Starting process rollout_proc0
[2024-10-03 20:35:49,484][00259] Starting process rollout_proc1
[2024-10-03 20:35:49,488][00259] Starting process rollout_proc2
[2024-10-03 20:35:49,488][00259] Starting process rollout_proc3
[2024-10-03 20:35:49,488][00259] Starting process rollout_proc4
[2024-10-03 20:35:49,488][00259] Starting process rollout_proc5
[2024-10-03 20:35:49,488][00259] Starting process rollout_proc6
[2024-10-03 20:35:49,488][00259] Starting process rollout_proc7
[2024-10-03 20:36:06,413][02390] Worker 4 uses CPU cores [0]
[2024-10-03 20:36:06,494][02391] Worker 5 uses CPU cores [1]
[2024-10-03 20:36:06,517][02372] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 20:36:06,518][02372] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-03 20:36:06,549][02392] Worker 7 uses CPU cores [1]
[2024-10-03 20:36:06,551][02385] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 20:36:06,556][02385] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-03 20:36:06,573][02372] Num visible devices: 1
[2024-10-03 20:36:06,597][02372] Starting seed is not provided
[2024-10-03 20:36:06,598][02372] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 20:36:06,599][02372] Initializing actor-critic model on device cuda:0
[2024-10-03 20:36:06,600][02372] RunningMeanStd input shape: (3, 72, 128)
[2024-10-03 20:36:06,603][02372] RunningMeanStd input shape: (1,)
[2024-10-03 20:36:06,618][02393] Worker 6 uses CPU cores [0]
[2024-10-03 20:36:06,631][02385] Num visible devices: 1
[2024-10-03 20:36:06,658][02386] Worker 0 uses CPU cores [0]
[2024-10-03 20:36:06,666][02389] Worker 3 uses CPU cores [1]
[2024-10-03 20:36:06,670][02372] ConvEncoder: input_channels=3
[2024-10-03 20:36:06,673][02388] Worker 1 uses CPU cores [1]
[2024-10-03 20:36:06,714][02387] Worker 2 uses CPU cores [0]
[2024-10-03 20:36:06,934][02372] Conv encoder output size: 512
[2024-10-03 20:36:06,934][02372] Policy head output size: 512
[2024-10-03 20:36:07,001][02372] Created Actor Critic model with architecture:
[2024-10-03 20:36:07,002][02372] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-03 20:36:07,418][02372] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-10-03 20:36:08,306][02372] No checkpoints found
[2024-10-03 20:36:08,306][02372] Did not load from checkpoint, starting from scratch!
[2024-10-03 20:36:08,306][02372] Initialized policy 0 weights for model version 0
[2024-10-03 20:36:08,310][02372] LearnerWorker_p0 finished initialization!
[2024-10-03 20:36:08,311][02372] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-03 20:36:08,402][02385] RunningMeanStd input shape: (3, 72, 128)
[2024-10-03 20:36:08,403][02385] RunningMeanStd input shape: (1,)
[2024-10-03 20:36:08,415][02385] ConvEncoder: input_channels=3
[2024-10-03 20:36:08,515][02385] Conv encoder output size: 512
[2024-10-03 20:36:08,515][02385] Policy head output size: 512
[2024-10-03 20:36:08,565][00259] Inference worker 0-0 is ready!
[2024-10-03 20:36:08,568][00259] All inference workers are ready! Signal rollout workers to start!
[2024-10-03 20:36:08,744][00259] Heartbeat connected on Batcher_0
[2024-10-03 20:36:08,750][00259] Heartbeat connected on LearnerWorker_p0
[2024-10-03 20:36:08,783][02389] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:08,785][02392] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:08,787][02388] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:08,788][02391] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:08,802][00259] Heartbeat connected on InferenceWorker_p0-w0
[2024-10-03 20:36:08,806][02393] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:08,810][02386] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:08,812][02387] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:08,813][02390] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:36:10,706][02393] Decorrelating experience for 0 frames...
[2024-10-03 20:36:10,708][02386] Decorrelating experience for 0 frames...
[2024-10-03 20:36:10,719][02387] Decorrelating experience for 0 frames...
[2024-10-03 20:36:10,721][02390] Decorrelating experience for 0 frames...
[2024-10-03 20:36:10,988][02388] Decorrelating experience for 0 frames...
[2024-10-03 20:36:10,990][02391] Decorrelating experience for 0 frames...
[2024-10-03 20:36:11,492][02387] Decorrelating experience for 32 frames...
[2024-10-03 20:36:12,224][02390] Decorrelating experience for 32 frames...
[2024-10-03 20:36:12,243][02389] Decorrelating experience for 0 frames...
[2024-10-03 20:36:12,254][02392] Decorrelating experience for 0 frames...
[2024-10-03 20:36:12,741][00259] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-03 20:36:12,999][02391] Decorrelating experience for 32 frames...
[2024-10-03 20:36:13,916][02392] Decorrelating experience for 32 frames...
[2024-10-03 20:36:13,921][02389] Decorrelating experience for 32 frames...
[2024-10-03 20:36:14,031][02387] Decorrelating experience for 64 frames...
[2024-10-03 20:36:14,534][02386] Decorrelating experience for 32 frames...
[2024-10-03 20:36:14,554][02393] Decorrelating experience for 32 frames...
[2024-10-03 20:36:15,101][02390] Decorrelating experience for 64 frames...
[2024-10-03 20:36:15,682][02391] Decorrelating experience for 64 frames...
[2024-10-03 20:36:16,089][02387] Decorrelating experience for 96 frames...
[2024-10-03 20:36:16,292][00259] Heartbeat connected on RolloutWorker_w2
[2024-10-03 20:36:16,319][02392] Decorrelating experience for 64 frames...
[2024-10-03 20:36:16,325][02389] Decorrelating experience for 64 frames...
[2024-10-03 20:36:16,331][02388] Decorrelating experience for 32 frames...
[2024-10-03 20:36:16,494][02386] Decorrelating experience for 64 frames...
[2024-10-03 20:36:16,620][02390] Decorrelating experience for 96 frames...
[2024-10-03 20:36:16,827][00259] Heartbeat connected on RolloutWorker_w4
[2024-10-03 20:36:17,507][02393] Decorrelating experience for 64 frames...
[2024-10-03 20:36:17,687][02386] Decorrelating experience for 96 frames...
[2024-10-03 20:36:17,730][02391] Decorrelating experience for 96 frames...
[2024-10-03 20:36:17,741][00259] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-03 20:36:17,844][00259] Heartbeat connected on RolloutWorker_w0
[2024-10-03 20:36:18,013][00259] Heartbeat connected on RolloutWorker_w5
[2024-10-03 20:36:18,103][02392] Decorrelating experience for 96 frames...
[2024-10-03 20:36:18,275][00259] Heartbeat connected on RolloutWorker_w7
[2024-10-03 20:36:18,416][02393] Decorrelating experience for 96 frames...
[2024-10-03 20:36:18,573][00259] Heartbeat connected on RolloutWorker_w6
[2024-10-03 20:36:19,853][02388] Decorrelating experience for 64 frames...
[2024-10-03 20:36:20,588][02389] Decorrelating experience for 96 frames...
[2024-10-03 20:36:21,015][00259] Heartbeat connected on RolloutWorker_w3
[2024-10-03 20:36:22,741][00259] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 121.8. Samples: 1218. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-03 20:36:22,746][00259] Avg episode reward: [(0, '2.398')]
[2024-10-03 20:36:22,855][02372] Signal inference workers to stop experience collection...
[2024-10-03 20:36:22,873][02385] InferenceWorker_p0-w0: stopping experience collection
[2024-10-03 20:36:22,935][02388] Decorrelating experience for 96 frames...
[2024-10-03 20:36:23,022][00259] Heartbeat connected on RolloutWorker_w1
[2024-10-03 20:36:25,333][02372] Signal inference workers to resume experience collection...
[2024-10-03 20:36:25,335][02385] InferenceWorker_p0-w0: resuming experience collection
[2024-10-03 20:36:27,741][00259] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 218.9. Samples: 3284. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-10-03 20:36:27,744][00259] Avg episode reward: [(0, '2.983')]
[2024-10-03 20:36:32,741][00259] Fps is (10 sec: 2457.6, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 371.4. Samples: 7428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:36:32,743][00259] Avg episode reward: [(0, '3.607')]
[2024-10-03 20:36:35,583][02385] Updated weights for policy 0, policy_version 10 (0.0202)
[2024-10-03 20:36:37,743][00259] Fps is (10 sec: 3685.4, 60 sec: 1965.9, 300 sec: 1965.9). Total num frames: 49152. Throughput: 0: 439.4. Samples: 10986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:36:37,746][00259] Avg episode reward: [(0, '4.219')]
[2024-10-03 20:36:42,741][00259] Fps is (10 sec: 4505.4, 60 sec: 2321.0, 300 sec: 2321.0). Total num frames: 69632. Throughput: 0: 570.9. Samples: 17126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:36:42,750][00259] Avg episode reward: [(0, '4.269')]
[2024-10-03 20:36:46,831][02385] Updated weights for policy 0, policy_version 20 (0.0031)
[2024-10-03 20:36:47,742][00259] Fps is (10 sec: 3277.2, 60 sec: 2340.5, 300 sec: 2340.5). Total num frames: 81920. Throughput: 0: 618.8. Samples: 21658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:36:47,747][00259] Avg episode reward: [(0, '4.344')]
[2024-10-03 20:36:52,741][00259] Fps is (10 sec: 3277.0, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 599.9. Samples: 23996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:36:52,746][00259] Avg episode reward: [(0, '4.489')]
[2024-10-03 20:36:52,754][02372] Saving new best policy, reward=4.489!
[2024-10-03 20:36:57,137][02385] Updated weights for policy 0, policy_version 30 (0.0040)
[2024-10-03 20:36:57,741][00259] Fps is (10 sec: 4096.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 684.1. Samples: 30786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:36:57,745][00259] Avg episode reward: [(0, '4.454')]
[2024-10-03 20:37:02,741][00259] Fps is (10 sec: 4095.9, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 812.8. Samples: 36578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:37:02,746][00259] Avg episode reward: [(0, '4.283')]
[2024-10-03 20:37:07,741][00259] Fps is (10 sec: 3276.8, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 833.6. Samples: 38728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:37:07,746][00259] Avg episode reward: [(0, '4.293')]
[2024-10-03 20:37:09,084][02385] Updated weights for policy 0, policy_version 40 (0.0046)
[2024-10-03 20:37:12,741][00259] Fps is (10 sec: 3686.5, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 919.1. Samples: 44644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:37:12,743][00259] Avg episode reward: [(0, '4.481')]
[2024-10-03 20:37:17,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 3087.8). Total num frames: 200704. Throughput: 0: 972.2. Samples: 51178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:37:17,747][00259] Avg episode reward: [(0, '4.493')]
[2024-10-03 20:37:17,753][02372] Saving new best policy, reward=4.493!
[2024-10-03 20:37:18,295][02385] Updated weights for policy 0, policy_version 50 (0.0039)
[2024-10-03 20:37:22,741][00259] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 929.0. Samples: 52790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:37:22,745][00259] Avg episode reward: [(0, '4.314')]
[2024-10-03 20:37:27,741][00259] Fps is (10 sec: 2048.0, 60 sec: 3481.6, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 857.3. Samples: 55704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:37:27,744][00259] Avg episode reward: [(0, '4.461')]
[2024-10-03 20:37:32,741][00259] Fps is (10 sec: 2867.4, 60 sec: 3618.1, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 881.4. Samples: 61320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:37:32,743][00259] Avg episode reward: [(0, '4.477')]
[2024-10-03 20:37:33,086][02385] Updated weights for policy 0, policy_version 60 (0.0041)
[2024-10-03 20:37:37,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3618.3, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 906.2. Samples: 64776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:37:37,750][00259] Avg episode reward: [(0, '4.574')]
[2024-10-03 20:37:37,753][02372] Saving new best policy, reward=4.574!
[2024-10-03 20:37:42,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3094.8). Total num frames: 278528. Throughput: 0: 874.4. Samples: 70136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:37:42,743][00259] Avg episode reward: [(0, '4.507')]
[2024-10-03 20:37:42,754][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth...
[2024-10-03 20:37:44,306][02385] Updated weights for policy 0, policy_version 70 (0.0043)
[2024-10-03 20:37:47,741][00259] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3104.3). Total num frames: 294912. Throughput: 0: 853.6. Samples: 74990. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 20:37:47,743][00259] Avg episode reward: [(0, '4.374')]
[2024-10-03 20:37:52,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 882.4. Samples: 78436. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 20:37:52,743][00259] Avg episode reward: [(0, '4.422')]
[2024-10-03 20:37:54,072][02385] Updated weights for policy 0, policy_version 80 (0.0040)
[2024-10-03 20:37:57,745][00259] Fps is (10 sec: 4503.5, 60 sec: 3617.9, 300 sec: 3237.7). Total num frames: 339968. Throughput: 0: 900.2. Samples: 85158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:37:57,756][00259] Avg episode reward: [(0, '4.449')]
[2024-10-03 20:38:02,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3202.3). Total num frames: 352256. Throughput: 0: 848.6. Samples: 89366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:38:02,743][00259] Avg episode reward: [(0, '4.452')]
[2024-10-03 20:38:05,795][02385] Updated weights for policy 0, policy_version 90 (0.0030)
[2024-10-03 20:38:07,742][00259] Fps is (10 sec: 3687.5, 60 sec: 3686.3, 300 sec: 3276.8). Total num frames: 376832. Throughput: 0: 883.9. Samples: 92568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:38:07,747][00259] Avg episode reward: [(0, '4.713')]
[2024-10-03 20:38:07,750][02372] Saving new best policy, reward=4.713!
[2024-10-03 20:38:12,741][00259] Fps is (10 sec: 4505.5, 60 sec: 3618.1, 300 sec: 3310.9). Total num frames: 397312. Throughput: 0: 966.1. Samples: 99178. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 20:38:12,746][00259] Avg episode reward: [(0, '4.727')]
[2024-10-03 20:38:12,757][02372] Saving new best policy, reward=4.727!
[2024-10-03 20:38:15,705][02385] Updated weights for policy 0, policy_version 100 (0.0027)
[2024-10-03 20:38:17,741][00259] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 949.2. Samples: 104036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 20:38:17,743][00259] Avg episode reward: [(0, '4.613')]
[2024-10-03 20:38:22,741][00259] Fps is (10 sec: 3276.9, 60 sec: 3618.2, 300 sec: 3308.3). Total num frames: 430080. Throughput: 0: 921.4. Samples: 106238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:38:22,747][00259] Avg episode reward: [(0, '4.703')]
[2024-10-03 20:38:26,640][02385] Updated weights for policy 0, policy_version 110 (0.0035)
[2024-10-03 20:38:27,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3367.8). Total num frames: 454656. Throughput: 0: 955.6. Samples: 113138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:38:27,748][00259] Avg episode reward: [(0, '4.710')]
[2024-10-03 20:38:32,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 980.1. Samples: 119094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:38:32,743][00259] Avg episode reward: [(0, '4.644')]
[2024-10-03 20:38:37,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3361.5). Total num frames: 487424. Throughput: 0: 949.3. Samples: 121156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:38:37,747][00259] Avg episode reward: [(0, '4.651')]
[2024-10-03 20:38:38,284][02385] Updated weights for policy 0, policy_version 120 (0.0039)
[2024-10-03 20:38:42,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3413.3). Total num frames: 512000. Throughput: 0: 937.3. Samples: 127334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:38:42,743][00259] Avg episode reward: [(0, '4.523')]
[2024-10-03 20:38:46,945][02385] Updated weights for policy 0, policy_version 130 (0.0041)
[2024-10-03 20:38:47,741][00259] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3435.3). Total num frames: 532480. Throughput: 0: 1004.3. Samples: 134558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 20:38:47,744][00259] Avg episode reward: [(0, '4.531')]
[2024-10-03 20:38:52,744][00259] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3430.3). Total num frames: 548864. Throughput: 0: 981.0. Samples: 136716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:38:52,747][00259] Avg episode reward: [(0, '4.569')]
[2024-10-03 20:38:57,741][00259] Fps is (10 sec: 3686.6, 60 sec: 3823.2, 300 sec: 3450.6). Total num frames: 569344. Throughput: 0: 946.9. Samples: 141790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:38:57,742][00259] Avg episode reward: [(0, '4.518')]
[2024-10-03 20:38:58,256][02385] Updated weights for policy 0, policy_version 140 (0.0038)
[2024-10-03 20:39:02,741][00259] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 3493.6). Total num frames: 593920. Throughput: 0: 997.6. Samples: 148926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:39:02,748][00259] Avg episode reward: [(0, '4.604')]
[2024-10-03 20:39:07,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3487.5). Total num frames: 610304. Throughput: 0: 1018.4. Samples: 152064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:39:07,745][00259] Avg episode reward: [(0, '4.475')]
[2024-10-03 20:39:08,562][02385] Updated weights for policy 0, policy_version 150 (0.0025)
[2024-10-03 20:39:12,741][00259] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3458.8). Total num frames: 622592. Throughput: 0: 956.0. Samples: 156156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:39:12,743][00259] Avg episode reward: [(0, '4.447')]
[2024-10-03 20:39:17,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3498.2). Total num frames: 647168. Throughput: 0: 967.1. Samples: 162614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:39:17,743][00259] Avg episode reward: [(0, '4.423')]
[2024-10-03 20:39:19,326][02385] Updated weights for policy 0, policy_version 160 (0.0017)
[2024-10-03 20:39:22,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3513.9). Total num frames: 667648. Throughput: 0: 994.2. Samples: 165894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:39:22,743][00259] Avg episode reward: [(0, '4.472')]
[2024-10-03 20:39:27,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3486.9). Total num frames: 679936. Throughput: 0: 945.1. Samples: 169864. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:39:27,743][00259] Avg episode reward: [(0, '4.379')]
[2024-10-03 20:39:32,741][00259] Fps is (10 sec: 2048.0, 60 sec: 3549.9, 300 sec: 3440.6). Total num frames: 688128. Throughput: 0: 857.1. Samples: 173128. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 20:39:32,747][00259] Avg episode reward: [(0, '4.333')]
[2024-10-03 20:39:34,241][02385] Updated weights for policy 0, policy_version 170 (0.0051)
[2024-10-03 20:39:37,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3476.6). Total num frames: 712704. Throughput: 0: 877.1. Samples: 176182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:39:37,745][00259] Avg episode reward: [(0, '4.435')]
[2024-10-03 20:39:42,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3491.4). Total num frames: 733184. Throughput: 0: 917.6. Samples: 183080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-10-03 20:39:42,743][00259] Avg episode reward: [(0, '4.574')]
[2024-10-03 20:39:42,754][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth...
[2024-10-03 20:39:43,358][02385] Updated weights for policy 0, policy_version 180 (0.0032)
[2024-10-03 20:39:47,741][00259] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3486.4). Total num frames: 749568. Throughput: 0: 871.9. Samples: 188164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:39:47,746][00259] Avg episode reward: [(0, '4.517')]
[2024-10-03 20:39:52,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3481.6). Total num frames: 765952. Throughput: 0: 846.4. Samples: 190154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:39:52,743][00259] Avg episode reward: [(0, '4.573')]
[2024-10-03 20:39:55,426][02385] Updated weights for policy 0, policy_version 190 (0.0026)
[2024-10-03 20:39:57,741][00259] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3495.3). Total num frames: 786432. Throughput: 0: 892.1. Samples: 196300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:39:57,743][00259] Avg episode reward: [(0, '4.685')]
[2024-10-03 20:40:02,743][00259] Fps is (10 sec: 3685.7, 60 sec: 3481.5, 300 sec: 3490.5). Total num frames: 802816. Throughput: 0: 878.8. Samples: 202160. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:40:02,747][00259] Avg episode reward: [(0, '4.739')]
[2024-10-03 20:40:02,762][02372] Saving new best policy, reward=4.739!
[2024-10-03 20:40:07,675][02385] Updated weights for policy 0, policy_version 200 (0.0036)
[2024-10-03 20:40:07,741][00259] Fps is (10 sec: 3276.6, 60 sec: 3481.6, 300 sec: 3485.9). Total num frames: 819200. Throughput: 0: 847.4. Samples: 204026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:40:07,744][00259] Avg episode reward: [(0, '4.609')]
[2024-10-03 20:40:12,741][00259] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3481.6). Total num frames: 835584. Throughput: 0: 869.9. Samples: 209010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:40:12,743][00259] Avg episode reward: [(0, '4.645')]
[2024-10-03 20:40:17,694][02385] Updated weights for policy 0, policy_version 210 (0.0032)
[2024-10-03 20:40:17,741][00259] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3510.9). Total num frames: 860160. Throughput: 0: 940.0. Samples: 215428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:40:17,743][00259] Avg episode reward: [(0, '4.574')]
[2024-10-03 20:40:22,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3489.8). Total num frames: 872448. Throughput: 0: 929.8. Samples: 218022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:40:22,748][00259] Avg episode reward: [(0, '4.428')]
[2024-10-03 20:40:27,741][00259] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.6). Total num frames: 888832. Throughput: 0: 861.5. Samples: 221848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:40:27,745][00259] Avg episode reward: [(0, '4.703')]
[2024-10-03 20:40:30,470][02385] Updated weights for policy 0, policy_version 220 (0.0023)
[2024-10-03 20:40:32,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3497.4). Total num frames: 909312. Throughput: 0: 888.5. Samples: 228146. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2024-10-03 20:40:32,746][00259] Avg episode reward: [(0, '4.971')]
[2024-10-03 20:40:32,758][02372] Saving new best policy, reward=4.971!
[2024-10-03 20:40:37,743][00259] Fps is (10 sec: 4095.2, 60 sec: 3618.0, 300 sec: 3508.6). Total num frames: 929792. Throughput: 0: 916.6. Samples: 231404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:40:37,745][00259] Avg episode reward: [(0, '5.048')]
[2024-10-03 20:40:37,752][02372] Saving new best policy, reward=5.048!
[2024-10-03 20:40:41,523][02385] Updated weights for policy 0, policy_version 230 (0.0039)
[2024-10-03 20:40:42,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3489.2). Total num frames: 942080. Throughput: 0: 882.0. Samples: 235990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:40:42,746][00259] Avg episode reward: [(0, '5.134')]
[2024-10-03 20:40:42,758][02372] Saving new best policy, reward=5.134!
[2024-10-03 20:40:47,742][00259] Fps is (10 sec: 2867.4, 60 sec: 3481.6, 300 sec: 3485.3). Total num frames: 958464. Throughput: 0: 863.5. Samples: 241016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:40:47,744][00259] Avg episode reward: [(0, '5.183')]
[2024-10-03 20:40:47,757][02372] Saving new best policy, reward=5.183!
[2024-10-03 20:40:52,358][02385] Updated weights for policy 0, policy_version 240 (0.0033)
[2024-10-03 20:40:52,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3510.9). Total num frames: 983040. Throughput: 0: 896.4. Samples: 244364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:40:52,744][00259] Avg episode reward: [(0, '5.076')]
[2024-10-03 20:40:57,741][00259] Fps is (10 sec: 4096.5, 60 sec: 3549.9, 300 sec: 3506.8). Total num frames: 999424. Throughput: 0: 925.3. Samples: 250650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-10-03 20:40:57,744][00259] Avg episode reward: [(0, '5.258')]
[2024-10-03 20:40:57,749][02372] Saving new best policy, reward=5.258!
[2024-10-03 20:41:02,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3502.8). Total num frames: 1015808. Throughput: 0: 871.6. Samples: 254650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:41:02,747][00259] Avg episode reward: [(0, '5.264')]
[2024-10-03 20:41:02,763][02372] Saving new best policy, reward=5.264!
[2024-10-03 20:41:05,986][02385] Updated weights for policy 0, policy_version 250 (0.0042)
[2024-10-03 20:41:07,741][00259] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1028096. Throughput: 0: 855.4. Samples: 256514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-10-03 20:41:07,747][00259] Avg episode reward: [(0, '5.401')]
[2024-10-03 20:41:07,750][02372] Saving new best policy, reward=5.401!
[2024-10-03 20:41:12,741][00259] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 887.0. Samples: 261762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:41:12,747][00259] Avg episode reward: [(0, '5.481')]
[2024-10-03 20:41:12,759][02372] Saving new best policy, reward=5.481!
[2024-10-03 20:41:17,741][00259] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 854.6. Samples: 266602. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-10-03 20:41:17,743][00259] Avg episode reward: [(0, '5.850')]
[2024-10-03 20:41:17,746][02372] Saving new best policy, reward=5.850!
[2024-10-03 20:41:18,273][02385] Updated weights for policy 0, policy_version 260 (0.0025)
[2024-10-03 20:41:22,741][00259] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 1081344. Throughput: 0: 828.6. Samples: 268688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:41:22,743][00259] Avg episode reward: [(0, '5.700')]
[2024-10-03 20:41:27,741][00259] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 1101824. Throughput: 0: 880.4. Samples: 275608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:41:27,746][00259] Avg episode reward: [(0, '5.352')]
[2024-10-03 20:41:27,887][02385] Updated weights for policy 0, policy_version 270 (0.0024)
[2024-10-03 20:41:32,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 1122304. Throughput: 0: 907.2. Samples: 281840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:41:32,745][00259] Avg episode reward: [(0, '5.492')]
[2024-10-03 20:41:37,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3623.9). Total num frames: 1138688. Throughput: 0: 880.5. Samples: 283988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:41:37,745][00259] Avg episode reward: [(0, '5.715')]
[2024-10-03 20:41:39,398][02385] Updated weights for policy 0, policy_version 280 (0.0034)
[2024-10-03 20:41:42,741][00259] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 1159168. Throughput: 0: 868.3. Samples: 289724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:41:42,743][00259] Avg episode reward: [(0, '5.523')]
[2024-10-03 20:41:42,752][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth...
[2024-10-03 20:41:42,871][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth
[2024-10-03 20:41:47,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3651.7). Total num frames: 1179648. Throughput: 0: 925.8. Samples: 296312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:41:47,751][00259] Avg episode reward: [(0, '5.643')]
[2024-10-03 20:41:49,028][02385] Updated weights for policy 0, policy_version 290 (0.0022)
[2024-10-03 20:41:52,743][00259] Fps is (10 sec: 3685.6, 60 sec: 3549.7, 300 sec: 3637.8). Total num frames: 1196032. Throughput: 0: 933.5. Samples: 298522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:41:52,748][00259] Avg episode reward: [(0, '5.599')]
[2024-10-03 20:41:57,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 1212416. Throughput: 0: 919.4. Samples: 303136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:41:57,743][00259] Avg episode reward: [(0, '5.804')]
[2024-10-03 20:42:00,476][02385] Updated weights for policy 0, policy_version 300 (0.0050)
[2024-10-03 20:42:02,741][00259] Fps is (10 sec: 4097.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1236992. Throughput: 0: 969.7. Samples: 310238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:42:02,743][00259] Avg episode reward: [(0, '6.300')]
[2024-10-03 20:42:02,752][02372] Saving new best policy, reward=6.300!
[2024-10-03 20:42:07,741][00259] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 1257472. Throughput: 0: 996.6. Samples: 313536. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:42:07,744][00259] Avg episode reward: [(0, '6.509')]
[2024-10-03 20:42:07,749][02372] Saving new best policy, reward=6.509!
[2024-10-03 20:42:12,333][02385] Updated weights for policy 0, policy_version 310 (0.0048)
[2024-10-03 20:42:12,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1269760. Throughput: 0: 931.7. Samples: 317534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:42:12,744][00259] Avg episode reward: [(0, '6.479')]
[2024-10-03 20:42:17,741][00259] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 1290240. Throughput: 0: 932.3. Samples: 323792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:42:17,743][00259] Avg episode reward: [(0, '7.169')]
[2024-10-03 20:42:17,751][02372] Saving new best policy, reward=7.169!
[2024-10-03 20:42:21,421][02385] Updated weights for policy 0, policy_version 320 (0.0021)
[2024-10-03 20:42:22,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1314816. Throughput: 0: 962.4. Samples: 327296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:42:22,745][00259] Avg episode reward: [(0, '7.551')]
[2024-10-03 20:42:22,755][02372] Saving new best policy, reward=7.551!
[2024-10-03 20:42:27,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1331200. Throughput: 0: 953.9. Samples: 332650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:42:27,747][00259] Avg episode reward: [(0, '7.276')]
[2024-10-03 20:42:32,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1347584. Throughput: 0: 924.3. Samples: 337904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:42:32,746][00259] Avg episode reward: [(0, '7.945')]
[2024-10-03 20:42:32,757][02372] Saving new best policy, reward=7.945!
[2024-10-03 20:42:33,206][02385] Updated weights for policy 0, policy_version 330 (0.0026)
[2024-10-03 20:42:37,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1372160. Throughput: 0: 953.9. Samples: 341446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:42:37,742][00259] Avg episode reward: [(0, '7.835')]
[2024-10-03 20:42:42,508][02385] Updated weights for policy 0, policy_version 340 (0.0019)
[2024-10-03 20:42:42,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3721.1). Total num frames: 1392640. Throughput: 0: 1000.4. Samples: 348154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:42:42,745][00259] Avg episode reward: [(0, '8.286')]
[2024-10-03 20:42:42,759][02372] Saving new best policy, reward=8.286!
[2024-10-03 20:42:47,743][00259] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3679.4). Total num frames: 1404928. Throughput: 0: 933.0. Samples: 352224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:42:47,749][00259] Avg episode reward: [(0, '7.955')]
[2024-10-03 20:42:52,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3679.5). Total num frames: 1425408. Throughput: 0: 927.4. Samples: 355268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:42:52,743][00259] Avg episode reward: [(0, '8.276')]
[2024-10-03 20:42:54,046][02385] Updated weights for policy 0, policy_version 350 (0.0037)
[2024-10-03 20:42:57,741][00259] Fps is (10 sec: 4506.6, 60 sec: 3959.4, 300 sec: 3721.1). Total num frames: 1449984. Throughput: 0: 987.9. Samples: 361990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:42:57,748][00259] Avg episode reward: [(0, '8.887')]
[2024-10-03 20:42:57,751][02372] Saving new best policy, reward=8.887!
[2024-10-03 20:43:02,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1462272. Throughput: 0: 955.7. Samples: 366800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:43:02,746][00259] Avg episode reward: [(0, '9.462')]
[2024-10-03 20:43:02,767][02372] Saving new best policy, reward=9.462!
[2024-10-03 20:43:06,180][02385] Updated weights for policy 0, policy_version 360 (0.0034)
[2024-10-03 20:43:07,741][00259] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1478656. Throughput: 0: 923.8. Samples: 368868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:43:07,747][00259] Avg episode reward: [(0, '9.813')]
[2024-10-03 20:43:07,750][02372] Saving new best policy, reward=9.813!
[2024-10-03 20:43:12,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 1503232. Throughput: 0: 951.2. Samples: 375456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-10-03 20:43:12,747][00259] Avg episode reward: [(0, '9.021')]
[2024-10-03 20:43:15,369][02385] Updated weights for policy 0, policy_version 370 (0.0017)
[2024-10-03 20:43:17,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1519616. Throughput: 0: 966.2. Samples: 381382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:43:17,743][00259] Avg episode reward: [(0, '8.826')]
[2024-10-03 20:43:22,744][00259] Fps is (10 sec: 3275.6, 60 sec: 3686.2, 300 sec: 3665.5). Total num frames: 1536000. Throughput: 0: 932.2. Samples: 383400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:43:22,747][00259] Avg episode reward: [(0, '8.735')]
[2024-10-03 20:43:27,343][02385] Updated weights for policy 0, policy_version 380 (0.0057)
[2024-10-03 20:43:27,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1556480. Throughput: 0: 902.7. Samples: 388776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:43:27,748][00259] Avg episode reward: [(0, '9.132')]
[2024-10-03 20:43:32,741][00259] Fps is (10 sec: 4097.4, 60 sec: 3822.9, 300 sec: 3693.3). Total num frames: 1576960. Throughput: 0: 966.3. Samples: 395704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:43:32,743][00259] Avg episode reward: [(0, '8.647')]
[2024-10-03 20:43:37,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1593344. Throughput: 0: 956.1. Samples: 398294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:43:37,743][00259] Avg episode reward: [(0, '8.121')]
[2024-10-03 20:43:37,983][02385] Updated weights for policy 0, policy_version 390 (0.0050)
[2024-10-03 20:43:42,741][00259] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1613824. Throughput: 0: 910.1. Samples: 402942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-10-03 20:43:42,744][00259] Avg episode reward: [(0, '8.496')]
[2024-10-03 20:43:42,756][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000394_1613824.pth...
[2024-10-03 20:43:42,881][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth
[2024-10-03 20:43:47,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3679.5). Total num frames: 1634304. Throughput: 0: 954.8. Samples: 409766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:43:47,748][00259] Avg episode reward: [(0, '8.762')]
[2024-10-03 20:43:47,982][02385] Updated weights for policy 0, policy_version 400 (0.0020)
[2024-10-03 20:43:52,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 1654784. Throughput: 0: 985.5. Samples: 413216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:43:52,743][00259] Avg episode reward: [(0, '9.716')]
[2024-10-03 20:43:57,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 1667072. Throughput: 0: 929.2. Samples: 417268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:43:57,744][00259] Avg episode reward: [(0, '9.807')]
[2024-10-03 20:44:01,654][02385] Updated weights for policy 0, policy_version 410 (0.0036)
[2024-10-03 20:44:02,741][00259] Fps is (10 sec: 2457.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1679360. Throughput: 0: 881.3. Samples: 421042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:44:02,745][00259] Avg episode reward: [(0, '9.999')]
[2024-10-03 20:44:02,756][02372] Saving new best policy, reward=9.999!
[2024-10-03 20:44:07,743][00259] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3665.5). Total num frames: 1703936. Throughput: 0: 910.0. Samples: 424348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:44:07,752][00259] Avg episode reward: [(0, '11.114')]
[2024-10-03 20:44:07,755][02372] Saving new best policy, reward=11.114!
[2024-10-03 20:44:11,053][02385] Updated weights for policy 0, policy_version 420 (0.0024)
[2024-10-03 20:44:12,741][00259] Fps is (10 sec: 4505.9, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 1724416. Throughput: 0: 940.9. Samples: 431116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:44:12,743][00259] Avg episode reward: [(0, '11.692')]
[2024-10-03 20:44:12,752][02372] Saving new best policy, reward=11.692!
[2024-10-03 20:44:17,746][00259] Fps is (10 sec: 3275.8, 60 sec: 3617.8, 300 sec: 3623.8). Total num frames: 1736704. Throughput: 0: 880.7. Samples: 435338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:44:17,749][00259] Avg episode reward: [(0, '12.553')]
[2024-10-03 20:44:17,751][02372] Saving new best policy, reward=12.553!
[2024-10-03 20:44:22,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3651.7). Total num frames: 1757184. Throughput: 0: 887.6. Samples: 438236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:44:22,746][00259] Avg episode reward: [(0, '12.833')]
[2024-10-03 20:44:22,842][02372] Saving new best policy, reward=12.833!
[2024-10-03 20:44:22,841][02385] Updated weights for policy 0, policy_version 430 (0.0029)
[2024-10-03 20:44:27,741][00259] Fps is (10 sec: 4508.2, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 1781760. Throughput: 0: 937.4. Samples: 445126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:44:27,743][00259] Avg episode reward: [(0, '11.734')]
[2024-10-03 20:44:32,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1798144. Throughput: 0: 904.0. Samples: 450446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:44:32,743][00259] Avg episode reward: [(0, '11.635')]
[2024-10-03 20:44:33,474][02385] Updated weights for policy 0, policy_version 440 (0.0039)
[2024-10-03 20:44:37,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1814528. Throughput: 0: 876.7. Samples: 452668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:44:37,743][00259] Avg episode reward: [(0, '11.406')]
[2024-10-03 20:44:42,743][00259] Fps is (10 sec: 4094.9, 60 sec: 3754.5, 300 sec: 3693.3). Total num frames: 1839104. Throughput: 0: 937.7. Samples: 459468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:44:42,750][00259] Avg episode reward: [(0, '11.759')]
[2024-10-03 20:44:44,111][02385] Updated weights for policy 0, policy_version 450 (0.0046)
[2024-10-03 20:44:47,741][00259] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1851392. Throughput: 0: 947.1. Samples: 463662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:44:47,744][00259] Avg episode reward: [(0, '12.506')]
[2024-10-03 20:44:52,741][00259] Fps is (10 sec: 2458.3, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 1863680. Throughput: 0: 913.3. Samples: 465442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:44:52,743][00259] Avg episode reward: [(0, '13.054')]
[2024-10-03 20:44:52,760][02372] Saving new best policy, reward=13.054!
[2024-10-03 20:44:57,741][00259] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 1880064. Throughput: 0: 866.6. Samples: 470114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:44:57,742][00259] Avg episode reward: [(0, '13.602')]
[2024-10-03 20:44:57,805][02385] Updated weights for policy 0, policy_version 460 (0.0024)
[2024-10-03 20:44:57,802][02372] Saving new best policy, reward=13.602!
[2024-10-03 20:45:02,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1904640. Throughput: 0: 930.0. Samples: 477182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:45:02,743][00259] Avg episode reward: [(0, '13.854')]
[2024-10-03 20:45:02,752][02372] Saving new best policy, reward=13.854!
[2024-10-03 20:45:07,039][02385] Updated weights for policy 0, policy_version 470 (0.0033)
[2024-10-03 20:45:07,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3686.6, 300 sec: 3693.3). Total num frames: 1925120. Throughput: 0: 943.2. Samples: 480680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:45:07,747][00259] Avg episode reward: [(0, '12.397')]
[2024-10-03 20:45:12,744][00259] Fps is (10 sec: 3685.1, 60 sec: 3617.9, 300 sec: 3665.5). Total num frames: 1941504. Throughput: 0: 887.5. Samples: 485068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:45:12,747][00259] Avg episode reward: [(0, '12.868')]
[2024-10-03 20:45:17,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3755.0, 300 sec: 3693.3). Total num frames: 1961984. Throughput: 0: 912.9. Samples: 491526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:45:17,743][00259] Avg episode reward: [(0, '12.971')]
[2024-10-03 20:45:18,028][02385] Updated weights for policy 0, policy_version 480 (0.0033)
[2024-10-03 20:45:22,741][00259] Fps is (10 sec: 4507.2, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1986560. Throughput: 0: 943.0. Samples: 495102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:45:22,747][00259] Avg episode reward: [(0, '13.402')]
[2024-10-03 20:45:27,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2002944. Throughput: 0: 912.1. Samples: 500510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:45:27,748][00259] Avg episode reward: [(0, '14.704')]
[2024-10-03 20:45:27,755][02372] Saving new best policy, reward=14.704!
[2024-10-03 20:45:28,919][02385] Updated weights for policy 0, policy_version 490 (0.0030)
[2024-10-03 20:45:32,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 2019328. Throughput: 0: 945.0. Samples: 506188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:45:32,743][00259] Avg episode reward: [(0, '13.318')]
[2024-10-03 20:45:37,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2043904. Throughput: 0: 986.4. Samples: 509828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:45:37,743][00259] Avg episode reward: [(0, '12.248')]
[2024-10-03 20:45:37,775][02385] Updated weights for policy 0, policy_version 500 (0.0025)
[2024-10-03 20:45:42,743][00259] Fps is (10 sec: 4504.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2064384. Throughput: 0: 1028.3. Samples: 516392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:45:42,746][00259] Avg episode reward: [(0, '12.097')]
[2024-10-03 20:45:42,760][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000504_2064384.pth...
[2024-10-03 20:45:42,931][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth
[2024-10-03 20:45:47,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 2080768. Throughput: 0: 968.4. Samples: 520762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:45:47,748][00259] Avg episode reward: [(0, '13.026')]
[2024-10-03 20:45:49,160][02385] Updated weights for policy 0, policy_version 510 (0.0025)
[2024-10-03 20:45:52,741][00259] Fps is (10 sec: 4097.1, 60 sec: 4027.7, 300 sec: 3748.9). Total num frames: 2105344. Throughput: 0: 971.6. Samples: 524404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:45:52,747][00259] Avg episode reward: [(0, '14.032')]
[2024-10-03 20:45:57,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3762.8). Total num frames: 2125824. Throughput: 0: 1038.0. Samples: 531772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:45:57,743][00259] Avg episode reward: [(0, '14.200')]
[2024-10-03 20:45:57,755][02385] Updated weights for policy 0, policy_version 520 (0.0021)
[2024-10-03 20:46:02,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2142208. Throughput: 0: 1001.2. Samples: 536580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:46:02,743][00259] Avg episode reward: [(0, '14.970')]
[2024-10-03 20:46:02,753][02372] Saving new best policy, reward=14.970!
[2024-10-03 20:46:07,741][00259] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2162688. Throughput: 0: 982.0. Samples: 539292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:46:07,743][00259] Avg episode reward: [(0, '15.148')]
[2024-10-03 20:46:07,749][02372] Saving new best policy, reward=15.148!
[2024-10-03 20:46:08,812][02385] Updated weights for policy 0, policy_version 530 (0.0019)
[2024-10-03 20:46:12,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 3818.3). Total num frames: 2187264. Throughput: 0: 1023.0. Samples: 546544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:46:12,743][00259] Avg episode reward: [(0, '16.196')]
[2024-10-03 20:46:12,753][02372] Saving new best policy, reward=16.196!
[2024-10-03 20:46:17,743][00259] Fps is (10 sec: 4095.2, 60 sec: 4027.6, 300 sec: 3804.4). Total num frames: 2203648. Throughput: 0: 1025.9. Samples: 552354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:46:17,748][00259] Avg episode reward: [(0, '17.549')]
[2024-10-03 20:46:17,751][02372] Saving new best policy, reward=17.549!
[2024-10-03 20:46:19,391][02385] Updated weights for policy 0, policy_version 540 (0.0026)
[2024-10-03 20:46:22,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2220032. Throughput: 0: 992.8. Samples: 554504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:46:22,743][00259] Avg episode reward: [(0, '18.593')]
[2024-10-03 20:46:22,837][02372] Saving new best policy, reward=18.593!
[2024-10-03 20:46:27,741][00259] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3804.4). Total num frames: 2244608. Throughput: 0: 992.7. Samples: 561060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:46:27,747][00259] Avg episode reward: [(0, '19.135')]
[2024-10-03 20:46:27,752][02372] Saving new best policy, reward=19.135!
[2024-10-03 20:46:28,970][02385] Updated weights for policy 0, policy_version 550 (0.0032)
[2024-10-03 20:46:32,741][00259] Fps is (10 sec: 4915.0, 60 sec: 4164.2, 300 sec: 3832.2). Total num frames: 2269184. Throughput: 0: 1048.8. Samples: 567958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:46:32,747][00259] Avg episode reward: [(0, '18.281')]
[2024-10-03 20:46:37,741][00259] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 3804.4). Total num frames: 2281472. Throughput: 0: 1016.1. Samples: 570130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:46:37,747][00259] Avg episode reward: [(0, '18.524')]
[2024-10-03 20:46:40,021][02385] Updated weights for policy 0, policy_version 560 (0.0032)
[2024-10-03 20:46:42,741][00259] Fps is (10 sec: 3686.6, 60 sec: 4027.9, 300 sec: 3818.3). Total num frames: 2306048. Throughput: 0: 980.8. Samples: 575910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:46:42,743][00259] Avg episode reward: [(0, '17.212')]
[2024-10-03 20:46:47,741][00259] Fps is (10 sec: 4915.5, 60 sec: 4164.3, 300 sec: 3846.1). Total num frames: 2330624. Throughput: 0: 1034.3. Samples: 583122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:46:47,743][00259] Avg episode reward: [(0, '15.915')]
[2024-10-03 20:46:48,380][02385] Updated weights for policy 0, policy_version 570 (0.0037)
[2024-10-03 20:46:52,741][00259] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2347008. Throughput: 0: 1037.2. Samples: 585966. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-03 20:46:52,747][00259] Avg episode reward: [(0, '16.730')]
[2024-10-03 20:46:57,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2363392. Throughput: 0: 977.6. Samples: 590538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:46:57,749][00259] Avg episode reward: [(0, '16.765')]
[2024-10-03 20:46:59,889][02385] Updated weights for policy 0, policy_version 580 (0.0038)
[2024-10-03 20:47:02,741][00259] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3832.2). Total num frames: 2387968. Throughput: 0: 1011.6. Samples: 597874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:47:02,748][00259] Avg episode reward: [(0, '17.224')]
[2024-10-03 20:47:07,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 2408448. Throughput: 0: 1044.3. Samples: 601498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:47:07,745][00259] Avg episode reward: [(0, '16.715')]
[2024-10-03 20:47:09,315][02385] Updated weights for policy 0, policy_version 590 (0.0023)
[2024-10-03 20:47:12,743][00259] Fps is (10 sec: 3685.4, 60 sec: 3959.3, 300 sec: 3846.0). Total num frames: 2424832. Throughput: 0: 1005.1. Samples: 606292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:47:12,746][00259] Avg episode reward: [(0, '15.965')]
[2024-10-03 20:47:17,741][00259] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3846.1). Total num frames: 2449408. Throughput: 0: 995.9. Samples: 612774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:47:17,746][00259] Avg episode reward: [(0, '15.819')]
[2024-10-03 20:47:19,369][02385] Updated weights for policy 0, policy_version 600 (0.0024)
[2024-10-03 20:47:22,741][00259] Fps is (10 sec: 4506.8, 60 sec: 4164.3, 300 sec: 3860.0). Total num frames: 2469888. Throughput: 0: 1026.6. Samples: 616328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:47:22,747][00259] Avg episode reward: [(0, '15.580')]
[2024-10-03 20:47:27,741][00259] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2486272. Throughput: 0: 1027.3. Samples: 622140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:47:27,747][00259] Avg episode reward: [(0, '15.623')]
[2024-10-03 20:47:30,521][02385] Updated weights for policy 0, policy_version 610 (0.0033)
[2024-10-03 20:47:32,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2506752. Throughput: 0: 988.5. Samples: 627606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:47:32,743][00259] Avg episode reward: [(0, '15.879')]
[2024-10-03 20:47:37,741][00259] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 3860.0). Total num frames: 2531328. Throughput: 0: 1005.7. Samples: 631222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:47:37,744][00259] Avg episode reward: [(0, '16.184')]
[2024-10-03 20:47:39,044][02385] Updated weights for policy 0, policy_version 620 (0.0029)
[2024-10-03 20:47:42,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.8). Total num frames: 2551808. Throughput: 0: 1055.0. Samples: 638014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:47:42,743][00259] Avg episode reward: [(0, '17.439')]
[2024-10-03 20:47:42,753][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000623_2551808.pth...
[2024-10-03 20:47:42,963][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000394_1613824.pth
[2024-10-03 20:47:47,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2564096. Throughput: 0: 989.7. Samples: 642412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:47:47,747][00259] Avg episode reward: [(0, '17.725')]
[2024-10-03 20:47:50,462][02385] Updated weights for policy 0, policy_version 630 (0.0029)
[2024-10-03 20:47:52,741][00259] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3860.0). Total num frames: 2588672. Throughput: 0: 984.0. Samples: 645778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:47:52,746][00259] Avg episode reward: [(0, '18.744')]
[2024-10-03 20:47:57,741][00259] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3901.6). Total num frames: 2613248. Throughput: 0: 1040.0. Samples: 653090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:47:57,743][00259] Avg episode reward: [(0, '19.810')]
[2024-10-03 20:47:57,749][02372] Saving new best policy, reward=19.810!
[2024-10-03 20:47:59,527][02385] Updated weights for policy 0, policy_version 640 (0.0035)
[2024-10-03 20:48:02,741][00259] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2629632. Throughput: 0: 1005.2. Samples: 658008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:48:02,746][00259] Avg episode reward: [(0, '21.157')]
[2024-10-03 20:48:02,765][02372] Saving new best policy, reward=21.157!
[2024-10-03 20:48:07,741][00259] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2650112. Throughput: 0: 981.9. Samples: 660512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-03 20:48:07,742][00259] Avg episode reward: [(0, '20.098')]
[2024-10-03 20:48:10,245][02385] Updated weights for policy 0, policy_version 650 (0.0033)
[2024-10-03 20:48:12,741][00259] Fps is (10 sec: 4095.9, 60 sec: 4096.2, 300 sec: 3901.6). Total num frames: 2670592. Throughput: 0: 1015.2. Samples: 667822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:48:12,745][00259] Avg episode reward: [(0, '20.820')]
[2024-10-03 20:48:17,741][00259] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2691072. Throughput: 0: 1026.0. Samples: 673776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:48:17,745][00259] Avg episode reward: [(0, '20.247')]
[2024-10-03 20:48:21,305][02385] Updated weights for policy 0, policy_version 660 (0.0027)
[2024-10-03 20:48:22,741][00259] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2707456. Throughput: 0: 993.3. Samples: 675922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:48:22,743][00259] Avg episode reward: [(0, '19.093')]
[2024-10-03 20:48:27,743][00259] Fps is (10 sec: 3275.9, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 2723840. Throughput: 0: 960.7. Samples: 681250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:48:27,748][00259] Avg episode reward: [(0, '17.719')]
[2024-10-03 20:48:32,741][00259] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2736128. Throughput: 0: 946.0. Samples: 684980. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-10-03 20:48:32,746][00259] Avg episode reward: [(0, '17.626')]
[2024-10-03 20:48:35,168][02385] Updated weights for policy 0, policy_version 670 (0.0046)
[2024-10-03 20:48:37,741][00259] Fps is (10 sec: 2458.2, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 2748416. Throughput: 0: 909.8. Samples: 686718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-10-03 20:48:37,747][00259] Avg episode reward: [(0, '17.006')]
[2024-10-03 20:48:42,741][00259] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3832.2). Total num frames: 2764800. Throughput: 0: 836.1. Samples: 690714. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:48:42,743][00259] Avg episode reward: [(0, '16.874')]
[2024-10-03 20:48:46,765][02385] Updated weights for policy 0, policy_version 680 (0.0050)
[2024-10-03 20:48:47,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2789376. Throughput: 0: 883.6. Samples: 697770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:48:47,751][00259] Avg episode reward: [(0, '17.966')]
[2024-10-03 20:48:52,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 2809856. Throughput: 0: 905.3. Samples: 701252. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-03 20:48:52,746][00259] Avg episode reward: [(0, '18.221')]
[2024-10-03 20:48:57,441][02385] Updated weights for policy 0, policy_version 690 (0.0037)
[2024-10-03 20:48:57,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3887.7). Total num frames: 2826240. Throughput: 0: 856.8. Samples: 706378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:48:57,747][00259] Avg episode reward: [(0, '18.927')]
[2024-10-03 20:49:02,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3873.9). Total num frames: 2846720. Throughput: 0: 861.1. Samples: 712524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:49:02,743][00259] Avg episode reward: [(0, '18.118')]
[2024-10-03 20:49:06,577][02385] Updated weights for policy 0, policy_version 700 (0.0013)
[2024-10-03 20:49:07,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 2871296. Throughput: 0: 894.6. Samples: 716178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:49:07,743][00259] Avg episode reward: [(0, '20.028')]
[2024-10-03 20:49:12,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3901.7). Total num frames: 2887680. Throughput: 0: 915.0. Samples: 722422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:49:12,746][00259] Avg episode reward: [(0, '19.023')]
[2024-10-03 20:49:17,659][02385] Updated weights for policy 0, policy_version 710 (0.0044)
[2024-10-03 20:49:17,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3901.6). Total num frames: 2908160. Throughput: 0: 944.8. Samples: 727498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:49:17,748][00259] Avg episode reward: [(0, '19.611')]
[2024-10-03 20:49:22,741][00259] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 2928640. Throughput: 0: 985.8. Samples: 731080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:49:22,744][00259] Avg episode reward: [(0, '20.720')]
[2024-10-03 20:49:26,277][02385] Updated weights for policy 0, policy_version 720 (0.0048)
[2024-10-03 20:49:27,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3823.1, 300 sec: 3915.5). Total num frames: 2953216. Throughput: 0: 1052.3. Samples: 738068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:49:27,745][00259] Avg episode reward: [(0, '20.115')]
[2024-10-03 20:49:32,742][00259] Fps is (10 sec: 3686.0, 60 sec: 3822.8, 300 sec: 3901.6). Total num frames: 2965504. Throughput: 0: 996.6. Samples: 742620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:49:32,747][00259] Avg episode reward: [(0, '19.654')]
[2024-10-03 20:49:37,432][02385] Updated weights for policy 0, policy_version 730 (0.0020)
[2024-10-03 20:49:37,741][00259] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3901.7). Total num frames: 2990080. Throughput: 0: 991.0. Samples: 745848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:49:37,743][00259] Avg episode reward: [(0, '18.735')]
[2024-10-03 20:49:42,741][00259] Fps is (10 sec: 4915.8, 60 sec: 4164.3, 300 sec: 3943.3). Total num frames: 3014656. Throughput: 0: 1040.0. Samples: 753178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:49:42,747][00259] Avg episode reward: [(0, '18.832')]
[2024-10-03 20:49:42,756][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000736_3014656.pth...
[2024-10-03 20:49:42,886][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000504_2064384.pth
[2024-10-03 20:49:47,485][02385] Updated weights for policy 0, policy_version 740 (0.0034)
[2024-10-03 20:49:47,741][00259] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3031040. Throughput: 0: 1020.2. Samples: 758434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:49:47,747][00259] Avg episode reward: [(0, '19.160')]
[2024-10-03 20:49:52,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3047424. Throughput: 0: 989.6. Samples: 760710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:49:52,743][00259] Avg episode reward: [(0, '19.283')]
[2024-10-03 20:49:57,309][02385] Updated weights for policy 0, policy_version 750 (0.0045)
[2024-10-03 20:49:57,741][00259] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3072000. Throughput: 0: 1007.0. Samples: 767736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:49:57,746][00259] Avg episode reward: [(0, '21.016')]
[2024-10-03 20:50:02,743][00259] Fps is (10 sec: 4504.7, 60 sec: 4095.9, 300 sec: 3957.1). Total num frames: 3092480. Throughput: 0: 1035.1. Samples: 774078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:50:02,748][00259] Avg episode reward: [(0, '22.047')]
[2024-10-03 20:50:02,756][02372] Saving new best policy, reward=22.047!
[2024-10-03 20:50:07,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3108864. Throughput: 0: 1002.6. Samples: 776198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:50:07,743][00259] Avg episode reward: [(0, '22.768')]
[2024-10-03 20:50:07,750][02372] Saving new best policy, reward=22.768!
[2024-10-03 20:50:08,704][02385] Updated weights for policy 0, policy_version 760 (0.0038)
[2024-10-03 20:50:12,741][00259] Fps is (10 sec: 3687.2, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3129344. Throughput: 0: 983.9. Samples: 782344. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:12,744][00259] Avg episode reward: [(0, '22.995')]
[2024-10-03 20:50:12,754][02372] Saving new best policy, reward=22.995!
[2024-10-03 20:50:17,161][02385] Updated weights for policy 0, policy_version 770 (0.0027)
[2024-10-03 20:50:17,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3153920. Throughput: 0: 1043.1. Samples: 789558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:17,746][00259] Avg episode reward: [(0, '22.152')]
[2024-10-03 20:50:22,741][00259] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3170304. Throughput: 0: 1026.1. Samples: 792024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:22,747][00259] Avg episode reward: [(0, '20.975')]
[2024-10-03 20:50:27,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3190784. Throughput: 0: 974.0. Samples: 797006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:50:27,745][00259] Avg episode reward: [(0, '20.770')]
[2024-10-03 20:50:28,511][02385] Updated weights for policy 0, policy_version 780 (0.0028)
[2024-10-03 20:50:32,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 3971.0). Total num frames: 3215360. Throughput: 0: 1019.7. Samples: 804320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:32,749][00259] Avg episode reward: [(0, '21.511')]
[2024-10-03 20:50:37,625][02385] Updated weights for policy 0, policy_version 790 (0.0031)
[2024-10-03 20:50:37,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.1). Total num frames: 3235840. Throughput: 0: 1048.9. Samples: 807912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:37,743][00259] Avg episode reward: [(0, '21.511')]
[2024-10-03 20:50:42,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3248128. Throughput: 0: 991.5. Samples: 812352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:42,744][00259] Avg episode reward: [(0, '21.532')]
[2024-10-03 20:50:47,741][00259] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 3272704. Throughput: 0: 1003.6. Samples: 819236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:47,743][00259] Avg episode reward: [(0, '21.797')]
[2024-10-03 20:50:48,020][02385] Updated weights for policy 0, policy_version 800 (0.0024)
[2024-10-03 20:50:52,741][00259] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 3297280. Throughput: 0: 1037.0. Samples: 822862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:52,745][00259] Avg episode reward: [(0, '23.622')]
[2024-10-03 20:50:52,758][02372] Saving new best policy, reward=23.622!
[2024-10-03 20:50:57,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3309568. Throughput: 0: 1015.3. Samples: 828034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:50:57,742][00259] Avg episode reward: [(0, '23.010')]
[2024-10-03 20:50:59,127][02385] Updated weights for policy 0, policy_version 810 (0.0025)
[2024-10-03 20:51:02,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 3957.2). Total num frames: 3330048. Throughput: 0: 985.6. Samples: 833910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:51:02,745][00259] Avg episode reward: [(0, '21.979')]
[2024-10-03 20:51:07,698][02385] Updated weights for policy 0, policy_version 820 (0.0018)
[2024-10-03 20:51:07,741][00259] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 3358720. Throughput: 0: 1012.7. Samples: 837594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:51:07,746][00259] Avg episode reward: [(0, '22.496')]
[2024-10-03 20:51:12,742][00259] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 3375104. Throughput: 0: 1047.3. Samples: 844136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:51:12,748][00259] Avg episode reward: [(0, '23.351')]
[2024-10-03 20:51:17,741][00259] Fps is (10 sec: 3276.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3391488. Throughput: 0: 991.1. Samples: 848920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:51:17,749][00259] Avg episode reward: [(0, '22.656')]
[2024-10-03 20:51:18,895][02385] Updated weights for policy 0, policy_version 830 (0.0036)
[2024-10-03 20:51:22,742][00259] Fps is (10 sec: 4095.9, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 3416064. Throughput: 0: 992.0. Samples: 852554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:51:22,745][00259] Avg episode reward: [(0, '22.798')]
[2024-10-03 20:51:27,741][00259] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3436544. Throughput: 0: 1052.3. Samples: 859704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:51:27,745][00259] Avg episode reward: [(0, '23.810')]
[2024-10-03 20:51:27,764][02372] Saving new best policy, reward=23.810!
[2024-10-03 20:51:27,769][02385] Updated weights for policy 0, policy_version 840 (0.0027)
[2024-10-03 20:51:32,741][00259] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3452928. Throughput: 0: 997.5. Samples: 864122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:51:32,743][00259] Avg episode reward: [(0, '23.845')]
[2024-10-03 20:51:32,754][02372] Saving new best policy, reward=23.845!
[2024-10-03 20:51:37,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3473408. Throughput: 0: 981.8. Samples: 867042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:51:37,743][00259] Avg episode reward: [(0, '23.846')]
[2024-10-03 20:51:37,746][02372] Saving new best policy, reward=23.846!
[2024-10-03 20:51:38,853][02385] Updated weights for policy 0, policy_version 850 (0.0028)
[2024-10-03 20:51:42,741][00259] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 3957.1). Total num frames: 3497984. Throughput: 0: 1027.9. Samples: 874288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:51:42,743][00259] Avg episode reward: [(0, '23.481')]
[2024-10-03 20:51:42,758][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth...
[2024-10-03 20:51:42,895][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000623_2551808.pth
[2024-10-03 20:51:47,744][00259] Fps is (10 sec: 4094.5, 60 sec: 4027.5, 300 sec: 3957.1). Total num frames: 3514368. Throughput: 0: 1019.3. Samples: 879780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:51:47,747][00259] Avg episode reward: [(0, '22.549')]
[2024-10-03 20:51:49,568][02385] Updated weights for policy 0, policy_version 860 (0.0025)
[2024-10-03 20:51:52,741][00259] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3534848. Throughput: 0: 987.2. Samples: 882016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:51:52,747][00259] Avg episode reward: [(0, '22.218')]
[2024-10-03 20:51:57,741][00259] Fps is (10 sec: 4507.3, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 3559424. Throughput: 0: 996.4. Samples: 888974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:51:57,742][00259] Avg episode reward: [(0, '22.063')]
[2024-10-03 20:51:58,540][02385] Updated weights for policy 0, policy_version 870 (0.0018)
[2024-10-03 20:52:02,741][00259] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 3579904. Throughput: 0: 1036.0. Samples: 895540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:52:02,747][00259] Avg episode reward: [(0, '21.288')]
[2024-10-03 20:52:07,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3592192. Throughput: 0: 1001.4. Samples: 897614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:52:07,745][00259] Avg episode reward: [(0, '21.577')]
[2024-10-03 20:52:11,022][02385] Updated weights for policy 0, policy_version 880 (0.0042)
[2024-10-03 20:52:12,741][00259] Fps is (10 sec: 2867.2, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 3608576. Throughput: 0: 941.5. Samples: 902072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:52:12,743][00259] Avg episode reward: [(0, '22.049')]
[2024-10-03 20:52:17,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3629056. Throughput: 0: 970.4. Samples: 907792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:52:17,746][00259] Avg episode reward: [(0, '22.538')]
[2024-10-03 20:52:22,599][02385] Updated weights for policy 0, policy_version 890 (0.0041)
[2024-10-03 20:52:22,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3929.4). Total num frames: 3645440. Throughput: 0: 963.3. Samples: 910390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:52:22,746][00259] Avg episode reward: [(0, '22.041')]
[2024-10-03 20:52:27,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 3661824. Throughput: 0: 906.8. Samples: 915096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:52:27,748][00259] Avg episode reward: [(0, '21.779')]
[2024-10-03 20:52:32,441][02385] Updated weights for policy 0, policy_version 900 (0.0025)
[2024-10-03 20:52:32,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3686400. Throughput: 0: 947.5. Samples: 922412. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:52:32,746][00259] Avg episode reward: [(0, '23.290')]
[2024-10-03 20:52:37,741][00259] Fps is (10 sec: 4505.3, 60 sec: 3891.1, 300 sec: 3915.5). Total num frames: 3706880. Throughput: 0: 979.4. Samples: 926092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:52:37,743][00259] Avg episode reward: [(0, '23.419')]
[2024-10-03 20:52:42,741][00259] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3929.4). Total num frames: 3723264. Throughput: 0: 924.9. Samples: 930594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-03 20:52:42,744][00259] Avg episode reward: [(0, '23.787')]
[2024-10-03 20:52:43,618][02385] Updated weights for policy 0, policy_version 910 (0.0027)
[2024-10-03 20:52:47,741][00259] Fps is (10 sec: 3686.7, 60 sec: 3823.2, 300 sec: 3915.5). Total num frames: 3743744. Throughput: 0: 924.9. Samples: 937160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:52:47,744][00259] Avg episode reward: [(0, '24.678')]
[2024-10-03 20:52:47,748][02372] Saving new best policy, reward=24.678!
[2024-10-03 20:52:52,097][02385] Updated weights for policy 0, policy_version 920 (0.0032)
[2024-10-03 20:52:52,741][00259] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3768320. Throughput: 0: 958.8. Samples: 940758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:52:52,747][00259] Avg episode reward: [(0, '24.203')]
[2024-10-03 20:52:57,744][00259] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3901.6). Total num frames: 3780608. Throughput: 0: 974.2. Samples: 945914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:52:57,752][00259] Avg episode reward: [(0, '22.718')]
[2024-10-03 20:53:02,741][00259] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3873.8). Total num frames: 3792896. Throughput: 0: 927.1. Samples: 949510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:53:02,747][00259] Avg episode reward: [(0, '22.883')]
[2024-10-03 20:53:06,187][02385] Updated weights for policy 0, policy_version 930 (0.0031)
[2024-10-03 20:53:07,741][00259] Fps is (10 sec: 3278.0, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 3813376. Throughput: 0: 928.2. Samples: 952160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:53:07,743][00259] Avg episode reward: [(0, '20.831')]
[2024-10-03 20:53:12,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3837952. Throughput: 0: 985.6. Samples: 959446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:53:12,743][00259] Avg episode reward: [(0, '19.333')]
[2024-10-03 20:53:15,641][02385] Updated weights for policy 0, policy_version 940 (0.0027)
[2024-10-03 20:53:17,744][00259] Fps is (10 sec: 4094.5, 60 sec: 3754.4, 300 sec: 3887.7). Total num frames: 3854336. Throughput: 0: 937.8. Samples: 964616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:53:17,748][00259] Avg episode reward: [(0, '19.833')]
[2024-10-03 20:53:22,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3874816. Throughput: 0: 910.1. Samples: 967046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:53:22,743][00259] Avg episode reward: [(0, '20.469')]
[2024-10-03 20:53:25,972][02385] Updated weights for policy 0, policy_version 950 (0.0039)
[2024-10-03 20:53:27,741][00259] Fps is (10 sec: 4507.2, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3899392. Throughput: 0: 970.6. Samples: 974272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-03 20:53:27,742][00259] Avg episode reward: [(0, '20.761')]
[2024-10-03 20:53:32,741][00259] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3919872. Throughput: 0: 963.0. Samples: 980496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:53:32,745][00259] Avg episode reward: [(0, '22.087')]
[2024-10-03 20:53:36,784][02385] Updated weights for policy 0, policy_version 960 (0.0029)
[2024-10-03 20:53:37,741][00259] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 3932160. Throughput: 0: 930.9. Samples: 982648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-03 20:53:37,742][00259] Avg episode reward: [(0, '22.243')]
[2024-10-03 20:53:42,741][00259] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3956736. Throughput: 0: 962.2. Samples: 989208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-03 20:53:42,743][00259] Avg episode reward: [(0, '22.977')]
[2024-10-03 20:53:42,750][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000966_3956736.pth...
[2024-10-03 20:53:42,881][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000736_3014656.pth
[2024-10-03 20:53:45,461][02385] Updated weights for policy 0, policy_version 970 (0.0027)
[2024-10-03 20:53:47,741][00259] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3981312. Throughput: 0: 1038.0. Samples: 996220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:53:47,747][00259] Avg episode reward: [(0, '23.937')]
[2024-10-03 20:53:52,741][00259] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 3997696. Throughput: 0: 1027.9. Samples: 998416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-03 20:53:52,746][00259] Avg episode reward: [(0, '22.894')]
[2024-10-03 20:53:55,137][02372] Stopping Batcher_0...
[2024-10-03 20:53:55,137][02372] Loop batcher_evt_loop terminating...
[2024-10-03 20:53:55,139][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-03 20:53:55,137][00259] Component Batcher_0 stopped!
[2024-10-03 20:53:55,186][02385] Weights refcount: 2 0
[2024-10-03 20:53:55,191][02385] Stopping InferenceWorker_p0-w0...
[2024-10-03 20:53:55,191][02385] Loop inference_proc0-0_evt_loop terminating...
[2024-10-03 20:53:55,191][00259] Component InferenceWorker_p0-w0 stopped!
[2024-10-03 20:53:55,264][02372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth
[2024-10-03 20:53:55,282][02372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-03 20:53:55,549][02387] Stopping RolloutWorker_w2...
[2024-10-03 20:53:55,549][00259] Component RolloutWorker_w2 stopped!
[2024-10-03 20:53:55,562][02387] Loop rollout_proc2_evt_loop terminating...
[2024-10-03 20:53:55,592][02372] Stopping LearnerWorker_p0...
[2024-10-03 20:53:55,592][00259] Component LearnerWorker_p0 stopped!
[2024-10-03 20:53:55,594][02372] Loop learner_proc0_evt_loop terminating...
[2024-10-03 20:53:55,594][00259] Component RolloutWorker_w0 stopped!
[2024-10-03 20:53:55,592][02386] Stopping RolloutWorker_w0...
[2024-10-03 20:53:55,614][00259] Component RolloutWorker_w4 stopped!
[2024-10-03 20:53:55,614][02390] Stopping RolloutWorker_w4...
[2024-10-03 20:53:55,628][02390] Loop rollout_proc4_evt_loop terminating...
[2024-10-03 20:53:55,629][02386] Loop rollout_proc0_evt_loop terminating...
[2024-10-03 20:53:55,635][00259] Component RolloutWorker_w6 stopped!
[2024-10-03 20:53:55,634][02393] Stopping RolloutWorker_w6...
[2024-10-03 20:53:55,638][02393] Loop rollout_proc6_evt_loop terminating...
[2024-10-03 20:53:55,740][02389] Stopping RolloutWorker_w3...
[2024-10-03 20:53:55,739][00259] Component RolloutWorker_w3 stopped!
[2024-10-03 20:53:55,742][02391] Stopping RolloutWorker_w5...
[2024-10-03 20:53:55,742][00259] Component RolloutWorker_w5 stopped!
[2024-10-03 20:53:55,746][02389] Loop rollout_proc3_evt_loop terminating...
[2024-10-03 20:53:55,746][02391] Loop rollout_proc5_evt_loop terminating...
[2024-10-03 20:53:55,757][02388] Stopping RolloutWorker_w1...
[2024-10-03 20:53:55,756][00259] Component RolloutWorker_w1 stopped!
[2024-10-03 20:53:55,763][02388] Loop rollout_proc1_evt_loop terminating...
[2024-10-03 20:53:55,768][02392] Stopping RolloutWorker_w7...
[2024-10-03 20:53:55,768][00259] Component RolloutWorker_w7 stopped!
[2024-10-03 20:53:55,770][00259] Waiting for process learner_proc0 to stop...
[2024-10-03 20:53:55,776][02392] Loop rollout_proc7_evt_loop terminating...
[2024-10-03 20:53:57,128][00259] Waiting for process inference_proc0-0 to join...
[2024-10-03 20:53:57,137][00259] Waiting for process rollout_proc0 to join...
[2024-10-03 20:53:59,066][00259] Waiting for process rollout_proc1 to join...
[2024-10-03 20:53:59,072][00259] Waiting for process rollout_proc2 to join...
[2024-10-03 20:53:59,075][00259] Waiting for process rollout_proc3 to join...
[2024-10-03 20:53:59,079][00259] Waiting for process rollout_proc4 to join...
[2024-10-03 20:53:59,084][00259] Waiting for process rollout_proc5 to join...
[2024-10-03 20:53:59,086][00259] Waiting for process rollout_proc6 to join...
[2024-10-03 20:53:59,090][00259] Waiting for process rollout_proc7 to join...
[2024-10-03 20:53:59,093][00259] Batcher 0 profile tree view:
batching: 26.8302, releasing_batches: 0.0271
[2024-10-03 20:53:59,095][00259] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
wait_policy_total: 388.1341
update_model: 9.4540
weight_update: 0.0022
one_step: 0.0046
handle_policy_step: 622.6190
deserialize: 14.9853, stack: 3.2818, obs_to_device_normalize: 125.2951, forward: 332.3708, send_messages: 30.5770
prepare_outputs: 85.1305
to_cpu: 49.3060
[2024-10-03 20:53:59,098][00259] Learner 0 profile tree view:
misc: 0.0061, prepare_batch: 13.6056
train: 75.9609
epoch_init: 0.0089, minibatch_init: 0.0065, losses_postprocess: 0.5767, kl_divergence: 0.6342, after_optimizer: 33.0287
calculate_losses: 29.1205
losses_init: 0.0035, forward_head: 1.3123, bptt_initial: 20.2529, tail: 1.1990, advantages_returns: 0.2508, losses: 3.8189
bptt: 1.9617
bptt_forward_core: 1.8508
update: 11.9389
clip: 0.8931
[2024-10-03 20:53:59,099][00259] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3985, enqueue_policy_requests: 96.5637, env_step: 825.6904, overhead: 12.8555, complete_rollouts: 7.2581
save_policy_outputs: 21.6612
split_output_tensors: 9.2229
[2024-10-03 20:53:59,100][00259] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3404, enqueue_policy_requests: 97.5402, env_step: 824.7818, overhead: 13.5768, complete_rollouts: 7.0575
save_policy_outputs: 20.6377
split_output_tensors: 8.3189
[2024-10-03 20:53:59,102][00259] Loop Runner_EvtLoop terminating...
[2024-10-03 20:53:59,104][00259] Runner profile tree view:
main_loop: 1090.3209
[2024-10-03 20:53:59,105][00259] Collected {0: 4005888}, FPS: 3674.0
[2024-10-03 20:53:59,139][00259] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-03 20:53:59,140][00259] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-03 20:53:59,141][00259] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-03 20:53:59,144][00259] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-03 20:53:59,145][00259] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-03 20:53:59,147][00259] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-03 20:53:59,148][00259] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-10-03 20:53:59,149][00259] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-03 20:53:59,150][00259] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-10-03 20:53:59,152][00259] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-10-03 20:53:59,155][00259] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-03 20:53:59,156][00259] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-03 20:53:59,157][00259] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-03 20:53:59,159][00259] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-03 20:53:59,160][00259] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-03 20:53:59,195][00259] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-03 20:53:59,199][00259] RunningMeanStd input shape: (3, 72, 128)
[2024-10-03 20:53:59,201][00259] RunningMeanStd input shape: (1,)
[2024-10-03 20:53:59,218][00259] ConvEncoder: input_channels=3
[2024-10-03 20:53:59,320][00259] Conv encoder output size: 512
[2024-10-03 20:53:59,321][00259] Policy head output size: 512
[2024-10-03 20:53:59,485][00259] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-03 20:54:00,314][00259] Num frames 100...
[2024-10-03 20:54:00,436][00259] Num frames 200...
[2024-10-03 20:54:00,559][00259] Num frames 300...
[2024-10-03 20:54:00,686][00259] Num frames 400...
[2024-10-03 20:54:00,814][00259] Num frames 500...
[2024-10-03 20:54:00,939][00259] Num frames 600...
[2024-10-03 20:54:01,058][00259] Num frames 700...
[2024-10-03 20:54:01,176][00259] Num frames 800...
[2024-10-03 20:54:01,297][00259] Num frames 900...
[2024-10-03 20:54:01,419][00259] Num frames 1000...
[2024-10-03 20:54:01,551][00259] Num frames 1100...
[2024-10-03 20:54:01,673][00259] Num frames 1200...
[2024-10-03 20:54:01,796][00259] Num frames 1300...
[2024-10-03 20:54:01,940][00259] Num frames 1400...
[2024-10-03 20:54:02,063][00259] Num frames 1500...
[2024-10-03 20:54:02,187][00259] Num frames 1600...
[2024-10-03 20:54:02,309][00259] Num frames 1700...
[2024-10-03 20:54:02,430][00259] Num frames 1800...
[2024-10-03 20:54:02,550][00259] Num frames 1900...
[2024-10-03 20:54:02,690][00259] Avg episode rewards: #0: 48.649, true rewards: #0: 19.650
[2024-10-03 20:54:02,693][00259] Avg episode reward: 48.649, avg true_objective: 19.650
[2024-10-03 20:54:02,738][00259] Num frames 2000...
[2024-10-03 20:54:02,867][00259] Num frames 2100...
[2024-10-03 20:54:02,994][00259] Num frames 2200...
[2024-10-03 20:54:03,114][00259] Num frames 2300...
[2024-10-03 20:54:03,237][00259] Num frames 2400...
[2024-10-03 20:54:03,356][00259] Num frames 2500...
[2024-10-03 20:54:03,478][00259] Num frames 2600...
[2024-10-03 20:54:03,601][00259] Num frames 2700...
[2024-10-03 20:54:03,730][00259] Num frames 2800...
[2024-10-03 20:54:03,851][00259] Num frames 2900...
[2024-10-03 20:54:03,985][00259] Num frames 3000...
[2024-10-03 20:54:04,105][00259] Num frames 3100...
[2024-10-03 20:54:04,226][00259] Num frames 3200...
[2024-10-03 20:54:04,348][00259] Num frames 3300...
[2024-10-03 20:54:04,476][00259] Num frames 3400...
[2024-10-03 20:54:04,649][00259] Num frames 3500...
[2024-10-03 20:54:04,824][00259] Num frames 3600...
[2024-10-03 20:54:05,007][00259] Num frames 3700...
[2024-10-03 20:54:05,177][00259] Num frames 3800...
[2024-10-03 20:54:05,342][00259] Num frames 3900...
[2024-10-03 20:54:05,511][00259] Num frames 4000...
[2024-10-03 20:54:05,675][00259] Avg episode rewards: #0: 51.824, true rewards: #0: 20.325
[2024-10-03 20:54:05,677][00259] Avg episode reward: 51.824, avg true_objective: 20.325
[2024-10-03 20:54:05,742][00259] Num frames 4100...
[2024-10-03 20:54:05,910][00259] Num frames 4200...
[2024-10-03 20:54:06,086][00259] Num frames 4300...
[2024-10-03 20:54:06,257][00259] Num frames 4400...
[2024-10-03 20:54:06,427][00259] Num frames 4500...
[2024-10-03 20:54:06,600][00259] Num frames 4600...
[2024-10-03 20:54:06,774][00259] Num frames 4700...
[2024-10-03 20:54:06,950][00259] Num frames 4800...
[2024-10-03 20:54:07,108][00259] Num frames 4900...
[2024-10-03 20:54:07,233][00259] Num frames 5000...
[2024-10-03 20:54:07,354][00259] Num frames 5100...
[2024-10-03 20:54:07,476][00259] Num frames 5200...
[2024-10-03 20:54:07,591][00259] Avg episode rewards: #0: 41.829, true rewards: #0: 17.497
[2024-10-03 20:54:07,593][00259] Avg episode reward: 41.829, avg true_objective: 17.497
[2024-10-03 20:54:07,655][00259] Num frames 5300...
[2024-10-03 20:54:07,776][00259] Num frames 5400...
[2024-10-03 20:54:07,908][00259] Num frames 5500...
[2024-10-03 20:54:08,043][00259] Num frames 5600...
[2024-10-03 20:54:08,163][00259] Num frames 5700...
[2024-10-03 20:54:08,315][00259] Avg episode rewards: #0: 35.205, true rewards: #0: 14.455
[2024-10-03 20:54:08,317][00259] Avg episode reward: 35.205, avg true_objective: 14.455
[2024-10-03 20:54:08,341][00259] Num frames 5800...
[2024-10-03 20:54:08,459][00259] Num frames 5900...
[2024-10-03 20:54:08,580][00259] Num frames 6000...
[2024-10-03 20:54:08,699][00259] Num frames 6100...
[2024-10-03 20:54:08,838][00259] Avg episode rewards: #0: 28.932, true rewards: #0: 12.332
[2024-10-03 20:54:08,840][00259] Avg episode reward: 28.932, avg true_objective: 12.332
[2024-10-03 20:54:08,884][00259] Num frames 6200...
[2024-10-03 20:54:09,011][00259] Num frames 6300...
[2024-10-03 20:54:09,141][00259] Num frames 6400...
[2024-10-03 20:54:09,261][00259] Num frames 6500...
[2024-10-03 20:54:09,381][00259] Num frames 6600...
[2024-10-03 20:54:09,506][00259] Num frames 6700...
[2024-10-03 20:54:09,637][00259] Num frames 6800...
[2024-10-03 20:54:09,766][00259] Num frames 6900...
[2024-10-03 20:54:09,889][00259] Num frames 7000...
[2024-10-03 20:54:10,025][00259] Avg episode rewards: #0: 27.436, true rewards: #0: 11.770
[2024-10-03 20:54:10,026][00259] Avg episode reward: 27.436, avg true_objective: 11.770
[2024-10-03 20:54:10,082][00259] Num frames 7100...
[2024-10-03 20:54:10,202][00259] Num frames 7200...
[2024-10-03 20:54:10,324][00259] Num frames 7300...
[2024-10-03 20:54:10,446][00259] Num frames 7400...
[2024-10-03 20:54:10,566][00259] Num frames 7500...
[2024-10-03 20:54:10,688][00259] Num frames 7600...
[2024-10-03 20:54:10,812][00259] Num frames 7700...
[2024-10-03 20:54:10,949][00259] Num frames 7800...
[2024-10-03 20:54:11,055][00259] Avg episode rewards: #0: 26.060, true rewards: #0: 11.203
[2024-10-03 20:54:11,056][00259] Avg episode reward: 26.060, avg true_objective: 11.203
[2024-10-03 20:54:11,133][00259] Num frames 7900...
[2024-10-03 20:54:11,254][00259] Num frames 8000...
[2024-10-03 20:54:11,374][00259] Num frames 8100...
[2024-10-03 20:54:11,494][00259] Num frames 8200...
[2024-10-03 20:54:11,617][00259] Num frames 8300...
[2024-10-03 20:54:11,737][00259] Num frames 8400...
[2024-10-03 20:54:11,856][00259] Num frames 8500...
[2024-10-03 20:54:11,973][00259] Avg episode rewards: #0: 24.557, true rewards: #0: 10.683
[2024-10-03 20:54:11,974][00259] Avg episode reward: 24.557, avg true_objective: 10.683
[2024-10-03 20:54:12,043][00259] Num frames 8600...
[2024-10-03 20:54:12,170][00259] Num frames 8700...
[2024-10-03 20:54:12,292][00259] Num frames 8800...
[2024-10-03 20:54:12,414][00259] Num frames 8900...
[2024-10-03 20:54:12,535][00259] Num frames 9000...
[2024-10-03 20:54:12,658][00259] Num frames 9100...
[2024-10-03 20:54:12,784][00259] Num frames 9200...
[2024-10-03 20:54:12,913][00259] Num frames 9300...
[2024-10-03 20:54:13,034][00259] Num frames 9400...
[2024-10-03 20:54:13,167][00259] Num frames 9500...
[2024-10-03 20:54:13,288][00259] Num frames 9600...
[2024-10-03 20:54:13,408][00259] Num frames 9700...
[2024-10-03 20:54:13,529][00259] Num frames 9800...
[2024-10-03 20:54:13,651][00259] Num frames 9900...
[2024-10-03 20:54:13,818][00259] Avg episode rewards: #0: 26.326, true rewards: #0: 11.104
[2024-10-03 20:54:13,819][00259] Avg episode reward: 26.326, avg true_objective: 11.104
[2024-10-03 20:54:13,829][00259] Num frames 10000...
[2024-10-03 20:54:13,970][00259] Num frames 10100...
[2024-10-03 20:54:14,090][00259] Num frames 10200...
[2024-10-03 20:54:14,219][00259] Num frames 10300...
[2024-10-03 20:54:14,348][00259] Num frames 10400...
[2024-10-03 20:54:14,477][00259] Num frames 10500...
[2024-10-03 20:54:14,599][00259] Num frames 10600...
[2024-10-03 20:54:14,721][00259] Num frames 10700...
[2024-10-03 20:54:14,843][00259] Num frames 10800...
[2024-10-03 20:54:14,973][00259] Num frames 10900...
[2024-10-03 20:54:15,095][00259] Num frames 11000...
[2024-10-03 20:54:15,234][00259] Num frames 11100...
[2024-10-03 20:54:15,362][00259] Num frames 11200...
[2024-10-03 20:54:15,469][00259] Avg episode rewards: #0: 26.242, true rewards: #0: 11.242
[2024-10-03 20:54:15,470][00259] Avg episode reward: 26.242, avg true_objective: 11.242
[2024-10-03 20:55:22,273][00259] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-10-03 20:55:22,932][00259] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-03 20:55:22,937][00259] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-03 20:55:22,939][00259] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-03 20:55:22,948][00259] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-03 20:55:22,949][00259] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-03 20:55:22,955][00259] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-03 20:55:22,956][00259] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-10-03 20:55:22,958][00259] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-03 20:55:22,959][00259] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-10-03 20:55:22,968][00259] Adding new argument 'hf_repository'='seangogo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-10-03 20:55:22,969][00259] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-03 20:55:22,969][00259] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-03 20:55:22,970][00259] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-03 20:55:22,971][00259] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-03 20:55:22,972][00259] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-03 20:55:23,037][00259] RunningMeanStd input shape: (3, 72, 128)
[2024-10-03 20:55:23,045][00259] RunningMeanStd input shape: (1,)
[2024-10-03 20:55:23,078][00259] ConvEncoder: input_channels=3
[2024-10-03 20:55:23,150][00259] Conv encoder output size: 512
[2024-10-03 20:55:23,152][00259] Policy head output size: 512
[2024-10-03 20:55:23,191][00259] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-03 20:55:23,898][00259] Num frames 100...
[2024-10-03 20:55:24,068][00259] Num frames 200...
[2024-10-03 20:55:24,290][00259] Num frames 300...
[2024-10-03 20:55:24,493][00259] Num frames 400...
[2024-10-03 20:55:24,677][00259] Num frames 500...
[2024-10-03 20:55:24,893][00259] Num frames 600...
[2024-10-03 20:55:25,122][00259] Num frames 700...
[2024-10-03 20:55:25,318][00259] Num frames 800...
[2024-10-03 20:55:25,530][00259] Num frames 900...
[2024-10-03 20:55:25,736][00259] Num frames 1000...
[2024-10-03 20:55:25,943][00259] Num frames 1100...
[2024-10-03 20:55:26,142][00259] Num frames 1200...
[2024-10-03 20:55:26,320][00259] Avg episode rewards: #0: 32.780, true rewards: #0: 12.780
[2024-10-03 20:55:26,322][00259] Avg episode reward: 32.780, avg true_objective: 12.780
[2024-10-03 20:55:26,357][00259] Num frames 1300...
[2024-10-03 20:55:26,510][00259] Num frames 1400...
[2024-10-03 20:55:26,676][00259] Num frames 1500...
[2024-10-03 20:55:26,841][00259] Num frames 1600...
[2024-10-03 20:55:27,011][00259] Num frames 1700...
[2024-10-03 20:55:27,166][00259] Num frames 1800...
[2024-10-03 20:55:27,318][00259] Num frames 1900...
[2024-10-03 20:55:27,475][00259] Num frames 2000...
[2024-10-03 20:55:27,632][00259] Num frames 2100...
[2024-10-03 20:55:27,816][00259] Avg episode rewards: #0: 25.385, true rewards: #0: 10.885
[2024-10-03 20:55:27,818][00259] Avg episode reward: 25.385, avg true_objective: 10.885
[2024-10-03 20:55:27,858][00259] Num frames 2200...
[2024-10-03 20:55:28,024][00259] Num frames 2300...
[2024-10-03 20:55:28,201][00259] Num frames 2400...
[2024-10-03 20:55:28,368][00259] Num frames 2500...
[2024-10-03 20:55:28,531][00259] Num frames 2600...
[2024-10-03 20:55:28,703][00259] Num frames 2700...
[2024-10-03 20:55:28,899][00259] Num frames 2800...
[2024-10-03 20:55:29,139][00259] Num frames 2900...
[2024-10-03 20:55:29,294][00259] Avg episode rewards: #0: 22.150, true rewards: #0: 9.817
[2024-10-03 20:55:29,296][00259] Avg episode reward: 22.150, avg true_objective: 9.817
[2024-10-03 20:55:29,400][00259] Num frames 3000...
[2024-10-03 20:55:29,581][00259] Num frames 3100...
[2024-10-03 20:55:29,766][00259] Num frames 3200...
[2024-10-03 20:55:29,944][00259] Num frames 3300...
[2024-10-03 20:55:30,142][00259] Num frames 3400...
[2024-10-03 20:55:30,321][00259] Num frames 3500...
[2024-10-03 20:55:30,503][00259] Num frames 3600...
[2024-10-03 20:55:30,664][00259] Num frames 3700...
[2024-10-03 20:55:30,744][00259] Avg episode rewards: #0: 20.533, true rewards: #0: 9.282
[2024-10-03 20:55:30,746][00259] Avg episode reward: 20.533, avg true_objective: 9.282
[2024-10-03 20:55:30,885][00259] Num frames 3800...
[2024-10-03 20:55:31,064][00259] Num frames 3900...
[2024-10-03 20:55:31,235][00259] Num frames 4000...
[2024-10-03 20:55:31,408][00259] Num frames 4100...
[2024-10-03 20:55:31,575][00259] Num frames 4200...
[2024-10-03 20:55:31,700][00259] Num frames 4300...
[2024-10-03 20:55:31,837][00259] Num frames 4400...
[2024-10-03 20:55:31,963][00259] Num frames 4500...
[2024-10-03 20:55:32,084][00259] Num frames 4600...
[2024-10-03 20:55:32,213][00259] Num frames 4700...
[2024-10-03 20:55:32,314][00259] Avg episode rewards: #0: 21.074, true rewards: #0: 9.474
[2024-10-03 20:55:32,316][00259] Avg episode reward: 21.074, avg true_objective: 9.474
[2024-10-03 20:55:32,393][00259] Num frames 4800...
[2024-10-03 20:55:32,514][00259] Num frames 4900...
[2024-10-03 20:55:32,633][00259] Num frames 5000...
[2024-10-03 20:55:32,754][00259] Num frames 5100...
[2024-10-03 20:55:32,875][00259] Num frames 5200...
[2024-10-03 20:55:33,029][00259] Avg episode rewards: #0: 19.288, true rewards: #0: 8.788
[2024-10-03 20:55:33,030][00259] Avg episode reward: 19.288, avg true_objective: 8.788
[2024-10-03 20:55:33,065][00259] Num frames 5300...
[2024-10-03 20:55:33,191][00259] Num frames 5400...
[2024-10-03 20:55:33,310][00259] Num frames 5500...
[2024-10-03 20:55:33,430][00259] Num frames 5600...
[2024-10-03 20:55:33,550][00259] Num frames 5700...
[2024-10-03 20:55:33,670][00259] Num frames 5800...
[2024-10-03 20:55:33,792][00259] Num frames 5900...
[2024-10-03 20:55:33,918][00259] Num frames 6000...
[2024-10-03 20:55:34,047][00259] Num frames 6100...
[2024-10-03 20:55:34,175][00259] Num frames 6200...
[2024-10-03 20:55:34,337][00259] Avg episode rewards: #0: 19.697, true rewards: #0: 8.983
[2024-10-03 20:55:34,338][00259] Avg episode reward: 19.697, avg true_objective: 8.983
[2024-10-03 20:55:34,356][00259] Num frames 6300...
[2024-10-03 20:55:34,474][00259] Num frames 6400...
[2024-10-03 20:55:34,593][00259] Num frames 6500...
[2024-10-03 20:55:34,716][00259] Num frames 6600...
[2024-10-03 20:55:34,843][00259] Num frames 6700...
[2024-10-03 20:55:34,974][00259] Num frames 6800...
[2024-10-03 20:55:35,095][00259] Num frames 6900...
[2024-10-03 20:55:35,224][00259] Num frames 7000...
[2024-10-03 20:55:35,342][00259] Num frames 7100...
[2024-10-03 20:55:35,464][00259] Num frames 7200...
[2024-10-03 20:55:35,586][00259] Num frames 7300...
[2024-10-03 20:55:35,704][00259] Num frames 7400...
[2024-10-03 20:55:35,827][00259] Num frames 7500...
[2024-10-03 20:55:35,956][00259] Num frames 7600...
[2024-10-03 20:55:36,079][00259] Num frames 7700...
[2024-10-03 20:55:36,247][00259] Num frames 7800...
[2024-10-03 20:55:36,414][00259] Num frames 7900...
[2024-10-03 20:55:36,579][00259] Num frames 8000...
[2024-10-03 20:55:36,743][00259] Num frames 8100...
[2024-10-03 20:55:36,923][00259] Avg episode rewards: #0: 23.717, true rewards: #0: 10.217
[2024-10-03 20:55:36,925][00259] Avg episode reward: 23.717, avg true_objective: 10.217
[2024-10-03 20:55:36,970][00259] Num frames 8200...
[2024-10-03 20:55:37,128][00259] Num frames 8300...
[2024-10-03 20:55:37,299][00259] Num frames 8400...
[2024-10-03 20:55:37,468][00259] Num frames 8500...
[2024-10-03 20:55:37,648][00259] Num frames 8600...
[2024-10-03 20:55:37,821][00259] Num frames 8700...
[2024-10-03 20:55:37,996][00259] Num frames 8800...
[2024-10-03 20:55:38,172][00259] Num frames 8900...
[2024-10-03 20:55:38,347][00259] Num frames 9000...
[2024-10-03 20:55:38,520][00259] Num frames 9100...
[2024-10-03 20:55:38,652][00259] Num frames 9200...
[2024-10-03 20:55:38,770][00259] Num frames 9300...
[2024-10-03 20:55:38,900][00259] Num frames 9400...
[2024-10-03 20:55:39,061][00259] Avg episode rewards: #0: 24.429, true rewards: #0: 10.540
[2024-10-03 20:55:39,062][00259] Avg episode reward: 24.429, avg true_objective: 10.540
[2024-10-03 20:55:39,081][00259] Num frames 9500...
[2024-10-03 20:55:39,199][00259] Num frames 9600...
[2024-10-03 20:55:39,320][00259] Num frames 9700...
[2024-10-03 20:55:39,449][00259] Num frames 9800...
[2024-10-03 20:55:39,567][00259] Num frames 9900...
[2024-10-03 20:55:39,704][00259] Num frames 10000...
[2024-10-03 20:55:39,827][00259] Num frames 10100...
[2024-10-03 20:55:39,953][00259] Num frames 10200...
[2024-10-03 20:55:40,075][00259] Num frames 10300...
[2024-10-03 20:55:40,195][00259] Num frames 10400...
[2024-10-03 20:55:40,328][00259] Num frames 10500...
[2024-10-03 20:55:40,460][00259] Num frames 10600...
[2024-10-03 20:55:40,582][00259] Num frames 10700...
[2024-10-03 20:55:40,710][00259] Num frames 10800...
[2024-10-03 20:55:40,833][00259] Num frames 10900...
[2024-10-03 20:55:40,966][00259] Num frames 11000...
[2024-10-03 20:55:41,087][00259] Num frames 11100...
[2024-10-03 20:55:41,210][00259] Num frames 11200...
[2024-10-03 20:55:41,333][00259] Num frames 11300...
[2024-10-03 20:55:41,463][00259] Num frames 11400...
[2024-10-03 20:55:41,585][00259] Num frames 11500...
[2024-10-03 20:55:41,744][00259] Avg episode rewards: #0: 28.286, true rewards: #0: 11.586
[2024-10-03 20:55:41,745][00259] Avg episode reward: 28.286, avg true_objective: 11.586
[2024-10-03 20:56:50,776][00259] Replay video saved to /content/train_dir/default_experiment/replay.mp4!