[2024-12-13 21:43:00,055][00832] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-13 21:43:00,058][00832] Rollout worker 0 uses device cpu
[2024-12-13 21:43:00,061][00832] Rollout worker 1 uses device cpu
[2024-12-13 21:43:00,062][00832] Rollout worker 2 uses device cpu
[2024-12-13 21:43:00,063][00832] Rollout worker 3 uses device cpu
[2024-12-13 21:43:00,064][00832] Rollout worker 4 uses device cpu
[2024-12-13 21:43:00,065][00832] Rollout worker 5 uses device cpu
[2024-12-13 21:43:00,066][00832] Rollout worker 6 uses device cpu
[2024-12-13 21:43:00,067][00832] Rollout worker 7 uses device cpu
[2024-12-13 21:43:00,214][00832] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 21:43:00,216][00832] InferenceWorker_p0-w0: min num requests: 2
[2024-12-13 21:43:00,248][00832] Starting all processes...
[2024-12-13 21:43:00,250][00832] Starting process learner_proc0
[2024-12-13 21:43:00,298][00832] Starting all processes...
[2024-12-13 21:43:00,307][00832] Starting process inference_proc0-0
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc0
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc1
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc2
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc3
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc4
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc5
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc6
[2024-12-13 21:43:00,311][00832] Starting process rollout_proc7
[2024-12-13 21:43:10,949][11173] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 21:43:10,949][11173] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-13 21:43:11,081][11173] Num visible devices: 1
[2024-12-13 21:43:11,116][11173] Starting seed is not provided
[2024-12-13 21:43:11,116][11173] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 21:43:11,116][11173] Initializing actor-critic model on device cuda:0
[2024-12-13 21:43:11,117][11173] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 21:43:11,118][11173] RunningMeanStd input shape: (1,)
[2024-12-13 21:43:11,225][11173] ConvEncoder: input_channels=3
[2024-12-13 21:43:11,572][11187] Worker 0 uses CPU cores [0]
[2024-12-13 21:43:11,597][11192] Worker 4 uses CPU cores [0]
[2024-12-13 21:43:11,610][11191] Worker 5 uses CPU cores [1]
[2024-12-13 21:43:11,625][11188] Worker 1 uses CPU cores [1]
[2024-12-13 21:43:11,643][11189] Worker 2 uses CPU cores [0]
[2024-12-13 21:43:11,695][11186] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 21:43:11,696][11186] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-13 21:43:11,709][11190] Worker 3 uses CPU cores [1]
[2024-12-13 21:43:11,743][11193] Worker 6 uses CPU cores [0]
[2024-12-13 21:43:11,761][11194] Worker 7 uses CPU cores [1]
[2024-12-13 21:43:11,760][11186] Num visible devices: 1
[2024-12-13 21:43:11,802][11173] Conv encoder output size: 512
[2024-12-13 21:43:11,802][11173] Policy head output size: 512
[2024-12-13 21:43:11,818][11173] Created Actor Critic model with architecture:
[2024-12-13 21:43:11,818][11173] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-13 21:43:16,156][11173] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-12-13 21:43:16,157][11173] No checkpoints found
[2024-12-13 21:43:16,157][11173] Did not load from checkpoint, starting from scratch!
[2024-12-13 21:43:16,158][11173] Initialized policy 0 weights for model version 0
[2024-12-13 21:43:16,160][11173] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 21:43:16,168][11173] LearnerWorker_p0 finished initialization!
[2024-12-13 21:43:16,381][11186] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 21:43:16,382][11186] RunningMeanStd input shape: (1,)
[2024-12-13 21:43:16,396][11186] ConvEncoder: input_channels=3
[2024-12-13 21:43:16,495][11186] Conv encoder output size: 512
[2024-12-13 21:43:16,496][11186] Policy head output size: 512
[2024-12-13 21:43:18,020][00832] Inference worker 0-0 is ready!
[2024-12-13 21:43:18,022][00832] All inference workers are ready! Signal rollout workers to start!
[2024-12-13 21:43:18,153][11189] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:18,181][11187] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:18,183][11190] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:18,188][11192] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:18,197][11193] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:18,208][11191] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:18,209][11188] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:18,212][11194] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 21:43:19,201][11189] Decorrelating experience for 0 frames...
[2024-12-13 21:43:19,201][11194] Decorrelating experience for 0 frames...
[2024-12-13 21:43:19,575][11189] Decorrelating experience for 32 frames...
[2024-12-13 21:43:19,575][11194] Decorrelating experience for 32 frames...
[2024-12-13 21:43:20,067][11193] Decorrelating experience for 0 frames...
[2024-12-13 21:43:20,112][11189] Decorrelating experience for 64 frames...
[2024-12-13 21:43:20,207][00832] Heartbeat connected on Batcher_0
[2024-12-13 21:43:20,210][00832] Heartbeat connected on LearnerWorker_p0
[2024-12-13 21:43:20,241][00832] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-13 21:43:20,487][11194] Decorrelating experience for 64 frames...
[2024-12-13 21:43:20,797][11193] Decorrelating experience for 32 frames...
[2024-12-13 21:43:20,932][11194] Decorrelating experience for 96 frames...
[2024-12-13 21:43:20,938][11189] Decorrelating experience for 96 frames...
[2024-12-13 21:43:21,025][00832] Heartbeat connected on RolloutWorker_w7
[2024-12-13 21:43:21,058][00832] Heartbeat connected on RolloutWorker_w2
[2024-12-13 21:43:21,117][00832] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-13 21:43:21,363][11193] Decorrelating experience for 64 frames...
[2024-12-13 21:43:21,804][11193] Decorrelating experience for 96 frames...
[2024-12-13 21:43:21,884][00832] Heartbeat connected on RolloutWorker_w6
[2024-12-13 21:43:26,117][00832] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 19.2. Samples: 96. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-13 21:43:26,119][00832] Avg episode reward: [(0, '3.209')]
[2024-12-13 21:43:26,751][11173] Signal inference workers to stop experience collection...
[2024-12-13 21:43:26,758][11186] InferenceWorker_p0-w0: stopping experience collection
[2024-12-13 21:43:28,429][11173] Signal inference workers to resume experience collection...
[2024-12-13 21:43:28,431][11186] InferenceWorker_p0-w0: resuming experience collection
[2024-12-13 21:43:31,118][00832] Fps is (10 sec: 1638.2, 60 sec: 1638.2, 300 sec: 1638.2). Total num frames: 16384. Throughput: 0: 317.2. Samples: 3172. Policy #0 lag: (min: 0.0, avg: 0.2, max: 2.0)
[2024-12-13 21:43:31,121][00832] Avg episode reward: [(0, '3.663')]
[2024-12-13 21:43:36,117][00832] Fps is (10 sec: 3276.8, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 32768. Throughput: 0: 583.3. Samples: 8750. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:43:36,120][00832] Avg episode reward: [(0, '4.043')]
[2024-12-13 21:43:38,535][11186] Updated weights for policy 0, policy_version 10 (0.0022)
[2024-12-13 21:43:41,117][00832] Fps is (10 sec: 3277.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 542.9. Samples: 10858. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:43:41,120][00832] Avg episode reward: [(0, '4.325')]
[2024-12-13 21:43:46,117][00832] Fps is (10 sec: 3686.4, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 69632. Throughput: 0: 677.8. Samples: 16946. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:43:46,119][00832] Avg episode reward: [(0, '4.376')]
[2024-12-13 21:43:48,569][11186] Updated weights for policy 0, policy_version 20 (0.0013)
[2024-12-13 21:43:51,117][00832] Fps is (10 sec: 3686.3, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 86016. Throughput: 0: 746.1. Samples: 22384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:43:51,125][00832] Avg episode reward: [(0, '4.442')]
[2024-12-13 21:43:56,122][00832] Fps is (10 sec: 3275.1, 60 sec: 2925.3, 300 sec: 2925.3). Total num frames: 102400. Throughput: 0: 701.7. Samples: 24564. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:43:56,126][00832] Avg episode reward: [(0, '4.432')]
[2024-12-13 21:43:56,153][11173] Saving new best policy, reward=4.432!
[2024-12-13 21:44:00,229][11186] Updated weights for policy 0, policy_version 30 (0.0012)
[2024-12-13 21:44:01,117][00832] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 122880. Throughput: 0: 771.3. Samples: 30854. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:01,119][00832] Avg episode reward: [(0, '4.177')]
[2024-12-13 21:44:06,117][00832] Fps is (10 sec: 4098.1, 60 sec: 3185.8, 300 sec: 3185.8). Total num frames: 143360. Throughput: 0: 805.5. Samples: 36248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:06,119][00832] Avg episode reward: [(0, '4.109')]
[2024-12-13 21:44:11,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 159744. Throughput: 0: 855.4. Samples: 38588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:11,123][00832] Avg episode reward: [(0, '4.480')]
[2024-12-13 21:44:11,126][11173] Saving new best policy, reward=4.480!
[2024-12-13 21:44:11,478][11186] Updated weights for policy 0, policy_version 40 (0.0014)
[2024-12-13 21:44:16,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 180224. Throughput: 0: 926.4. Samples: 44858. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:44:16,121][00832] Avg episode reward: [(0, '4.704')]
[2024-12-13 21:44:16,131][11173] Saving new best policy, reward=4.704!
[2024-12-13 21:44:21,120][00832] Fps is (10 sec: 3275.8, 60 sec: 3208.4, 300 sec: 3208.4). Total num frames: 192512. Throughput: 0: 893.5. Samples: 48962. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:21,128][00832] Avg episode reward: [(0, '4.750')]
[2024-12-13 21:44:21,129][11173] Saving new best policy, reward=4.750!
[2024-12-13 21:44:24,833][11186] Updated weights for policy 0, policy_version 50 (0.0015)
[2024-12-13 21:44:26,117][00832] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3213.8). Total num frames: 208896. Throughput: 0: 885.8. Samples: 50720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:26,120][00832] Avg episode reward: [(0, '4.521')]
[2024-12-13 21:44:31,117][00832] Fps is (10 sec: 3687.6, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 229376. Throughput: 0: 887.7. Samples: 56892. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:44:31,119][00832] Avg episode reward: [(0, '4.491')]
[2024-12-13 21:44:34,753][11186] Updated weights for policy 0, policy_version 60 (0.0012)
[2024-12-13 21:44:36,120][00832] Fps is (10 sec: 4094.8, 60 sec: 3617.9, 300 sec: 3331.3). Total num frames: 249856. Throughput: 0: 894.1. Samples: 62622. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:44:36,122][00832] Avg episode reward: [(0, '4.496')]
[2024-12-13 21:44:41,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3328.0). Total num frames: 266240. Throughput: 0: 889.6. Samples: 64592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:41,121][00832] Avg episode reward: [(0, '4.570')]
[2024-12-13 21:44:46,024][11186] Updated weights for policy 0, policy_version 70 (0.0013)
[2024-12-13 21:44:46,117][00832] Fps is (10 sec: 3687.5, 60 sec: 3618.1, 300 sec: 3373.2). Total num frames: 286720. Throughput: 0: 888.8. Samples: 70852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:46,123][00832] Avg episode reward: [(0, '4.574')]
[2024-12-13 21:44:51,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 891.2. Samples: 76350. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:51,122][00832] Avg episode reward: [(0, '4.475')]
[2024-12-13 21:44:56,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 888.4. Samples: 78566. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:44:56,119][00832] Avg episode reward: [(0, '4.394')]
[2024-12-13 21:44:56,129][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
[2024-12-13 21:44:57,444][11186] Updated weights for policy 0, policy_version 80 (0.0015)
[2024-12-13 21:45:01,119][00832] Fps is (10 sec: 3685.5, 60 sec: 3618.0, 300 sec: 3399.6). Total num frames: 339968. Throughput: 0: 888.0. Samples: 84818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:01,126][00832] Avg episode reward: [(0, '4.355')]
[2024-12-13 21:45:06,120][00832] Fps is (10 sec: 3685.2, 60 sec: 3549.7, 300 sec: 3393.7). Total num frames: 356352. Throughput: 0: 917.0. Samples: 90226. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:06,123][00832] Avg episode reward: [(0, '4.463')]
[2024-12-13 21:45:08,842][11186] Updated weights for policy 0, policy_version 90 (0.0017)
[2024-12-13 21:45:11,117][00832] Fps is (10 sec: 3687.2, 60 sec: 3618.1, 300 sec: 3425.7). Total num frames: 376832. Throughput: 0: 929.4. Samples: 92542. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:11,119][00832] Avg episode reward: [(0, '4.487')]
[2024-12-13 21:45:16,117][00832] Fps is (10 sec: 4097.3, 60 sec: 3618.1, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 933.5. Samples: 98898. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:16,123][00832] Avg episode reward: [(0, '4.651')]
[2024-12-13 21:45:18,766][11186] Updated weights for policy 0, policy_version 100 (0.0013)
[2024-12-13 21:45:21,118][00832] Fps is (10 sec: 3686.0, 60 sec: 3686.5, 300 sec: 3447.4). Total num frames: 413696. Throughput: 0: 922.0. Samples: 104112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:21,124][00832] Avg episode reward: [(0, '4.655')]
[2024-12-13 21:45:26,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 934.4. Samples: 106642. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:26,121][00832] Avg episode reward: [(0, '4.407')]
[2024-12-13 21:45:30,028][11186] Updated weights for policy 0, policy_version 110 (0.0012)
[2024-12-13 21:45:31,117][00832] Fps is (10 sec: 4096.4, 60 sec: 3754.7, 300 sec: 3497.4). Total num frames: 454656. Throughput: 0: 934.5. Samples: 112904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:31,121][00832] Avg episode reward: [(0, '4.384')]
[2024-12-13 21:45:36,120][00832] Fps is (10 sec: 3685.4, 60 sec: 3686.4, 300 sec: 3489.1). Total num frames: 471040. Throughput: 0: 925.1. Samples: 117980. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:36,122][00832] Avg episode reward: [(0, '4.481')]
[2024-12-13 21:45:41,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3481.6). Total num frames: 487424. Throughput: 0: 934.7. Samples: 120628. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:45:41,122][00832] Avg episode reward: [(0, '4.596')]
[2024-12-13 21:45:41,418][11186] Updated weights for policy 0, policy_version 120 (0.0012)
[2024-12-13 21:45:46,117][00832] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 936.0. Samples: 126936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:45:46,119][00832] Avg episode reward: [(0, '4.636')]
[2024-12-13 21:45:51,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 925.2. Samples: 131858. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:45:51,121][00832] Avg episode reward: [(0, '4.631')]
[2024-12-13 21:45:52,854][11186] Updated weights for policy 0, policy_version 130 (0.0012)
[2024-12-13 21:45:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3514.6). Total num frames: 544768. Throughput: 0: 934.0. Samples: 134572. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:45:56,119][00832] Avg episode reward: [(0, '4.350')]
[2024-12-13 21:46:01,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3532.8). Total num frames: 565248. Throughput: 0: 933.9. Samples: 140922. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:01,121][00832] Avg episode reward: [(0, '4.304')]
[2024-12-13 21:46:02,808][11186] Updated weights for policy 0, policy_version 140 (0.0013)
[2024-12-13 21:46:06,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 925.1. Samples: 145740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:06,119][00832] Avg episode reward: [(0, '4.338')]
[2024-12-13 21:46:11,117][00832] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 932.9. Samples: 148622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:11,121][00832] Avg episode reward: [(0, '4.399')]
[2024-12-13 21:46:13,968][11186] Updated weights for policy 0, policy_version 150 (0.0012)
[2024-12-13 21:46:16,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3557.7). Total num frames: 622592. Throughput: 0: 931.2. Samples: 154810. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:46:16,122][00832] Avg episode reward: [(0, '4.713')]
[2024-12-13 21:46:21,117][00832] Fps is (10 sec: 3686.5, 60 sec: 3686.5, 300 sec: 3527.1). Total num frames: 634880. Throughput: 0: 923.1. Samples: 159518. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:46:21,122][00832] Avg episode reward: [(0, '4.864')]
[2024-12-13 21:46:21,124][11173] Saving new best policy, reward=4.864!
[2024-12-13 21:46:25,615][11186] Updated weights for policy 0, policy_version 160 (0.0013)
[2024-12-13 21:46:26,119][00832] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3542.4). Total num frames: 655360. Throughput: 0: 930.3. Samples: 162492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:26,122][00832] Avg episode reward: [(0, '4.771')]
[2024-12-13 21:46:31,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3557.1). Total num frames: 675840. Throughput: 0: 931.3. Samples: 168844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:31,125][00832] Avg episode reward: [(0, '4.647')]
[2024-12-13 21:46:36,118][00832] Fps is (10 sec: 3687.0, 60 sec: 3686.5, 300 sec: 3549.9). Total num frames: 692224. Throughput: 0: 923.6. Samples: 173420. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:46:36,122][00832] Avg episode reward: [(0, '4.717')]
[2024-12-13 21:46:36,932][11186] Updated weights for policy 0, policy_version 170 (0.0012)
[2024-12-13 21:46:41,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 932.3. Samples: 176524. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:41,119][00832] Avg episode reward: [(0, '4.551')]
[2024-12-13 21:46:46,117][00832] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 932.8. Samples: 182896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:46,126][00832] Avg episode reward: [(0, '4.399')]
[2024-12-13 21:46:46,790][11186] Updated weights for policy 0, policy_version 180 (0.0017)
[2024-12-13 21:46:51,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 745472. Throughput: 0: 924.8. Samples: 187354. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:46:51,119][00832] Avg episode reward: [(0, '4.509')]
[2024-12-13 21:46:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 930.9. Samples: 190512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:46:56,120][00832] Avg episode reward: [(0, '4.593')]
[2024-12-13 21:46:56,129][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth...
[2024-12-13 21:46:58,213][11186] Updated weights for policy 0, policy_version 190 (0.0012)
[2024-12-13 21:47:01,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 932.7. Samples: 196780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:47:01,121][00832] Avg episode reward: [(0, '4.607')]
[2024-12-13 21:47:06,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 928.5. Samples: 201300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:47:06,119][00832] Avg episode reward: [(0, '4.564')]
[2024-12-13 21:47:09,413][11186] Updated weights for policy 0, policy_version 200 (0.0016)
[2024-12-13 21:47:11,118][00832] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3579.5). Total num frames: 823296. Throughput: 0: 933.2. Samples: 204484. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:47:11,121][00832] Avg episode reward: [(0, '4.595')]
[2024-12-13 21:47:16,117][00832] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 922.8. Samples: 210370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:47:16,122][00832] Avg episode reward: [(0, '4.653')]
[2024-12-13 21:47:21,117][00832] Fps is (10 sec: 2867.6, 60 sec: 3618.1, 300 sec: 3549.9). Total num frames: 851968. Throughput: 0: 899.5. Samples: 213896. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:47:21,129][00832] Avg episode reward: [(0, '4.519')]
[2024-12-13 21:47:22,759][11186] Updated weights for policy 0, policy_version 210 (0.0011)
[2024-12-13 21:47:26,117][00832] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3561.0). Total num frames: 872448. Throughput: 0: 890.4. Samples: 216594. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:47:26,120][00832] Avg episode reward: [(0, '4.706')]
[2024-12-13 21:47:31,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 891.0. Samples: 222992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:47:31,123][00832] Avg episode reward: [(0, '4.785')]
[2024-12-13 21:47:32,606][11186] Updated weights for policy 0, policy_version 220 (0.0012)
[2024-12-13 21:47:36,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3565.9). Total num frames: 909312. Throughput: 0: 898.9. Samples: 227804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:47:36,124][00832] Avg episode reward: [(0, '4.802')]
[2024-12-13 21:47:41,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3576.1). Total num frames: 929792. Throughput: 0: 894.1. Samples: 230746. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:47:41,119][00832] Avg episode reward: [(0, '5.083')]
[2024-12-13 21:47:41,122][11173] Saving new best policy, reward=5.083!
[2024-12-13 21:47:43,727][11186] Updated weights for policy 0, policy_version 230 (0.0012)
[2024-12-13 21:47:46,117][00832] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 895.6. Samples: 237082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:47:46,122][00832] Avg episode reward: [(0, '5.070')]
[2024-12-13 21:47:51,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3565.0). Total num frames: 962560. Throughput: 0: 895.4. Samples: 241592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:47:51,122][00832] Avg episode reward: [(0, '5.048')]
[2024-12-13 21:47:55,188][11186] Updated weights for policy 0, policy_version 240 (0.0016)
[2024-12-13 21:47:56,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3574.7). Total num frames: 983040. Throughput: 0: 893.8. Samples: 244704. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:47:56,119][00832] Avg episode reward: [(0, '4.827')]
[2024-12-13 21:48:01,117][00832] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3598.6). Total num frames: 1007616. Throughput: 0: 904.7. Samples: 251080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:01,119][00832] Avg episode reward: [(0, '5.167')]
[2024-12-13 21:48:01,127][11173] Saving new best policy, reward=5.167!
[2024-12-13 21:48:06,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3578.6). Total num frames: 1019904. Throughput: 0: 927.2. Samples: 255622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:06,119][00832] Avg episode reward: [(0, '5.189')]
[2024-12-13 21:48:06,134][11173] Saving new best policy, reward=5.189!
[2024-12-13 21:48:06,673][11186] Updated weights for policy 0, policy_version 250 (0.0013)
[2024-12-13 21:48:11,117][00832] Fps is (10 sec: 3276.7, 60 sec: 3618.2, 300 sec: 3587.5). Total num frames: 1040384. Throughput: 0: 935.3. Samples: 258684. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:48:11,126][00832] Avg episode reward: [(0, '4.931')]
[2024-12-13 21:48:16,118][00832] Fps is (10 sec: 4095.6, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 935.2. Samples: 265078. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:16,124][00832] Avg episode reward: [(0, '5.018')]
[2024-12-13 21:48:16,485][11186] Updated weights for policy 0, policy_version 260 (0.0012)
[2024-12-13 21:48:21,117][00832] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1077248. Throughput: 0: 928.0. Samples: 269566. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:21,123][00832] Avg episode reward: [(0, '5.367')]
[2024-12-13 21:48:21,125][11173] Saving new best policy, reward=5.367!
[2024-12-13 21:48:26,117][00832] Fps is (10 sec: 3686.7, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1097728. Throughput: 0: 931.4. Samples: 272660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:26,119][00832] Avg episode reward: [(0, '5.821')]
[2024-12-13 21:48:26,133][11173] Saving new best policy, reward=5.821!
[2024-12-13 21:48:27,909][11186] Updated weights for policy 0, policy_version 270 (0.0012)
[2024-12-13 21:48:31,118][00832] Fps is (10 sec: 4095.4, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 1118208. Throughput: 0: 931.5. Samples: 279002. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:31,123][00832] Avg episode reward: [(0, '6.148')]
[2024-12-13 21:48:31,126][11173] Saving new best policy, reward=6.148!
[2024-12-13 21:48:36,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1130496. Throughput: 0: 930.8. Samples: 283476. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:48:36,125][00832] Avg episode reward: [(0, '6.204')]
[2024-12-13 21:48:36,131][11173] Saving new best policy, reward=6.204!
[2024-12-13 21:48:39,389][11186] Updated weights for policy 0, policy_version 280 (0.0014)
[2024-12-13 21:48:41,117][00832] Fps is (10 sec: 3277.3, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1150976. Throughput: 0: 931.2. Samples: 286610. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:48:41,119][00832] Avg episode reward: [(0, '6.397')]
[2024-12-13 21:48:41,184][11173] Saving new best policy, reward=6.397!
[2024-12-13 21:48:46,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1171456. Throughput: 0: 929.7. Samples: 292918. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:46,123][00832] Avg episode reward: [(0, '6.361')]
[2024-12-13 21:48:50,777][11186] Updated weights for policy 0, policy_version 290 (0.0016)
[2024-12-13 21:48:51,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1187840. Throughput: 0: 928.0. Samples: 297384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:51,119][00832] Avg episode reward: [(0, '6.421')]
[2024-12-13 21:48:51,128][11173] Saving new best policy, reward=6.421!
[2024-12-13 21:48:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1208320. Throughput: 0: 928.7. Samples: 300476. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:48:56,122][00832] Avg episode reward: [(0, '6.102')]
[2024-12-13 21:48:56,132][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000295_1208320.pth...
[2024-12-13 21:48:56,219][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
[2024-12-13 21:49:01,074][11186] Updated weights for policy 0, policy_version 300 (0.0012)
[2024-12-13 21:49:01,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1228800. Throughput: 0: 925.2. Samples: 306712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:49:01,121][00832] Avg episode reward: [(0, '6.057')]
[2024-12-13 21:49:06,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1241088. Throughput: 0: 927.5. Samples: 311302. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:49:06,126][00832] Avg episode reward: [(0, '5.983')]
[2024-12-13 21:49:11,119][00832] Fps is (10 sec: 3685.7, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 1265664. Throughput: 0: 928.3. Samples: 314434. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:49:11,122][00832] Avg episode reward: [(0, '6.309')]
[2024-12-13 21:49:12,028][11186] Updated weights for policy 0, policy_version 310 (0.0011)
[2024-12-13 21:49:16,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3693.4). Total num frames: 1282048. Throughput: 0: 924.8. Samples: 320618. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:49:16,126][00832] Avg episode reward: [(0, '6.586')]
[2024-12-13 21:49:16,140][11173] Saving new best policy, reward=6.586!
[2024-12-13 21:49:21,117][00832] Fps is (10 sec: 3277.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1298432. Throughput: 0: 927.7. Samples: 325222. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:49:21,126][00832] Avg episode reward: [(0, '6.990')]
[2024-12-13 21:49:21,128][11173] Saving new best policy, reward=6.990!
[2024-12-13 21:49:23,588][11186] Updated weights for policy 0, policy_version 320 (0.0011)
[2024-12-13 21:49:26,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1318912. Throughput: 0: 927.7. Samples: 328358. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:49:26,121][00832] Avg episode reward: [(0, '6.336')]
[2024-12-13 21:49:31,119][00832] Fps is (10 sec: 4095.2, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 1339392. Throughput: 0: 921.9. Samples: 334406. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:49:31,121][00832] Avg episode reward: [(0, '5.822')]
[2024-12-13 21:49:34,859][11186] Updated weights for policy 0, policy_version 330 (0.0016)
[2024-12-13 21:49:36,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1355776. Throughput: 0: 931.3. Samples: 339294. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:49:36,121][00832] Avg episode reward: [(0, '5.981')]
[2024-12-13 21:49:41,117][00832] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1376256. Throughput: 0: 932.4. Samples: 342436. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:49:41,124][00832] Avg episode reward: [(0, '6.242')]
[2024-12-13 21:49:45,090][11186] Updated weights for policy 0, policy_version 340 (0.0013)
[2024-12-13 21:49:46,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1392640. Throughput: 0: 924.4. Samples: 348310. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:49:46,120][00832] Avg episode reward: [(0, '6.273')]
[2024-12-13 21:49:51,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1409024. Throughput: 0: 927.8. Samples: 353054. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:49:51,120][00832] Avg episode reward: [(0, '6.224')]
[2024-12-13 21:49:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 1429504. Throughput: 0: 928.4. Samples: 356212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:49:56,119][00832] Avg episode reward: [(0, '6.345')]
[2024-12-13 21:49:56,367][11186] Updated weights for policy 0, policy_version 350 (0.0015)
[2024-12-13 21:50:01,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.3). Total num frames: 1449984. Throughput: 0: 921.3. Samples: 362078. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:01,119][00832] Avg episode reward: [(0, '6.660')]
[2024-12-13 21:50:06,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1466368. Throughput: 0: 930.5. Samples: 367094. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:50:06,119][00832] Avg episode reward: [(0, '7.015')]
[2024-12-13 21:50:06,131][11173] Saving new best policy, reward=7.015!
[2024-12-13 21:50:07,701][11186] Updated weights for policy 0, policy_version 360 (0.0012)
[2024-12-13 21:50:11,118][00832] Fps is (10 sec: 3686.0, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 1486848. Throughput: 0: 930.1. Samples: 370212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:11,126][00832] Avg episode reward: [(0, '6.483')]
[2024-12-13 21:50:16,118][00832] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 1499136. Throughput: 0: 891.7. Samples: 374534. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:16,123][00832] Avg episode reward: [(0, '6.750')]
[2024-12-13 21:50:21,071][11186] Updated weights for policy 0, policy_version 370 (0.0014)
[2024-12-13 21:50:21,117][00832] Fps is (10 sec: 2867.5, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1515520. Throughput: 0: 884.2. Samples: 379084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:21,121][00832] Avg episode reward: [(0, '7.013')]
[2024-12-13 21:50:26,117][00832] Fps is (10 sec: 3686.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1536000. Throughput: 0: 883.9. Samples: 382210. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:26,121][00832] Avg episode reward: [(0, '7.349')]
[2024-12-13 21:50:26,133][11173] Saving new best policy, reward=7.349!
[2024-12-13 21:50:31,118][00832] Fps is (10 sec: 3685.9, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 1552384. Throughput: 0: 885.8. Samples: 388172. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:31,122][00832] Avg episode reward: [(0, '6.982')]
[2024-12-13 21:50:31,857][11186] Updated weights for policy 0, policy_version 380 (0.0013)
[2024-12-13 21:50:36,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 1568768. Throughput: 0: 888.3. Samples: 393026. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:36,119][00832] Avg episode reward: [(0, '6.664')]
[2024-12-13 21:50:41,117][00832] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 1589248. Throughput: 0: 889.1. Samples: 396220. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:41,121][00832] Avg episode reward: [(0, '6.756')]
[2024-12-13 21:50:42,149][11186] Updated weights for policy 0, policy_version 390 (0.0020)
[2024-12-13 21:50:46,120][00832] Fps is (10 sec: 4094.7, 60 sec: 3617.9, 300 sec: 3679.4). Total num frames: 1609728. Throughput: 0: 888.4. Samples: 402058. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:46,126][00832] Avg episode reward: [(0, '6.879')]
[2024-12-13 21:50:51,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1626112. Throughput: 0: 885.9. Samples: 406958. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:50:51,119][00832] Avg episode reward: [(0, '7.366')]
[2024-12-13 21:50:51,124][11173] Saving new best policy, reward=7.366!
[2024-12-13 21:50:53,762][11186] Updated weights for policy 0, policy_version 400 (0.0012)
[2024-12-13 21:50:56,117][00832] Fps is (10 sec: 3687.5, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1646592. Throughput: 0: 886.0. Samples: 410080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:50:56,120][00832] Avg episode reward: [(0, '7.225')]
[2024-12-13 21:50:56,134][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000402_1646592.pth...
[2024-12-13 21:50:56,240][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth
[2024-12-13 21:51:01,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 1662976. Throughput: 0: 919.0. Samples: 415890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:51:01,120][00832] Avg episode reward: [(0, '7.087')]
[2024-12-13 21:51:05,174][11186] Updated weights for policy 0, policy_version 410 (0.0014)
[2024-12-13 21:51:06,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 1679360. Throughput: 0: 929.4. Samples: 420906. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:51:06,119][00832] Avg episode reward: [(0, '7.382')]
[2024-12-13 21:51:06,182][11173] Saving new best policy, reward=7.382!
[2024-12-13 21:51:11,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3665.6). Total num frames: 1703936. Throughput: 0: 929.5. Samples: 424038. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:51:11,119][00832] Avg episode reward: [(0, '7.663')]
[2024-12-13 21:51:11,122][11173] Saving new best policy, reward=7.663!
[2024-12-13 21:51:15,908][11186] Updated weights for policy 0, policy_version 420 (0.0012)
[2024-12-13 21:51:16,118][00832] Fps is (10 sec: 4095.6, 60 sec: 3686.4, 300 sec: 3679.4). Total num frames: 1720320. Throughput: 0: 923.3. Samples: 429720. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:51:16,127][00832] Avg episode reward: [(0, '8.208')]
[2024-12-13 21:51:16,136][11173] Saving new best policy, reward=8.208!
[2024-12-13 21:51:21,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1736704. Throughput: 0: 925.0. Samples: 434652. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:51:21,119][00832] Avg episode reward: [(0, '8.748')]
[2024-12-13 21:51:21,121][11173] Saving new best policy, reward=8.748!
[2024-12-13 21:51:26,117][00832] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1757184. Throughput: 0: 923.2. Samples: 437762. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:51:26,122][00832] Avg episode reward: [(0, '8.723')]
[2024-12-13 21:51:26,537][11186] Updated weights for policy 0, policy_version 430 (0.0012)
[2024-12-13 21:51:31,118][00832] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1773568. Throughput: 0: 920.4. Samples: 443474. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:51:31,124][00832] Avg episode reward: [(0, '8.593')]
[2024-12-13 21:51:36,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1794048. Throughput: 0: 926.4. Samples: 448646. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:51:36,124][00832] Avg episode reward: [(0, '8.178')]
[2024-12-13 21:51:37,927][11186] Updated weights for policy 0, policy_version 440 (0.0021)
[2024-12-13 21:51:41,117][00832] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1814528. Throughput: 0: 927.4. Samples: 451814. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:51:41,121][00832] Avg episode reward: [(0, '8.342')]
[2024-12-13 21:51:46,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3679.5). Total num frames: 1830912. Throughput: 0: 922.0. Samples: 457380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:51:46,121][00832] Avg episode reward: [(0, '7.756')]
[2024-12-13 21:51:49,486][11186] Updated weights for policy 0, policy_version 450 (0.0017)
[2024-12-13 21:51:51,118][00832] Fps is (10 sec: 3276.4, 60 sec: 3686.3, 300 sec: 3651.7). Total num frames: 1847296. Throughput: 0: 927.3. Samples: 462634. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:51:51,126][00832] Avg episode reward: [(0, '7.680')]
[2024-12-13 21:51:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1867776. Throughput: 0: 929.2. Samples: 465852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:51:56,119][00832] Avg episode reward: [(0, '7.927')]
[2024-12-13 21:52:00,110][11186] Updated weights for policy 0, policy_version 460 (0.0012)
[2024-12-13 21:52:01,117][00832] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1884160. Throughput: 0: 922.8. Samples: 471246. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:52:01,120][00832] Avg episode reward: [(0, '7.865')]
[2024-12-13 21:52:06,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 1904640. Throughput: 0: 935.4. Samples: 476746. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:52:06,122][00832] Avg episode reward: [(0, '8.057')]
[2024-12-13 21:52:10,543][11186] Updated weights for policy 0, policy_version 470 (0.0012)
[2024-12-13 21:52:11,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1925120. Throughput: 0: 936.8. Samples: 479918. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:52:11,119][00832] Avg episode reward: [(0, '8.714')]
[2024-12-13 21:52:16,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 1941504. Throughput: 0: 925.2. Samples: 485106. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:52:16,129][00832] Avg episode reward: [(0, '9.025')]
[2024-12-13 21:52:16,140][11173] Saving new best policy, reward=9.025!
[2024-12-13 21:52:21,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1961984. Throughput: 0: 934.4. Samples: 490694. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:52:21,123][00832] Avg episode reward: [(0, '9.745')]
[2024-12-13 21:52:21,126][11173] Saving new best policy, reward=9.745!
[2024-12-13 21:52:22,107][11186] Updated weights for policy 0, policy_version 480 (0.0014)
[2024-12-13 21:52:26,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 1982464. Throughput: 0: 932.6. Samples: 493780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:52:26,119][00832] Avg episode reward: [(0, '9.116')]
[2024-12-13 21:52:31,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 1994752. Throughput: 0: 919.5. Samples: 498756. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:52:31,121][00832] Avg episode reward: [(0, '9.042')]
[2024-12-13 21:52:33,502][11186] Updated weights for policy 0, policy_version 490 (0.0017)
[2024-12-13 21:52:36,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2015232. Throughput: 0: 934.3. Samples: 504676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:52:36,120][00832] Avg episode reward: [(0, '8.096')]
[2024-12-13 21:52:41,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2035712. Throughput: 0: 933.1. Samples: 507842. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:52:41,123][00832] Avg episode reward: [(0, '8.923')]
[2024-12-13 21:52:44,326][11186] Updated weights for policy 0, policy_version 500 (0.0012)
[2024-12-13 21:52:46,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2052096. Throughput: 0: 920.1. Samples: 512652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:52:46,119][00832] Avg episode reward: [(0, '9.018')]
[2024-12-13 21:52:51,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2072576. Throughput: 0: 930.6. Samples: 518624. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:52:51,120][00832] Avg episode reward: [(0, '9.210')]
[2024-12-13 21:52:54,711][11186] Updated weights for policy 0, policy_version 510 (0.0015)
[2024-12-13 21:52:56,117][00832] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2093056. Throughput: 0: 930.3. Samples: 521782. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:52:56,123][00832] Avg episode reward: [(0, '9.069')]
[2024-12-13 21:52:56,139][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_2093056.pth...
[2024-12-13 21:52:56,139][00832] Components not started: RolloutWorker_w0, RolloutWorker_w1, RolloutWorker_w3, RolloutWorker_w4, RolloutWorker_w5, wait_time=600.0 seconds
[2024-12-13 21:52:56,314][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000295_1208320.pth
[2024-12-13 21:53:01,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2105344. Throughput: 0: 916.6. Samples: 526354. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:01,122][00832] Avg episode reward: [(0, '8.946')]
[2024-12-13 21:53:06,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2125824. Throughput: 0: 912.4. Samples: 531750. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:06,126][00832] Avg episode reward: [(0, '9.052')]
[2024-12-13 21:53:07,525][11186] Updated weights for policy 0, policy_version 520 (0.0020)
[2024-12-13 21:53:11,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2138112. Throughput: 0: 888.0. Samples: 533740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:11,123][00832] Avg episode reward: [(0, '9.819')]
[2024-12-13 21:53:11,125][11173] Saving new best policy, reward=9.819!
[2024-12-13 21:53:16,117][00832] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2154496. Throughput: 0: 879.1. Samples: 538314. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:16,122][00832] Avg episode reward: [(0, '9.970')]
[2024-12-13 21:53:16,138][11173] Saving new best policy, reward=9.970!
[2024-12-13 21:53:19,446][11186] Updated weights for policy 0, policy_version 530 (0.0019)
[2024-12-13 21:53:21,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2174976. Throughput: 0: 885.3. Samples: 544516. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:53:21,125][00832] Avg episode reward: [(0, '10.232')]
[2024-12-13 21:53:21,127][11173] Saving new best policy, reward=10.232!
[2024-12-13 21:53:26,123][00832] Fps is (10 sec: 4093.4, 60 sec: 3549.5, 300 sec: 3651.6). Total num frames: 2195456. Throughput: 0: 883.3. Samples: 547598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:26,126][00832] Avg episode reward: [(0, '9.948')]
[2024-12-13 21:53:30,893][11186] Updated weights for policy 0, policy_version 540 (0.0012)
[2024-12-13 21:53:31,118][00832] Fps is (10 sec: 3686.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2211840. Throughput: 0: 877.7. Samples: 552148. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:31,123][00832] Avg episode reward: [(0, '9.612')]
[2024-12-13 21:53:36,117][00832] Fps is (10 sec: 3688.7, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2232320. Throughput: 0: 888.0. Samples: 558584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:36,122][00832] Avg episode reward: [(0, '10.443')]
[2024-12-13 21:53:36,130][11173] Saving new best policy, reward=10.443!
[2024-12-13 21:53:41,119][00832] Fps is (10 sec: 3685.9, 60 sec: 3549.7, 300 sec: 3651.7). Total num frames: 2248704. Throughput: 0: 886.6. Samples: 561682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:41,127][00832] Avg episode reward: [(0, '11.082')]
[2024-12-13 21:53:41,170][11173] Saving new best policy, reward=11.082!
[2024-12-13 21:53:41,166][11186] Updated weights for policy 0, policy_version 550 (0.0012)
[2024-12-13 21:53:46,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2265088. Throughput: 0: 883.6. Samples: 566114. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:46,119][00832] Avg episode reward: [(0, '11.522')]
[2024-12-13 21:53:46,138][11173] Saving new best policy, reward=11.522!
[2024-12-13 21:53:51,117][00832] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2285568. Throughput: 0: 901.0. Samples: 572296. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:53:51,119][00832] Avg episode reward: [(0, '11.555')]
[2024-12-13 21:53:51,122][11173] Saving new best policy, reward=11.555!
[2024-12-13 21:53:52,415][11186] Updated weights for policy 0, policy_version 560 (0.0015)
[2024-12-13 21:53:56,117][00832] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2306048. Throughput: 0: 926.4. Samples: 575430. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:53:56,120][00832] Avg episode reward: [(0, '11.450')]
[2024-12-13 21:54:01,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2322432. Throughput: 0: 925.3. Samples: 579954. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:54:01,122][00832] Avg episode reward: [(0, '10.628')]
[2024-12-13 21:54:03,705][11186] Updated weights for policy 0, policy_version 570 (0.0011)
[2024-12-13 21:54:06,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2342912. Throughput: 0: 928.3. Samples: 586290. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:54:06,122][00832] Avg episode reward: [(0, '10.767')]
[2024-12-13 21:54:11,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2359296. Throughput: 0: 926.8. Samples: 589298. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:54:11,124][00832] Avg episode reward: [(0, '10.498')]
[2024-12-13 21:54:15,325][11186] Updated weights for policy 0, policy_version 580 (0.0012)
[2024-12-13 21:54:16,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2375680. Throughput: 0: 926.9. Samples: 593856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:54:16,120][00832] Avg episode reward: [(0, '10.627')]
[2024-12-13 21:54:21,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2396160. Throughput: 0: 920.6. Samples: 600012. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:54:21,122][00832] Avg episode reward: [(0, '10.695')]
[2024-12-13 21:54:26,119][00832] Fps is (10 sec: 3686.0, 60 sec: 3618.4, 300 sec: 3637.8). Total num frames: 2412544. Throughput: 0: 916.1. Samples: 602904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:54:26,123][00832] Avg episode reward: [(0, '10.636')]
[2024-12-13 21:54:26,144][11186] Updated weights for policy 0, policy_version 590 (0.0012)
[2024-12-13 21:54:31,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3651.7). Total num frames: 2433024. Throughput: 0: 923.5. Samples: 607670. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:54:31,124][00832] Avg episode reward: [(0, '10.868')]
[2024-12-13 21:54:36,117][00832] Fps is (10 sec: 4096.6, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2453504. Throughput: 0: 927.5. Samples: 614034. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:54:36,124][00832] Avg episode reward: [(0, '11.023')]
[2024-12-13 21:54:36,503][11186] Updated weights for policy 0, policy_version 600 (0.0012)
[2024-12-13 21:54:41,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3651.7). Total num frames: 2469888. Throughput: 0: 917.6. Samples: 616724. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:54:41,119][00832] Avg episode reward: [(0, '10.144')]
[2024-12-13 21:54:46,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2486272. Throughput: 0: 929.1. Samples: 621762. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:54:46,119][00832] Avg episode reward: [(0, '10.370')]
[2024-12-13 21:54:47,925][11186] Updated weights for policy 0, policy_version 610 (0.0013)
[2024-12-13 21:54:51,117][00832] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2510848. Throughput: 0: 927.7. Samples: 628036. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:54:51,122][00832] Avg episode reward: [(0, '10.905')]
[2024-12-13 21:54:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2523136. Throughput: 0: 916.5. Samples: 630540. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:54:56,121][00832] Avg episode reward: [(0, '11.444')]
[2024-12-13 21:54:56,149][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000617_2527232.pth...
[2024-12-13 21:54:56,255][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000402_1646592.pth
[2024-12-13 21:54:59,416][11186] Updated weights for policy 0, policy_version 620 (0.0013)
[2024-12-13 21:55:01,119][00832] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3651.7). Total num frames: 2543616. Throughput: 0: 932.0. Samples: 635796. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:55:01,123][00832] Avg episode reward: [(0, '11.943')]
[2024-12-13 21:55:01,136][11173] Saving new best policy, reward=11.943!
[2024-12-13 21:55:06,119][00832] Fps is (10 sec: 4095.3, 60 sec: 3686.3, 300 sec: 3651.7). Total num frames: 2564096. Throughput: 0: 935.4. Samples: 642106. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:55:06,122][00832] Avg episode reward: [(0, '13.202')]
[2024-12-13 21:55:06,130][11173] Saving new best policy, reward=13.202!
[2024-12-13 21:55:10,416][11186] Updated weights for policy 0, policy_version 630 (0.0012)
[2024-12-13 21:55:11,120][00832] Fps is (10 sec: 3686.1, 60 sec: 3686.2, 300 sec: 3665.5). Total num frames: 2580480. Throughput: 0: 921.0. Samples: 644350. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:55:11,123][00832] Avg episode reward: [(0, '13.904')]
[2024-12-13 21:55:11,124][11173] Saving new best policy, reward=13.904!
[2024-12-13 21:55:16,117][00832] Fps is (10 sec: 3687.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2600960. Throughput: 0: 934.2. Samples: 649708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:55:16,121][00832] Avg episode reward: [(0, '15.274')]
[2024-12-13 21:55:16,131][11173] Saving new best policy, reward=15.274!
[2024-12-13 21:55:20,623][11186] Updated weights for policy 0, policy_version 640 (0.0011)
[2024-12-13 21:55:21,118][00832] Fps is (10 sec: 4096.8, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 2621440. Throughput: 0: 932.1. Samples: 655978. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:55:21,122][00832] Avg episode reward: [(0, '15.414')]
[2024-12-13 21:55:21,126][11173] Saving new best policy, reward=15.414!
[2024-12-13 21:55:26,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3665.6). Total num frames: 2633728. Throughput: 0: 918.0. Samples: 658036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:55:26,125][00832] Avg episode reward: [(0, '14.927')]
[2024-12-13 21:55:31,117][00832] Fps is (10 sec: 3686.7, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2658304. Throughput: 0: 931.7. Samples: 663688. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:55:31,119][00832] Avg episode reward: [(0, '14.126')]
[2024-12-13 21:55:32,011][11186] Updated weights for policy 0, policy_version 650 (0.0012)
[2024-12-13 21:55:36,117][00832] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3693.3). Total num frames: 2678784. Throughput: 0: 931.9. Samples: 669970. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:55:36,122][00832] Avg episode reward: [(0, '14.258')]
[2024-12-13 21:55:41,117][00832] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2691072. Throughput: 0: 920.4. Samples: 671956. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:55:41,119][00832] Avg episode reward: [(0, '14.874')]
[2024-12-13 21:55:43,297][11186] Updated weights for policy 0, policy_version 660 (0.0012)
[2024-12-13 21:55:46,119][00832] Fps is (10 sec: 3276.1, 60 sec: 3754.5, 300 sec: 3679.4). Total num frames: 2711552. Throughput: 0: 935.9. Samples: 677912. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:55:46,125][00832] Avg episode reward: [(0, '17.349')]
[2024-12-13 21:55:46,179][11173] Saving new best policy, reward=17.349!
[2024-12-13 21:55:51,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2732032. Throughput: 0: 925.3. Samples: 683742. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:55:51,122][00832] Avg episode reward: [(0, '18.704')]
[2024-12-13 21:55:51,126][11173] Saving new best policy, reward=18.704!
[2024-12-13 21:55:55,009][11186] Updated weights for policy 0, policy_version 670 (0.0013)
[2024-12-13 21:55:56,117][00832] Fps is (10 sec: 3687.1, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2748416. Throughput: 0: 917.4. Samples: 685632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:55:56,123][00832] Avg episode reward: [(0, '19.356')]
[2024-12-13 21:55:56,131][11173] Saving new best policy, reward=19.356!
[2024-12-13 21:56:01,117][00832] Fps is (10 sec: 2867.2, 60 sec: 3618.3, 300 sec: 3665.6). Total num frames: 2760704. Throughput: 0: 908.6. Samples: 690596. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:56:01,122][00832] Avg episode reward: [(0, '19.829')]
[2024-12-13 21:56:01,127][11173] Saving new best policy, reward=19.829!
[2024-12-13 21:56:06,118][00832] Fps is (10 sec: 3276.4, 60 sec: 3618.2, 300 sec: 3651.7). Total num frames: 2781184. Throughput: 0: 880.0. Samples: 695580. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:56:06,120][00832] Avg episode reward: [(0, '19.146')]
[2024-12-13 21:56:07,316][11186] Updated weights for policy 0, policy_version 680 (0.0030)
[2024-12-13 21:56:11,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3651.7). Total num frames: 2797568. Throughput: 0: 878.7. Samples: 697578. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:56:11,125][00832] Avg episode reward: [(0, '18.378')]
[2024-12-13 21:56:16,119][00832] Fps is (10 sec: 3686.0, 60 sec: 3618.0, 300 sec: 3665.5). Total num frames: 2818048. Throughput: 0: 891.7. Samples: 703818. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:56:16,121][00832] Avg episode reward: [(0, '17.284')]
[2024-12-13 21:56:17,755][11186] Updated weights for policy 0, policy_version 690 (0.0012)
[2024-12-13 21:56:21,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2834432. Throughput: 0: 879.8. Samples: 709560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:56:21,122][00832] Avg episode reward: [(0, '17.206')]
[2024-12-13 21:56:26,117][00832] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2850816. Throughput: 0: 878.5. Samples: 711488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:56:26,121][00832] Avg episode reward: [(0, '17.357')]
[2024-12-13 21:56:29,186][11186] Updated weights for policy 0, policy_version 700 (0.0022)
[2024-12-13 21:56:31,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2875392. Throughput: 0: 888.7. Samples: 717902. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:56:31,119][00832] Avg episode reward: [(0, '17.517')]
[2024-12-13 21:56:36,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2891776. Throughput: 0: 883.9. Samples: 723518. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:56:36,123][00832] Avg episode reward: [(0, '18.612')]
[2024-12-13 21:56:40,458][11186] Updated weights for policy 0, policy_version 710 (0.0012)
[2024-12-13 21:56:41,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2908160. Throughput: 0: 889.8. Samples: 725672. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:56:41,125][00832] Avg episode reward: [(0, '19.058')]
[2024-12-13 21:56:46,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3665.6). Total num frames: 2928640. Throughput: 0: 921.3. Samples: 732054. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:56:46,119][00832] Avg episode reward: [(0, '19.054')]
[2024-12-13 21:56:51,117][00832] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 2945024. Throughput: 0: 930.8. Samples: 737464. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:56:51,120][00832] Avg episode reward: [(0, '18.949')]
[2024-12-13 21:56:51,127][11186] Updated weights for policy 0, policy_version 720 (0.0012)
[2024-12-13 21:56:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2965504. Throughput: 0: 936.6. Samples: 739726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:56:56,121][00832] Avg episode reward: [(0, '19.252')]
[2024-12-13 21:56:56,134][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000724_2965504.pth...
[2024-12-13 21:56:56,237][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000511_2093056.pth
[2024-12-13 21:57:01,117][00832] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2985984. Throughput: 0: 937.5. Samples: 746004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:01,121][00832] Avg episode reward: [(0, '18.907')]
[2024-12-13 21:57:01,666][11186] Updated weights for policy 0, policy_version 730 (0.0015)
[2024-12-13 21:57:06,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3651.7). Total num frames: 3002368. Throughput: 0: 928.5. Samples: 751342. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:57:06,119][00832] Avg episode reward: [(0, '19.202')]
[2024-12-13 21:57:11,122][00832] Fps is (10 sec: 3684.5, 60 sec: 3754.4, 300 sec: 3665.5). Total num frames: 3022848. Throughput: 0: 940.1. Samples: 753798. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:11,124][00832] Avg episode reward: [(0, '18.806')]
[2024-12-13 21:57:12,979][11186] Updated weights for policy 0, policy_version 740 (0.0011)
[2024-12-13 21:57:16,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3665.6). Total num frames: 3043328. Throughput: 0: 938.4. Samples: 760128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:16,123][00832] Avg episode reward: [(0, '19.585')]
[2024-12-13 21:57:21,117][00832] Fps is (10 sec: 3688.3, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 3059712. Throughput: 0: 927.4. Samples: 765250. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:21,121][00832] Avg episode reward: [(0, '19.770')]
[2024-12-13 21:57:24,465][11186] Updated weights for policy 0, policy_version 750 (0.0012)
[2024-12-13 21:57:26,119][00832] Fps is (10 sec: 3276.1, 60 sec: 3754.5, 300 sec: 3665.5). Total num frames: 3076096. Throughput: 0: 936.2. Samples: 767802. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:26,126][00832] Avg episode reward: [(0, '19.146')]
[2024-12-13 21:57:31,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3100672. Throughput: 0: 936.7. Samples: 774204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:31,124][00832] Avg episode reward: [(0, '19.460')]
[2024-12-13 21:57:34,562][11186] Updated weights for policy 0, policy_version 760 (0.0012)
[2024-12-13 21:57:36,117][00832] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3112960. Throughput: 0: 930.2. Samples: 779322. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:36,120][00832] Avg episode reward: [(0, '20.651')]
[2024-12-13 21:57:36,126][11173] Saving new best policy, reward=20.651!
[2024-12-13 21:57:41,118][00832] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3665.6). Total num frames: 3133440. Throughput: 0: 937.8. Samples: 781926. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 21:57:41,120][00832] Avg episode reward: [(0, '21.472')]
[2024-12-13 21:57:41,125][11173] Saving new best policy, reward=21.472!
[2024-12-13 21:57:45,387][11186] Updated weights for policy 0, policy_version 770 (0.0013)
[2024-12-13 21:57:46,117][00832] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3153920. Throughput: 0: 939.5. Samples: 788282. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:46,127][00832] Avg episode reward: [(0, '22.282')]
[2024-12-13 21:57:46,134][11173] Saving new best policy, reward=22.282!
[2024-12-13 21:57:51,117][00832] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 3170304. Throughput: 0: 926.1. Samples: 793018. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:57:51,119][00832] Avg episode reward: [(0, '23.349')]
[2024-12-13 21:57:51,123][11173] Saving new best policy, reward=23.349!
[2024-12-13 21:57:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3190784. Throughput: 0: 933.3. Samples: 795790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:57:56,120][00832] Avg episode reward: [(0, '23.150')]
[2024-12-13 21:57:56,961][11186] Updated weights for policy 0, policy_version 780 (0.0013)
[2024-12-13 21:58:01,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3211264. Throughput: 0: 933.2. Samples: 802122. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:58:01,119][00832] Avg episode reward: [(0, '23.558')]
[2024-12-13 21:58:01,123][11173] Saving new best policy, reward=23.558!
[2024-12-13 21:58:06,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3223552. Throughput: 0: 921.6. Samples: 806722. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:58:06,119][00832] Avg episode reward: [(0, '22.160')]
[2024-12-13 21:58:08,495][11186] Updated weights for policy 0, policy_version 790 (0.0018)
[2024-12-13 21:58:11,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.7, 300 sec: 3693.3). Total num frames: 3244032. Throughput: 0: 929.9. Samples: 809644. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:58:11,126][00832] Avg episode reward: [(0, '22.210')]
[2024-12-13 21:58:16,119][00832] Fps is (10 sec: 4095.3, 60 sec: 3686.3, 300 sec: 3693.3). Total num frames: 3264512. Throughput: 0: 927.1. Samples: 815926. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:58:16,121][00832] Avg episode reward: [(0, '20.914')]
[2024-12-13 21:58:19,284][11186] Updated weights for policy 0, policy_version 800 (0.0012)
[2024-12-13 21:58:21,119][00832] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3679.5). Total num frames: 3280896. Throughput: 0: 914.6. Samples: 820482. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:58:21,121][00832] Avg episode reward: [(0, '20.282')]
[2024-12-13 21:58:26,117][00832] Fps is (10 sec: 3687.0, 60 sec: 3754.8, 300 sec: 3693.4). Total num frames: 3301376. Throughput: 0: 923.1. Samples: 823466. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:58:26,119][00832] Avg episode reward: [(0, '20.157')]
[2024-12-13 21:58:29,931][11186] Updated weights for policy 0, policy_version 810 (0.0012)
[2024-12-13 21:58:31,117][00832] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3321856. Throughput: 0: 921.9. Samples: 829768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:58:31,120][00832] Avg episode reward: [(0, '19.612')]
[2024-12-13 21:58:36,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3334144. Throughput: 0: 918.4. Samples: 834344. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:58:36,119][00832] Avg episode reward: [(0, '19.435')]
[2024-12-13 21:58:41,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3693.3). Total num frames: 3354624. Throughput: 0: 924.4. Samples: 837388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:58:41,120][00832] Avg episode reward: [(0, '18.770')]
[2024-12-13 21:58:41,457][11186] Updated weights for policy 0, policy_version 820 (0.0016)
[2024-12-13 21:58:46,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3375104. Throughput: 0: 923.5. Samples: 843678. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:58:46,122][00832] Avg episode reward: [(0, '19.203')]
[2024-12-13 21:58:51,117][00832] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3391488. Throughput: 0: 919.5. Samples: 848100. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:58:51,120][00832] Avg episode reward: [(0, '20.141')]
[2024-12-13 21:58:53,379][11186] Updated weights for policy 0, policy_version 830 (0.0022)
[2024-12-13 21:58:56,117][00832] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3403776. Throughput: 0: 909.0. Samples: 850550. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:58:56,120][00832] Avg episode reward: [(0, '21.847')]
[2024-12-13 21:58:56,137][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000831_3403776.pth...
[2024-12-13 21:58:56,270][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000617_2527232.pth
[2024-12-13 21:59:01,117][00832] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3424256. Throughput: 0: 886.0. Samples: 855796. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:59:01,122][00832] Avg episode reward: [(0, '22.568')]
[2024-12-13 21:59:06,018][11186] Updated weights for policy 0, policy_version 840 (0.0012)
[2024-12-13 21:59:06,117][00832] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3440640. Throughput: 0: 886.3. Samples: 860362. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:59:06,123][00832] Avg episode reward: [(0, '23.369')]
[2024-12-13 21:59:11,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3461120. Throughput: 0: 890.8. Samples: 863552. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:59:11,124][00832] Avg episode reward: [(0, '24.336')]
[2024-12-13 21:59:11,126][11173] Saving new best policy, reward=24.336!
[2024-12-13 21:59:15,752][11186] Updated weights for policy 0, policy_version 850 (0.0012)
[2024-12-13 21:59:16,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3679.5). Total num frames: 3481600. Throughput: 0: 891.6. Samples: 869892. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:59:16,121][00832] Avg episode reward: [(0, '25.882')]
[2024-12-13 21:59:16,137][11173] Saving new best policy, reward=25.882!
[2024-12-13 21:59:21,117][00832] Fps is (10 sec: 3276.7, 60 sec: 3550.0, 300 sec: 3665.6). Total num frames: 3493888. Throughput: 0: 890.1. Samples: 874398. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:59:21,121][00832] Avg episode reward: [(0, '25.940')]
[2024-12-13 21:59:21,124][11173] Saving new best policy, reward=25.940!
[2024-12-13 21:59:26,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3514368. Throughput: 0: 890.2. Samples: 877446. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:59:26,121][00832] Avg episode reward: [(0, '25.762')]
[2024-12-13 21:59:27,234][11186] Updated weights for policy 0, policy_version 860 (0.0012)
[2024-12-13 21:59:31,117][00832] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3534848. Throughput: 0: 891.7. Samples: 883804. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:59:31,125][00832] Avg episode reward: [(0, '24.949')]
[2024-12-13 21:59:36,119][00832] Fps is (10 sec: 3685.6, 60 sec: 3618.0, 300 sec: 3665.5). Total num frames: 3551232. Throughput: 0: 895.6. Samples: 888402. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:59:36,123][00832] Avg episode reward: [(0, '24.576')]
[2024-12-13 21:59:38,555][11186] Updated weights for policy 0, policy_version 870 (0.0012)
[2024-12-13 21:59:41,119][00832] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 3571712. Throughput: 0: 912.5. Samples: 891614. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:59:41,121][00832] Avg episode reward: [(0, '22.086')]
[2024-12-13 21:59:46,117][00832] Fps is (10 sec: 4096.9, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3592192. Throughput: 0: 932.5. Samples: 897760. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:59:46,123][00832] Avg episode reward: [(0, '21.452')]
[2024-12-13 21:59:49,989][11186] Updated weights for policy 0, policy_version 880 (0.0012)
[2024-12-13 21:59:51,117][00832] Fps is (10 sec: 3687.2, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3608576. Throughput: 0: 935.2. Samples: 902444. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 21:59:51,124][00832] Avg episode reward: [(0, '22.080')]
[2024-12-13 21:59:56,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3629056. Throughput: 0: 933.9. Samples: 905578. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 21:59:56,120][00832] Avg episode reward: [(0, '21.917')]
[2024-12-13 21:59:59,849][11186] Updated weights for policy 0, policy_version 890 (0.0012)
[2024-12-13 22:00:01,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3645440. Throughput: 0: 928.1. Samples: 911658. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 22:00:01,121][00832] Avg episode reward: [(0, '23.197')]
[2024-12-13 22:00:06,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3665920. Throughput: 0: 938.8. Samples: 916644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 22:00:06,119][00832] Avg episode reward: [(0, '23.819')]
[2024-12-13 22:00:10,839][11186] Updated weights for policy 0, policy_version 900 (0.0012)
[2024-12-13 22:00:11,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3686400. Throughput: 0: 941.4. Samples: 919810. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:00:11,122][00832] Avg episode reward: [(0, '22.917')]
[2024-12-13 22:00:16,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3702784. Throughput: 0: 932.8. Samples: 925778. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 22:00:16,123][00832] Avg episode reward: [(0, '21.944')]
[2024-12-13 22:00:21,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3719168. Throughput: 0: 943.0. Samples: 930836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:00:21,119][00832] Avg episode reward: [(0, '22.354')]
[2024-12-13 22:00:22,262][11186] Updated weights for policy 0, policy_version 910 (0.0013)
[2024-12-13 22:00:26,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3739648. Throughput: 0: 941.7. Samples: 933990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:00:26,119][00832] Avg episode reward: [(0, '21.238')]
[2024-12-13 22:00:31,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 3760128. Throughput: 0: 932.4. Samples: 939718. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:00:31,123][00832] Avg episode reward: [(0, '19.902')]
[2024-12-13 22:00:33,502][11186] Updated weights for policy 0, policy_version 920 (0.0014)
[2024-12-13 22:00:36,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3679.5). Total num frames: 3776512. Throughput: 0: 941.3. Samples: 944804. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 22:00:36,124][00832] Avg episode reward: [(0, '20.494')]
[2024-12-13 22:00:41,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3679.5). Total num frames: 3796992. Throughput: 0: 941.5. Samples: 947946. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:00:41,119][00832] Avg episode reward: [(0, '22.372')]
[2024-12-13 22:00:43,623][11186] Updated weights for policy 0, policy_version 930 (0.0016)
[2024-12-13 22:00:46,119][00832] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3665.5). Total num frames: 3813376. Throughput: 0: 928.3. Samples: 953434. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:00:46,124][00832] Avg episode reward: [(0, '22.264')]
[2024-12-13 22:00:51,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3829760. Throughput: 0: 932.4. Samples: 958600. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
[2024-12-13 22:00:51,120][00832] Avg episode reward: [(0, '23.288')]
[2024-12-13 22:00:55,018][11186] Updated weights for policy 0, policy_version 940 (0.0012)
[2024-12-13 22:00:56,117][00832] Fps is (10 sec: 4096.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3854336. Throughput: 0: 929.4. Samples: 961634. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 22:00:56,120][00832] Avg episode reward: [(0, '24.682')]
[2024-12-13 22:00:56,135][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth...
[2024-12-13 22:00:56,239][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000724_2965504.pth
[2024-12-13 22:01:01,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3866624. Throughput: 0: 913.5. Samples: 966886. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:01:01,123][00832] Avg episode reward: [(0, '26.220')]
[2024-12-13 22:01:01,125][11173] Saving new best policy, reward=26.220!
[2024-12-13 22:01:06,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3887104. Throughput: 0: 919.2. Samples: 972198. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:01:06,119][00832] Avg episode reward: [(0, '26.356')]
[2024-12-13 22:01:06,135][11173] Saving new best policy, reward=26.356!
[2024-12-13 22:01:06,790][11186] Updated weights for policy 0, policy_version 950 (0.0012)
[2024-12-13 22:01:11,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 3907584. Throughput: 0: 918.8. Samples: 975338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:01:11,119][00832] Avg episode reward: [(0, '26.791')]
[2024-12-13 22:01:11,123][11173] Saving new best policy, reward=26.791!
[2024-12-13 22:01:16,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3919872. Throughput: 0: 903.9. Samples: 980394. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:01:16,123][00832] Avg episode reward: [(0, '25.926')]
[2024-12-13 22:01:18,502][11186] Updated weights for policy 0, policy_version 960 (0.0016)
[2024-12-13 22:01:21,117][00832] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3940352. Throughput: 0: 915.4. Samples: 985998. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 22:01:21,122][00832] Avg episode reward: [(0, '24.460')]
[2024-12-13 22:01:26,117][00832] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3960832. Throughput: 0: 913.4. Samples: 989048. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-13 22:01:26,120][00832] Avg episode reward: [(0, '24.187')]
[2024-12-13 22:01:29,198][11186] Updated weights for policy 0, policy_version 970 (0.0012)
[2024-12-13 22:01:31,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 3977216. Throughput: 0: 901.0. Samples: 993978. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-13 22:01:31,120][00832] Avg episode reward: [(0, '24.600')]
[2024-12-13 22:01:36,117][00832] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3997696. Throughput: 0: 913.8. Samples: 999720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-13 22:01:36,122][00832] Avg episode reward: [(0, '24.609')]
[2024-12-13 22:01:37,895][11173] Stopping Batcher_0...
[2024-12-13 22:01:37,896][11173] Loop batcher_evt_loop terminating...
[2024-12-13 22:01:37,895][00832] Component Batcher_0 stopped!
[2024-12-13 22:01:37,901][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-13 22:01:37,898][00832] Component RolloutWorker_w0 process died already! Don't wait for it.
[2024-12-13 22:01:37,907][00832] Component RolloutWorker_w1 process died already! Don't wait for it.
[2024-12-13 22:01:37,910][00832] Component RolloutWorker_w3 process died already! Don't wait for it.
[2024-12-13 22:01:37,912][00832] Component RolloutWorker_w4 process died already! Don't wait for it.
[2024-12-13 22:01:37,918][00832] Component RolloutWorker_w5 process died already! Don't wait for it.
[2024-12-13 22:01:37,942][11186] Weights refcount: 2 0
[2024-12-13 22:01:37,947][00832] Component InferenceWorker_p0-w0 stopped!
[2024-12-13 22:01:37,946][11186] Stopping InferenceWorker_p0-w0...
[2024-12-13 22:01:37,951][11186] Loop inference_proc0-0_evt_loop terminating...
[2024-12-13 22:01:38,009][11173] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000831_3403776.pth
[2024-12-13 22:01:38,019][11173] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-13 22:01:38,174][00832] Component RolloutWorker_w7 stopped!
[2024-12-13 22:01:38,176][11194] Stopping RolloutWorker_w7...
[2024-12-13 22:01:38,181][11194] Loop rollout_proc7_evt_loop terminating...
[2024-12-13 22:01:38,215][11173] Stopping LearnerWorker_p0...
[2024-12-13 22:01:38,215][11173] Loop learner_proc0_evt_loop terminating...
[2024-12-13 22:01:38,215][00832] Component LearnerWorker_p0 stopped!
[2024-12-13 22:01:38,287][00832] Component RolloutWorker_w6 stopped!
[2024-12-13 22:01:38,287][11193] Stopping RolloutWorker_w6...
[2024-12-13 22:01:38,291][11193] Loop rollout_proc6_evt_loop terminating...
[2024-12-13 22:01:38,297][11189] Stopping RolloutWorker_w2...
[2024-12-13 22:01:38,297][00832] Component RolloutWorker_w2 stopped!
[2024-12-13 22:01:38,299][11189] Loop rollout_proc2_evt_loop terminating...
[2024-12-13 22:01:38,299][00832] Waiting for process learner_proc0 to stop...
[2024-12-13 22:01:39,499][00832] Waiting for process inference_proc0-0 to join...
[2024-12-13 22:01:39,508][00832] Waiting for process rollout_proc0 to join...
[2024-12-13 22:01:39,512][00832] Waiting for process rollout_proc1 to join...
[2024-12-13 22:01:39,524][00832] Waiting for process rollout_proc2 to join...
[2024-12-13 22:01:39,992][00832] Waiting for process rollout_proc3 to join...
[2024-12-13 22:01:39,994][00832] Waiting for process rollout_proc4 to join...
[2024-12-13 22:01:39,995][00832] Waiting for process rollout_proc5 to join...
[2024-12-13 22:01:39,996][00832] Waiting for process rollout_proc6 to join...
[2024-12-13 22:01:40,002][00832] Waiting for process rollout_proc7 to join...
[2024-12-13 22:01:40,007][00832] Batcher 0 profile tree view:
batching: 20.8764, releasing_batches: 0.0222
[2024-12-13 22:01:40,008][00832] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0013
  wait_policy_total: 467.3990
update_model: 8.9846
  weight_update: 0.0012
one_step: 0.0142
  handle_policy_step: 577.1505
    deserialize: 15.8806, stack: 3.7718, obs_to_device_normalize: 133.9994, forward: 288.4752, send_messages: 21.7974
    prepare_outputs: 82.8281
      to_cpu: 51.7037
[2024-12-13 22:01:40,009][00832] Learner 0 profile tree view:
misc: 0.0046, prepare_batch: 14.3182
train: 66.3434
  epoch_init: 0.0056, minibatch_init: 0.0098, losses_postprocess: 0.5564, kl_divergence: 0.4400, after_optimizer: 32.1411
  calculate_losses: 20.6931
    losses_init: 0.0034, forward_head: 1.4451, bptt_initial: 14.0306, tail: 0.7303, advantages_returns: 0.1835, losses: 2.3915
    bptt: 1.6605
      bptt_forward_core: 1.5993
  update: 12.0700
    clip: 1.3457
[2024-12-13 22:01:40,010][00832] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.5829, enqueue_policy_requests: 315.9717, env_step: 605.4816, overhead: 28.7443, complete_rollouts: 3.4975
save_policy_outputs: 44.7863
  split_output_tensors: 15.4935
[2024-12-13 22:01:40,011][00832] Loop Runner_EvtLoop terminating...
[2024-12-13 22:01:40,012][00832] Runner profile tree view:
main_loop: 1119.7643
[2024-12-13 22:01:40,014][00832] Collected {0: 4005888}, FPS: 3577.4
[2024-12-13 22:14:16,697][00832] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-13 22:14:16,699][00832] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-13 22:14:16,702][00832] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-13 22:14:16,704][00832] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-13 22:14:16,706][00832] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-13 22:14:16,708][00832] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-13 22:14:16,710][00832] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-13 22:14:16,711][00832] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-13 22:14:16,713][00832] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-13 22:14:16,717][00832] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-13 22:14:16,718][00832] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-13 22:14:16,719][00832] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-13 22:14:16,723][00832] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-13 22:14:16,724][00832] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-13 22:14:16,725][00832] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-13 22:14:16,742][00832] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:14:16,744][00832] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 22:14:16,747][00832] RunningMeanStd input shape: (1,)
[2024-12-13 22:14:16,761][00832] ConvEncoder: input_channels=3
[2024-12-13 22:14:16,887][00832] Conv encoder output size: 512
[2024-12-13 22:14:16,889][00832] Policy head output size: 512
[2024-12-13 22:14:18,831][00832] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-13 22:14:20,097][00832] Num frames 100...
[2024-12-13 22:14:20,262][00832] Num frames 200...
[2024-12-13 22:14:20,384][00832] Num frames 300...
[2024-12-13 22:14:20,504][00832] Num frames 400...
[2024-12-13 22:14:20,625][00832] Num frames 500...
[2024-12-13 22:14:20,742][00832] Num frames 600...
[2024-12-13 22:14:20,865][00832] Num frames 700...
[2024-12-13 22:14:20,985][00832] Num frames 800...
[2024-12-13 22:14:21,120][00832] Num frames 900...
[2024-12-13 22:14:21,235][00832] Num frames 1000...
[2024-12-13 22:14:21,374][00832] Avg episode rewards: #0: 29.680, true rewards: #0: 10.680
[2024-12-13 22:14:21,377][00832] Avg episode reward: 29.680, avg true_objective: 10.680
[2024-12-13 22:14:21,420][00832] Num frames 1100...
[2024-12-13 22:14:21,541][00832] Num frames 1200...
[2024-12-13 22:14:21,659][00832] Num frames 1300...
[2024-12-13 22:14:21,782][00832] Num frames 1400...
[2024-12-13 22:14:21,901][00832] Num frames 1500...
[2024-12-13 22:14:22,024][00832] Num frames 1600...
[2024-12-13 22:14:22,157][00832] Num frames 1700...
[2024-12-13 22:14:22,273][00832] Num frames 1800...
[2024-12-13 22:14:22,398][00832] Num frames 1900...
[2024-12-13 22:14:22,567][00832] Avg episode rewards: #0: 25.485, true rewards: #0: 9.985
[2024-12-13 22:14:22,569][00832] Avg episode reward: 25.485, avg true_objective: 9.985
[2024-12-13 22:14:22,576][00832] Num frames 2000...
[2024-12-13 22:14:22,695][00832] Num frames 2100...
[2024-12-13 22:14:22,816][00832] Num frames 2200...
[2024-12-13 22:14:22,936][00832] Num frames 2300...
[2024-12-13 22:14:23,056][00832] Num frames 2400...
[2024-12-13 22:14:23,191][00832] Num frames 2500...
[2024-12-13 22:14:23,310][00832] Num frames 2600...
[2024-12-13 22:14:23,431][00832] Num frames 2700...
[2024-12-13 22:14:23,552][00832] Num frames 2800...
[2024-12-13 22:14:23,672][00832] Num frames 2900...
[2024-12-13 22:14:23,789][00832] Num frames 3000...
[2024-12-13 22:14:23,909][00832] Num frames 3100...
[2024-12-13 22:14:24,028][00832] Num frames 3200...
[2024-12-13 22:14:24,160][00832] Num frames 3300...
[2024-12-13 22:14:24,307][00832] Avg episode rewards: #0: 28.243, true rewards: #0: 11.243
[2024-12-13 22:14:24,308][00832] Avg episode reward: 28.243, avg true_objective: 11.243
[2024-12-13 22:14:24,344][00832] Num frames 3400...
[2024-12-13 22:14:24,484][00832] Num frames 3500...
[2024-12-13 22:14:24,618][00832] Num frames 3600...
[2024-12-13 22:14:24,736][00832] Num frames 3700...
[2024-12-13 22:14:24,857][00832] Num frames 3800...
[2024-12-13 22:14:24,978][00832] Num frames 3900...
[2024-12-13 22:14:25,102][00832] Num frames 4000...
[2024-12-13 22:14:25,226][00832] Num frames 4100...
[2024-12-13 22:14:25,349][00832] Num frames 4200...
[2024-12-13 22:14:25,468][00832] Num frames 4300...
[2024-12-13 22:14:25,587][00832] Num frames 4400...
[2024-12-13 22:14:25,707][00832] Num frames 4500...
[2024-12-13 22:14:25,829][00832] Num frames 4600...
[2024-12-13 22:14:25,950][00832] Num frames 4700...
[2024-12-13 22:14:26,078][00832] Num frames 4800...
[2024-12-13 22:14:26,211][00832] Num frames 4900...
[2024-12-13 22:14:26,332][00832] Num frames 5000...
[2024-12-13 22:14:26,452][00832] Num frames 5100...
[2024-12-13 22:14:26,567][00832] Num frames 5200...
[2024-12-13 22:14:26,688][00832] Num frames 5300...
[2024-12-13 22:14:26,806][00832] Num frames 5400...
[2024-12-13 22:14:26,883][00832] Avg episode rewards: #0: 35.545, true rewards: #0: 13.545
[2024-12-13 22:14:26,884][00832] Avg episode reward: 35.545, avg true_objective: 13.545
[2024-12-13 22:14:26,979][00832] Num frames 5500...
[2024-12-13 22:14:27,104][00832] Num frames 5600...
[2024-12-13 22:14:27,228][00832] Num frames 5700...
[2024-12-13 22:14:27,347][00832] Num frames 5800...
[2024-12-13 22:14:27,467][00832] Num frames 5900...
[2024-12-13 22:14:27,586][00832] Num frames 6000...
[2024-12-13 22:14:27,704][00832] Num frames 6100...
[2024-12-13 22:14:27,824][00832] Num frames 6200...
[2024-12-13 22:14:27,945][00832] Num frames 6300...
[2024-12-13 22:14:28,072][00832] Num frames 6400...
[2024-12-13 22:14:28,196][00832] Avg episode rewards: #0: 33.914, true rewards: #0: 12.914
[2024-12-13 22:14:28,198][00832] Avg episode reward: 33.914, avg true_objective: 12.914
[2024-12-13 22:14:28,260][00832] Num frames 6500...
[2024-12-13 22:14:28,377][00832] Num frames 6600...
[2024-12-13 22:14:28,497][00832] Num frames 6700...
[2024-12-13 22:14:28,615][00832] Num frames 6800...
[2024-12-13 22:14:28,733][00832] Num frames 6900...
[2024-12-13 22:14:28,852][00832] Num frames 7000...
[2024-12-13 22:14:28,971][00832] Num frames 7100...
[2024-12-13 22:14:29,088][00832] Avg episode rewards: #0: 31.253, true rewards: #0: 11.920
[2024-12-13 22:14:29,090][00832] Avg episode reward: 31.253, avg true_objective: 11.920
[2024-12-13 22:14:29,151][00832] Num frames 7200...
[2024-12-13 22:14:29,277][00832] Num frames 7300...
[2024-12-13 22:14:29,401][00832] Num frames 7400...
[2024-12-13 22:14:29,521][00832] Num frames 7500...
[2024-12-13 22:14:29,639][00832] Num frames 7600...
[2024-12-13 22:14:29,803][00832] Avg episode rewards: #0: 27.851, true rewards: #0: 10.994
[2024-12-13 22:14:29,804][00832] Avg episode reward: 27.851, avg true_objective: 10.994
[2024-12-13 22:14:29,813][00832] Num frames 7700...
[2024-12-13 22:14:29,937][00832] Num frames 7800...
[2024-12-13 22:14:30,061][00832] Num frames 7900...
[2024-12-13 22:14:30,188][00832] Num frames 8000...
[2024-12-13 22:14:30,346][00832] Num frames 8100...
[2024-12-13 22:14:30,515][00832] Num frames 8200...
[2024-12-13 22:14:30,692][00832] Avg episode rewards: #0: 25.590, true rewards: #0: 10.340
[2024-12-13 22:14:30,694][00832] Avg episode reward: 25.590, avg true_objective: 10.340
[2024-12-13 22:14:30,741][00832] Num frames 8300...
[2024-12-13 22:14:30,906][00832] Num frames 8400...
[2024-12-13 22:14:31,077][00832] Num frames 8500...
[2024-12-13 22:14:31,239][00832] Num frames 8600...
[2024-12-13 22:14:31,419][00832] Num frames 8700...
[2024-12-13 22:14:31,580][00832] Num frames 8800...
[2024-12-13 22:14:31,746][00832] Num frames 8900...
[2024-12-13 22:14:31,912][00832] Num frames 9000...
[2024-12-13 22:14:32,088][00832] Avg episode rewards: #0: 24.302, true rewards: #0: 10.080
[2024-12-13 22:14:32,090][00832] Avg episode reward: 24.302, avg true_objective: 10.080
[2024-12-13 22:14:32,148][00832] Num frames 9100...
[2024-12-13 22:14:32,320][00832] Num frames 9200...
[2024-12-13 22:14:32,509][00832] Num frames 9300...
[2024-12-13 22:14:32,675][00832] Num frames 9400...
[2024-12-13 22:14:32,815][00832] Num frames 9500...
[2024-12-13 22:14:32,937][00832] Num frames 9600...
[2024-12-13 22:14:33,060][00832] Num frames 9700...
[2024-12-13 22:14:33,190][00832] Num frames 9800...
[2024-12-13 22:14:33,310][00832] Num frames 9900...
[2024-12-13 22:14:33,433][00832] Num frames 10000...
[2024-12-13 22:14:33,560][00832] Num frames 10100...
[2024-12-13 22:14:33,681][00832] Num frames 10200...
[2024-12-13 22:14:33,801][00832] Num frames 10300...
[2024-12-13 22:14:33,919][00832] Num frames 10400...
[2024-12-13 22:14:34,040][00832] Num frames 10500...
[2024-12-13 22:14:34,168][00832] Num frames 10600...
[2024-12-13 22:14:34,236][00832] Avg episode rewards: #0: 25.509, true rewards: #0: 10.609
[2024-12-13 22:14:34,238][00832] Avg episode reward: 25.509, avg true_objective: 10.609
[2024-12-13 22:15:32,966][00832] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-13 22:20:46,079][00832] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-13 22:20:46,081][00832] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-13 22:20:46,083][00832] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-13 22:20:46,085][00832] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-13 22:20:46,087][00832] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-13 22:20:46,088][00832] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-13 22:20:46,090][00832] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-13 22:20:46,091][00832] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-13 22:20:46,092][00832] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-13 22:20:46,093][00832] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-13 22:20:46,094][00832] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-13 22:20:46,095][00832] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-13 22:20:46,096][00832] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-13 22:20:46,097][00832] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-13 22:20:46,098][00832] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-13 22:20:46,107][00832] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 22:20:46,115][00832] RunningMeanStd input shape: (1,)
[2024-12-13 22:20:46,127][00832] ConvEncoder: input_channels=3
[2024-12-13 22:20:46,175][00832] Conv encoder output size: 512
[2024-12-13 22:20:46,181][00832] Policy head output size: 512
[2024-12-13 22:20:46,209][00832] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-13 22:20:47,155][00832] Num frames 100...
[2024-12-13 22:20:47,318][00832] Num frames 200...
[2024-12-13 22:20:47,480][00832] Num frames 300...
[2024-12-13 22:20:47,657][00832] Num frames 400...
[2024-12-13 22:20:47,812][00832] Num frames 500...
[2024-12-13 22:20:47,971][00832] Num frames 600...
[2024-12-13 22:20:48,147][00832] Num frames 700...
[2024-12-13 22:20:48,266][00832] Avg episode rewards: #0: 13.360, true rewards: #0: 7.360
[2024-12-13 22:20:48,268][00832] Avg episode reward: 13.360, avg true_objective: 7.360
[2024-12-13 22:20:48,380][00832] Num frames 800...
[2024-12-13 22:20:48,555][00832] Num frames 900...
[2024-12-13 22:20:48,730][00832] Num frames 1000...
[2024-12-13 22:20:48,897][00832] Num frames 1100...
[2024-12-13 22:20:48,993][00832] Avg episode rewards: #0: 10.600, true rewards: #0: 5.600
[2024-12-13 22:20:48,995][00832] Avg episode reward: 10.600, avg true_objective: 5.600
[2024-12-13 22:20:49,148][00832] Num frames 1200...
[2024-12-13 22:20:49,266][00832] Num frames 1300...
[2024-12-13 22:20:49,383][00832] Num frames 1400...
[2024-12-13 22:20:49,506][00832] Num frames 1500...
[2024-12-13 22:20:49,624][00832] Num frames 1600...
[2024-12-13 22:20:49,750][00832] Num frames 1700...
[2024-12-13 22:20:49,869][00832] Num frames 1800...
[2024-12-13 22:20:49,989][00832] Num frames 1900...
[2024-12-13 22:20:50,117][00832] Num frames 2000...
[2024-12-13 22:20:50,234][00832] Num frames 2100...
[2024-12-13 22:20:50,364][00832] Avg episode rewards: #0: 13.860, true rewards: #0: 7.193
[2024-12-13 22:20:50,367][00832] Avg episode reward: 13.860, avg true_objective: 7.193
[2024-12-13 22:20:50,421][00832] Num frames 2200...
[2024-12-13 22:20:50,539][00832] Num frames 2300...
[2024-12-13 22:20:50,661][00832] Num frames 2400...
[2024-12-13 22:20:50,787][00832] Num frames 2500...
[2024-12-13 22:20:50,905][00832] Num frames 2600...
[2024-12-13 22:20:51,007][00832] Avg episode rewards: #0: 12.345, true rewards: #0: 6.595
[2024-12-13 22:20:51,009][00832] Avg episode reward: 12.345, avg true_objective: 6.595
[2024-12-13 22:20:51,091][00832] Num frames 2700...
[2024-12-13 22:20:51,214][00832] Num frames 2800...
[2024-12-13 22:20:51,334][00832] Num frames 2900...
[2024-12-13 22:20:51,456][00832] Num frames 3000...
[2024-12-13 22:20:51,611][00832] Avg episode rewards: #0: 10.972, true rewards: #0: 6.172
[2024-12-13 22:20:51,612][00832] Avg episode reward: 10.972, avg true_objective: 6.172
[2024-12-13 22:20:51,635][00832] Num frames 3100...
[2024-12-13 22:20:51,760][00832] Num frames 3200...
[2024-12-13 22:20:51,886][00832] Num frames 3300...
[2024-12-13 22:20:52,018][00832] Num frames 3400...
[2024-12-13 22:20:52,150][00832] Num frames 3500...
[2024-12-13 22:20:52,271][00832] Num frames 3600...
[2024-12-13 22:20:52,389][00832] Num frames 3700...
[2024-12-13 22:20:52,511][00832] Num frames 3800...
[2024-12-13 22:20:52,634][00832] Num frames 3900...
[2024-12-13 22:20:52,763][00832] Num frames 4000...
[2024-12-13 22:20:52,880][00832] Num frames 4100...
[2024-12-13 22:20:53,003][00832] Num frames 4200...
[2024-12-13 22:20:53,156][00832] Avg episode rewards: #0: 14.293, true rewards: #0: 7.127
[2024-12-13 22:20:53,158][00832] Avg episode reward: 14.293, avg true_objective: 7.127
[2024-12-13 22:20:53,190][00832] Num frames 4300...
[2024-12-13 22:20:53,307][00832] Num frames 4400...
[2024-12-13 22:20:53,424][00832] Num frames 4500...
[2024-12-13 22:20:53,542][00832] Num frames 4600...
[2024-12-13 22:20:53,666][00832] Num frames 4700...
[2024-12-13 22:20:53,788][00832] Num frames 4800...
[2024-12-13 22:20:53,905][00832] Num frames 4900...
[2024-12-13 22:20:54,026][00832] Num frames 5000...
[2024-12-13 22:20:54,153][00832] Num frames 5100...
[2024-12-13 22:20:54,272][00832] Num frames 5200...
[2024-12-13 22:20:54,392][00832] Num frames 5300...
[2024-12-13 22:20:54,511][00832] Num frames 5400...
[2024-12-13 22:20:54,626][00832] Num frames 5500...
[2024-12-13 22:20:54,710][00832] Avg episode rewards: #0: 16.749, true rewards: #0: 7.891
[2024-12-13 22:20:54,712][00832] Avg episode reward: 16.749, avg true_objective: 7.891
[2024-12-13 22:20:54,811][00832] Num frames 5600...
[2024-12-13 22:20:54,928][00832] Num frames 5700...
[2024-12-13 22:20:55,043][00832] Num frames 5800...
[2024-12-13 22:20:55,174][00832] Num frames 5900...
[2024-12-13 22:20:55,293][00832] Num frames 6000...
[2024-12-13 22:20:55,408][00832] Num frames 6100...
[2024-12-13 22:20:55,526][00832] Num frames 6200...
[2024-12-13 22:20:55,595][00832] Avg episode rewards: #0: 16.265, true rewards: #0: 7.765
[2024-12-13 22:20:55,598][00832] Avg episode reward: 16.265, avg true_objective: 7.765
[2024-12-13 22:20:55,702][00832] Num frames 6300...
[2024-12-13 22:20:55,831][00832] Num frames 6400...
[2024-12-13 22:20:55,951][00832] Num frames 6500...
[2024-12-13 22:20:56,080][00832] Num frames 6600...
[2024-12-13 22:20:56,211][00832] Avg episode rewards: #0: 15.403, true rewards: #0: 7.403
[2024-12-13 22:20:56,213][00832] Avg episode reward: 15.403, avg true_objective: 7.403
[2024-12-13 22:20:56,262][00832] Num frames 6700...
[2024-12-13 22:20:56,378][00832] Num frames 6800...
[2024-12-13 22:20:56,498][00832] Num frames 6900...
[2024-12-13 22:20:56,615][00832] Num frames 7000...
[2024-12-13 22:20:56,730][00832] Num frames 7100...
[2024-12-13 22:20:56,858][00832] Num frames 7200...
[2024-12-13 22:20:56,974][00832] Num frames 7300...
[2024-12-13 22:20:57,098][00832] Num frames 7400...
[2024-12-13 22:20:57,220][00832] Num frames 7500...
[2024-12-13 22:20:57,352][00832] Num frames 7600...
[2024-12-13 22:20:57,484][00832] Num frames 7700...
[2024-12-13 22:20:57,600][00832] Num frames 7800...
[2024-12-13 22:20:57,674][00832] Avg episode rewards: #0: 16.415, true rewards: #0: 7.815
[2024-12-13 22:20:57,675][00832] Avg episode reward: 16.415, avg true_objective: 7.815
[2024-12-13 22:21:08,916][00832] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-13 22:22:33,500][21197] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-13 22:22:33,502][21197] Rollout worker 0 uses device cpu
[2024-12-13 22:22:33,504][21197] Rollout worker 1 uses device cpu
[2024-12-13 22:22:33,505][21197] Rollout worker 2 uses device cpu
[2024-12-13 22:22:33,508][21197] Rollout worker 3 uses device cpu
[2024-12-13 22:22:33,509][21197] Rollout worker 4 uses device cpu
[2024-12-13 22:22:33,512][21197] Rollout worker 5 uses device cpu
[2024-12-13 22:22:33,513][21197] Rollout worker 6 uses device cpu
[2024-12-13 22:22:33,514][21197] Rollout worker 7 uses device cpu
[2024-12-13 22:22:33,718][21197] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 22:22:33,722][21197] InferenceWorker_p0-w0: min num requests: 2
[2024-12-13 22:22:33,769][21197] Starting all processes...
[2024-12-13 22:22:33,773][21197] Starting process learner_proc0
[2024-12-13 22:22:33,829][21197] Starting all processes...
[2024-12-13 22:22:33,857][21197] Starting process inference_proc0-0
[2024-12-13 22:22:33,861][21197] Starting process rollout_proc0
[2024-12-13 22:22:33,861][21197] Starting process rollout_proc1
[2024-12-13 22:22:33,861][21197] Starting process rollout_proc2
[2024-12-13 22:22:33,861][21197] Starting process rollout_proc3
[2024-12-13 22:22:33,861][21197] Starting process rollout_proc4
[2024-12-13 22:22:33,861][21197] Starting process rollout_proc5
[2024-12-13 22:22:33,861][21197] Starting process rollout_proc6
[2024-12-13 22:22:33,867][21197] Starting process rollout_proc7
[2024-12-13 22:22:44,351][21617] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 22:22:44,355][21617] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-13 22:22:44,424][21617] Num visible devices: 1
[2024-12-13 22:22:44,469][21617] Starting seed is not provided
[2024-12-13 22:22:44,470][21617] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 22:22:44,471][21617] Initializing actor-critic model on device cuda:0
[2024-12-13 22:22:44,472][21617] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 22:22:44,474][21617] RunningMeanStd input shape: (1,)
[2024-12-13 22:22:44,609][21617] ConvEncoder: input_channels=3
[2024-12-13 22:22:44,618][21630] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 22:22:44,621][21630] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-13 22:22:44,700][21630] Num visible devices: 1
[2024-12-13 22:22:44,766][21640] Worker 4 uses CPU cores [0]
[2024-12-13 22:22:44,765][21637] Worker 2 uses CPU cores [0]
[2024-12-13 22:22:44,782][21641] Worker 6 uses CPU cores [0]
[2024-12-13 22:22:44,833][21639] Worker 5 uses CPU cores [1]
[2024-12-13 22:22:44,847][21638] Worker 3 uses CPU cores [1]
[2024-12-13 22:22:44,887][21642] Worker 7 uses CPU cores [1]
[2024-12-13 22:22:44,922][21631] Worker 1 uses CPU cores [1]
[2024-12-13 22:22:44,934][21634] Worker 0 uses CPU cores [0]
[2024-12-13 22:22:44,991][21617] Conv encoder output size: 512
[2024-12-13 22:22:44,991][21617] Policy head output size: 512
[2024-12-13 22:22:45,006][21617] Created Actor Critic model with architecture:
[2024-12-13 22:22:45,007][21617] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-13 22:22:47,004][21617] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-12-13 22:22:47,005][21617] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-13 22:22:47,038][21617] Loading model from checkpoint
[2024-12-13 22:22:47,044][21617] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2024-12-13 22:22:47,044][21617] Initialized policy 0 weights for model version 978
[2024-12-13 22:22:47,055][21617] LearnerWorker_p0 finished initialization!
[2024-12-13 22:22:47,056][21617] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-13 22:22:47,311][21630] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 22:22:47,312][21630] RunningMeanStd input shape: (1,)
[2024-12-13 22:22:47,333][21630] ConvEncoder: input_channels=3
[2024-12-13 22:22:47,497][21630] Conv encoder output size: 512
[2024-12-13 22:22:47,498][21630] Policy head output size: 512
[2024-12-13 22:22:49,383][21197] Inference worker 0-0 is ready!
[2024-12-13 22:22:49,385][21197] All inference workers are ready! Signal rollout workers to start!
[2024-12-13 22:22:49,457][21637] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:49,479][21641] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:49,485][21642] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:49,487][21639] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:49,482][21634] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:49,487][21640] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:49,512][21638] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:49,519][21631] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:22:50,290][21197] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-13 22:22:50,741][21631] Decorrelating experience for 0 frames...
[2024-12-13 22:22:50,742][21642] Decorrelating experience for 0 frames...
[2024-12-13 22:22:50,744][21638] Decorrelating experience for 0 frames...
[2024-12-13 22:22:51,020][21637] Decorrelating experience for 0 frames...
[2024-12-13 22:22:51,027][21641] Decorrelating experience for 0 frames...
[2024-12-13 22:22:51,035][21640] Decorrelating experience for 0 frames...
[2024-12-13 22:22:51,047][21634] Decorrelating experience for 0 frames...
[2024-12-13 22:22:51,489][21642] Decorrelating experience for 32 frames...
[2024-12-13 22:22:51,493][21631] Decorrelating experience for 32 frames...
[2024-12-13 22:22:52,140][21641] Decorrelating experience for 32 frames...
[2024-12-13 22:22:52,144][21637] Decorrelating experience for 32 frames...
[2024-12-13 22:22:52,155][21640] Decorrelating experience for 32 frames...
[2024-12-13 22:22:52,560][21642] Decorrelating experience for 64 frames...
[2024-12-13 22:22:52,567][21631] Decorrelating experience for 64 frames...
[2024-12-13 22:22:53,097][21634] Decorrelating experience for 32 frames...
[2024-12-13 22:22:53,262][21637] Decorrelating experience for 64 frames...
[2024-12-13 22:22:53,674][21639] Decorrelating experience for 0 frames...
[2024-12-13 22:22:53,678][21638] Decorrelating experience for 32 frames...
[2024-12-13 22:22:53,706][21197] Heartbeat connected on Batcher_0
[2024-12-13 22:22:53,712][21197] Heartbeat connected on LearnerWorker_p0
[2024-12-13 22:22:53,759][21197] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-13 22:22:53,961][21640] Decorrelating experience for 64 frames...
[2024-12-13 22:22:54,138][21642] Decorrelating experience for 96 frames...
[2024-12-13 22:22:54,154][21631] Decorrelating experience for 96 frames...
[2024-12-13 22:22:54,219][21637] Decorrelating experience for 96 frames...
[2024-12-13 22:22:54,398][21197] Heartbeat connected on RolloutWorker_w7
[2024-12-13 22:22:54,413][21197] Heartbeat connected on RolloutWorker_w2
[2024-12-13 22:22:54,416][21197] Heartbeat connected on RolloutWorker_w1
[2024-12-13 22:22:55,138][21639] Decorrelating experience for 32 frames...
[2024-12-13 22:22:55,290][21197] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-13 22:22:55,610][21638] Decorrelating experience for 64 frames...
[2024-12-13 22:22:55,612][21641] Decorrelating experience for 64 frames...
[2024-12-13 22:22:55,747][21640] Decorrelating experience for 96 frames...
[2024-12-13 22:22:56,194][21197] Heartbeat connected on RolloutWorker_w4
[2024-12-13 22:22:57,115][21639] Decorrelating experience for 64 frames...
[2024-12-13 22:22:59,378][21634] Decorrelating experience for 64 frames...
[2024-12-13 22:22:59,845][21641] Decorrelating experience for 96 frames...
[2024-12-13 22:23:00,151][21617] Stopping Batcher_0...
[2024-12-13 22:23:00,151][21617] Loop batcher_evt_loop terminating...
[2024-12-13 22:23:00,154][21617] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2024-12-13 22:23:00,168][21197] Component Batcher_0 stopped!
[2024-12-13 22:23:00,263][21630] Weights refcount: 2 0
[2024-12-13 22:23:00,266][21630] Stopping InferenceWorker_p0-w0...
[2024-12-13 22:23:00,266][21630] Loop inference_proc0-0_evt_loop terminating...
[2024-12-13 22:23:00,266][21197] Component InferenceWorker_p0-w0 stopped!
[2024-12-13 22:23:00,384][21617] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth
[2024-12-13 22:23:00,409][21617] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2024-12-13 22:23:00,713][21197] Component LearnerWorker_p0 stopped!
[2024-12-13 22:23:00,713][21617] Stopping LearnerWorker_p0...
[2024-12-13 22:23:00,722][21617] Loop learner_proc0_evt_loop terminating...
[2024-12-13 22:23:00,907][21197] Component RolloutWorker_w1 stopped!
[2024-12-13 22:23:00,909][21631] Stopping RolloutWorker_w1...
[2024-12-13 22:23:00,910][21631] Loop rollout_proc1_evt_loop terminating...
[2024-12-13 22:23:00,936][21197] Component RolloutWorker_w7 stopped!
[2024-12-13 22:23:00,944][21642] Stopping RolloutWorker_w7...
[2024-12-13 22:23:00,944][21642] Loop rollout_proc7_evt_loop terminating...
[2024-12-13 22:23:00,984][21197] Component RolloutWorker_w4 stopped!
[2024-12-13 22:23:00,988][21640] Stopping RolloutWorker_w4...
[2024-12-13 22:23:00,989][21640] Loop rollout_proc4_evt_loop terminating...
[2024-12-13 22:23:01,093][21197] Component RolloutWorker_w2 stopped!
[2024-12-13 22:23:01,097][21637] Stopping RolloutWorker_w2...
[2024-12-13 22:23:01,098][21637] Loop rollout_proc2_evt_loop terminating...
[2024-12-13 22:23:01,693][21197] Component RolloutWorker_w6 stopped!
[2024-12-13 22:23:01,697][21641] Stopping RolloutWorker_w6...
[2024-12-13 22:23:01,698][21641] Loop rollout_proc6_evt_loop terminating...
[2024-12-13 22:23:02,556][21639] Decorrelating experience for 96 frames...
[2024-12-13 22:23:03,356][21639] Stopping RolloutWorker_w5...
[2024-12-13 22:23:03,356][21639] Loop rollout_proc5_evt_loop terminating...
[2024-12-13 22:23:03,356][21197] Component RolloutWorker_w5 stopped!
[2024-12-13 22:23:04,027][21634] Decorrelating experience for 96 frames...
[2024-12-13 22:23:04,030][21638] Decorrelating experience for 96 frames...
[2024-12-13 22:23:04,537][21634] Stopping RolloutWorker_w0...
[2024-12-13 22:23:04,537][21634] Loop rollout_proc0_evt_loop terminating...
[2024-12-13 22:23:04,541][21197] Component RolloutWorker_w0 stopped!
[2024-12-13 22:23:04,559][21197] Component RolloutWorker_w3 stopped!
[2024-12-13 22:23:04,561][21197] Waiting for process learner_proc0 to stop...
[2024-12-13 22:23:04,568][21197] Waiting for process inference_proc0-0 to join...
[2024-12-13 22:23:04,570][21638] Stopping RolloutWorker_w3...
[2024-12-13 22:23:04,571][21638] Loop rollout_proc3_evt_loop terminating...
[2024-12-13 22:23:04,570][21197] Waiting for process rollout_proc0 to join...
[2024-12-13 22:23:05,568][21197] Waiting for process rollout_proc1 to join...
[2024-12-13 22:23:05,571][21197] Waiting for process rollout_proc2 to join...
[2024-12-13 22:23:05,577][21197] Waiting for process rollout_proc3 to join...
[2024-12-13 22:23:05,716][21197] Waiting for process rollout_proc4 to join...
[2024-12-13 22:23:05,720][21197] Waiting for process rollout_proc5 to join...
[2024-12-13 22:23:05,724][21197] Waiting for process rollout_proc6 to join...
[2024-12-13 22:23:05,726][21197] Waiting for process rollout_proc7 to join...
[2024-12-13 22:23:05,730][21197] Batcher 0 profile tree view:
batching: 0.0256, releasing_batches: 0.0000
[2024-12-13 22:23:05,731][21197] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 7.8920
update_model: 0.0234
  weight_update: 0.0011
one_step: 0.0029
  handle_policy_step: 2.6361
    deserialize: 0.0616, stack: 0.0092, obs_to_device_normalize: 0.3932, forward: 1.7349, send_messages: 0.0556
    prepare_outputs: 0.2914
      to_cpu: 0.1764
[2024-12-13 22:23:05,734][21197] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 2.2280
train: 1.0206
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0002, kl_divergence: 0.0004, after_optimizer: 0.0046
  calculate_losses: 0.3183
    losses_init: 0.0000, forward_head: 0.2278, bptt_initial: 0.0570, tail: 0.0084, advantages_returns: 0.0011, losses: 0.0223
    bptt: 0.0014
      bptt_forward_core: 0.0013
  update: 0.6964
    clip: 0.0017
[2024-12-13 22:23:05,736][21197] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0003, enqueue_policy_requests: 0.0006
[2024-12-13 22:23:05,738][21197] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0012, enqueue_policy_requests: 1.2423, env_step: 3.0563, overhead: 0.1182, complete_rollouts: 0.0326
  save_policy_outputs: 0.1814
    split_output_tensors: 0.0687
[2024-12-13 22:23:05,740][21197] Loop Runner_EvtLoop terminating...
[2024-12-13 22:23:05,741][21197] Runner profile tree view:
main_loop: 31.9727
[2024-12-13 22:23:05,742][21197] Collected {0: 4009984}, FPS: 128.1
[2024-12-13 22:24:23,889][21197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-13 22:24:23,891][21197] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-13 22:24:23,893][21197] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-13 22:24:23,895][21197] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-13 22:24:23,897][21197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-13 22:24:23,898][21197] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-13 22:24:23,900][21197] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-13 22:24:23,902][21197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-13 22:24:23,903][21197] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-13 22:24:23,904][21197] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-13 22:24:23,905][21197] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-13 22:24:23,906][21197] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-13 22:24:23,907][21197] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-13 22:24:23,908][21197] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-13 22:24:23,909][21197] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-13 22:24:23,925][21197] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-13 22:24:23,927][21197] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 22:24:23,930][21197] RunningMeanStd input shape: (1,)
[2024-12-13 22:24:23,945][21197] ConvEncoder: input_channels=3
[2024-12-13 22:24:24,109][21197] Conv encoder output size: 512
[2024-12-13 22:24:24,112][21197] Policy head output size: 512
[2024-12-13 22:24:26,348][21197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2024-12-13 22:24:27,279][21197] Num frames 100...
[2024-12-13 22:24:27,398][21197] Num frames 200...
[2024-12-13 22:24:27,518][21197] Num frames 300...
[2024-12-13 22:24:27,643][21197] Num frames 400...
[2024-12-13 22:24:27,767][21197] Num frames 500...
[2024-12-13 22:24:27,884][21197] Num frames 600...
[2024-12-13 22:24:28,005][21197] Num frames 700...
[2024-12-13 22:24:28,128][21197] Num frames 800...
[2024-12-13 22:24:28,254][21197] Num frames 900...
[2024-12-13 22:24:28,375][21197] Num frames 1000...
[2024-12-13 22:24:28,497][21197] Avg episode rewards: #0: 23.560, true rewards: #0: 10.560
[2024-12-13 22:24:28,499][21197] Avg episode reward: 23.560, avg true_objective: 10.560
[2024-12-13 22:24:28,554][21197] Num frames 1100...
[2024-12-13 22:24:28,675][21197] Num frames 1200...
[2024-12-13 22:24:28,793][21197] Num frames 1300...
[2024-12-13 22:24:28,908][21197] Num frames 1400...
[2024-12-13 22:24:29,032][21197] Num frames 1500...
[2024-12-13 22:24:29,157][21197] Num frames 1600...
[2024-12-13 22:24:29,297][21197] Num frames 1700...
[2024-12-13 22:24:29,417][21197] Num frames 1800...
[2024-12-13 22:24:29,528][21197] Avg episode rewards: #0: 21.730, true rewards: #0: 9.230
[2024-12-13 22:24:29,530][21197] Avg episode reward: 21.730, avg true_objective: 9.230
[2024-12-13 22:24:29,596][21197] Num frames 1900...
[2024-12-13 22:24:29,712][21197] Num frames 2000...
[2024-12-13 22:24:29,834][21197] Num frames 2100...
[2024-12-13 22:24:29,950][21197] Num frames 2200...
[2024-12-13 22:24:30,077][21197] Num frames 2300...
[2024-12-13 22:24:30,199][21197] Num frames 2400...
[2024-12-13 22:24:30,324][21197] Num frames 2500...
[2024-12-13 22:24:30,440][21197] Num frames 2600...
[2024-12-13 22:24:30,570][21197] Num frames 2700...
[2024-12-13 22:24:30,730][21197] Num frames 2800...
[2024-12-13 22:24:30,894][21197] Num frames 2900...
[2024-12-13 22:24:31,058][21197] Num frames 3000...
[2024-12-13 22:24:31,230][21197] Num frames 3100...
[2024-12-13 22:24:31,407][21197] Num frames 3200...
[2024-12-13 22:24:31,500][21197] Avg episode rewards: #0: 25.074, true rewards: #0: 10.740
[2024-12-13 22:24:31,502][21197] Avg episode reward: 25.074, avg true_objective: 10.740
[2024-12-13 22:24:31,634][21197] Num frames 3300...
[2024-12-13 22:24:31,790][21197] Num frames 3400...
[2024-12-13 22:24:31,947][21197] Num frames 3500...
[2024-12-13 22:24:32,118][21197] Num frames 3600...
[2024-12-13 22:24:32,305][21197] Num frames 3700...
[2024-12-13 22:24:32,484][21197] Num frames 3800...
[2024-12-13 22:24:32,658][21197] Num frames 3900...
[2024-12-13 22:24:32,834][21197] Num frames 4000...
[2024-12-13 22:24:33,010][21197] Num frames 4100...
[2024-12-13 22:24:33,185][21197] Num frames 4200...
[2024-12-13 22:24:33,408][21197] Avg episode rewards: #0: 24.745, true rewards: #0: 10.745
[2024-12-13 22:24:33,410][21197] Avg episode reward: 24.745, avg true_objective: 10.745
[2024-12-13 22:24:33,415][21197] Num frames 4300...
[2024-12-13 22:24:33,536][21197] Num frames 4400...
[2024-12-13 22:24:33,658][21197] Num frames 4500...
[2024-12-13 22:24:33,783][21197] Num frames 4600...
[2024-12-13 22:24:33,904][21197] Num frames 4700...
[2024-12-13 22:24:34,032][21197] Num frames 4800...
[2024-12-13 22:24:34,165][21197] Num frames 4900...
[2024-12-13 22:24:34,287][21197] Num frames 5000...
[2024-12-13 22:24:34,412][21197] Num frames 5100...
[2024-12-13 22:24:34,533][21197] Num frames 5200...
[2024-12-13 22:24:34,658][21197] Num frames 5300...
[2024-12-13 22:24:34,741][21197] Avg episode rewards: #0: 24.044, true rewards: #0: 10.644
[2024-12-13 22:24:34,743][21197] Avg episode reward: 24.044, avg true_objective: 10.644
[2024-12-13 22:24:34,842][21197] Num frames 5400...
[2024-12-13 22:24:34,967][21197] Num frames 5500...
[2024-12-13 22:24:35,093][21197] Num frames 5600...
[2024-12-13 22:24:35,214][21197] Num frames 5700...
[2024-12-13 22:24:35,334][21197] Num frames 5800...
[2024-12-13 22:24:35,465][21197] Num frames 5900...
[2024-12-13 22:24:35,584][21197] Num frames 6000...
[2024-12-13 22:24:35,704][21197] Num frames 6100...
[2024-12-13 22:24:35,823][21197] Num frames 6200...
[2024-12-13 22:24:35,938][21197] Num frames 6300...
[2024-12-13 22:24:36,062][21197] Num frames 6400...
[2024-12-13 22:24:36,188][21197] Num frames 6500...
[2024-12-13 22:24:36,306][21197] Num frames 6600...
[2024-12-13 22:24:36,427][21197] Num frames 6700...
[2024-12-13 22:24:36,556][21197] Num frames 6800...
[2024-12-13 22:24:36,731][21197] Avg episode rewards: #0: 26.960, true rewards: #0: 11.460
[2024-12-13 22:24:36,733][21197] Avg episode reward: 26.960, avg true_objective: 11.460
[2024-12-13 22:24:36,776][21197] Num frames 6900...
[2024-12-13 22:24:36,939][21197] Num frames 7000...
[2024-12-13 22:24:37,108][21197] Num frames 7100...
[2024-12-13 22:24:37,266][21197] Num frames 7200...
[2024-12-13 22:24:37,428][21197] Num frames 7300...
[2024-12-13 22:24:37,603][21197] Num frames 7400...
[2024-12-13 22:24:37,762][21197] Num frames 7500...
[2024-12-13 22:24:37,921][21197] Num frames 7600...
[2024-12-13 22:24:38,094][21197] Num frames 7700...
[2024-12-13 22:24:38,265][21197] Num frames 7800...
[2024-12-13 22:24:38,432][21197] Num frames 7900...
[2024-12-13 22:24:38,609][21197] Num frames 8000...
[2024-12-13 22:24:38,787][21197] Num frames 8100...
[2024-12-13 22:24:38,887][21197] Avg episode rewards: #0: 27.606, true rewards: #0: 11.606
[2024-12-13 22:24:38,889][21197] Avg episode reward: 27.606, avg true_objective: 11.606
[2024-12-13 22:24:39,024][21197] Num frames 8200...
[2024-12-13 22:24:39,174][21197] Num frames 8300...
[2024-12-13 22:24:39,293][21197] Num frames 8400...
[2024-12-13 22:24:39,415][21197] Num frames 8500...
[2024-12-13 22:24:39,535][21197] Num frames 8600...
[2024-12-13 22:24:39,661][21197] Num frames 8700...
[2024-12-13 22:24:39,782][21197] Num frames 8800...
[2024-12-13 22:24:39,902][21197] Num frames 8900...
[2024-12-13 22:24:40,024][21197] Num frames 9000...
[2024-12-13 22:24:40,150][21197] Num frames 9100...
[2024-12-13 22:24:40,226][21197] Avg episode rewards: #0: 27.270, true rewards: #0: 11.395
[2024-12-13 22:24:40,228][21197] Avg episode reward: 27.270, avg true_objective: 11.395
[2024-12-13 22:24:40,330][21197] Num frames 9200...
[2024-12-13 22:24:40,449][21197] Num frames 9300...
[2024-12-13 22:24:40,566][21197] Num frames 9400...
[2024-12-13 22:24:40,693][21197] Num frames 9500...
[2024-12-13 22:24:40,810][21197] Num frames 9600...
[2024-12-13 22:24:40,930][21197] Num frames 9700...
[2024-12-13 22:24:41,049][21197] Num frames 9800...
[2024-12-13 22:24:41,177][21197] Num frames 9900...
[2024-12-13 22:24:41,296][21197] Num frames 10000...
[2024-12-13 22:24:41,418][21197] Num frames 10100...
[2024-12-13 22:24:41,536][21197] Num frames 10200...
[2024-12-13 22:24:41,697][21197] Avg episode rewards: #0: 27.318, true rewards: #0: 11.429
[2024-12-13 22:24:41,699][21197] Avg episode reward: 27.318, avg true_objective: 11.429
[2024-12-13 22:24:41,718][21197] Num frames 10300...
[2024-12-13 22:24:41,832][21197] Num frames 10400...
[2024-12-13 22:24:41,951][21197] Num frames 10500...
[2024-12-13 22:24:42,076][21197] Num frames 10600...
[2024-12-13 22:24:42,200][21197] Num frames 10700...
[2024-12-13 22:24:42,318][21197] Num frames 10800...
[2024-12-13 22:24:42,440][21197] Num frames 10900...
[2024-12-13 22:24:42,562][21197] Num frames 11000...
[2024-12-13 22:24:42,692][21197] Num frames 11100...
[2024-12-13 22:24:42,816][21197] Num frames 11200...
[2024-12-13 22:24:42,938][21197] Num frames 11300...
[2024-12-13 22:24:43,052][21197] Avg episode rewards: #0: 27.142, true rewards: #0: 11.342
[2024-12-13 22:24:43,055][21197] Avg episode reward: 27.142, avg true_objective: 11.342
[2024-12-13 22:25:48,649][21197] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-13 22:27:48,829][21197] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-13 22:27:48,831][21197] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-13 22:27:48,833][21197] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-13 22:27:48,834][21197] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-13 22:27:48,836][21197] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-13 22:27:48,838][21197] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-13 22:27:48,839][21197] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-13 22:27:48,841][21197] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-13 22:27:48,842][21197] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-13 22:27:48,843][21197] Adding new argument 'hf_repository'='acblue/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-13 22:27:48,848][21197] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-13 22:27:48,849][21197] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-13 22:27:48,849][21197] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-13 22:27:48,850][21197] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-13 22:27:48,852][21197] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-13 22:27:48,860][21197] RunningMeanStd input shape: (3, 72, 128)
[2024-12-13 22:27:48,870][21197] RunningMeanStd input shape: (1,)
[2024-12-13 22:27:48,882][21197] ConvEncoder: input_channels=3
[2024-12-13 22:27:48,918][21197] Conv encoder output size: 512
[2024-12-13 22:27:48,921][21197] Policy head output size: 512
[2024-12-13 22:27:48,939][21197] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000979_4009984.pth...
[2024-12-13 22:27:49,454][21197] Num frames 100...
[2024-12-13 22:27:49,573][21197] Num frames 200...
[2024-12-13 22:27:49,691][21197] Num frames 300...
[2024-12-13 22:27:49,814][21197] Num frames 400...
[2024-12-13 22:27:49,929][21197] Num frames 500...
[2024-12-13 22:27:50,044][21197] Num frames 600...
[2024-12-13 22:27:50,164][21197] Num frames 700...
[2024-12-13 22:27:50,283][21197] Num frames 800...
[2024-12-13 22:27:50,377][21197] Avg episode rewards: #0: 17.320, true rewards: #0: 8.320
[2024-12-13 22:27:50,379][21197] Avg episode reward: 17.320, avg true_objective: 8.320
[2024-12-13 22:27:50,504][21197] Num frames 900...
[2024-12-13 22:27:50,669][21197] Num frames 1000...
[2024-12-13 22:27:50,830][21197] Num frames 1100...
[2024-12-13 22:27:50,991][21197] Num frames 1200...
[2024-12-13 22:27:51,164][21197] Num frames 1300...
[2024-12-13 22:27:51,330][21197] Num frames 1400...
[2024-12-13 22:27:51,509][21197] Num frames 1500...
[2024-12-13 22:27:51,667][21197] Num frames 1600...
[2024-12-13 22:27:51,834][21197] Num frames 1700...
[2024-12-13 22:27:52,003][21197] Num frames 1800...
[2024-12-13 22:27:52,158][21197] Avg episode rewards: #0: 19.280, true rewards: #0: 9.280
[2024-12-13 22:27:52,160][21197] Avg episode reward: 19.280, avg true_objective: 9.280
[2024-12-13 22:27:52,245][21197] Num frames 1900...
[2024-12-13 22:27:52,411][21197] Num frames 2000...
[2024-12-13 22:27:52,580][21197] Num frames 2100...
[2024-12-13 22:27:52,760][21197] Num frames 2200...
[2024-12-13 22:27:52,933][21197] Num frames 2300...
[2024-12-13 22:27:53,072][21197] Num frames 2400...
[2024-12-13 22:27:53,189][21197] Num frames 2500...
[2024-12-13 22:27:53,307][21197] Num frames 2600...
[2024-12-13 22:27:53,434][21197] Num frames 2700...
[2024-12-13 22:27:53,563][21197] Num frames 2800...
[2024-12-13 22:27:53,684][21197] Num frames 2900...
[2024-12-13 22:27:53,804][21197] Num frames 3000...
[2024-12-13 22:27:53,934][21197] Avg episode rewards: #0: 22.884, true rewards: #0: 10.217
[2024-12-13 22:27:53,936][21197] Avg episode reward: 22.884, avg true_objective: 10.217
[2024-12-13 22:27:53,981][21197] Num frames 3100...
[2024-12-13 22:27:54,107][21197] Num frames 3200...
[2024-12-13 22:27:54,223][21197] Num frames 3300...
[2024-12-13 22:27:54,346][21197] Num frames 3400...
[2024-12-13 22:27:54,469][21197] Num frames 3500...
[2024-12-13 22:27:54,594][21197] Num frames 3600...
[2024-12-13 22:27:54,710][21197] Num frames 3700...
[2024-12-13 22:27:54,882][21197] Avg episode rewards: #0: 20.495, true rewards: #0: 9.495
[2024-12-13 22:27:54,884][21197] Avg episode reward: 20.495, avg true_objective: 9.495
[2024-12-13 22:27:54,889][21197] Num frames 3800...
[2024-12-13 22:27:55,005][21197] Num frames 3900...
[2024-12-13 22:27:55,128][21197] Num frames 4000...
[2024-12-13 22:27:55,241][21197] Num frames 4100...
[2024-12-13 22:27:55,360][21197] Num frames 4200...
[2024-12-13 22:27:55,477][21197] Num frames 4300...
[2024-12-13 22:27:55,581][21197] Avg episode rewards: #0: 17.884, true rewards: #0: 8.684
[2024-12-13 22:27:55,584][21197] Avg episode reward: 17.884, avg true_objective: 8.684
[2024-12-13 22:27:55,653][21197] Num frames 4400...
[2024-12-13 22:27:55,768][21197] Num frames 4500...
[2024-12-13 22:27:55,885][21197] Num frames 4600...
[2024-12-13 22:27:56,002][21197] Num frames 4700...
[2024-12-13 22:27:56,128][21197] Num frames 4800...
[2024-12-13 22:27:56,243][21197] Num frames 4900...
[2024-12-13 22:27:56,365][21197] Num frames 5000...
[2024-12-13 22:27:56,483][21197] Num frames 5100...
[2024-12-13 22:27:56,613][21197] Num frames 5200...
[2024-12-13 22:27:56,729][21197] Num frames 5300...
[2024-12-13 22:27:56,850][21197] Num frames 5400...
[2024-12-13 22:27:56,968][21197] Num frames 5500...
[2024-12-13 22:27:57,096][21197] Num frames 5600...
[2024-12-13 22:27:57,213][21197] Num frames 5700...
[2024-12-13 22:27:57,332][21197] Num frames 5800...
[2024-12-13 22:27:57,465][21197] Num frames 5900...
[2024-12-13 22:27:57,582][21197] Num frames 6000...
[2024-12-13 22:27:57,711][21197] Num frames 6100...
[2024-12-13 22:27:57,834][21197] Num frames 6200...
[2024-12-13 22:27:57,959][21197] Num frames 6300...
[2024-12-13 22:27:58,084][21197] Num frames 6400...
[2024-12-13 22:27:58,168][21197] Avg episode rewards: #0: 24.037, true rewards: #0: 10.703
[2024-12-13 22:27:58,171][21197] Avg episode reward: 24.037, avg true_objective: 10.703
[2024-12-13 22:27:58,266][21197] Num frames 6500...
[2024-12-13 22:27:58,385][21197] Num frames 6600...
[2024-12-13 22:27:58,503][21197] Num frames 6700...
[2024-12-13 22:27:58,650][21197] Avg episode rewards: #0: 21.396, true rewards: #0: 9.681
[2024-12-13 22:27:58,652][21197] Avg episode reward: 21.396, avg true_objective: 9.681
[2024-12-13 22:27:58,682][21197] Num frames 6800...
[2024-12-13 22:27:58,801][21197] Num frames 6900...
[2024-12-13 22:27:58,916][21197] Num frames 7000...
[2024-12-13 22:27:59,036][21197] Num frames 7100...
[2024-12-13 22:27:59,197][21197] Avg episode rewards: #0: 19.856, true rewards: #0: 8.981
[2024-12-13 22:27:59,199][21197] Avg episode reward: 19.856, avg true_objective: 8.981
[2024-12-13 22:27:59,222][21197] Num frames 7200...
[2024-12-13 22:27:59,338][21197] Num frames 7300...
[2024-12-13 22:27:59,469][21197] Num frames 7400...
[2024-12-13 22:27:59,588][21197] Num frames 7500...
[2024-12-13 22:27:59,718][21197] Num frames 7600...
[2024-12-13 22:27:59,835][21197] Num frames 7700...
[2024-12-13 22:27:59,956][21197] Num frames 7800...
[2024-12-13 22:28:00,078][21197] Num frames 7900...
[2024-12-13 22:28:00,200][21197] Num frames 8000...
[2024-12-13 22:28:00,319][21197] Num frames 8100...
[2024-12-13 22:28:00,440][21197] Num frames 8200...
[2024-12-13 22:28:00,557][21197] Num frames 8300...
[2024-12-13 22:28:00,674][21197] Num frames 8400...
[2024-12-13 22:28:00,737][21197] Avg episode rewards: #0: 20.557, true rewards: #0: 9.334
[2024-12-13 22:28:00,739][21197] Avg episode reward: 20.557, avg true_objective: 9.334
[2024-12-13 22:28:00,858][21197] Num frames 8500...
[2024-12-13 22:28:00,978][21197] Num frames 8600...
[2024-12-13 22:28:01,102][21197] Num frames 8700...
[2024-12-13 22:28:01,223][21197] Num frames 8800...
[2024-12-13 22:28:01,343][21197] Num frames 8900...
[2024-12-13 22:28:01,461][21197] Num frames 9000...
[2024-12-13 22:28:01,579][21197] Num frames 9100...
[2024-12-13 22:28:01,697][21197] Num frames 9200...
[2024-12-13 22:28:01,830][21197] Num frames 9300...
[2024-12-13 22:28:01,948][21197] Num frames 9400...
[2024-12-13 22:28:02,071][21197] Num frames 9500...
[2024-12-13 22:28:02,225][21197] Num frames 9600...
[2024-12-13 22:28:02,346][21197] Num frames 9700...
[2024-12-13 22:28:02,470][21197] Num frames 9800...
[2024-12-13 22:28:02,591][21197] Num frames 9900...
[2024-12-13 22:28:02,714][21197] Num frames 10000...
[2024-12-13 22:28:02,844][21197] Num frames 10100...
[2024-12-13 22:28:02,904][21197] Avg episode rewards: #0: 23.103, true rewards: #0: 10.103
[2024-12-13 22:28:02,906][21197] Avg episode reward: 23.103, avg true_objective: 10.103
[2024-12-13 22:28:59,748][21197] Replay video saved to /content/train_dir/default_experiment/replay.mp4!